Conclusions
We implemented the system described in Section 4 and tested it on two domains: aerial images of Fort Hood, TX, and facial images of cats and dogs. For the cat and dog data (shown in Figure 3), the images were already small (64x64 pixels) and hand-registered, so the selective attention mechanism was disabled. For the Fort Hood data, each source image is 1000x1000 pixels and contains approximately 10,000 corners (i.e., possible attention points). We randomly selected 100 points on each of four object types for further processing. Similarly, we randomly selected 400 attention points on another, non-overlapping image for testing. Figure 4 shows example attention windows for each type of object (two building styles, paved parking lots, and unpaved parking areas).

Our model of expert object recognition uses only unsupervised learning, so no object labels were provided during training. During testing, the system retrieves a cluster and stored image for every attention window. Since clusters do not correspond to semantic labels, the cluster response is not evaluated. A trial is a success if the retrieved instance match is of the same object type as the test window.
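The success criterion above can be sketched in a few lines. This is a minimal illustration, not the system's actual retrieval code: the function name `instance_match_accuracy`, the feature arrays, and the use of Euclidean distance to pick the retrieved instance are all assumptions made for the example. Object types are used only to score the result, never to choose the match.

```python
import numpy as np

def instance_match_accuracy(train_feats, train_types, test_feats, test_types):
    """Fraction of test windows whose retrieved instance has the same object type.

    Assumption: the retrieved instance is the stored feature vector nearest to
    the test window in Euclidean distance. Labels are used for scoring only.
    """
    train_feats = np.asarray(train_feats, dtype=float)
    test_feats = np.asarray(test_feats, dtype=float)
    train_types = np.asarray(train_types)
    test_types = np.asarray(test_types)

    # Pairwise squared distances, shape (n_test, n_train).
    diffs = test_feats[:, None, :] - train_feats[None, :, :]
    d2 = np.sum(diffs ** 2, axis=2)

    # Index of the stored instance retrieved for each test window.
    nearest = np.argmin(d2, axis=1)

    # A trial succeeds when the retrieved instance shares the test window's type.
    return float(np.mean(train_types[nearest] == test_types))
```

Called with stored features, their object types, and the corresponding test arrays, the function returns the recognition rate as a value between 0 and 1.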

The biomimetic model clearly outperforms PCA, which is reassuring, since it uses PCA as its final step. It would have been disappointing if all the additional mechanisms failed to improve performance!

The more interesting question is why the system performs better. Figure 6 shows the results from a credit assignment experiment on the cat and dog data in which system components are isolated. In the baseline system, an image pyramid is computed for each image, and a single PCA is computed for pixels in the pyramid. In other words, the Gabor filters, non-accidental transforms and clustering have been disabled. (This is also the baseline for Figure 5.) We then reintroduced the Gabor filters, applying PCA to the energy values produced by the complex cell models. Performance does not improve; in fact, it degrades, as shown in Figure 6. Next we reintroduced the Hough transform, so that PCA is applied to the Hough space. Performance improves markedly, approaching the best recognition rates for the system as a whole. This suggests that the LOC model is critical to overall system performance. It also calls into question the need for clustering, since recognition performance is essentially the same with or without it (see Figures 5 and 6). Further experiments confirm that clustering only marginally improves recognition rates when the number of subspace dimensions is large (see Figure 7). What clustering does is force the images stored in a subspace to be similar, allowing for more compression. As a result, peak recognition performance is reached with fewer subspace dimensions, as shown iconically at the bottom of Figure 7. Clustering therefore improves the system’s ability to compress visual memories.
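The structure of the credit assignment experiment, a single PCA applied to the output of a swappable front end, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation evaluated here: the names `fit_pca`, `project`, and `match_instances` are hypothetical, the `transform` argument merely stands in for whichever front end is enabled (pyramid pixels, Gabor energies, or Hough space), and nearest-neighbor matching in Euclidean distance is assumed.

```python
import numpy as np

def fit_pca(X, k):
    """Return the mean and the top-k principal directions of the rows of X."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def project(X, mean, basis):
    """Project the rows of X into the k-dimensional PCA subspace."""
    return (np.asarray(X, dtype=float) - mean) @ basis.T

def match_instances(train_imgs, test_imgs, k, transform=lambda img: img):
    """For each test image, return the index of the nearest stored image.

    `transform` is a placeholder for the front end under test; the default
    (identity) corresponds to applying PCA directly to pixel values.
    """
    train = np.stack([np.ravel(transform(im)) for im in train_imgs])
    test = np.stack([np.ravel(transform(im)) for im in test_imgs])
    mean, basis = fit_pca(train, k)
    tr = project(train, mean, basis)
    te = project(test, mean, basis)
    # Squared Euclidean distances in the subspace, shape (n_test, n_train).
    d2 = ((te[:, None, :] - tr[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

Varying `k` for a fixed `transform` gives the kind of recognition-versus-dimensions curve discussed above; swapping `transform` while holding `k` fixed isolates the contribution of one front end.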

The most surprising result so far from our model of expert object recognition is the performance of the Hough transform with PCA. Most appearance-based methods apply PCA to raw images or to the results of simple image operations (e.g., image differences). We observe a significant benefit, however, from applying PCA to the output of a Hough transform in two domains, even though cat and dog faces have few straight lines. We do not observe the same benefit when PCA is applied to the outputs of the Gabor filters. We hypothesize that the recognition rate increases because the Hough transform makes collinearity (a non-accidental property) explicit. We also observe that clustering to create localized PCA projections improves compression more than recognition. This may only be true for instance matching tasks; in classification tasks the PCA subspaces represent an underlying class probability distribution, and Mahalanobis distances are meaningful. In classification, localized PCA subspaces may therefore improve the recognition rate. In instance matching, however, clustering improves compression but not recognition.
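To illustrate what it means for the Hough transform to make collinearity explicit, the sketch below builds a standard line-Hough accumulator with NumPy. It is a textbook formulation offered for illustration only; the function name `hough_lines`, the one-degree angular resolution, and the one-pixel rho bins are assumptions, and the transform used in our system may differ in its details.

```python
import numpy as np

def hough_lines(edge_img, n_theta=180):
    """Accumulate votes for lines rho = x*cos(theta) + y*sin(theta).

    Returns the accumulator (rows: rho bins, columns: theta bins), the theta
    values in radians, and the offset `diag` added to rho to index rows.
    """
    edge_img = np.asarray(edge_img)
    h, w = edge_img.shape
    diag = int(np.ceil(np.hypot(h, w)))  # bound on |rho| for any pixel
    thetas = np.deg2rad(np.arange(n_theta) * (180.0 / n_theta))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)
    cols = np.arange(n_theta)

    ys, xs = np.nonzero(edge_img)
    for x, y in zip(xs, ys):
        # Each edge pixel votes once per theta, for the line through it
        # at that orientation.
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, cols] += 1
    return acc, thetas, diag
```

Pixels that lie on a common line all vote for the same (rho, theta) bin, so a collinear arrangement that is spread across many pixels in the image becomes a single large value in Hough space. Applying PCA to the accumulator therefore operates on a representation in which this property is directly encoded.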

Finally, our work suggests that the LOC needs to be studied more closely. The LOC determines the overall recognition rate for our computational model, yet we have less information about it than any other anatomical component of the system. We cannot even be sure that the results reported by Kourtzi and Kanwisher [17] and Biederman [1] apply in the special case of expert recognition. More studies are needed.