We implemented the system described in Section 4 and tested it on two domains:
aerial images of Fort Hood, TX, and facial images of cats and dogs. For the cat and
dog data (shown in Figure 3), the images were already small (64x64 pixels) and hand
registered, so the selective attention mechanism was disabled. For the Fort Hood
data, each source image is 1000x1000 pixels and contains approximately 10,000
corners (i.e. possible attention points). We randomly selected 100 points on each of
four object types for further processing. Similarly, we randomly selected 400
attention points on another, non-overlapping image for testing. Figure 4 shows
example attention windows for each type of object (two building styles, paved
parking lots, and unpaved parking areas).
Our model of expert object recognition uses only unsupervised learning, so no
object labels were provided during training. During testing, the system retrieves a
cluster and stored image for every attention window. Since clusters do not
correspond to semantic labels, the cluster response is not evaluated. A trial is a
success if the retrieved instance match is of the same object type as the test window.
The biomimetic model clearly outperforms PCA, which is
reassuring, since it uses PCA as its final step. It would have been disappointing if all
the additional mechanisms failed to improve performance!
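The success criterion above can be sketched in a few lines. This is a minimal illustration, not the system's actual retrieval code: the function name, the Euclidean nearest-neighbor rule, and the feature arrays are assumptions. Object types are used only to score the retrieved instance, never to train.

```python
import numpy as np

def instance_match_rate(stored_feats, stored_types, test_feats, test_types):
    """Fraction of test windows whose nearest stored instance has the same type.

    Hypothetical helper: Euclidean nearest neighbor stands in for whatever
    retrieval rule the real system uses.
    """
    stored_feats = np.asarray(stored_feats, dtype=float)
    test_feats = np.asarray(test_feats, dtype=float)
    # Squared distance between every test vector and every stored vector.
    d2 = ((test_feats[:, None, :] - stored_feats[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    # A trial succeeds when the retrieved instance shares the test window's type.
    hits = np.asarray(stored_types)[nearest] == np.asarray(test_types)
    return float(hits.mean())
```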
The more interesting question is why the system performs better. Figure 6 shows
the results from a credit assignment experiment on the cat and dog data in which
system components are isolated. In the baseline system, an image pyramid is computed for each
image, and a single PCA is computed for pixels in the pyramid. In other words, the
Gabor filters, non-accidental transforms and clustering have been disabled. (This is
also the baseline for Figure 5.) We then reintroduced the Gabor filters, applying PCA
to the energy values produced by the complex cell models. Performance does not
improve; in fact, it degrades, as shown in Figure 6. Next we reintroduced the Hough
transform, so that PCA is applied to the Hough space. Performance improves
markedly, approaching the best recognition rates for the system as a whole. This
suggests that the LOC model is critical to overall system performance. It also calls
into question the need for clustering, since recognition performance is essentially the
same with or without it (see Figures 5 & 6).
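To make the "PCA applied to the Hough space" configuration concrete, the following sketch votes an edge-strength map into a (rho, theta) accumulator and then runs PCA on the flattened accumulators. It is an illustration under stated assumptions, not the implementation evaluated here: the function names, the bin counts, and the use of a generic edge map as input are all hypothetical.

```python
import numpy as np

def hough_accumulator(edges, n_theta=32, n_rho=32):
    """Vote each nonzero pixel of a 2-D edge map into (rho, theta) bins."""
    edges = np.asarray(edges, dtype=float)
    h, w = edges.shape
    ys, xs = np.nonzero(edges)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rho_max = np.hypot(h, w)
    # Line parameterization: rho = x cos(theta) + y sin(theta).
    rho = xs[:, None] * np.cos(thetas) + ys[:, None] * np.sin(thetas)
    bins = np.round((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
    cols = np.broadcast_to(np.arange(n_theta), bins.shape)
    acc = np.zeros((n_rho, n_theta))
    # Unbuffered add so repeated (rho, theta) bins accumulate all their votes.
    np.add.at(acc, (bins, cols), edges[ys, xs][:, None])
    return acc

def pca_fit(X, k):
    """Return the mean and the top-k principal directions of the rows of X."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def pca_project(X, mean, basis):
    """Coordinates of the rows of X in the PCA subspace."""
    return (np.asarray(X, dtype=float) - mean) @ basis.T
```

In this sketch, collinear pixels pile their votes into a single accumulator bin, so the vector handed to PCA encodes collinearity explicitly rather than leaving it implicit in pixel positions.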
Further experiments confirm that clustering only marginally improves recognition
rates when the number of subspace dimensions is large (see Figure 7). What
clustering does is force the images stored in a subspace to be similar, allowing for
more compression. As a result, peak recognition performance is reached with fewer
subspace dimensions, as shown iconically at the bottom of Figure 7. Clustering
therefore improves the system’s ability to compress visual memories.
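The compression argument can be illustrated by comparing reconstruction error for one global PCA subspace against one subspace per cluster at the same number of dimensions. The sketch below is hypothetical: the function names are invented for illustration, and the cluster assignments are taken as given, since the clustering step itself is not reproduced.

```python
import numpy as np

def reconstruction_error(X, n_dims):
    """Mean squared error after keeping n_dims principal components of X."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_dims]
    # Project onto the subspace and map back to the original space.
    recon = centered @ basis.T @ basis
    return float(((centered - recon) ** 2).mean())

def local_reconstruction_error(X, labels, n_dims):
    """Same error, but with a separate PCA subspace for each cluster label."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for j in np.unique(labels):
        members = X[labels == j]
        # Weight each cluster's error by its number of entries.
        total += reconstruction_error(members, n_dims) * members.size
    return total / X.size
```

When the members of a cluster are similar, the per-cluster error falls faster as dimensions are added, which is the sense in which clustering helps compression here.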
The most surprising result so far from our model of expert object recognition is the
performance of the Hough transform with PCA. Most appearance-based methods
apply PCA to raw images or to the results of simple image operations (e.g. image
differences). We observe a significant benefit, however, from applying PCA to the
output of a Hough transform in two domains, even though cat and dog faces have few
straight lines. We do not observe the same benefit when PCA is applied to the
outputs of the Gabor filters. We hypothesize that the recognition rate increases
because the Hough transform makes collinearity (a non-accidental property) explicit.
We also observe that clustering to create localized PCA projections improves
compression more than recognition. This may only be true for instance matching
tasks; in classification tasks the PCA subspaces represent an underlying class
probability distribution, and Mahalanobis distances are meaningful. Localized PCA
subspaces may therefore improve the recognition rate. In instance matching,
however, clustering improves compression but not recognition.
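For the classification case mentioned above, a Mahalanobis distance within a PCA subspace scales each principal coordinate by the variance observed along it. The following is a minimal sketch with hypothetical function names; it assumes the retained components all have nonzero variance.

```python
import numpy as np

def fit_subspace(X, k):
    """Mean, top-k principal directions, and the variance along each direction."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    # Squared singular values give the sample variance along each direction.
    variances = s[:k] ** 2 / (len(X) - 1)
    return mean, vt[:k], variances

def mahalanobis_in_subspace(x, mean, basis, variances):
    """Mahalanobis distance from the subspace mean, measured within the subspace."""
    coords = basis @ (np.asarray(x, dtype=float) - mean)
    return float(np.sqrt(np.sum(coords ** 2 / variances)))
```

A displacement along a high-variance direction counts for less than the same displacement along a low-variance one, which is what makes the distance meaningful when the subspace models a class distribution.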
Finally, our work suggests that the LOC needs to be studied more closely. The
LOC determines the overall recognition rate for our computational model, yet we
have less information about it than any other anatomical component of the system.
We cannot even be sure that the results reported by Kourtzi and Kanwisher [17] and
Biederman [1] apply in the special case of expert recognition. More studies are
needed.