Together, the early vision system and the LOC form a feature extraction subsystem,
with the early vision system computing Gabor features and the LOC transforming
them into non-accidental feature vectors, as shown in Figure 1. Similarly, the
fusiform gyrus and right inferior frontal gyrus combine to form a feature-based
appearance matching subsystem.
The appearance-based matching system is divided into two components: an
unsupervised clustering system and a subspace projection system. This is motivated
by the psychological observation that categorical and instance level recognition
cannot be dissociated, and the mathematical observation that subspace projection
methods exploit the commonality among images to compress data. If the images are
too diverse, for example pictures of faces, pets, and chairs, then there is no
commonality for the subspaces to exploit.
To avoid this, we model the fusiform gyrus as an unsupervised clustering system,
and the right inferior frontal gyrus as a subspace matching system. This anatomical
mapping is partly for simplicity; the exact functional division between these structures
is not clear. Lesion studies associate the right inferior frontal lobe with visual
memory [20], and rTMS and PET data suggest that these memories are compressed
images [16]. Since compressed memories are stored in the frontal gyrus, it is easy to
imagine that they are matched there as well, perhaps using an associative network. At
the same time, clustering is the first step that is unique to expert recognition and the
fusiform gyrus is the first anatomically unique structure on the expert pathway, so it
makes sense to associate clustering with the fusiform gyrus. It is not clear,
however, where images are projected into cluster-specific subspaces; it could be
in either location, or both.
It is important to note that the categories learned by the clustering mechanism in
the fusiform gyrus are non-linguistic. The images in a cluster do not need to be of the
same object type or viewpoint, nor do all images of one object need to appear in one
cluster. Clustering simply divides the training data into small groups of similar
samples, so that PCA can fit a unique subspace to each group. This is similar to the
localized subspace projection models in [7, 13]. We have implemented K-Means and
an EM algorithm for mixtures of PCA analyzers similar to [32]. Surprisingly, we have
so far obtained the best results by using K-Means and overestimating the number of
clusters K, possibly because non-symmetric Gaussians can be approximated by
collections of symmetric ones.
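
The cluster-then-project idea described above can be illustrated with a minimal
sketch. This is not the implementation used in this work; it assumes plain Lloyd's
K-Means followed by an independent PCA fit per cluster, and it uses synthetic
vectors in place of real feature vectors. All names (kmeans, fit_local_pca,
reconstruct, mean_sq_error, make_toy_data) and all parameter values are
hypothetical and chosen only for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-Means (illustrative); returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, labels


def fit_local_pca(X, labels, k, n_components):
    """Fit one PCA subspace per cluster; returns a list of (mean, basis) or None."""
    models = []
    for j in range(k):
        members = X[labels == j]
        if len(members) == 0:
            models.append(None)  # empty cluster: nothing to fit
            continue
        mean = members.mean(axis=0)
        # Rows of vt are principal directions, ordered by singular value.
        _, _, vt = np.linalg.svd(members - mean, full_matrices=False)
        models.append((mean, vt[:n_components]))
    return models


def reconstruct(x, model):
    """Project x into a cluster's subspace and map it back to input space."""
    mean, basis = model
    return mean + basis.T @ (basis @ (x - mean))


def mean_sq_error(X, labels, models):
    """Mean squared reconstruction error, each sample using its own cluster."""
    errs = [np.sum((x - reconstruct(x, models[j])) ** 2)
            for x, j in zip(X, labels)]
    return float(np.mean(errs))


def make_toy_data(n_per_group=100, dim=5, seed=0):
    """Two synthetic groups that vary along different axes (assumed data)."""
    rng = np.random.default_rng(seed)
    a = np.zeros((n_per_group, dim))
    b = np.zeros((n_per_group, dim))
    a[:, 0] = rng.normal(0.0, 2.0, n_per_group)  # group A varies along axis 0
    a[:, 2] = 10.0
    b[:, 1] = rng.normal(0.0, 2.0, n_per_group)  # group B varies along axis 1
    b[:, 2] = -10.0
    noise = rng.normal(0.0, 0.05, (2 * n_per_group, dim))
    return np.vstack([a, b]) + noise


if __name__ == "__main__":
    X = make_toy_data()
    k = 4  # deliberately more clusters than the two underlying groups
    _, labels = kmeans(X, k)
    local_models = fit_local_pca(X, labels, k, n_components=1)

    # Baseline: a single subspace fitted to all samples at once.
    one_cluster = np.zeros(len(X), dtype=int)
    global_models = fit_local_pca(X, one_cluster, 1, n_components=1)

    print("global PCA error:", mean_sq_error(X, one_cluster, global_models))
    print("local PCA error: ", mean_sq_error(X, labels, local_models))
```

Because PCA gives the least-squares optimal affine subspace for the samples it is
fitted to, the per-cluster error can never exceed the error of a single shared
subspace of the same dimension. The sketch only makes that gap visible on data
whose groups have little in common.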