Step 2: Non-accidental Feature Transformations in the Lateral Occipital Complex (LOC)
Computational models of the early visual system have a long history [26]. Of particular interest are functional models of simple and complex cells in V1. Through single-cell recordings, Pollen and others have shown that the outputs of V1 cells can be modeled directly as functions of the visual stimulus, folding together the effects of retinal, LGN and V1 processing. Simple-cell responses in V1 can be modeled as Gabor filters applied to the stimulus, parameterized by location, orientation, scale and phase. Complex-cell responses combine the energy of Gabor filters across phases [28]. Following Pollen, this work models early vision as a bank of multi-scale Gabor filters. Our system computes an image pyramid from the input, convolves each level with nonsymmetric even and odd Gabor filters at every 15° of orientation, and computes the resulting energy.
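
The early-vision stage can be illustrated with a minimal sketch of a multi-scale Gabor energy bank. This is not the authors' code: the isotropic Gaussian envelope, the values of `sigma` and `wavelength`, the three pyramid levels, and the 2x2 block-averaging pyramid (used in place of a Gaussian pyramid) are all assumptions, and every function name below is hypothetical. Each orientation from 0° to 165° in 15° steps gets an even (cosine) and odd (sine) kernel, and the sum of their squared responses serves as the phase-invariant "complex cell" energy.

```python
import numpy as np


def gabor_pair(sigma, wavelength, theta):
    """Even (cosine) and odd (sine) Gabor kernels at orientation theta (radians).

    Assumption: isotropic Gaussian envelope. theta = 0 means the carrier varies
    along x, so the preferred stimulus is vertical bars.
    """
    half = int(np.ceil(3 * sigma))  # kernel covers +/- 3 sigma
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)  # axis along which the carrier varies
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    even = envelope * np.cos(2 * np.pi * xr / wavelength)
    odd = envelope * np.sin(2 * np.pi * xr / wavelength)
    even -= even.mean()  # zero DC: uniform regions produce no even response
    return even, odd


def convolve_same(image, kernel):
    """2-D linear convolution via FFT, cropped to the input size ("same" mode)."""
    h, w = image.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)  # full linear-convolution size, no wrap-around
    full = np.fft.irfft2(np.fft.rfft2(image, shape) * np.fft.rfft2(kernel, shape), shape)
    return full[kh // 2:kh // 2 + h, kw // 2:kw // 2 + w]


def downsample(image):
    """One pyramid step: average 2x2 blocks (a crude stand-in for blur + subsample)."""
    h, w = (image.shape[0] // 2) * 2, (image.shape[1] // 2) * 2
    return image[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def gabor_energy_pyramid(image, levels=3, step_deg=15, sigma=2.0, wavelength=6.0):
    """Return one (n_orientations, H, W) energy array per pyramid level."""
    thetas = np.deg2rad(np.arange(0, 180, step_deg))  # 0, 15, ..., 165 degrees
    kernels = [gabor_pair(sigma, wavelength, t) for t in thetas]
    img = np.asarray(image, dtype=float)
    pyramid = []
    for _ in range(levels):
        energy = np.empty((len(thetas),) + img.shape)
        for k, (even, odd) in enumerate(kernels):
            # Complex-cell model: the sum of squared quadrature responses is phase-invariant.
            energy[k] = convolve_same(img, even) ** 2 + convolve_same(img, odd) ** 2
        pyramid.append(energy)
        img = downsample(img)
    return pyramid
```

Orientations stop at 165° because the energy of an even/odd pair is unchanged by a 180° rotation, so 0° to 165° in 15° steps gives twelve distinct channels. For a vertical grating whose period matches `wavelength`, the 0° channel should dominate at the image center.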

It should be noted that the responses of V1 cells can be modulated by portions of the stimulus outside their classically defined receptive fields [18, 35]. This conflicts with the model of complex cells as Gabor filters, but the first modulation effects do not occur until 80-120 ms after stimulus onset. Judging from ERP studies, it seems unlikely that contextual modulation appears early enough to influence expert recognition.

Although the early vision system processes the whole retinal image through a bank of Gabor filters, not all of this information is passed downstream to the ventral and dorsal systems. Instead, a portion of it is selected by position (and possibly by scale or frequency [24]) for further processing. Parkhurst et al. show a positive correlation between human eye-tracking data and a bottom-up model of attention selection based on color, intensity and orientation [27]. Maki et al. present a model based on image flow, motion and stereo [21]. Unfortunately, the system described in this paper does not yet use a biological model of attention selection. Instead, it runs a corner detector over the image and successively selects image patches around each corner. In the future, we hope to replace this with the attentional model in the Neuromorphic Vision Toolkit [12], which is the system evaluated by Parkhurst et al.
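
The corner-driven patch selection can be sketched as follows. The paper does not name its corner detector, so this sketch assumes a Harris detector with a box window instead of the usual Gaussian window; the constant `k = 0.04`, the patch half-width, the number of patches, and the suppression radius are illustrative choices, and all function names are hypothetical. Patches are taken in order of decreasing corner strength, and the neighborhood of each selected corner is suppressed so the next pick lands elsewhere.

```python
import numpy as np


def box_sum(a, r):
    """Sum of `a` over a (2r+1)x(2r+1) window at every pixel (edge-padded integral image)."""
    p = np.pad(a, r, mode="edge")
    ii = np.pad(p.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    k = 2 * r + 1
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]


def harris_response(img, r=2, k=0.04):
    """Harris measure det(M) - k * trace(M)^2 of the box-windowed structure tensor M."""
    gy, gx = np.gradient(img)  # derivatives along rows (y) and columns (x)
    sxx, syy, sxy = box_sum(gx * gx, r), box_sum(gy * gy, r), box_sum(gx * gy, r)
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2


def select_corners(response, n, min_dist, border):
    """Greedily take the strongest positive responses, suppressing a neighborhood after each pick."""
    R = response.astype(float)  # astype returns a copy, so the input is untouched
    valid = np.zeros(R.shape, dtype=bool)
    valid[border:R.shape[0] - border, border:R.shape[1] - border] = True
    R[~valid] = -np.inf  # keep corners far enough from the edge for a full patch
    corners = []
    for _ in range(n):
        y, x = np.unravel_index(np.argmax(R), R.shape)
        if R[y, x] <= 0:  # nothing corner-like left (edges score < 0, flat regions 0)
            break
        corners.append((int(y), int(x)))
        R[max(y - min_dist, 0):y + min_dist + 1, max(x - min_dist, 0):x + min_dist + 1] = -np.inf
    return corners


def attend_patches(image, n=10, half=8, min_dist=5):
    """Corner-driven stand-in for attention: corner positions and the patches around them."""
    img = np.asarray(image, dtype=float)
    corners = select_corners(harris_response(img), n, min_dist, border=half)
    patches = [img[y - half:y + half + 1, x - half:x + half + 1] for y, x in corners]
    return corners, patches
```

On a bright square against a dark background, only the four corners give a positive Harris response (straight edges score negative), so at most four patches are returned even when `n` is larger. In the full system, each patch would then be passed downstream for further processing.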