Computational models of the early visual system have a long history [26]. Of
particular interest are functional models of simple and complex cells in V1. Through
single cell recordings, Pollen and others have shown that the outputs of cells in V1
can be directly modeled in terms of visual stimuli, combining the effects of retinal,
LGN and V1 processing. Simple cell responses in V1 can be modeled as Gabor
filters of the stimulus, parameterized by location, orientation, scale and phase.
Complex cell responses combine the energy of Gabor filters across phases [28].
Following Pollen, this work models early vision as a bank of multi-scale Gabor filters.
Our system computes an image pyramid from the input, convolves it with nonsymmetric
even and odd Gabor filters at every 15° of orientation, and computes the
resulting energy.
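
The pyramid, quadrature-pair filtering and energy steps described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work. The function names, kernel size, wavelength, envelope width, number of pyramid levels and the isotropic Gaussian envelope are all assumptions made for the example. Filtering uses circular FFT convolution for brevity.

```python
import numpy as np

def gabor_pair(size, wavelength, theta, sigma):
    """Return (even, odd) Gabor kernels sharing one Gaussian envelope.

    All parameter values are illustrative assumptions; `size` should be odd.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)      # axis along the carrier
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    even = envelope * np.cos(2.0 * np.pi * xr / wavelength)
    odd = envelope * np.sin(2.0 * np.pi * xr / wavelength)
    even -= even.mean()                             # suppress the DC response
    return even, odd

def filter_same(image, kernel):
    """Circular convolution via FFT; output has the shape of `image`."""
    kh, kw = kernel.shape
    padded = np.zeros(image.shape)
    padded[:kh, :kw] = kernel
    # Move the kernel centre to the origin so the output is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def gabor_energy(image, n_orient=12, size=9, wavelength=4.0, sigma=2.0):
    """Energy (even^2 + odd^2) at n_orient orientations; 12 gives 15-degree steps."""
    thetas = np.arange(n_orient) * np.pi / n_orient
    out = np.empty((n_orient,) + image.shape)
    for i, theta in enumerate(thetas):
        even, odd = gabor_pair(size, wavelength, theta, sigma)
        out[i] = filter_same(image, even) ** 2 + filter_same(image, odd) ** 2
    return out

def pyramid(image, levels=3):
    """Simple image pyramid: each level halves resolution by 2x2 averaging."""
    out = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        im = out[-1]
        h, w = (im.shape[0] // 2) * 2, (im.shape[1] // 2) * 2
        out.append(im[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return out

# Usage: energy maps for every pyramid level of a random test image.
rng = np.random.default_rng(0)
test_image = rng.random((64, 64))
energy_by_level = [gabor_energy(level) for level in pyramid(test_image)]
```

Summing the squared even and odd responses makes the output largely insensitive to the phase of the stimulus, which is the property attributed to complex cells above.
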
It should be noted that the responses of V1 cells can be modulated by portions of
the stimulus outside their classically defined receptive fields [18, 35]. This conflicts
with the model of complex cells as Gabor filters, but the first modulation effects do
not occur until 80-120 ms post-stimulus. From ERP studies, it seems unlikely that
contextual modulation effects appear soon enough to influence expert recognition.
Although the early vision system processes the whole retinal image through a bank
of Gabor filters, not all of this information is passed downstream to the ventral and
dorsal systems. Instead, a portion of this data is selected by position (and possibly
scale or frequency [24]) for further processing. Parkhurst et al. show a
positive correlation between human eye-tracking data and a bottom-up model of attention
selection based on color, intensity and orientation [27]. Maki et al. present a model
based on image flow, motion and stereo [21]. Unfortunately, the system described in
this paper does not yet use a biological model of attention selection. Instead, it runs a
corner detector over the image, and successively selects image patches around each
corner. In the future, we hope to replace this with the attentional model in the
Neuromorphic Vision Toolkit [12] (this is the system evaluated by Parkhurst et al.).
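
The corner-driven patch selection described above might look like the following sketch. The text does not say which corner detector is used, so a Harris-style response is assumed here purely for illustration. The function names, window radius, Harris constant, patch size and suppression distance are likewise hypothetical.

```python
import numpy as np

def box_blur(a, r):
    """Mean filter over a (2r+1) x (2r+1) window with edge padding."""
    p = np.pad(a, r, mode="edge")
    out = np.zeros_like(a, dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (2 * r + 1) ** 2

def harris_response(image, r=2, k=0.05):
    """Harris corner measure (an assumed stand-in for the paper's detector)."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    sxx, syy, sxy = box_blur(gx * gx, r), box_blur(gy * gy, r), box_blur(gx * gy, r)
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2

def select_patches(image, n=4, patch=9, min_dist=5):
    """Successively pick the strongest remaining corner and cut a patch around it."""
    half = patch // 2
    resp = harris_response(image)
    # Exclude corners whose patch would extend past the image border.
    resp[:half, :] = resp[-half:, :] = -np.inf
    resp[:, :half] = resp[:, -half:] = -np.inf
    picks = []
    for _ in range(n):
        y, x = np.unravel_index(np.argmax(resp), resp.shape)
        y, x = int(y), int(x)
        if not np.isfinite(resp[y, x]) or resp[y, x] <= 0:
            break  # no corner-like points left
        picks.append((y, x, image[y - half:y + half + 1, x - half:x + half + 1]))
        # Suppress the neighbourhood so the next pick is a different corner.
        resp[max(0, y - min_dist):y + min_dist + 1,
             max(0, x - min_dist):x + min_dist + 1] = -np.inf
    return picks

# Usage: a bright square on a dark background has four corners.
square = np.zeros((40, 40))
square[12:28, 12:28] = 1.0
for y, x, p in select_patches(square):
    print(y, x, p.shape)
```

Each selected patch would then be passed on for further processing, in place of the fixations a biological attention model would supply.
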