Computational principles for an autonomous active vision system
Sherbakov, Lena Oleg
Vision research has uncovered computational principles that generalize across species and brain areas. However, these biological mechanisms are seldom implemented in computer vision algorithms. In this thesis, models suitable for application in computer vision were developed to examine the benefits of two biologically inspired computational principles: multi-scale sampling and active, space-variant vision.
The first model investigated the role of multi-scale sampling in motion integration. It is known that receptive fields of different spatial and temporal scales exist in the visual cortex; however, models addressing how this basic principle is exploited by species are scarce and do not adequately explain the data. The developed model showed that the solution to a classical problem in motion integration, the aperture problem, can be reframed as an emergent property of multi-scale sampling facilitated by fast, parallel, bi-directional connections at different spatial resolutions.
Humans and most other mammals actively move their eyes to sample a scene (active vision); moreover, the resolution of detail in this sampling process is not uniform across spatial locations (space-variant). It is known that these eye movements are not simply guided by image saliency, but are also influenced by factors such as spatial attention, scene layout, and task relevance. However, it is seldom asked how previous eye movements shape the way an object is learned and recognized in a continuously learning system. To explore this question, a second model (CogEye) was developed that integrates active, space-variant sampling with eye-movement selection (the "where" visual stream) and object recognition (the "what" visual stream). The model hypothesizes that a signal from the recognition system helps the "where" stream select fixation locations that best disambiguate object identity between competing alternatives.
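The fixation-selection hypothesis can be illustrated with a minimal sketch. This is not the CogEye implementation: the function name `select_fixation`, the representation of each object as a list of expected feature values per candidate location, and the use of an absolute difference between the two leading hypotheses are all assumptions made here for illustration only.

```python
def select_fixation(templates, beliefs, visited=()):
    """Pick the unvisited location where the two leading object
    hypotheses are expected to differ most (illustrative assumption)."""
    # Rank hypotheses by current recognition confidence and keep the top two.
    first, second = sorted(beliefs, key=beliefs.get, reverse=True)[:2]

    best_loc, best_gap = None, -1.0
    for loc, (a, b) in enumerate(zip(templates[first], templates[second])):
        if loc in visited:
            continue  # do not refixate a location already sampled
        gap = abs(a - b)  # how strongly this location separates the two candidates
        if gap > best_gap:
            best_loc, best_gap = loc, gap
    return best_loc


# Hypothetical toy data: expected feature value at three candidate locations.
templates = {
    "cup":  [0.9, 0.2, 0.5],
    "mug":  [0.9, 0.8, 0.5],
    "bowl": [0.1, 0.1, 0.1],
}
# Hypothetical output of the recognition ("what") stream after the first fixation.
beliefs = {"cup": 0.45, "mug": 0.40, "bowl": 0.15}

next_fixation = select_fixation(templates, beliefs)
```

In this toy case the two leading candidates, "cup" and "mug", differ only at location 1, so that location is chosen next; a purely saliency-driven selector would have no access to this recognition signal.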
The third study combined eye tracking with an object-disambiguation psychophysics experiment to validate the second model, CogEye. While humans outperformed the model in recognition accuracy, the model's fixations more closely resembled human eye movement patterns when it used information from the recognition pathway to help select future fixations than when it relied on image saliency alone. Taken together, these results show that computational principles in the mammalian visual system can be used to improve computer vision models.