What draws people's attention and gaze when they view complex visual displays? What are the underlying neural computations that enable us to detect objects of interest amidst clutter? My research aims at developing a behavioral and computational understanding of how different factors like task demands, visual salience and reward value of objects compete with each other for controlling attention and gaze. I explore these questions using a mix of applied theory (Bayesian statistics, Signal Detection theory, Statistical Decision theory, models of visual saliency) and experiments (eye tracking, psychophysics).
Christof Koch (advisor, Caltech), Pietro Perona (advisor, Caltech), Antonio Rangel (Caltech), Wei Ji Ma (Baylor), Jeff Beck (Gatsby, London), Alex Pouget (U. Rochester), Riccardo Pedersini (Harvard), Todd Horowitz (Harvard), Jeremy Wolfe (Harvard), Mili Milosavljevic (Caltech), Laurent Itti (PhD advisor, USC)
V. Navalpakkam, C. Koch, P. Perona, Homo Economicus in Visual Search, In: Journal of Vision, 9(1):31, 1-16, 2009.
Abstract: How do reward outcomes affect early visual performance? Previous studies found a suboptimal influence, but they ignored the non-linearity in how subjects perceived the reward outcomes. In contrast, we find that when the non-linearity is accounted for, humans behave optimally and maximize expected reward. Our subjects were asked to detect the presence of a familiar target object in a cluttered scene. They were rewarded according to their performance. We systematically varied the target frequency and the reward/penalty policy for detecting/missing the targets. We find that 1) Decreasing the target frequency will decrease the detection rates, in accordance with the literature. 2) Contrary to previous studies, increasing the target detection rewards will compensate for target rarity and restore detection performance. 3) A quantitative model based on reward-maximization accurately predicts human detection behavior in all target frequency and reward conditions; thus, reward schemes can be designed to obtain desired detection rates for rare targets. 4) Subjects quickly learn the optimal decision strategy; we propose a neurally plausible model that exhibits the same properties. Potential applications include designing reward schemes to improve detection of life-critical, rare targets (e.g., cancers in medical images).
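The reward-maximization idea in this abstract can be illustrated with the textbook equal-variance signal detection model. This is only a sketch under stated assumptions, not the model fitted in the paper: the function names, the unit-variance Gaussian evidence, the sensitivity `d_prime = 2.0`, and the payoff numbers are all hypothetical, and payoffs are treated as objective values, whereas the paper accounts for non-linearity in how subjects perceive rewards.

```python
from math import log
from statistics import NormalDist


def optimal_criterion(p_target, v_hit, c_miss, v_cr, c_fa, d_prime):
    """Reward-maximizing decision rule for equal-variance Gaussian evidence.

    Assumes noise ~ N(0, 1) and target ~ N(d_prime, 1). Costs are given as
    positive magnitudes. Returns the likelihood-ratio threshold beta and the
    matching criterion x_c on the evidence axis ("target present" if x > x_c).
    """
    prior_odds_absent = (1.0 - p_target) / p_target
    payoff_ratio = (v_cr + c_fa) / (v_hit + c_miss)
    beta = prior_odds_absent * payoff_ratio
    x_c = d_prime / 2.0 + log(beta) / d_prime
    return beta, x_c


def hit_rate(x_c, d_prime):
    """Probability that target evidence exceeds the criterion."""
    return 1.0 - NormalDist(mu=d_prime, sigma=1.0).cdf(x_c)


d_prime = 2.0  # hypothetical sensitivity

# Frequent targets, symmetric payoffs: neutral criterion.
_, x_balanced = optimal_criterion(0.5, 1, 1, 1, 1, d_prime)
balanced = hit_rate(x_balanced, d_prime)

# Rare targets, same payoffs: criterion shifts and hits drop.
_, x_rare = optimal_criterion(0.1, 1, 1, 1, 1, d_prime)
rare = hit_rate(x_rare, d_prime)

# Rare targets, larger reward for hits and penalty for misses:
# the payoff ratio cancels the prior odds and the hit rate recovers.
beta_boosted, x_boosted = optimal_criterion(0.1, 9, 9, 1, 1, d_prime)
boosted = hit_rate(x_boosted, d_prime)

print(f"balanced={balanced:.3f} rare={rare:.3f} boosted={boosted:.3f}")
```

In this toy setting the hit rate falls when targets become rare and returns to the balanced level once the payoffs for detecting and missing targets are scaled up, which mirrors findings 1) and 2) qualitatively.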
V. Navalpakkam, L. Itti, Search goal tunes visual features optimally, In: Neuron, Vol. 53, No. 4, pp. 605-617, Feb 2007.
Abstract: How does a visual search goal modulate the activity of neurons encoding different visual features (e.g., color, direction of motion)? Previous research suggests that goal-driven attention enhances the gain of neurons representing the target's visual features. Here, we present mathematical and behavioral evidence that this strategy is suboptimal and that humans do not deploy it. We formally derive the optimal feature gain modulation theory, which combines information from both the target and distracting clutter to maximize the relative salience of the target. We qualitatively validate the theory against existing electrophysiological and psychophysical literature. A surprising prediction is that it is sometimes optimal to enhance nontarget features. We provide experimental evidence toward this through psychophysics experiments on human subjects, thus suggesting that humans deploy the optimal gain modulation strategy.
Also see preview entitled Paying Attention to Neurons with Discriminating Taste by A. Pouget and D. Bavelier, In: Neuron 2007, Vol. 53, No. 4, pp. 473-475, Feb 2007.
Also see Faculty of 1000 Biology evaluation
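The prediction in the Neuron abstract above, that it can be optimal to enhance a nontarget feature, can be seen in a toy example. The sketch below is not the derivation in the paper: it assumes hypothetical Gaussian tuning curves of width 10, additive noise of equal variance on every channel, and gains set in proportion to how much more each channel responds to the target than to the distractor. All names and numbers are illustrative.

```python
from math import exp


def tuning(preferred, stimulus, width=10.0):
    """Mean response of a channel with Gaussian tuning around `preferred`."""
    return exp(-((stimulus - preferred) ** 2) / (2.0 * width ** 2))


def gains(preferred_values, target, distractor, width=10.0):
    """Normalized gains proportional to target-minus-distractor response.

    Channels that respond more to the distractor than to the target
    receive zero gain. Assumes equal noise on every channel.
    """
    diffs = [
        tuning(p, target, width) - tuning(p, distractor, width)
        for p in preferred_values
    ]
    positive = [max(d, 0.0) for d in diffs]
    total = sum(positive)
    return [d / total for d in positive]


# Hypothetical feature axis (e.g., orientation in degrees), one channel per 5 units.
preferred = list(range(30, 85, 5))
target, distractor = 55, 50

g = gains(preferred, target, distractor)
best = preferred[g.index(max(g))]

for p, gain in zip(preferred, g):
    print(f"channel {p:3d}: gain {gain:.3f}")
print("most boosted channel:", best)
```

With a distractor close to the target, the channel tuned exactly to the target responds almost as strongly to the distractor, so the largest gain goes to a channel tuned away from both, on the side opposite the distractor.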
V. Navalpakkam, L. Itti, An Integrated Model of Top-down and Bottom-up Attention for Optimal Object Detection, In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2006.
Abstract: Integration of goal-driven, top-down attention and image-driven, bottom-up attention is crucial for visual search. For instance, in robot navigation, it is important to detect goal-relevant targets like road signs and landmarks, and to simultaneously notice unexpected visual events like sudden obstacles and accidents. Yet, previous research has mostly focused on models that are purely top-down or bottom-up. Here, we propose a new model that combines both. The bottom-up component computes the visual salience of scene locations in different feature maps extracted at multiple spatial scales. The top-down component uses accumulated statistical knowledge of the visual features of the desired search target and background clutter, to optimally tune the bottom-up maps so as to maximize target detection speed. The results of testing on 600 artificial search arrays and 300 natural scenes show that the model's predictions are consistent with a large body of available literature on human psychophysics of visual search. The promising results suggest that our model may provide good approximation to how humans combine bottom-up and top-down cues such as to optimize visual search behavior.
V. Navalpakkam, L. Itti, Top-down Attention Selection is Fine-grained, In: Journal of Vision, Vol. 6, No. 11, pp. 1180-1193, Oct 2006.
Abstract: Although much is known about the sources and modulatory effects of top-down attentional signals, the information capacity of these signals is less known. Here, we investigate the granularity of top-down attentional signals. Previous theories in psychophysics have provided conflicting evidence on whether top-down guidance is coarse grained (i.e., one gain control term per feature dimension) or fine grained (i.e., multiple gain control terms per dimension). We resolve the conflict by designing new experiments that disentangle top-down from bottom-up contributions, thereby avoiding confounds existing in previous studies. The results of our eye-tracking experiments show that subjects can selectively saccade to items belonging to the relevant feature interval compared with irrelevant intervals within a dimension. This suggests that top-down signals can specify not only the relevant feature dimension but also the relevant feature interval within a dimension. We conclude that top-down signals are fine grained and can specify multiple gain control terms per dimension.
V. Navalpakkam, L. Itti, Optimal cue selection strategy, In: Neural Information Processing Systems (NIPS), 2005.
V. Navalpakkam, M. A. Arbib, L. Itti, Attention and Scene Understanding, In: Neurobiology of Attention, (L. Itti, G. Rees, J. K. Tsotsos Ed.), pp. 197-203, San Diego, CA:Elsevier, 2005.
V. Navalpakkam, L. Itti, Modeling the influence of task on attention, Vision Research, Vol. 45, No. 2, pp. 205-231, 2005. (TOP cited paper in Vision Research since 2005)
Abstract: We propose a computational model for the task-specific guidance of visual attention in real-world scenes. Our model emphasizes four aspects that are important in biological vision: determining task-relevance of an entity, biasing attention for the low-level visual features of desired targets, recognizing these targets using the same low-level features, and incrementally building a visual map of task-relevance at every scene location. Given a task definition in the form of keywords, the model first determines and stores the task-relevant entities in working memory, using prior knowledge stored in long-term memory. It attempts to detect the most relevant entity by biasing its visual attention system with the entity's learned low-level features. It attends to the most salient location in the scene, and attempts to recognize the attended object through hierarchical matching against object representations stored in long-term memory. It updates its working memory with the task-relevance of the recognized entity and updates a topographic task-relevance map with the location and relevance of the recognized entity. The model is tested on three types of tasks: single-target detection in 343 natural and synthetic images, where biasing for the target accelerates target detection over twofold on average; sequential multiple-target detection in 28 natural images, where biasing, recognition, working memory and long-term memory contribute to rapidly finding all targets; and learning a map of likely locations of cars from a video clip filmed while driving on a highway. The model's performance on search for single features and feature conjunctions is consistent with existing psychophysical data. These results of our biologically-motivated architecture suggest that the model may provide a reasonable approximation to many brain processes involved in complex task-driven visual behaviors.
V. Navalpakkam, L. Itti, A Goal Oriented Attention Guidance Model, Lecture Notes in Computer Science, Vol. 2525, pp. 453-461, Nov 2002.
V. Navalpakkam, C. Koch, A. Rangel & P. Perona, How do stimulus reward and feature-contrast combine to affect saccadic decisions?
V. Navalpakkam, C. Koch, P. Perona & A. Rangel, What are the mechanisms by which reward affects gaze?
W. J. Ma*, V. Navalpakkam*, J. Beck* & A. Pouget, A neural theory of Bayesian Visual Search (*equal contribution).
R. Pedersini*, V. Navalpakkam*, T. Horowitz & J. Wolfe, Value maximization explains and cures the prevalence effect in visual search (*equal contribution).