Qi Zhao, Ph.D.

Postdoctoral Fellow, Koch Lab
California Institute of Technology

Advisor: Dr. Christof Koch

Email: qzhao AT klab DOT caltech DOT edu
Phone: (626)395-8964
Address: Caltech, MC 216-76
Pasadena, CA 91125



Research Interests

Computational Visual Cognition; Neuromorphic Visual Models and Systems; Computer Vision and Statistical Learning; Computational Neuroscience

 


Selected Publications


1. JoV 2011 Qi Zhao and Christof Koch, "Learning a Saliency Map Using Fixated Locations in Natural Scenes," in Journal of Vision (JoV), Volume 11, Issue 3, Article 9, Pages 1-15, 2011.

2. T-PAMI 2010 Qi Zhao, Zhi Yang and Hai Tao, "Differential Earth Mover's Distance with Its Applications to Visual Tracking," in IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), Volume 32, Issue 2, Pages 274-287, February 2010. [pdf]

3. NIPS 2010 Zhi Yang, Qi Zhao, Edward Keefer and Wentai Liu, "Noise Characterization, Modeling, and Reduction for In Vivo Neural Recording," Advances in Neural Information Processing Systems (NIPS 22), Pages 2160-2168, Vancouver, B.C., Canada, 2010. [pdf]

4. Neurocomputing 2009 Zhi Yang*, Qi Zhao* and Wentai Liu, "Neural Signal Classification Using a Simplified Feature Set with Energy Based Nonparametric Clustering," in Neurocomputing, Volume 73, Issue 1-3, Pages 412-422, December 2009. *Equal authorship. [pdf]

5. CVIU 2009 Qi Zhao and Hai Tao, "A Motion Observable Representation Using Color Correlogram and Its Applications to Visual Tracking," in Computer Vision and Image Understanding (CVIU), Volume 113, Issue 2, Pages 273-290, February 2009. [pdf]

6. NIPS 2009 Zhi Yang, Qi Zhao and Wentai Liu, "Spike Feature Extraction Using Informative Samples," Advances in Neural Information Processing Systems (NIPS 21 Poster Spotlight Presentation), Pages 1865-1872, Vancouver, B.C., Canada, 2009. [pdf]

7. ICCV 2007 Qi Zhao, Shane Brennan and Hai Tao, "Differential EMD Tracking," in IEEE International Conference on Computer Vision (ICCV Oral Presentation), Rio de Janeiro, Brazil, October 2007. [pdf]

8. ICCV 2007 Feng Tang, Shane Brennan, Qi Zhao and Hai Tao, "Co-Tracking Using Semi-Supervised Support Vector Machines," in IEEE International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, October 2007. [pdf]

 


Recent Projects

The Role of Attention in Adapting to Ensembles. Visual information in the natural environment can be represented at multiple levels of abstraction, from local details of each element to statistical properties of ensembles. While vision researchers have recently begun to study how ensembles are processed at the statistical level, it is not known whether we adapt to any statistical properties of visual scenes. The prevalence of adaptation in our daily lives and its importance in machine vision applications motivate the question: what is encoded as we adapt to ensembles in complex natural scenes? Further, attention generally seems to shift processing resources from global to local visual processing. What, then, is the role of attention in adaptation to statistical properties? In this work, we study the influence of attention on the tilt aftereffect induced by an ensemble of oriented Gabor elements. We found that reduced attention facilitates adaptation at the ensemble level. Further experiments suggest a dissociation of attention and consciousness in the adaptation process.

Learning from Bottom-Up Attention Allocation in Natural Scenes. Inspired by the primate visual system, computational saliency models decompose visual input into a set of feature maps across spatial scales. In the standard approach, the feature maps of the pre-specified channels are summed to yield the final saliency map. This work investigates several issues using three recent eye tracking datasets. We find that certain channels are more predictive than others, and therefore a weighted combination of features improves the ability of the saliency algorithm to predict where people will look. When top-down information is available, such bottom-up task-independent weights serve as prior information and can be combined with top-down knowledge to infer task-specific optimal weights. Furthermore, we demonstrate that nonlinear integration further improves model performance. This raises the question of the extent to which the primate brain takes advantage of such nonlinear integration strategies. Experiments also show that inter-subject differences in feature predictability are negligible. In addition, we model the central fixation bias as a Gaussian process, and derive theoretical bases that justify approximating the dynamic Gaussian process with a single kernel. Lastly, we discuss the inadequacy of the standard performance measure (area under the ROC curve) and show that three complementary measures together provide a comprehensive assessment of saliency models.
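The difference between the standard summation and a weighted combination of feature maps can be illustrated with a minimal sketch. The channel names, the toy 2x2 maps, and the weight values below are hypothetical placeholders chosen for illustration; they are not the learned weights or the datasets from this work.

```python
def combine_feature_maps(feature_maps, weights):
    """Return the weighted linear sum of equally sized 2-D feature maps."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for fmap, w in zip(feature_maps, weights):
        for r in range(rows):
            for c in range(cols):
                saliency[r][c] += w * fmap[r][c]
    return saliency


# Toy 2x2 feature maps for three hypothetical channels.
color = [[0.0, 1.0], [0.0, 0.0]]
intensity = [[0.0, 0.0], [1.0, 0.0]]
orientation = [[0.0, 0.0], [0.0, 1.0]]
maps = [color, intensity, orientation]

# Standard approach: every channel contributes equally.
uniform = combine_feature_maps(maps, [1 / 3, 1 / 3, 1 / 3])

# Weighted approach: made-up weights that favor one channel.
weighted = combine_feature_maps(maps, [0.2, 0.3, 0.5])
```

With equal weights the three active locations are indistinguishable; with unequal weights the location driven by the most heavily weighted channel becomes the most salient.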

Differential Earth Mover's Distance Matching. The Earth Mover's Distance (EMD) is a similarity measure that captures the perceptual difference between two distributions. Its computational complexity, however, prevents its direct use in many applications. This work proposes a novel Differential EMD (DEMD) algorithm based on the sensitivity analysis of the simplex method, which offers a speedup of orders of magnitude over its brute-force counterparts. The DEMD algorithm is discussed and empirically verified in the visual tracking context. The deformations of the distributions for objects at different time instances are accommodated well by the EMD, and the differential algorithm makes the use of EMD in real-time tracking possible. To further reduce the computation, signatures, i.e., variable-size descriptions of distributions, are employed as an object representation. The new algorithm models and estimates local background scenes as well as foreground objects to handle scale changes in a principled way.
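To give a feel for what the EMD measures, here is a minimal sketch of the special case of two one-dimensional histograms with equal total mass and unit distance between adjacent bins, where the EMD reduces to a running sum of the mass that must be carried from bin to bin. This is not the DEMD algorithm and does not use the simplex method; the function name `emd_1d` is an illustrative choice.

```python
def emd_1d(p, q):
    """EMD between two 1-D histograms with equal total mass.

    Assumes unit ground distance between adjacent bins, so the cost
    equals the accumulated absolute difference of the running sums.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    carried = 0.0  # mass that still has to move past the current bin
    cost = 0.0
    for pi, qi in zip(p, q):
        carried += pi - qi
        cost += abs(carried)
    return cost


far = emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])   # all mass moves two bins
near = emd_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5])  # half the mass moves two bins
```

Unlike a bin-by-bin comparison, the cost grows with how far mass has to move, which is why the measure tolerates small deformations of a distribution.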

Evolving Mean Shift Clustering. This work presents a novel nonparametric clustering algorithm called the evolving mean shift (EMS) algorithm. The algorithm iteratively shrinks a dataset and generates well-formed clusters in just a couple of iterations. An energy function is defined to characterize the compactness of a dataset, and we prove that the energy converges to zero at an exponential rate. The single but critical user parameter of the mean shift clustering family, i.e., the bandwidth (also referred to as scale), is adaptively updated to accommodate the evolving data density and alleviate the contradiction between global and local features. The algorithm has been applied to image segmentation and neural spike sorting, where it achieves improved accuracy at much higher speed, as demonstrated both qualitatively and quantitatively.
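For readers unfamiliar with the mean shift family, the sketch below shows the classical fixed-bandwidth mean shift in one dimension with a flat kernel. It is not the EMS algorithm: the dataset is not shrunk and the bandwidth is not adapted. The function name, the toy data, and the bandwidth value are assumptions made for illustration.

```python
def mean_shift_1d(points, bandwidth, max_iter=100, tol=1e-6):
    """Move each point to the mode it converges to under a flat kernel."""
    modes = []
    for x in points:
        for _ in range(max_iter):
            # Flat kernel: average all data points within the bandwidth.
            neighbors = [p for p in points if abs(p - x) <= bandwidth]
            new_x = sum(neighbors) / len(neighbors)
            converged = abs(new_x - x) < tol
            x = new_x
            if converged:
                break
        modes.append(x)
    return modes


data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
modes = mean_shift_1d(data, bandwidth=1.0)
# Points whose modes coincide (up to rounding) belong to the same cluster.
labels = [round(m, 3) for m in modes]
```

The result depends strongly on the fixed bandwidth, which is the parameter the paragraph above describes as being updated adaptively in EMS.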

Neural Signal Feature Extraction. This work was developed in collaboration with the Integrated BioElectronics Research Lab. Most neurons in the brain transfer information by action potentials, which can be recorded with microelectrodes. A single electrode is very likely to record action potentials from several adjacent neurons, so further signal processing is required to separate the activities of individual neurons. We present a new spike feature extraction algorithm that targets real-time spike sorting and facilitates miniaturized microchip implementation. The proposed theoretical framework includes neuronal geometry signatures, noise shaping, and informative sample selection. The new algorithm has been evaluated on synthesized waveforms and experimentally recorded sequences. Compared with many spike sorting approaches, our algorithm demonstrates improved speed and accuracy and allows unsupervised execution. A preliminary integrated circuit implementation of the algorithm has been realized and tested.

A Motion Observable Representation Using Color Correlogram. This work presents a special form of color correlogram as a representation for object tracking and carries out a motion observability analysis to obtain the optimal correlogram in a kernel-based tracking framework. Compared with the color histogram, where the position information of each pixel is ignored, a simplified color correlogram (SCC) representation encodes the spatial information explicitly and enables an estimation algorithm to recover the object orientation. Based on the SCC representation, the mean shift algorithm is developed in a translation–rotation joint domain to track the positions and orientations of objects. The ability of the SCC to detect and estimate object motion is analyzed, and a principled way to obtain the optimal SCC as an object representation is proposed to ensure reliable tracking.

Robust Face Tracking in Real-World Videos. The algorithm was designed and implemented during my 3-month internship (mentors: Sanjiv Kumar and Henry Rowley) at Google Research, NYC in the summer of 2008. The objective of the project is to track faces in large-scale real-world videos, with applications to event recognition and face sequence indexing from YouTube data. The main novelty of the method is the use of an importance sampling technique to incorporate independent tracking modules into the particle filtering framework in a principled manner. The new algorithm naturally combines the merits of both face-specific and generic trackers while achieving a speedup of tens of times over its particle filtering based counterpart.

Real-Time Tracking Using Camera Combo for Remote Collaboration. I participated in this project during my 3-month internship (manager: Zhengyou Zhang; mentor: Cha Zhang) at Microsoft Research, Redmond, WA in the summer of 2007. The project targets personal remote collaboration. A camera combo with one fisheye camera and one Pan-Tilt-Zoom (PTZ) camera is used to capture general objects of interest. The fisheye camera has a wide field of view, and the PTZ camera can pan, tilt and zoom based on analysis of the images captured by the wide angle camera. At the core of the system is a semantic saliency map that overcomes many limitations of low-level saliency maps computed from preliminary image features. The map is used for PTZ camera control with a novel virtual director based on information loss optimization. The effectiveness of the proposed method is demonstrated with real-world sequences.

Part-Based Human Tracking in a Multiple Cue Fusion Framework. This project was developed during my 3-month internship (mentors: Jinman Kang and Wei Hua) at Vidient, Inc., Sunnyvale, CA in the summer of 2005. The objective of this project is to enable a real-time video surveillance system to handle challenging issues in multiple human tracking such as occlusions, sharp motion changes and multi-person confusions. Toward this goal, we propose to intelligently fuse multiple cues, including human body decomposition results based on a head detector, color information, and motion information. Part-based methods are adopted to provide a second level of information fusion, in that parts with bad observability can be compensated for by tracking other, more visible ones.

 


Past Projects

Ink Cleanup. I worked on this "Digital Ink Cleanup" project during my 4-month internship (mentor: Zhouchen Lin) at Microsoft Research Asia in Beijing, China in 2003. Digital Ink technology is one of the novelties of the Tablet PC. It enables people to truly "write" on computers. We analyzed user inputs and designed algorithms to remove redundant strokes and clarify input words prior to processing by digital ink recognizers. The system is effective in cleaning the ink note as well as increasing the recognition rate.

Texture Mapping on Talking Faces for Portable Devices. This work is part of the Chinese National Science Foundation project "Real-time 3-D Reconstruction of Speech-Driven Expression Animation". I proposed a method that uses a single frontal-view face image for efficient texture mapping on talking faces. The algorithm does not require an exact match between the model and the presented texture. Satisfactory mapping can be achieved through an interactive adjustment scheme, where users define correspondences for feature locations by editing the key points and their influence regions. The new method balances efficiency and realism well.

Efficient Belief Propagation for Image Restoration. Markov Random Field (MRF) theory provides a consistent way of modeling context-dependent entities such as image pixels. Solving the image restoration problem in the MRF framework is an NP-hard optimization problem, and approximation techniques such as belief propagation have been proposed. The drawback of standard belief propagation is its inefficiency. In this project, I implemented the efficient belief propagation method proposed by Felzenszwalb and Huttenlocher, applying it to additive noise removal and image inpainting. Further, other methods for additive noise removal, such as the total variation based, bilateral based and mean shift based methods, are studied and compared with the efficient belief propagation based one.

Efficient Multiple Object Trajectory Tracking. Most tracking algorithms are based on the maximum a posteriori (MAP) solution of a probabilistic framework called the Hidden Markov Model, where the distribution of the object state at the current time instance is estimated based on current and previous observations. However, this approach is prone to errors caused by temporal distractions such as occlusion, background clutter and multi-object confusion. Trajectory tracking algorithms instead seek the optimal state sequence, which maximizes the joint state-observation probability. In this research topic, we proposed a probabilistic framework in which trajectory tracking is more mathematically sound. A recovery mechanism is incorporated for this purpose, which prevents the algorithm from being stuck at a local maximum. Efforts are also put into improving the efficiency of trajectory tracking by using a hierarchical scheme and other techniques including hypothesis pruning and backward checking.
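The idea of seeking the state sequence that maximizes the joint state-observation probability can be illustrated with the textbook Viterbi algorithm on a tiny Hidden Markov Model. This is a generic sketch, not the framework proposed in this project (it has no recovery mechanism, hierarchy, or pruning). The two candidate locations "A" and "B", the observation labels, and all probabilities are made-up values for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its joint probability."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] * trans_p[r][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best final state back to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical two-location tracking problem.
states = ["A", "B"]
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit_p = {
    "A": {"near_A": 0.5, "ambiguous": 0.4, "near_B": 0.1},
    "B": {"near_A": 0.1, "ambiguous": 0.3, "near_B": 0.6},
}
path, prob = viterbi(["near_A", "ambiguous", "near_B"],
                     states, start_p, trans_p, emit_p)
```

Because the whole sequence is scored jointly, an ambiguous observation in the middle is resolved by its neighbors in time rather than by the current frame alone.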

 


Interesting Stuff

The Quest for Consciousness, The Game of Go (Wei Qi), US Chess Federation