There is a chicken-and-egg dilemma in object detection and recognition: before an object can be IDENTIFIED, it must be DETECTED; yet to DETECT an object, we need a system that can IDENTIFY it.
To crack this problem, I prefer a two-stage architecture proposed by Rensink and implemented by Walther. In the first stage, "perceptual blobs", or "proto-objects", are detected by simple detection algorithms. Identification mechanisms then work directly on these proto-objects.
This architecture suggests two things:
- To mimic this early-stage visual processing, we must not rely on information that would be available only in later stages.
- The result of the first stage may be crude; in some cases the proto-objects are not actual objects. If we expected otherwise, we would be asking the detection system to IDENTIFY, which is clearly unattainable given the computational constraints.
The motivation for Spectral Residual is plain and straightforward: I am building an early-stage attention model, so the model must be as simple as possible, free of training or hand-labeling, and, most importantly, free of parameter tuning.
Given these extremely tough constraints, I failed to find a common property shared by different targets, but backgrounds do share one property in the spectral domain. By eliminating the homogeneous background, the "residual" parts are detected as proto-objects.
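To make the idea concrete, here is a minimal sketch. The text above does not spell out the steps, so this assumes the commonly published formulation of Spectral Residual: take the log amplitude spectrum, subtract its local 3x3 average (the smooth, "background" part), keep the original phase, and transform back. The function names are hypothetical, the final smoothing of the saliency map is omitted, and the naive DFT is only meant for tiny inputs; an FFT library would replace it in practice.

```python
import cmath
import math

def dft_1d(seq, inverse=False):
    """Naive O(n^2) discrete Fourier transform of one row or column."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(x * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t, x in enumerate(seq))
        out.append(total / n if inverse else total)
    return out

def dft_2d(grid, inverse=False):
    """Separable 2-D DFT: transform every row, then every column."""
    rows = [dft_1d(r, inverse) for r in grid]
    cols = [dft_1d(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def local_mean_3x3(grid):
    """3x3 box average with wrap-around borders (the spectrum is periodic)."""
    h, w = len(grid), len(grid[0])
    return [[sum(grid[(y + dy) % h][(x + dx) % w]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
             for x in range(w)] for y in range(h)]

def spectral_residual_saliency(image, eps=1e-8):
    """Return an unsmoothed saliency map for a 2-D list of gray values."""
    spectrum = dft_2d(image)
    log_amp = [[math.log(abs(c) + eps) for c in row] for row in spectrum]
    phase = [[cmath.phase(c) for c in row] for row in spectrum]
    smooth = local_mean_3x3(log_amp)
    # Residual = log amplitude minus its local average; the phase is untouched.
    rebuilt = [[cmath.exp(complex(la - sm, ph))
                for la, sm, ph in zip(r_la, r_sm, r_ph)]
               for r_la, r_sm, r_ph in zip(log_amp, smooth, phase)]
    back = dft_2d(rebuilt, inverse=True)
    return [[abs(c) ** 2 for c in row] for row in back]

# Toy input: one bright pixel on an empty 8x8 background.
image = [[0.0] * 8 for _ in range(8)]
image[2][5] = 1.0
saliency = spectral_residual_saliency(image)
peak = max(((y, x) for y in range(8) for x in range(8)),
           key=lambda p: saliency[p[0]][p[1]])
print(peak)  # (2, 5)
```

Note that nothing here is trained or tuned by hand, apart from the fixed 3x3 averaging window and a tiny `eps` that guards against taking the log of zero.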
Spatial local cues are the primary information used in today's models. What makes my spectral residual approach unique is its SPECTRAL representation. In the field of vision, the spectral representation receives much less attention than it deserves. As far as I know, the best-known use of the Fourier spectrum is Oliva's Gist of the Scene.
The neglect of the Fourier spectrum may be due to a long-held idea that the information in the amplitude spectrum is trivial (see Chapter 7 of Computer Vision by Forsyth for reference). But as Spectral Residual demonstrates, I believe there is gold hidden in the mess of the Fourier spectrum.