Supplementary information for
Is attention useful for object recognition?
Rutishauser U., Walther D., Koch C., Perona P., 2004.
IEEE Conference on Computer Vision and Pattern Recognition 2004 (CVPR 2004), vol. 2, pp. 37-44, IEEE Press.
Please download the supplement PDF for a printable version of all the
following figures.
Object recognition in digital photographs
Supp Fig1:
Additional example of object matching (see Section 4.1 of the paper). The example shows the image used for learning (left) and the image used for recognition (right), together with the two matches that were established. Ten attentional fixations were used for learning and 15 for recognition.
The images were processed at their original resolution of 1024x1536 pixels.
Supp Fig2:
The 15 most salient patches of one indoor image, merged into a cumulative image. The background (the unselected parts of the image) is shown faintly to provide context.
Supp Fig3:
The 15 most salient patches of one indoor image, merged into a cumulative image. The background (the unselected
parts of the image) is shown faintly to provide context.
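A cumulative image like the ones in Supp Fig2 and Supp Fig3 can be sketched as follows: the whole image is dimmed, and the selected patches are pasted back at full intensity. This is a minimal illustration, not the code used to produce the figures. It assumes grayscale images stored as nested lists, rectangular patch boxes for simplicity, and a hypothetical `background_gain` dimming factor; the function name `cumulative_image` is also hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]        # grayscale image, intensities in [0, 1]
Box = Tuple[int, int, int, int]  # hypothetical patch box: (top, left, height, width)


def cumulative_image(image: Image, patches: List[Box], background_gain: float = 0.2) -> Image:
    """Merge the selected patches into one image and dim the unselected background."""
    rows, cols = len(image), len(image[0])
    # Start from a uniformly dimmed copy so the background still gives context.
    out = [[background_gain * image[r][c] for c in range(cols)] for r in range(rows)]
    # Copy every selected patch back at full intensity, clipped to the image bounds.
    for top, left, height, width in patches:
        for r in range(max(top, 0), min(top + height, rows)):
            for c in range(max(left, 0), min(left + width, cols)):
                out[r][c] = image[r][c]
    return out


if __name__ == "__main__":
    img = [[1.0] * 4 for _ in range(3)]
    merged = cumulative_image(img, [(0, 0, 2, 2)], background_gain=0.25)
    for row in merged:
        print(row)
```

Overlapping patches are harmless here, because each selected pixel is simply copied from the original image again.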
The robot video:
Supp Fig4:
An example frame from the supplementary video. The upper part shows the original 320x240 frame as recorded by the
robot's camera; the lower part shows the 3 most salient patches used for learning and
recognition.
There are two versions of the robot video: one with the 3 most salient patches marked (with attention) and one
with 3 random patches marked instead (without attention). Both versions are exactly as used in the experiments described in the paper.
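The difference between the two video versions can be illustrated with a minimal sketch that picks k locations either by saliency or at random. It assumes a precomputed 2-D saliency map and uses a simple greedy winner-take-all with square inhibition of return as a stand-in for the attention model; this is not the paper's implementation, and the names `salient_fixations`, `random_fixations` and the `radius` parameter are hypothetical. Patches would then be cut out around the returned locations.

```python
import random
from typing import List, Optional, Tuple

SaliencyMap = List[List[float]]
Location = Tuple[int, int]


def salient_fixations(smap: SaliencyMap, k: int, radius: int) -> List[Location]:
    """Pick k fixations by greedy winner-take-all with inhibition of return."""
    s = [row[:] for row in smap]  # work on a copy; the caller's map is untouched
    rows, cols = len(s), len(s[0])
    fixations: List[Location] = []
    for _ in range(k):
        # Winner-take-all: the most salient remaining location (first one on ties).
        winner = max(
            ((r, c) for r in range(rows) for c in range(cols)),
            key=lambda loc: s[loc[0]][loc[1]],
        )
        fixations.append(winner)
        # Inhibition of return: suppress a square neighbourhood around the winner.
        wr, wc = winner
        for r in range(max(wr - radius, 0), min(wr + radius + 1, rows)):
            for c in range(max(wc - radius, 0), min(wc + radius + 1, cols)):
                s[r][c] = float("-inf")
    return fixations


def random_fixations(rows: int, cols: int, k: int, seed: Optional[int] = None) -> List[Location]:
    """Control condition: k locations drawn uniformly at random, ignoring saliency."""
    rng = random.Random(seed)
    return [(rng.randrange(rows), rng.randrange(cols)) for _ in range(k)]


if __name__ == "__main__":
    smap = [[0.0] * 5 for _ in range(5)]
    smap[1][1], smap[3][3], smap[4][4] = 9.0, 5.0, 7.0
    print("salient:", salient_fixations(smap, k=3, radius=0))  # [(1, 1), (4, 4), (3, 3)]
    print("random: ", random_fixations(5, 5, k=3, seed=1))
```

Suppressed locations are set to negative infinity rather than zero, so an inhibited location is not chosen again while any uninhibited location remains.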
Video of object extraction with attention (files are ~20MB each):
MPEG4 encoded, MPEG1 encoded.
Video of object extraction without attention (random patches, files are ~20MB each):
MPEG4 encoded, MPEG1 encoded.
The video runs at 5 fps and was recorded by a robot moving autonomously, without human control. Three objects were extracted from every frame; these are shown
in the lower half of the video. The resolution is 340x280 (the webcam's original resolution, no rescaling), and frames were stored and processed as PPM files (uncompressed, lossless).
Results and explanation: see paper.
03/16/04 -- Ueli Rutishauser