Face Detection with SpikeNET

SpikeNET is a simulator designed for modeling large populations of asynchronously spiking neurons. Using this in-house software, we were able to build a range of networks that localize faces in natural images with remarkably good performance.

These models are based on a Rank Order Coding scheme, where the order of spiking over a population of neurons constitutes the code.
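A minimal sketch of rank order coding, in Python with NumPy. It assumes a common formulation of rank order decoding: stronger inputs fire earlier, and a target neuron weights each incoming spike by a factor `mod**rank` that shrinks with the spike's rank. The functions `rank_order_encode` and `rank_order_activation` and the value `mod=0.9` are hypothetical illustrations, not SpikeNET's own API.

```python
import numpy as np

def rank_order_encode(intensities):
    """Return neuron indices in firing order (hypothetical helper).

    Stronger inputs are assumed to fire earlier, so the order is simply
    the indices sorted by decreasing intensity.
    """
    return np.argsort(-np.asarray(intensities, dtype=float), kind="stable")

def rank_order_activation(firing_order, weights, mod=0.9):
    """Activation of one target neuron under rank order decoding.

    Each incoming spike adds its synaptic weight scaled by mod**rank, so
    the first spikes count most. `mod` is an assumed factor in (0, 1).
    """
    activation = 0.0
    for rank, neuron in enumerate(firing_order):
        activation += weights[neuron] * mod ** rank
    return activation

inputs = [0.2, 0.9, 0.5, 0.7]
order = rank_order_encode(inputs)
print(order.tolist())  # [1, 3, 2, 0]

# Weights tuned to this order: earlier-ranked inputs get larger weights.
preferred = np.zeros(len(inputs))
preferred[order] = 0.9 ** np.arange(len(inputs))

matched = rank_order_activation(order, preferred)
reversed_ = rank_order_activation(order[::-1], preferred)
print(matched > reversed_)  # True
```

Because the preferred weights decrease in the same order as the spikes arrive, the matched order yields the larger activation; only the order of firing, not the exact latencies, affects the result.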



The model

Below is an illustration of the architecture used in VanRullen et al., 1998 (BioSystems, in press). The input image is first decomposed into positive and negative contrasts by two maps of ON-center and OFF-center cells. At this level, the contrast intensity is transformed into a firing latency for every input cell. The relative latencies are then analyzed in the next layer by cells selective to an edge at a particular orientation, much like the simple cells in V1. Only 4 orientations are shown here, whereas the network actually uses 8 such maps. Cells at the next level were trained to respond selectively to the firing patterns characteristic of facial features such as the eyes or the mouth. Finally, cells in the last map combine this information and respond only if the features are simultaneously present at the right locations.
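The first stage of this architecture can be sketched as follows, in Python with NumPy. The text does not specify the receptive fields or the contrast-to-latency function, so this sketch assumes a difference-of-Gaussians center-surround filter and a latency inversely proportional to contrast, with weakly stimulated cells staying silent. All function names and parameter values are hypothetical, and the orientation and feature layers are not shown.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 2D Gaussian kernel of odd width `size`."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def filter_same(image, kernel):
    """Naive 'same'-size filtering with edge padding (odd, symmetric kernel)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def on_off_latencies(image, size=5, sigma_c=0.8, sigma_s=1.6, threshold=0.01):
    """Latency maps for ON-center and OFF-center cells (hypothetical parameters)."""
    # Center-surround receptive field as a difference of Gaussians (sums to ~0).
    dog = gaussian_kernel(size, sigma_c) - gaussian_kernel(size, sigma_s)
    response = filter_same(np.asarray(image, dtype=float), dog)
    on = np.maximum(response, 0.0)    # positive local contrast
    off = np.maximum(-response, 0.0)  # negative local contrast
    # Stronger contrast -> shorter latency; weak contrast -> silent (inf).
    with np.errstate(divide="ignore"):
        lat_on = np.where(on > threshold, 1.0 / on, np.inf)
        lat_off = np.where(off > threshold, 1.0 / off, np.inf)
    return lat_on, lat_off

# A bright square on a dark background.
image = np.zeros((12, 12))
image[4:8, 4:8] = 1.0
lat_on, lat_off = on_off_latencies(image)
print(np.isfinite(lat_on[4, 4]), np.isfinite(lat_off[3, 3]))  # True True
print(np.isinf(lat_on[0, 0]))  # True: uniform region, no spike
```

Sorting all latencies (for example with `np.argsort`, which places silent cells with infinite latency last) gives the order in which the input cells fire, which is what the next layer receives under rank order coding.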

Result of the propagation of an image through the network. Each pixel in the displayed maps corresponds to a neuron. Dark pixels are neurons that remained silent. Bright pixels are neurons that emitted a spike, with the grey level coding the latency of the spike (the earlier the spike, the brighter the pixel). See the text above for further details.


Training phase

We used a non-iterative supervised learning procedure. Training used 270 frontal views (±30°) of 27 people (10 views each). Each image was propagated through the network, and the patterns obtained in the orientation layer around the right eye, the left eye and the mouth were averaged, giving a set of weights for each feature to be learned. The resulting weights are shown below.
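The averaging step of this training procedure might look like this in Python with NumPy. It assumes that each training image's orientation-layer output is available as an array of shape (orientations, height, width) holding a per-cell activity value (for example a rank-dependent value, 0 for silent cells), and that the feature position is known for each image. `learn_feature_weights`, the patch size and the final normalization are hypothetical choices, and the data below is random.

```python
import numpy as np

def learn_feature_weights(orientation_maps, centers, half=4):
    """Average orientation-layer patterns around one facial feature.

    orientation_maps: list of arrays shaped (n_orientations, H, W), one per
        training image. centers: list of (row, col) positions of the feature
        in each image, assumed at least `half` pixels from the border.
    Returns weights shaped (n_orientations, 2*half + 1, 2*half + 1).
    """
    patches = [
        maps[:, r - half:r + half + 1, c - half:c + half + 1]
        for maps, (r, c) in zip(orientation_maps, centers)
    ]
    weights = np.mean(patches, axis=0)  # single, non-iterative averaging pass
    total = weights.sum()
    # Normalize to unit total weight (an assumed convention).
    return weights / total if total > 0 else weights

# Hypothetical data: 10 images, 8 orientation maps of 32x32 cells each.
rng = np.random.default_rng(0)
maps = [rng.random((8, 32, 32)) for _ in range(10)]
eye_positions = [(12, 10)] * 10
w_eye = learn_feature_weights(maps, eye_positions)
print(w_eye.shape)  # (8, 9, 9)
```

Calling the same function with the positions of the other eye and of the mouth gives one weight set per feature, matching the three feature maps of the model.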

The 3 feature-detection maps were then connected to the face-detection map using simple Gaussian patterns, giving a "structural description" of the face in terms of its component features. The connections used are shown below.
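A sketch of these Gaussian feature-to-face connections, in Python with NumPy. It assumes that each face cell sums the activity of each feature map under a Gaussian centered on that feature's expected position relative to the face cell, and fires only when the total exceeds a threshold high enough to require all three features. The offsets, `sigma`, the threshold of 2.5 and the function names are hypothetical.

```python
import numpy as np

def gaussian_blob(shape, center, sigma):
    """Unnormalized Gaussian weight pattern (peak 1) centered on `center`."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def face_map_input(feature_maps, offsets, sigma=1.5):
    """Summed input to every face-detection cell.

    feature_maps: dict name -> 2D activity map (all the same shape).
    offsets: dict name -> (dr, dc), expected position of the feature
        relative to the face cell (hypothetical geometry).
    """
    shape = next(iter(feature_maps.values())).shape
    out = np.zeros(shape)
    for r in range(shape[0]):
        for c in range(shape[1]):
            for name, fmap in feature_maps.items():
                dr, dc = offsets[name]
                weights = gaussian_blob(shape, (r + dr, c + dc), sigma)
                out[r, c] += np.sum(weights * fmap)
    return out

offsets = {"left_eye": (-3, -3), "right_eye": (-3, 3), "mouth": (4, 0)}
feature_maps = {name: np.zeros((20, 20)) for name in offsets}
# One active cell per feature map, arranged around a face at (10, 10).
feature_maps["left_eye"][7, 7] = 1.0
feature_maps["right_eye"][7, 13] = 1.0
feature_maps["mouth"][14, 10] = 1.0

THRESHOLD = 2.5  # hypothetical: needs all three features near their places
face_input = face_map_input(feature_maps, offsets)
peak = tuple(int(i) for i in np.unravel_index(face_input.argmax(), face_input.shape))
print(peak)  # (10, 10)
print(face_input.max() > THRESHOLD)  # True

# Without the mouth, no face cell reaches the threshold.
feature_maps["mouth"][:] = 0.0
no_mouth = face_map_input(feature_maps, offsets)
print(no_mouth.max() > THRESHOLD)  # False
```

Summing Gaussian-weighted inputs tolerates small displacements of each feature, while the threshold enforces that the features co-occur in the expected spatial arrangement.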



Results

The model parameters were first optimized using half of the images (135) for learning. The model was then tested on four different databases. Database 1 is the other half of the learning set. Database 2 is a difficult test set of 130 images of people wearing glasses (88%) or a beard (31%). These first two databases were obtained from the Olivetti Research Laboratory. Database 3 comes from a different source (University of Bern), so its images have lighting conditions different from those of the learning set. Database 4 contains no face images and was used to measure the model's false-detection rate.

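Results are shown in the table below. Percentages are correct-detection rates; the number in parentheses is the number of false detections, which is why database 4, containing no faces, has only parenthesized values.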

Test database              Detection map
                           Mouth        Right eye    Left eye     Face
Database 1 (135 images)    92.3% (18)   97.8% (97)   95.6% (100)  96.3% (2)
Database 2 (130 images)    88.5% (27)   83.1% (88)   80% (87)     73.1% (4)
Database 3 (300 images)    91% (89)     92.7% (222)  75% (198)    94% (4)
Database 4 (216 images)    --- (14)     --- (13)     --- (12)     --- (1)
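Assuming each face-map percentage is the fraction of face images in which the face was correctly detected, the rates can be converted back into approximate image counts. A small Python check follows; the rates in the table are rounded, so the counts are approximate, and the names used here are hypothetical.

```python
# Face-map detection rates and database sizes, copied from the results table.
face_results = {
    "Database 1": (0.963, 135),
    "Database 2": (0.731, 130),
    "Database 3": (0.94, 300),
}

def approx_detected(rate, n_images):
    """Approximate number of correctly detected faces (rates are rounded)."""
    return round(rate * n_images)

for name, (rate, n) in face_results.items():
    print(f"{name}: about {approx_detected(rate, n)} of {n} faces detected")
# Database 1: about 130 of 135 faces detected
# Database 2: about 95 of 130 faces detected
# Database 3: about 282 of 300 faces detected
```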



