Face Detection Algorithm Report
The face is our primary focus of attention in social life, playing an important role in conveying identity
and emotion. We can recognize a large number of faces learned throughout our lifespan and identify
familiar faces at a glance even after years of separation. This skill is remarkably robust despite large
variations in the visual stimulus due to changing conditions, aging, and distractions such as a beard,
glasses, or a change in hairstyle.
Computational models of face recognition are interesting because they can contribute not only to
theoretical knowledge but also to practical applications. Computers that detect and recognize faces
could be applied to a wide variety of tasks, including criminal identification, security systems, image
and film processing, identity verification, photo tagging, and human-computer interaction.
Unfortunately, developing a computational model of face detection and recognition is difficult
because faces are complex, multidimensional, and meaningful visual stimuli.
Face detection is now used in many places, especially on image-hosting websites such as Picasa,
Photobucket, and Facebook. Automatic tagging adds a new dimension to sharing pictures among the
people who appear in them and also tells other viewers who is in the image. In our project, we have
studied and implemented a simple but effective face detection algorithm that takes human skin colour
into account.
Our aim, which we believe we have reached, was to develop a method of face recognition that is fast,
robust, and reasonably accurate, built from relatively simple and easy-to-understand algorithms and
techniques. The examples provided in this thesis are real-time and taken from our own surroundings.
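The skin-colour rule at the heart of our detector can be sketched as a simple per-pixel test. The RGB thresholds below follow a commonly cited explicit skin-colour heuristic and are illustrative values, not the exact constants of our implementation:

```python
import numpy as np

def skin_mask(rgb):
    """Return a boolean mask of likely skin pixels in an RGB image.

    Uses an explicit RGB rule; the constants are illustrative, not tuned."""
    rgb = rgb.astype(np.int16)          # avoid uint8 overflow in differences
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (
        (r > 95) & (g > 40) & (b > 20)                      # bright enough
        & ((rgb.max(axis=-1) - rgb.min(axis=-1)) > 15)      # not greyish
        & (np.abs(r - g) > 15) & (r > g) & (r > b)          # red dominates
    )
```

Connected regions of the resulting mask can then be filtered by size and aspect ratio to obtain candidate face regions.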
Literature survey
Face detection techniques have been researched for years, and much progress has been
reported in the literature. Most face detection methods focus on detecting frontal faces under good
lighting conditions. These methods can be categorized into four types: knowledge-based, feature-
invariant, template-matching, and appearance-based.
Knowledge-based methods use human-coded rules to model facial features, such as two
symmetric eyes, a nose in the middle and a mouth underneath the nose.
Feature-invariant methods try to find facial features that are invariant to pose, lighting
conditions, or rotation. Skin color, edges, and shapes fall into this category.
Template matching methods calculate the correlation between a test image and pre-selected
facial templates.
An approach for detecting objects in general, applicable to human faces as well, was
presented by Viola and Jones. This method detects objects extremely rapidly and is comparable
to the best real-time face detection systems. Viola and Jones (2004) [1] presented in their research a
new image representation called the integral image, which allows fast calculation of the image features
used by their detection algorithm. The second step is an algorithm based on AdaBoost, which is
trained on the relevant object class to select a minimal set of features to represent the object.
Viola and Jones used features extracted from the training set and the AdaBoost algorithm to select the
best feature set and construct the final classifier, which comprises several stages. Each stage consists
of a few simple weak classifiers that together form a stronger classifier, filtering out the
majority of false detections at the early stages and producing an adequate final face detector.
Several algorithms are used for face recognition; some of the popular methods are discussed
here. Face recognition by feature matching is one such method. We locate points in the face
image with high information content. We do not need to consider the face contour or the hair; we
concentrate on the center of the face area, as the most stable and informative features are found there.
The most informative points lie around the eyes, nose, and mouth. To enforce this,
we apply a Gaussian weighting centered on the face.
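The central weighting described above can be sketched as a 2-D Gaussian mask that is multiplied into the face region; the sigma value below is an illustrative choice, not taken from the report:

```python
import numpy as np

def center_weight(h, w, sigma=0.3):
    """2-D Gaussian mask of shape (h, w) peaking at the image centre.

    sigma is expressed as a fraction of the half-size of the image
    (an illustrative value, not from the report)."""
    ys = np.linspace(-1, 1, h)[:, None]     # vertical coordinate in [-1, 1]
    xs = np.linspace(-1, 1, w)[None, :]     # horizontal coordinate in [-1, 1]
    return np.exp(-(ys**2 + xs**2) / (2 * sigma**2))
```

Multiplying a feature-response map by this mask suppresses responses near the contour and hair, keeping the emphasis on the eyes, nose, and mouth region.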
The simplest template-matching approaches represent a whole face using a single template,
i.e., a 2-D array of intensities, which is usually an edge map of the original face image. In a more
complex form of template matching, multiple templates may be used for each face to account for
recognition from different viewpoints. Another important variation is to employ a set of smaller facial
feature templates that correspond to the eyes, nose, and mouth for a single viewpoint. The most attractive
advantage of template matching is its simplicity; however, it suffers from large memory requirements
and inefficient matching. In feature-based approaches, geometric features, such as the position and width
of the eyes, nose, and mouth, eyebrow thickness and arches, face breadth, or invariant moments, are
extracted to represent a face. Feature-based approaches have smaller memory requirements and
higher recognition speed than template-based ones. They are particularly useful for face scale
normalization and 3D head model-based pose estimation. However, perfect extraction of features is
shown to be difficult in implementation [5]. The idea of appearance-based approaches is to project
face images onto a linear subspace of low dimensions. Such a subspace is first constructed by
principal component analysis on a set of training images, with eigenfaces as its eigenvectors. Later,
the concept of eigenfaces was extended to eigenfeatures, such as eigeneyes and eigenmouth, for the
detection of facial features [6]. More recently, fisherface space [7] and illumination subspace [1] have
been proposed for dealing with recognition under varying illumination.
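The appearance-based projection step described above can be sketched with plain PCA; the training-data shapes here are made up for illustration:

```python
import numpy as np

def eigenfaces(train, k):
    """Compute the top-k eigenfaces of a stack of flattened face images.

    train: (n_images, n_pixels) array; k: number of components kept."""
    mean = train.mean(axis=0)
    centered = train - mean
    # SVD of the centered data: rows of vt are the principal directions
    # (the eigenfaces), ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def project(face, mean, basis):
    """Represent a flattened face by its k coordinates in the subspace."""
    return basis @ (face - mean)
```

Recognition then compares the low-dimensional coordinate vectors (e.g. by nearest neighbour) instead of raw pixel arrays.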
4.6 Viola-Jones object detection framework:
Paul Viola and Michael Jones presented a fast and robust method for face detection that was 15
times quicker than any technique at the time of release, with 95% accuracy at around 17 fps. This work
has three key contributions: the integral image representation, an AdaBoost-based feature selection
procedure, and the detector cascade.
For their face detection framework Viola and Jones decided to use simple features based on
pixel intensities rather than to use pixels directly. They motivated this choice by two main factors:
Features can encode ad-hoc domain knowledge, which would otherwise
be difficult to learn from limited training data.
A feature-based system operates much faster than a pixel-based system.
They defined three kinds of Haar-like rectangle features:
A two-rectangle feature was defined as the difference between the sums of
the pixels within two adjacent regions (vertical or horizontal),
A three-rectangle feature was defined as the difference between the sum within two outside
rectangles and the sum within an inner rectangle between them,
A four-rectangle feature was defined as the difference between diagonal pairs of rectangles.
Figure 4.6.1: Rectangle feature examples: (A) and (B) show two-rectangle features,
(C) shows a three-rectangle feature, and (D) shows a four-rectangle feature.
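For illustration, a two-rectangle feature on a raw image array is simply the difference of two adjacent region sums (a real detector computes these sums via the integral image of Section 4.6.2); the layout below is the horizontal variant:

```python
import numpy as np

def two_rect_feature(img, y, x, h, w):
    """Horizontal two-rectangle Haar-like feature at (y, x):
    sum of the left h-by-w region minus sum of the adjacent right region."""
    left = img[y:y + h, x:x + w].sum()
    right = img[y:y + h, x + w:x + 2 * w].sum()
    return int(left - right)
```

A large response indicates a bright region next to a dark one, e.g. the bridge of the nose between the darker eye regions.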
4.6.2. Integral Image:
By maintaining a cumulative row sum at each location (x, y), the integral image can be computed
in a single pass over the original image. Once it is computed, any rectangle feature can be calculated
with only a few accesses to it (see Figure 4.6.2.2):
i. Two-rectangle features require 6 array references,
ii. Three-rectangle features require 8 array references, and
iii. Four-rectangle features require 9 array references.
Figure 4.6.2.2: Calculation example. The sum of the pixels within rectangle D can be
computed as 4 + 1 - (2 + 3), where 1-4 are values of the integral image.
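The construction and the four-corner lookup can be sketched as follows; `rect_sum` mirrors the 4 + 1 - (2 + 3) computation from Figure 4.6.2.2:

```python
import numpy as np

def integral_image(img):
    """Cumulative sums over rows then columns: ii[y, x] holds the sum of
    all pixels above and to the left of (y, x), inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, y, x, h, w):
    """Sum of the h-by-w rectangle with top-left corner (y, x),
    using the four corner references of the integral image."""
    total = ii[y + h - 1, x + w - 1]        # bottom-right corner (point 4)
    if y > 0:
        total -= ii[y - 1, x + w - 1]       # strip above (point 2)
    if x > 0:
        total -= ii[y + h - 1, x - 1]       # strip to the left (point 3)
    if y > 0 and x > 0:
        total += ii[y - 1, x - 1]           # doubly subtracted corner (point 1)
    return int(total)
```

Any rectangle sum thus costs at most four array references regardless of the rectangle's size, which is what makes feature evaluation at all scales cheap.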
The authors defined the base resolution of the detector to be 24x24. In other words, every
image frame should be divided into 24x24 sub-windows, and features are extracted at all possible
locations and scales for each such sub-window. This results in an exhaustive set of rectangle features,
amounting to more than 160,000 for a single sub-window.
The AdaBoost algorithm was introduced in 1995 by Freund and Schapire. The complete set of
features is quite large - over 160,000 features per 24x24 sub-window. Though computing a single
feature takes only a few simple operations, evaluating the entire set of features is still
extremely expensive and cannot be done in a real-time application.
In its original form, AdaBoost is used to improve classification results of a learning algorithm
by combining a collection of weak classifiers to form a strong classifier. The algorithm starts with
equal weights for all examples. In each round, the weights are updated so that the misclassified
examples receive more weight.
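One round of this weight update can be sketched as below; the `beta` factor follows the Viola-Jones formulation, in which correctly classified examples are scaled down and all weights are then renormalized:

```python
import numpy as np

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost round: shift weight towards misclassified examples.

    weights sum to 1; y_true and y_pred are 0/1 arrays giving the true
    labels and one weak classifier's predictions."""
    miss = (y_true != y_pred)
    err = weights[miss].sum()                 # weighted error of the weak classifier
    beta = err / (1.0 - err)                  # < 1 whenever err < 0.5
    new_w = weights * np.where(miss, 1.0, beta)   # correct examples shrink
    return new_w / new_w.sum()                # renormalise to sum to 1
```

After each round the next weak classifier is chosen against the reweighted examples, so it concentrates on the cases its predecessors got wrong.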
By drawing an analogy between weak classifiers and features, Viola and Jones decided to use
the AdaBoost algorithm for aggressive selection of a small number of good features, which nevertheless
have significant variety.
In practice, the weak learning algorithm was restricted to the set of classification functions
each of which depends on a single feature. A weak classifier h(x, f, p, θ) was then defined for a
sample x (i.e., a 24x24 sub-window) by a feature f, a threshold θ, and a polarity p indicating the
direction of the inequality:
h(x, f, p, θ) = 1 if p·f(x) < p·θ,
= 0 otherwise. (2)
The key advantage of AdaBoost over its competitors is the speed of learning. For each
feature, the examples are sorted by feature value. The optimal threshold for that feature can
then be computed in a single pass over this sorted list.
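The weak classifier of Eq. (2) and the threshold search can be sketched as follows; for brevity this version scans candidate thresholds naively, whereas Viola and Jones compute the best threshold in a single pass over the sorted list:

```python
import numpy as np

def weak_classify(fx, p, theta):
    """Eq. (2): h = 1 if p*f(x) < p*theta, else 0 (fx = feature values)."""
    return (p * fx < p * theta).astype(int)

def best_stump(fx, labels, weights):
    """Pick (theta, p) minimising the weighted classification error.

    Candidate thresholds are midpoints of consecutive sorted feature
    values, plus one below and one above the whole range."""
    xs = np.sort(fx)
    cands = np.concatenate(([xs[0] - 1], (xs[:-1] + xs[1:]) / 2, [xs[-1] + 1]))
    best = (np.inf, None, None)               # (error, theta, polarity)
    for theta in cands:
        for p in (+1, -1):
            err = weights[weak_classify(fx, p, theta) != labels].sum()
            if err < best[0]:
                best = (err, theta, p)
    return best
```

Each selected stump becomes one weak classifier in the boosted strong classifier.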
In their paper [8], Viola and Jones show that a strong classifier constructed from 200 features
yields reasonable results: at a detection rate of 95%, a false positive rate of 1 in 14,084 was
achieved on a testing dataset. These results are promising. However, the authors realized that for a face
detector to be practical in real applications, the false positive rate must be closer to 1 in 1,000,000.
The straightforward way to improve detection performance would be to add features to the
classifier. This, unfortunately, would increase computation time and thus render the
classifier inappropriate for real-time applications.
4.6.4. Detector Cascade:
There is a natural trade-off between a classifier's performance in terms of detection rate and its
complexity, i.e., the amount of time required to compute the classification result.
Viola and Jones [1], however, were looking for a method to speed up processing without
compromising quality. As a result, they came up with the idea of a detector cascade (see Figure 4.6.4.1).
Each sub-window is processed by a series of detectors, called a cascade, in the following way.
Classifiers are combined sequentially in order of their complexity, from the simplest to the most
complex. The processing of a sub-window thus starts from a simple classifier, which was trained to
reject most negative (non-face) frames while keeping almost all positive (face) frames. A sub-
window proceeds to the next, more complex, classifier only if it was classified as positive at the
preceding stage. If any one of the classifiers in the cascade rejects a frame, it is thrown away, and the
system proceeds to the next sub-window. If a sub-window is classified as positive by all the classifiers
in the cascade, it is declared as containing a face.
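The early-rejection logic of the cascade can be sketched in a few lines; each stage here is an arbitrary boolean function standing in for a trained stage classifier:

```python
def cascade_classify(window, stages):
    """Run a sub-window through a list of stage classifiers, cheapest first.

    Each stage maps a window to True (face-like) or False (reject).
    Reject as soon as any stage fails; accept only if every stage passes."""
    for stage in stages:
        if not stage(window):
            return False          # early rejection: most windows exit here
    return True                   # survived all stages: declared a face
```

Because the vast majority of sub-windows are rejected by the first one or two cheap stages, the expensive later stages run only on a small fraction of the image.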
Fig 4.6.4.1 AdaBoost Classifier
Introduction to OpenCV
Detection
Using OpenCV, we have implemented face detection with the Viola-Jones algorithm on a Raspberry Pi board.
VI. CONCLUSION:
Real-time face detection on the Raspberry Pi helps in building much more efficient
applications. Moreover, such technology can be useful for tracking a lost object in a dynamic
environment. This work can be further extended with stereo depth analysis of face
detection, using two image sensors interfaced with a high-speed processor.
REFERENCES:
[1] P. Viola and M. Jones, "Robust Real-time Object Detection," International Journal of Computer
Vision, vol. 57, no. 2, pp. 137–154, 2004.
[2] M. Gopi Krishna, A. Srinivasulu, and T. K. Basak, "Face Detection System on AdaBoost
Algorithm Using Haar Classifiers."
[3] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol.
3, no. 1, pp. 71–86, 1991.
[4] K. Etemad and R. Chellappa, "Discriminant analysis for recognition of human face images,"
Journal of the Optical Society of America A, vol. 14, no. 8, pp. 1724–1733, 1997.