Object Recognition
Department of Electronics and Telecommunication Engineering, SVKM's NMIMS, Mukesh Patel School of Technology Management and Engineering
MAY 2014
CERTIFICATE
This is to certify that this B.Tech (Telecommunication) project report titled "Object Recognition Using Template Matching", prepared under the guidance of Prof. Aniket Kulkarni, is approved by me. It is further certified that, to the best of my knowledge, the report represents work carried out by the students at MPSTME, SVKM's NMIMS, Shirpur campus during the academic year 2013-14 (Project Stage).
1. ________________
2. _______________
Acknowledgement
Apart from our own efforts, the success of this project depends largely on the encouragement and guidance of many others. We take this opportunity to express our gratitude to the people who have been instrumental in its successful completion. We would like to show our greatest appreciation to Prof. Aniket Kulkarni, our project guide, Prof. Shashikant Patil, Head, EXTC Dept., and Dr. M. V. Deshpande (Associate Dean). We cannot thank them enough for their tremendous support and help; without their encouragement and guidance this project would not have materialized. We are also grateful for the guidance, support, and help received from the other students who contributed to this project, which was vital to its success.
Names of the students: Aaruni Bhugul (702), Pulkit Khandelwal (725), Sanyam Mehndiratta (732)
Table of Contents
Chapter No.  Title                                         Page No.

             Acknowledgement                               iii
             Abstract                                      viii
1.           Objective                                     9
1.1          Objective                                     9
1.2          Motivation and Justification                  10
2.           Introduction                                  11
2.1          General Framework                             12
3.           Literature Review                             14
4.           Theoretical Background                        15
4.1          Programming Language                          15
4.2          OS Support                                    17
4.3          Background Substrate                          17
4.4          Gaussian Filtering                            19
4.5          Object Detection Algorithm                    24
5.           Proposed Technique                            26
5.1          Sliding Window Object Localization            30
5.2          Template Matching                             32
5.2.1        Template Matching by Cross Correlation        32
5.2.2        Normalized Cross Correlation                  32
6.           Work Done                                     33
7.           Results                                       37
8.           Future Work                                   45
8.1          Human Action Analysis                         45
8.2          Hardware Implementation                       45
8.3          Modification and Algorithm for PTZ Camera     45
             References                                    46
List of Figures
General Framework                                          11
OpenCV Framework                                           14
Gaussian Filtering                                         18
Sliding Window Object Localization                         27
Sliding template image over source image                   28
Figure 5.2(b)  Resultant image with maximum match          29
Figure 7.1(a)  Sample 1 original image                     37
Figure 7.1(b)  Sample 1 template image 1                   37
Figure 7.1(c)  Sample 1 template image 2                   38
Figure 7.1(d)  Sample 1 image after applying algorithm     38
Figure 7.1(e)  Sample 1 resultant window 1                 38
Figure 7.1(f)  Sample 1 resultant window 2                 39
Figure 7.2(a)  Sample 2 original image                     39
Figure 7.2(b)  Sample 2 template image 1                   40
Figure 7.2(c)  Sample 2 template image 2                   40
Figure 7.2(d)  Sample 2 image after applying algorithm     40
Figure 7.2(e)  Sample 2 resultant window 1                 40
Figure 7.2(f)  Sample 2 resultant window 2                 40
Figure 7.3(a)  Sample 3 original image                     41
Figure 7.3(b)  Sample 3 template image 1                   42
Figure 7.3(c)  Sample 3 template image 2                   42
Figure 7.3(d)  Sample 3 image after applying algorithm     42
Figure 7.3(e)  Sample 3 resultant window 1                 42
Figure 7.3(f)  Sample 3 resultant window 2                 42
Figure 7.4(a)  Sample 4 original image                     43
Figure 7.4(b)  Sample 4 template image 1                   43
Figure 7.4(c)  Sample 4 template image 2                   43
Figure 7.4(d)  Sample 4 image after applying algorithm     44
Figure 7.4(e)  Sample 4 resultant window 1                 44
Figure 7.4(f)  Sample 4 resultant window 2                 44
Abstract
A computer vision system has been developed for real-time motion detection and human motion tracking of 3-D objects, including those with variable internal parameters. A fast algorithm based on several template matching measures, such as the correlation matrix, the absolute difference matrix, and their normalized variants, has been implemented, along with a template updating technique using a sliding window object localization approach, to track the motion of a detected body in a surveillance video. A fast color-based differentiation technique is also implemented, which tracks a moving object on the basis of its dominant color. Furthermore, a data structure algorithm has been proposed to reject the non-useful areas of the binary image formed after the various filtering techniques. The algorithms implemented provide accurate results for human surveillance. The methods allow for larger frame-to-frame motion and can robustly track models with multiple degrees of freedom while running on relatively inexpensive hardware. They provide a reasonable compromise between simplicity of parameterization and the expressive power needed for subsequent scene understanding. Proposed applications of the algorithms in this report include human motion analysis in visual surveillance, where the path of a person is required.
Chapter 1 Objective
1.1 Objective

To develop an automated object detection system for analyzing the motion of a target object in a video stream from video surveillance.
1.2 Motivation and Justification

Object detection, path tracking, and action recognition are among the most active research topics in computer vision and image processing. Traditional surveillance systems require human operators to continuously monitor several incoming video feeds. Surveillance cameras are already prevalent in commercial establishments, with camera outputs usually recorded on tape or stored in video archives. Such systems are prone to human error, which is why an automated, intelligent system is needed to detect, classify, and track human motion. The major concern is to detect the required object or person in a video, which is essential in most real-life applications such as robotics and defence.
The areas where object detection and human motion analysis systems can be used are:
1. Surveillance and monitoring of people, to ensure that they are within the norms.
2. Military and police surveillance.
3. Robotics, where path tracing and motion analysis are required.
4. Educational and manufacturing industries.
Chapter 2 Introduction
Object detection, classification, and tracking are important tasks within the field of computer vision. There are three key steps in video analysis: detection of moving objects, classification of the detected objects, and tracking of their motion from frame to frame.
Object detection in video streams has been a popular topic in the field of computer vision. Tracking is a particularly important issue in human motion analysis, since it serves as a means to prepare data for pose estimation and action recognition. In contrast to human detection, human tracking is a higher-level computer vision problem. However, the tracking algorithms within human motion analysis usually intersect considerably with motion segmentation during processing.

As one of the most active research areas in computer vision, visual analysis of human motion attempts to detect, track, and identify people, and more generally, to interpret human behavior from image sequences involving humans. Human motion analysis has attracted great interest from computer vision researchers due to its promising applications in many areas, such as visual surveillance, perceptual user interfaces, content-based image storage and retrieval, video conferencing, athletic performance analysis, and virtual reality.
Videos are actually sequences of images, each of which is called a frame, displayed at a high enough frequency that human eyes perceive the continuity of the content. It is obvious that all image processing techniques can be applied to individual frames. Besides, the contents of two consecutive frames are usually closely related. This project is implemented with OpenCV (Open Source Computer Vision Library), an open-source C++ library for image processing and computer vision, originally designed by Intel.
2.1 General Framework

A general framework [9] for object detection analysis involves stages such as motion detection with the help of background subtraction and foreground segmentation, object classification, and motion tracking.
Chapter 3 Literature Review

Early surveys focused on three major areas related to interpreting human motion: (a) motion analysis involving human body parts, (b) tracking a moving human from a single view or from multiple camera perspectives, and (c) recognizing human activities from image sequences.
Collins et al. [13] classified moving object blobs into four classes, namely single human, vehicle, human group, and clutter, using two factors: area and a shape factor.
Bo Wu and Ram Nevatia [14] proposed an approach to automatically track multiple, possibly partially occluded humans in a walking or standing pose from a single camera, which may be stationary or moving. A human body is represented as an assembly of body parts. Part detectors are learned by boosting a number of weak classifiers based on edgelet features. The responses of the part detectors are combined to form a joint likelihood model that includes an analysis of possible occlusions. The combined detection responses and the part detection responses provide the observations used for tracking. An object is tracked by data association and mean-shift methods. This system can track humans with both inter-object and scene occlusions against static and non-static backgrounds. The method yields good results, but at the cost of high computation. The paper does not explore the interaction between detection and tracking: the proposed system works in a sequential way, with tracking taking the results of detection as input. However, tracking could be used to facilitate detection; one of the most straightforward ways is to speed up detection by restricting the search to the neighborhood predicted by tracking.

Liang Xiao [15] discusses two types of image sequences formed by a moving target: one with a static background, the other with a varying background. The former usually occurs when the camera is in a relatively static state, producing moving image sequences with a static background, while the latter occurs when the camera is also in relative movement with respect to the target. The paper describes two methods of moving target detection, namely temporal differencing and background subtraction: temporal differencing can be used for a static background, while background subtraction is used for a changing background. It also discusses optical flow methods, but criticizes them for their need for specialized hardware.
Recent years have seen consistent improvements in the task of automated tracking of pedestrians in visual data. The problem of tracking multiple targets can be viewed as a combination of two intertwined tasks: inference of the presence and locations of targets, and data association to infer the most likely tracks. Research in the analysis of objects in general, and humans in particular, has often attempted to leverage the parts that the objects are composed of. Indeed, the state of the art in human detection has greatly benefited from explicit and implicit detection of body parts [17]. A model of spatial relationships between detected parts is learned in an online fashion so as to split pedestrian tracklets at points of low confidence.
Chapter 4 Theoretical Background

4.1 OpenCV

OpenCV (Open Source Computer Vision) is a library of programming functions mainly aimed at real-time computer vision. It was originally developed by Intel's research center in Nizhny Novgorod, Russia, and is now supported by Willow Garage and Itseez. It is free for use under the open-source BSD license. The library is cross-platform and focuses mainly on real-time image processing. If the library finds Intel's Integrated Performance Primitives on the system, it will use these proprietary optimized routines to accelerate itself.
4.2 OS Support

OpenCV runs on Windows, Android, Maemo, FreeBSD, OpenBSD, iOS, BlackBerry 10, Linux, and OS X. The user can get official releases from SourceForge or take the current snapshot under SVN from there. OpenCV uses CMake as its build system.
Figure 4.3(b): The image shows the effect of filtering with a Gaussian of σ = 2.0 (and kernel size 9×9).

Figure 4.3(c): The image shows the effect of filtering with a Gaussian of σ = 4.0 (and kernel size 15×15).
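The smoothing shown in these figures can be sketched with a plain-NumPy Gaussian filter. This is an illustrative stand-in, not the OpenCV call the project actually uses; the function names and the zero-padding border policy are our own choices:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Build a normalized size x size 2-D Gaussian kernel."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()  # normalize so the kernel sums to 1

def gaussian_filter(image, size, sigma):
    """Convolve a grayscale image with the Gaussian kernel.
    Borders are handled by zero padding for simplicity."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(image.astype(float), pad)
    out = np.zeros(image.shape, dtype=float)
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + size, j:j + size] * k)
    return out
```

A larger σ (with a correspondingly larger kernel, as in the 9×9 vs. 15×15 figures above) spreads each pixel's energy over a wider neighborhood, which is why the second figure looks blurrier.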
Tracking can be divided into various categories according to different criteria. As far as tracked objects are concerned, tracking may be classified into tracking of human body parts, such as the hand, face, and legs, and tracking of the whole body. Tracking can also be grouped according to other criteria, such as the dimension of the tracking space (2-D vs. 3-D), the tracking environment (indoors vs. outdoors), the number of tracked humans (single human, multiple humans, human groups), the camera's state (moving vs. stationary), and sensor multiplicity (monocular vs. stereo).
The algorithm is implemented in OpenCV, and the approach used for object tracking is as follows:

1. First, a template image is loaded. The template image (T) is the patch image which will be compared against the source image.
2. The video in which detection is to be done is then loaded.
3. After loading the video, the matching method is applied to the first frame.
4. The object is detected in the first frame by drawing a rectangular box around it.
5. Gaussian filters are applied to each consecutive frame of the video.
6. The next objective is to find the object in the image sequence. Foreground detection is done using the sliding window approach followed by template matching, which is described later.
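The steps above can be sketched end-to-end in plain NumPy. The sketch replaces the video file with an in-memory list of grayscale frames and uses a sum-of-squared-differences match; the Gaussian smoothing of step 5 is omitted for brevity, and all names here are illustrative rather than the project's actual code:

```python
import numpy as np

def ssd(patch, template):
    """Sum of squared differences; lower means a better match."""
    return float(np.sum((patch - template) ** 2))

def match_in_frame(frame, template):
    """Steps 3-4: slide the template over the frame and return the
    top-left corner (x, y) of the best (lowest-SSD) match."""
    th, tw = template.shape
    fh, fw = frame.shape
    best, best_loc = None, (0, 0)
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            score = ssd(frame[y:y + th, x:x + tw], template)
            if best is None or score < best:
                best, best_loc = score, (x, y)
    return best_loc

def track(frames, template):
    """Run the per-frame loop: match the template in each frame and
    record the bounding rectangle (x, y, width, height)."""
    boxes = []
    for frame in frames:
        x, y = match_in_frame(frame.astype(float), template.astype(float))
        boxes.append((x, y, template.shape[1], template.shape[0]))
    return boxes
```

On a synthetic sequence with a bright square moving one pixel per frame, `track` returns the square's rectangle in each frame.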
[Flowchart: Load Template → Load Video → Read Frame → Apply Matching Method]
In sliding-window-based approaches to object detection, sub-images of an input image are tested for whether they contain the object of interest. Potentially, every possible sub-window in an input image might contain the object. However, a VGA image already has 23,507,020,800 possible sub-windows, and the number of possible sub-windows grows as n^4 for images of size n × n. We restrict the search space to a subspace R by employing the following constraints. First, we assume that the object of interest retains its aspect ratio. Furthermore, we introduce margins dx and dy between two adjacent sub-windows and set dx and dy to 1/10 of the size of the original bounding box. To search over multiple scales, we use a scaling factor s = 1.2^a, a ∈ {−10, ..., 10}, applied to the original bounding box of the object of interest. We also consider only sub-windows with a minimum area of 25 pixels. Per scale, the size of the search space is approximately

|R| = [(n − w)/dx] × [(m − h)/dy]

where w and h denote the size of the initial bounding box, and n and m the width and height of the image.
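The effect of these constraints can be illustrated numerically. Only the VGA sub-window total comes from the text; the 100×50 initial box and the margin values below are example numbers of our own choosing:

```python
def all_subwindows(n, m):
    """Count all axis-aligned sub-windows of an n x m image (choose two
    distinct column boundaries and two distinct row boundaries). For VGA
    (640 x 480) this reproduces the 23,507,020,800 figure quoted above."""
    return (n * (n - 1) // 2) * (m * (m - 1) // 2)

def constrained_search_space(n, m, w, h, dx, dy):
    """Sub-windows per scale once the aspect ratio is fixed and adjacent
    windows are separated by margins dx and dy (the |R| formula, with
    floor division and the origin window included)."""
    return ((n - w) // dx + 1) * ((m - h) // dy + 1)

full = all_subwindows(640, 480)                              # every sub-window
reduced = constrained_search_space(640, 480, 100, 50, 10, 5)  # constrained |R|
```

For a 100×50 box with margins of one tenth of the box size, the constrained search space is a few thousand windows per scale instead of tens of billions, which is what makes the sliding-window search tractable.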
For sliding window matching we need two primary components:

a. Source image (I): the image in which we expect to find a match to the template image.
b. Template image (T): the patch image which will be compared to the source image.

Our goal is to detect the highest matching area.
To identify the matching area, we have to compare the template image against the source image by sliding it.
By sliding, we mean moving the patch one pixel at a time (left to right, top to bottom). At each location, a metric is calculated that represents how good or bad the match at that location is, i.e., how similar the patch is to that particular area of the source image. For each location of T over I, the metric is stored in the result matrix (R). Each location in R contains the match metric.
The image above is the result R of sliding the patch with the metric TM_CCORR_NORMED. The brightest locations indicate the highest matches. As can be seen, the location marked by the red circle is probably the one with the highest value, so that location (the rectangle formed by that point as a corner, with width and height equal to the patch image) is considered the match. In practice, we use the function minMaxLoc to locate the highest value (or the lowest, depending on the type of matching method) in the R matrix.
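A minimal NumPy sketch of this procedure builds R with the TM_CCORR_NORMED metric and then locates its extrema the way OpenCV's minMaxLoc does. The function names and the handling of a zero denominator are our own assumptions:

```python
import numpy as np

def match_map_ccorr_normed(I, T):
    """Result matrix R of normalized cross-correlation scores, one entry
    per placement of template T over image I (a plain-NumPy stand-in for
    matchTemplate with TM_CCORR_NORMED)."""
    th, tw = T.shape
    ih, iw = I.shape
    R = np.zeros((ih - th + 1, iw - tw + 1))
    t_norm = np.sqrt(np.sum(T.astype(float) ** 2))
    for y in range(R.shape[0]):
        for x in range(R.shape[1]):
            patch = I[y:y + th, x:x + tw].astype(float)
            denom = t_norm * np.sqrt(np.sum(patch ** 2))
            R[y, x] = np.sum(patch * T) / denom if denom > 0 else 0.0
    return R

def min_max_loc(R):
    """Analogue of minMaxLoc: (min_val, max_val, (min_x, min_y), (max_x, max_y))."""
    mn = np.unravel_index(np.argmin(R), R.shape)
    mx = np.unravel_index(np.argmax(R), R.shape)
    return R[mn], R[mx], (mn[1], mn[0]), (mx[1], mx[0])
```

For TM_CCORR_NORMED the best placement is the maximum of R (exactly 1.0 where the patch equals the template); for the square-difference metrics one would take the minimum instead.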
a) Source histogram (I): the histogram of the image in which we expect to find a match to the template image histogram.
b) Template histogram (T): the histogram of the patch image which will be compared to the source image histogram.
The goal is to detect the highest matching area. To identify it, the template image histogram is compared against the source image histogram by sliding, using the sliding window approach explained in the previous section.
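Comparing two grayscale histograms can be sketched as below. The correlation measure mirrors OpenCV's HISTCMP_CORREL option for compareHist, though the bin count and helper names are our own choices:

```python
import numpy as np

def gray_hist(img, bins=8):
    """Normalized grayscale histogram over intensity range [0, 256)."""
    h, _ = np.histogram(img, bins=bins, range=(0, 256))
    return h / h.sum()

def hist_correlation(h1, h2):
    """Correlation between two histograms: 1.0 for identical shapes,
    values below 1.0 (possibly negative) for dissimilar ones."""
    d1, d2 = h1 - h1.mean(), h2 - h2.mean()
    denom = np.sqrt(np.sum(d1 ** 2) * np.sum(d2 ** 2))
    return float(np.sum(d1 * d2) / denom) if denom > 0 else 0.0
```

Because histograms discard spatial layout, this comparison is cheaper than pixel-wise template matching but coarser: it matches regions with a similar intensity distribution, not a similar arrangement.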
For each location of T over I, the metric is stored in the result matrix (R). We use the following six matching methods [9]:

a. Square difference (SQDIFF):
R(x,y) = Σ_{x',y'} [T(x',y') − I(x+x', y+y')]²

b. Normalized square difference (SQDIFF_NORMED):
R(x,y) = Σ_{x',y'} [T(x',y') − I(x+x', y+y')]² / √( Σ_{x',y'} T(x',y')² · Σ_{x',y'} I(x+x', y+y')² )

c. Cross correlation (CCORR):
R(x,y) = Σ_{x',y'} T(x',y') · I(x+x', y+y')

d. Normalized cross correlation (CCORR_NORMED):
R(x,y) = Σ_{x',y'} T(x',y') · I(x+x', y+y') / √( Σ_{x',y'} T(x',y')² · Σ_{x',y'} I(x+x', y+y')² )

e. Correlation coefficient (CCOEFF):
R(x,y) = Σ_{x',y'} T'(x',y') · I'(x+x', y+y')
where T' and I' are the template and the image patch with their respective means subtracted.

f. Normalized correlation coefficient (CCOEFF_NORMED):
R(x,y) = Σ_{x',y'} T'(x',y') · I'(x+x', y+y') / √( Σ_{x',y'} T'(x',y')² · Σ_{x',y'} I'(x+x', y+y')² )

For the square-difference methods the best match is the minimum of R; for the correlation methods it is the maximum.
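The square-difference and correlation-coefficient variants can be written directly from these formulas. A NumPy sketch for a single template placement follows (looping over all placements works as in the sliding-window section); the function names are ours:

```python
import numpy as np

def sqdiff(patch, T):
    """Method (a): sum of squared differences; 0 for a perfect match."""
    return float(np.sum((T - patch) ** 2))

def sqdiff_normed(patch, T):
    """Method (b): the same sum divided by sqrt(sum T^2 * sum I^2)."""
    denom = np.sqrt(np.sum(T ** 2) * np.sum(patch ** 2))
    return sqdiff(patch, T) / denom if denom > 0 else 0.0

def ccoeff_normed(patch, T):
    """Method (f): correlation coefficient of the mean-subtracted patch
    and template, in [-1, 1]; +1 for a perfect match."""
    t, p = T - T.mean(), patch - patch.mean()
    denom = np.sqrt(np.sum(t ** 2) * np.sum(p ** 2))
    return float(np.sum(t * p) / denom) if denom > 0 else 0.0
```

Note the practical difference: the square-difference score changes if the patch is uniformly brightened, while the normalized correlation coefficient is invariant to such gain and offset changes, which is why it is often the more robust choice under varying illumination.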
Then the location with the highest matching probability is localized, a rectangle is drawn around the area corresponding to the best match, and the object is detected.
Template matching techniques [3] attempt to answer some variation of the following question: does the image contain a specified view of some feature, and if so, where? The use of cross correlation for template matching is motivated by the Euclidean distance measure; the resulting correlation term c(u,v) is a measure of the similarity between the image and the feature.
One remedy is to pre-process (for example, band-pass filter) the image before cross correlation. In a transform-domain implementation the filtering can be conveniently added to the frequency-domain processing, but selection of the cut-off frequency is problematic: a low cut-off may leave significant image energy variations, whereas a high cut-off may remove information useful to the match.
Normalized cross correlation overcomes these difficulties by normalizing the image and template vectors to unit length, yielding a cosine-like correlation coefficient.
Chapter 6 Work Done
The template used in the previous iteration is no longer useful to us because, with the motion of the moving body, the template might not match any area after a few frames have passed. Moreover, a moving body might change its angle of orientation towards the camera as the next few frames are read.

To overcome these shortcomings, the template update approach comes in quite handy. Whenever the template is matched with a certain area in a frame, the detected area is bounded by a rectangle whose size is the same as that of the template. This rectangle is then cropped from the frame, and the cropped image becomes our new template in the next iteration. This approach, where the template is updated at every frame, gives accurate results unless frames are missed or the motion is so rapid that matching the template fails in the very next frame. These conditions are rarely observed in day-to-day scenes, so the template matching and update technique tracks the path of a human very accurately. In the case of multiple human motion tracking this approach is quite useful, as it distinguishes between two blobs directly on the basis of template matching and updating. Various features like orientation, area, color, and contrast come into play when template matching is used, as the most similar area would obviously give the minimum difference. This difference is plotted in grey scale and is shown in the results. The following color-based approach can be considered a sub-part of this approach, but the reduction in tracking time achieved with the color-based approach is considerable.
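The update step described above can be sketched as follows. The matching metric here is plain SSD rather than the report's full set of methods, and the function names are illustrative:

```python
import numpy as np

def best_match(frame, template):
    """Return the top-left (x, y) of the lowest-SSD placement."""
    th, tw = template.shape
    scores = {}
    for y in range(frame.shape[0] - th + 1):
        for x in range(frame.shape[1] - tw + 1):
            scores[(x, y)] = np.sum((frame[y:y + th, x:x + tw] - template) ** 2)
    return min(scores, key=scores.get)

def track_with_update(frames, template):
    """Template-update tracking: after each match, the matched rectangle
    is cropped from the frame and becomes the template for the next
    iteration, so slow appearance changes are tolerated."""
    th, tw = template.shape
    path = []
    for frame in frames:
        x, y = best_match(frame.astype(float), template.astype(float))
        path.append((x, y))
        template = frame[y:y + th, x:x + tw].astype(float)  # update step
    return path
```

On a synthetic sequence where the tracked square both moves and slowly dims, the updated template keeps following it, whereas a fixed template's match score would degrade with every frame.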
Chapter 7 Results
7.1 Results and Discussions
The six approaches for template matching described above give different results in different scenarios: some are more accurate in one setting, while others are more accurate in another, so no single method dominates. Here four sample results are shown with the original frame image and initial templates. The first image is the frame input from the video; the template matching and updating algorithm searches for the face templates provided at the beginning, updating them at each frame.
Figure 7.1: Sample 1. (a) Original image; (b) Template 1; (c) Template 2; (e) Updated template 1; (f) Updated template 2; (g) Resultant window 1; (h) Resultant window 2.
Figure 7.2: Sample 2. (a) Original image; (b) Template 1; (c) Template 2; (e) Updated template 1; (f) Updated template 2; (g) Resultant window 1; (h) Resultant window 2.
Figure 7.3: Sample 3. (a) Original image; (b) Template 1; (c) Template 2; (e) Updated template 1; (f) Updated template 2; (g) Resultant window 1; (h) Resultant window 2.
Figure 7.4: Sample 4. (a) Original image; (b) Template 1; (c) Template 2; (e) Updated template 1; (f) Updated template 2; (g) Resultant window 1; (h) Resultant window 2.
Chapter 8 Future Work

Different approaches to human behavior analysis will be studied and implemented, such as action recognition, stick-figure models, and 2-D and 3-D contours.
References
[1] R. T. Collins, A. J. Lipton, and T. Kanade, "Introduction to the special section on video surveillance," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 745-746, 2000.

[2] A. R. Francois and G. G. Medioni, "Adaptive colour background modeling for real-time segmentation of video streams," in Proceedings of the International Conference on Imaging Science, Systems, and Technology, pp. 227-232, 1999.

[3] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.

[4] R. C. Gonzalez and R. E. Woods, Digital Image Processing (third edition). Reading, Massachusetts: Addison-Wesley, 1992.

[5] G. R. Bradski and J. Davis, "Motion segmentation and pose recognition with motion history gradients," Machine Vision and Applications, 2002.

[6] D. Meyer, J. Denzler, and H. Niemann, "Model based extraction of articulated objects in image sequences," in Proceedings of the Fourth International Conference on Image Processing, 1997.

[7] R. Brunelli, Template Matching Techniques in Computer Vision: Theory and Practice. Wiley Publishing, 2009.

[8] W. C. Abraham and A. Robins, "Memory retention: the synaptic stability versus plasticity dilemma," Trends in Neurosciences, vol. 28, no. 2, pp. 73-78, Feb. 2005.

[9] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, 2008.

[10] L. Wang, W. Hu, and T. Tan, "Recent developments in human motion analysis," Pattern Recognition, vol. 36, no. 3, 2003.

[11] J. K. Aggarwal and Q. Cai, "Human motion analysis: a review," in Proceedings of the IEEE Workshop on Motion of Non-Rigid and Articulated Objects, pp. 90-102, 1997.

[13] A. Lipton et al., A System for Video Surveillance and Monitoring, Vol. 2. Pittsburgh: Carnegie Mellon University, the Robotics Institute, 2000.

[14] B. Wu and R. Nevatia, "Detection and tracking of multiple, partially occluded humans by Bayesian combination of edgelet based part detectors," in Tenth IEEE International Conference on Computer Vision (ICCV), 2005.

[15] L. Xiao and T. Li, "Moving object detection and tracking," 2010.

[16] S. Khire and J. Teizer, "Object detection and tracking," in IEEE International Conference on Information and Automation (ICIA), 2008.

[17] K. Mikolajczyk, C. Schmid, and A. Zisserman, "Human detection based on a probabilistic assembly of robust part detectors," in ECCV, 2004.

[18] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, pp. 1627-1645, 2010.

[19] L. Bourdev and J. Malik, "Poselets: body part detectors trained using 3D human pose annotations," in ICCV, 2009.

[20] T. P. Tian and S. Sclaroff, "Fast globally optimal 2D human detection with loopy graph models," in CVPR, 2010.