Final Year Project Report
CHAPTER 1
INTRODUCTION
With the rapid growth of intelligent traffic systems and road networks, multi-vehicle detection and counting has become an important technique for gathering traffic data and plays a vital role in intelligent traffic management and highway control. With the widespread installation of traffic monitoring cameras, a large database of traffic video can be obtained for analysis.
These traffic management systems not only reduce delays and congestion but also play a significant role in solving major road problems such as identifying accidents and vehicles moving in incorrect lanes, checking that traffic police are properly performing their duty, and presenting traffic flow data. Modern deep learning technologies have strong potential to replace hardware-based systems in a cost-effective manner with less manpower and fewer resources.
While designing such models, one must compare them with previous research on these traffic challenges and understand the accuracy and performance of the various methods under different weather conditions such as heavy rainfall, dusty weather and dense fog. Performance also decreases because of the shadows cast by tall buildings and dense clouds. Keeping all these challenges in mind, an efficient dataset and training algorithm should be selected so that the work contributes to society.
To resolve these difficulties, various models have been put forward that can precisely detect and count the number of vehicles under different conditions, which helps in solving real-time problems in day-to-day life.
1.1 Motivation
Vehicle detection and counting is a major need of modern technology and plays an important role in civilian and military applications. Even though many prediction techniques are available in software engineering, there is still a need for a stable methodology.
With the expansion of road networks and the increase in the number of vehicles such as self-driving cars and electric scooters in recent years, there is a need for modern technologies that can solve traffic problems quickly.
An efficient model is needed for counting vehicles in parking lanes and for collecting tolls and parking charges from vehicles efficiently.
Distractions are very common in traffic videos of urban areas, where vehicles do not follow rules and lanes systematically. This creates confusion when counting vehicles and must be handled to make the model useful in practice.
As a result, the motivation of this project is to propose a solution that effectively detects and counts vehicles by training a dataset with object detection and counting algorithms, increasing both performance and efficiency.
1.2 Problem Statement
Vehicle detection and counting now plays an important role in traffic management and monitoring, and is required for controlling traffic in a cost-effective way with less manpower.
Our problem statement focuses on detecting different kinds of vehicles such as buses, cars, trucks and bikes in a given video and applying a suitable tracking algorithm to count the vehicles in the video frames efficiently.
● The main goal of the project is to detect the different types of vehicles in the given traffic
video using deep learning algorithms.
● To count the different types of vehicles like (car, truck, bus, bike) in the given traffic video
using tracking algorithms.
The scope of the project is that it is useful in verifying the amount collected at tolls, which can be achieved by installing a camera on the roadside. It can also be useful in parking management. We can achieve this scope by:
- Collecting the dataset: Once the dataset is available, it is preprocessed and cleaned. The selection of the right methods is very important, as they improve the overall efficiency of the model.
- Using object detection methods: Once the dataset is preprocessed, an object detection algorithm such as YOLO or SSD is selected and used for training; it helps in classifying the vehicles in the given video.
- Applying the tracking algorithm: After the vehicles are detected by the trained model, a tracking algorithm such as Deep SORT or ORB is applied, which helps in counting the vehicles in the given video. For doing so, certain filters and wrapper methods are applied.
Various experiments were conducted on different datasets, and the outcomes showed clear improvements obtained using object detection and tracking algorithms. Researchers have suggested several methods for vehicle detection and counting in real time.
Yang et al. [12] proposed a vehicle detection method based on background subtraction using a low-rank decomposition technique. It gives favourable results on static scenes, but its performance decreases when the background changes rapidly. The vehicle counting process also remains difficult, and it is important to deal with partial occlusion of objects and variations in the brightness and contrast of the images. In the future, the paper needs to improve the accuracy of object detection.
Abdelwahab et al. [11] proposed a different approach that counts vehicles using an R-CNN detector and tracks them with the KLT (Kanade-Lucas-Tomasi) tracker. Combining these two methods showed better performance on the trained dataset.
Zhe Dai et al. [2] also put forward a vehicle counting framework with three stages: object detection using YOLOv3, object tracking using the KCF algorithm, and trajectory processing using a region encoding method. It achieves an object detection accuracy of 87.6% under heavy traffic and difficult weather conditions.
Adson M. Santos et al. [1] designed a system that uses YOLOv3 for object detection and Deep SORT for multiple-object tracking; it showed an average accuracy of 99.15% in the global count on the GRAM and CD2014 datasets. It can also count the vehicles more efficiently.
Zuraimi et al. [13] also suggested a model using TensorFlow and You Only Look Once (YOLO) for detecting vehicles in real time. Combining these with the other required dependencies, the paper compares previous versions of YOLO and picks YOLOv4 for implementation. Furthermore, the system uses the DeepSORT algorithm to count the number of vehicles passing in the video effectively. The paper concluded that the best of the available YOLO models is YOLOv4, which achieved 82.08% AP50 on their custom dataset.
● Chapter 1 presents the research problem, research objectives, justifying the need for
carrying out the research work and outlines the main contributions arising from the
work undertaken.
● Chapter 2 provides the essential background and context for this thesis.
● Chapter 3 provides the details of the system architectural design and methodology.
● Chapter 4 explains the implementation details and results obtained.
● Chapter 5 summarizes the report and briefs the future aspects.
This chapter is the foundation for the execution of our project. It briefly introduced the research problem, research objectives, scope of the project, previous related work and the proposed solution framework. The next chapter examines the literature most pertinent to our research.
CHAPTER 2
Literature Survey
This chapter focuses on a review of real-time object detection and the various tracking approaches that have already been implemented.
Object detection is a digital technique for recognising and locating items in a video or image. In general, it produces bounding boxes around the objects in an image to locate them in a specified context. Image recognition and object detection are often confused with each other.
For example, image recognition classifies an image containing a dog with the single label "dog", whereas object detection creates a box around each dog and labels it "dog". The detection method predicts the position of each object together with the proper label, and therefore provides extra information about an image.
Object tracking builds on object detection. The overall steps are:
● Object detection: detecting and classifying each object with a suitable algorithm by creating a bounding box around it.
● Assigning each object its own identity by giving it a unique ID.
● Following the labelled item as it moves across frames and storing the essential data.
To deal with the challenge of examining a large number of candidate areas, Ross Girshick et al. suggested R-CNN, an approach that uses a selective search algorithm to extract only 2000 regions from an image, so only 2000 regions need to be considered. These 2000 candidate region proposals are warped into a square and fed into a CNN, which produces a 4096-dimensional feature vector. The CNN acts as the feature extractor, and its output is fed into an SVM that estimates the presence of an object within each candidate region proposal. Selective search works roughly as follows:
1. Add all bounding boxes, irrespective of segmented parts, to the list of region proposals.
2. Group similar segments into larger segments.
3. Repeat from step 1.
Larger segments are produced and added to the list of region proposals with each iteration. As a result, region proposals are generated in a bottom-up manner, starting with smaller areas and working up to larger ones.
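As a concrete illustration of the region-proposal step described above, the short sketch below generates selective search proposals with OpenCV's contrib module (opencv-contrib-python); the image path is a placeholder, and R-CNN would keep roughly the first 2000 boxes.

```python
# Minimal sketch of selective-search region proposals, assuming
# opencv-contrib-python is installed and a placeholder image path.
import cv2

img = cv2.imread("traffic_frame.jpg")              # placeholder input image
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()                   # fast mode: fewer, coarser merges
rects = ss.process()                               # array of (x, y, w, h) boxes
proposals = rects[:2000]                           # keep ~2000 candidate regions, as in R-CNN
print(f"{len(rects)} proposals generated, keeping {len(proposals)}")
```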
2.2.2 Problems with R-CNN
● Training time is very long, because each image requires the classification of 2000 region proposals.
● Since each image takes around 47 seconds to process, it is not suitable for real-time detection.
● Since the selective search algorithm is fixed, no learning takes place at that stage. This may result in poor candidate region proposals.
The same author came up with a new model, Fast R-CNN, to overcome the shortcomings of the previous model. The working of the algorithm is quite similar to R-CNN: the input image is given to a CNN to produce a convolutional feature map. The regions of proposals are selected from the convolutional feature map, warped into squares, and then reshaped to a fixed size using a RoI pooling layer so that they can be fed into a fully connected layer, producing a RoI feature vector.
2.3.1 Comparison between R-CNN, Fast R-CNN and SPP net
Fast R-CNN is considerably faster in training and testing than R-CNN, as shown in the graphs above. When examining the performance of Fast R-CNN, however, including region proposals slows it down considerably compared with not using region proposals. Even Fast R-CNN has certain limitations: it still relies on selective search to locate the RoIs, which is a time-consuming approach. It takes about 2 seconds per image to detect objects, which is much faster than R-CNN, but when dealing with big real-world datasets even Fast R-CNN becomes slow.
Another object detection technique, Faster R-CNN, performs better than Fast R-CNN. Both R-CNN and Fast R-CNN use selective search to identify region proposals, and selective search is a time-consuming process that slows the network down.
To address this, Shaoqing Ren et al. created an object detection algorithm, Faster R-CNN, that eliminates the selective search step. The image is fed into a CNN, resulting in a convolutional feature map, just as in Fast R-CNN. Instead of running a selective search algorithm on the feature map, a separate network is used to predict the region proposals. A RoI pooling layer is then used to reshape the predicted region proposals, which are subsequently used to classify the objects within the proposed regions.
The YOLO method uses a CNN to detect objects in real time. As the name suggests, the approach needs only a single forward propagation through the neural network to detect objects. There are several versions of the YOLO algorithm. YOLO approaches object detection differently: it takes the whole image in a single pass and predicts the bounding box coordinates together with the class probabilities.
2.5.2 How YOLO works
The YOLO algorithm works using the following three techniques:
● Residual blocks
● Bounding box regression
● Intersection Over Union (IOU)
1. Residual blocks: The image is first divided into grid cells of dimension S×S. The figure below depicts the grid over an input image; there are a number of grid cells of equal size. Each object is handled by the grid cell into which it falls.
2. Bounding box regression: A bounding box is an outline that highlights an object in an image. The properties of each bounding box are:
● Width (bw)
● Height (bh)
● Class (person, car, cat, etc.), represented by the letter c
● Bounding box centre (bx, by)
3. Intersection over Union (IOU): Intersection over Union describes how well a predicted box overlaps a real box. YOLO uses IOU to produce an output box that properly surrounds the objects. Each grid cell predicts bounding boxes together with their confidence scores. The IOU equals 1 when the predicted bounding box coincides with the real box; bounding boxes that differ too much from the actual box are discarded.
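As a small illustration of the metric, the sketch below computes the IoU of two boxes given as corner coordinates; the example boxes are arbitrary.

```python
# Minimal sketch of IoU between two boxes given as (x1, y1, x2, y2) corners.
def iou(box_a, box_b):
    # intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((50, 50, 150, 150), (100, 100, 200, 200)))  # ~0.14
```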
2.6 SSD (Single Shot Detection)
SSD is a model that detects objects, but what precisely does that imply? Object detection and image classification are often confused. In simple terms, image classification identifies the type of image, whereas object detection identifies the various objects in the image and uses bounding boxes to indicate where they are. The model's name, Single Shot Detector, reveals most of its details: unlike models that traverse the image more than once to produce an output detection, the SSD model identifies the objects in a single pass over the input image.
Object tracking is the task of predicting the position of an object in the upcoming frames of a video once it has been identified and located and its initial position is known. Object detection, on the other hand, creates bounding boxes around the objects and predicts their positions only in the current frame. The target must be visible in the input for object detection to work, and the method is not suitable when the target is hidden by interference. Object tracking has been studied for over two decades, and several methods and ideas have been developed to improve the accuracy and efficiency of tracking models.
2.8.1 MDNet
Multi-Domain Net (MDNet) is an object tracking technique that is trained on enormous amounts of data. Its goal is to learn a wide range of variations and relationships. MDNet is trained to learn a shared representation of targets from numerous annotated videos drawn from diverse domains.
Pretraining: During pretraining the network must learn a multi-domain representation. To accomplish this, the system is trained on many annotated videos to learn representations and dimensional information.
Online visual tracking: After pretraining, the domain-specific layers are removed, leaving only the shared network. A binary classification layer is introduced during inference and trained or fine-tuned online.
2.8.2 GOTURN
GOTURN (Generic Object Tracking Using Regression Networks) is a deep regression network model that is trained offline. It learns a general relationship between object motion and appearance and can be used to track objects that were not part of the training set. Because online tracker algorithms cannot exploit large numbers of videos to increase their efficiency, they tend to be slow and their performance is not up to the mark. GOTURN is a regression-based technique: in essence, it uses only a single feed-forward pass through the network to regress directly to the location of the target object.
2.8.3 DeepSORT
DeepSORT is a widely used object tracking algorithm. It is an extension of SORT (Simple Online and Realtime Tracking).
SORT estimates the location of an object from its previous locations using a Kalman filter, which is fairly robust to occlusions. On top of the principles of SORT, DeepSORT adds deep learning to improve the performance of the algorithm: because a deep neural network can recognise the appearance features of the target, the tracker can associate detections with the correct object with significantly higher accuracy.
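To make the association step concrete, the sketch below shows a SORT-style matching of predicted track boxes to new detections by solving an assignment problem over an IoU cost matrix with SciPy's Hungarian solver; DeepSORT additionally blends in appearance-feature distances, which are omitted here, and the example boxes are arbitrary.

```python
# Minimal sketch of SORT-style track-to-detection association using the
# Hungarian algorithm over an IoU cost matrix. DeepSORT also mixes in
# appearance-feature distances, omitted here for brevity.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(tracks, detections):
    """IoU between every (x1, y1, x2, y2) track box and detection box."""
    m = np.zeros((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            x1, y1 = max(t[0], d[0]), max(t[1], d[1])
            x2, y2 = min(t[2], d[2]), min(t[3], d[3])
            inter = max(0, x2 - x1) * max(0, y2 - y1)
            union = ((t[2]-t[0])*(t[3]-t[1]) + (d[2]-d[0])*(d[3]-d[1]) - inter)
            m[i, j] = inter / union if union > 0 else 0.0
    return m

def associate(tracks, detections, iou_threshold=0.3):
    """Return matched (track_idx, det_idx) pairs; unmatched detections would start new tracks."""
    cost = 1.0 - iou_matrix(tracks, detections)          # minimise 1 - IoU
    row_idx, col_idx = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(row_idx, col_idx)
            if 1.0 - cost[r, c] >= iou_threshold]        # reject weak matches

tracks = [(100, 100, 200, 200)]
detections = [(110, 105, 205, 210), (400, 400, 450, 450)]
print(associate(tracks, detections))                     # [(0, 0)]
```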
In paper [1], YOLOv3 was used for the detection of vehicles and DeepSORT for tracking and counting them. These methods are also easy to understand. The paper thus contributes to counting vehicles automatically at higher speed, which is beneficial for obtaining traffic information. It concluded that the methods used for implementation obtained higher accuracy than previously proposed methods.
Paper [2] proposes a vehicle counting framework with three stages: object detection using YOLOv3, tracking using the KCF algorithm, and trajectory processing using a region encoding method. The paper suggests a better tracking algorithm that helps increase its performance in congested areas.
Paper [3] proposes a vehicle counting framework that uses the SSD model for multi-vehicle detection, a correlation-matching algorithm for multi-vehicle tracking, and a trajectory optimisation algorithm based on the least-squares method. The proposed framework solves the problem of occlusion and vehicle scale change in the tracking process.
Paper [4] proposes the use of FPN and Cascade R-CNN for multi-vehicle detection. The framework proposes an architecture that enables precise detection and classification of vehicles; the model achieves a performance of 59.78% for cars.
Paper [5] presents a model of vehicle identification and counting that combines the deep learning recognition method YOLOv4 with the object tracking method DeepSORT. The framework is important in the field of highway and transport infrastructure management and performs much better than traditional methods, although results are not good when it is applied to real-life videos.
Paper [6] takes the object detection algorithm YOLOv4 and optimises it for vehicle detection. There are various other scopes of application, such as IC detection, crack detection and face detection. The final combined model gives benchmark results with an mAP of 67.7%.
Paper [7] proposes a method that combines spatial-visual feature learning and global 3D state estimation to track moving vehicles in a 3D world. The framework is useful for estimating complete 3D bounding boxes, and this 3D tracking approach can achieve competitive results from images alone.
Paper [8] proposed a system to count vehicles using several detection techniques, in order to provide information that assists vehicle counting, traffic flow prediction and vehicle speed measurement. It experiences wrong detections and duplicate counting of vehicles in some cases.
Paper [9] proposed a vehicle detection and tracking method for aerial videos. The approach is capable of handling both static and moving backgrounds. A foreground detector is used for static backgrounds, which can overcome tiny variations in the image by updating the model. For moving backgrounds, image registration is used to calculate the camera motion, which helps vehicle detection over a specific frame.
In paper [10], photographs were first obtained and various operations were performed on them. Haar cascades were then used for object detection, with different Haar cascades employed for car and bus detection. Several pre-trained Haar cascades were used for further object detection.
As can be concluded from the above-mentioned research papers, the efficiency of vehicle detection and tracking varies depending on the datasets chosen. Table 2.1 below summarises the research papers studied in the literature survey, giving detailed information about the methods, the datasets and the conclusions drawn in each paper.
Table 2.1 Summary of the literature survey

1. Counting vehicles with high precision on Brazilian roads using YOLOv3 and Deep SORT (2020)
   Methods: Uses YOLOv3 for object detection and Deep SORT for multiple-object tracking.
   Dataset: GRAM, CD2014.
   Limitations: It can detect and count vehicles but is unable to classify them individually.
   Results: The proposal achieved an accuracy of 99.15% in the global count on the GRAM and CD2014 datasets, and over 90% on real scenes of Brazilian federal highways.

2. Video-Based Vehicle Counting Framework (2019)
   Methods: Proposes a vehicle counting framework with three stages: object detection (using YOLOv3), object tracking (using the KCF algorithm) and trajectory processing (using a region encoding method).
   Dataset: VCD, VDD.
   Limitations: It was unable to detect bikers in the street, and its accuracy decreases near crowded places such as hospitals and commercial centres.
   Results: The obtained results show accuracy reaching 87.6%, even when the traffic conditions are quite complex.

3. Video-Based Vehicle Counting for Expressway Based on Vehicle Detection and Correlation-Matched Tracking (2020)
   Methods: Proposes a vehicle counting framework that uses the SSD model for multi-vehicle detection, a correlation-matching algorithm for multi-vehicle tracking, and a trajectory optimisation algorithm based on the least-squares method.
   Dataset: NOHWY.
   Limitations: The neural network does not generate enough high-level features to make predictions for small objects, so it performs worse on smaller objects.
   Results: The proposed vehicle counting method obtains more than 93% accuracy and 25 FPS on vehicle counting based on vehicle tracking.

4. Vehicle counting and tracking in aerial video feeds using Cascade R-CNN and Feature Pyramid Networks (2021)
   Methods: Proposes the use of FPN and Cascade R-CNN for multi-vehicle detection; tracking is performed simply by measuring the IOU between detected objects in two subsequent frames.
   Dataset: VisDrone 2019.
   Limitations: The lower precision for the other four classes results from the lack of training examples in these categories compared with the car category.
   Results: The model obtained an average accuracy of 59.78% for cars when the IOU with ground truth was greater than 0.5. The precision dropped for other categories such as vans and trucks, resulting in an overall average precision of 20.46%.

5. Real-time vehicle detection and counting based on YOLO and DeepSORT (2020)
   Methods: A model of vehicle identification and counting that combines the deep learning recognition method YOLOv4 with the object tracking method DeepSORT.
   Dataset: COCO, Open Images.
   Limitations: Results are not good on real-life videos with regularly changing brightness and background, and with slow-moving vehicles.
   Results: Good overall performance is achieved in terms of tracking accuracy. The combination of YOLOv4 and DeepSORT outperforms the original YOLOv4 by at least 11% AP and 12% AP50.

6. Refining YOLO v4 for Vehicle Detection (2020)
   Methods: Takes the object detection algorithm YOLOv4 and optimises it for vehicle detection; YOLOv4 provides higher accuracy and faster results, enabling real-time vehicle detection.
   Dataset: UA-DETRAC benchmark dataset.
   Limitations: DIoU with NMS makes the system less open to occlusion, because of the central distance used along with the overlap area.
   Results: An mAP of 67.7% (10 percentage points higher than the base model) on the DETRAC test dataset.

7. Joint Monocular 3D Vehicle Detection and Tracking (2019)
   Methods: Proposes a framework combining visual feature learning and global 3D state estimation to track moving vehicles in a 3D world.
   Dataset: GTA, KITTI.
   Limitations: The monocular 3D tracking approach can reach competitive results from the image stream only.
   Results: The model filters out 6-8% of possible mismatching trajectories.

9. Vehicle Counting Based on Vehicle Detection and Tracking from Aerial Videos (2018)
   Methods: Proposes a system based on a UAV platform consisting of vehicle detection, multi-vehicle tracking, multi-vehicle management and vehicle counting.
   Dataset: Aerial videos.
   Limitations: There are many limitations of using surveillance video cameras, such as occlusion, shadow and limited views.
   Results: Experimental results on 16 aerial videos show that the proposed method produces more than 90% and 85% accuracy on static-background and moving-background videos, respectively.

10. Vehicle detection and tracking based on OpenCV (2020)
    Methods: Uses the moving-tracking function library and the CamShift algorithm to construct a vehicle video analysis system.
    Dataset: CCD images, AVI videos.
    Limitations: Not suitable for a multi-target tracking system.
    Results: It achieves a performance of 85%.
This chapter reviewed the papers that helped us understand the field and reach a position where we could implement different techniques and eventually carry out their comparative analysis.
CHAPTER 3
We build a model that is first trained on a dataset collected with the help of Kaggle together with manually collected images. The dataset is then pre-processed and annotated in YOLO format. This custom dataset is used to train the YOLOv4 model, and the trained weights are used for tracking with DeepSORT in order to count the vehicles.
3.1.2 System Design
This project is developed in Python using Google Colab, TensorFlow and OpenCV. Google Colab is a web application that allows anyone to write and execute arbitrary Python code through the browser and is well suited to machine learning, data analysis and education. TensorFlow is an open-source platform and software library for machine learning and artificial intelligence. With its flexible ecosystem of tools and libraries, it lets developers build and deploy ML-based applications easily.
We have collected images of different classes, including cars, buses, trucks and bikes, with the help of Kaggle, Roboflow and Google.
After collecting the data, we filtered out noisy and blurred images for better training of our model. Furthermore, we adjusted the brightness, hue and contrast.
In the next step, with the help of CVAT (Computer Vision Annotation Tool), we created bounding boxes and annotations and divided our dataset into two parts: a training set and a testing set.
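For reference, the sketch below shows the Darknet/YOLO annotation format produced by this step: one line per object containing a class index and box coordinates normalised to the image size. The class order and example values are illustrative, not taken from our actual label files.

```python
# Minimal sketch of parsing one line of a YOLO-format label file:
#   <class_id> <x_center> <y_center> <width> <height>
# (coordinates are normalised to [0, 1] relative to the image width/height).
CLASS_NAMES = ["car", "truck", "bus", "motorbike"]   # illustrative class order

def parse_yolo_line(line, img_w, img_h):
    cls, xc, yc, w, h = line.split()
    cls = int(cls)
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    # convert normalised centre/size back to pixel corner coordinates
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return CLASS_NAMES[cls], (x1, y1, x2, y2)

print(parse_yolo_line("0 0.50 0.50 0.20 0.10", 1280, 720))
# ('car', (512.0, 324.0, 768.0, 396.0))
```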
Training our model on a local machine is very time-consuming and requires many dependencies if a powerful GPU is not available. To avoid this, we chose to run our code on Google Colab, since it provides a free GPU and an online environment.
We collected images of different types of vehicles and performed data augmentation on them, such as resizing, brightness adjustment, colour adjustment, rotation (clockwise/anti-clockwise) and cropping, and created bounding boxes and annotations for them.
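The sketch below illustrates, with OpenCV, the kind of augmentation operations listed above (flip, brightness/contrast change, rotation, cropping); file names are placeholders, and in practice the bounding-box annotations must be transformed along with the images.

```python
# Minimal augmentation sketch with OpenCV; paths are placeholders.
import os
import cv2

os.makedirs("augmented", exist_ok=True)
img = cv2.imread("dataset/car_0001.jpg")                      # placeholder image

flipped = cv2.flip(img, 1)                                    # horizontal flip
brighter = cv2.convertScaleAbs(img, alpha=1.2, beta=30)       # contrast x1.2, brightness +30

h, w = img.shape[:2]
rot_mat = cv2.getRotationMatrix2D((w / 2, h / 2), 15, 1.0)    # rotate by 15 degrees
rotated = cv2.warpAffine(img, rot_mat, (w, h))

cropped = img[int(0.1 * h):int(0.9 * h), int(0.1 * w):int(0.9 * w)]  # centre crop

for name, out in [("flip", flipped), ("bright", brighter),
                  ("rot", rotated), ("crop", cropped)]:
    cv2.imwrite(f"augmented/car_0001_{name}.jpg", out)
```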
We split our data into two scenes, day time and night time, and trained eight classes (four classes in each scene: car, truck, bus and motorbike).
Here are some snapshots of the dataset used for training our model.
The adaptations we make to our data prior to passing it to the algorithm are referred to as pre-processing. Data preprocessing is mainly a technique for transforming raw data into a polished dataset. To be more explicit, whenever data is received from many sources it is collected in raw form, which makes analysis impractical. Data must be formatted properly in order to achieve finer results from the applied model in machine learning or deep learning applications. The data preprocessing we performed includes data cleaning, meaning we deleted from the dataset those images that do not contain any of the target objects. Data augmentation is performed to increase the number of images in the dataset; the augmentation techniques include cropping, flipping and rotation of the images, changing the brightness, and adjusting the contrast, hue and saturation.
We used different metrics to measure the performance of the trained YOLOv4 model. The metrics used in our project are:
1. mAP: To evaluate object detection models such as R-CNN and YOLO, the mean average precision (mAP) is used. The mAP compares the ground-truth bounding box to the detected box and returns a score; the higher the score, the more accurate the model's detections.
2. IOU: Intersection over Union is an evaluation metric used to measure the correctness of an object detector on a particular dataset.
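As a brief formal sketch of these two metrics (standard definitions rather than anything specific to our implementation): with $B_p$ the predicted box, $B_{gt}$ the ground-truth box, $p(r)$ the precision at recall $r$ for one class, and $N$ the number of classes (four in our case),

$$\mathrm{IoU}(B_p, B_{gt}) = \frac{|B_p \cap B_{gt}|}{|B_p \cup B_{gt}|}, \qquad \mathrm{AP} = \int_0^1 p(r)\,dr, \qquad \mathrm{mAP} = \frac{1}{N}\sum_{c=1}^{N} \mathrm{AP}_c .$$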
3.7 Chapter Summary
This chapter presented the system design and architecture required for the implementation of the models. It also furnished detailed knowledge of the proposed procedure used in the project.
CHAPTER 4
The project is built in the Python language using Google Colab. In this project, we detect the vehicles in the video dataset, track them, and then count the total number of vehicles in the given video. Several Python libraries were used to obtain the results.
4.1.1 Hardware Requirements:
4.1.2 Software Requirements:
● The vehicles should be moving on the roads, and the vehicle classes should belong to the classes defined in the class-names file used for training.
● There should be enough light present in the testing and training dataset.
● IOU and mAP are used as the metrics for evaluating our model's performance, since in object detection IOU works best for measuring the overlap between a predicted bounding box and the actual bounding box of an object.
● In this section we present the results obtained by implementing the chosen techniques, which justify the use of the proposed object detection and tracking methods.
● The dataset used for training and testing the object detection model consists of 25,000 images. Each image contains objects belonging to the four classes, i.e. motorbike, car, bus and truck. A few images in the dataset did not contain any objects of these four classes, so we deleted those images from the dataset.
● The performance of object detector and tracker is evaluated on IOU and mAP.
● The dataset we prepared is trained with the YOLOv4 model, which we selected among all the available YOLO versions. It uses a CNN with twenty-four convolution layers, four max-pooling layers and two fully connected layers.
● The counting of the objects is implemented using DeepSORT, which is achieved with the help of a Kalman filter.
● Confirmed tracks and new detections are compared in terms of appearance, feature similarity and movement distance. The associations between confirmed tracks and detections are then generated using the Hungarian method.
● The Kalman filter and the motion prediction model are used to update the motion state of the multiple tracks, and new tracks are created for unmatched detections.
The snapshot below shows the code for removing from the dataset the images that contain no object.
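A minimal sketch of this step, assuming images and their YOLO .txt label files live side by side in one directory:

```python
# Delete an image when its YOLO label file is missing or empty,
# i.e. when the image contains no annotated object. Paths are placeholders.
import os

IMAGE_DIR = "data/obj"        # assumed layout: images and .txt labels together

for fname in os.listdir(IMAGE_DIR):
    if not fname.lower().endswith((".jpg", ".jpeg", ".png")):
        continue
    label_path = os.path.join(IMAGE_DIR, os.path.splitext(fname)[0] + ".txt")
    # remove the image if it has no label file, or the label file is empty
    if not os.path.exists(label_path) or os.path.getsize(label_path) == 0:
        os.remove(os.path.join(IMAGE_DIR, fname))
        if os.path.exists(label_path):
            os.remove(label_path)
        print("removed", fname)
```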
The snapshot below shows the code to divide the dataset into 90% for training and 10% for testing.
Figure 4.2 Code for dividing the dataset into the training and testing parts.
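A minimal sketch of such a split: shuffle the image paths and write 90% to train.txt and 10% to test.txt, the list files that Darknet expects. Paths are placeholders.

```python
# 90/10 split of image paths into Darknet-style train/test list files.
import glob
import random

images = glob.glob("data/obj/*.jpg")          # placeholder image directory
random.shuffle(images)

split = int(0.9 * len(images))                # 90% training, 10% testing
with open("data/train.txt", "w") as f:
    f.write("\n".join(images[:split]))
with open("data/test.txt", "w") as f:
    f.write("\n".join(images[split:]))

print(len(images[:split]), "training images,", len(images[split:]), "testing images")
```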
The snapshot below shows the code to start training our YOLOv4 model on the custom dataset, with the weights being saved to Google Drive.
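A minimal Colab-style sketch of this step: mount Google Drive for weight backups and launch Darknet training with the usual command. The data/config file names (obj.data, yolov4-custom.cfg) follow common Darknet conventions and are assumptions here.

```python
# Colab cell sketch: mount Drive and start Darknet YOLOv4 training.
# Assumes darknet is already cloned and compiled in the working directory,
# and that the backup= path in obj.data points at a Drive folder.
from google.colab import drive
drive.mount('/content/drive')                      # Drive stores the backup weights

# yolov4.conv.137 provides the pre-trained convolutional weights
!./darknet detector train data/obj.data cfg/yolov4-custom.cfg yolov4.conv.137 -dont_show -map
```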
The snapshot below shows the code to copy our trained model into the tracking part and to run save_model.py from the command line.
Figure 4.4 Code to copy our trained model to the tracking part.
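A minimal Colab-style sketch of this step: copy the trained weights into the tracking repository and convert them with save_model.py. The paths and flag names follow common YOLOv4 + DeepSORT implementations and are assumptions here.

```python
# Colab cell sketch: copy trained weights into the tracking repo and convert them.
import os
import shutil

# hypothetical paths: weights backed up on Drive, tracking repo cloned in /content
shutil.copy('/content/drive/MyDrive/yolov4-custom_best.weights',
            '/content/yolov4-deepsort/data/yolov4-custom.weights')

os.chdir('/content/yolov4-deepsort')
# convert the Darknet weights to a TensorFlow SavedModel for the DeepSORT tracker;
# the flag names below are assumptions based on common yolov4-deepsort repositories
!python save_model.py --weights ./data/yolov4-custom.weights --output ./checkpoints/yolov4-custom --model yolov4
```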
The snapshot below shows the code for importing the vehicle counting class in object_tracker.py and calling run to start counting. The video is divided into frames, and each object in a frame is assigned a unique ID.
Figure 4.5 Snapshot of the code to start the counting using DeepSORT
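The following hypothetical sketch conveys the counting idea: DeepSORT gives each vehicle a persistent track ID, and a vehicle is counted the first time its track crosses a virtual counting line. The VehicleCounting interface shown is illustrative, not the exact class from our code.

```python
# Illustrative sketch of line-crossing vehicle counting over DeepSORT track IDs.
class VehicleCounting:
    def __init__(self, line_y):
        self.line_y = line_y          # y-coordinate of the virtual counting line
        self.counted_ids = set()      # track IDs that have already been counted
        self.counts = {}              # per-class totals, e.g. {"car": 3}

    def update(self, track_id, class_name, prev_cy, cy):
        """Call once per confirmed track per frame with its previous/current box centre y."""
        crossed = prev_cy < self.line_y <= cy or cy < self.line_y <= prev_cy
        if crossed and track_id not in self.counted_ids:
            self.counted_ids.add(track_id)
            self.counts[class_name] = self.counts.get(class_name, 0) + 1

counter = VehicleCounting(line_y=400)
counter.update(track_id=7, class_name="car", prev_cy=390, cy=405)   # crosses the line
print(counter.counts)                                               # {'car': 1}
```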
4.3.2 Results:
Below is the graph of the loss plotted against the number of iterations. The graph shows two curves, one blue and one red: the blue curve shows the loss, while the red curve shows the mean average precision (mAP) at a 50% Intersection-over-Union (IOU) threshold (mAP@0.5).
Figure 4.6 Graph showing the loss and mAP while training YOLOv4 on the custom dataset
The snapshot below is the output of the tracking code, showing the tracking of objects in each frame of the video dataset.
This chapter introduced the prerequisites required for implementation and presented the results obtained by applying our model.
CHAPTER 5
CONCLUSIONS
5.1 Conclusion
We can successfully detect and count the vehicles in the given video frames containing four classes of vehicles: cars, buses, trucks and bikes. After training our dataset on the YOLOv4 model, we obtain an mAP (mean average precision) of 83.80%, and we are also able to detect and count vehicles in bad weather conditions.
Our results also helped us understand various deep learning models, choose YOLOv4 and DeepSORT for implementation, obtain the desired outcome, and identify the challenges that still need to be addressed in our proposed system.
In the future, we plan to address the limitation that the model is unable to count Indian vehicles such as autos, which are widely used in India. We also hope to train our model on a dataset containing images of bad weather conditions such as heavy rainfall, dusty weather and dense fog, and thereby achieve higher accuracy and performance.