
International Conference on Emerging Technologies (ICET) 2022

Multiplatform Surveillance System for Weapon Detection using YOLOv5
Osama Rasheed
Dept. of Electrical Engineering
Institute of Space Technology
Islamabad, Pakistan
[email protected]

Adam Ishaq
Dept. of Electrical Engineering
Institute of Space Technology
Islamabad, Pakistan
[email protected]

Muhammad Asad
Dept. of Electrical Engineering
Institute of Space Technology
Islamabad, Pakistan
[email protected]

Tufail Sajjad Shah Hashmi
Dept. of Electrical Engineering
Institute of Space Technology
Islamabad, Pakistan
[email protected]

Abstract—In today's modern world, safety and security are big concerns. Banks and shops are robbed daily, and the rate of robberies keeps increasing, so there is a need for a safety system that minimizes the chances of robbery and helps maintain peace and safety. Mostly, CCTV (Closed-Circuit Television) surveillance systems are used, which are quite inefficient due to their dependency on the human factor. Adding AI (Artificial Intelligence), i.e., object detection, to the system is not only more efficient at discernment but also identifies a potential threat in much less time. The project uses the latest object detection technique, YOLO (You Only Look Once), to identify handguns and rifles through a camera, trained on a dataset of 7801 images and implemented on a Raspberry Pi/Jetson Nano. The GUI (Graphical User Interface) can be operated from a computer system (a web portal created using HTML and CSS) as well as from an Android mobile application (created using Android Studio), giving the project the name 'MULTIPLATFORM'. Whenever a weapon is detected, the system sends a screenshot, along with an alert via a NodeMCU (ESP8266), to the web portal of the manager/user's system, which gives two options: discard the request with the green button or affirm the threat with the red button. Once the manager/user confirms the situation as dangerous, the information is sent to the nearby concerned authorities, i.e., police stations, via a message/call using the GSM module. If the alert is discarded, everything resets to its original state. Also, if the manager does not respond within 15 seconds, the concerned authorities are notified automatically. Finally, a mock robbery scenario was staged in which the developed system detected the weapon, successfully demonstrating its working.

Index Terms—CNN, YOLOv5, Weapon Detection, ESP8266, SIM900A

I. INTRODUCTION

Security systems are used by many people and organizations in their homes, offices, banks, and other commercial areas, meaning that they are aware of some form of overt surveillance. However, some types of surveillance systems go beyond simple stationary cameras and alarm sensors: surveillance plays its part in video security and now extends into the arena of covert tactics and investigations.

Artificial Intelligence has redefined society in a way we have never seen before; technology accompanies us at every step of our daily lives, from unlocking our phones to our day-to-day activities [1]. Initially, computers only learned to detect objects of interest, but now they detect objects in the real world and in real time, match the results against the correct annotations, and improve their capability. The more iterations a model takes, the better it gets, moving closer to, or even beyond, human surveillance.

With the adoption of deep learning and CNNs (Convolutional Neural Networks), we can classify and localize an object in a frame captured by a camera, i.e., perform object detection [1]. Multiple algorithms are available for object detection, each with its own efficiency and capability. However, YOLO (You Only Look Once) is widely regarded as the best and fastest algorithm available at this time, and most researchers and AI enthusiasts prefer it for its versatility and low inference time [2]. Different versions of YOLO are now available, i.e., YOLOv3, YOLOv4, and YOLOv5. Furthermore, YOLOv5 has sub-versions, i.e., YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, which are nano, small, medium, large, and extra-large respectively [3].

Unlike other surveillance systems, this system uses the state-of-the-art YOLOv5s model, which provides real-time detection of weapons, and uses IoT to help the user analyze the situation and generate alerts to notify nearby security stations if the need arises. The system also helps security agencies, private security organizations, and the police mobilize more actively and meaningfully by providing them with the exact location and details of the place of the incident. The real-time, accurate detection shows the high significance of the system, which will also help save human lives by connecting to the concerned authorities in time.

Initially, a frame is captured via a CCTV camera, which is then searched for a weapon (handgun or rifle) using a microprocessor, which can be a computer with a dedicated GPU or a Jetson Nano. After a weapon has been detected, an alert is generated on the NodeMCU web server, indicating to the user that there has been intruding activity by showing the weapon-detected frame. The user can then affirm the situation or mark it safe by pressing either of the two available buttons. Once the 'Alert' button is pressed, the alarm goes off and a text message is sent to the concerned authorities, i.e., police stations, counter-terrorism agencies, etc. If the 'Safe' button is pressed, the system resets and discards the frame. The system also saves the detected frames in a given directory, providing access to the data after the incident.

II. LITERATURE REVIEW

A. Object Detection Algorithms

There are numerous object detection algorithms available in the pool of AI today. R-CNN, HOG, Fast R-CNN, R-FCN, Faster R-CNN, SSD, SPP-net, and YOLO are the most efficient and popular algorithms used by researchers around the world [4]. However, comparative analysis shows that YOLO's latest versions achieve relatively higher mAP (mean average precision) and FPS (frames per second) in object detection. According to researchers at Facebook AI Research, the unified formulation of YOLO makes it very fast [2]: the base model processes images in real time at 45 fps, while the smaller version of the network, Fast YOLO, processes 155 fps while doubling the mAP of other real-time detectors. The algorithm also generalizes better than other detection methods, including R-CNN and DPM, when moving from natural images to other domains such as artwork. Previously, people used techniques such as sliding-window object detection, R-CNN, Fast R-CNN, and Faster R-CNN [5]. Since the introduction of YOLO in 2015, it has become common in industry for object detection due to its speed and accuracy [6].

YOLO works based on a CNN. The CNN takes an input image and generates a feature map output in the form of a vector, whose dimensions correspond to the grid size and the per-box parameters, i.e., the probability of a detected object, the coordinates of the center of the bounding box, the bounding box width and height, and the probabilities of the possible classes. These parameters determine the depth of the feature map. The volume of the feature vector can be calculated by multiplying the grid size by the number of parameters [7]:

S × S × (5 + C)    (1)

where S × S is the grid size (default 7 × 7), 5 corresponds to the number of box parameters, i.e., Pc (probability of the detected object, from 0 to 1), Bx and By (coordinates of the midpoint of the bounding box), and Bh and Bw (height and width of the bounding box), and C is the number of classes. For three classes, the output vector for one grid cell and one anchor box is:

y = [Pc, Bx, By, Bh, Bw, C1, C2, C3]^T    (2)

This is just for a single anchor box; if there are multiple anchor boxes, the total volume is multiplied by the number of anchor boxes. For example, if the grid is 3 × 3 and there are 8 parameters in the output vector, the vector volume is 3 × 3 × 8 for one image containing one object. With multiple objects or classes in a single image, the feature map output volume grows with the total number of objects [8]; each object has its own vector.
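As a minimal sketch (an assumed helper, not the authors' code), the tensor volume described by Eqs. (1) and (2) can be computed directly:

import math  # not strictly needed; kept for clarity of the arithmetic

def yolo_output_volume(grid_size: int, num_classes: int, num_anchors: int = 1) -> int:
    # Pc, Bx, By, Bh, Bw plus one score per class, per anchor, per cell.
    params_per_anchor = 5 + num_classes
    return grid_size * grid_size * num_anchors * params_per_anchor

# The paper's example: a 3x3 grid with 8 parameters per cell -> 3*3*8 = 72 values.
print(yolo_output_volume(grid_size=3, num_classes=3))  # 72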
Fig. 1. CNN general flow

Intersection over Union (IOU) is the base parameter on which YOLO works. IOU is the area of intersection divided by the area of union of two bounding boxes, one of which is the ground truth while the other is predicted by the algorithm. For a prediction to be counted as correct, the IOU should be equal to or greater than 0.5, the threshold kept for standard algorithms [9]:

IOU = Area of Intersection / Area of Union    (3)

The left side of Fig. 2 shows blue (ground truth) and orange (algorithm-predicted) bounding boxes, where the yellow lines indicate the area of union and the red lines indicate the area of intersection. The higher the IOU value, the more accurate the prediction [10]. The right side of Fig. 2 shows multiple bounding boxes, each with a certain Pc value.

Fig. 2. IOU determination from ground truth and predicted bounding boxes [11]
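A minimal sketch of the IOU computation in Eq. (3) (an assumed implementation for axis-aligned boxes given as (x1, y1, x2, y2), not taken from the paper):

def iou(box_a, box_b):
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A prediction counts as correct when iou(ground_truth, predicted) >= 0.5.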

Non-Max Suppression: YOLO typically uses a 19 × 19 grid of cells, so there is a good chance that a single object will have multiple predicted bounding boxes. This problem is corrected by non-max suppression. Every bounding box has a Pc value (the presence probability of an object within the box); non-max suppression takes the box with the maximum Pc and compares its IOU with the other bounding boxes. If the IOU is greater than 0.5, those boxes are bounding the same object, so the extra boxes are discarded. If the IOU is less than 0.5, the algorithm assumes the other boxes are bounding some other object that merely overlaps.

Anchor Boxes: If there are multiple objects of different classes in a single image, then multiple anchor boxes are used to represent the output vector, which simply replicates the output feature map volume for each object. The accompanying figure shows two anchor boxes of different classes [7].
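A rough sketch of this non-max suppression procedure (an assumed implementation, reusing the iou() helper sketched above; boxes are (x1, y1, x2, y2) tuples and scores are the per-box Pc values):

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    # Visit boxes in order of descending confidence (Pc).
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)   # the highest-Pc box survives
        keep.append(best)
        # Discard remaining boxes that overlap the survivor by more than
        # the threshold: they are assumed to bound the same object.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep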
B. Comparison Between YOLOv4 and YOLOv5

Version 4 of YOLO uses the latest BoF (bag of freebies) and multiple BoS (bag of specials). BoF techniques improve the accuracy of the detector with no impact on inference time; they only increase the training cost [3]. BoS techniques increase inference time slightly but show great improvements in detection accuracy. YOLOv4 uses CSP (Cross-Stage Partial connections), the latest backbone for CNNs, to enhance learning capability. CmBN (Cross mini-Batch Normalization) is used to segregate batches into sub-batches, and the self-regularized, non-monotonic activation function known as Mish activation is used. Self-Adversarial Training (SAT) provides a data augmentation approach that works in forward and backward stages, and another augmentation technique, mosaic data augmentation, combines four training images into a single image. DropBlock regularization is used for finer regularization in the CNN, and CIoU loss is used to achieve accuracy and speed on the bounding box regression problem [10]. Version 5 of YOLO, in contrast, is implemented in PyTorch rather than the Darknet framework used in previous versions. YOLOv4 and YOLOv5 both have a PANet neck and a CSP backbone [13]; the improvements in YOLOv5 are auto-learning bounding box anchors and mosaic data augmentation [9]. According to Roboflow, running on a Tesla P100 GPU, the YOLOv5 model achieved 140 FPS, whereas YOLOv4 achieved only 50 FPS under the same parameters [6]. YOLOv5 also has small weight files, around 90% smaller than YOLOv4's. Finally, YOLOv4 uses .cfg files for configuration whereas YOLOv5 uses .yaml [7].

Fig. 3. YOLOv5 Architecture [12]

The YOLOv5 architecture consists of three parts: Backbone, Neck, and Head. The Backbone comprises CSPDarknet, the Neck comprises PANet, and the Head comprises the YOLO layer [14]. CSPDarknet takes the input image for feature extraction; the features are then given to PANet for feature fusion, after which the head, i.e., the YOLO layer, produces the detection results: class, score, location, and size [15].

III. METHODOLOGY

A. Dataset

The dataset comprised 7801 images of guns with two classes: handgun and rifle. An 80-20 split was made between training and validation images. The training set contained 6240 images with 7347 instances of the two labeled weapon classes, of which 4384 instances belonged to rifle and 2963 to handgun. The validation set contained 1561 images with 1904 instances, of which 1128 belonged to rifle and 776 to handgun. The dataset was acquired from numerous sources such as Google Images, movies, and CCTV footage [10]. The model was trained on Google Colab using the YOLOv5 GitHub repository.
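As a rough sketch of the Colab run with the settings reported below (image size 640, batch 16, 100 epochs, YOLOv5s weights), using the standard training entry point of the YOLOv5 repository; the dataset configuration name weapons.yaml is an assumption, as the paper does not give its file names:

!git clone https://github.com/ultralytics/yolov5
%cd yolov5
!pip install -r requirements.txt
# weapons.yaml (assumed name) lists the train/val image paths and the
# two class names, handgun and rifle.
!python train.py --img 640 --batch 16 --epochs 100 --data weapons.yaml --weights yolov5s.pt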
B. Training Results

After 3 hours of training on Google Colab using 100 epochs, an image size of 640, and batches of 16, we achieved the following results. Fig. 4 shows the precision for handgun and rifle with respect to the confidence of the model; precision increased with increasing confidence. Precision is calculated as:

Precision = TP / (TP + FP)    (4)

where TP is the number of true positives and FP is the number of false positives. High precision indicates that the model produces a good proportion of true positives among all positive predictions.

Fig. 4. Precision Curve

Similarly, recall, as can be seen in Fig. 5, decreases with increasing confidence:

Recall = TP / (TP + FN)    (5)

where FN is the number of false negatives. High recall indicates that the model is finding most of the actual objects and not missing objects present in the frame. High recall is essential in a weapon detection system, since the system must identify every possible weapon in the frame.

Fig. 5. Recall Curve
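A minimal sketch (an assumed helper) of the metrics in Eqs. (4) and (5), plus the F1 score discussed below, computed from raw true-positive, false-positive, and false-negative counts:

def detection_metrics(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp) if tp + fp else 0.0   # Eq. (4)
    recall = tp / (tp + fn) if tp + fn else 0.0      # Eq. (5)
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1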
Fig. 6 shows the PR curve, which indicates the system's performance at a threshold of 0.5, i.e., 50%. The AP (average precision) of handgun achieved at this threshold is 84.1% and the AP of rifle is 91.6%; the mAP (mean average precision) is 87.8%. The sweet spot on the graph is taken where precision and recall are both high.

Fig. 6. PR Curve

Fig. 7 shows the system performance with respect to the confidence values. We see the highest F1 score, 0.86, at a confidence of 0.272; F1 is the harmonic mean of precision and recall.

Fig. 7. F1 Curve

As can be seen in TABLE I [10], the model trained on YOLOv5s performed relatively better than YOLOv4 and YOLOv3.

TABLE I
COMPARATIVE mAP VALUES OF YOLO VERSIONS 3, 4, AND 5

    YOLO Version    mAP Value    Dataset
    YOLOv5          87.8%        Weapon Dataset
    YOLOv4          84.85%       Weapon Dataset
    YOLOv3          77.30%       Weapon Dataset
    (All models were trained in a similar environment.)

The F1 scores show the system performance, with YOLOv5s having the highest score. Similarly, the precision and recall of YOLOv5s are greater than those of the previous versions of YOLO. YOLOv5s was preferred over the other existing models due to its shorter inference time. Results also improved as the number of epochs was increased to 100.
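As a quick cross-check of the PR-curve numbers above, the reported mAP is the mean of the two per-class AP values:

ap = {"handgun": 0.841, "rifle": 0.916}
print(sum(ap.values()) / len(ap))  # 0.8785, consistent with the reported 87.8%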

TABLE II
PRECISION, RECALL, F1 AND mAP VALUES OF YOLO VERSIONS 3, 4 AND 5

    Measure      YOLOv5    YOLOv4    YOLOv3
    Precision    88%       85%       84%
    Recall       83%       78%       71%
    F1           86%       82%       77%
    mAP          87.8%     84.85%    77.30%
    (All models were trained in a similar environment.)

C. Web Portal and Mobile Application

To deliver weapon detection alerts, a web portal and a mobile application were developed. Initially, a NodeMCU web server was used to show a basic GUI (Graphical User Interface), as shown in Fig. 8. The WiFiManager library was used to make the system robust, so that the user can connect to any local network of choice.

Fig. 8. NodeMCU Web-server

The multiplatform surveillance system offers the service of a web portal in the WeSecure system. The web portal has been designed using HTML, CSS, and JavaScript: the front end is built with HTML and CSS, and the interactive logic, including the login system for the WeSecure portal, is written in JavaScript. The user can either generate an alert to notify the nearby security stations or mark the situation safe to reset the system. The only limitation of the web portal is that it is static, which prevents the WeSecure web portal from receiving real-time images of the detected frames.

The WeSecure Android application has been developed in Android Studio, which provides a good GUI and is user-friendly to code. The Java directory is used for the back-end programming of the Android application, and XML is used for designing the front end. After logging in, the user can access the detected frames of weapons in real time. The Picasso library is used to show the real-time images: it takes the URL of the images from a dynamic website/web portal and displays them in the allocated place in the Android application.

D. Hardware

The hardware consisted of the following modules:
• NodeMCU (ESP8266) Wi-Fi module
• Arduino UNO
• Laptop (CPU: AMD FX-9800P RADEON R7, 12 compute cores 4C+8G, 2.70 GHz)
• SIM900A GSM module
• Camera (built-in webcam)
• Alarm
• Red and green LEDs
• NPN transistor
• 100-ohm resistors

Fig. 9. Circuit Diagram

The model was deployed in a local environment, i.e., on a laptop, due to the unavailability of a Jetson Nano, using the PyCharm IDE for Python. After the detection of a weapon, the Arduino UNO is alerted, as it is connected serially to the laptop. A 1x2 integer array is used to address both LEDs: when the Arduino receives a high "1" value in the array, the red LED turns on and the green LED turns off. The Arduino UNO then sends a high value to a NodeMCU digital pin, which generates an alert on the web server. The user has two options on the web server: either discard the alert and reset the system, or affirm the alert to notify nearby authorities via a text message containing the location of the site, sent through the GSM (Global System for Mobile communications) module. The detected frames are saved in the respective directory on the laptop for future use. In Fig. 10, the resistors are used as voltage dividers, while the transistor is used to pull the Arduino's reset pin to ground when the user wants to reset the system.
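The paper does not list its serial code; the following is an assumed sketch of the laptop side of this link using pyserial, with the port name and baud rate as placeholders:

import serial  # pyserial

# Assumed port and baud rate; the actual values are not given in the paper.
arduino = serial.Serial(port="COM3", baudrate=9600, timeout=1)

def raise_alert():
    # Send a high "1" so the Arduino lights the red LED and signals
    # the NodeMCU to publish the alert on the web server.
    arduino.write(b"1")

def reset_system():
    # Send "0" to restore the green LED / idle state.
    arduino.write(b"0")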
IV. RESULTS AND DISCUSSION

The system used real-time frames captured by the laptop's built-in camera. The fps (frames per second) was quite low, around 1 fps, due to the unavailability of a GPU; the system ran on the laptop's CPU instead. The FPS can be increased by running the model on a graphics card. Fig. 11 and Fig. 12 show the actual frames displayed after a weapon was detected. After the system is initialized, the weapon detection algorithm searches for a weapon in the real-time frames captured from the camera.
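The deployment script itself is not included in the paper; a minimal sketch of the flow it describes (load the custom YOLOv5s weights, scan webcam frames, save detected frames, signal the Arduino) could look like the following, where best.pt and the detections/ directory are assumed names and raise_alert() is the serial helper sketched above:

import cv2
import torch

# "best.pt" is an assumed name for the custom-trained weapon weights.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

cap = cv2.VideoCapture(0)  # laptop's built-in webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame[..., ::-1])   # BGR -> RGB, run detection
    if len(results.xyxy[0]):            # at least one handgun/rifle found
        cv2.imwrite("detections/frame.jpg", frame)  # save the detected frame
        raise_alert()                   # alert the Arduino over serial
    annotated = results.render()[0]     # frame with bounding boxes drawn
    cv2.imshow("WeSecure", annotated[..., ::-1])
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()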

Once a weapon is detected, the algorithm marks a bounding box on the weapon, and the frame is saved in a particular directory on the laptop. The saved frame is then shown on the laptop's display for the user, and alerts are generated on the web portal accordingly. Then, as per the user's input to the application as shown in Fig. 8, the system can alert the relevant authorities or reset if the situation is declared "safe".

Fig. 10. Hardware Picture

Fig. 11. Detected Frames

Fig. 12. Concatenated Frames

V. CONCLUSION AND FUTURE WORK

Our paper demonstrates the working and implementation of a system that addresses a major requirement of the surveillance industry. Not only weapons but any object can be detected by training the model on a suitable dataset. Integrating the multiplatform system into existing CCTV surveillance networks will improve accuracy and reduce dependence on the human factor. The model's accuracy can be improved by training it on a dataset relevant to the particular deployment environment, and by adopting other cutting-edge object detection algorithms as they are developed. The project's scope can also be extended to concealed weapon detection by adding metal detectors or thermal cameras. Thus, the paper provides complete initial information about the system: with the help of IoT, surveillance is made easy for the user with the additional benefit of increased accuracy, and the process of alerting the authorities is made time-efficient with just the click of a button. We hope that our work will be beneficial in creating a safe and secure environment for everybody.

VI. ACKNOWLEDGEMENT

We want to thank our parents for their support and encouragement. We also wish to show our appreciation to our colleagues and teachers, who were always there when needed.

REFERENCES

[1] S. Narejo, B. Pandey, C. Rodriguez, M. R. Anjum et al., "Weapon detection using YOLO v3 for smart surveillance system," Mathematical Problems in Engineering, vol. 2021, 2021.
[2] U. Nepal and H. Eslamiat, "Comparing YOLOv3, YOLOv4 and YOLOv5 for autonomous landing spot detection in faulty UAVs," Sensors, vol. 22, no. 2, p. 464, 2022.
[3] "Ultralytics/yolov5: YOLOv5 in PyTorch, ONNX, CoreML, TFLite." [Online]. Available: https://github.com/ultralytics/yolov5
[4] N. Ali, S. Ansari, Z. Halim, R. H. Ali, M. F. Khan, and M. Khan, "Breast cancer classification and proof of key artificial neural network terminologies," in 2019 13th International Conference on Mathematics, Actuarial Science, Computer Science and Statistics (MACS). IEEE, 2019, pp. 1-6.
[5] A. U. Islam, M. J. Khan, M. Asad, H. A. Khan, and K. Khurshid, "iVision HHID: Handwritten hyperspectral images dataset for benchmarking hyperspectral imaging-based document forensic analysis," Data in Brief, vol. 41, p. 107964, 2022.
[6] J. Solawetz, "What is YOLOv5? A guide for beginners," Nov 2022. [Online]. Available: https://blog.roboflow.com/yolov5-improvements-and-evaluation
[7] S. Gutta, "Object detection algorithm - YOLO v5 architecture," Aug 2021. [Online]. Available: https://medium.com/analytics-vidhya/object-detection-algorithm-yolo-v5-architecture-89e0a35472ef
[8] M. Asad, Z. Halim, M. Waqas, and S. Tu, "An in-ad contents-based viewability prediction framework using artificial intelligence for web ads," Artificial Intelligence Review, vol. 54, no. 7, pp. 5095-5125, 2021.
[9] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," arXiv preprint arXiv:2004.10934, 2020.
[10] T. S. S. Hashmi, N. U. Haq, M. M. Fraz, and M. Shahzad, "Application of deep learning for weapons detection in surveillance videos," in 2021 International Conference on Digital Futures and Transformative Technologies (ICoDT2). IEEE, 2021, pp. 1-6.
[11] I. Ul Hassan, R. H. Ali, Z. Ul Abideen, T. A. Khan, and R. Kouatly, "Significance of machine learning for detection of malicious websites on an unbalanced dataset," Digital, vol. 2, no. 4, pp. 501-519, 2022.
[12] R. Xu, H. Lin, K. Lu, L. Cao, and Y. Liu, "A forest fire detection system based on ensemble learning," Forests, vol. 12, no. 2, p. 217, 2021.
[13] J. Nelson and J. Solawetz, "Responding to the controversy about YOLOv5," Roboflow Blog, 2020.
[14] R. Xu, H. Lin, K. Lu, L. Cao, and Y. Liu, "A forest fire detection system based on ensemble learning," Forests, vol. 12, no. 2, p. 217, 2021.
[15] D. Thuan, "Evolution of YOLO algorithm and YOLOv5: The state-of-the-art object detection algorithm," 2021.
