Combined Minor
1. INTRODUCTION
1.1 Facial Recognition Systems
1.2 Workflow
2. BACKGROUND FOR THE PROJECT
2.1 Introduction
2.2 Challenges of traditional methods
2.3 Literature review
2.4 Motivation for the project
2.5 Novelty for the project
2.6 Research Gaps
3. WORKFLOW
3.1 Hardware design workflow
3.2 Key advantages of hardware design
3.3 Integration challenges and solutions
3.4 Image processing
3.5 Image Training
3.6 Image detection
4. IMPLEMENTATION
4.1 Data collection
4.2 Data preprocessing
4.3 ROI based feature extraction
4.4 Model architecture
4.5 Key advantages of implementation
5. RESULTS AND DISCUSSION
5.1 Evaluation metrics
5.2 Training phase
5.3 Testing phase
5.4 Interpretation
6. CONCLUSION AND FUTURE WORK
FIGURE TABLE
Figure 17 Demonstration
1. INTRODUCTION
Facial recognition systems are advanced biometric solutions that identify and verify individuals by
analyzing unique facial features. These systems are widely used in security, authentication, and
surveillance due to their contactless and efficient nature. The subsequent sections provide a detailed
overview of facial recognition workflows, including data acquisition, preprocessing, feature
extraction, and classification, along with insights into system performance, challenges, and future
improvements.
1.1 FACIAL RECOGNITION SYSTEMS
The primary purpose of this project is to design and implement a facial recognition system that
enhances security, improves user experience, and demonstrates the practical application of biometric
authentication in various domains. With the growing reliance on digital systems and sensitive data,
traditional security measures such as PINs, passwords, and physical keys have proven insufficient in
addressing modern security challenges. This project aims to address these shortcomings by
leveraging facial recognition technology as a robust, efficient, and user-friendly solution for
authentication and access control.
Another important purpose of the project is to demonstrate the efficiency and convenience of
contactless and non-intrusive authentication. In today’s fast-paced world, users demand solutions that
are not only secure but also seamless and easy to use. This project highlights the ability of facial
recognition technology to meet these demands. Unlike fingerprint scanning or physical tokens, facial
recognition requires no physical interaction with devices, ensuring a hygienic and hassle-free
process. Users can be authenticated simply by looking at a camera, making the system both user-
friendly and time-efficient.
Additionally, the project aims to showcase the versatility of facial recognition technology across a
wide range of applications. From securing financial transactions in the banking sector to enhancing
surveillance systems in public safety, the potential use cases are vast. By implementing a practical
facial recognition system, this project seeks to demonstrate its applicability in real-world scenarios,
such as access control in workplaces, secure device unlocking, and identity verification in law
enforcement. These applications highlight the transformative potential of facial recognition
technology in improving operational efficiency and reducing reliance on vulnerable traditional
methods.
A key focus of the project is also to address potential challenges and limitations associated with
facial recognition technology. Issues such as privacy concerns, data protection, and the ethical use of
biometric data are critical considerations. This project aims to implement best practices for data
handling, ensuring that the system is compliant with privacy regulations and respects users’ rights.
By integrating transparency and security measures, it seeks to build trust in the technology and
promote its responsible use.
Ultimately, this project is driven by the goal of creating a facial recognition system that balances
security, efficiency, and ethical considerations. By providing a secure, accurate, and user-friendly
authentication solution, the project contributes to advancing biometric technology as a reliable
alternative to traditional methods. Through practical implementation, it demonstrates the potential of
facial recognition to revolutionize security systems and meet the evolving needs of modern society.
1.2 WORKFLOW
2. BACKGROUND FOR THE PROJECT
Facial recognition systems offer efficient and contactless authentication solutions, with their
applications and workflows discussed in subsequent sections.
2.1 INTRODUCTION
The rapid evolution of technology has fundamentally changed the way people interact with systems
and safeguard sensitive information. In recent years, the growing reliance on digital platforms, cloud
computing, and connected devices has amplified the need for robust security mechanisms.
Traditional security methods such as passwords, PINs, and physical keys, while still widely used,
have shown critical vulnerabilities. Passwords can be easily guessed, hacked, or forgotten, while
physical tokens can be stolen, duplicated, or misplaced. These limitations have driven the demand for
more advanced, reliable, and user-centric solutions, leading to the emergence of biometric
technologies like facial recognition.
Facial recognition technology is based on identifying and verifying individuals by analyzing their
unique facial features. Unlike other biometric methods such as fingerprint or retina scanning, facial
recognition is non-intrusive and does not require physical interaction with devices, making it both
convenient and hygienic. This has made it a preferred choice for many applications, ranging from
personal device security to large-scale surveillance systems. The roots of facial recognition
technology can be traced back to the 1960s, with early research focusing on manually identifying
facial landmarks. Over the decades, advancements in computer vision and artificial intelligence (AI)
have transformed it into a highly sophisticated tool capable of real-time identification and
authentication.
The adoption of facial recognition technology has expanded significantly across various sectors. In
the banking and financial industries, it is used to verify customers' identities, prevent fraud, and
enable secure transactions. Law enforcement agencies rely on it to identify suspects, solve cases, and
enhance public safety. The technology has also found widespread use in consumer electronics, such
as smartphones and laptops, providing users with a seamless way to unlock devices and access
personal data securely.
Despite its growing popularity, the implementation of facial recognition is not without challenges.
Privacy concerns, data security, and ethical considerations are among the most pressing issues
associated with this technology. Unauthorized use of facial data, biases in algorithms, and potential
misuse for surveillance have raised questions about its impact on individual rights and freedoms.
These concerns have spurred research into improving the fairness, accuracy, and transparency of
facial recognition systems while ensuring compliance with data protection regulations such as GDPR
and CCPA.
This project is built on the premise of addressing both the opportunities and challenges of facial
recognition technology. It aims to develop a secure, efficient, and user-friendly system for
authentication and access control, highlighting the practical applications of the technology in real-
world scenarios. By integrating advanced AI algorithms and adhering to ethical practices, the project
seeks to demonstrate the potential of facial recognition to enhance security while maintaining user
trust and privacy.
The project's background is rooted in the increasing need for robust security solutions in an
interconnected world. It acknowledges the growing threats to traditional authentication methods and
leverages the unique strengths of facial recognition to create a system that is not only secure but also
adaptable to a wide range of applications. By addressing the challenges and showcasing the benefits,
the project aims to contribute to the responsible advancement of biometric technology.
2.3 LITERATURE REVIEW

Sr No. 1
Title: Viola, P., & Jones, M. (2001). A Revolutionary Approach to Real-Time Object Detection Using Integral Images, Boosted Cascade Classifiers, and Simple Features.
Year: 2001
Dataset: Self-constructed dataset of faces and non-faces
Pre-Processing Techniques: None (as detailed in the literature); Multi-site Harmonization; Data Normalization
Model: Haar Cascade Classifier
Results: Proposed a highly efficient and computationally lightweight framework for object detection, capable of achieving real-time detection speeds with high accuracy.

Sr No. 2
Title: Turk, M., & Pentland, A. (1991). The Birth of Eigenfaces: Principal Component Analysis-Based Dimensionality Reduction for Face Recognition Applications.
Year: 1991
Dataset: Yale Face Database
Pre-Processing Techniques: PCA for dimensionality reduction; Spatial Normalization; Temporal Filtering
Model: PCA for dimensionality reduction
Results: Applied PCA for face recognition, innovatively projecting images into a reduced subspace for efficient classification.

Sr No. 3
Title: Zhang, K., Zhang, Z., Li, Z., & Qiao, Y. (2016). A Comprehensive Multitask Model for Simultaneous Face Detection and Alignment Using Cascaded Convolutional Networks.
Year: 2016
Dataset: Database of 92 individuals with Typical Development (TD)
Pre-Processing Techniques: Functional Connectivity Analysis; Augmentation via DCGAN
Model: MTCNN
Results: Proposed a cascaded three-stage CNN that jointly performs face detection and facial landmark alignment in real time.

Sr No. 4
Title: Ahmad, I., & Aftab, S. (2019). Face Recognition Utilizing MATLAB with a Hybrid PCA and LDA Framework for Enhanced Dimensionality Reduction and Discriminative Power.
Year: 2019
Dataset: ABIDE
Pre-Processing Techniques: Brain Atlas-based Parcellation; Spatial-Temporal Data Representation
Model: Bidirectional Long Short-Term Memory; Graph Convolutional Networks (GCNs)
Results: Accuracy: CC200 Atlas: 70.92%; AAL Atlas: 68.72%

Sr No. 5
Title: Deng, X., Zhang, L., Liu, R., Chen, Z., & Xiao, B. (2023). ST-ASDNET: A BLSTM-FCN-Transformer based ASD Classification Model for Time-series fMRI [22].
Year: 2023
Dataset: ABIDE I/II
Pre-Processing Techniques: Atlas-based Parcellation; Time-Series Extraction (extracted parcellated region signals to create spatial-temporal data representations)
Model: Bidirectional Long Short-Term Memory; Full Convolutional Network to Transformer (FCN-Transformer)
Results: Accuracy: CC200 Atlas: 70.26%; AAL Atlas: 67.80%
o b. The system demonstrates the feasibility of implementing advanced technologies on
resource-constrained devices.
2. Real-Time Face Detection on Embedded Systems:
o a. The project leverages lightweight algorithms optimized for edge devices to deliver
real-time performance without relying on external servers.
o b. Preprocessing techniques, such as noise reduction and edge detection, enhance
detection accuracy even in constrained environments.
3. Scalable and Modular Design:
o a. The system's modular architecture allows integration with other security systems,
such as solenoid locks and alarm triggers.
o b. It provides a foundation for expanding capabilities, such as emotion recognition or
mask detection, in the future.
This project demonstrates the novelty of combining advanced computer vision techniques with cost-
effective hardware, providing a practical and scalable solution for real-time face detection
applications.
3. WORKFLOW
We have followed a sequential approach from preprocessing the data to implementing the hardware.
The workflow followed in this project is summarized in Fig. 3 below.
Fig. 3 Proposed Methodology
Processor:
The ESP32 microcontroller features dual-core processing power with a clock speed of up to
240 MHz. It supports multiple peripherals and handles communication with external
components efficiently.
Camera Module:
The OV2640 camera provides a resolution of up to 2 megapixels and supports various
resolutions like VGA, SVGA, and UXGA. It is optimized for low-light environments and
wide-angle applications, ensuring clear images in diverse conditions.
Onboard Flash:
It includes a built-in flash for illumination in low-light settings, which can be controlled
programmatically. This feature enhances image quality during face detection.
Connectivity:
o Wi-Fi: Enables wireless data transmission to external servers or applications.
Applications:
o Captures real-time facial images.
o Sends processed data to external devices or servers via Wi-Fi for further analysis.
o Converts input voltage (ranging from 7-12V) to a stable 5V output suitable for the
ESP32-CAM.
o Protects components from voltage surges or drops.
Capacitors:
o Electrolytic Capacitors: Smooth out the DC supply and minimize ripple voltage.
3. Relay Module
The relay module acts as an intermediary between the ESP32 and high-power hardware components
such as solenoid locks or alarms. It provides electrical isolation and allows the ESP32-CAM to
control devices operating at different voltage levels.
Output Side:
o Switches higher voltage devices (e.g., solenoid locks operating at 12V).
Applications:
o Controls solenoid locks for access management.
4. UART TTL Programmer
The UART TTL programmer is a vital tool for flashing firmware onto the ESP32-CAM module and
establishing serial communication for debugging and monitoring.
Purpose:
o Uploads firmware and sketches to the ESP32-CAM.
Connections:
o TX/RX Pins: Connect to the RX/TX pins of the ESP32-CAM for data transmission
and reception.
o GND: Common ground between the ESP32-CAM and the programmer.
Mode Selection:
o The ESP32-CAM must be set to bootloader mode for flashing. This is achieved by
connecting the GPIO0 pin to GND during reset.
Applications:
o Essential for the initial setup and configuration of the ESP32-CAM.
Types:
o Fail-safe locks: Remain locked during power loss.
o Fail-secure locks: Unlock automatically during power loss to prevent lockouts.
Power Requirements:
o Requires sufficient current to activate the locking mechanism, provided by the relay
and stabilized power source.
Applications:
o Used in smart doors, cabinets, or secure enclosures.
Design Features:
o Cutouts for the camera lens and LEDs for unobstructed functionality.
Applications:
o Indoor and outdoor setups for smart home security, offices, or industrial use.
3.2 KEY ADVANTAGES OF THE HARDWARE DESIGN
Compact and Modular: All components are integrated into a single compact design,
allowing for easy scalability.
Cost-Effective: Uses affordable components like the ESP32-CAM and simple relays.
Energy-Efficient: Optimized power management reduces energy consumption, making it
suitable for continuous operation.
Scalable: Additional peripherals, such as RFID readers or biometric sensors, can be easily
incorporated to enhance functionality.
o Environmental factors like walls, metal objects, and electronic noise also contributed
to packet losses.
Solution:
o Antenna Optimization: The ESP32-CAM module was oriented to ensure a direct
line of sight to the Wi-Fi router or access point whenever possible. Additionally, the
antenna's position was carefully adjusted to maximize signal strength.
o External Antenna (Optional): An external IPEX connector and antenna can be used
to improve range and signal reliability significantly.
o Reduced Interference: The system was tested in various locations to identify and
minimize sources of interference. Moving the ESP32-CAM away from high-noise
devices (e.g., microwaves, Bluetooth devices) improved performance.
2. Heat Dissipation
Prolonged operation of the ESP32-CAM, especially under high processing loads (e.g., continuous
face detection), led to overheating. This could degrade performance or even cause system instability.
Issue:
Overheating affected the ESP32 module's performance, resulting in throttling or unexpected
resets during extended usage. The compact design of the ESP32-CAM limited passive
cooling options, leading to heat buildup within the enclosure.
Analysis:
o Heat was generated primarily by the ESP32’s dual-core processor during intensive
computations, Wi-Fi transmissions, and image processing tasks.
o The absence of proper ventilation and heat dissipation mechanisms compounded the
problem.
Solution:
o Thermal Insulation: Thermal pads were placed between the ESP32 module and the
enclosure to transfer heat away from the board.
o Improved Enclosure Design: Ventilation slots were added to the enclosure to
facilitate airflow and dissipate heat more effectively.
o Placement Optimization: The ESP32-CAM was mounted in a well-ventilated area,
reducing heat accumulation.
o Optional Active Cooling: For demanding applications, a small heat sink or microfan
can be attached to the module to enhance cooling further.
3. Power Consumption
Consistent power delivery is critical for the ESP32-CAM to maintain stable operation. Power
fluctuations can lead to inconsistent performance, system resets, or damage to the hardware.
Issue:
Voltage drops and spikes caused by inconsistent power delivery affected the ESP32-CAM’s
performance, particularly during high-current operations like image transmission or
activating external peripherals.
Analysis:
o The ESP32-CAM requires a stable 5V supply but is sensitive to variations in the input
voltage.
o Sudden power demands from peripherals, such as activating a relay or solenoid lock,
could momentarily disrupt the power supply.
o Noise in the power line caused additional instability.
Solution:
o Voltage Regulation: A 7805 linear voltage regulator was used to ensure a steady 5V
output. This regulator efficiently handled input voltages between 7V and 12V,
providing a consistent supply to the ESP32-CAM.
o Capacitor Filtering:
Electrolytic capacitors (e.g., 470 μF) were placed across the input and output
of the regulator to smooth out voltage ripples.
Ceramic capacitors (e.g., 0.1 μF) were added to filter high-frequency noise.
o Power Distribution: A separate power line was dedicated to high-current peripherals
like the solenoid lock to reduce the load on the ESP32-CAM’s supply.
o Backup Power: For critical applications, a battery backup or uninterruptible power
supply (UPS) was recommended to handle power outages or voltage dips.
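As a quick sanity check on the regulator choice above, the heat a linear regulator such as the 7805 must shed can be estimated as the dropped voltage times the load current. The sketch below is illustrative Python (the project itself involves no such script), and the 250 mA load figure is an assumed ballpark for an ESP32-CAM transmitting over Wi-Fi, not a measured value:

```python
def regulator_dissipation(v_in: float, v_out: float = 5.0, i_load: float = 0.25) -> float:
    """Approximate power (in watts) a linear regulator dissipates as heat:
    the voltage it drops across itself times the load current."""
    return (v_in - v_out) * i_load

# At a 12 V input and an assumed 250 mA load, the 7805 must shed
# (12 - 5) * 0.25 = 1.75 W, which is why heat dissipation matters here.
print(regulator_dissipation(12.0))
```

At the low end of the input range (7 V), the same load dissipates only 0.5 W, which is one reason to keep the supply voltage as close to the regulator's dropout limit as practical.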
2. Face Cropping: Face cropping involves extracting the region of interest (ROI)
corresponding to the detected face. The bounding box coordinates are used to isolate the face
from the frame. This cropped region is then used for further analysis.
3. Image Resizing: Image resizing ensures uniformity by resizing the cropped face image to a
consistent dimension, such as [227, 227]. This step is crucial for accurate classification. A
Haar classifier is used for face detection before resizing.
4. File Naming and Saving: Each pre-processed image is saved with a unique name, like
[Link], [Link], and so on. This makes it easier to label and organize the images for
classification. The files are stored for quick and easy access later.
5. Creating dataset: Organize the saved images into directories or datasets suitable for training
and testing a classification model. Label them according to the classification categories.
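The cropping and resizing steps above can be sketched in a few lines. This is an illustrative Python version (the report's pipeline is implemented in MATLAB); the image is modelled as a row-major list of pixel rows, and a plain nearest-neighbour resize stands in for whatever interpolation the toolbox applies:

```python
def crop(image, x, y, w, h):
    """Extract the region of interest given a bounding box (x, y, width, height)."""
    return [row[x:x + w] for row in image[y:y + h]]

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize to a fixed size, standing in for the
    [227, 227] resize applied before classification."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]

# A synthetic 24x32 frame; the bounding box would come from the detector.
frame = [[(r * 16 + c) % 256 for c in range(32)] for r in range(24)]
face = resize_nearest(crop(frame, 8, 4, 16, 16), 227, 227)
print(len(face), len(face[0]))  # 227 227
```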
Regions of interest
A Region of Interest (ROI) is a specific part of an image where the algorithm focuses to find a
particular object, like a face. Instead of analyzing the entire image at once, the classifier looks at
smaller sections of the image to detect patterns or features that match the object it’s trained to
recognize.
How the classifier looks for ROI:
The Haar classifier divides the image into small rectangular windows and analyzes these for
specific patterns, called Haar features, which represent variations in pixel intensity, such as
edges, lines, and corners.
When detecting objects like faces, it looks for characteristic patterns, such as dark regions for
eyes, lighter areas for the forehead, and darker regions for the mouth.
To detect objects of different sizes, the algorithm scales the window to examine both small
and large regions.
It uses a cascading process, starting with simple pattern checks (e.g., edges or contrasts) to
quickly discard non-relevant sections.
Only windows that pass initial checks undergo further analysis with more complex patterns,
making the detection process efficient and accurate.
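The reason rectangular Haar features are cheap enough for this cascading process is the integral image: once it is computed, the pixel sum of any rectangle costs four lookups regardless of its size. The sketch below is illustrative Python showing the idea (a real cascade, such as OpenCV's, does this internally):

```python
def integral_image(img):
    """Summed-area table: ii[r][c] holds the sum of img[0..r-1][0..c-1]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Pixel sum of the rectangle with top-left corner (x, y): 4 lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ii = integral_image(img)
# A two-rectangle Haar-like edge feature: left column minus right column.
feature = rect_sum(ii, 0, 0, 1, 3) - rect_sum(ii, 2, 0, 1, 3)
print(feature)  # -6
```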
About Deep Learning: Deep learning is a subset of machine learning that uses artificial neural
networks to mimic the way humans learn. These networks consist of multiple layers that
automatically extract and learn features from data, making it particularly effective for complex tasks
like image classification, natural language processing, and speech recognition. Unlike traditional
machine learning, deep learning eliminates the need for manual feature extraction because the
network learns the most relevant features directly from raw data.
About Alexnet:
AlexNet is a deep convolutional neural network (CNN) designed for image classification
tasks.
It consists of 8 layers: 5 convolutional layers followed by 3 fully connected layers.
The model uses ReLU (Rectified Linear Unit) activation functions, which helped it achieve
faster training compared to previous models.
It employs data augmentation and dropout to reduce overfitting during training.
Steps involved in training:
1. Load the pretrained AlexNet model: AlexNet is a pre-trained deep convolutional neural
network for image classification. It includes layers optimized for general image datasets, such
as ImageNet. This makes it suitable for various image recognition tasks.
2. Modifying network architecture: The layers of AlexNet are stored in [Link]. The fully
connected layer (layer 23) is replaced with a new one to match the number of classes, such as
fullyConnectedLayer(2) for binary classification. The classification layer (layer 25) is
updated for the custom classification task.
3. Load and Label dataset: The imageDatastore function loads all images from the specified
directory and automatically assigns labels based on the folder names. This ensures the data is
correctly organized for training.
4. Set Training Options: The optimizer used is Stochastic Gradient Descent with Momentum
(SGDM), which helps the network converge faster and escape shallow local minima. The initial
learning rate is set to 0.001, controlling how much the weights are adjusted during each update.
The maximum number of epochs is set to 20, meaning the network will train on the full dataset
20 times. The mini-batch size is 64, specifying how many images are processed together in one
training step.
5. Train the Network: It involves forward propagation, where input images are passed through
the network to make predictions, and backpropagation, where weights are adjusted based on
errors and gradients to minimize classification errors. This process is repeated iteratively to
improve the model's performance.
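The SGDM update applied at each training step can be illustrated on a toy one-dimensional loss. The sketch below is Python for illustration only (the report's training runs in MATLAB); it uses the stated learning rate of 0.001 and assumes a momentum of 0.9, which is MATLAB's default but is not stated in the report:

```python
# Minimize the toy loss (w - 3)^2 with SGD-with-momentum updates.
lr, momentum = 0.001, 0.9   # learning rate from the report; momentum assumed
w, velocity = 0.0, 0.0
for _ in range(2000):
    grad = 2.0 * (w - 3.0)               # d/dw of (w - 3)^2
    velocity = momentum * velocity - lr * grad
    w += velocity                        # the per-step weight adjustment
print(round(w, 4))
```

The momentum term accumulates past gradients, so steps keep pointing in a consistent downhill direction even when individual mini-batch gradients are noisy.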
3.6 IMAGE DETECTION
After capturing and training the images, the face detection process begins by using the trained model,
such as AlexNet, to classify new images. The captured image is first pre-processed, including
resizing to the required dimensions. Then, the model is used to predict the class of the image by
passing it through the network, which processes the features learned during training. The output is
the predicted label or classification, indicating the detected object or face in the image. This process
allows real-time face detection based on the model's learned features and patterns.
4. IMPLEMENTATION
o Frame Rate: Balanced to support real-time detection without overloading the system.
Preprocessing ensures that images captured by the ESP32-CAM are optimized for accurate face
detection.
Image Resizing:
Images are resized to standard dimensions (e.g., 96x96) compatible with the detection model,
ensuring uniformity in input data.
Grayscale Conversion:
Reduces computational complexity by converting RGB images to grayscale, focusing on
facial feature detection without color data.
Noise Reduction:
Techniques like Gaussian filtering are applied to smooth the images, enhancing the signal-to-
noise ratio and reducing detection errors.
Edge Detection:
Preliminary edge detection helps localize facial boundaries, improving the face detection
algorithm's performance.
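The grayscale conversion and smoothing steps above can be sketched as follows. This is illustrative Python rather than the device firmware; it uses the standard BT.601 luminosity weights for the RGB-to-grayscale step, and a 3x3 mean filter as a simple stand-in for the Gaussian filtering described:

```python
def to_grayscale(rgb):
    """Luminosity grayscale (BT.601 weights) for a row-major list of (R, G, B) pixels."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]

def box_blur(img):
    """3x3 mean filter to suppress pixel noise; border pixels are left as-is."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = sum(img[r + dr][c + dc]
                            for dr in (-1, 0, 1) for dc in (-1, 0, 1)) / 9.0
    return out

# A synthetic 96x96 RGB frame, matching the resized input dimensions.
rgb = [[(200, 150, 100)] * 96 for _ in range(96)]
gray = box_blur(to_grayscale(rgb))
print(len(gray), len(gray[0]))  # 96 96
```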
o Uses rectangular features for detecting patterns such as edges and textures.
Implementation:
The classifier processes grayscale images to detect faces by scanning for specific Haar-like
features. A cascade of stages progressively eliminates non-face regions, resulting in fast and
accurate detection.
o Loss Function: Binary cross-entropy, the standard loss for two-class problems, penalizes confident misclassifications most heavily.
Precision:
The fraction of true positives among all positive detections, indicating how reliable the
detected faces are.
Recall (Sensitivity):
The fraction of actual faces correctly detected, reflecting the system's ability to identify all
faces.
F1 Score:
The harmonic mean of precision and recall, providing a balanced measure of the model's
performance.
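These three metrics follow directly from the detection counts. The sketch below is illustrative Python with made-up counts, not the project's measured figures:

```python
def metrics(tp, fp, fn):
    """Precision, recall and F1 score from raw detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # reliability of detections
    recall = tp / (tp + fn) if tp + fn else 0.0      # coverage of actual faces
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of the two
    return precision, recall, f1

# e.g. 90 faces found correctly, 10 false alarms, 5 faces missed:
p, r, f1 = metrics(tp=90, fp=10, fn=5)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.9 0.947 0.923
```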
5.4 RESULT INTERPRETATION
The results demonstrated the efficiency and reliability of the ESP32-CAM face detection system. A
confusion matrix was generated to further analyze the performance:
True Positives: Faces correctly detected.
False Positives: Non-faces incorrectly identified as faces.
True Negatives: Non-faces correctly identified.
False Negatives: Faces missed by the system.
Limitations:
1. Hardware Constraints: The ESP32-CAM's limited processing power restricted the complexity
of the model architecture.
2. Lighting Sensitivity: Performance slightly degraded under extremely low-light conditions.
3. Generalization: Testing was limited to locally collected data; broader datasets may reveal
additional challenges.
Fig. 17 MATLAB face detection and demonstration
improve the system's robustness and generalizability. Deploying the system in real-world
settings, such as offices, homes, and public spaces, can provide additional insights into its
performance across different scenarios.
2. Integration with Advanced Models
Exploring more advanced face detection algorithms, such as TensorFlow Lite
implementations or models like YOLO (You Only Look Once), could enhance detection
speed and accuracy. These models, optimized for edge devices, could address complex
scenarios, such as detecting multiple faces in crowded environments.
3. Low-Light Performance Enhancement
Addressing the limitations in low-light conditions by integrating infrared (IR) cameras or
utilizing advanced preprocessing techniques, such as histogram equalization and noise
suppression, can ensure consistent performance in poorly lit environments.
4. Multifunctional System Development
Beyond face detection, the system could be extended to include emotion recognition, mask
detection, or facial attribute analysis. This would broaden the scope of applications, making it
suitable for industries like healthcare, retail, and public safety.
5. Energy Optimization for Long-Term Operation
Optimizing the power management system for long-term use, including integrating solar
panels or high-efficiency batteries, could make the system more sustainable and ideal for
remote locations with limited power access.
6. Real-World Deployment
Developing a comprehensive, end-to-end product that includes hardware, firmware, and user-
friendly software interfaces would facilitate the system's deployment in real-world
applications. For example:
o Access Control: Integrating the face detection system with solenoid locks and cloud-
based authentication for secure smart lock systems.
o Surveillance: Deploying the system in smart security cameras for live monitoring and
automated alerts.
7. Edge Computing and Cloud Integration
Enhancing the system's capabilities by enabling hybrid edge-cloud architectures. The ESP32-
CAM can handle local processing for immediate responses, while the cloud can provide
advanced analytics, storage, and machine learning model updates.
8. Multifaceted Security Integration
The system can be combined with other security modalities, such as RFID and biometric
sensors, to create multi-factor authentication systems for improved access control.
9. Multiclass Classification
Upgrading the system to recognize multiple faces simultaneously or classify specific
attributes (e.g., age group, gender) will provide nuanced insights and expand the system's
utility.
10. User Feedback Loop
Incorporating feedback mechanisms to allow users to provide input on incorrect detections
can improve model accuracy over time. The system can learn and adapt dynamically, further
refining its performance.
6.2 Long-Term Vision
The advancements in face detection and related technologies can pave the way for a wide range of
applications across industries. The ultimate goal is to transition from a prototype to a market-ready
solution that combines affordability, scalability, and superior performance. By continuing to innovate
and address current limitations, this system has the potential to become a cornerstone of intelligent
automation and modern security systems.