Project Report
ABSTRACT
The rise of Deepfakes, synthetic media generated using Generative Adversarial Networks
(GANs), presents significant challenges in integrity verification due to their increasing
realism. Traditional detection methods based on pixel anomalies are becoming less
effective as Deepfakes evolve. To address this, we propose DeepVision, an algorithm
leveraging insights from medicine, biology, brain engineering, and machine learning to
detect Deepfakes through analysis of eye blinking patterns.
Human eye blinks are influenced by various factors, including gender, age, activity, and
time of day. By collecting and analyzing data on these factors, DeepVision predicts the
number and frequency of eye blinks expected under specific conditions. This approach
offers a unique solution for integrity verification, as eye blinking patterns are involuntary
and reflect cognitive activities and behavioral factors.
We checked our Project Report for plagiarism using Turnitin. We are thankful to our mentor Er. Rahul Agarwal for guiding us in this. Below is the digital receipt; the similarity score is approximately 10%.
TABLE OF CONTENTS
CANDIDATES’ DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
PLAGIARISM REPORT
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 Background of the Problem
1.2 Literature Survey
1.3 Problem Statement
1.4 Motivation
1.5 Feasibility
1.1 Background
The proliferation of deepfake videos has emerged as a significant challenge for both
individuals and society at large. Deepfakes, created using Generative Adversarial Networks
(GANs), can manipulate or fabricate video and audio content to appear convincingly real.
This technology is increasingly being used for malicious purposes, including spreading
misinformation, creating non-consensual explicit content, and committing fraud. As a
result, the need for effective deepfake detection has become critical to ensure the integrity
of digital media and protect individuals from potential harm.
Human eye blinking patterns are known to vary significantly according to an individual’s
overall physical conditions, cognitive activities, biological factors, and information
processing levels. For instance, gender and age, the time of day, or a person’s emotional
state or degree of alertness can all influence blinking patterns. DeepVision performs
integrity verification by tracking significant changes in these patterns, using a heuristic
approach based on research results from various fields.
The proposed method involves analyzing the period, frequency, and duration of eye blinks
to detect anomalies. The model has demonstrated an overall average accuracy rate of 80%
in detecting deepfakes across various video types, suggesting it can overcome the
limitations of traditional pixel-based verification algorithms. By incorporating
demographic variables and activity types into the analysis, DeepVision enhances its
detection accuracy, acknowledging that blinking patterns fluctuate with age, gender,
activity engagement, and diurnal cycles.
DeepVision aims to provide a reliable tool for identifying deepfakes, particularly in areas
with limited access to advanced technology or expert analysts. By offering an effective
solution for early detection, DeepVision can help mitigate the impact of deepfakes and
contribute to the overall integrity of digital media. This represents a significant
advancement in the fight against digital misinformation and malicious content, improving
the security and trustworthiness of media consumed by the public.
DeepVision offers a robust framework for enhancing the integrity and security of digital
media.
The detection of deepfake videos has become increasingly crucial in today's digital
age, where advancements in Generative Adversarial Networks (GANs)[1][5] have made it
challenging to discern between real and synthetic content. One promising avenue for
deepfake detection lies in analyzing human eye blinking patterns, which are influenced by
a myriad of factors such as gender, age, activity, and time.[2] These factors significantly
impact blinking frequency, with differences observed between adults and infants, as well as
variations throughout the day, particularly peaking around nighttime.
The architecture encompasses a pre-process phase where input data, including gender,
age, activity type (dynamic or static), and time (A.M. or P.M.), is fed into the system as
critical parameters for verifying changes in eye blinks. This pre-processing stage lays the
foundation for subsequent measurements conducted by the Target Detector and Eye
Tracker components.[2]
The Target Detector module within the model utilizes the dlib face detection algorithm
for landmark localization, pose estimation, and gender recognition. This algorithm's
robustness enhances its ability to accurately detect facial features, setting the stage for
precise eye tracking. The Fast-HyperFace algorithm's high detection performance, coupled
with its gender recognition capabilities, contributes significantly to the comprehensive
analysis of eye blinking patterns.[2]
Moreover, the integration of the EAR (Eye Aspect Ratio) algorithm[2] within the Eye
Tracker component further refines its ability to detect eye blinks. While the Fast-
HyperFace algorithm excels in overall facial detection, the EAR algorithm specializes in
eye blinking detection. It synergistically leverages the strengths of both algorithms,
effectively enhancing the detection performance and accuracy of identifying deepfakes.
3. Preserving Authenticity in Digital Content:
Deepfake technology can be misused to create forged content that can harm
individuals, tarnish reputations, and manipulate public perception. Ensuring the
authenticity and integrity of digital content is paramount for upholding ethical
standards and trust in digital media platforms.
1.4. Motivation
● The motivation behind this deepfake detection project stems from the increasing
prevalence and sophistication of deepfake technology, which poses significant
threats to digital media integrity and trust.
● As deepfake techniques become more accessible and advanced, there is a growing
concern about their potential misuse for spreading misinformation, manipulating
public opinion, and causing harm to individuals and organizations. The need for
robust deepfake detection solutions is paramount to mitigate these risks and
preserve the authenticity and credibility of digital content.
● Furthermore, the rapid evolution of deepfake technology underscores the urgency
of developing proactive measures to detect and counteract its malicious use. By
leveraging cutting-edge deep learning algorithms and innovative approaches, this
project aims to empower content creators, consumers, and digital platforms with
effective tools to identify and mitigate the impact of deepfake content.
● Our goal is to provide accessible, affordable, and user-friendly deepfake detection
solutions that can be deployed across various digital platforms and settings. By
enhancing media integrity and trust, we aim to safeguard digital ecosystems and
uphold ethical standards in digital communications.
1.5. Feasibility
As with any successful project, it is crucial to assess whether the project is feasible from
different standpoints. These standpoints can be summarized as follows.
TECHNICAL:
With the availability of powerful high-level programming languages such as Python, along
with comprehensive support for Machine Learning algorithms, the project is technically
feasible.
SOCIAL:
As of now, there is a noticeable gap in the availability of widely adopted solutions
specifically targeting deepfake detection. This project addresses a critical need in the
digital media landscape, making it socially relevant and potentially impactful in combating
misinformation and preserving media integrity.
ECONOMIC:
The project's economic feasibility is high due to the utilization of open-source libraries,
publicly available datasets, and cloud-based computing resources. These factors
significantly reduce development costs, making the project financially viable even for
individuals or organizations with limited resources.
SCOPE:
The scope of this project extends to both professional users, such as media organizations
and content creators, as well as general consumers concerned about the authenticity of
digital content. By offering a reliable and accessible deepfake detection solution, the
project aims to enhance trust in digital media platforms and empower users to make
informed decisions regarding the content they consume and share.
CHAPTER 2
PROPOSED SOLUTION
Fig 2.2 Example of an artificially generated (deepfake) video, as there is only one eye blink in a minute
Additionally, the proposed solution takes into account the context of the video, such as the
subject's age, gender, and activity (e.g., reading, conversing, or performing physical tasks). This
contextual information allows for more accurate comparisons with expected blinking patterns,
further enhancing the detection capabilities.
The eye blink detection approach offers a non-intrusive and computationally efficient method for
deepfake detection, as it does not require extensive training data or resource-intensive deep
learning models. Instead, it relies on well-established computer vision techniques and statistical
analysis of blinking patterns.
• Compute the blink rate (blinks per minute) to assess the naturalness of the blinking
pattern.
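The blink-rate computation above can be sketched as follows. This is a minimal illustration, not the project's exact code: the EAR threshold of 0.2 and the two-consecutive-frame minimum are assumed values chosen for the example.

```python
def count_blinks(ear_series, threshold=0.2, min_consec_frames=2):
    """Count blinks in a per-frame EAR series.

    A blink is registered when the EAR stays below `threshold`
    for at least `min_consec_frames` consecutive frames.
    """
    blinks = 0
    below = 0
    for ear in ear_series:
        if ear < threshold:
            below += 1
        else:
            if below >= min_consec_frames:
                blinks += 1
            below = 0
    if below >= min_consec_frames:  # blink still in progress at end of clip
        blinks += 1
    return blinks

def blink_rate_per_minute(ear_series, fps):
    """Blinks per minute for a clip sampled at `fps` frames/second."""
    duration_s = len(ear_series) / fps
    return count_blinks(ear_series) * 60.0 / duration_s

# 10 s of video at 3 fps containing two clear blinks (EAR dips below 0.2)
ears = [0.3] * 10 + [0.1, 0.1] + [0.3] * 6 + [0.05, 0.05, 0.05] + [0.3] * 9
print(blink_rate_per_minute(ears, fps=3))  # → 12.0 blinks/min
```

The resulting rate can then be compared with the expected rate for the subject's demographic group and activity to judge whether the pattern is natural.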
Implementation Steps:
• Calculate the EAR using the formula, where p1-p6 are specific landmarks.
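The EAR formula referenced above computes the ratio of the eye's vertical openings to its horizontal width: EAR = (‖p2 − p6‖ + ‖p3 − p5‖) / (2‖p1 − p4‖), where p1 and p4 are the eye corners and (p2, p6), (p3, p5) are the upper/lower landmark pairs. A minimal sketch, with illustrative landmark coordinates (not taken from the report):

```python
import numpy as np

def eye_aspect_ratio(landmarks):
    """Compute the Eye Aspect Ratio (EAR) from six eye landmarks.

    landmarks: sequence of six (x, y) points p1..p6, where p1/p4 are
    the horizontal eye corners and (p2, p6), (p3, p5) are the
    upper/lower vertical landmark pairs.
    """
    p1, p2, p3, p4, p5, p6 = (np.asarray(p, dtype=float) for p in landmarks)
    # EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|)
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

# An open eye gives a roughly constant EAR; it drops toward 0 during a blink.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
print(eye_aspect_ratio(open_eye))    # ≈ 0.667
print(eye_aspect_ratio(closed_eye))  # ≈ 0.067
```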
Fig 2.9 EAR graph showing multiple eyeblinks
Use Case:
• Long videos that are at least 15 seconds in length.
Fig 2.12 Code showing when to use eye blink detection and when to use CNNs
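The routing logic in Fig 2.12 can be sketched as a simple length check; the 15-second boundary is the one stated in the report, while the function and label names here are illustrative:

```python
def choose_detector(duration_seconds: float) -> str:
    """Route a video to the appropriate detector by length.

    Clips of 15 s or more carry enough blinks for statistical
    analysis; shorter clips fall back to the frame-level CNN.
    """
    return "eye_blink" if duration_seconds >= 15 else "cnn"

print(choose_detector(60))  # → eye_blink
print(choose_detector(8))   # → cnn
```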
Introduction:
Deepfake detection is a challenging task due to the rapid advancement of generative adversarial
networks (GANs) and other deep learning techniques used to create realistic manipulated
content. Our deep learning-based solution utilizes state-of-the-art convolutional neural networks
(CNNs) to analyze images and video frames for accurate deepfake detection.
One of the key advantages of this approach is its ability to learn and extract complex features
from the data itself, rather than relying on handcrafted features or specific artifacts. By training
on a diverse dataset of authentic and manipulated videos, the CNN model can learn to distinguish
subtle patterns and anomalies that may be imperceptible to human observers.
Furthermore, the deep learning-based solution can leverage transfer learning techniques, where
pre-trained models on large-scale datasets are fine-tuned on the deepfake detection task. This
approach can significantly improve training efficiency and model performance, especially when
dealing with limited data.
The proposed solution is designed to be highly scalable and adaptable, capable of handling
various types of deepfake manipulations, including face swapping, lip-syncing, and synthetic
video generation. As new deepfake techniques emerge, the model can be retrained on updated
datasets, ensuring its continued effectiveness.
Moreover, by analyzing individual video frames and aggregating the results, our solution can
provide frame-level and video-level classifications, allowing for detailed analysis and
localization of manipulated regions within the video.
Methodology:
The proposed deep learning-based solution for deepfake detection leverages the power of
convolutional neural networks (CNNs) to analyze images and video frames. The
methodology consists of the following key steps:
1. Data Collection and Preprocessing: A diverse dataset comprising both authentic and
manipulated videos will be collected. The videos will undergo preprocessing steps,
including resizing, cropping, and pixel value normalization, to standardize the input data
for the model.
2. Model Training: A CNN model will be trained from scratch on the preprocessed
dataset to classify videos as either real or fake. The model architecture will be carefully
designed, incorporating convolutional layers for feature extraction, pooling layers for
downsampling, and dense layers for classification.
3. Prediction: Individual frames from the video will be analyzed using the trained CNN
model to obtain frame-level predictions (real or fake). These frame-level predictions will
then be aggregated using techniques such as majority voting or temporal smoothing to
determine the overall classification of the video as either real or manipulated.
Implementation Steps:
The proposed solution will be implemented following these steps:
1. Data Collection and Preprocessing:
• We gathered a diverse dataset comprising both authentic and manipulated videos.
• The collected videos underwent preprocessing steps, including resizing, cropping,
and normalization of pixel values.
• We split the dataset into training, validation, and testing sets to facilitate model
training and evaluation.
• Data augmentation techniques, such as rotation, flipping, and shifting, were
applied to the training data to increase diversity and improve model generalization.
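The flip and shift augmentations above can be sketched with plain NumPy, as below; arbitrary-angle rotation would typically use a library helper and is omitted here. The shift range of ±10 pixels is an assumed value, not taken from the report.

```python
import numpy as np

def augment(image, rng):
    """Randomly flip and shift an (H, W, C) image to add training diversity."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]          # horizontal flip
    shift = int(rng.integers(-10, 11))
    out = np.roll(out, shift, axis=1)  # horizontal shift (wrap-around)
    return out

rng = np.random.default_rng(0)
img = np.zeros((128, 128, 3), dtype=np.float32)
aug = augment(img, rng)
print(aug.shape)  # (128, 128, 3)
```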
2. Model Architecture:
• We designed a CNN model architecture tailored specifically for image
classification tasks.
• The model incorporates convolutional layers, pooling layers, and dense layers to
enable effective feature extraction and classification.
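The architecture described above can be sketched in Keras as follows. This is a minimal illustration, not the project's exact model: the input size, layer widths, dropout rate, and optimizer are assumed values.

```python
from tensorflow.keras import layers, models

def build_detector(input_shape=(128, 128, 3)):
    """A minimal CNN for binary real/fake frame classification."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),   # feature extraction
        layers.MaxPooling2D(),                     # downsampling
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),     # P(frame is fake)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_detector()
print(model.output_shape)  # (None, 1)
```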
3. Model Training:
• We compiled the CNN model with an appropriate loss function (e.g., binary cross-entropy) and optimization algorithm (e.g., Adam, SGD).
• Training commenced from scratch on the training set, with constant monitoring of
performance using the validation set to prevent overfitting.
• Techniques such as early stopping and model checkpointing were implemented to
ensure the preservation of the best-performing model during training.
4. Evaluation:
• We evaluated the trained model's performance on the previously unseen test set,
which was not utilized during training.
• Various performance metrics, including accuracy, precision, recall, and F1-score,
were computed to gauge the model's effectiveness in detecting deepfakes.
• Misclassified examples and potential failure cases were meticulously analyzed to
pinpoint areas for improvement.
Fig 2.14 Showing various performance metrics
5. Prediction:
• Upon receiving new video inputs, we subjected the video frames to the same
preprocessing steps as the training data.
• Each video frame underwent prediction by passing through the trained model to
obtain frame-level predictions (real or fake).
• Techniques such as majority voting or temporal smoothing were employed to
aggregate frame-level predictions, thus determining the overall video
classification.
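The aggregation step above can be sketched as follows: temporal smoothing via a moving average over the per-frame fake probabilities, then a majority vote over the smoothed decisions. The window size and threshold are assumed values for illustration.

```python
import numpy as np

def aggregate_predictions(frame_probs, window=5, threshold=0.5):
    """Aggregate per-frame fake probabilities into a video-level verdict."""
    probs = np.asarray(frame_probs, dtype=float)
    kernel = np.ones(window) / window
    smoothed = np.convolve(probs, kernel, mode="same")  # temporal smoothing
    votes = smoothed > threshold                        # per-frame decision
    return "fake" if votes.mean() > 0.5 else "real"     # majority vote

# Mostly high fake probabilities with a few noisy dips → video is fake
print(aggregate_predictions([0.9, 0.8, 0.2, 0.9, 0.85, 0.9, 0.3, 0.95]))
```

Smoothing before voting suppresses isolated noisy frames, so a single misclassified frame does not flip the verdict for the whole video.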
CHAPTER 3
TECHNOLOGY ANALYSIS
Keras is a high-level neural networks API, written in Python and capable of running on
top of TensorFlow. Keras allows for easy and fast prototyping, supports both
convolutional networks and recurrent networks, and runs seamlessly on both CPU and
GPU.
In this Deepfake detection project, TensorFlow (with Keras) is used for designing and
training deep learning models. Its extensive support for neural network operations and
easy-to-use syntax makes it an ideal choice for implementing complex architectures
required for detecting deepfakes.
3.2 Model Architecture
In this project, CNNs are used to detect deepfakes by learning and identifying patterns
and features that distinguish real images or videos from manipulated ones. Their
capability to automatically learn spatial hierarchies from the input data is critical for
accurately identifying subtle artifacts introduced during the creation of deepfakes.
3.3.1 OpenCV
OpenCV (Open Source Computer Vision Library) is an open-source computer vision and
machine learning software library. It contains a comprehensive set of tools for image
processing and computer vision tasks.
In this project, OpenCV is utilized for pre-processing the video frames, such as resizing,
color space conversion, and frame extraction, which are essential steps before feeding the
data into the neural network for deepfake detection.
3.3.2 dlib
dlib is a modern C++ toolkit containing machine learning algorithms and tools for
creating complex software. It includes a wide range of functionality for face detection,
object detection, and more.
Figure 3.3 : dlib
dlib is used in this project for face detection and facial landmark extraction, which are
critical preprocessing steps for ensuring that the regions of interest (faces) are accurately
analyzed for deepfake detection.
3.3.3 NumPy
NumPy is a fundamental package for scientific computing with Python. It provides
support for arrays, matrices, and many mathematical functions.
NumPy is used for handling numerical data and performing operations on arrays, which
is essential for manipulating the image data and performing mathematical operations
required during the data preprocessing and model evaluation phases.
3.3.4 moviepy
moviepy is a Python library for video editing, which allows for basic operations on
videos, such as cutting, concatenating, and processing.
3.3.5 matplotlib
matplotlib is a plotting library for the Python programming language and its numerical
mathematics extension NumPy. It is used for creating static, animated, and interactive
visualizations.
matplotlib is employed in this project for visualizing the results of the deep learning
models, such as plotting training and validation accuracies, losses, and displaying
example frames of detected deepfakes.
3.4.1 React
React is a JavaScript library for building user interfaces, particularly single-page
applications where data changes over time. Developed and maintained by Facebook, it
enables developers to create large web applications that can update and render efficiently
in response to data changes.
React is used in this project to build the frontend interface, providing a responsive and
interactive user experience for users to upload videos, receive analysis results, and
visualize the detection process.
3.4.2 HTML
HTML (Hypertext Markup Language) is the standard markup language used for creating
web pages. It provides the structure for web content.
HTML is utilized in conjunction with React to define the structure of the web pages,
ensuring that the content is semantically organized and accessible.
3.4.3 Tailwind CSS
Tailwind CSS is a utility-first CSS framework that provides a low-level, utility-based
approach to styling web pages. It allows developers to build custom designs without
leaving their HTML.
Tailwind CSS is used in this project for styling the frontend interface, providing a clean,
modern, and responsive design with minimal effort.
3.5.1 Python
Python is a high-level, interpreted programming language known for its simplicity and
readability. It is widely used in web development, data science, artificial intelligence, and
more.
Python is the primary language used for implementing the backend of this project,
leveraging its extensive libraries and frameworks to handle data processing, model
training, and server-side logic.
3.5.2 Django
Django is a high-level Python web framework that encourages rapid development and
clean, pragmatic design. It provides a robust and scalable foundation for building web
applications.
Django is used in this project to develop the backend server, manage API endpoints,
handle user requests, and interface with the machine learning models for processing and
returning the deepfake detection results.
3.6 Database
Figure 3.12 : Google Firestore
In this project, Google Firestore is used to store and manage data such as uploaded
images, uploaded videos, and analysis results. Its real-time capabilities ensure that users
receive immediate feedback and updates on the status of their deepfake detection
requests.
CHAPTER 4
ECONOMIC ANALYSIS
Our Deepfake detection project has been developed with a focus on economic efficiency
and accessibility. By leveraging a suite of free and secure technologies, we have ensured
that the project incurs minimal costs while maintaining high standards of functionality
and performance. This strategic choice allows us to deliver a robust solution without
burdening users with additional financial expenses.
4. Database Management:
Our project uses Google Firestore for data storage and management, which also offers a
free tier with sufficient capacity for our application's requirements. Firestore's real-time
data synchronization capabilities enhance user experience without adding costs, as the
free tier covers the basic needs of our project.
6. Long-term Sustainability:
The sustainability of our project is further enhanced by the active communities
supporting the open-source technologies we use. Continuous improvements and updates
from these communities ensure that our project remains secure, efficient, and cutting-edge, without the need for significant financial investment in proprietary software or
tools.
CHAPTER 5
Stable internet connectivity is crucial for uploading and downloading videos seamlessly. A
minimum internet speed of 5 Mbps is required, but a high-speed connection of at least 20 Mbps
is recommended for optimal performance.
The website is compatible with various operating systems, including Windows 10 or later,
macOS, and Linux distributions with kernel version 4.15 or later. To maintain computational
efficiency, the website uses pre-trained weights to make predictions, which helps reduce
processing time. The code's get function is optimized to sequentially take frames until the video
is complete, minimizing lag and enhancing the user experience. Users are advised to use
devices that meet or exceed the recommended specifications to ensure smooth operation,
especially when processing longer videos.
With these basic hardware requirements, users can effectively utilize the frontend website for
Deepfake detection. Meeting or exceeding the recommended specifications will result in
improved performance and a smoother user experience.
Fig 5.1 Choosing a file
Fig 5.2 Showing result
Result:
CNN Model:
• Precision: The ratio of true positive predictions to the total number of positive
predictions made (i.e., true positives divided by the sum of true and false positives).
• Recall: The ratio of true positive predictions to the total number of actual positive
instances (i.e., true positives divided by the sum of true positives and false negatives).
• F1 Score: The harmonic mean of precision and recall, providing a single metric that
balances both concerns.
• Support: The number of actual occurrences of each class in the dataset (i.e., the true
positive cases for each class).
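The metric definitions above can be computed from first principles, as in this sketch; the example labels (1 = fake, 0 = real) mirror the reported six-of-eight detection result and are illustrative:

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1, and support for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    support = sum(t == positive for t in y_true)
    return precision, recall, f1, support

# Eight fake test videos, six detected correctly
y_true = [1, 1, 1, 1, 1, 1, 1, 1]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0]
print(classification_metrics(y_true, y_pred))  # (1.0, 0.75, ~0.857, 8)
```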
The proposed algorithm consistently showed a significant possibility of verifying the integrity
of Deepfakes and normal videos, accurately detecting Deepfakes in six out of eight videos
(75%) for GAN-developed datasets (FaceForensics++, OpenForensics).
CHAPTER 6
CONCLUSION
In this study, we proposed and developed a method to analyze significant changes in eye
blinking, which is a spontaneous and unconscious human function, as an approach to detect
Deepfakes generated using the GANs model. Blinking patterns vary according to an
individual’s gender, age, and cognitive behavior, and fluctuate based on the time of day. Thus,
the proposed algorithm observed these changes using machine learning, several algorithms, and
a heuristic method to verify the integrity of Deepfakes.
To enhance the effectiveness of our method, we implemented different approaches based on the
length of the videos. For videos greater than 15 seconds, we utilized a comprehensive
algorithm that incorporated results from various previous studies. For videos less than 15
seconds, we employed a CNN-based weighted model to ensure accurate detection.
Additionally, we integrated the detection of both images and videos in our algorithm.
The proposed algorithm consistently showed a significant possibility of verifying the integrity
of Deepfakes and normal videos, accurately detecting Deepfakes in six out of eight videos
(75%). However, a limitation of the study is that blinking is also correlated with
mental illness and dopamine activity. The integrity verification may not be applicable to people
with mental illnesses or problems in nerve conduction pathways.
Despite this limitation, our method can be improved through a number of measures as
cybersecurity attacks and defenses evolve continuously. The proposed algorithm suggests a new
direction that can overcome the limitations of integrity verification algorithms performed only
on the basis of pixels.
REFERENCES
[1] A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies and M. Nießner, "FaceForensics++: Learning to Detect Manipulated Facial Images", Jan. 2019. [Online]. Available: https://arxiv.org/abs/1901.08971
[2] T. Jung, S. Kim and K. Kim, "DeepVision: Deepfakes Detection Using Human Eye Blinking Pattern".
[5] I. Goodfellow et al., "Generative adversarial nets", Proc. Adv. Neural Inf. Process. Syst., pp. 2672-2680, 2014.
[6] Y. Li, M. C. Chang and S. Lyu, "In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking", Jun. 2018. [Online]. Available: https://arxiv.org/abs/1806.02877
[7] https://github.com/takhyun12/Dataset-of-Deepfakes