
Cse400 p2 Group8 (PDF) - 1

This thesis investigates the effectiveness of advanced convolutional neural network (CNN) architectures, specifically ResNet50 and Xception, in real-time detection of AI-generated images, known as deepfakes. The research highlights that ResNet50, when combined with enhanced preprocessing techniques, outperforms Xception in both accuracy and real-time application, addressing the urgent need for reliable detection methods to combat misinformation. The study aims to contribute to the ongoing efforts to maintain the integrity of visual content in the digital age by providing robust solutions for deepfake detection.

Patch-Based Deepfake Localization: Unveiling Manipulated Regions in Images through Visual Artifact Analysis and Deep Learning

by

Syed Hasnad Jami


20301236
[Link] Ul Alam
20301228
Alpona Azrin
21101135
S.M. Atahar Istiaque
20301237

A thesis submitted to the Department of Computer Science and Engineering


in partial fulfillment of the requirements for the degree of
[Link]. in Computer Science

Department of Computer Science and Engineering


Brac University
2024

© 2024. Brac University


All rights reserved.
Declaration
It is hereby declared that
1. The thesis submitted is my/our own original work while completing degree at
Brac University.
2. The thesis does not contain material previously published or written by a third
party, except where this is appropriately cited through full and accurate
referencing.
3. The thesis does not contain material which has been accepted, or submitted,
for any other degree or diploma at a university or other institution.
4. We have acknowledged all main sources of help.

Student’s Full Name & Signature

Syed Hasnad Jami [Link] Ul Alam


20301236 20301228

Alpona Azrin S.M. Atahar Istiaque


21101135 20301237
Approval
The thesis/project titled “Patch-Based Deepfake Localization: Unveiling Manipulated
Regions in Images through Visual Artifact Analysis and Deep Learning” submitted by

1. Syed Hasnad Jami(20301236)

2. [Link] Ul Alam(20301228)

3. Alpona Azrin(21101135)

4. S.M. Atahar Istiaque(20301237)

of Summer, 2024 has been accepted as satisfactory in partial fulfillment of the
requirement for the degree of [Link]. in Computer Science on 2024.

Examining Committee:

Supervisor:
(Member)
……………………………………………………………
Dr. M Iqbal Hossain
Associate Professor
Department of Computer Science and Engineering
Brac University

Co Supervisor:
(Member)
……………………………………………………………
Labib Hasan Khan
Lecturer
Department of Computer Science and Engineering
Brac University
Abstract
The rapid evolution of deepfake technology, which generates highly realistic fake images and
videos using artificial intelligence, has intensified concerns about the authenticity of visual
content. Addressing the critical challenge of real-time detection of AI-generated images is
essential in curbing the spread of misinformation. This study focuses on developing a robust
model for real-time detection of deepfake images, comparing the performance of advanced
convolutional neural network (CNN) architectures, including ResNet50, Xception, and a basic
CNN. Although Xception is reputed for its computational efficiency due to its architecture
leveraging depthwise separable convolutions, our findings demonstrate that ResNet50, when
combined with enhanced data preprocessing techniques, outperforms Xception in both
accuracy and real-time application. The superior performance of ResNet50 underscores the
significance of preprocessing in optimizing model accuracy. Our research provides a
comprehensive analysis of these CNN architectures, establishing that ResNet50, with
optimized preprocessing, is more effective for real-time deepfake detection. This study not only
contributes to the ongoing efforts to ensure the integrity of visual content but also offers a
potent solution to the growing digital threat posed by manipulated media.

Keywords: CNN, Xception, ResNet50, Convolution, Preprocessing, Augmentation,
Synthetic image, AI-generated image, Deepfake detection.

Table of Contents

Declaration
    Student’s Full Name & Signature
Abstract
    Keywords
Chapter 1
    1.1 Introduction
        1.1.1 Research Problem
        1.1.2 Research Objectives
Chapter 2
    2.1 Literature Review
        2.1.1 ResNet50 vs. Xception in terms of Inference time
    2.2 Related Works
Chapter 3
    3.1 Methodology
        3.1.1 Data Collection and Input
        3.1.2 Data Pre-Processing
        3.1.3 Comparison of Accuracy before and after preprocessing data
Chapter 4
    4.1 Implementation of Model and Result
        4.1.1. Model Architecture
        4.1.2 Result
Conclusion
Reference List
Chapter 1

1.1 Introduction

In the digital era, as AI technology becomes more integrated into our lives, enhancing
convenience and efficiency, the misuse of AI-generated images, known as deepfakes, poses a
significant threat to global information integrity and security. Real-time detection systems are
crucial in mitigating the spread of misinformation, protecting reputations, and maintaining
public trust in visual content. A notable example is the 2022 deepfake video of Ukrainian
President Volodymyr Zelenskyy, in which he appeared to urge Ukrainian soldiers to surrender,
a false statement that caused considerable confusion [6]. This incident underscores the
profound impact deepfakes can have, from manipulating public opinion to damaging individual
reputations. The ability to detect these manipulations in real-time is essential to counter these
threats and maintain the credibility of visual content.

This research investigates the effectiveness of various convolutional neural network (CNN)
architectures in detecting AI-generated images, focusing on real-time detection capabilities.
We evaluate the performance of ResNet50, Xception, and a basic CNN model. While Xception
is known for its computational efficiency due to its depthwise separable convolutions [2], our
findings demonstrate that the ResNet50 model, when enhanced with advanced preprocessing
techniques, outperforms Xception in accuracy and suitability for real-time applications. This
highlights the critical role of preprocessing in optimizing model performance, potentially
surpassing Xception's computational advantages.

The urgency of real-time deepfake detection cannot be overstated. As deepfakes become more
sophisticated and accessible, their misuse potential grows. Immediate detection is necessary to
prevent the rapid spread of fake content across platforms, which can occur within minutes of
posting. Real-time detection systems enable timely intervention, maintaining the integrity of
information and protecting individuals and organizations from deepfake content's harmful
effects.

Significant progress has been made in AI-generated image detection. For instance, [1]
introduced MesoNet, a compact CNN designed for deepfake detection, while [5] provided a
comprehensive evaluation of various detection methods using the FaceForensics++ dataset.
More recent works by [3] and [4] have further advanced the field by exploring more efficient
and robust detection techniques. Despite these advancements, achieving real-time detection
with high accuracy remains challenging. Our research aims to address this gap by leveraging
the computational advantages of the ResNet50 model, offering a solution that combines
accuracy with the speed necessary for real-time applications.

In conclusion, the growing sophistication of AI-generated images necessitates advanced and
efficient detection techniques. By comparing the performance of ResNet50, Xception, and a
basic CNN architecture, our study contributes to the advancement of reliable real-time
deepfake detection methods. This research not only enhances detection accuracy but also
ensures the efficiency required to keep pace with the rapidly evolving digital landscape,
safeguarding the authenticity of visual content.

1.1.1 Research Problem

The increasing sophistication of AI-generated images, known as deepfakes, presents a
significant challenge to information integrity and security. Recent reports, such as those by
[23], highlight the misuse of AI-generated content in copyright infringement and malicious
activities, exacerbating the urgency for robust detection methods. With the rapid advancements
in AI technologies like DALL-E, which can create highly realistic images almost
instantaneously, there is a critical need for equally advanced detection counterparts [17].
Despite advancements in detection technologies, current research often overlooks real-time
application, which is crucial for immediate mitigation of deepfake threats. This research aims
to investigate these gaps and propose solutions to enhance the accuracy and efficiency of real-
time deepfake detection systems.

Current deepfake detection models often fail to achieve the necessary accuracy for reliable
real-time application. For instance, [1] introduced MesoNet, a compact CNN designed for
deepfake detection. However, its performance is hindered by limited accuracy and poor
generalization across diverse datasets. Similarly, [5] evaluated several detection methods using
the FaceForensics++ dataset, highlighting that many models struggle with high false positive
and false negative rates. These shortcomings are further corroborated by [3], who noted that
existing models do not adequately address the rapid evolution of deepfake technology, resulting
in significant gaps in detection accuracy.

While AI image generators capable of creating highly realistic images in real-time are widely
available, there is a significant gap in the development of corresponding real-time detection
technologies. According to [4], the proliferation of tools that can generate deepfakes almost
instantaneously has outpaced the development of detection methods, leading to an increased
risk of misinformation and malicious use. This disparity is further emphasized by the work of
[12], who found that current detection systems struggle to keep up with the speed and
sophistication of deepfake generation, underscoring the urgent need for real-time detection
solutions that can provide immediate and reliable responses.

The lack of comprehensive and up-to-date datasets for training and evaluating deepfake
detection models is a major obstacle. As AI-generated image quality continues to improve,
existing datasets quickly become outdated, failing to represent the latest advancements in
deepfake creation techniques. Research by [13] and [14] highlights this issue, noting that the
dynamic nature of AI image generation demands continuous updates to datasets to ensure that
detection technologies remain effective. This gap hampers researchers' ability to develop robust
detection systems that can adapt to new and emerging threats, as evidenced by the findings of
[15].

Existing research predominantly focuses on distinguishing real images from synthetic ones—
a binary classification problem. However, AI-generated images encompass a broader range,
including images created by combining different faces (synthetic) and purely AI-generated
images. Addressing this broader spectrum requires a multiclass classification approach, with
classes for real, synthetic, and AI-generated images. The study in [18] highlighted this
necessity, noting that while this approach adds complexity, it is crucial for
achieving accurate detection.

Given these challenges, the central question this research seeks to answer is:

How effective are advanced convolutional neural network (CNN) architectures, specifically
ResNet50, when combined with sophisticated preprocessing techniques, in achieving high
accuracy and real-time detection capabilities for multiclass detection of AI-generated
images?

This research aims to answer the question by exploring the effectiveness of advanced CNN
architectures, specifically ResNet50 and Xception, enhanced with sophisticated preprocessing
techniques. By employing better preprocessing techniques, such as advanced data
augmentation and noise reduction, ResNet50 has shown superior performance over other
models, resulting in reduced computation time. Additionally, by training the models with a
multiclass classification approach—distinguishing real, synthetic, and AI-generated images—
our models demonstrate advanced capabilities and improved accuracy in real-time detection
scenarios.

This study seeks to contribute to the development of more reliable and efficient real-time
deepfake detection methods by addressing the limitations of current models and the
inadequacies in existing datasets. Through this research, we aim to enhance the accuracy and
efficiency of deepfake detection systems, providing robust solutions to the growing threat of
AI-generated images.

1.1.2 Research Objectives


The primary objective of this research is to develop and evaluate advanced convolutional neural
network (CNN) architectures, specifically ResNet50 and Xception, enhanced with
sophisticated preprocessing techniques, to achieve high accuracy and real-time detection
capabilities for multiclass detection of AI-generated images. This involves:

1. Implementing advanced preprocessing techniques, such as data augmentation, to
optimize model performance.

2. Training and testing CNN models on a multiclass classification task that distinguishes
between real, synthetic, and AI-generated images.

3. Comparing the performance of ResNet50 and Xception in terms of accuracy,
computational efficiency, and real-time applicability.

4. Providing a robust solution to the challenges of real-time deepfake detection,
contributing to the ongoing efforts to maintain the integrity and authenticity of visual
content in the digital age.

Chapter 2

2.1 Literature Review

Deepfake technology has revolutionized image and video synthesis, employing advanced
artificial intelligence (AI) techniques to create highly realistic images and videos that are often
indistinguishable from real ones. AI-generated images, particularly those produced through
Generative Adversarial Networks (GANs) like StyleGAN, represent a significant leap in the
ability to produce synthetic content. These images are generated by training neural networks
to mimic the characteristics of real images, leading to results that are visually convincing and
difficult to differentiate from authentic media [16]. Additionally, tools like DALL-E, which are
trained on very large collections of image-text pairs, can produce highly accurate and detailed
content, further complicating the detection and verification of such AI-generated images.

The need for a robust counterpart to detect such sophisticated AI-generated images is critical.
As these technologies become more advanced, they pose a significant threat to information
integrity and security. Effective detection mechanisms are essential to counter the potential
misuse of these technologies in creating misleading or harmful content.

Synthetic images, broadly defined, include any images generated through computer algorithms,
encompassing deepfakes. These images are utilized in various applications such as computer
graphics, virtual reality, and video games. Unlike traditional synthetic images, which are
created through explicit programming and design, deepfakes use machine learning algorithms
to automatically generate content that closely mimics real-world imagery. This automatic
generation presents unique challenges in detection and authenticity verification.

People often confuse synthetic images with real ones due to the high quality and realism of AI-
generated content. This confusion underscores the importance of developing sophisticated
detection models that can accurately classify images as real, AI-generated, or traditionally
synthetic. In our research, we employ multiclass classification using SoftMax and sigmoid
functions in ResNet50 to address this challenge. This approach allows us to effectively
categorize images into multiple classes, enhancing the robustness of our detection system.

2.1.1 ResNet50 vs. Xception in terms of Inference time


Our research evaluates the performance of different convolutional neural network (CNN)
architectures in the real-time detection of AI-generated images. ResNet50 and Xception are
two prominent models under investigation. Xception, known for its depthwise separable
convolutions, is generally expected to offer superior computational efficiency, making it
theoretically ideal for real-time applications [2]. This architecture reduces the number of
parameters and computational load, enhancing inference time.
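
The parameter saving of a depthwise separable convolution can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: the function names and the layer size (a 3x3 kernel with 256 input and 256 output channels) are chosen purely for illustration, and bias terms are ignored.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one kernel x kernel filter per (input, output) channel pair."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution:
    a depthwise kernel x kernel filter per input channel,
    followed by a 1x1 pointwise convolution that mixes channels."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Illustrative layer size (an assumption, not a layer taken from either network).
standard = conv_params(3, 256, 256)
separable = separable_conv_params(3, 256, 256)
print(standard, separable, round(standard / separable, 2))
```

For this layer the separable form needs 67,840 weights instead of 589,824. A lower parameter count does not by itself guarantee a lower measured inference time, which also depends on the implementation and the hardware.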

However, our results indicate that ResNet50, when enhanced with advanced preprocessing
techniques and transfer learning with multiclass classification, outperforms Xception in terms
of inference time, while both models achieve high accuracy. ResNet50, with
its residual learning framework, effectively mitigates the vanishing gradient problem, allowing
for deeper network training and better feature extraction [9]. The transfer learning approach
leverages weights pre-trained on large datasets and fine-tunes the model for specific tasks such
as deepfake detection, thereby improving its efficiency and performance in real-time scenarios.

2.2 Related Works


Recent research has made significant strides in deepfake detection, with several notable studies
contributing to the advancement of this technology.

Zhao et al. (2023) explored the use of transformer-based models for deepfake detection,
demonstrating that transformers can achieve high accuracy by capturing long-range
dependencies and contextual information in images. Their approach leverages self-attention
mechanisms to effectively differentiate between real and AI-generated content. However,
although the approach achieved high accuracy, it could not perform inference in real time.

Li et al. (2023) introduced a novel framework combining CNNs with capsule networks to
improve the robustness of deepfake detection models. Their method focuses on preserving
spatial hierarchies in images, enhancing the model's ability to detect subtle manipulations in
AI-generated content.

Wang et al. (2023) proposed an ensemble learning approach that integrates multiple detection
models to improve overall accuracy and reliability. By combining the strengths of different
architectures, their ensemble model outperforms individual models, providing a more
comprehensive solution for deepfake detection.

Chen et al. (2023) examined the impact of adversarial training on the robustness of deepfake
detection models. Their study shows that incorporating adversarial examples during training
helps models better generalize to new, unseen deepfakes, thus enhancing their detection
capabilities.

Zhang et al. (2023) focused on developing lightweight deepfake detection models suitable for
deployment on edge devices. Their research highlights the importance of balancing accuracy
and computational efficiency, ensuring that detection systems can operate effectively in real-
time with limited hardware resources.

These studies collectively underscore the rapid advancements in deepfake detection technology
and the ongoing efforts to address the challenges posed by increasingly sophisticated AI-
generated images. Our research builds on these findings, contributing to the development of
robust, real-time detection systems that leverage the strengths of ResNet50 and advanced
preprocessing techniques. By addressing the inadequacies of current models, the lack of real-
time detection technologies, and the absence of comprehensive datasets, our study aims to
enhance the accuracy and efficiency of deepfake detection systems, ultimately safeguarding
the authenticity of visual content.

Chapter 3

3.1 Methodology
The methodology for this study employs a structured approach to detect real, synthetic, and
AI-generated images using advanced transfer learning techniques combined with sophisticated
preprocessing methods and the ResNet50 architecture. The dataset is divided into 60 percent
training data, 20 percent validation data, and 20 percent test data. As illustrated in Figure 1,
the process encompasses several stages, from data input to the final classification decision.
This methodology provides a robust framework for training and evaluating the model, ensuring
accuracy and reliability in the results.

Fig 1 Model Implementation and Data Training with Decision
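
The 60/20/20 partition described above can be sketched as follows. This is an illustrative example only: the helper name `split_dataset`, the fixed seed, and the placeholder file names are assumptions, and in practice the split would be applied to each class separately so that the subsets stay balanced.

```python
import random


def split_dataset(items, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle the items and split them into train, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder (about 20%)
    return train, val, test


# Placeholder file names standing in for the images of one class.
paths = [f"img_{i:03d}.jpg" for i in range(100)]
train, val, test = split_dataset(paths)
print(len(train), len(val), len(test))
```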

3.1.1 Data Collection and Input
The process begins with collecting a diverse dataset that includes real, synthetic, and AI-
generated images. This dataset is categorized into three groups: Real Images, Fake Images, and
Synthetic Images. The fake (AI-generated) images are manually created using DALL-E 3, the real
images are sourced from the “Real vs Fake Faces” dataset [28], and the synthetic images
come from the “1 million fake faces” dataset [27]. Combining real images with manually
generated AI images is key to enabling our model to detect convincing AI-generated images;
all images are pre-processed before being used to train the model.

3.1.2 Data Pre-Processing


Preprocessing ensures that the input data is in the optimal form for training the ResNet50-based
SAAT model. By augmenting the dataset and standardizing the inputs, the model can learn
more effectively, leading to improved detection of real, synthetic, and AI-generated images.
This rigorous approach to data preprocessing is fundamental to achieving high performance
and reliability in real-time deepfake detection.

Fig 2 Data Pre-Processing

Resize Image to (224, 224): All input images are resized to 224x224 pixels, the default input
size required by ResNet50. This uniform size ensures consistency across the dataset,
facilitating efficient processing by the convolutional layers. After that, pixel values are rescaled
from the range [0, 255] to [0, 1]. This normalization step improves the numerical stability of
our model during training by reducing the variance in pixel values.
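
The resize and rescale step can be sketched with NumPy. This is a minimal sketch under assumptions: the helper names are hypothetical, a random array stands in for a loaded image, and nearest-neighbour sampling is used only to keep the example dependency-free (an image library with proper interpolation would normally be used).

```python
import numpy as np


def resize_nearest(image, size=(224, 224)):
    """Nearest-neighbour resize of an H x W x C array to size = (height, width)."""
    h, w = image.shape[:2]
    rows = np.arange(size[0]) * h // size[0]  # source row for each output row
    cols = np.arange(size[1]) * w // size[1]  # source column for each output column
    return image[rows[:, None], cols]


def rescale(image):
    """Map pixel values from [0, 255] to [0, 1]."""
    return image.astype(np.float32) / 255.0


# Random stand-in for a 300 x 400 RGB image.
raw = np.random.default_rng(0).integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
x = rescale(resize_nearest(raw))
print(x.shape, x.dtype)
```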

Convert Image from RGB to BGR: The ImageNet pre-trained ResNet50 weights distributed with
Keras expect inputs in BGR channel order, the convention of the original Caffe models.
Converting images from RGB to BGR aligns with the pre-trained
model’s expectations, ensuring compatibility and enhancing feature extraction.
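
Swapping the channel order amounts to reversing the last axis of the image array. The sketch below assumes a NumPy array in height x width x channel layout; the helper name `rgb_to_bgr` is hypothetical.

```python
import numpy as np


def rgb_to_bgr(image):
    """Reverse the channel axis so that (R, G, B) becomes (B, G, R)."""
    return image[..., ::-1]


# A single orange pixel: R=255, G=128, B=0.
pixel = np.array([[[255, 128, 0]]], dtype=np.uint8)
bgr = rgb_to_bgr(pixel)
print(bgr[0, 0].tolist())
```

Keras's `preprocess_input` function for ResNet50 performs the same swap, together with per-channel mean subtraction.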

Width, Height, and Shear Shift (Max 20% Shift): Geometric transformations, including
translations and shear shifts, are applied to the images with a maximum shift of 20%. These
transformations increase the diversity of the dataset by creating variations that mimic natural
distortions and movements in real-world scenarios.
Canvas Filling

Fill the Empty Part of the Canvas by Sampling the Nearest Pixel: When geometric
transformations result in empty regions in the image, these regions are filled by sampling the
nearest pixel values. This maintains the integrity of the image and prevents artifacts that could
confuse the model during training.
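
A translation with nearest-pixel filling can be sketched as follows. This is an illustrative NumPy version, not the Keras implementation; the helper name `shift_nearest` and the tiny test image are assumptions.

```python
import numpy as np


def shift_nearest(image, dy, dx):
    """Translate an H x W x C image by (dy, dx) pixels.
    Regions uncovered by the shift are filled by repeating the nearest edge pixel."""
    h, w = image.shape[:2]
    pad = max(abs(dy), abs(dx))
    # mode="edge" repeats the border pixels outward.
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    top, left = pad - dy, pad - dx
    return padded[top:top + h, left:left + w]


# 3 x 4 single-channel image with values 0..11, shifted one pixel to the right.
img = np.arange(12).reshape(3, 4, 1)
shifted = shift_nearest(img, dy=0, dx=1)
print(shifted[:, :, 0])
```

In the first row, the values `[0, 1, 2, 3]` become `[0, 0, 1, 2]`: the column exposed on the left is filled with a copy of its nearest neighbour.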

Random Horizontal Flip: Images are randomly flipped horizontally. This augmentation
technique simulates real-world variations and helps the model become invariant to horizontal
orientation changes, thereby improving generalization.

Zoom Between -20% and +20%: Random zooming within the range of -20% to +20% is applied
to the images. This augmentation helps the model learn to recognize objects at different scales,
enhancing its robustness to variations in object size and distance.
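
The augmentation steps listed so far map onto arguments of the Keras `ImageDataGenerator` class. The dictionary below is a hedged sketch of such a configuration, not the exact settings used in this work. In particular, Keras interprets `shear_range` as a shear angle in degrees, so the value shown for the 20% shear is an assumption.

```python
# Keyword arguments for keras.preprocessing.image.ImageDataGenerator.
augmentation_kwargs = {
    "rescale": 1.0 / 255,        # map pixel values to [0, 1]
    "width_shift_range": 0.2,    # horizontal shift, up to 20% of the width
    "height_shift_range": 0.2,   # vertical shift, up to 20% of the height
    "shear_range": 0.2,          # shear intensity (assumed value, see note above)
    "zoom_range": 0.2,           # zoom factor sampled from [0.8, 1.2]
    "horizontal_flip": True,     # random left-right flip
    "fill_mode": "nearest",      # fill empty canvas with the nearest pixel
}

# With TensorFlow installed, the generator would be created as:
# from tensorflow.keras.preprocessing.image import ImageDataGenerator
# train_datagen = ImageDataGenerator(**augmentation_kwargs)
print(sorted(augmentation_kwargs))
```

Augmentation is normally applied to the training generator only; validation and test generators would keep just the `rescale` argument.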

Convert Labels to One-Hot Encoding (Categorical): Labels are converted to one-hot encoded
vectors, which represent the class labels (real, synthetic, AI-generated) in a format suitable for
categorical cross-entropy loss. This encoding allows the model to output probabilities for each
class, facilitating multi-class classification.
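
One-hot encoding of the three class labels can be sketched in a few lines. The class-name strings and the helper name `one_hot` are assumptions made for illustration; the class order follows the one used in this thesis (real, synthetic, AI-generated).

```python
CLASS_NAMES = ["real", "synthetic", "ai_generated"]


def one_hot(label, class_names=CLASS_NAMES):
    """Return a list with a 1 at the label's index and 0 elsewhere."""
    vector = [0] * len(class_names)
    vector[class_names.index(label)] = 1
    return vector


encoded = [one_hot(name) for name in CLASS_NAMES]
print(encoded)
```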

3.1.3 Comparison of Accuracy before and after preprocessing data

Fig 3 Comparison of Accuracy before and after preprocessing data

Fig 4 Images before and after preprocessing

Chapter 4

4.1 Implementation of Model and Result

This section outlines the implementation of our model for real-time detection of
AI-generated images. By evaluating convolutional neural network (CNN) architectures, namely
ResNet50, Xception, and a basic CNN model, we identify the most suitable
architecture for deepfake detection. The steps include data preparation, model architecture
design, training, and evaluation, with an emphasis on techniques that improve accuracy and
efficiency.

The dataset is partitioned into training (60%), validation (20%), and testing (20%)
subsets, encompassing three distinct classes: real, synthetic, and AI-generated images. Each
subset is balanced and randomized to ensure unbiased training and evaluation. To facilitate
data loading and preprocessing, images are organized in a directory structure and loaded with the
ImageDataGenerator class from Keras. Images are normalized by rescaling pixel values to a
range of 0 to 1, achieved by multiplying each pixel value by 1/255. This normalization
improves the stability of the learning process. Augmentation techniques, including random
rotations, shifts, and flips, are applied to the training data to improve the model’s generalization
capabilities.

4.1.1. Model Architecture


Architecture: The ResNet50 architecture, pre-trained on the ImageNet dataset, serves as the
foundation for feature extraction due to its exceptional performance in image classification
tasks. ResNet50’s deep residual learning framework effectively mitigates the vanishing
gradient problem, which is crucial for training deep networks. In our model, the pre-trained
ResNet50 is extended by incorporating a Global Average Pooling layer, which reduces each
feature map to a single value by averaging, thus simplifying the model and reducing
overfitting. This is followed by a dense layer with 256 neurons and the ReLU (Rectified
Linear Unit) activation function, which introduces non-linearity and is defined as:

$$\mathrm{ReLU}(x) = \max(0, x)$$

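The final dense layer employs the SoftMax activation function to output probabilities across
the three classes. The SoftMax function is defined as:

$$\sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \dots, K$$

where $z_i$ is the output of the final dense layer for class $i$ and $K = 3$ is the number of classes.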
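
The classification head described above (global average pooling, a 256-unit ReLU layer, and a three-way SoftMax layer) can be sketched as a NumPy forward pass. This is a minimal sketch under assumptions: the weights are random rather than trained, and the 7 x 7 x 2048 feature map stands in for the ResNet50 backbone output for a 224 x 224 input.

```python
import numpy as np


def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0.0, x)


def softmax(z):
    """Numerically stable SoftMax over a 1-D vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


rng = np.random.default_rng(0)
features = rng.standard_normal((7, 7, 2048))   # stand-in for the backbone output

# Randomly initialised head weights (placeholders for trained parameters).
w1, b1 = rng.standard_normal((2048, 256)) * 0.01, np.zeros(256)
w2, b2 = rng.standard_normal((256, 3)) * 0.01, np.zeros(3)

pooled = features.mean(axis=(0, 1))            # global average pooling -> 2048 values
hidden = relu(pooled @ w1 + b1)                # dense layer with 256 units and ReLU
probs = softmax(hidden @ w2 + b2)              # probabilities for the three classes
print(probs.shape, float(probs.sum()))
```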

Model Compilation and Training: The model is compiled using the Adam optimizer,
known for its efficiency and adaptive learning rate properties. The categorical cross-
entropy loss function is used, appropriate for multiclass classification, and is defined as:

$$L = -\sum_{i=1}^{K} y_i \log(p_i)$$

where $y_i$ is the binary indicator (0 or 1) of whether class label $i$ is correct for the
observation, $p_i$ is the predicted probability for class $i$, and $K$ is the number of classes.
Performance metrics such as accuracy, AUC (Area
Under the ROC Curve), precision, and recall are tracked during training. These metrics
provide a comprehensive evaluation, ensuring not only high accuracy but also the model’s
ability to effectively distinguish between different classes. The model undergoes training over
multiple epochs, with the validation set used to monitor and adjust the training process,
preventing overfitting.
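
The categorical cross-entropy loss for a single observation can be computed directly from a one-hot label and a vector of predicted probabilities. The sketch below is illustrative; the function name, the clipping constant `eps`, and the example probabilities are assumptions.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Negative sum of y_i * log(p_i) over the classes of one observation.
    Probabilities are clipped at eps to avoid log(0)."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))


# True class is "real" ([1, 0, 0]); the model assigns it probability 0.7.
loss = categorical_cross_entropy([1, 0, 0], [0.7, 0.2, 0.1])
print(round(loss, 4))
```

Because the label is one-hot, only the probability assigned to the correct class contributes, so the loss here equals -log(0.7).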

To predict the class of an image among three possible classes (real, synthetic, and AI-
generated), one-hot encoding is employed. In one-hot encoding, each class is represented as a
binary vector of length equal to the number of classes. For three classes, the vectors are:

Real : [1, 0, 0]
Synthetic : [0, 1, 0]
AI-generated : [0, 0, 1]

During prediction, the SoftMax output provides a probability distribution over the three
classes. The class with the highest probability is selected as the predicted class.
Mathematically, this is represented as:

$$\hat{y} = \arg\max_i \sigma(z_i)$$

where $\hat{y}$ is the predicted class, and $\sigma(z_i)$ is the SoftMax probability for class $i$.
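
Selecting the predicted class is an arg max over the SoftMax probabilities. A minimal sketch follows; the class-name strings and the example probabilities are assumptions.

```python
CLASS_NAMES = ["real", "synthetic", "ai_generated"]


def predict_class(probs):
    """Return the name of the class with the highest probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CLASS_NAMES[best]


label = predict_class([0.10, 0.15, 0.75])
print(label)
```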

4.1.2 Result
After running our SAAT model using transfer learning with ResNet50, Xception, and Basic
CNN architectures, we attained test accuracies of 96.59%, 90%, and 69%, respectively.
By employing the techniques outlined in the model architecture, we also achieved
improved inference times, most notably with the ResNet50 model. Moving forward, our
aim is to enhance our model's accuracy while reducing inference time, especially as we
transition towards detecting real-time deepfake videos in the future.
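
Inference-time comparisons of this kind are commonly made by timing repeated predictions on a fixed set of inputs. The sketch below shows one possible way to do so and is not the measurement procedure used in this thesis; `mean_latency_ms` and `dummy_predict` are hypothetical names, and the stand-in predictor would be replaced by a trained model's prediction call.

```python
import time


def mean_latency_ms(predict_fn, inputs, warmup=2):
    """Average time per call of predict_fn over the inputs, in milliseconds.
    A few warm-up calls are made first so one-off setup costs are excluded."""
    for x in inputs[:warmup]:
        predict_fn(x)
    start = time.perf_counter()
    for x in inputs:
        predict_fn(x)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(inputs)


def dummy_predict(x):
    """Stand-in for a model's prediction call."""
    return sum(x)


latency = mean_latency_ms(dummy_predict, [[1, 2, 3]] * 100)
print(f"{latency:.4f} ms per input")
```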

Fig 5 Comparison of inference time of our model when applied to ResNet50, Xception, and a Basic CNN

Fig 6 Confusion Matrix

Conclusion
Our SAAT model excels in real-time detection of AI-generated images, leveraging a fine-tuned
ResNet50 architecture. Comprehensive preprocessing steps, including image resizing,
rescaling, color space conversion, and data augmentation, ensured high-quality and diverse
datasets. By enhancing ResNet50's robust feature extraction with custom layers, we achieved
a model that is both highly accurate and efficient. Training and validation with structured data
generators kept inputs in a consistent format, and the steady rise in accuracy and
fall in loss over epochs indicate that the model learned effectively. Evaluation on
test data, where the ResNet50 variant reached 96.59% accuracy, supports its ability to
generalize. Overall, the SAAT model
provides a robust solution for real-time deepfake detection, meeting the demands for
accuracy and efficiency in digital media security. It advances AI-generated image
detection and lays a foundation for future improvements and applications.

