Domain | Model | Technique | Latest Version | Release Date | Description
------ | ----- | --------- | -------------- | ------------ | -----------
Image | ViTamin-XL | Vision Transformers (ViT) | ViTamin-XL-384px | 2024 | Scalable multi-modal transformer for classification and segmentation with zero-shot performance
Image | ViT-Adapter | Adapter-enhanced ViT | ViT-Adapter (InternViT-6B) | Jan 2024 | Adapter-enhanced transformer for dense predictions such as segmentation and detection
Image | EfficientNet | Foundation Model Features | EfficientNet-B7 | Late 2023 | Uses pre-trained features to detect GAN-based image manipulations
Audio | Wav2Vec 2.0 | Self-Supervised Learning | Wav2Vec 2.0 Adapter-enhanced | 2023 | Efficiently fine-tunes self-supervised audio features for synthetic audio detection
Audio | AST (Audio Spectrogram Transformer) | Spectrogram Analysis | AST-ViT | 2023 | Converts audio into spectrograms for transformer-based (ViT) classification
Audio | Neural Stitching | Feature Fusion | Neural Stitching Model | 2023 | Combines time and frequency features for detecting deepfake audio
Text | FAST | Entity Graphs + GNNs | FAST-GNN Enhanced | 2024 | Tracks factual and semantic consistency using graph neural networks
Text | Longformer | Multi-Scale Attention | Longformer-Large | Early 2024 | Processes long texts with advanced attention mechanisms
Text | DetectGPT | Perplexity-Based Analysis | DetectGPT+ | 2023 | Identifies predictable patterns in AI-generated text
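As a rough illustration of the Vision Transformer approach summarized in the table, the sketch below runs a generic pretrained ViT classifier from the Hugging Face transformers library. The ViTamin-XL and ViT-Adapter checkpoints listed above are not used here; the `google/vit-base-patch16-224` checkpoint and the image path are illustrative stand-ins, not part of the proposed system.

```python
# Minimal sketch: inference with a pretrained Vision Transformer classifier.
# Assumes the `transformers` and `Pillow` packages are installed; the checkpoint
# and image path are illustrative stand-ins, not the models listed in the table.
from PIL import Image
import torch
from transformers import ViTImageProcessor, ViTForImageClassification

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

image = Image.open("sample.jpg").convert("RGB")  # hypothetical input image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit back to a human-readable class label.
predicted_label = model.config.id2label[logits.argmax(-1).item()]
print(predicted_label)
```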
Datasets for images:
1. Deepfake processed dataset (419k real and fake images of human faces)
This dataset is used for detecting deepfakes across various types of media, with a focus on identifying manipulation in facial regions, though it may also include body and background inconsistencies.
2. CIFAKE dataset (120k real and fake images of different objects)
While it covers a diverse set of real and fake images, the primary focus is on detecting inconsistencies in AI-generated faces and objects, with much of the analysis centered on facial features.
3. Deepfake and real images (140k real and fake images of human faces)
This dataset contains general images, both real and deepfake. It does not specify a particular region or subject focus, allowing diverse image types to be used for detection.
We will take samples from the above datasets and combine them to create a new dataset (a sampling sketch follows below). Trained on this combined dataset, the model will not detect deepfakes of human faces only; it will also detect deepfakes of other objects such as animals and vehicles.
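A minimal sketch, assuming each source dataset has already been downloaded into local folders with `real/` and `fake/` subdirectories, of how samples could be pooled into the combined image dataset described above. The directory names and the per-class sample count are hypothetical placeholders.

```python
# Sketch: sample images from several deepfake datasets and merge them into one
# labelled dataset. Directory layout and sample counts are assumptions.
import random
import shutil
from pathlib import Path

SOURCES = {
    "deepfake_processed": Path("data/deepfake_processed"),  # 419k face images (assumed layout)
    "cifake": Path("data/cifake"),                          # 120k object images (assumed layout)
    "deepfake_and_real": Path("data/deepfake_and_real"),    # 140k face images (assumed layout)
}
SAMPLES_PER_CLASS = 10_000  # hypothetical sample size per dataset and class
OUTPUT = Path("data/combined")

random.seed(42)
for name, root in SOURCES.items():
    for label in ("real", "fake"):
        files = sorted((root / label).glob("*.jpg"))
        picked = random.sample(files, min(SAMPLES_PER_CLASS, len(files)))
        dest = OUTPUT / label
        dest.mkdir(parents=True, exist_ok=True)
        for f in picked:
            # Prefix with the source dataset name to avoid filename collisions.
            shutil.copy(f, dest / f"{name}_{f.name}")
```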
Datasets for audio:
1. Balanced ASVspoof 2021 PA (110k real and fake audio clips)
Focuses on physical access attacks, where spoofing involves playing fake audio through physical devices such as speakers to deceive ASV systems. It simulates scenarios like playback attacks, where fake audio is introduced in real-world environments. This dataset is widely used in research.
2. In the Wild (audio deepfakes) (31.8k real and fake audio clips)
Aimed at detecting deepfake audio in natural scenarios, it focuses on manipulated audio in public discourse, such as political speeches or fake interviews. The data was collected from social media to reflect real-world conditions.
We will combine samples from both datasets to generate a new dataset and train a model that can detect deepfake audio in natural scenarios as well as in controlled settings (a preprocessing sketch follows below).
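Because the spectrogram-based audio models in the table (e.g., AST) operate on time-frequency representations, the combined audio samples would first be converted into log-mel spectrograms. The sketch below shows that preprocessing step with the librosa library; the file path, 16 kHz sample rate, and 128-mel configuration are illustrative assumptions rather than fixed choices.

```python
# Sketch: convert an audio clip into a log-mel spectrogram, the input format
# used by spectrogram-based detectors such as AST. The file path and the
# 16 kHz / 128-mel settings are illustrative assumptions.
import librosa
import numpy as np

def to_log_mel(path: str, sr: int = 16_000, n_mels: int = 128) -> np.ndarray:
    """Load an audio file, resample it, and return a log-scaled mel spectrogram."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

log_mel = to_log_mel("clip.wav")   # hypothetical audio clip
print(log_mel.shape)               # (n_mels, n_frames)
```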
Datasets for text:
1. TweepFake Dataset: This dataset consists of 25,572 tweets, half human-written and half generated by bots using techniques such as GPT-2, RNNs, and Markov chains. It focuses on Twitter-style messages.
2. GPT-2 Output Dataset: Created by OpenAI, it includes outputs from nine GPT-2 models trained on WebText, featuring over 2 million text samples for training, validation, and testing. It is useful for analyzing AI-generated text.
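As a rough illustration of the perplexity-based analysis listed in the table, the sketch below scores a text sample with GPT-2 from the Hugging Face transformers library and treats unusually low perplexity as a signal that the text may be machine-generated. This is a simplified scoring pass, not the full DetectGPT perturbation procedure, and any decision threshold would be a hypothetical placeholder.

```python
# Sketch: simplified perplexity scoring with GPT-2 as a signal for AI-generated
# text. This is not the full DetectGPT algorithm (which compares perturbed
# variants of the text); any threshold would be a hypothetical placeholder.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return the GPT-2 perplexity of `text` (lower means more predictable)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return float(torch.exp(loss))

sample = "The quick brown fox jumps over the lazy dog."
print(f"perplexity = {perplexity(sample):.1f}")
# Interpretation (assumed rule of thumb): very low perplexity suggests highly
# predictable, possibly machine-generated text.
```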