ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | DOI: 10.1109/ICASSP48485.2024.10446560

A ROBUST AUDIO DEEPFAKE DETECTION SYSTEM VIA MULTI-VIEW FEATURE

Yujie Yang†, Haochen Qin†, Hang Zhou, Chengcheng Wang, Tianyu Guo, Kai Han*, Yunhe Wang*

Huawei Noah’s Ark Lab

†Equal contribution
*Corresponding author

ABSTRACT

With the advancement of generative modeling techniques, synthetic human speech is becoming increasingly indistinguishable from real speech, posing difficult challenges for audio deepfake detection (ADD) systems. In this paper, we exploit audio features to improve the generalizability of ADD systems. We investigate ADD performance over a broad range of audio features, including various handcrafted features and learning-based features. Experiments show that learning-based audio features pretrained on large amounts of data generalize better than hand-crafted features in out-of-domain scenarios. We then further improve the generalizability of the ADD system with the proposed multi-feature approaches, which incorporate complementary information from features of different views. The model trained on ASVspoof 2019 data achieves an equal error rate of 24.27% on the In-the-Wild dataset. The code will be released soon.¹

¹MindSpore: https://gitee.com/mindspore/models

Index Terms— Audio deepfake detection, anti-spoofing, feature incorporation

1. INTRODUCTION

AI technology has made breakthroughs with the support of large-scale models, massive datasets, and powerful computing capabilities. Speech synthesis, voice conversion, and speech editing technologies can now generate human speech that is virtually indistinguishable from real human speech. However, progress in these speech generation technologies has also raised potential threats. Synthetic speech can be misused for spreading rumors, executing fraud, and other illicit activities. Therefore, identifying synthetic speech is increasingly important. In response to such challenges, efforts including the automatic speaker verification spoofing and countermeasures (ASVspoof) and audio deepfake detection (ADD) competitions have been held to collect solutions [1, 2, 3].

Many works in ADD focus on finding proper audio features, which can be roughly categorized into hand-crafted and learning-based features. Hand-crafted features, although simple, achieve acceptable performance because they are specially designed to capture audio properties. For instance, the constant-Q transform (CQT) is good at capturing both long-range structure and fine details in audio signals thanks to its frequency-dependent filter window lengths [4]. MFCC and LFCC features match human auditory characteristics and emphasize low-frequency information, which benefits speech detection tasks. In recent years, the application of learning-based audio features to ADD tasks has attracted tremendous attention. Research has explored the use of Whisper [5] features for detecting synthetic speech; the large amount of audio data behind the Whisper ASR system gives these features an edge over handcrafted ones [6]. Similarly, self-supervised learning-based audio features have also proven beneficial for ADD tasks [7]. The success of self-supervised models in various scenarios can be attributed to extensive pre-training data sourced from diverse domains, which enables the models to produce meaningful audio features even in complicated situations. These features help distinguish real from fake speech and perform well on out-of-domain datasets [8].

However, the performance of an ADD model based on a single feature may degrade because spurious speech can be generated by very different audio synthesis systems, and a single feature cannot represent the characteristics of all of them. Based on this observation, we propose to use multiple features, which improve model generalizability by providing information from different aspects. Two methods are proposed, based on feature selection and feature fusion respectively. These approaches better capture the subtle differences between fake and real speech, bolstering the detection system's accuracy in identifying deep forgery samples, especially those generated by unknown synthesis systems.

This work focuses on improving the generalizability of ADD systems, and our contributions are:

1. We investigate a broad range of handcrafted features and learning-based deep features. Experimental results show strong generalizability for learning-based features pretrained on large amounts of data.

2. We propose two multi-view feature incorporation methods to capture the subtlety of the multiple candidate features and further improve the performance and generalizability of the system.

2. AUDIO FEATURES AND MULTI-VIEW FEATURE INCORPORATION

In this section, we first introduce the hand-crafted and learning-based audio features that are investigated in the experiments. Then, the proposed multi-view feature approaches based on feature selection and feature fusion are presented.

2.1. Audio features

2.1.1. Hand-crafted features

Hand-crafted acoustic features have been well investigated in ADD studies. In this paper, we evaluate 5 hand-crafted features: Mel-scaled spectrogram (Mel), Mel frequency cepstral coefficients (MFCC), log-frequency spectrogram (LogSpec), linear frequency cepstral coefficients (LFCC), and constant-Q transform (CQT).
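As a concrete illustration (not the authors' code), the following sketch extracts the five hand-crafted features with librosa. The 25 ms / 10 ms window and hop follow Section 3.2; the coefficient counts, the LFCC approximation via a DCT of the log linear-frequency spectrogram, and the default CQT hop are illustrative assumptions.

```python
# Sketch: extracting the five hand-crafted features with librosa (assumed setup).
import numpy as np
import scipy.fftpack
import librosa

def handcrafted_features(wav, sr=16000, n_fft=400, hop=160):
    """wav: 1-D float array at 16 kHz; 400/160 samples = 25 ms window / 10 ms hop."""
    mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_fft=n_fft, hop_length=hop)
    mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=20, n_fft=n_fft, hop_length=hop)
    logspec = np.log(np.abs(librosa.stft(wav, n_fft=n_fft, hop_length=hop)) ** 2 + 1e-10)
    # LFCC approximated as a DCT of the log linear-frequency power spectrogram
    # (librosa has no built-in LFCC extractor).
    lfcc = scipy.fftpack.dct(logspec, axis=0, norm="ortho")[:20]
    # CQT with librosa defaults (hop 512) to satisfy its multi-octave hop constraint.
    cqt = np.abs(librosa.cqt(wav, sr=sr))
    return {"Mel": mel, "MFCC": mfcc, "LogSpec": logspec, "LFCC": lfcc, "CQT": cqt}
```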
2.1.2. Learning-based features

Learning-based acoustic features are produced by models trained for various audio tasks, and there is already precedent for using them in ADD [9, 6, 7]. In this paper, 9 learning-based audio features proposed for different tasks are extensively investigated and benchmarked for generalization performance on the ADD task.

Learnable acoustic front-ends automatically learn suitable filter banks while optimizing the training objective. We adopt SincNet [10] and LEAF [11] as learnable front-ends for ADD. Besides, we also evaluate a range of deep learning-based audio features, where the use of additional data as well as task-related training approaches can benefit the ADD task. 7 deep learning-based models across various tasks are chosen to generate audio features. For audio neural codec models, we use EnCodec [12] and AudioDec [13], which mainly consist of autoencoder architectures and aim to encode audio compactly. AudioMAE [14] is selected as a representative of pretrained models for universal audio perception. For models pretrained on human speech, we select Wav2Vec2, Hubert [15], and WavLM [16], which share similar network architectures but use different self-supervised losses. For the ASR model, we use the Whisper [5] model trained on a large dataset covering diverse speech scenarios.
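As an illustration of how such deep features can be obtained (a sketch, not the authors' pipeline), frame-level representations can be pulled from a speech pretrained model with the Hugging Face transformers library; the checkpoint name below is only an example.

```python
# Sketch: extracting deep features from a speech pretrained model
# (Hugging Face transformers). The checkpoint name is illustrative.
import torch
from transformers import AutoFeatureExtractor, AutoModel

ckpt = "facebook/wav2vec2-xls-r-300m"   # HuBERT / WavLM checkpoints work the same way
extractor = AutoFeatureExtractor.from_pretrained(ckpt)
model = AutoModel.from_pretrained(ckpt).eval()

def deep_feature(wav_16k):
    # wav_16k: 1-D numpy array at 16 kHz (already trimmed/padded to 4 s)
    inputs = extractor(wav_16k, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.last_hidden_state  # (1, frames, dim), handed to the classifier
```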
2.2. Multi-view feature incorporation

Features extracted from different deep models contain unique information, which can further boost ADD model generalizability when incorporated properly. Therefore, we propose two methods based on feature selection and feature fusion respectively.

2.2.1. Feature selection

Identifying the single most effective feature for the ADD task is difficult, especially for test data with unknown distributions. We therefore introduce multiple candidate features to improve the generalization of the ADD system. However, introducing redundant or irrelevant features may hinder the learning process of the classifier. We thus propose a feature selection mechanism that decides, based on sample-specific information, whether to introduce a feature into the decision process, exploiting the information provided by the multiple features while avoiding the negative impact of certain features:

    mi = Sθ(fi)
    Fselect = Concat({fi ⊙ mi}), i ∈ [0, N]        (1)

The proposed feature selection mechanism is shown in Eq. (1), where fi denotes the candidate features, and mi and Fselect are the selection mask and the selected features respectively. Each feature passes through a selection module Sθ before being concatenated and fed into the classifier. This module consists of lightweight self-attentive layers, and its output is a binary mask that determines whether the feature should be used in the decision for this sample. The discrete decision is obtained with the Gumbel-max method, which allows the selection module to be trained end-to-end with the whole system.
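A minimal PyTorch sketch of this selection branch is given below. The sizes of Sθ and the use of the straight-through Gumbel-softmax relaxation (in place of the Gumbel-max decision described above) are assumptions; the paper only specifies that Sθ is built from lightweight self-attentive layers and outputs a binary mask.

```python
# Sketch of the feature-selection branch (Eq. 1). S_theta scores each candidate
# feature and a straight-through Gumbel sample yields a binary keep/drop mask.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureSelector(nn.Module):
    def __init__(self, dim, n_heads=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)  # lightweight self-attention
        self.score = nn.Linear(dim, 2)                 # logits for {drop, keep}

    def forward(self, f):                              # f: (B, T, dim), one candidate feature
        h, _ = self.attn(f, f, f)
        logits = self.score(h.mean(dim=1))             # (B, 2), pooled over time
        m = F.gumbel_softmax(logits, tau=1.0, hard=True)[:, 1]  # binary but differentiable
        return f * m.view(-1, 1, 1)                    # f_i ⊙ m_i

class MultiViewSelection(nn.Module):
    def __init__(self, dims):                          # dims: feature dim of each candidate view
        super().__init__()
        self.selectors = nn.ModuleList(FeatureSelector(d) for d in dims)

    def forward(self, feats):                          # feats: list of (B, T, dim_i) tensors,
        kept = [s(f) for s, f in zip(self.selectors, feats)]   # assumed time-aligned (same T)
        return torch.cat(kept, dim=-1)                 # F_select = Concat({f_i ⊙ m_i})
```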
2.2.2. Feature fusion

Feature fusion, on the other hand, can incorporate all the information in the multi-view feature without discarding any view. To smoothly incorporate acoustic representations from different pretrained models, we combine a channel attention mechanism and a Transformer encoder to build the feature fusion module in Eq. (2). The multi-view feature, formed by concatenating the candidates fi along the channel dimension, is first processed by a lightweight channel attention block that fuses at the channel level (each channel represents one deep feature). Then, a Transformer encoder is applied to fuse the features ri along both the time and frequency dimensions. With an element-wise global receptive field, the final fused representation Ffusion is input to the classifier:

    ri = CA(Concat(fi))
    Ffusion = TE(Concat({ri})), i ∈ [0, N]        (2)

where CA denotes channel attention and TE denotes a vanilla Transformer encoder.
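A corresponding PyTorch sketch of the fusion branch is shown below, assuming the candidate features have already been projected to a common shape; the squeeze-and-excitation form of the channel attention block and the encoder sizes are illustrative choices, not the paper's exact architecture.

```python
# Sketch of the fusion branch (Eq. 2): stack candidate features as channels,
# reweight them with channel attention (CA), then fuse with a vanilla
# Transformer encoder (TE). Sizes are illustrative.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, n_views, reduction=2):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(n_views, max(1, n_views // reduction)), nn.ReLU(),
            nn.Linear(max(1, n_views // reduction), n_views), nn.Sigmoid())

    def forward(self, x):                       # x: (B, n_views, T, dim)
        w = self.fc(x.mean(dim=(2, 3)))         # squeeze over time/frequency -> (B, n_views)
        return x * w.view(*w.shape, 1, 1)       # channel-wise reweighting

class MultiViewFusion(nn.Module):
    def __init__(self, n_views, dim, n_layers=2, n_heads=4):
        super().__init__()
        self.ca = ChannelAttention(n_views)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.te = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, feats):                   # list of (B, T, dim), one per view (same shape)
        x = torch.stack(feats, dim=1)           # (B, n_views, T, dim)
        r = self.ca(x)                          # r_i = CA(Concat(f_i))
        b, v, t, d = r.shape
        return self.te(r.reshape(b, v * t, d))  # F_fusion = TE(Concat({r_i}))
```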

3. EXPERIMENTS

3.1. Datasets

We train our models on the train and dev subsets of the ASVspoof 2019 Logical Access (LA) dataset [17], consistent with most related works. To evaluate our systems, we adopt three datasets. The eval subsets of the ASVspoof 2019 and 2021 challenges are used to test performance within similar domains [2]. The spoofed audio of the ASVspoof challenges is generated by 11 TTS and 8 VC algorithms from the VCTK corpus, and the samples of the eval subset are generated by different algorithms than the train subset. To evaluate generalization ability, we also test our systems on the In-the-Wild dataset, which contains 20.8 hours of real audio and 17.2 hours of deepfake audio [18]. The In-the-Wild dataset is collected from the Internet and consists of audio from various realistic scenarios.
3.2. Implementation details

All audio samples are trimmed or padded to 4 s and resampled to 16 kHz for all acoustic features, except for the neural audio codec models EnCodec and AudioDec, which operate at a 24 kHz sample rate. For all handcrafted features, the window length and hop length are set to 25 ms and 10 ms, respectively. For the speech self-supervised models, we employ the Wav2Vec2 XLS-R [19] model pretrained on 128 languages, the Hubert-base model pretrained on LibriSpeech, and the WavLM-Base-Plus model pretrained on the Libri-Light, GigaSpeech, and VoxPopuli datasets. For the neural audio codec models, the continuous encoder outputs, rather than the discrete codes, are used as audio features to prevent information loss. The AudioMAE model used in our experiments is pretrained on AudioSet. The selected Whisper model is the tiny version pretrained on the speech recognition task. Besides, we select the 24 kHz versions of the EnCodec and AudioDec models. The audio features extracted from the above deep models are the outputs of their respective encoders.
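A short sketch of the waveform preparation described above (mono, resample to 16 kHz, trim or zero-pad to 4 s), written with torchaudio as an assumed implementation:

```python
# Sketch: trim/pad every utterance to 4 s and resample to 16 kHz (torchaudio).
import torch
import torchaudio

def preprocess(path, target_sr=16000, seconds=4):
    wav, sr = torchaudio.load(path)              # (channels, samples)
    wav = wav.mean(dim=0, keepdim=True)          # mono
    if sr != target_sr:
        wav = torchaudio.functional.resample(wav, sr, target_sr)
    n = target_sr * seconds
    if wav.shape[1] >= n:
        wav = wav[:, :n]                         # trim
    else:
        wav = torch.nn.functional.pad(wav, (0, n - wav.shape[1]))  # zero-pad
    return wav
```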
For all experiments, we use a ResNet18 classifier and train all systems with a cross-entropy loss. We use the Adam optimizer with a fixed learning rate of 1e-4 and weight decay of 1e-4, and train every system for 100 epochs. The checkpoint with the lowest validation loss is saved for evaluation. All systems are evaluated by equal error rate (EER).
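For reference, EER can be computed from detection scores as the operating point where the false-positive and false-negative rates meet; a common sketch with scikit-learn (the paper does not specify its scoring code):

```python
# Sketch: equal error rate (EER) from detection scores (higher = more bona fide).
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    # labels: 1 for bona fide, 0 for spoof; scores: classifier outputs
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))        # operating point where FPR ≈ FNR
    return (fpr[idx] + fnr[idx]) / 2
```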
4. RESULT AND ANALYSIS

4.1. Single feature

Table 1. Performance of various single audio features on the ADD task, EER (%).

Features  | ASVspoof19 LA eval | ASVspoof21 DF eval | In-the-Wild
Mel       | 7.42  | 20.13 | 50.56
MFCC      | 6.45  | 27.27 | 75.43
LogSpec   | 5.67  | 20.62 | 52.93
LFCC      | 15.35 | 25.67 | 65.45
CQT       | 4.91  | 20.75 | 56.69
LEAF      | 8.54  | 21.54 | 49.70
SincNet   | 6.12  | 20.78 | 56.74
EnCodec   | 10.25 | 24.93 | 39.44
AudioDec  | 10.47 | 26.13 | 43.69
AudioMAE  | 11.07 | 30.47 | 75.40
XLS-R     | 2.07  | 11.78 | 29.19
Hubert    | 6.78  | 14.76 | 27.48
WavLM     | 7.24  | 15.53 | 30.50
Whisper   | 5.59  | 23.28 | 42.73

Table 1 shows the results of our experiments, where we evaluate 14 audio features under the same experimental setup and test our systems on 3 datasets. The classification results for all features on the ASV2019 LA evaluation set are significantly better than those on the ASV2021 DF evaluation set and the In-the-Wild dataset. The results on the ASV2021 DF and In-the-Wild datasets show that an ADD system trained on the ASV2019 dataset generalizes poorly: the ASV2021 DF dataset contains samples from various spoofing systems that use different audio codec processing methods, and the In-the-Wild samples are collected from complex environments outside professional studios, with differing speech content.

In our experiments, the handcrafted features fail to show reliable discrimination ability in realistic scenarios. All systems using handcrafted features reach an EER greater than 50% on the In-the-Wild dataset. The learnable front-ends LEAF and SincNet learn their filter banks during training but still generalize poorly, with EERs of 49.70 and 56.74 respectively.

On the contrary, most deep features show stronger generalizability. The neural audio codecs EnCodec and AudioDec emphasize compression rate and the fidelity of the decoded audio; while underperforming on the ASV2019 LA and ASV2021 DF evaluation sets, these two models reach EERs of 39.44 and 43.69 on the In-the-Wild dataset. The Wav2Vec2 XLS-R model is pretrained on 436K hours of speech in 128 languages, and the system built on it achieves the best EER on the ASV2019 LA and ASV2021 DF datasets. On the In-the-Wild dataset, its EER decreases by 21.37 compared with the best handcrafted feature. The Hubert and WavLM features also perform excellently on the In-the-Wild dataset, where the Hubert feature achieves the best EER among all single-feature detection systems at 27.48.

Of all the deep features, the AudioMAE model, pretrained on audio spectrograms with a masked autoencoder, shows the poorest generalization on the ADD task. Its EER is 75.40 on the In-the-Wild dataset, worse than most of the handcrafted features.

The failure might be attributed to the pretraining data: AudioMAE is pretrained on the AudioSet dataset, which contains far more general audio than human speech, diluting its ability to discriminate between real and fake human speech. The Whisper feature also fails to generalize well, even though it is pretrained on more than 680k hours of speech data. This feature is obtained by weakly supervised training on the ASR task, which focuses more on speech content than on audio signal information.

To visualize the superior generalizability of deep features over hand-crafted features, Fig. 1 shows t-SNE visualizations of CQT features and Hubert features for real and fake speech in the In-the-Wild dataset. Although discriminating between real and fake remains difficult, the Hubert feature space is more separable than the CQT feature space, where the two categories completely overlap.

[Figure 1: two t-SNE panels, (a) CQT features and (b) Hubert features.]
Fig. 1. Visualization of CQT features and Hubert features for real and fake speech in the In-the-Wild dataset.
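A sketch of how such a t-SNE view can be produced from pooled per-utterance embeddings (the pooling, perplexity, and other settings are assumptions):

```python
# Sketch: 2-D t-SNE of per-utterance features (e.g. time-averaged CQT or Hubert
# embeddings) for real vs. fake speech, as in Fig. 1.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, labels, title):
    # features: (n_utts, dim) pooled embeddings; labels: 1 = real, 0 = fake
    pts = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(features)
    for lab, name in [(1, "real"), (0, "fake")]:
        sel = labels == lab
        plt.scatter(pts[sel, 0], pts[sel, 1], s=5, label=name)
    plt.legend(); plt.title(title); plt.show()
```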
4.2. Multi-view feature incorporation

Based on the single-feature experiments, the Hubert, XLS-R, and WavLM features, which perform well on the ASV2021 datasets, are chosen as the multi-view feature set to further improve the generalizability of the detection system.

Table 2. Comparison of EER (%) on the In-the-Wild dataset.

Model            | Features              | In-the-Wild
RawNet2 [20]     | waveform              | 36.74
RawNet2 [18]     | waveform              | 33.94
AASIST [20]      | waveform              | 34.81
ResNet34 [20]    | XLS-R                 | 46.35
LCNN [20]        | XLS-R                 | 39.82
Res2Net [20]     | XLS-R                 | 36.62
ResNet18 (ours)  | XLS-R                 | 29.19
ResNet18 (ours)  | Hubert                | 27.48
ResNet18 (ours)  | WavLM                 | 30.50
Selection (ours) | XLS-R, WavLM, Hubert  | 25.98
Fusion (ours)    | XLS-R, WavLM, Hubert  | 24.27

Table 2 shows the results of incorporating these three deep features on the In-the-Wild dataset. Compared with results implemented in this work and results from other studies [20, 18], both proposed approaches significantly improve model generalizability: the EER drops from 27.48 to 24.27 with feature fusion and to 25.98 with feature selection. The effectiveness of feature selection comes from its sample-aware mask mechanism, through which each individual sample can select the most appropriate feature, whereas a single-feature detection system offers no such selection space. The audio characteristics of the individual sample are learned to form the mask, which is also supervised by the detection task; this end-to-end training guarantees the effectiveness. The success of feature fusion, on the other hand, indicates a complementary effect among the three selected deep features. Each value in the fused feature attends to every other value not only across the time and frequency dimensions but also across the feature dimension, so the fused feature is a better representation for the ResNet18 classifier, yielding the best EER on the In-the-Wild dataset.

5. CONCLUSION

In this paper, we study the association between audio features and the generalizability of ADD systems. First, we test and analyze more audio features than prior studies on the ADD task, covering handcrafted features, learnable audio front-ends, audio neural codecs, an audio pretrained model, speech pretrained models, and a speech recognition model, for a total of 14 audio features. Experimental results on the In-the-Wild dataset show that features from speech pretraining models generalize well while handcrafted features generalize poorly. The generalization performance of speech features on the ADD task comes from the large amount of pretraining data as well as the appropriate pretraining task. We further improve the generalization ability of the model with the proposed feature selection and feature fusion methods. The results show that both methods improve generalizability compared with single features.

6. ACKNOWLEDGEMENT

We gratefully acknowledge the support of MindSpore, CANN (Compute Architecture for Neural Networks) and the Ascend AI Processor used in this research.

7. REFERENCES

[1] Massimiliano Todisco, Xin Wang, Ville Vestman, Md Sahidullah, Héctor Delgado, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, and Kong Aik Lee, "ASVspoof 2019: Future horizons in spoofed and fake audio detection," arXiv preprint arXiv:1904.05441, 2019.

[2] Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, et al., "ASVspoof 2021: Accelerating progress in spoofed and deepfake speech detection," arXiv preprint arXiv:2109.00537, 2021.

[3] Jiangyan Yi, Ruibo Fu, Jianhua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, et al., "ADD 2022: The first audio deep synthesis detection challenge," in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022, pp. 9216-9220.

[4] Rohan Kumar Das, Jichen Yang, and Haizhou Li, "Long range acoustic features for spoofed speech detection," in Interspeech, 2019, pp. 1058-1062.

[5] Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever, "Robust speech recognition via large-scale weak supervision," in International Conference on Machine Learning. PMLR, 2023, pp. 28492-28518.

[6] Piotr Kawa, Marcin Plata, Michał Czuba, Piotr Szymański, and Piotr Syga, "Improved deepfake detection using Whisper features," arXiv preprint arXiv:2306.01428, 2023.

[7] Xin Wang and Junichi Yamagishi, "Investigating self-supervised front ends for speech spoofing countermeasures," arXiv preprint arXiv:2111.07725, 2021.

[8] Yuankun Xie, Haonan Cheng, Yutian Wang, and Long Ye, "Learning a self-supervised domain-invariant feature representation for generalized audio deepfake detection," in Proc. INTERSPEECH 2023, 2023, pp. 2808-2812.

[9] Hemlata Tak, Jose Patino, Massimiliano Todisco, Andreas Nautsch, Nicholas Evans, and Anthony Larcher, "End-to-end anti-spoofing with RawNet2," in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021, pp. 6369-6373.

[10] Mirco Ravanelli and Yoshua Bengio, "Speaker recognition from raw waveform with SincNet," in 2018 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2018, pp. 1021-1028.

[11] Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, and Marco Tagliasacchi, "LEAF: A learnable frontend for audio classification," arXiv preprint arXiv:2101.08596, 2021.

[12] Alexandre Défossez, Jade Copet, Gabriel Synnaeve, and Yossi Adi, "High fidelity neural audio compression," arXiv preprint arXiv:2210.13438, 2022.

[13] Yi-Chiao Wu, Israel D. Gebru, Dejan Marković, and Alexander Richard, "AudioDec: An open-source streaming high-fidelity neural audio codec," in ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1-5.

[14] Po-Yao Huang, Hu Xu, Juncheng Li, Alexei Baevski, Michael Auli, Wojciech Galuba, Florian Metze, and Christoph Feichtenhofer, "Masked autoencoders that listen," Advances in Neural Information Processing Systems, vol. 35, pp. 28708-28720, 2022.

[15] Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed, "HuBERT: Self-supervised speech representation learning by masked prediction of hidden units," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3451-3460, 2021.

[16] Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, et al., "WavLM: Large-scale self-supervised pre-training for full stack speech processing," IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 6, pp. 1505-1518, 2022.

[17] Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Héctor Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, et al., "ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech," Computer Speech & Language, vol. 64, pp. 101114, 2020.

[18] Nicolas M. Müller, Pavel Czempin, Franziska Dieckmann, Adam Froghyar, and Konstantin Böttinger, "Does audio deepfake detection generalize?," arXiv preprint arXiv:2203.16263, 2022.

[19] Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, et al., "XLS-R: Self-supervised cross-lingual speech representation learning at scale," arXiv preprint arXiv:2111.09296, 2021.

[20] Jiangyan Yi, Chenglong Wang, Jianhua Tao, Xiaohui Zhang, Chu Yuan Zhang, and Yan Zhao, "Audio deepfake detection: A survey," arXiv preprint arXiv:2308.14970, 2023.
