10 Audio Processing Tasks to get you started with Deep Learning Applications (with Case Studies)
Introduction
Imagine a world where machines understand what you want and how you are feeling when you call
customer care: if you are unhappy about something, you speak to a person quickly; if you are looking for
specific information, you may not need to talk to a person at all (unless you want to!).
This is going to be the new order of the world, and you can already see it happening to a good degree.
Check out the highlights of 2017 in the data science industry and you will see the breakthroughs deep
learning has brought to problems that were previously difficult to solve. One field where deep learning has
great potential is audio/speech processing, especially given its unstructured nature and vast impact.
So for the curious ones out there, I have compiled a list of tasks that are worth getting your hands dirty
with when starting out in audio processing. I'm sure there are a few more breakthroughs to come in this
space using deep learning.
The article is structured to explain each task and its importance. For every task, there is also a research
paper that goes into the details, along with a case study to help you get started on solving it.
So let’s get cracking!
1. Audio Classification
Audio classification is a fundamental problem in the field of audio processing. The task is essentially to
extract features from the audio, and then identify which class the audio belongs to. Many useful
applications pertaining to audio classification can be found in the wild – such as genre classification,
instrument recognition and artist identification.
This is also the most explored topic in audio processing, with plenty of papers published in the field in the
last year alone. In fact, we have also hosted a practice hackathon where the community collaborated on
solving this particular task.
Whitepaper – http://ieeexplore.ieee.org/document/5664796/?reload=true
A common approach to an audio classification task is to pre-process the audio inputs to extract useful
features, and then apply a classification algorithm to them. For example, in the case study below we are
given a 5-second excerpt of a sound, and the task is to identify which class it belongs to – whether it is a
dog barking or a drilling sound. As mentioned in the article, one approach is to extract an audio feature
called MFCC (mel-frequency cepstral coefficients) and then pass it through a neural network to get the
appropriate class.
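To make the extract-features-then-classify pipeline concrete, here is a toy sketch of my own (not the case study's code): a plain averaged FFT spectrum stands in for MFCCs, a nearest-centroid rule stands in for the neural network, and synthetic tones stand in for real recordings.

```python
import numpy as np

def feature(signal, n_fft=256):
    """Average magnitude spectrum over short frames: a crude stand-in
    for MFCC features."""
    frames = [signal[i:i + n_fft] for i in range(0, len(signal) - n_fft, n_fft)]
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

def make_tone(freq, sr=8000, dur=0.5):
    t = np.arange(int(sr * dur)) / sr
    return np.sin(2 * np.pi * freq * t)

# Two synthetic "classes": low-pitched vs high-pitched sounds.
train = {"low": [make_tone(200), make_tone(220)],
         "high": [make_tone(2000), make_tone(2200)]}
centroids = {c: np.mean([feature(s) for s in clips], axis=0)
             for c, clips in train.items()}

def classify(signal):
    """Nearest-centroid classifier, standing in for a trained network."""
    f = feature(signal)
    return min(centroids, key=lambda c: np.linalg.norm(f - centroids[c]))

print(classify(make_tone(210)))   # -> low
print(classify(make_tone(2100)))  # -> high
```

In a real pipeline, MFCCs and a trained neural network would replace the FFT average and the centroid rule, but the shape of the solution – features in, class label out – is the same.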
Case Study – https://www.analyticsvidhya.com/blog/2017/08/audio-voice-processing-deep-learning/
2. Audio Fingerprinting
The aim of audio fingerprinting is to determine a compact digital "summary" of a piece of audio, which can
then be used to identify it from a short sample. Shazam is an excellent example of audio fingerprinting in
action: it recognises music on the basis of the first two to five seconds of a song. However, there are still
situations where such systems fail, especially where there is a high amount of background noise.
Whitepaper – http://www.cs.toronto.edu/~dross/ChandrasekharSharifiRoss_ISMIR2011.pdf
One approach to this problem is to represent the audio in a form that is easier to decipher, and then find
the patterns that differentiate the audio from the background noise. In the case study below, the author
converts raw audio to spectrograms and then uses peak-finding and fingerprint-hashing algorithms to
define the fingerprints of that audio file.
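The spectrogram-peaks-and-hashes idea can be sketched in a few lines. This is a deliberately simplified illustration of my own (one landmark per frame, consecutive landmark pairs hashed into tokens), not the case study's actual algorithm, and the "songs" are synthetic tones:

```python
import hashlib
import numpy as np

def spectrogram(signal, n_fft=256, hop=128):
    frames = [signal[i:i + n_fft] for i in range(0, len(signal) - n_fft, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))

def fingerprint(signal):
    spec = spectrogram(signal)
    peaks = spec.argmax(axis=1)     # one landmark (strongest bin) per frame
    # Hash consecutive landmark pairs into compact fingerprint tokens.
    return {hashlib.sha1(f"{a}-{b}".encode()).hexdigest()[:10]
            for a, b in zip(peaks, peaks[1:])}

rng = np.random.default_rng(0)
sr = 8000
t = np.arange(2 * sr) / sr
song = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
decoy = np.sin(2 * np.pi * 1500 * t)
db = {"song_a": fingerprint(song), "song_b": fingerprint(decoy)}

# A short, noisy excerpt should still match the right song.
clip = song[4000:12000] + 0.1 * rng.normal(size=8000)
query = fingerprint(clip)
match = max(db, key=lambda name: len(db[name] & query))
print(match)   # -> song_a
```

Because the tokens depend only on the strongest spectral peaks, the noisy query still shares tokens with the original recording, which is what makes Shazam-style lookup robust.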
Case Study – http://willdrevo.com/fingerprinting-and-audio-recognition-with-python/
3. Automatic Music Tagging
Music tagging is a more complex version of audio classification: here, each audio clip may belong to
multiple classes at once, i.e. it is a multi-label classification problem. A potential application of this task is
creating metadata for the audio so that it can be searched later on. Deep learning has helped solve this
task to a certain extent, as can be seen in the case study below.
Whitepaper – https://link.springer.com/article/10.1007/s10462-012-9362-y
As with most of these tasks, the first step is to extract features from the audio sample. These are then
mapped to tags according to the nuances of the audio (for example, if the audio contains more
instrumental sound than singing, the tag could be "instrumental"). This can be done with either classical
machine learning or deep learning methods. The case study mentioned below uses deep learning,
specifically a convolutional recurrent neural network operating on mel-frequency features.
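The multi-label part is worth seeing in isolation. In the sketch below (my own illustration with hypothetical hand-set weights, standing in for the final layer of a trained tagging network), each tag gets an independent sigmoid score and every tag above a threshold is kept – unlike softmax classification, which picks exactly one class:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

tags = ["rock", "instrumental", "vocal", "slow"]

# Hypothetical weights standing in for a trained network's output layer.
W = np.array([[ 2.0,  0.0],
              [-2.0,  0.0],
              [ 0.0,  2.0],
              [ 0.0, -2.0]])

def predict_tags(features, threshold=0.5):
    # One independent sigmoid score per tag; keep all tags above threshold.
    scores = sigmoid(W @ features)
    return [t for t, s in zip(tags, scores) if s > threshold]

clip = np.array([1.0, 1.0])      # stand-in for learned audio features
print(predict_tags(clip))         # -> ['rock', 'vocal']
```

Training such a head uses a per-tag binary cross-entropy loss rather than a single categorical loss, which is the key modelling difference from plain audio classification.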
Case Study – https://github.com/keunwoochoi/music-auto_tagging-keras
4. Audio Segmentation
Segmentation literally means dividing a particular object into parts (or segments) based on a defined set of
characteristics. For audio data analysis in particular, it is an important pre-processing step, because it lets
us break a noisy and lengthy audio signal into short, homogeneous segments that are easier to process
further. One application of the task is heart sound segmentation, i.e. identifying the sounds specific to the
heart.
Whitepaper – http://www.mecs-press.org/ijitcs/ijitcs-v6-n11/IJITCS-V6-N11-1.pdf
We can convert this into a supervised learning problem, where each timestamp is classified on the basis of
the segments required, and then apply an audio classification approach. In the case study below, the task
is to segment the heart sound into two segments (lub and dub) so that an anomaly can be identified in
each segment. This can be solved by extracting audio features and then applying deep learning for
classification.
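A minimal sketch of the frame-by-frame idea, assuming nothing from the case study: each short frame of a synthetic "heartbeat" signal is labelled by its energy (standing in for a learned classifier), and consecutive frames with the same label are merged into segments.

```python
import numpy as np

def segment(signal, sr, frame_len=0.05, threshold=0.01):
    """Label each frame 'sound' or 'silence' by short-time energy, then
    merge runs of identical labels into (label, start_sec, end_sec)."""
    n = int(sr * frame_len)
    labels = []
    for i in range(0, len(signal) - n + 1, n):
        energy = np.mean(signal[i:i + n] ** 2)
        labels.append("sound" if energy > threshold else "silence")
    segments, start = [], 0
    for j in range(1, len(labels) + 1):
        if j == len(labels) or labels[j] != labels[start]:
            segments.append((labels[start], start * frame_len, j * frame_len))
            start = j
    return segments

# Synthetic "heartbeat": 0.2 s low-frequency bursts separated by silence.
sr = 8000
t = np.arange(int(0.2 * sr)) / sr
beat = np.sin(2 * np.pi * 100 * t)
quiet = np.zeros(int(0.2 * sr))
signal = np.concatenate([quiet, beat, quiet, beat])
print(segment(signal, sr))
```

In the real task the per-frame decision comes from a trained model on extracted features rather than a raw energy threshold, but the merge-frames-into-segments structure is the same.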
Case Study – https://www.analyticsvidhya.com/blog/2017/11/heart-sound-segmentation-deep-learning/
5. Audio Source Separation
Audio source separation consists of isolating one or more source signals from a mixture of signals. One of
the most common applications is isolating the vocal track from a song, for example to follow or translate
the lyrics (karaoke, for instance). A classic example is shown in Andrew Ng's machine learning course,
where he separates the sound of the speaker from the background music.
Whitepaper – http://ijcert.org/ems/ijcert_papers/V3I1103.pdf
A typical usage scenario involves:
loading an audio file
computing a time-frequency transform to obtain a spectrogram, and
using a source separation algorithm (such as non-negative matrix factorization) to obtain a
time-frequency mask
The mask is then multiplied with the spectrogram and the result is converted back to the time domain.
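The steps above can be sketched end to end with a hand-rolled NMF. This is my own toy illustration, not the untwist library's API: the time-frequency transform is a no-overlap FFT, and the two "sources" are synthetic tones switched on and off with different rhythms so that NMF has temporal structure to exploit.

```python
import numpy as np

def stft(x, n_fft=256):
    frames = x[:len(x) // n_fft * n_fft].reshape(-1, n_fft)
    return np.fft.rfft(frames, axis=1)           # frames x bins, complex

def istft(spec, n_fft=256):
    return np.fft.irfft(spec, n=n_fft, axis=1).ravel()

def nmf(V, k, iters=200, seed=0):
    """Non-negative matrix factorisation V ~ W @ H, multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + 0.1
    H = rng.random((k, V.shape[1])) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Toy mixture: two tones with different on/off rhythms.
sr = 8000
t = np.arange(2 * sr) / sr
voice = (np.sin(2 * np.pi * 2 * t) > 0) * np.sin(2 * np.pi * 300 * t)
music = (np.cos(2 * np.pi * 3 * t) > 0) * np.sin(2 * np.pi * 1200 * t)
mix = voice + music

S = stft(mix)                     # time-frequency transform
V = np.abs(S).T                   # magnitude spectrogram (freq x time)
W, H = nmf(V, k=2)                # one NMF component per source
sources = []
for i in range(2):
    # Soft time-frequency mask for component i, applied to the complex
    # STFT, then converted back to the time domain.
    mask = np.outer(W[:, i], H[i]) / (W @ H + 1e-9)
    sources.append(istft(mask.T * S))
```

Each recovered source should be dominated by one of the two tones; a production system would use an overlapping, windowed STFT and a more robust separation model.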
Case Study – https://github.com/IoSR-Surrey/untwist
6. Beat Tracking
As the name suggests, the goal here is to track the location of each beat in a collection of audio files. Beat
tracking can be utilized to automate time-consuming tasks that must be completed in order to synchronize
events with music. It is useful in various applications, such as video editing, audio editing, and human-
computer improvisation.
Whitepaper – https://www.audiolabs-erlangen.de/content/05-fau/professor/00-mueller/01-
students/2012_GroschePeter_MusicSignalProcessing_PhD-Thesis.pdf
One approach to beat tracking is to parse the audio file and use an onset detection algorithm to track the
beats. Although the techniques used for onset detection rely heavily on audio feature engineering and
machine learning, deep learning can also be used here to improve the results.
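As a hedged, numpy-only sketch of that idea (mine, not BTrack's algorithm): compute frame energies, take the positive energy changes as an onset-strength signal, and pick its peaks as beats on a synthetic click track.

```python
import numpy as np

def beat_times(signal, sr, frame=256):
    """Energy-novelty beat tracker: frame energies, positive changes as
    onset strength, then simple thresholded peak picking."""
    n_frames = len(signal) // frame
    energy = (signal[:n_frames * frame].reshape(-1, frame) ** 2).sum(axis=1)
    novelty = np.maximum(0, np.diff(energy))
    thresh = 0.3 * novelty.max()
    beats = [i for i in range(1, len(novelty) - 1)
             if novelty[i] > thresh
             and novelty[i] >= novelty[i - 1]
             and novelty[i] >= novelty[i + 1]]
    # novelty[i] measures the rise into frame i + 1.
    return [(b + 1) * frame / sr for b in beats]

# A synthetic click track: 25 ms bursts every 0.5 s (120 BPM).
sr = 8000
signal = np.zeros(4 * sr)
for k in range(1, 8):
    start = int(k * 0.5 * sr)
    signal[start:start + 200] = np.sin(2 * np.pi * 1000 * np.arange(200) / sr)

print([round(b, 2) for b in beat_times(signal, sr)])
```

Real beat trackers add tempo estimation and dynamic programming on top of the onset-strength signal so that beats stay evenly spaced even when individual onsets are weak.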
Case Study – https://github.com/adamstark/BTrack
7. Music Recommendation
Thanks to the internet, we now have millions of songs we can listen to anytime. Ironically, this has made it
even harder to discover new music because of the plethora of options out there. Music recommendation
systems help deal with this information overload by automatically recommending new music to listeners.
Content providers like Spotify and Saavn have developed highly sophisticated music recommendation
engines. These models leverage the user’s past listening history among many other features to build
customized recommendation lists.
Whitepaper – https://pdfs.semanticscholar.org/7442/c1ebd6c9ceafa8979f683c5b1584d659b728.pdf
We can tackle the challenge of modelling listening preferences by training a regression or deep learning
model to predict the latent representations of songs obtained from a collaborative filtering model. This
way, we can predict the representation of a song in the collaborative filtering space even if no usage data
is available.
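A compact sketch of that cold-start trick, on entirely simulated data (plain least squares stands in for the deep regression model, and random vectors stand in for real audio features and collaborative-filtering factors):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated setup: 50 songs with listening data. Collaborative filtering
# has assigned each song a latent vector; we also have audio features.
n_songs, n_feat, n_latent = 50, 10, 3
audio = rng.normal(size=(n_songs, n_feat))
true_map = rng.normal(size=(n_feat, n_latent))
latent = audio @ true_map + 0.01 * rng.normal(size=(n_songs, n_latent))

# Learn the audio -> latent mapping (least squares standing in for a
# deep network trained on the same regression target).
M, *_ = np.linalg.lstsq(audio, latent, rcond=None)

# A brand-new song with no listening history: predict its position in
# the collaborative-filtering space from audio alone.
new_song = rng.normal(size=n_feat)
new_latent = new_song @ M

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Recommend the catalogue song closest to it in latent space.
best = max(range(n_songs), key=lambda i: cosine(latent[i], new_latent))
print("most similar song:", best)
```

The point of the design is exactly the cold-start case: a song nobody has played yet still lands somewhere sensible in the collaborative-filtering space.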
Case Study – http://benanne.github.io/2014/08/05/spotify-cnns.html
8. Music Retrieval
One of the most difficult tasks in audio processing, music retrieval essentially aims to build a search
engine based on audio. Although we can approach this by solving sub-tasks like audio fingerprinting, the
task encompasses much more than that. For example, different types of music retrieval require solving
different smaller tasks (timbre detection, say, for identifying a singer's gender). Currently, no system has
been developed that matches industry-expected standards.
Whitepaper – http://www.nowpublishers.com/article/Details/INR-042
The task of music retrieval is usually divided into smaller and simpler steps, which include tonal analysis
(e.g. melody and harmony) and rhythm or tempo analysis (e.g. beat tracking). On the basis of these
individual analyses, information is extracted and used to retrieve similar audio samples.
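A toy sketch of the retrieve-by-similarity idea (my own illustration, with a crude spectral descriptor standing in for proper tonal and rhythmic analysis, and synthetic tones standing in for a music catalogue):

```python
import numpy as np

def descriptor(signal, n_fft=512):
    """A tiny audio descriptor: the normalised average magnitude
    spectrum, a crude stand-in for tonal/rhythmic analysis."""
    frames = signal[:len(signal) // n_fft * n_fft].reshape(-1, n_fft)
    spec = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    return spec / np.linalg.norm(spec)

def retrieve(query, catalogue):
    """Return the catalogue entry most similar to the query clip."""
    q = descriptor(query)
    scores = {name: float(q @ descriptor(clip))
              for name, clip in catalogue.items()}
    return max(scores, key=scores.get)

sr = 8000
t = np.arange(2 * sr) / sr
catalogue = {
    "tone_low": np.sin(2 * np.pi * 220 * t),
    "tone_mid": np.sin(2 * np.pi * 440 * t),
    "tone_high": np.sin(2 * np.pi * 1760 * t),
}
query = np.sin(2 * np.pi * 230 * t)   # slightly detuned low tone
print(retrieve(query, catalogue))      # -> tone_low
```

A real retrieval system replaces the single descriptor with several (melody, harmony, tempo, timbre) and indexes them so similarity search scales to millions of tracks.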
Case Study – https://youtu.be/oGGVvTgHMHw
9. Music Transcription
Music transcription is another challenging audio processing task. It involves annotating audio and creating
a kind of "sheet" from which the music can be regenerated at a later point in time. The manual effort
involved in transcribing music from recordings can be vast; it varies enormously depending on the
complexity of the music, how good our listening skills are and how detailed we want our transcription to
be.
Whitepaper – http://ieeexplore.ieee.org/abstract/document/7955698
The approach to music transcription is similar to that of speech recognition, except that instead of words,
the audio is transcribed into the musical notes played by the instruments.
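The core building block, pitch-to-note mapping, can be sketched without any transcription framework. Below is my own minimal illustration: autocorrelation estimates the fundamental frequency of a synthetic tone, which is then mapped to the nearest note name (real transcription systems additionally handle polyphony, timing and dynamics):

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_hz(signal, sr, fmin=80, fmax=1000):
    """Estimate the fundamental frequency via autocorrelation."""
    corr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(corr[lo:hi])     # strongest repetition period
    return sr / lag

def hz_to_note(freq):
    """Map a frequency to the nearest note name (A4 = 440 Hz, MIDI 69)."""
    midi = int(round(69 + 12 * np.log2(freq / 440.0)))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

sr = 8000
t = np.arange(int(0.25 * sr)) / sr
note = np.sin(2 * np.pi * 440 * t)        # a quarter second of A4
print(hz_to_note(pitch_hz(note, sr)))      # -> A4
```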
Case Study – https://youtu.be/9boJ-Ai6QFM
10. Onset Detection
Onset detection is the first step in analysing an audio/music sequence. For most of the tasks mentioned
above, it is necessary to perform onset detection first, i.e. to detect the start of an audio event. It was also
essentially the first task that researchers set out to solve in audio processing.
Whitepaper – http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.332.989&rep=rep1&type=pdf
Onset detection is typically done by:
computing a spectral novelty function
finding peaks in the spectral novelty function
backtracking from each peak to a preceding local minimum. Backtracking can be useful for finding
segmentation points such that the onset occurs shortly after the beginning of the segment
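The three steps above can be sketched with plain numpy. This is a toy illustration of my own rather than any library's implementation, run on a synthetic signal with two abrupt tone entries:

```python
import numpy as np

def onsets(signal, n_fft=256):
    """Spectral-flux onset detector following the three steps above."""
    frames = signal[:len(signal) // n_fft * n_fft].reshape(-1, n_fft)
    spec = np.abs(np.fft.rfft(frames, axis=1))
    # 1. Spectral novelty: sum of positive spectral change per frame step.
    flux = np.maximum(0, np.diff(spec, axis=0)).sum(axis=1)
    # 2. Peak picking: local maxima above a fraction of the global maximum.
    peaks = [i for i in range(1, len(flux) - 1)
             if flux[i] > 0.3 * flux.max()
             and flux[i] >= flux[i - 1] and flux[i] >= flux[i + 1]]
    # 3. Backtracking: walk left from each peak to the preceding local
    #    minimum, a convenient segmentation point just before the onset.
    result = []
    for p in peaks:
        while p > 0 and flux[p - 1] < flux[p]:
            p -= 1
        result.append(p)
    return result

# Two events: a 500 Hz tone entering at sample 2000 and a 1250 Hz tone
# joining at sample 6000 (both frequencies frame-aligned, so steady
# frames produce zero flux).
sr = 8000
signal = np.zeros(sr)
signal[2000:] += np.sin(2 * np.pi * 500 * np.arange(2000, sr) / sr)
signal[6000:] += np.sin(2 * np.pi * 1250 * np.arange(6000, sr) / sr)
print(onsets(signal))   # frame indices just before the two onsets
```

Production implementations compute the novelty on a windowed, log-compressed spectrogram and use adaptive thresholds, but the novelty/peak-pick/backtrack skeleton is the same.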
Case Study – https://musicinformationretrieval.com/onset_detection.html
End Notes
In this article, I have mentioned a few tasks that can be looked at when solving audio processing
problems. I hope you find the article insightful in dealing with audio/speech related projects.
Article Url - https://www.analyticsvidhya.com/blog/2018/01/10-audio-processing-projects-applications/
Faizan Shaikh
Faizan is a Data Science enthusiast and a Deep learning rookie. A recent Comp. Sc. undergrad, he aims
to utilize his skills to push the boundaries of AI research.