0% found this document useful (0 votes)
72 views17 pages

Spoken Language Processing in Python Chapter1

This document introduces audio data processing in Python. It discusses different audio file formats like mp3 and wav, and how audio is measured in frequency (kHz). It then demonstrates how to open an audio file in Python, convert the soundwave bytes to integers, find the frame rate and timestamps. Finally, it shows how to visualize two sound waves on a single plot to compare them.

Uploaded by

Fgpeqw
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
Download as pdf or txt
0% found this document useful (0 votes)
72 views17 pages

Spoken Language Processing in Python Chapter1

This document introduces audio data processing in Python. It discusses different audio file formats like mp3 and wav, and how audio is measured in frequency (kHz). It then demonstrates how to open an audio file in Python, convert the soundwave bytes to integers, find the frame rate and timestamps. Finally, it shows how to visualize two sound waves on a single plot to compare them.

Uploaded by

Fgpeqw
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
Download as pdf or txt
Download as pdf or txt
You are on page 1/ 17

Introduction to

audio data in Python


S P OK EN LAN GUAGE P ROCES S IN G IN P YTH ON

Daniel Bourke
Machine Learning Engineer/YouTube
Creator
Dealing with audio les in Python
Different kinds all of audio les
mp3

wav

m4a

ac

Digital sounds measured in frequency (kHz)


1 kHz = 1000 pieces of information per second

SPOKEN LANGUAGE PROCESSING IN PYTHON


Frequency examples
Streaming songs have a frequency of 32 kHz

Audiobooks and spoken language are between 8 and 16 kHz

We can't see audio les so we have to transform them rst

import wave

SPOKEN LANGUAGE PROCESSING IN PYTHON


Opening an audio le in Python
Audio le saved as good-morning.wav

# Import audio file as wave object


good_morning = wave.open("good-morning.wav", "r")

# Convert wave object to bytes


good_morning_soundwave = good_morning.readframes(-1)

# View the wav file in byte form


good_morning_soundwave

b'\xfd\xff\xfb\xff\xf8\xff\xf8\xff\xf7\...

SPOKEN LANGUAGE PROCESSING IN PYTHON


Working with audio is different
Have to convert the audio to something useful

Small sample of audio = large amount of information

SPOKEN LANGUAGE PROCESSING IN PYTHON


Let's practice!
S P OK EN LAN GUAGE P ROCES S IN G IN P YTH ON
Converting sound
wave bytes to
integers
S P OK EN LAN GUAGE P ROCES S IN G IN P YTH ON

Daniel Bourke
Machine Learning Engineer/YouTube
Creator
Converting bytes to integers
Can't use bytes

Convert bytes to integers using numpy

import numpy as np

# Convert soundwave_gm from bytes to integers


signal_gm = np.frombuffer(soundwave_gm, dtype='int16')

# Show the first 10 items


signal_gm[:10]

array([ -3, -5, -8, -8, -9, -13, -8, -10, -9, -11], dtype=int16)

SPOKEN LANGUAGE PROCESSING IN PYTHON


Finding the frame rate
Frequency (Hz) = length of wave object array/duration of audio le (seconds)

# Get the frame rate


framerate_gm = good_morning.getframerate()

# Show the frame rate


framerate_gm

48,000

Duration of audio le (seconds) = length of wave object array/frequency (Hz)

SPOKEN LANGUAGE PROCESSING IN PYTHON


Finding sound wave timestamps
# Return evenly spaced values between start and stop
np.linspace(start=1, stop=10, num=10)

array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])

# Get the timestamps of the good morning sound wave


time_gm = np.linspace(start=0,
stop=len(soundwave_gm)/framerate_gm,
num=len(soundwave_gm))

SPOKEN LANGUAGE PROCESSING IN PYTHON


Finding sound wave timestamps
# View first 10 time stamps of good morning sound wave
time_gm[:10]

array([0.00000000e+00, 2.08334167e-05, 4.16668333e-05, 6.25002500e-05,


8.33336667e-05, 1.04167083e-04, 1.25000500e-04, 1.45833917e-04,
1.66667333e-04, 1.87500750e-04])

SPOKEN LANGUAGE PROCESSING IN PYTHON


Let's practice!
S P OK EN LAN GUAGE P ROCES S IN G IN P YTH ON
Visualizing sound
waves
S P OK EN LAN GUAGE P ROCES S IN G IN P YTH ON

Daniel Bourke
Machine Learning Engineer/YouTube
Creator
Adding another sound wave
New audio le: good_afternoon.wav

Both are 48 kHz

Same data transformations to all audio les

SPOKEN LANGUAGE PROCESSING IN PYTHON


Setting up a plot
import matplotlib.pyplot as plt

# Initialize figure and setup title


plt.title("Good Afternoon vs. Good Morning")

# x and y axis labels


plt.xlabel("Time (seconds)")
plt.ylabel("Amplitude")

# Add good morning and good afternoon values


plt.plot(time_ga, soundwave_ga, label ="Good Afternoon")
plt.plot(time_gm, soundwave_gm, label="Good Morning",
alpha=0.5)

# Create a legend and show our plot


plt.legend()
plt.show()

SPOKEN LANGUAGE PROCESSING IN PYTHON


SPOKEN LANGUAGE PROCESSING IN PYTHON
Time to visualize!
S P OK EN LAN GUAGE P ROCES S IN G IN P YTH ON

You might also like