Chapter 1 Introduction

Uploaded by

fmlomat

Speech Processing

Course Code : CS300

Course Overview

Course Specification

Course Plan
Chapter 1
Introduction
Simple Periodic Waves (sine waves)
• Characterized by:
• period: T
• amplitude: A
• phase: φ
• Fundamental frequency in cycles per second, or Hz
• F0 = 1/T
[Figure: sine wave, amplitude from –0.99 to 0.99, time axis 0 to 0.02 s, with 1 cycle marked]
Simple periodic waves
• Computing the frequency of a wave:
• 5 cycles in .5 seconds = 10 cycles/second = 10 Hz
• Amplitude:
• 1
• Equation:
• Y = A sin(2πft)
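The equation and the cycle-counting rule above can be tried out numerically. The following is a minimal sketch, assuming NumPy is available; the function names and the 1000 Hz sample rate are illustrative choices, not part of the slides.

```python
import numpy as np

def sine_wave(amplitude, freq_hz, duration_s, sample_rate=1000, phase=0.0):
    """Return (t, y) with y = A * sin(2*pi*f*t + phase)."""
    n = int(round(duration_s * sample_rate))
    t = np.arange(n) / sample_rate
    return t, amplitude * np.sin(2 * np.pi * freq_hz * t + phase)

def frequency_from_cycles(cycles, seconds):
    """Frequency in Hz = number of cycles divided by elapsed seconds."""
    return cycles / seconds

# The slide's example: 5 cycles in 0.5 seconds, amplitude 1.
f0 = frequency_from_cycles(5, 0.5)   # 10.0 Hz
period = 1.0 / f0                    # T = 1/F0 = 0.1 s
t, y = sine_wave(amplitude=1.0, freq_hz=f0, duration_s=0.5)
```

Plotting `y` against `t` should show five full cycles with peaks at amplitude 1.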
Speech sound waves

A little piece from the waveform of the vowel [iy]

Y axis:
• Amplitude = amount of air pressure at that time point
• Positive is compression
• Zero is normal air pressure
• Negative is rarefaction
Digitizing Speech
Analog-to-digital conversion, or A/D conversion.
Three steps:
• Sampling
• Quantization
• Coding
[Figure: analog-to-digital converter: Mic → Sampler → Quantizer → Encoder]
Sampling
• Measuring the amplitude of the signal at time t
• The sampling rate needs to provide at least two samples for each cycle
• Roughly speaking, one for the positive and one for the negative half of each cycle.
• More than two samples per cycle is OK
• Fewer than two samples per cycle will cause frequencies to be missed
• So the maximum frequency that can be measured is half the sampling rate.
• The maximum frequency for a given sampling rate is called the Nyquist frequency
Sampling
[Figure: original signal in red, sample points as green dots]
If we measure only at the green dots, we will see a lower-frequency wave and miss the correct higher-frequency one!
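The effect described here (aliasing) can be reproduced numerically. This is a small sketch, assuming NumPy; the 8,000 Hz sample rate and the 3,000 Hz and 5,000 Hz test tones are illustrative values, and the helper names are hypothetical.

```python
import numpy as np

def sample_sine(freq_hz, sample_rate, duration_s=1.0):
    """Sample a unit-amplitude sine wave at the given rate."""
    n = int(round(sample_rate * duration_s))
    t = np.arange(n) / sample_rate
    return np.sin(2 * np.pi * freq_hz * t)

def dominant_frequency(signal, sample_rate):
    """Frequency (Hz) of the largest peak in the magnitude spectrum."""
    magnitude = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(freqs[np.argmax(magnitude)])

fs = 8000  # Nyquist frequency is fs / 2 = 4000 Hz

# 3000 Hz is below the Nyquist frequency, so it is measured correctly.
below_nyquist = dominant_frequency(sample_sine(3000, fs), fs)

# 5000 Hz is above it: the samples look like a lower-frequency wave
# at fs - 5000 = 3000 Hz.
above_nyquist = dominant_frequency(sample_sine(5000, fs), fs)
```

Both calls report 3000 Hz, which is why the sampled data alone cannot tell the two tones apart.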
Sampling
In practice, then, we use the following sample rates:
• 16,000 Hz (samples/sec): microphone (“wideband”)
• 8,000 Hz (samples/sec): telephone
Why?
• Need at least 2 samples per cycle
• Max measurable frequency is half the sampling rate
• Human speech < 10,000 Hz, so need at most 20K
• Telephone speech is filtered at 4K, so 8K is enough
Sampling Theorem:
The sampling frequency must be at least 2 × the maximum frequency of the signal:

fs ≥ 2fm
where fs is the sampling frequency
and fm is the maximum frequency of the signal to be sampled.
Quantization
Definition:
“Representing the real value of each amplitude as an integer”

8-bit (-128 to 127) or 16-bit (-32768 to 32767)
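As a rough illustration of this definition, the sketch below maps real-valued amplitudes in [-1, 1] to signed integers of a given bit depth. It assumes NumPy and simple linear (uniform) rounding; the function name and the scaling convention are assumptions, and real codecs may scale differently.

```python
import numpy as np

def quantize(samples, bits):
    """Linearly quantize amplitudes in [-1, 1] to signed integers.

    For 16 bits the result lies in -32768..32767, for 8 bits in -128..127.
    """
    levels = 2 ** (bits - 1)                       # 32768 for 16-bit
    scaled = np.round(np.asarray(samples, dtype=float) * (levels - 1))
    # Clip so that out-of-range input cannot overflow the integer range.
    return np.clip(scaled, -levels, levels - 1).astype(np.int32)

pcm16 = quantize([1.0, -1.0, 0.0], bits=16)
pcm8 = quantize([1.0, -1.0, 2.0], bits=8)   # 2.0 is out of range and is clipped
```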

Formats:
• 16-bit PCM (Pulse Code Modulation)
• 8-bit log compression
Headers:
• Raw (no header)
• Microsoft: filename.wav
• Sun: filename.au
[Figure: WAV format, 40-byte header]
Fundamental frequency
[Figure: waveform of the vowel [iy] (10 reps in .03875 secs)]
Frequency: repetitions/second of a wave
• Above vowel has 10 repetitions in .03875 secs
• So freq is 10/.03875 = 258 Hz
• This is the rate at which the vocal folds vibrate, hence voicing
• Each peak corresponds to an opening of the vocal folds
• The frequency of the complex wave is called the fundamental frequency of the wave, or F0
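Counting repetitions by eye can also be automated. The sketch below uses autocorrelation, which is one common way to estimate F0 and is not taken from the slides. It assumes NumPy; the synthetic test signal (period of 62 samples at 16,000 Hz, i.e. 10 repetitions in .03875 s) and the 50 to 500 Hz search range are illustrative assumptions.

```python
import numpy as np

def estimate_f0(signal, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate F0 as sample_rate / lag of the strongest autocorrelation peak."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - np.mean(signal)
    # Keep non-negative lags only: index 0 is lag 0.
    corr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag = min_lag + int(np.argmax(corr[min_lag:max_lag + 1]))
    return sample_rate / best_lag

fs = 16000
n = np.arange(620)                        # 620 samples = .03875 s
wave = (np.sin(2 * np.pi * n / 62)        # fundamental, period 62 samples
        + 0.5 * np.sin(4 * np.pi * n / 62))  # plus one harmonic
f0 = estimate_f0(wave, fs)                # close to 10 / .03875 = 258 Hz
```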
Amplitude
• We need a way to talk about the amplitude of a region of a signal (frame) over time.
• We can’t just average all the values. Why not?
Because the average ≈ zero
• So we often talk about the Root Mean Square (RMS) amplitude
$$A_{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x[i]^2}$$
“The square Root of the Mean of the Squares of the samples”
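A short sketch of why the plain average fails and RMS does not, assuming NumPy; the 10 Hz test tone and the function name are illustrative.

```python
import numpy as np

def rms_amplitude(frame):
    """Square root of the mean of the squares of the samples."""
    frame = np.asarray(frame, dtype=float)
    return float(np.sqrt(np.mean(frame ** 2)))

# One second of a 10 Hz sine wave with amplitude 1 (10 whole cycles).
t = np.arange(1000) / 1000
frame = np.sin(2 * np.pi * 10 * t)

plain_average = float(np.mean(frame))   # positive and negative halves cancel
rms = rms_amplitude(frame)              # about 0.707 for a unit sine wave
```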
Power and Intensity
Power: related to square of amplitude
$$\text{Power} = \frac{1}{N}\sum_{i=1}^{N} x[i]^2$$
Intensity in air: power normalized to the auditory threshold, given in dB.
P0 is the auditory threshold pressure = 2 × 10^-5 Pa
$$\text{Intensity} = 10\log_{10}\left(\frac{\text{Power}}{P_0}\right) = 10\log_{10}\left(\frac{1}{N P_0}\sum_{i=1}^{N} x[i]^2\right)$$
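The two formulas on this slide translate directly into code. This is a minimal sketch that follows the slide's definition literally (power divided by P0); it assumes NumPy, and the function names and the constant test frame are illustrative.

```python
import numpy as np

P0 = 2e-5  # auditory threshold value given on the slide

def power(x):
    """Mean of the squared samples."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x ** 2))

def intensity_db(x, p0=P0):
    """10 * log10(power / P0), as defined on the slide."""
    return float(10 * np.log10(power(x) / p0))

# A frame whose power is exactly 100 * P0 should come out at 20 dB.
frame = np.full(100, np.sqrt(100 * P0))
level = intensity_db(frame)
```

Because the scale is logarithmic, multiplying the power by 10 adds 10 dB.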
Plot of Intensity
Pitch and Loudness
• Pitch is the mental sensation, or perceptual correlate, of F0.
• The relationship between pitch and F0 is not linear; human pitch perception is most accurate between 100 Hz and 1000 Hz.
Linear in this range
Logarithmic above 1000 Hz
• The mel scale is one model of this F0-pitch mapping.
• A mel is a unit of pitch defined so that pairs of sounds which are perceptually equidistant in pitch are separated by an equal number of mels

Frequency in mels = 1127 ln (1 + f/700)
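The mel formula can be written as a pair of conversion functions. The forward function is the slide's formula; the inverse is derived from it algebraically and is not stated on the slide. Function names are illustrative.

```python
import math

def hz_to_mel(f_hz):
    """Frequency in mels = 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + f_hz / 700.0)

def mel_to_hz(mels):
    """Inverse of hz_to_mel, obtained by solving the formula for f."""
    return 700.0 * (math.exp(mels / 1127.0) - 1.0)

mel_1000 = hz_to_mel(1000.0)   # very close to 1000 mels
mel_4000 = hz_to_mel(4000.0)   # far fewer than 4000 mels: compression above 1000 Hz
```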

Pitch track
Pitch
RETONE: manipulate pitch contour.
Record some speech and listen to what happens when you adjust its pitch contour.
She just had a baby

• Note that vowels all have regular amplitude peaks

• Stop consonant
Closure followed by release
Notice the silence followed by slight bursts of emphasis: very clear for
[b] of “baby”
• Fricative: noisy. [sh] of “she” at beginning
Fricative
Waves have different frequencies
[Figure: two sine waves, 100 Hz and 1000 Hz; amplitude from –0.99 to 0.99, time axis 0 to 0.02 s]
Complex waves: adding a 100 Hz and a 1000 Hz wave together
[Figure: sum of the 100 Hz and 1000 Hz waves; time axis 0 to 0.05 s]

The Discrete Fourier Transform (DFT)
If x(n) is absolutely summable, that is, $\sum_{n=-\infty}^{\infty} |x(n)| < \infty$, then
$$X(e^{j\omega}) = \mathcal{F}[x(n)] = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n}$$
Notes:
• X(e^{jω}) is a complex-valued continuous function
• ω = 2πf [rad/sec]
• f is the digital frequency measured in [C/S]

Spectrum Analysis (Cont.)
$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n} = \sum_{n=-\infty}^{\infty} x(n)\left[\cos(\omega n) - j\sin(\omega n)\right] = \sum_{n=-\infty}^{\infty} x(n)\cos(\omega n) - j\sum_{n=-\infty}^{\infty} x(n)\sin(\omega n)$$
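The cosine and sine form of the transform can be checked against a library FFT for a finite sequence. This is a small sketch, assuming NumPy; the four-sample sequence and the function name `transform_at` are illustrative, and the sum runs only over the samples that are present.

```python
import numpy as np

def transform_at(x, omega):
    """Evaluate sum x(n) cos(omega n) - j sum x(n) sin(omega n) for finite x."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    real_part = np.sum(x * np.cos(omega * n))
    imag_part = -np.sum(x * np.sin(omega * n))
    return complex(real_part, imag_part)

x = np.array([1.0, 2.0, 3.0, 4.0])
N = len(x)

# Sampling the continuous function at omega = 2*pi*k/N gives the N-point DFT.
via_sums = np.array([transform_at(x, 2 * np.pi * k / N) for k in range(N)])
via_fft = np.fft.fft(x)
```

The two arrays agree up to floating-point rounding, with `via_sums[0]` equal to the plain sum of the samples.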

ESynth - Mark Huckvale - University College London (speechandhearing.net)
Spectrum
[Figure: spectrum of the combined wave; x-axis: frequency in Hz, with components at 100 Hz and 1000 Hz; y-axis: amplitude]

Fourier analysis:
any wave can be represented as the
(infinite) sum of sine waves of different
frequencies (amplitude, phase)
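To see Fourier analysis recover the components of a complex wave, the sketch below adds a 100 Hz and a 1000 Hz sine wave and reads the two largest peaks off the magnitude spectrum. It assumes NumPy; the 8,000 Hz sample rate and one-second duration are illustrative choices.

```python
import numpy as np

fs = 8000                       # samples per second
t = np.arange(fs) / fs          # one second of time points

# Complex wave: sum of a 100 Hz and a 1000 Hz sine wave.
wave = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 1000 * t)

# Magnitude spectrum and the frequency (Hz) of each bin.
magnitude = np.abs(np.fft.rfft(wave))
freqs = np.fft.rfftfreq(len(wave), d=1.0 / fs)

# The two largest peaks are the two original components.
top_two = sorted(float(f) for f in freqs[np.argsort(magnitude)[-2:]])
```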

Spectrum of one instant in an actual sound wave: many components across the frequency range
[Figure: spectrum; x-axis: frequency (Hz), 0 to 5000]
Part of [ae] waveform from “had”

• Note the complex wave repeating nine times in the figure
• Plus a smaller wave which repeats 4 times for every large pattern
• The large wave has a frequency of 250 Hz (9 times in .036 seconds)
• The small wave is roughly 4 times this, or roughly 1000 Hz
• Two little tiny waves on top of each peak of the 1000 Hz wave
Back to spectrum
The spectrum represents these frequency components. It is computed by the Fourier transform, an algorithm which separates out each frequency component of a wave.

The x-axis shows frequency; the y-axis shows magnitude (in decibels, a log measure of amplitude).
Peaks at 930 Hz, 1860 Hz, and 3020 Hz.
Spectrogram: spectrum + time dimension
Note that the grey level represents the amplitude or energy.

Seeing formants: the spectrogram
[Figure: spectrogram with the first formant (F1), second formant (F2), and third formant (F3) labeled]
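A spectrogram can be sketched as a sequence of spectra, one per short frame of the signal. The code below is a minimal illustration, assuming NumPy; the frame length, hop size, Hann window, and two-tone test signal are all illustrative choices rather than anything specified in the slides.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Return a (frames x frequency bins) array of magnitudes in dB."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(magnitude + 1e-10)   # small offset avoids log(0)

fs = 8000
t = np.arange(fs) / fs
# 500 Hz for the first half second, then 1500 Hz.
signal = np.where(t < 0.5,
                  np.sin(2 * np.pi * 500 * t),
                  np.sin(2 * np.pi * 1500 * t))

spec = spectrogram(signal)
freqs = np.fft.rfftfreq(256, d=1.0 / fs)
first_peak = float(freqs[np.argmax(spec[0])])    # strongest bin in first frame
last_peak = float(freqs[np.argmax(spec[-1])])    # strongest bin in last frame
```

Rendering `spec` as a grey-level image (time on one axis, frequency on the other) gives the kind of picture shown on the slide; the energy band jumps from 500 Hz to 1500 Hz halfway through.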

Formants
Vowels are largely distinguished by 2 characteristic pitches (F1 and F2).
One of them (the higher of the two) goes downward throughout the series iy ih eh ae aa ao ou u.
The other goes up for the first four vowels and then down for the next four.
These are called the “formants” of the vowels: the lower is the 1st formant, the higher is the 2nd formant.
Different vowels have different formants
• Vocal tract as “amplifier”: amplifies different frequencies
• Formants are the result of different shapes of the vocal tract.
• Any body of air will vibrate in a way that depends on its size and shape.
• Air in the vocal tract is set in vibration by the action of the vocal cords.
• Every time the vocal cords open and close, a pulse of air from the lungs acts like a sharp tap on the air in the vocal tract,
• setting the resonating cavities into vibration so that they produce a number of different frequencies.

Again: why is a speech sound wave composed of these peaks?

Articulatory facts:
1. The vocal cord vibrations create harmonics
2. The mouth is an amplifier
3. Depending on shape of mouth, some harmonics are
amplified more than others
How Formants are produced
• Q: Why do vowels have different pitches if the vocal cords are vibrating at the same rate?

• A: This is a confusion of frequencies of SOURCE and frequencies of FILTER!

[Figure: Source (vocal cords; fundamental frequency F0) → Filter (vocal tract; formants F1, F2, F3) → Speech]
Source-filter model of speech production
[Figure: Input (glottal spectrum, the source) → Filter (vocal tract frequency response function) → Output]

Glottis: the vocal cords and the opening between them

Source and filter are independent, so:

• Different vowels can have the same pitch: when the vocal cords vibrate at the same rate (same source) while the vocal tract shape (filter) differs.
• The same vowel can have different pitch: e.g., different speakers.
Deriving schwa: how shape of mouth (filter function) creates peaks!

Basic facts about sound waves:

f = c/λ
c = speed of sound (approx 35,000 cm/sec)
A sound with λ = 10 meters has low frequency f = 35 Hz (35,000/1000)
A sound with λ = 2 centimeters has high frequency f = 17,500 Hz (35,000/2)
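The two worked examples follow from f = c/λ once the wavelength is expressed in centimeters. A minimal sketch (the function and constant names are illustrative):

```python
SPEED_OF_SOUND_CM_PER_S = 35_000  # approximate value used on the slide

def frequency_hz(wavelength_cm, c=SPEED_OF_SOUND_CM_PER_S):
    """f = c / wavelength, with both in centimeter-based units."""
    return c / wavelength_cm

low = frequency_hz(10 * 100)   # 10 meters = 1000 cm
high = frequency_hz(2)         # 2 centimeters
```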
Resonances of the vocal tract
• The human vocal tract as an open tube
[Figure: tube with a closed end and an open end, length 17.5 cm]
• Air in a tube of a given length will tend to vibrate at the resonance frequency of the tube.
• If the vocal tract is a cylindrical tube open at one end:
• Standing waves form in tubes
• Waves will resonate if their wavelength corresponds to the dimensions of the tube
• Constraint: pressure differential should be maximal at the (closed) glottal end and minimal at the (open) lip end.
• The next slide shows what lengths of waves can fit into a tube with this constraint
[Figure: standing waves in the tube; max energy at the closed end, min energy at the open end]
Computing the 3 formants of schwa
Let the length of the tube be L

F1 = c/λ1 = c/(4L) = 35,000/(4 × 17.5) = 500 Hz
F2 = c/λ2 = c/(4L/3) = 3c/(4L) = 3 × 35,000/(4 × 17.5) = 1500 Hz
F3 = c/λ3 = c/(4L/5) = 5c/(4L) = 5 × 35,000/(4 × 17.5) = 2500 Hz

So we expect a neutral vowel to have 3 resonances at 500, 1500, and 2500 Hz

These vowel resonances are called formants
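The three values follow one pattern: the k-th resonance of a tube closed at one end is (2k − 1)·c/(4L). A minimal sketch of that calculation, using the slide's values for c and L (the function name is illustrative):

```python
def tube_resonances(length_cm, count=3, c=35_000.0):
    """Resonances (Hz) of a tube closed at one end and open at the other.

    The k-th resonance has wavelength 4L / (2k - 1), so F_k = (2k - 1) * c / (4L).
    """
    return [(2 * k - 1) * c / (4 * length_cm) for k in range(1, count + 1)]

schwa_formants = tube_resonances(17.5)   # F1, F2, F3 for a 17.5 cm vocal tract
```

A shorter tube gives higher resonances: `tube_resonances(14.0)` starts at 625 Hz instead of 500 Hz.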

Vowel [i] sung at successively higher pitch.
[Figure: seven panels, numbered 1 to 7]
Vocal Tract Simulation
VTDEMO: vocal tract synthesizer
Parameters:
Time: Total time
ms: Segment Duration
JW: Jaw Position
TP: Tongue Position
TS: Tongue Shape
TA: Tongue Expansion
LA: Lip Aperture
LP: Lip Protrusion
LH: Larynx Height
GA: Glottal Aperture
FX: Fundamental Frequency
NS: Velo-pharyngeal port opening

How to read spectrograms

• bab: closure of the lips lowers all formants, so there is a rapid increase in all formants at the beginning of “bab”
• dad: first formant increases, but F2 and F3 fall slightly
• gag: F2 and F3 come together: this is a characteristic of velars. Formant transitions take longer in velars than in alveolars or labials
She came back and started again

1. lots of high-freq energy

3. closure for k
4. burst of aspiration for k
5. ey vowel; faint 1100 Hz formant is nasalization
6. bilabial nasal
7. short b closure, voicing barely visible.
8. ae; note upward transitions after bilabial stop at beginning
9. note F2 and F3 coming together for "k"
Phonetic Resources
Phonetic dictionaries
CMU dict
CELEX
Phonetically transcribed corpora
TIMIT
Switchboard
TIMIT
Read speech corpus, time aligned

Switchboard
Spontaneous speech corpus
Telephone conversations between strangers
“They’re kind of in between right now” Time alignments
Summary
Acoustic Phonetics
Waves, sound waves, and spectra
Speech waveforms
F0, pitch, intensity
Spectra
Spectrograms
Formants
Reading spectrograms
Deriving schwa: why are formants where they are
PRAAT
Resources: dictionaries and phonetically-labeled corpora.
Examples

pad

bad

spat
Useful Textbooks
Useful Textbooks (Cont.)
Software Resources
• Snack Speech Toolkit
– http://speech.kth.se/snack/
• OGI Speech Toolkit
• University of Colorado SONIC recognizer
– http://cslr.colorado.edu
• Cambridge Hidden Markov Model Toolkit (HTK)
• CMU Sphinx-II Speech Recognizer
• NIST Speech Recognition Scoring Utilities
• SRI Language Model Toolkit
• CMU / Cambridge Language Model Toolkit
Literature Resources
Conference Proceedings
• International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
• International Conference on Spoken Language Processing (ICSLP)
• Eurospeech
Journal Publications
• Speech Communication
• IEEE Transactions on Speech and Audio Processing
Useful Website

Internet Institute for Speech and Hearing
