Chapter 1 Introduction
Course Specification
Course Plan
Simple Periodic Waves (sine waves)
• Characterized by:
• period: T
• amplitude: A
• phase: φ
[Figure: simple periodic wave with one cycle marked]
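As a minimal sketch of how period, amplitude, and phase determine a sine wave, the following plain-Python function evaluates x(t) = A sin(2πt/T + φ) at evenly spaced times. The function name, the 8000 Hz sampling rate, and the example values are assumptions chosen for illustration.

```python
import math

def sine_wave(amplitude, period, phase, duration, sample_rate):
    """Return samples of x(t) = A * sin(2*pi*t/T + phase)."""
    n_samples = int(round(duration * sample_rate))
    return [
        amplitude * math.sin(2 * math.pi * (n / sample_rate) / period + phase)
        for n in range(n_samples)
    ]

# A 100 Hz wave has period T = 1/100 s, so 0.02 s spans two cycles.
# The sampling rate of 8000 Hz is an assumed value.
wave = sine_wave(amplitude=0.99, period=0.01, phase=0.0,
                 duration=0.02, sample_rate=8000)
```

With these values each cycle covers 80 samples, and the wave reaches its peak amplitude a quarter of a period after it starts.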
Digitization: three steps
• Sampling
• Quantization
• Coding
• A wave needs at least two samples per cycle to be measured accurately: roughly speaking, one for the positive and one for the negative half of each cycle.
• More than two samples per cycle is fine.
• Fewer than two samples per cycle will cause frequencies to be missed.
• So the maximum frequency that can be measured is half the sampling rate.
• The maximum frequency for a given sampling rate is called the Nyquist frequency.
Sampling
[Figure: original signal in red]
fs ≥ 2 fm
where fs is the sampling frequency and fm is the maximum frequency of the signal to be sampled.
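The sampling condition can be checked numerically. The sketch below samples a 1000 Hz sine at 1500 Hz, which violates fs ≥ 2 fm, and confirms that the resulting samples coincide (up to sign) with those of a 500 Hz sine. The helper names and the chosen rates are assumptions made for illustration.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Sample sin(2*pi*f*t) at t = n / fs for n = 0 .. n_samples-1."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def nyquist_frequency(sample_rate_hz):
    """Highest frequency that can be measured at this sampling rate."""
    return sample_rate_hz / 2

def alias_frequency(freq_hz, sample_rate_hz):
    """Apparent frequency (Hz) of a sinusoid after sampling."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

fs = 1500                                  # too low for a 1000 Hz signal
undersampled = sample_sine(1000, fs, 12)
alias = sample_sine(500, fs, 12)

# The 1000 Hz samples equal the 500 Hz samples with the sign flipped,
# so after sampling the two signals cannot be told apart by frequency.
indistinguishable = all(abs(a + b) < 1e-9
                        for a, b in zip(undersampled, alias))
```

Raising the sampling rate to at least 2000 Hz puts 1000 Hz at or below the Nyquist frequency, and `alias_frequency` then returns the true frequency.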
Quantization
Definition:
“Representing the real value of each amplitude as an integer”
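A minimal sketch of quantization, assuming amplitudes have already been scaled to the range -1.0 to 1.0 and that the target is a signed integer of a given bit depth (16 bits is an assumed default, not a value from the slides):

```python
def quantize(samples, n_bits=16):
    """Map amplitudes in [-1.0, 1.0] to signed integers of n_bits bits."""
    max_int = 2 ** (n_bits - 1) - 1        # 32767 for 16 bits
    quantized = []
    for x in samples:
        x = max(-1.0, min(1.0, x))         # clip out-of-range amplitudes
        quantized.append(int(round(x * max_int)))
    return quantized

levels = quantize([0.0, 0.25, -0.99, 1.0])
```

Fewer bits give coarser integer steps and therefore a larger rounding error per sample.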
\[ \text{Power} = \frac{1}{N} \sum_{i=1}^{N} x[i]^2 \]
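The average power of a sampled signal, the mean of its squared sample values, can be sketched in a few lines. The test signal (one full cycle of a 100 Hz sine sampled at an assumed 8000 Hz) is illustrative only.

```python
import math

def power(samples):
    """Average power: (1/N) * sum of x[i]^2 over the N samples."""
    return sum(x * x for x in samples) / len(samples)

# One full cycle of a sine with amplitude 0.99 (80 samples at 8000 Hz).
cycle = [0.99 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(80)]
cycle_power = power(cycle)
```

For a sine wave measured over whole cycles the result is half the squared amplitude.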
Pitch
[Figure: two sine waves, 100 Hz and 1000 Hz; amplitude (-0.99 to 0.99) against time (0 to 0.02 s)]
Complex waves: adding a 100 Hz and a 1000 Hz wave together
[Figure: the summed waveform; amplitude against time (0 to 0.05 s)]
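A complex wave of this kind can be built by adding two sine waves sample by sample. In the sketch below the sampling rate and the component amplitudes of 0.5 are assumptions, chosen so that the sum stays within ±1.

```python
import math

SAMPLE_RATE = 16000   # assumed sampling rate in Hz

def sine(freq_hz, amplitude, n_samples, sample_rate=SAMPLE_RATE):
    """Sampled sine wave with zero phase."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

n_samples = int(round(0.05 * SAMPLE_RATE))      # 0.05 s of signal
low = sine(100, 0.5, n_samples)
high = sine(1000, 0.5, n_samples)

# The complex wave is the sample-by-sample sum of its components.
complex_wave = [a + b for a, b in zip(low, high)]
```

The sum repeats every 0.01 s, the period of the 100 Hz component, because the 1000 Hz component completes exactly ten cycles in that time.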
\[ X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n} \]
Notes:
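• X(e^{jω}) is a complex-valued, continuous function of ω
• ω = 2πf [rad/sec] is the angular frequency; for a sampled signal it is normalized by the sampling frequency, ω = 2πf/fs [rad/sample]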
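\[
\begin{aligned}
X(e^{j\omega}) &= \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n} \\
&= \sum_{n=-\infty}^{\infty} x[n] \left[ \cos(\omega n) - j \sin(\omega n) \right] \\
&= \sum_{n=-\infty}^{\infty} x[n] \cos(\omega n) \;-\; j \sum_{n=-\infty}^{\infty} x[n] \sin(\omega n)
\end{aligned}
\]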
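For a finite sequence the sum has finitely many terms, so the transform can be evaluated directly at any ω. The sketch below assumes x[n] is zero outside n = 0 .. N-1. It computes the transform twice, once with the complex exponential and once as separate cosine and sine sums, to show that the two forms agree. The function names are hypothetical.

```python
import cmath
import math

def dtft(x, omega):
    """X(e^{jw}) = sum over n of x[n] * e^{-jwn}, with x[n] = 0 outside 0..N-1."""
    return sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x))

def dtft_cos_sin(x, omega):
    """Same transform written as a cosine sum minus j times a sine sum."""
    real = sum(xn * math.cos(omega * n) for n, xn in enumerate(x))
    imag = -sum(xn * math.sin(omega * n) for n, xn in enumerate(x))
    return complex(real, imag)

x = [1.0, 2.0, 3.0, 4.0]
omega = math.pi / 2
direct = dtft(x, omega)
split = dtft_cos_sin(x, omega)
```

At ω = 0 every exponential equals 1, so the transform reduces to the plain sum of the samples.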
[Figure: spectrum of the complex wave; amplitude on the y-axis, frequency components (100 and 1000 Hz) on the x-axis]
Fourier analysis:
any wave can be represented as the (infinite) sum of sine waves of different frequencies, each with its own amplitude and phase
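For a sampled signal of finite length, the discrete Fourier transform (DFT) plays this role. The sketch below uses a direct, unoptimized DFT in plain Python to recover the two components of a 100 Hz plus 1000 Hz wave. The sampling rate, signal length, and component amplitudes (0.6 and 0.3) are assumptions, chosen so that both frequencies fall exactly on DFT bins.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..N/2, scaled so that a sine of amplitude A
    lying exactly on a bin reads as A (bins 0 and N/2 would read as 2A)."""
    n_samples = len(x)
    mags = []
    for k in range(n_samples // 2 + 1):
        s = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n in range(n_samples))
        mags.append(2 * abs(s) / n_samples)
    return mags

sample_rate = 8000            # assumed sampling rate in Hz
n_samples = 800               # 0.1 s, giving bins spaced 10 Hz apart
wave = [0.6 * math.sin(2 * math.pi * 100 * n / sample_rate)
        + 0.3 * math.sin(2 * math.pi * 1000 * n / sample_rate)
        for n in range(n_samples)]

mags = dft_magnitudes(wave)
bin_hz = sample_rate / n_samples
# Keep only the bins that carry noticeable energy.
peaks = [(k * bin_hz, round(m, 3)) for k, m in enumerate(mags) if m > 0.01]
```

Here `peaks` holds one (frequency, amplitude) pair per component. A production implementation would normally use a fast Fourier transform rather than this quadratic-time loop.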
[Figure: spectrum; x-axis: frequency (Hz), 0 to 5000]
Part of [ae] waveform from “had”
[Figure: first formant (F1) and second formant (F2) marked]
Formants
Vowels are largely distinguished by two characteristic pitches (F1 and F2).
One of them (the higher of the two) goes downward throughout the series iy ih eh ae aa ao ou u.
The other goes up for the first four vowels and then down for the next four.
These are called the "formants" of the vowels: the lower is the 1st formant, the higher is the 2nd formant.
Different vowels have different formants
• The vocal tract acts as an "amplifier": it amplifies different frequencies.
• Formants are the result of different shapes of the vocal tract.
• Any body of air will vibrate in a way that depends on its size and shape.
• The air in the vocal tract is set in vibration by the action of the vocal cords.
• Every time the vocal cords open and close, a pulse of air from the lungs acts like a sharp tap on the air in the vocal tract.
• These taps set the resonating cavities into vibration, so they produce a number of different frequencies.
Source-filter model of speech production
Input (vocal cords): fundamental frequency F0
Filter (vocal tract): formants F1, F2, F3
Output: speech
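The source-filter idea can be sketched as a pulse train (the source, one pulse per vocal-cord cycle) passed through a chain of resonators (the filter, one per formant). The two-pole resonator used here is a common textbook choice and an assumption of this sketch, not necessarily the model behind the slides. The F0, formant frequencies, bandwidth, and sampling rate are illustrative values.

```python
import math

def impulse_train(f0_hz, duration_s, sample_rate):
    """Source: one unit pulse per cycle of the fundamental frequency."""
    n_samples = int(round(duration_s * sample_rate))
    period = int(round(sample_rate / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(x, center_hz, bandwidth_hz, sample_rate):
    """Filter: two-pole resonator y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius < 1
    theta = 2 * math.pi * center_hz / sample_rate         # pole angle
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = xn + a1 * y1 + a2 * y2
        y.append(yn)
        y1, y2 = yn, y1
    return y

sample_rate = 16000                                   # assumed, in Hz
source = impulse_train(100, 0.05, sample_rate)        # F0 = 100 Hz (assumed)

# Pass the source through one resonator per formant.
output = source
for formant_hz in (700, 1700):                        # hypothetical F1, F2
    output = resonator(output, formant_hz, 100, sample_rate)
```

Changing the formant frequencies while keeping the same source corresponds to changing the shape of the vocal tract without changing the pitch.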
Vocal Tract Simulation
Parameter  Description
Time       Total time
ms         Segment duration
JW         Jaw position
TP         Tongue position
TS         Tongue shape
TA         Tongue expansion
LA         Lip aperture
LP         Lip protrusion
LH         Larynx height
GA         Glottal aperture
FX         Fundamental frequency
NS         Velo-pharyngeal port opening
Switchboard
Spontaneous speech corpus
Telephone conversations between strangers
“They’re kind of in between right now” Time alignments
Summary
Acoustic Phonetics
Waves, sound waves, and spectra
Speech waveforms
F0, pitch, intensity
Spectra
Spectrograms
Formants
Reading spectrograms
Deriving schwa: why are formants where they are
PRAAT
Resources: dictionaries and phonetically-labeled corpora.
Examples
pad
bad
spat
Useful Textbooks
Software Resources
• Snack Speech Toolkit
– http://speech.kth.se/snack/
• OGI Speech Toolkit
• University of Colorado SONIC recognizer
– http://cslr.colorado.edu
• Cambridge Hidden Markov Model Toolkit (HTK)
• CMU Sphinx-II Speech Recognizer
• NIST Speech Recognition Scoring Utilities
• SRI Language Model Toolkit
• CMU / Cambridge Language Model Toolkit
Literature Resources
Conference Proceedings
• International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
• International Conference on Spoken Language Processing (ICSLP)
• Eurospeech
Journal Publications
• Speech Communication
• IEEE Transactions on Speech and Audio Processing
Useful Website