Acoustics of Speech: Julia Hirschberg CS 4706

This document discusses acoustics of speech and speech analysis. It covers several key topics: 1) How acoustic properties like phrasing, prominence, pitch range convey meaning in speech. Experimental evidence and tools for speech analysis are discussed. 2) Fundamental concepts in acoustics including the nature of sound, periodic and aperiodic waves, speech production mechanisms, and places of articulation. 3) Digital speech analysis including sampling, file formats, pitch tracking, and challenges in analyzing noisy speech. Pitch perception in humans is also briefly covered.

Acoustics of Speech

Julia Hirschberg
CS 4706

5/22/2019 1
Claim: How things are said can be critical to
understanding
• I.e., varying phrasing, prominence, pitch range,
speaking rate, pitch contour, voice
quality… conveys meaning
• What is our evidence? How do we prove it?
– Observation
– Hypotheses
– Experimentation (perception, production)
– Speech analysis (independent variables)
– Correlation with dependent variable
• What does our data look like?
• What tools do we have for analysis?

What is sound?

• Pressure fluctuations in the air caused by a
musical instrument, a car horn, a voice
– Cause eardrum to move
– Auditory system translates into neural
impulses
– Brain interprets as sound
• Can we tell one sound from another?
• Can we distinguish one particular sound in
‘noise’?
– From a speech-centric point of view, when
sound is not produced by the human voice,
we may term it noise
– Ratio of speech-generated sound to other
simultaneous sound: signal-to-noise ratio
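A minimal sketch of the signal-to-noise ratio, assuming the speech and the noise are available as separate lists of samples (in real recordings they rarely are). Expressing the power ratio in decibels is a common convention rather than something stated on the slide; `mean_power`, `snr_db`, and the sample values are hypothetical.

```python
import math

def mean_power(samples):
    """Average of the squared sample values."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(speech, noise):
    """Ratio of speech power to noise power, expressed in decibels."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))

# Hypothetical samples: the speech has ten times the amplitude of the noise.
speech = [0.5, -0.5, 0.5, -0.5]
noise = [0.05, -0.05, 0.05, -0.05]
print(snr_db(speech, noise))
```

Ten times the amplitude means a hundred times the power, so the result is roughly 20 dB.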

How ‘Loud’ are Common Sounds?

Event           Pressure (µPa)    dB

Absolute        20                0
Whisper         200               20
Quiet office    2K                40
Conversation    20K               60
Bus             200K              80
Subway          2M                100
Thunder         20M               120
*DAMAGE*        200M              140
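The dB column can be reproduced from the pressure column, assuming the pressures are in micropascals and the reference is the 20 µPa threshold in the first row. This is only a sketch; the name `db_spl` and the choice of rows are illustrative.

```python
import math

REFERENCE_UPA = 20.0  # threshold of hearing in micropascals (first table row)

def db_spl(pressure_upa):
    """Sound pressure level in dB relative to 20 micropascals."""
    return 20 * math.log10(pressure_upa / REFERENCE_UPA)

# A few rows from the table: (event, pressure in micropascals)
rows = [("Whisper", 200), ("Conversation", 20_000), ("Thunder", 20_000_000)]
for event, pressure in rows:
    print(event, round(db_spl(pressure)))
```

Each tenfold increase in pressure adds 20 dB, which is why the table climbs in steps of 20.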
Some Sounds are Periodic

• Simple periodic waves (sine waves) defined by
– Frequency: how often the pattern repeats per
time unit
• Cycle: one repetition
• Period: duration of one cycle
• Frequency = number of cycles per time unit
– Frequency in Hz = 1 / period in sec
– E.g. 400 Hz = 1/0.0025 (one cycle has a period of
0.0025 sec; 400 cycles complete in 1 sec)
– Amplitude: peak deviation of pressure from
normal atmospheric pressure
– Phase: timing of waveform relative to a
reference point
• Complex periodic waves
• Cyclic but composed of two or more sine waves
• Fundamental frequency (F0): rate at which largest
pattern repeats (also GCD of component freqs)
• Components not always easily identifiable: power
spectrum graphs amplitude vs. frequency
• Any complex waveform can be analyzed into a set
of sine waves with their own frequencies,
amplitudes, and phases (Fourier’s theorem)
– E.g. some speech sounds, mostly vowels (cat.wav)
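The definitions above can be tried out with a small sketch: a complex periodic wave built from two sine waves, with F0 taken as the greatest common divisor of the component frequencies. The component list (200 Hz and 300 Hz) and the function names are made up for illustration.

```python
import math
from functools import reduce

def sine(freq_hz, amplitude, phase_rad, t):
    """Pressure deviation of a sine wave at time t (seconds)."""
    return amplitude * math.sin(2 * math.pi * freq_hz * t + phase_rad)

# Hypothetical components: (frequency in Hz, amplitude, phase in radians)
components = [(200, 1.0, 0.0), (300, 0.5, 0.0)]

def complex_wave(t):
    """Sum of the component sine waves at time t."""
    return sum(sine(f, a, p, t) for f, a, p in components)

# F0 is the GCD of the component frequencies; its inverse is the period.
f0 = reduce(math.gcd, (f for f, _, _ in components))
period = 1 / f0

# The summed waveform repeats after one F0 period.
t = 0.00123
print(f0, period, complex_wave(t), complex_wave(t + period))
```

Neither component is at 100 Hz, yet the combined pattern repeats every 0.01 sec, so the fundamental is 100 Hz.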
Some Sounds are Aperiodic
• Waveforms with random or non-repeating
patterns
– Random aperiodic waveforms: white noise
• Flat spectrum: equal amplitude for all frequency
components
– Transients: sudden bursts of pressure (clicks,
pops, door slams)
• Waveform shows a single impulse (click.wav)
• Fourier analysis shows a flat spectrum
• Some speech sounds, e.g. many consonants
(e.g. cat.wav)
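The flat spectrum of a transient can be checked with a small discrete Fourier transform. This is a naive DFT written out for clarity (a real analysis would use an FFT routine); `dft_magnitudes` and the 16-sample click are illustrative assumptions.

```python
import cmath

def dft_magnitudes(samples):
    """Magnitude of each frequency bin of a naive discrete Fourier transform."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(samples)))
        for k in range(n)
    ]

# A click: silence except for a single impulse.
click = [0.0] * 16
click[3] = 1.0

magnitudes = dft_magnitudes(click)
print([round(m, 6) for m in magnitudes])
```

Every bin has the same magnitude: a single impulse contains all frequencies in equal measure.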
Speech Production

• Voiced and voiceless sounds
• Vocal fold vibration filtered by the vocal tract
produces a complex periodic waveform
– Cycles per sec of lowest frequency
component of signal = fundamental frequency
(F0)
– Fourier analysis yields power spectrum with
component frequencies and amplitudes
• F0 is first (lowest frequency) peak
• Harmonics are integer multiples of F0; resonances
of the vocal tract (formants) boost the harmonics near them
Vocal fold vibration

[UCLA Phonetics Lab demo]

Places of articulation

[Figure: vocal tract diagram labeling the places of articulation: labial, dental, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal]

http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html
How do we capture speech for analysis?

• Recording conditions
– A quiet office, a sound booth, an anechoic
chamber
• Microphones
• Analog devices (e.g. tape recorders) store and
analyze continuous air pressure variations
(speech) as a continuous signal
• Digital devices (e.g. computers, DAT) first
convert continuous signals into discrete signals
(A-to-D conversion)
• File format:
– .wav, .aiff, .ds, .au, .sph,…
– Conversion programs, e.g. sox
• Storage
– Function of how much information we store
about speech in digitization
• Higher quality, closer to original
• More space (1000s of hours of speech take up a
lot of space)

Sampling

• Sampling rate: how often do we need to
sample?
– At least 2 samples per cycle to capture
periodicity of a waveform component at a
given frequency
• 100 Hz waveform needs 200 samples per sec
• Nyquist frequency: highest-frequency component
captured with a given sampling rate (half the
sampling rate)
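The two-samples-per-cycle rule works in both directions, as this small sketch shows. The helper names are hypothetical; the figures are the slide's 100 Hz example plus two common sampling rates.

```python
def min_sampling_rate(highest_freq_hz):
    """Lowest sampling rate giving two samples per cycle of the highest component."""
    return 2 * highest_freq_hz

def nyquist_frequency(sampling_rate_hz):
    """Highest-frequency component captured at a given sampling rate."""
    return sampling_rate_hz / 2

print(min_sampling_rate(100))     # samples per second needed for a 100 Hz component
print(nyquist_frequency(8000))    # telephone-style sampling
print(nyquist_frequency(44100))   # CD-style sampling
```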

Sampling/storage tradeoff

• Human hearing: ~20 kHz top frequency
– Do we really need to store 40K samples per
second of speech?
• Telephone speech: 300-4K Hz (8K sampling)
– But some speech sounds (e.g. fricatives /f/, /s/ and
the bursts of stops /p/, /t/, /d/) have energy above 4K!
– Peter/teeter/Dieter
• 44.1K (CD-quality audio) vs. 16-22K (usually good
enough to study pitch, amplitude, duration, …)
Sampling Errors

• Aliasing:
– Signal’s frequency higher than half the
sampling rate
– Solutions:
• Increase the sampling rate
• Filter out frequencies above half the sampling rate
(anti-aliasing filter)
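A short sketch of what aliasing does to a component above half the sampling rate. The 5000 Hz tone and 8000 Hz sampling rate are made-up values, and `alias_frequency` is a hypothetical helper that folds a frequency back into the range the sampling rate can represent.

```python
import math

def alias_frequency(freq_hz, sampling_rate_hz):
    """Apparent frequency of a sampled sine, folded into 0..Nyquist."""
    folded = freq_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)

fs = 8000  # sampling rate in Hz, so the Nyquist frequency is 4000 Hz
print(alias_frequency(5000, fs))

# Sampled at 8000 Hz, a 5000 Hz cosine yields the same values as a 3000 Hz one.
samples_5k = [math.cos(2 * math.pi * 5000 * n / fs) for n in range(16)]
samples_3k = [math.cos(2 * math.pi * 3000 * n / fs) for n in range(16)]
print(max(abs(a - b) for a, b in zip(samples_5k, samples_3k)))
```

Once sampled, the two tones cannot be told apart, which is why the filtering has to happen before A-to-D conversion.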

Quantization

• Measuring the amplitude at sampling points:
what resolution to choose?
– Integer representation
– 8, 12 or 16 bits per sample
• Noise due to quantization steps is reduced by
higher resolution, but that requires more storage
– How many different amplitude levels do we
need to distinguish?
– Choice depends on data and application (44.1K
16-bit stereo requires ~10 MB of storage per minute)
– But clipping occurs when input volume is
greater than range representable in digitized
waveform
• Increase the resolution
• Decrease the amplitude
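A rough sketch of the storage arithmetic and of quantization with clipping. It assumes uncompressed samples and amplitudes scaled so that full range is -1.0 to 1.0; `storage_bytes` and `quantize` are illustrative names, not part of any particular tool.

```python
def storage_bytes(sampling_rate_hz, bits_per_sample, channels, seconds):
    """Size of uncompressed audio in bytes."""
    return sampling_rate_hz * bits_per_sample * channels * seconds // 8

def quantize(amplitude, bits):
    """Map an amplitude (full range -1.0..1.0) to a signed integer level,
    clipping anything outside the representable range."""
    top = 2 ** (bits - 1) - 1
    bottom = -(2 ** (bits - 1))
    level = round(amplitude * 2 ** (bits - 1))
    return max(bottom, min(top, level))

print(storage_bytes(44100, 16, 2, 60))  # one minute of 16-bit stereo
print(quantize(0.5, 8), quantize(0.5, 16))
print(quantize(1.5, 16))  # input too loud: clipped to the top level
```

One minute of 44.1 kHz 16-bit stereo comes to 10,584,000 bytes, the same order as the ~10 MB figure quoted earlier. More bits give finer steps for the same amplitude, and an input beyond full range is flattened to the largest level.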

What can we do if our data is ‘noisy’?

• Acoustic filters block out certain frequencies of
sounds
– Low-pass filter blocks high-frequency
components of a waveform
– High-pass filter blocks low frequencies
– Reject band (what to block) vs. pass band
(what to let through)
• But if the frequencies of two sounds
overlap… source separation is needed
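To make the low-pass/high-pass idea concrete, here is a deliberately crude sketch: a two-sample moving average acts as a low-pass filter and a two-sample difference as a high-pass filter. These are toy filters chosen for readability, not what a speech analysis tool would use, and the function names and test signals are assumptions.

```python
def moving_average(samples, width):
    """Crude low-pass filter: mean of each run of `width` consecutive samples."""
    return [sum(samples[i:i + width]) / width
            for i in range(len(samples) - width + 1)]

def first_difference(samples):
    """Crude high-pass filter: half the difference between neighboring samples."""
    return [(samples[i + 1] - samples[i]) / 2 for i in range(len(samples) - 1)]

slow = [1.0] * 8           # constant signal: the lowest possible frequency
fast = [1.0, -1.0] * 4     # flips every sample: the highest representable frequency

print(moving_average(slow, 2), moving_average(fast, 2))
print(first_difference(slow), first_difference(fast))
```

The moving average passes the slow signal unchanged and cancels the fast one; the difference filter does the opposite.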

How can we capture pitch contours, pitch
range?
• What is the pitch contour of this utterance? Is
the pitch range of X greater than that of Y?
• Pitch tracking: estimate F0 over time as a function of
vocal fold vibration
• A periodic waveform is correlated with itself
– One period looks much like another (cat.wav)
– Find the period by finding the ‘lag’ (offset)
between two windows on the signal for which
the correlation of the windows is highest
– Lag duration (T) is 1 period of waveform
– Inverse is F0 (1/T)
• Errors to watch for:
– Halving: the lag chosen is too long, e.g. two
periods (underestimates pitch)
– Doubling: the lag chosen is too short, e.g. half
a period (overestimates pitch)
– Microprosody errors (e.g. /v/)
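The lag-search idea can be sketched for a single analysis window as follows. This is a bare-bones normalized autocorrelation, not the algorithm of any specific pitch tracker; the 75-500 Hz search range, the function name, and the synthetic 125 Hz test signal are all assumptions made for the example.

```python
import math

def autocorr_pitch(samples, sampling_rate_hz, min_f0=75.0, max_f0=500.0):
    """Estimate F0 (Hz) for one window: try each candidate lag and keep the
    one where the window best matches a shifted copy of itself."""
    min_lag = max(1, int(sampling_rate_hz / max_f0))
    max_lag = int(sampling_rate_hz / min_f0)
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        a = samples[:-lag]   # window starting at 0
        b = samples[lag:]    # same-length window starting `lag` samples later
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        score = num / den if den else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return sampling_rate_hz / best_lag, best_score

# Synthetic voiced-like signal: 125 Hz fundamental plus its second harmonic.
rate = 8000
window = [math.sin(2 * math.pi * 125 * n / rate)
          + 0.5 * math.sin(2 * math.pi * 250 * n / rate)
          for n in range(320)]
f0, score = autocorr_pitch(window, rate)
print(f0, score)
```

Limiting the lag range to plausible speaking pitches is one simple guard against halving and doubling, since lags of two periods or half a period may fall outside it.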

Sample Analysis File: Pitch Track Header

• version 1
• type_code 4
• frequency 12000.000000
• samples 160768
• start_time 0.000000
• end_time 13.397333
• bandwidth 6000.000000
• dimensions 1
• maximum 9660.000000
• minimum -17384.000000
• time Sat Nov 2 15:55:50 1991
• operation record: padding xxxxxxxxxxxx
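The header fields are mutually consistent if `frequency` is read as the sampling rate in Hz and `bandwidth` as half of it. That reading is an assumption based on the numbers, not something the slide states. A quick check:

```python
# Values copied from the header above; the interpretation of each field is assumed.
header = {
    "frequency": 12000.0,   # assumed: sampling rate in Hz
    "samples": 160768,
    "start_time": 0.0,
    "end_time": 13.397333,
    "bandwidth": 6000.0,    # assumed: Nyquist frequency in Hz
}

duration = header["samples"] / header["frequency"]
nyquist = header["frequency"] / 2
print(duration, header["end_time"] - header["start_time"])
print(nyquist, header["bandwidth"])
```

The sample count divided by the rate matches the recorded end time to within rounding.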
Sample Analysis File: Pitch Track Data

(F0 Pvoicing Energy A/C Score)

• 147.896 1 2154.07 0.902643
• 140.894 1 1544.93 0.967008
• 138.05 1 1080.55 0.92588
• 130.399 1 745.262 0.595265
• 0 0 567.153 0.504029
• 0 0 638.037 0.222939
• 0 0 670.936 0.370024
• 0 0 790.751 0.357141
• 141.215 1 1281.1 0.904345
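A small sketch of how frames like these might be summarized: keep only the voiced frames and compute mean F0 and pitch range. It assumes whitespace-separated columns in the order named in the heading and that the last column is an autocorrelation score; the variable names are illustrative.

```python
# The nine frames shown above: F0, voicing flag, energy, A/C score
frames_text = """\
147.896 1 2154.07 0.902643
140.894 1 1544.93 0.967008
138.05 1 1080.55 0.92588
130.399 1 745.262 0.595265
0 0 567.153 0.504029
0 0 638.037 0.222939
0 0 670.936 0.370024
0 0 790.751 0.357141
141.215 1 1281.1 0.904345
"""

frames = [tuple(float(v) for v in line.split())
          for line in frames_text.splitlines()]

# F0 is reported as 0 for unvoiced frames, so exclude them from pitch statistics.
voiced_f0 = [f0 for f0, voicing, energy, ac_score in frames if voicing == 1]
mean_f0 = sum(voiced_f0) / len(voiced_f0)
pitch_range = (min(voiced_f0), max(voiced_f0))
print(len(voiced_f0), round(mean_f0, 2), pitch_range)
```

Averaging over all frames, zeros included, would drag the mean well below any pitch the speaker actually produced.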
Pitch Perception

• But do pitch trackers capture what humans perceive?
• Auditory system’s perception of pitch is non-linear
– The same difference in absolute frequency sounds
larger between low-frequency sounds than between
high-frequency ones (male vs. female speech)
– Bark scale (Zwicker) and other models of perceived
difference
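One way to see the non-linearity is to convert Hz to Bark and compare equal steps in Hz. The sketch below uses the Zwicker and Terhardt (1980) approximation, which is one of several published formulas for the Bark scale; the slide does not say which one the course uses, and the comparison frequencies are arbitrary.

```python
import math

def hz_to_bark(freq_hz):
    """Zwicker & Terhardt (1980) approximation of the Bark scale."""
    return (13 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500) ** 2))

# The same 100 Hz step, once low in the range and once high.
low_step = hz_to_bark(200) - hz_to_bark(100)
high_step = hz_to_bark(5100) - hz_to_bark(5000)
print(low_step, high_step)
```

The 100 Hz step near the bottom of the range spans close to one Bark, while the same step at 5 kHz spans only a small fraction of one.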

How do we capture loudness/intensity?

• Is one utterance louder than another?

• Energy closely correlated experimentally with
perceived loudness
• For each window, square the amplitude values
of the samples, take their mean, and take the
square root of that mean (RMS energy)
– What size window?
– Longer windows produce smoother amplitude
traces but miss sudden acoustic events
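The RMS recipe and the window-size tradeoff can be sketched as follows, using non-overlapping windows for simplicity (real tools typically overlap them). The signal, a two-sample burst in silence, and the function names are made up for illustration.

```python
import math

def rms(window):
    """Root of the mean of the squared sample values."""
    return math.sqrt(sum(x * x for x in window) / len(window))

def rms_trace(samples, window_size):
    """RMS energy of consecutive non-overlapping windows."""
    return [rms(samples[i:i + window_size])
            for i in range(0, len(samples) - window_size + 1, window_size)]

# Silence with a brief burst in the middle.
signal = [0.0] * 8 + [1.0] * 2 + [0.0] * 8

short_trace = rms_trace(signal, 2)
long_trace = rms_trace(signal, 6)
print(short_trace)
print(long_trace)
```

The short windows register the burst at full strength in one frame; the long windows smear it into a lower value spread over a longer stretch.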

Perception of Loudness

• But the relation is non-linear: sones or decibels (dB)
– Differences in soft sounds more salient than in loud ones
– Intensity is proportional to the square of amplitude,
so the intensity of a sound with pressure x relative to a
reference sound with pressure r is $x^2/r^2$
– bel: base-10 log of the ratio
– decibel: one tenth of a bel
– $\mathrm{dB} = 10 \log_{10}(x^2/r^2)$
– Absolute threshold (20 µPa, lowest audible pressure fluctuation of a
1000 Hz tone): typical threshold level for tone at frequency
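As a worked example of the decibel formula, the squared ratio can be pulled out of the logarithm, which gives the familiar factor of 20 for pressure ratios. The two sample ratios below (ten times and twice the reference pressure) are chosen only for illustration.

```latex
\begin{align*}
\mathrm{dB} &= 10 \log_{10}\!\left(\frac{x^2}{r^2}\right)
             = 20 \log_{10}\!\left(\frac{x}{r}\right) \\
x = 10r &: \quad 20 \log_{10}(10) = 20\ \mathrm{dB} \\
x = 2r  &: \quad 20 \log_{10}(2) \approx 6.02\ \mathrm{dB}
\end{align*}
```

So doubling the pressure adds about 6 dB, and each tenfold increase adds 20 dB.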

How do we capture….

• For utterances X and Y

• Pitch contour: Same or different?
• Pitch range: Is X larger than Y?
• Duration: Is utterance X longer than utterance
Y?
• Speaking rate: Is the speaker of X speaking
faster than the speaker of Y?
• Voice quality….

Next Class

• Tools for the Masses: Read the Praat tutorial
• Download Praat from the course syllabus page
and play with a speech file (e.g.
http://www.cs.columbia.edu/~julia/cs4706/cc_001_sadness_1669.04_August-second-.wav
or record your own)
