Speech Analysis
What is Speech Analysis?
Analysis of speech sounds taking into consideration their method of production.
The level of processing between the digitised acoustic waveform and the acoustic feature vectors.
The extraction of "interesting" information as an acoustic vector.
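The Python below is a minimal sketch of that processing step. It cuts a digitised waveform into short overlapping frames and reduces each frame to a small acoustic vector. The slides do not specify a feature set. The two features used here (log energy and zero-crossing rate) and the 25 ms windows taken every 10 ms are assumptions chosen for illustration. Practical systems usually use richer vectors.

```python
import numpy as np

def frame_signal(x, fs, win_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_win = int(round(fs * win_ms / 1000))
    n_hop = int(round(fs * hop_ms / 1000))
    if len(x) < n_win:
        raise ValueError("signal shorter than one analysis window")
    n_frames = 1 + (len(x) - n_win) // n_hop
    idx = np.arange(n_win)[None, :] + n_hop * np.arange(n_frames)[:, None]
    return x[idx]

def feature_vectors(x, fs):
    """One small acoustic vector per frame: [log energy (dB), zero-crossing rate]."""
    frames = frame_signal(x, fs)
    energy = np.sum(frames ** 2, axis=1)
    log_energy = 10 * np.log10(energy + 1e-12)  # small floor avoids log(0) in silence
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)  # fraction of sign changes
    return np.column_stack([log_energy, zcr])

fs = 16000
t = np.arange(fs) / fs                 # one second of samples
x = 0.5 * np.sin(2 * np.pi * 200 * t)  # stand-in for a steady voiced sound
feats = feature_vectors(x, fs)
print(feats.shape)                     # (98, 2): 98 frames, 2 features each
```

The waveform is now a short sequence of vectors, one every 10 ms, instead of 16000 raw samples per second.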
Speech Waveforms
A waveform is a two-dimensional representation of a sound. The two dimensions in a waveform display are time and intensity: the vertical dimension is intensity and the horizontal dimension is time. Waveforms are known as time-domain representations of sound because they display changes in intensity over time. The intensity dimension actually displays sound pressure, a measure of the tiny variations in air pressure that we are able to perceive as sound. Intensity in these waveforms is a simple linear scaling of sound pressure (not dB).
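This sketch makes the difference between the linear and dB scales concrete. It assumes the samples are already calibrated in pascals, which real recordings usually are not. The code builds the time axis of a waveform display and converts the linear pressure values into a single RMS level in dB SPL, using the standard 20 µPa reference pressure.

```python
import numpy as np

P_REF = 20e-6  # standard reference pressure for dB SPL in air (20 micropascals)

def time_axis(n_samples, fs):
    """Horizontal dimension of a waveform display: sample times in seconds."""
    return np.arange(n_samples) / fs

def rms_db_spl(pressure_pa):
    """Collapse a linear pressure waveform (Pa) into one RMS level in dB SPL."""
    rms = np.sqrt(np.mean(np.square(pressure_pa)))
    return 20 * np.log10(rms / P_REF)

fs = 16000
t = time_axis(1600, fs)                 # 0.1 s of samples
p = 0.02 * np.sin(2 * np.pi * 100 * t)  # assumed calibrated pressure, peak 0.02 Pa
print(round(rms_db_spl(p), 2))                      # 56.99
print(round(rms_db_spl(2 * p) - rms_db_spl(p), 2))  # 6.02
```

Doubling the pressure amplitude doubles the height of the waveform trace but adds only about 6 dB. This is why a linear waveform display and a dB scale look so different.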
Resonances and Formants
Resonances are vibratory characteristics of a resonating body. In the case of an air-filled tube, the resonance characteristics exist even when no sound is being produced. When we produce vowel sounds, the resonances of the vocal tract selectively enhance sound vibrations close to the resonance frequencies and selectively attenuate sound vibrations remote from them. This results in peaks in the acoustic spectrum of the resulting speech sound. These acoustic spectral peaks are called formants, particularly when they occur in vowels and vowel-like consonants.
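The slides give no formula for these resonances. A standard first approximation, assumed here, models the vocal tract for a neutral vowel as a uniform tube closed at the glottis and open at the lips. Such a tube resonates at odd multiples of a quarter wavelength: F_k = (2k - 1)c / (4L). The sketch assumes a speed of sound of 350 m/s in warm, moist air.

```python
def tube_resonances(length_m, n=3, c=350.0):
    """First n resonances (Hz) of a uniform tube closed at one end (glottis)
    and open at the other (lips): F_k = (2k - 1) * c / (4 * L)."""
    return [(2 * k - 1) * c / (4 * length_m) for k in range(1, n + 1)]

# A 17.5 cm vocal tract (a typical adult male length, assumed here)
print([round(f) for f in tube_resonances(0.175)])  # [500, 1500, 2500]
```

These resonances exist whether or not any sound is produced. Shortening the tube raises every resonance, which is one reason formant frequencies differ between speakers.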
Spectrograms
Spectrograms permit the examination of the dynamic changes in a speech spectrum. This is particularly useful for the examination of rapidly changing consonants (e.g. stop bursts) and also for vowel transitions (between vowels and consonants and between the targets in diphthongs).
Spectrograms, usually in conjunction with waveforms, are essential during the segmenting and labeling of speech.
Spectrograms usually provide the clearest visual cues to the boundaries between phonemes.
Spectrograms do not, however, provide accurate measurements of vowel formants. Broadband spectrograms have poor frequency resolution (about 300 Hz), so there is a high degree of intrinsic error in formant measurements taken visually from spectrograms. That is why we tend to use FFTs and LPCs for the accurate measurement of formant frequencies.
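The broadband/narrowband trade-off comes from the length of the analysis window. The sketch below computes a magnitude spectrogram with a short-time Fourier transform. Several choices are assumptions made for illustration: the Hamming window, the 1 ms hop, and the window lengths (4 ms for broadband, 30 ms for narrowband). With a Hamming window, the 3 dB bandwidth is roughly 1.3/T Hz for a window of T seconds. A 4 ms window therefore gives about 325 Hz, consistent with the broadband resolution quoted above, while a 30 ms window gives about 43 Hz.

```python
import numpy as np

def stft_magnitude(x, fs, win_ms, hop_ms=1.0):
    """Magnitude spectrogram: one row per frame (time), one column per frequency bin."""
    n_win = int(round(fs * win_ms / 1000))
    n_hop = max(1, int(round(fs * hop_ms / 1000)))
    window = np.hamming(n_win)
    n_frames = 1 + (len(x) - n_win) // n_hop
    frames = np.stack([x[i * n_hop : i * n_hop + n_win] * window for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(n_win, d=1.0 / fs)
    times = (np.arange(n_frames) * n_hop + n_win / 2) / fs  # frame centres (s)
    return spec, freqs, times

fs = 16000
t = np.arange(int(0.5 * fs)) / fs
x = np.sin(2 * np.pi * 1000 * t)  # a steady 1 kHz tone as a stand-in for speech

broad, f_b, _ = stft_magnitude(x, fs, win_ms=4)    # good time, poor frequency resolution
narrow, f_n, _ = stft_magnitude(x, fs, win_ms=30)  # good frequency, poor time resolution
print(f_b[1] - f_b[0], f_n[1] - f_n[0])            # FFT bin spacing in Hz for each
```

The short window can follow rapid events such as stop bursts frame by frame, but its coarse frequency resolution smears formant peaks.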
Figure: Waveform and broadband spectrogram of the word "heard".
Figure: A narrowband spectrogram of the word "heard".
Figure: This is a broadband spectrogram of the word "hide" with the formant tracks for formants 1 to 5 superimposed over it.
Figure: Labelled waveforms of the utterances "aam" (segments g1, g2, aag, aa1, aa2, aam, m1, m2) and "aayvu" (segments g, aa, ay, y, yv, v, vu, u), plotted against Time (s).
Word/segment    aayvu    g        aa       ay       y        yv       v        vu       u
Duration (s)    0.77     0.19     0.2      0.1      0.07     0.04     0.07     0.06     0.2
Intensity (dB)  80.4     62.4     81.3     84.0     80.5     78.7     73.4     78.2     77.8
Pitch (Hz)      160.2    128      137.1    171.1    179      174.5    162.2    166.5    167.2
F1 (Hz)         540.7    900.78   810.4    654.07   362.1    349.3    348.7    3636.0   387.36
F2 (Hz)         1484.6   1853.0   1181.6   1755.3   2275.9   1928.6   1154.98  1147.2   1488.5
F3 (Hz)         3750.3   2899.3   2865.5   2599.9   2570.3   2365.0   2418.4   2570.8   2611.5
F4 (Hz)         3750.2   4078.2   3792.2   3753.5   3878.4   3876.5   3636.0   3568.2   3693.2
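The table reports a pitch value for each segment but does not say how it was measured. One common approach, assumed here, is autocorrelation: a voiced frame correlates strongly with itself when shifted by one pitch period. The 75 to 500 Hz search range and the synthetic test frame are assumptions chosen for illustration.

```python
import numpy as np

def estimate_f0(frame, fs, fmin=75.0, fmax=500.0):
    """Rough F0 (Hz) from the strongest autocorrelation peak between lags fs/fmax and fs/fmin."""
    x = frame - np.mean(frame)
    r = np.correlate(x, x, mode="full")[len(x) - 1 :]  # r[k] for lags k = 0..N-1
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(x) - 1)
    lag = lag_min + int(np.argmax(r[lag_min : lag_max + 1]))
    return fs / lag

fs = 16000
t = np.arange(int(0.04 * fs)) / fs  # one 40 ms analysis frame
# Synthetic voiced frame: a 160 Hz fundamental plus its 2nd and 3rd harmonics.
frame = sum(np.sin(2 * np.pi * 160 * h * t) for h in (1, 2, 3))
print(estimate_f0(frame, fs))
```

This simple picker has no voicing decision. On a voiceless segment it will still return a number, so real pitch trackers add thresholds and smoothing across frames.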
Figure: LPC spectra of the segments of "aayvu" (Sound pressure level (dB/Hz) against Frequency (Hz), 0 to 5500 Hz), with the labelled spectral peaks:
LPC of aa in aayvu: peaks at 886.4, 1212.5, 2916.7, 3754.0 and 4813.6 Hz
LPC of ay in aayvu: peaks at 671.6, 1694.1, 2272.1 and 3679.9 Hz
LPC of y in aayvu: peaks at 352.9, 2323.9, 3939.3 and 4939.6 Hz
LPC of v in aayvu: peaks at 323.3, 1190.2, 2346.2 and 3613.2 Hz
LPC of vu in aayvu: peaks at 360.3, 1108.7, 2612.9 and 3583.6 Hz
LPC of u in aayvu: peaks at 397.4, 1486.3, 2590.7 and 3583.6 Hz
Linear Prediction Coefficient (LPC)
Linear Prediction Coefficient (LPC) analysis attempts to estimate the poles (related to resonances or formants) that, when combined with the speech source spectrum (the "residual" in LPC analysis), would result in the original waveform.
An LPC analysis separates the analysis of the resonant characteristics of a speech sound from the source characteristics of that sound.
The resulting LPC spectrum is a smoothed spectrum whose peaks represent the formants (resulting from the vocal tract resonances) of a vowel or vowel-like consonant.
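The slides do not say which LPC algorithm produced the spectra. A common choice, assumed here, is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch does three things:

1. It estimates the predictor coefficients for one frame.
2. It evaluates the smoothed LPC envelope.
3. It reads formant candidates from the angles of the complex poles.

The test signal is synthetic: white noise passed through an all-pole filter with hypothetical resonances placed at 500 and 1500 Hz. Because the true resonances are known, the recovered formants can be checked against them.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin.
    Returns (a, err): a[0] = 1, A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p,
    and err is the remaining prediction-error (residual) energy."""
    x = frame * np.hamming(len(frame))
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])) / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lpc_envelope_db(a, err, fs, n_fft=1024):
    """Smoothed LPC spectrum err / |A(e^jw)|^2, in relative dB."""
    A = np.fft.rfft(a, n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, 10 * np.log10(err / np.abs(A) ** 2)

def lpc_formants(a, fs, min_hz=90.0):
    """Formant candidates (Hz) from the angles of the complex poles of 1/A(z)."""
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 0]  # keep one pole of each conjugate pair
    freqs = np.angle(poles) * fs / (2 * np.pi)
    return np.sort(freqs[freqs > min_hz])

rng = np.random.default_rng(0)
fs = 10000
resonances = (500.0, 1500.0)  # hypothetical "formants" of the synthetic signal
poles = [0.97 * np.exp(2j * np.pi * f / fs) for f in resonances]
a_true = np.real(np.poly(poles + [np.conj(p) for p in poles]))
source = rng.standard_normal(4000)  # white-noise excitation
signal = np.zeros_like(source)
for n in range(len(source)):
    # all-pole filter: y[n] = e[n] - sum_k a[k] * y[n - k]
    signal[n] = source[n] - sum(a_true[k] * signal[n - k] for k in range(1, len(a_true)) if n >= k)

a_est, err = lpc(signal, order=4)
print(lpc_formants(a_est, fs))  # values near 500 and 1500
```

In practice the order is chosen from the sampling rate (a common rule of thumb is about fs/1000 + 2). Poles with very wide bandwidths are usually discarded before they are reported as formants.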
Figure: This is an LPC analysis of the vowel in "heard". Note the smooth spectrum clearly showing the positions of the main spectral peaks (formants) of this vowel.
Figure: White noise used as a simplified model of a fricative sound source. Note the random pattern of both the waveform (bottom) and the spectrum (top). Also note that the spectral envelope (LPC spectrum in red) is approximately flat.
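This sketch is a quick numerical check of the caption's claims, using Gaussian white noise from NumPy's random generator. Averaging periodograms over many segments gives an approximately flat spectrum. The normalised autocorrelation is also near zero at every non-zero lag. With no correlation between samples, an LPC predictor has nothing to exploit, so its reflection coefficients come out close to zero and the LPC envelope is flat. The segment length and segment count are assumptions.

```python
import numpy as np

def averaged_power_spectrum_db(x, seg_len=256):
    """Average the periodograms of non-overlapping segments (Bartlett's method), in dB."""
    n_seg = len(x) // seg_len
    segs = x[: n_seg * seg_len].reshape(n_seg, seg_len)
    power = np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0)
    return 10 * np.log10(power)

def normalized_autocorr(x, max_lag):
    """r[k] / r[0] for lags k = 0..max_lag."""
    x = x - np.mean(x)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(max_lag + 1)])
    return r / r[0]

rng = np.random.default_rng(1)
noise = rng.standard_normal(256 * 400)  # 400 segments of 256 samples
spec_db = averaged_power_spectrum_db(noise)
print(np.std(spec_db[1:-1]))            # small spread across frequency: roughly flat
print(normalized_autocorr(noise, 5))    # 1 at lag 0, near 0 elsewhere
```

A single periodogram of the same noise would look as random as the spectrum in the figure. Only averaging (or LPC smoothing) reveals the flat envelope.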
Identification of Speech Waveforms
Figure: Three long vowels in an /h_d/ context.
Figure: Three English voiceless oral stops in CV context.
Figure: Three English voiced oral stops in CV context.
Figure: The two English affricates in CV context.
Figure 9: Waveforms of two of the English voiceless fricatives in CV context.