An Introduction to Forensic Linguistics:
Language in Evidence
Malcolm Coulthard & Alison Johnson
2007
Routledge/Taylor & Francis
Chapter 7
The work of the forensic phonetician and the document examiner
[Only certain excerpts have been extracted for LEM 212 teaching-learning purposes]
The work of the forensic phonetician
The forensic phonetician is concerned with all aspects of speech as evidence. This ranges
from the creation of accurate transcriptions of what was said, through deriving information
about a speaker’s social and regional background, to expressing an opinion on whether the
speaker in two or more separate tape-recordings is the same. In addition, they help to design
and interpret voice line-ups, which allow victims and witnesses to express an opinion as to
whether the voice of a suspect is that of the criminal.
Transcription
Many court cases involve the presentation of transcriptions of tape-recorded evidence. The
tape-recording(s) concerned may be of people talking about future or past criminal activity
or of them actually committing a crime, as in the case of bomb threats, obscene phone calls,
ransom demands, hoax emergency calls or negotiating the buying or selling of drugs.
Very few of the transcriptions presented in court have been made by someone with a
qualification in phonetics, although occasionally a forensic phonetician is called in,
typically when there is a dispute over a small number of specific items, which could be
single words or even an isolated phoneme. Sometimes the transcription problem is not
phonologically difficult, and the original mis-transcription has resulted from the original
transcriber hearing what they expected rather than what was actually said. So, for example,
in one case in which Coulthard was involved, an indistinct word, in a clandestine recording
of a man later accused of manufacturing the designer drug Ecstasy, was mis-heard by a
police transcriber as ‘hallucinogenic’.
Peter French (in Baldwin and French 1990) reports a much more difficult case, which
appeared to turn on the presence or absence of a single phoneme, the one that distinguishes
can from can’t. Most readers, if they tape-record themselves reading these two words aloud,
will notice not one but two phonemic differences between their pronunciations of the words
– the absence/presence of a /t/ and a different vowel phoneme. Using an educated British
accent, at least when the words are produced as citation forms, the vowel in ‘can’t’ is also
longer.
However, in an ordinary speech context, as in the phrase ‘I can’t refuse’, the /t/ often
disappears and the vowel is shortened, so that the phonetic difference between the two
words is very much reduced. It may seem surprising that a language allows such a
significant morphemic distinction, that is the one between positive and negative, to be
realized by such a small phonetic difference and even more surprising that, at least for some
accents in some contexts, the distinction may not be marked by even an extra phoneme, but
simply by a slight difference in the quality of the vowel.
Speaker identification by professionals
The vast majority of the cases undertaken by forensic phoneticians are in fact speaker
identification; these are cases where there is a recording of a voice committing a crime and
one or more suspects, and the phonetician is asked to express an opinion as to whether any
of the suspect voices does or does not match that of the criminal.
A basic problem to overcome is that there will always be differences between any two
speech samples, even when they come from the same speaker and are recorded on the same
machine and on the same occasion. So, the task for the forensic phonetician ‘involves being
able to tell whether the inevitable differences between samples are more likely to be
within-speaker differences or between-speaker differences’ (Rose 2002: 10).
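The comparison Rose describes can be pictured numerically. The following is only a minimal sketch, not a method given in this chapter: it assumes that the difference between two samples on some measured feature is roughly normally distributed, and that the analyst already has estimates of how widely such differences spread within one speaker and between different speakers. The function names and all four figures are hypothetical and chosen purely for illustration.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical spreads (not taken from the source): differences between two
# samples of the SAME speaker are assumed to be small, differences between
# samples of DIFFERENT speakers are assumed to be much more widely spread.
WITHIN_MEAN, WITHIN_SD = 0.0, 5.0
BETWEEN_MEAN, BETWEEN_SD = 0.0, 20.0


def likelihood_ratio(observed_difference: float) -> float:
    """How much more probable the observed difference is if the two samples
    come from the same speaker than if they come from different speakers."""
    within = normal_pdf(observed_difference, WITHIN_MEAN, WITHIN_SD)
    between = normal_pdf(observed_difference, BETWEEN_MEAN, BETWEEN_SD)
    return within / between


if __name__ == "__main__":
    for difference in (0.0, 2.0, 15.0):
        ratio = likelihood_ratio(difference)
        verdict = "within-speaker" if ratio > 1 else "between-speaker"
        print(f"difference {difference:5.1f}: ratio {ratio:.3f} -> more like a {verdict} difference")
```

A ratio above 1 means that the observed difference is more typical of two samples from one speaker, and a ratio below 1 that it is more typical of two different speakers. In practice a single feature would never be decisive, and the spreads themselves would have to be estimated from reference data.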
There are two major traditions for analysing speech samples – the auditory and the acoustic
– which were associated with Britain and the United States respectively, although the
consensus now is very much that a mixture of the two methods should be used.
Voice line-ups and Naïve Speaker Recognition
In 1932, the baby son of the American aviator Charles Lindbergh, famous as the first man
to fly solo across the Atlantic, was kidnapped and later found murdered, but not before a
ransom had been demanded and paid. Eventually the police arrested and charged a suspect.
Lindbergh had talked to the kidnapper twice, once on the telephone, which in those days
would not have provided a very good reproduction, and once in person, briefly and at night,
while handing over the ransom money.
Some two years later, when the case came to trial, Lindbergh testified that he recognized the
voice of the accused as being that of the man he had talked to. The defence set out to
challenge his testimony and employed a psychologist to discover what was and what was
not possible in terms of memory for voices. Seventy-five years on there is now a vast
literature on how to evaluate the evidence of what the professionals call Naïve Speaker
Recognition, ranging from the evaluation of individual abilities, through how memory
decays over time, to the design of voice line-ups, which are the vocal equivalent of identity
parades, to ensure that they are fair to both sides and likely to produce reliable results.
First, there are significant differences in recognition success depending on whether it is a
familiar or an unfamiliar voice. Rose (2002: 98–9) reports experiments which show
listeners being twice as successful in correctly recognizing familiar voices. Second, even
with familiar voices, listeners make mistakes roughly one third of the time.
Third, one cannot extrapolate from these scores for average success to the likelihood of a
given witness being able to recognize a known voice, because there is massive individual
variation; listener success in one experiment which was testing the ability to recognize 25
famous voices ranged ‘all the way from totally correct (100) to chance (46)’ (Rose 2002:
100).
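The point about extrapolating from averages can be made concrete with a small calculation. The scores below are invented for illustration and are not Rose's data; only the two extremes, 100 and 46, echo the range quoted above. The sketch simply shows that an average can sit far from what any particular listener achieves.

```python
from statistics import mean

# Hypothetical percentage-correct scores for seven listeners (illustrative only).
HYPOTHETICAL_SCORES = [100, 92, 85, 71, 64, 58, 46]


def summarize(scores: list[float]) -> dict[str, float]:
    """Return the average score together with the lowest and highest scores."""
    return {
        "mean": mean(scores),
        "lowest": min(scores),
        "highest": max(scores),
    }


if __name__ == "__main__":
    summary = summarize(HYPOTHETICAL_SCORES)
    print(f"average score: {summary['mean']:.1f}")
    print(f"individual scores run from {summary['lowest']} to {summary['highest']}")
```

On these invented figures the group average falls in the low seventies, yet one listener is never wrong and another performs at the chance level quoted by Rose, so the average says little about how far a given witness can be relied upon.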
And then, as one would expect, delay has a growing effect on accuracy. McGehee (1937)
reported 87 per cent correct identification after two days, falling to 13 per cent after five
months. Also, voices heard only on the telephone are more difficult to recognize due to the
degradation of the signal (Künzel 1994). On the other hand, it appears that stress at the time
the voice is heard can enhance memory (Atwood and Hollien 1986) and the longer the
sample of speech presented to the witness the better the success rate. Another study by
Koenig (1986), involving a review of 2,000 FBI cases over a 15-year period, indicated an
error rate of less than one per cent. In contrast, Hollien (1990) claimed error rates of
between 20 and 78 per cent.