Unit 2
PHONOLOGY:
1. PHONE:
A phone is a speech segment that possesses distinct physical or perceptual properties and serves as the basic unit of
phonetic speech analysis. Phones are generally either vowels or consonants. A phone is written with phonetic
symbols in square brackets, e.g., [t].
2. PHONEMES:
In linguistics and specifically phonology, a phoneme is any set of similar phones that is perceptually regarded by the
speakers of a language as a single distinct unit, a single basic sound, which helps distinguish one word from another.
Dime vs. Dine
The two words sound alike except for the final consonant, yet their meanings are different. It is the contrast between
/m/ and /n/ that makes the difference in meaning.
3. ALLOPHONES:
Allophones are the different phonetic realizations of a single phoneme; which one occurs depends on the neighbouring
sounds, not on how the word is spelled. Think of the sound /t/ in the word "tar" compared with "stuff." It is
pronounced with a more forceful puff of air in the first example than in the second. Linguists write phonemes between
slashes and phones (including allophones) between square brackets. The phoneme behind the letter l, for instance, is
written as "/l/."
[pʰ] and [p] are allophones of the same phoneme /p/ in English:
The 'p' in 'paper' is normally pronounced with aspiration. That is, there is a release of a puff of air. The 'p' in 'spill' is
normally not aspirated.
[spʰɪn] "spin"
[spɪn] "spin"
Aspiration doesn't affect the meaning of the word. Either [p] or [pʰ] gives the same meaning. In English they are
considered to be the 'same' sound though they are phonetically different. Thus, [p] and [pʰ] are NOT contrastive in
English. They don't affect the meaning of words. They are allophones of the same phoneme /p/.
4. MINIMAL PAIRS:
A minimal pair is a pair of words that have different meanings but differ in only a single speech sound. If swapping
one sound for another changes the meaning of a word, the two sounds are separate phonemes. For example, bat and bad are
similar-sounding words that differ by one consonant sound at the end (/t/ vs /d/). Since this difference changes the
meaning of the word, /t/ and /d/ are distinct phonemes in English. As a non-native speaker, you may find some minimal
pairs in English confusing if your native language doesn’t have those sounds (or those sounds don’t make a difference
in meaning):
For example:
/r/ and /l/ such as row and low for Japanese speakers
/b/ and /v/ such as berry and very for Spanish speakers
/w/ and /v/ such as wine and vine for Hindi and Urdu speakers
/ɪə/ and /eə/ such as cheer and chair for Arabic speakers
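The definition of a minimal pair can be checked mechanically. The following is a minimal sketch, assuming each word is stored as a sequence of phoneme symbols; the small LEXICON and the function names are made up for illustration only.

```python
from itertools import combinations

# Hypothetical mini-lexicon: word -> tuple of phoneme symbols (illustrative only).
LEXICON = {
    "bat": ("b", "æ", "t"),
    "bad": ("b", "æ", "d"),
    "row": ("r", "əʊ"),
    "low": ("l", "əʊ"),
    "dime": ("d", "aɪ", "m"),
    "dine": ("d", "aɪ", "n"),
}


def is_minimal_pair(a, b):
    """True if two transcriptions have the same length and differ in exactly one segment."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def find_minimal_pairs(lexicon):
    """Return all word pairs in the lexicon that form a minimal pair."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if is_minimal_pair(p1, p2):
            pairs.append((w1, w2))
    return pairs


print(find_minimal_pairs(LEXICON))
```

With this lexicon the pairs found are bad/bat, dime/dine, and low/row. Note that the comparison works on sounds, not on spelling.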
Phonological rules are based on the observation of changes in the character, quality, loss, or addition of sounds in the
normal, natural use of language.
When two words, or even syllables, interact at the boundary, three main classes of change seem to occur:
1. ASSIMILATION:
Note that the overall phonological environment of a word (or even a series of words) conditions the exact way in
which different allophones of a phoneme are pronounced.
A back vowel, for example, might cause the speakers of a particular language to produce a farther-back version of a
consonant. A nasal consonant might influence a neighbouring vowel to be produced with nasalization. When one
sound is drawn closer to another in this way, the process is called assimilation; one sound is said to have assimilated
to another.
Assimilation can cause sounds to be raised or lowered, voiced or devoiced, aspirated or unaspirated, and more,
depending on the phonological environment and the way in which it "conditions" the sounds around it.
Example: You'll have to
Careful, Slow Speech: /juːl hæv tuː/
Rapid, Connected Speech: /juːl hæf tuː/
Rule 1: The phonemes /t/, /d/, and /n/ become bilabial before bilabial consonants /p/, /b/, and /m/:
Rule 2: /t/ assimilates to /k/ before /k/ or /g/, and /d/ assimilates to /g/ before /k/ or /g/:
Rule 4: /s/ assimilates to /ʃ/, /z/ assimilates to /ʒ/, /t/ assimilates to /ʧ/, and finally /d/ assimilates to /ʤ/ before /j/:
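Rules 1, 2, and 4 can be written down as a small rewrite table. The sketch below is one possible reading, not a definitive implementation: it assumes that "become bilabial" in Rule 1 means /t/ → /p/, /d/ → /b/, /n/ → /m/, it represents an utterance as a plain list of phoneme symbols, and it ignores word boundaries. The names RULES and assimilate are hypothetical.

```python
# Each entry: (sounds that trigger the change, mapping applied to the sound before them).
RULES = [
    ({"p", "b", "m"}, {"t": "p", "d": "b", "n": "m"}),        # Rule 1 (assumed mapping)
    ({"k", "g"}, {"t": "k", "d": "g"}),                       # Rule 2
    ({"j"}, {"s": "ʃ", "z": "ʒ", "t": "ʧ", "d": "ʤ"}),        # Rule 4
]


def assimilate(segments):
    """Apply the rules to every sound that stands directly before a triggering sound."""
    out = list(segments)
    for i in range(len(out) - 1):
        following = segments[i + 1]
        for triggers, mapping in RULES:
            if following in triggers and out[i] in mapping:
                out[i] = mapping[out[i]]
                break
    return out


# "ten pounds": /n/ becomes /m/ before /p/
print(assimilate(["t", "e", "n", "p", "aʊ", "n", "d", "z"]))
# "that cat": /t/ becomes /k/ before /k/
print(assimilate(["ð", "æ", "t", "k", "æ", "t"]))
# "this year": /s/ becomes /ʃ/ before /j/
print(assimilate(["ð", "ɪ", "s", "j", "ɪə"]))
```

In all three rules the changed sound comes before the sound that triggers the change.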
Progressive assimilation works from left to right. That is, the assimilated phoneme comes after that which forces the
assimilation process. Good candidates for this type of assimilation are the past tense forms of verbs and plurals of
nouns in English.
In other words, a sound is influenced by the sound that comes before it. In bags, for example, the plural ending /s/
follows the voiced consonant /g/ and is therefore pronounced /z/: /bægs/ → /bægz/.
2. ELISION:
Combined with predictability, certain sounds may not be pronounced due to a weakening process. The reduced
vowel /ə/, also known as schwa, can act like a catalyst in this process. The weak vowel /ə/ is a mid central
vowel, one whose articulation requires less effort and energy than that of any other vowel. If sounds
disappear under certain circumstances, we are dealing with 'elision'.
Elision produces consonants standing for both themselves and the lost vowel. Therefore, such consonants are called
'syllabic consonants'. Now we look at another case of sound loss. Certain sounds are contracted in languages based on
the principle of 'predictability'. In the following examples, /v/ and /t/ sounds disappear, respectively:
Another type of sound deletion in English takes place in the use of modals and auxiliaries.
3. LINKING:
The sound /r/ functions as a linking device when two words, the first ending in a vowel and the next beginning with a
vowel, are pronounced together in spontaneous speech.
   ISOLATED PRONUNCIATION        INTERACTIVE PRONUNCIATION
1. here /hɪə/                    Here are /hɪərɑː/
2. four /fɔː/                    Four eggs /fɔːregz/
3. media /miːdɪə/                Media event /miːdɪərɪvent/
4. formula /fɔːmjələ/            Formula A /fɔːmjələreɪ/
5. saw /sɔː/                     I saw it. /aɪsɔːrɪt/
6. draw /drɔː/                   Nice drawing. /naɪs drɔːrɪŋ/
4. PHONOTACTICS:
Phonotactics is closely tied to syllable structure. The way the sounds of a language are grouped or clustered is known as
'phonotactics'. In other words, phonotactics is about the possible phoneme combinations of a language.
Syllable is a fundamental but elusive unit in phonology. Every word consists of a sequence of some number of
syllables, and even speakers with no knowledge of phonetics or of linguistics usually find it easy to agree on how
many syllables a word contains.
For example, we all agree that girl contains one syllable, that butter contains two, that linguistics contains three, that
education contains four, and so on.
SYLLABLE:
A word has as many syllables as it has vowel sounds. Thus the first letter of the English alphabet (a) is a syllable on
its own when used in the meaning of "one" or "some". A notation is useful for a discussion of syllables here: the onset
is the consonant or consonants before the vowel, the nucleus is the vowel itself, and the coda is the consonant or
consonants after it.
As such, the onset in the syllable pen is /p/, the nucleus is /e/ and the coda is /n/.
In a syllable with consonant clusters, such as the English word twelfths [twelfθs], the syllable is divided into the
onset /tw/, the nucleus /e/, and the coda /lfθs/, and its phonotactic pattern can be described as CCVCCCC:
twelfths [twelfθs]
CC V CCCC → (phonotactic pattern)
O N C → (syllable structure)
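The division into onset, nucleus, and coda can be sketched in a few lines of code. This is only an illustration under simplifying assumptions: the input is a single syllable given as a list of phoneme symbols, it contains exactly one vowel, and the small VOWELS set below is a hypothetical, incomplete inventory.

```python
# Illustrative and incomplete vowel inventory (an assumption for this sketch).
VOWELS = {"e", "ɪ", "æ", "ʌ", "ɒ", "ʊ", "ə", "iː", "uː", "ɑː", "ɔː", "ɜː"}


def parse_syllable(segments):
    """Split one syllable into onset, nucleus, and coda, and give its C/V pattern."""
    vowel_positions = [i for i, s in enumerate(segments) if s in VOWELS]
    if len(vowel_positions) != 1:
        raise ValueError("expected exactly one vowel segment")
    n = vowel_positions[0]
    pattern = "".join("V" if s in VOWELS else "C" for s in segments)
    return {
        "onset": segments[:n],        # consonants before the vowel
        "nucleus": segments[n],       # the vowel itself
        "coda": segments[n + 1:],     # consonants after the vowel
        "pattern": pattern,
    }


print(parse_syllable(["p", "e", "n"]))
print(parse_syllable(["t", "w", "e", "l", "f", "θ", "s"]))
```

For twelfths this yields the onset t-w, the nucleus e, the coda l-f-θ-s, and the pattern CCVCCCC, matching the analysis above.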
(table)
B. SUPRASEGMENTAL PHONOLOGY:
1. WORD STRESS:
Stress is contrastive; that is, for a word to have stress, it has to have more than one syllable: one stressed, one
unstressed. A stressed syllable is produced with a greater amount of energy than an unstressed one. A stressed syllable
is spoken and heard as louder and longer, since an extra push of air from the lungs is involved. Clearly, the muscles of
the chest, lungs, and larynx expend more energy in producing such syllables.
(table)
Depending on the syllable stressed, the syntactic function (e.g. noun, verb, etc.) and meaning of a word may change
radically.
(table)
Depending on where stress is, its noun or verb function can be determined.
(table)
"...the stress pattern of a word is an important part of its identity for the native speaker. There is a great deal
of evidence that native speakers rely very much on the stress pattern of words when they are listening. In fact,
experiments have demonstrated that often when a native speaker mishears a word, it is because the foreigner
has put the stress in the wrong place, not because he or she mispronounced the sounds of the word."
The following can be inferred from this quotation. First, stress is definitely an issue in English communication.
Secondly, its absence or misplacement can cause comprehension problems. Thirdly, stress can matter more for
intelligibility than the pronunciation of individual sounds. Finally, advanced learners of English need to be observant
and able to identify and use stress in words. It is evident from the preceding exposition of the subject that not all
syllables of a polysyllabic English word receive the same level of stress. In the examples we have examined so far,
there are two levels of stress: stressed syllable and unstressed syllable.
Words that have three syllables can have three levels of stress: primary stress, secondary stress, weak stress. If a word
has more than three syllables, then it can have four levels of stress: primary, secondary, tertiary and weak. It is one
thing to identify levels of stress in a word that is pronounced for experimental purposes; quite another to say a word in
a sentence as part of the flow of a conversation. The above-mentioned levels are usually valid for the former kind.
Although dictionaries can display up to three levels of stress relatively comfortably, they can hardly show a fourth
level. Remember that dictionaries examine words in isolation, not in context.
As opposed to isolated pronunciation of words, in connected speech, usually two levels of stress appear to be
perceptible, to non-native speakers in particular, regardless of the number of syllables: stressed and unstressed. What
is known as the primary stress is accepted as the stressed syllable while the rest (i.e. secondary, tertiary, and weak) are
rendered as unstressed syllables.
2. SENTENTIAL STRESS:
An examination of quite a number of utterances (i.e. sentences) in English (both British and American) reveals that
almost every utterance will have a main stress, located in one of its words. Such stress is known as 'sentence stress' in
the literature. Which word in the following sentence has the sentence stress?
I'm a student.
Your guess is right: it is on student. Because student has more than one syllable, we need to take another step and
identify the stressed syllable. Which syllable in student gets the stress? The answer is the first syllable: STUdent.
Under normal circumstances, words that carry higher information content in the utterance are given higher stress than
those carrying lower information as well as those that are predictable from the context. It is generally the case that one
word is stressed more than any other since it possesses the highest information content for an utterance in a discourse;
the stressed item informs the hearer most.
All words can be divided into two groups according to what they do in a sentence: content and function words. Content words
are nouns, verbs, adjectives, and adverbs while function words are articles, prepositions, conjunctions, and modal
auxiliaries.
Generally speaking, then, it is the content words that receive the sentential stress in English.
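The content/function distinction gives a rough first guess at which words in a sentence can carry stress. The sketch below assumes a small, hand-made list of function words (articles, prepositions, conjunctions, and modal auxiliaries); the list and the function name are illustrative and far from complete.

```python
import re

# Hypothetical, incomplete list of function words for illustration.
FUNCTION_WORDS = {
    "a", "an", "the",                      # articles
    "in", "on", "at", "of", "to", "for",   # prepositions
    "and", "but", "or",                    # conjunctions
    "can", "will", "must", "should",       # modal auxiliaries
}


def stressable_words(sentence):
    """Return the words that are not in the function-word list, in order."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in FUNCTION_WORDS]


print(stressable_words("The student can read in the library."))
```

For the sample sentence this leaves student, read, and library as the candidates for sentential stress.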
In other words, content words are (potentially) stressable in unmarked utterances whereas function words are not
stressable in the normal course of speaking. What we have so far explained is in fact one
of the four types of sentence stress, known as 'tonic stress'.
Both sentences have tonic stress, which are in content words at the sentence initial position. Tonic stress is regarded as
the unmarked sentential stress as it is the most widespread one in English. The following three types of stress are so-
called marked stress types, as they deviate from the usual tonic stress.
The second type of stress is called 'New Information Stress'. As the name suggests, the new information in an
utterance is stressed. New information can typically be found in the answers to wh-questions.
Consider the following in which new information stress is underlined:
The third type of stress is called 'Emphatic Stress', which can be common in the so-called emphatic speeches.
Emphatic stress can generally be seen on
(a) emphatic reflexives such as himself and own
(b) adverbs such as very, so, and such
(c) emphatic words by nature such as indeed, terribly, and utterly.
The final type of stress is 'Contrastive Stress', which is used to show contrast in the same sentence or across
sentences.
Study the exchange below to see how contrastive stress can be used:
3. TONES:
Tones are another paralinguistic tool for speakers to send several indirect messages about their strategies in
conversation, revealing their attitudes and so forth. Imagine you are asked to pronounce Yes in as many different ways
as you can: what changes can you possibly apply? Contemplate the following and try to produce a pronunciation that
can match the meaning given:
(Table)
Let us get to know the terms for these four basic tones:
I. / / indicates the fall tone
II. / / indicates the low-rise tone
III. / / indicates high-rise tone
IV. / / indicates the fall-rise tone
OVERVIEW
PROSODY:
Prosody in phonology refers to the patterns of rhythm, stress, and intonation in spoken language. It encompasses the
aspects of speech that go beyond the actual words and sounds (phonemes) to include how those sounds are delivered.
Prosody plays a crucial role in conveying meaning, emotion, and structure in speech.
Prosody is essential in communication because it helps listeners interpret the speaker's intent, emotion, and the
structure of the information being conveyed. It also contributes to the naturalness and fluency of speech.
COMPUTATIONAL PHONOLOGY:
Computational phonology is an interdisciplinary field that combines elements of phonology, the branch of linguistics
concerned with the study of sound systems in languages, with computational techniques. Its primary goal is to use
algorithms, formal models, and computer programs to understand, model, and analyze the phonological structure of
languages.
Before diving into computational phonology, it's important to understand some basic concepts in traditional
phonology:
1. Phonemes: The smallest units of sound that can distinguish meaning in a language. For example, in English,
the sounds /p/ and /b/ in "pat" and "bat" represent different phonemes.
2. Allophones: Variations of a phoneme that do not change word meaning. For instance, the /p/ sound in "spin"
(unaspirated) and "pin" (aspirated) are allophones of the same phoneme in English.
3. Phonological Rules: Systematic processes that govern how phonemes are realized as different allophones in
specific linguistic environments. For example, in languages such as German, the rule that turns a voiced obstruent
into its voiceless counterpart at the end of a word is known as final devoicing.
4. Phonotactics: The rules governing the permissible combinations of phonemes in a particular language. For
example, in English, a word can start with the cluster /str/ (as in "street"), but not with /sr/.
Computational Phonology refers to the application of computational methods to model and analyze these
phonological concepts. It allows linguists to formalize phonological rules, test theories, and simulate language
processes using computer algorithms.
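As a small illustration of formalizing such concepts, the phonotactic constraint mentioned above (an English word may begin with /str/ but not with /sr/) can be expressed as a lookup in a set of permitted onsets. This is a sketch under stated assumptions: LEGAL_ONSETS is a deliberately tiny, incomplete list chosen for the example, and single consonants are simply accepted without further checks.

```python
# Illustrative, incomplete set of permitted word-initial consonant clusters in English.
LEGAL_ONSETS = {
    ("s", "t", "r"), ("s", "p", "r"), ("s", "p", "l"), ("s", "k", "r"),
    ("s", "t"), ("s", "p"), ("s", "k"), ("s", "l"), ("t", "r"), ("p", "l"),
}


def is_legal_onset(cluster):
    """Accept empty or single-consonant onsets; look clusters up in LEGAL_ONSETS."""
    cluster = tuple(cluster)
    if len(cluster) <= 1:
        return True  # simplification: no check on individual consonants
    return cluster in LEGAL_ONSETS


print(is_legal_onset(["s", "t", "r"]))  # as in "street"
print(is_legal_onset(["s", "r"]))
```

A fuller model would state the constraints as general patterns over classes of sounds rather than as a list of clusters.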
Modeling Phonological Processes: Create computational models that can simulate how phonological rules
apply in different languages. These models can be used to predict the surface forms of words given their
underlying representations.
Speech Recognition and Synthesis: Improve systems that convert spoken language into text (speech
recognition) and text into spoken language (speech synthesis) by incorporating accurate phonological models.
Language Learning and Acquisition: Use computational models to simulate and study how humans learn
the phonological systems of their native languages or second languages.
Testing Phonological Theories: Provide tools to linguists to test and refine phonological theories by
comparing predicted outcomes from computational models with actual linguistic data.
Cross-Linguistic Phonology: Analyze and compare phonological patterns across different languages to
identify universal trends or language-specific characteristics.
1. Speech Technology:
o Speech Recognition: Computational phonology contributes to more accurate recognition of spoken
words by modeling how sounds vary depending on context (e.g., different pronunciations of the same
word).
o Speech Synthesis: Enhances text-to-speech systems by ensuring that generated speech sounds natural
and adheres to the phonological rules of the target language.
2. Linguistic Research:
o Provides tools for linguists to test hypotheses about phonological rules and patterns, using
computational simulations to validate theoretical models.
3. Language Learning:
o Assists in the development of language learning tools by modeling how phonological knowledge is
acquired and by creating exercises that help learners internalize the sound patterns of a new language.
4. Dialect and Language Variation:
o Analyzes differences in phonological systems across dialects or languages, providing insights into
how sound systems evolve and differ across regions and communities.
5. Clinical Linguistics:
o Used in the study of speech disorders by modeling normal and atypical phonological processes, aiding
in diagnosis and therapy design.
Computational phonology is an evolving field that sits at the intersection of linguistics and computer science. It offers
powerful tools for understanding the sound systems of languages, testing linguistic theories, and developing
applications in speech technology and language education. As computational methods and linguistic theories continue
to advance, the potential for this field to contribute to our understanding of language and to practical applications in
technology will continue to grow.
WORDS:
Words are the smallest free-standing units of meaning in a language.
They can stand alone as independent units and convey meaning.
Examples: "book," "run," "quickly."
MORPHEMES:
WHAT IS A MORPHEME?
A morpheme is the minimal grammatical unit within a language. Every word comprises one or more morphemes. A
standalone morpheme is identical to a word, but when a root is modified by the addition of affixes, the result is
a word made up of more than one morpheme.
The root listen is a standalone morpheme and a word at the same time. When the root is modified with an affix
such as -s, -er, -ed or -ing, each resulting word (listens, listener, listened, listening) consists of two morphemes.
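The listen example can be reproduced with a very simple suffix-stripping routine. The sketch below assumes a known set of roots and only the four suffixes named above; both the SUFFIXES tuple and the split_morphemes function are hypothetical names, and real morphological analysis has to deal with spelling changes that this ignores.

```python
# The four suffixes from the example above.
SUFFIXES = ("ing", "er", "ed", "s")


def split_morphemes(word, roots):
    """Return [root] or [root, -suffix] if the word is a known root plus one suffix."""
    if word in roots:
        return [word]
    for suffix in SUFFIXES:
        root = word[: -len(suffix)]
        if word.endswith(suffix) and root in roots:
            return [root, "-" + suffix]
    return [word]  # unknown word: left unanalysed


ROOTS = {"listen"}
for w in ["listen", "listens", "listener", "listened", "listening"]:
    print(w, split_morphemes(w, ROOTS))
```

Each affixed form comes back as two morphemes, while the bare root comes back as one.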
TYPES OF MORPHEMES:
FREE MORPHEMES:
A morpheme that can stand alone as a single word (as a meaningful unit) is called a free morpheme. Free
morphemes are roots that are identical to words. They make up the set of separate English word forms such as basic
nouns, verbs, adjectives, etc. When a free morpheme is used with bound morphemes, the basic word forms are
technically known as stems or roots.
Examples of free morphemes:
Sun (noun), dog (noun), walk (verb), and happy (adjective)
A free morpheme can stand alone and cannot be subdivided further. ‘Sun’ and ‘dog’ are free morphemes because they
cannot be split up any further; stems that cannot be divided further are also called roots.
Free morphemes are divided into two categories: lexical morphemes and functional morphemes.
Lexical morphemes are the set of content words: nouns, verbs, adjectives, and adverbs. They carry meaning on
their own, e.g. run, blue, slow, paper, small, throw, and now. A lexical morpheme carries the dictionary
meaning of a word, which is attributed to a specific referent.
Functional morphemes are the set of function words: conjunctions, prepositions, articles, pronouns, auxiliary
verbs, modals and quantifiers. Some examples of functional morphemes are and, near, when, on, because, but,
it, in, that, the, and above. Functional morphemes express a relationship between one lexical morpheme
and another. A functional morpheme modifies the meaning, rather than supplying the root meaning of the
word. It encodes grammatical meaning, e.g. in "The players entered the ground," ‘the’ is a functional
morpheme that specifies players and ground.
BOUND MORPHEMES:
Segments that cannot stand alone and must occur with a root or stem are called bound morphemes. Most bound
morphemes in English are affixes (prefixes, suffixes and infixes). An affix cannot form a word by combining only with
another affix; it must attach to a root or stem.
The set of affixes that make up the category of bound morphemes can also be divided into two types: derivational
morphemes and inflectional morphemes.
Derivational Morphemes:
Derivational morphemes form new words and often change the grammatical category of the word. For example, the
word ‘bake’ (verb) is a root word (free morpheme), and when we add the bound morpheme ‘-er’ (a suffix) to the stem,
it becomes baker (a noun). So the grammatical category has changed from verb to noun.
Derivational morphemes can be either class-changing or class-maintaining, depending on the change in the
word. Class-maintaining morphemes change the meaning of the word but keep its grammatical category,
while class-changing morphemes move the root word into a new grammatical category.
Class Maintaining: These prefixes do not change the grammatical category of the base word.
Examples:
a- (amoral)
ex- (ex-member)
non- (nonsense)
mini- (miniskirt)
Class Changing: These prefixes can change the grammatical category of the base word.
Examples:
a- (asleep)
re- (renew)
en- (enrich)
Class Maintaining: These suffixes do not change the grammatical category of the base word.
Examples:
-ess (actress)
-let (booklet)
-ling (duckling)
Class Changing: These suffixes change the grammatical category of the base word.
Examples:
-ate (passionate)
-er (reader)
-ee (employee)
Inflectional Morphemes:
An inflectional morpheme is a suffix that is added to a word to assign a particular grammatical property to
that word. For example, listen + ing = listening or boy + s = boys. They do not change the essential meaning or
the grammatical category of a word. Inflectional morphemes serve as grammatical markers that indicate tense,
number, possession, or comparison.
Bound Base:
Bound bases serve as roots for only derivational morphemes and cannot stand alone. An example of a bound
base is –ceive, which cannot stand alone but can form a word when used with prefixes, as in receive, conceive,
perceive, and deceive.
ALLOMORPHS:
Allomorphs are variants of a morpheme that differ in form but not in meaning. For example, the -s plural takes three
distinct phonological forms, [s], [z], and [ɪz], in the words books [bʊks], boys [bɔɪz], and dishes [dɪʃɪz]. These
phonological distinctions do not change the meaning, making them allomorphs of the -s plural morpheme.
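The choice among the three plural allomorphs depends on the final sound of the noun, which makes it easy to state as a rule. The sketch below follows the usual textbook description: [ɪz] after sibilants, [s] after other voiceless consonants, and [z] elsewhere. The function name and the two sound sets are illustrative, and the input is assumed to be the final phoneme of the singular form.

```python
SIBILANTS = {"s", "z", "ʃ", "ʒ", "ʧ", "ʤ"}
VOICELESS = {"p", "t", "k", "f", "θ"}  # voiceless consonants that are not sibilants


def plural_allomorph(final_sound):
    """Pick the allomorph of the -s plural from the last sound of the noun."""
    if final_sound in SIBILANTS:
        return "ɪz"
    if final_sound in VOICELESS:
        return "s"
    return "z"  # after vowels and voiced non-sibilant consonants


print(plural_allomorph("k"))   # book -> books
print(plural_allomorph("ɔɪ"))  # boy -> boys
print(plural_allomorph("ʃ"))   # dish -> dishes
```

The order of the checks matters: sibilants are tested first because /s/ and /ʃ/ are voiceless too, yet they take [ɪz].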
SEGMENTATION:
Segmentation in NLP (Natural Language Processing) refers to the process of dividing a continuous sequence of text
into smaller, meaningful units or segments. These segments can be words, phrases, sentences, or even larger chunks,
depending on the specific task or context. Segmentation plays a crucial role in various NLP tasks, and the choice of
segmentation depends on the specific requirements of the task at hand. Here are a few examples of segmentation
techniques commonly used in NLP:
TYPES OF SEGMENTATION:
1. Word Segmentation: Word segmentation involves dividing a sentence or a continuous string of text into individual
words. In languages like English, words are generally separated by spaces, making word segmentation relatively
straightforward. However, in languages like Chinese or Thai, which do not have explicit word delimiters, word
segmentation becomes more challenging.
2. Sentence Segmentation: Sentence segmentation involves dividing a paragraph or a document into individual
sentences. This is particularly useful for tasks such as text summarization, machine translation, or sentiment analysis,
where the meaning of the text is often tied to the sentence level.
3. Phrase Segmentation: Phrase segmentation involves identifying and extracting meaningful phrases or multi-word
expressions from a text. This can be useful in tasks like named entity recognition or extracting collocations, where
certain phrases carry specific information or have a different semantic meaning than their individual words.
4. Chunking: Chunking involves identifying and grouping together syntactically related words in a sentence. This
helps in extracting higher-level information such as noun phrases, verb phrases, or prepositional phrases, which can be
useful for tasks like information extraction or parsing.
5. Subword Segmentation: Subword segmentation involves dividing words into smaller subunits or subword units.
This technique is often used in morphologically rich languages or for handling out-of-vocabulary words. Subword
segmentation methods, such as Byte Pair Encoding (BPE) or WordPiece, can capture morphological variations and
improve the performance of various NLP models.
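To make subword segmentation more concrete, here is a toy version of the Byte Pair Encoding idea: start from single characters and repeatedly merge the most frequent adjacent pair. It is a minimal sketch, not the implementation used by any particular library; the word counts are made up, and ties between equally frequent pairs are broken alphabetically, which is an arbitrary choice made here for reproducibility.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_counts, num_merges):
    """Learn an ordered list of merges from a {word: frequency} dictionary."""
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        # Most frequent pair; ties broken by the pair itself (an arbitrary choice).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(s, best): c for s, c in vocab.items()}
    return merges


def segment(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = learn_bpe({"low": 5, "lower": 2, "lowest": 3}, num_merges=2)
print(merges)
print(segment("lowest", merges))
print(segment("slow", merges))
```

After two merges the unit "low" has been learned, so an unseen word such as "slow" is split into "s" and "low" rather than being treated as unknown.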
Segmentation techniques are often language-dependent, and different languages may require different approaches.
Additionally, the choice of segmentation can also vary depending on the specific NLP task and the available resources
or tools.
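For text written without spaces, a classic dictionary-based baseline is greedy longest-match (often called maximum matching). The sketch below uses unspaced English strings and tiny made-up dictionaries purely for illustration; it assumes a non-empty dictionary and falls back to single characters for unknown material.

```python
def max_match(text, dictionary):
    """Greedy left-to-right segmentation: always take the longest dictionary word."""
    max_len = max(map(len, dictionary))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is accepted as a last resort, even if unknown.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


print(max_match("thecatsat", {"the", "cat", "sat"}))

# The greedy strategy can go wrong when a longer word hides the intended split.
words = {"the", "table", "down", "there", "theta", "bled", "own"}
print(max_match("thetabledownthere", words))
```

The second call returns theta / bled / own / there instead of the / table / down / there, which shows why statistical and machine learning methods are often preferred for this task.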
1. Machine Translation: Proper segmentation is crucial for translating text from one language to another,
especially for languages where word boundaries are not explicit.
2. Speech Recognition: Segmentation helps in identifying where words or sentences start and end in continuous
speech, which is critical for accurate transcription.
3. Information Retrieval: Segmenting text into meaningful units allows search engines to retrieve and rank
documents based on relevance more effectively.
4. Text Summarization: Segmentation aids in identifying the most important sentences or paragraphs to include
in a summary.
5. Sentiment Analysis: Sentence segmentation is crucial in determining the sentiment of individual sentences,
especially in reviews or social media posts.
1. Rule-based Methods: These involve using predefined rules, such as punctuation marks for sentence
segmentation or spaces for word segmentation in languages like English. Rule-based methods are simple but
can be limited in handling exceptions or complex cases.
2. Statistical Methods: These methods use probabilistic models like Hidden Markov Models (HMMs) to predict
the most likely segmentation points in a text. They can adapt to various contexts but require large annotated
datasets.
3. Machine Learning-based Methods: These involve training models, such as Conditional Random Fields
(CRFs) or deep learning models, to automatically learn segmentation patterns from data. These methods can
handle complex cases but require substantial computational resources and labeled data.
4. Hybrid Methods: These combine rule-based and statistical/machine learning approaches to leverage the
strengths of each. For example, rules can be used for initial segmentation, with machine learning models
refining the results.
5. Subword Tokenization: Techniques like Byte Pair Encoding (BPE) or WordPiece are used to segment words
into subwords, especially useful in languages with rich morphology or for handling rare words in machine
translation.
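A rule-based sentence segmenter of the kind described in the first item of this list can be sketched as follows. It assumes whitespace-separated tokens, treats ".", "!" and "?" at the end of a token as sentence boundaries, and uses a short, hypothetical abbreviation list as its only exception rule.

```python
# Hypothetical, incomplete abbreviation list used as the exception rule.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}


def split_sentences(text):
    """Split text into sentences at ., ! or ? unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        ends_sentence = token.endswith((".", "!", "?"))
        if ends_sentence and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences


print(split_sentences("Dr. Khan paid 3.5 dollars. Was it fair? Yes!"))
```

The decimal point in 3.5 is not a problem here because the token does not end in a period, and "Dr." is protected by the abbreviation list. Any abbreviation missing from the list would still cause a wrong split, which is the typical weakness of rule-based methods.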
CHALLENGES IN SEGMENTATION:
1. Ambiguity: Text can be ambiguous, making it challenging to determine the correct segmentation. For
example, abbreviations, decimal points, or ellipses can confuse sentence segmentation.
2. Language Variability: Different languages have different rules and structures, making it hard to develop a
one-size-fits-all segmentation approach. For instance, Chinese or Japanese text segmentation is more complex
due to the lack of spaces between words.
3. Out-of-vocabulary Words: Handling words that are not in the training data, especially in subword
segmentation, can be challenging, leading to incorrect segmentation.
4. Context Dependency: Segmentation often depends on the context, requiring models to understand the
surrounding text to make accurate decisions. This is particularly challenging in informal or creative writing,
such as social media posts.
5. Resource Requirements: Advanced segmentation techniques, particularly those based on machine learning,
require large amounts of labeled data and computational power, which may not always be available.
Segmentation is a crucial step in NLP that impacts the performance of various downstream tasks. While simple rule-
based methods may suffice for certain applications, more complex tasks often require sophisticated statistical or
machine learning-based techniques. Despite advances, challenges such as ambiguity, language variability, and
resource constraints continue to make segmentation a non-trivial problem in NLP.