Unit 2

Uploaded by

Suhaana Khan
Copyright
© © All Rights Reserved

UNIT II - MORPHOLOGY AND PARTS-OF-SPEECH

Phonology – Computational Phonology – Words and Morphemes – Segmentation – Categorization and Lemmatization – Word Form Recognition – Valency – Agreement – Regular Expressions – Finite State Automata – Morphology – Morphological issues of Indian Languages – Transliteration.


PHONOLOGY

PHONOLOGY:

 Phonology is the study of the sound systems of languages.
 Phonology covers the aspects of language related to the distinctive features involved in the representation and perception of the sounds of a language.
 Phonology is the study of the way sounds function within a given language, including phonemes, syllable structure, stress, accent, and intonation, and of which sounds are distinctive units within that language.
 Phonology studies abstract mental entities, such as structures and processes.
 This contrasts with phonetics, which deals with the actual production and acoustics of the sounds of language.
 Phonological trees make it easier to break syllables into their parts.
 A syllable is made up of an onset (any consonants before the rhyme) and a rhyme.
 The rhyme is made up of a nucleus (the vowel sound(s) in the syllable, the key component of all syllables) and a coda (any consonants following the nucleus).
 Different languages can use different phonemes or different syllable structures (which sounds can go together to make sequences or words), and phonology identifies these differences.

BASIC UNITS OF PHONOLOGY:

The basic units of phonology are as follows:


(a) Phone
(b) Phoneme
(c) Allophone
(d) Minimal Pairs

1. PHONE:

A phone is a speech segment that possesses distinct physical or perceptual properties and serves as the basic unit of phonetic speech analysis. Phones are generally either vowels or consonants. A phone is written with phonetic symbols in square brackets, e.g., [t].

2. PHONEMES:

In linguistics and specifically phonology, a phoneme is any set of similar phones that is perceptually regarded by the
speakers of a language as a single distinct unit, a single basic sound, which helps distinguish one word from another.

A phoneme is a speech sound that signals a difference in meaning. For example,

Dime vs. Dine

The two words sound alike except for their final consonant, yet their meanings are different. It is the contrast between /m/ and /n/ that makes the difference in meaning.

3. ALLOPHONES:

Allophones are the different pronunciations of a single phoneme; which one occurs depends on the surrounding sounds, not on how a word is spelled. Think of the letter t and the sound it makes in the word "tar" compared with "stuff." It is pronounced with a more forceful puff of air in the first example than in the second. Linguists use special punctuation to designate phonemes: a phoneme is written between slashes, so the sound of an l, for instance, is written as "/l/."

Allophones are the variations of the same phoneme.

[pʰ] and [p] are allophones of the same phoneme /p/ in English:
The 'p' in 'paper' is normally pronounced with aspiration; that is, there is a release of a puff of air. The 'p' in 'spill' is normally not aspirated.

[spʰɪn] "spin"
[spɪn] "spin"

Aspiration doesn't affect the meaning of the word: either [p] or [pʰ] gives the same meaning. In English they are considered to be the 'same' sound even though they are phonetically different. Thus, [p] and [pʰ] are NOT contrastive in English; they are allophones of the same phoneme /p/.
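
The relationship between a phoneme and its allophones can be sketched as a small rule. The function below is a hypothetical illustration, not a full model of English: it assumes, as a simplification, that /p/ is aspirated only at the very start of a word, and it takes the word as a list of phoneme symbols.

```python
def realise_p(phonemes):
    """Return the phones for a word given as a list of phoneme symbols.

    Simplifying assumption: /p/ is aspirated only at the very start of a
    word and is plain [p] everywhere else (for example after /s/).
    """
    phones = []
    for position, segment in enumerate(phonemes):
        if segment == "p" and position == 0:
            phones.append("pʰ")  # aspirated allophone
        else:
            phones.append(segment)  # every other segment passes through unchanged
    return phones


print(realise_p(["p", "ɪ", "n"]))       # "pin": word-initial /p/
print(realise_p(["s", "p", "ɪ", "n"]))  # "spin": /p/ after /s/
```

Both outputs contain the same phoneme /p/; only its phonetic realisation differs, which is why the difference never changes the meaning.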

4. MINIMAL PAIRS:

A Minimal pair is a pair of words with ONE phonetic difference only.

A minimal pair is a set of words that mean two different things but differ by a single speech sound. If a unit of sound
changes the meaning of a word, it is called a phoneme in linguistics. For example, bat and bad are similar-sounding
words that differ by one consonant sound at the end (/t/ vs /d/). Since this sound changes the meaning of the word, it is
a phoneme. As a non-native speaker, you may find some minimal pairs in English confusing if your native language
doesn’t have those sounds (or those sounds don’t make a difference in meaning):

For example:

 /r/ and /l/ such as row and low for Japanese speakers
 /b/ and /v/ such as berry and very for Spanish speakers
 /w/ and /v/ such as wine and vine for Hindi and Urdu speakers
 /ɪə/ and /eə/ such as cheer and chair for Arabic speakers
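
The definition of a minimal pair can be checked mechanically. The sketch below assumes each word is already given as a list of phoneme symbols (the transcriptions are supplied by hand here) and only covers pairs of equal length that differ by substitution.

```python
def is_minimal_pair(first, second):
    """Return True if two phoneme sequences differ in exactly one position."""
    if len(first) != len(second):
        return False
    differences = sum(1 for a, b in zip(first, second) if a != b)
    return differences == 1


bat = ["b", "æ", "t"]
bad = ["b", "æ", "d"]
print(is_minimal_pair(bat, bad))  # True: only /t/ vs /d/ differs
print(is_minimal_pair(bat, bat))  # False: identical words
```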

COMMON MINIMAL PAIR:


BRANCHES OF PHONOLOGY:
A. SEGMENTAL PHONOLOGY:

Phonological rules are based on the observation of changes in the character, quality, loss, or addition of sounds in the
normal, natural use of language.

When two words, or even syllables, interact at the boundary, three main classes of change seem to occur, and a fourth feature concerns which sounds may combine. These are:

1) changes in sound character = ASSIMILATION
2) loss of sounds = ELISION
3) addition of sounds = LINKING
4) sound and phoneme combinations = PHONOTACTICS

All of these are features of Segmental Phonology.

1. ASSIMILATION:

Note that the overall phonological environment of a word (or even a series of words) conditions the exact way in
which different allophones of a phoneme are pronounced.

A back vowel, for example, might cause the speakers of a particular language to produce a farther-back version of a
consonant. A nasal consonant might influence a neighbouring vowel to be produced with nasalization. When one
sound is drawn closer to another in this way, the process is called assimilation; one sound is said to have assimilated
to another.

Assimilation can cause sounds to be raised or lowered, voiced or devoiced, aspirated or unaspirated, and more,
depending on the phonological environment and the way in which it "conditions" the sounds around it.

1. Good morning
 Careful, Slow Speech: /gʊd mɔːnɪŋ/ (phonemic transcription of RP, careful, slow speech)
 Rapid, Connected Speech: /gʊb mɔːnɪŋ/ (phonemic transcription of RP, rapid, connected speech)

2. I've lost my voice
 Careful, Slow Speech: /aɪv lɒst maɪ vɔɪs/
 Rapid, Connected Speech: /aɪv lɒs maɪ vɔɪs/

3. Since I can't speak
 Careful, Slow Speech: /sɪns aɪ kɑːnt spiːk/
 Rapid, Connected Speech: /sɪns‿aɪ kɑːn spiːk/

4. You'll have to
 Careful, Slow Speech: /juːl hæv tuː/
 Rapid, Connected Speech: /juːl hæf tuː/

5. Do all you can do to help each other
 Careful, Slow Speech: /duː ɔːl juː kæn duː tuː help iːtʃ ʌðə/
 Rapid, Connected Speech: /duːwɔl jə kən duː tə help iːtʃ‿ʌðə/

6. Before asking me to help, see if your partner can
 Careful, Slow Speech: /bɪfɔːr ɑːskɪŋ miː tuː help siː ɪf jɔː pɑːtnə kæn/
 Rapid, Connected Speech: /bɪfɔrɑskɪŋ miː tə help siːjɪf jə pɑːtnə kæn/

Assimilation Rules:

Rule 1: The phonemes /t/, /d/, and /n/ become bilabial before bilabial consonants /p/, /b/, and /m/:

Assimilation involving /t/, /d/, and /n/

EXPRESSION     SLOW            SPONTANEOUS
fat boy        /fæt bɔɪ/       /fæpbɔɪ/
good book      /gʊd bʊk/       /gʊbbʊk/
ten bikes      /ten baɪks/     /tembaɪks/

Rule 2: /t/ assimilates to /k/ before /k/ or /g/, and /d/ assimilates to /g/ before /k/ or /g/:

Assimilation involving /t/, /d/

EXPRESSION     SLOW            SPONTANEOUS
that cat       /ðæt kæt/       /ðækkæt/
good concert   /gʊd kɒnsət/    /gʊgkɒnsət/
good girl      /gʊd gɜːl/      /gʊggɜːl/

Rule 3: /n/ can assimilate to /ŋ/ before /g/ or /k/:

Assimilation involving /n/

EXPRESSION        SLOW            SPONTANEOUS
I've been going   /bɪːn ɡɔɪŋ/     /bɪːŋɡɔɪŋ/
his own car       /ɔʊn kɑː/       /ɔʊŋkɑː/

Rule 4: /s/ assimilates to /ʃ/, /z/ assimilates to /ʒ/, /t/ assimilates to /ʧ/, and finally /d/ assimilates to /ʤ/ before /j/:

Assimilation involving /s/, /z/, /t/, and /d/

EXPRESSION       SLOW   SPONTANEOUS
We miss you.     /s/    /ʃ/
We please you.   /z/    /ʒ/
We bet you.      /t/    /ʧ/
We fed you.      /d/    /ʤ/
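
Rules 1 to 3 can be sketched as a lookup applied at a word boundary. This is a minimal illustration under stated assumptions: words are given as lists of phoneme symbols, the names `assimilate`, `TO_BILABIAL` and `TO_VELAR` are hypothetical, the plain letter "g" stands for the velar stop, and Rule 4 (changes before /j/) is not covered.

```python
BILABIAL = {"p", "b", "m"}
VELAR = {"k", "g"}

# What /t/, /d/, /n/ turn into before a bilabial (Rule 1) or a velar (Rules 2 and 3).
TO_BILABIAL = {"t": "p", "d": "b", "n": "m"}
TO_VELAR = {"t": "k", "d": "g", "n": "ŋ"}


def assimilate(first, second):
    """Join two words, assimilating the last sound of the first word
    to the place of articulation of the first sound of the second."""
    if not first or not second:
        return first + second
    last, following = first[-1], second[0]
    if following in BILABIAL:
        last = TO_BILABIAL.get(last, last)
    elif following in VELAR:
        last = TO_VELAR.get(last, last)
    return first[:-1] + [last] + second


print("".join(assimilate(["f", "æ", "t"], ["b", "ɔɪ"])))            # fat boy
print("".join(assimilate(["t", "e", "n"], ["b", "aɪ", "k", "s"])))  # ten bikes
print("".join(assimilate(["g", "ʊ", "d"], ["g", "ɜː", "l"])))       # good girl
```

Sounds other than /t/, /d/ and /n/ pass through unchanged, so a word such as "miss" is not affected by these three rules.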

Progressive Assimilation (Allomorph):

Progressive assimilation works from left to right. That is, the assimilated phoneme comes after the one that triggers the assimilation: a sound is influenced by the sound that comes before it. Good candidates for this type of assimilation are the past tense forms of verbs and the plurals of nouns in English. For example, the plural ending /s/ is influenced by the preceding voiced sound and changes to /z/, so bags is pronounced /bægz/ rather than /bægs/.

Allomorphs involving past tense (-ed)

Allomorphs For Progressive Assimilation

ALLOMORPH   UNINFLECTED   PRONUNCIATION   INFLECTED   PRONUNCIATION
/-d/        erase         /ɪˈreɪz/        erased      /ɪˈreɪzd/
/-d/        stab          /stæb/          stabbed     /stæbd/
/-t/        pass          /pæs/           passed      /pæst/
/-t/        stop          /stɒp/          stopped     /stɒpt/
/-əd/       want          /wɒnt/          wanted      /wɒntəd/
/-əd/       plant         /plɑːnt/        planted     /plɑːntəd/
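
The choice among the three past tense allomorphs can be sketched as a function of the final sound of the verb. The `VOICELESS` set below is an illustrative, incomplete list supplied for this example, and the function name is hypothetical.

```python
# Illustrative set of voiceless consonants (excluding /t/, handled separately).
VOICELESS = {"p", "k", "f", "s", "ʃ", "tʃ", "θ"}


def past_tense_allomorph(final_sound):
    """Pick the pronunciation of -ed from the last sound of the verb stem."""
    if final_sound in {"t", "d"}:
        return "əd"  # want -> wanted
    if final_sound in VOICELESS:
        return "t"   # stop -> stopped
    return "d"       # voiced consonants and vowels: stab -> stabbed


for verb, final in [("erase", "z"), ("pass", "s"), ("plant", "t")]:
    print(verb, "->", "/-" + past_tense_allomorph(final) + "/")
```

The suffix agrees in voicing with the sound before it, which is what makes this an instance of progressive assimilation.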

2. ELISION:

Combined with predictability, certain sounds may not be pronounced due to a weakening process. The reduced vowel /ə/, also known as schwa, can act like a catalyst in this process. The weak vowel /ə/ is basically a mid central vowel, one whose articulation requires the least effort and energy of all vowels and consonants. If sounds disappear under certain circumstances, we are dealing with 'elision'.

Elision Producing Syllabic Consonants

WORDS     UNELIDED       ELIDED         WORDS     UNELIDED      ELIDED
little    /ˈlɪtəl/       /lɪtl/         perhaps   /pəˈhæps/     /ˈphæps/
button    /ˈbʌtən/       /bʌtn/         canary    /kəˈneərɪ/    /ˈkneərɪ/
potato    /pəˈteɪtəʊ/    /ˈpteɪtəʊ/     today     /təˈdeɪ/      /ˈtdeɪ/

Elision produces consonants standing for both themselves and the lost vowel. Therefore, such consonants are called
'syllabic consonants'. Now we look at another case of sound loss. Certain sounds are contracted in languages based on
the principle of 'predictability'. In the following examples, /v/ and /t/ sounds disappear, respectively:

Elision Involving /v/ and /t/

EXPRESSIONS        CAREFUL STYLE      INFORMAL STYLE
lots of them       /lɒts əv ðəm/      /lɒtsəðəm/
tallest building   /tɔləst bɪldɪŋ/    /tɔləsbɪldɪŋ/

Another case of elision occurs with some of the function words in English. For instance, three forms of pronunciation are obtained for and: /ænd/, /ənd/, /n/. See two examples below:

Elision involving /ænd/

FORMAL STYLE                         INFORMAL STYLE
rock and roll    /rɒk ænd rɔʊl/      rock’n roll    /rɒknrɔʊl/
fish and chips   /fɪʃ ænd tʃɪps/     fish’n chips   /fɪʃnʧɪps/

Another type of deleted sounds in English takes place in the use of modals and auxiliaries.

Elision Involving Contraction of Modals and Auxiliaries

FORMAL WRITING     FORMAL PRONUNCIATION   WRITING OF NORMAL SPEECH   PRONUNCIATION OF NORMAL SPEECH
I am               /aɪ æm/                I’m                        /aɪm/
She is             /ʃɪː ɪz/               She’s                      /ʃɪːz/
He is              /hɪː ɪz/               He’s                       /hɪːz/
It is              /ɪt ɪz/                It’s                       /ɪts/
We are             /wɪː ɑː(r)/            We’re                      /wɪːr/
You are            /jʊː ɑː(r)/            You’re                     /jʊər/
They are           /ðeɪ ɑː(r)/            They’re                    /ðeɪr/
I have (got)       /aɪ hæv/               I’ve (got)                 /aɪv/
I had (written)    /aɪ hæd/               I’d (written)              /aɪd/
She has (done)     /ʃɪ hæz/               She’s (done)               /ʃɪːz/
I would (prefer)   /aɪ wʊd/               I’d (prefer)               /aɪd/

3. LINKING:

The sound /r/ functions as a linking device when two words, the first ending in a vowel and the next beginning with a vowel, are pronounced together in spontaneous speech.

Linking Across Words

ISOLATED      PRONUNCIATION   INTERACTIVE     PRONUNCIATION
1. here       /hɪə/           Here are        /hɪərɑː/
2. four       /fɔː/           Four eggs       /fɔːregz/
3. media      /mɪːdɪə/        Media event     /mɪːdɪərɪvent/
4. Formula    /fɔːmjələ/      Formula A       /fɔːmjələreɪ/
5. saw        /sɔː/           I saw it.       /aɪsɔːrɪt/
6. draw       /drɔː/          Nice drawing.   /naɪs drɔːrɪŋ/
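
The linking rule as stated above can be sketched in a few lines. The `VOWELS` set is a small illustrative list supplied for this example, words are lists of phoneme symbols, and the sketch follows the simplified statement in these notes (any vowel followed by any vowel) rather than a complete account of English linking.

```python
# Illustrative vowel inventory; extend it for other examples.
VOWELS = {"ɪə", "ɔː", "ɑː", "ə", "e", "ɪ", "eɪ", "aɪ"}


def link(first, second):
    """Join two words, inserting /r/ when a vowel meets a vowel."""
    if first and second and first[-1] in VOWELS and second[0] in VOWELS:
        return first + ["r"] + second
    return first + second


print("".join(link(["f", "ɔː"], ["e", "g", "z"])))  # four eggs
print("".join(link(["h", "ɪə"], ["ɑː"])))           # here are
```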
4. PHONOTACTICS:

Phonotactics is closely tied to syllable structure. The way sounds in a language are grouped and clustered is known as 'phonotactics'. In other words, phonotactics is about the possible phoneme combinations of a language.

The syllable is a fundamental but elusive unit in phonology. Every word consists of a sequence of some number of syllables, and even speakers with no knowledge of phonetics or of linguistics usually find it easy to agree on how many syllables a word contains.

For example, we all agree that girl contains one syllable, that butter contains two, that linguistics contains three, that education contains four, and so on.

SYLLABLE:

A word has as many syllables as it has vowel sounds. Thus the first letter of the English alphabet (a) is a syllable on its own when used as a word meaning "one" or "some". A notation is useful for discussing syllables here.

Vowels are abbreviated as (V) and consonants as (C).

1. Minimal syllable = V (e.g. a)
2. Medial syllable = VC (e.g. at)
3. Medial syllable = CV (e.g. to)
4. Maximal syllable = CVC (e.g. pen)

A typical syllable consists of three main parts:

1. Onset (beginning sound)
2. Nucleus (center)
3. Coda (final sound)

As such, the onset in the syllable pen is /p/, the nucleus is /e/ and the coda is /n/.

In a pattern with consonant clusters, the English syllable twelfths [twelfθs], for instance, is divided into the onset /tw/, the nucleus /e/, and the coda /lfθs/, and its phonotactic pattern can be described as CCVCCCC.

twelfths [twelfθs]

CC V CCCC → (phonotactic)

O N C → (syllable structure)
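
The C/V notation and the onset, nucleus, coda division can be computed for a single syllable. The sketch below assumes the syllable is given as a list of phoneme symbols and uses a small illustrative `VOWELS` set; syllabic consonants are not handled.

```python
# Illustrative set of vowel symbols; everything else counts as a consonant.
VOWELS = {"a", "e", "i", "o", "u", "æ", "ɪ", "ʊ", "ɒ", "ʌ", "ə"}


def cv_pattern(phonemes):
    """Describe a sequence of phonemes as a string of C and V."""
    return "".join("V" if p in VOWELS else "C" for p in phonemes)


def split_syllable(phonemes):
    """Split one syllable into (onset, nucleus, coda)."""
    vowel_positions = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if not vowel_positions:
        raise ValueError("a syllable needs a vowel as its nucleus")
    start, end = vowel_positions[0], vowel_positions[-1] + 1
    return phonemes[:start], phonemes[start:end], phonemes[end:]


twelfths = ["t", "w", "e", "l", "f", "θ", "s"]
print(cv_pattern(twelfths))      # CCVCCCC
print(split_syllable(twelfths))  # onset /tw/, nucleus /e/, coda /lfθs/
```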

Here are some examples of phonotactics of the English words:

(table)

B. SUPRASEGMENTAL PHONOLOGY:

1. WORD STRESS:

Stress is contrastive; that is, for a word to have stress, it has to have more than one syllable: one stressed, one unstressed. A stressed syllable is produced with a greater amount of energy than an unstressed one. A stressed syllable is spoken and heard louder and longer, since an extra push of air from the lungs is involved. Clearly, the muscles of the chest, lungs, and larynx expend more energy in producing such syllables.

Note that misplaced stresses can yield unintended meanings.

(table)

Depending on the syllable stressed, the syntactic function (e.g. noun, verb, etc.) and meaning of a word may radically change.

(table)

Depending on where the stress falls, the noun or verb function of a word can be determined.

(table)

"...the stress pattern of a word is an important part of its identity for the native speaker. There is a great deal
of evidence that native speakers rely very much on the stress pattern of a word when they are listening. In fact,
experiments have demonstrated that often when a native speaker mishears a word, it is because the foreigner
has put the stress in the wrong place, not because he or she mispronounced the sounds of the word."

The following can be inferred from this quotation. First, stress is definitely an issue in English communication.
Secondly, absence or misplacement of it can cause comprehension problems. Thirdly, stress is relatively more
important than the pronunciation of words. Finally, advanced learners of English need to be observant as well as be
able to identify and use stress in words. It is evident from the preceding exposition of the subject that not all syllables
of a polysyllabic English word receive the same level of stress. In the examples we have examined so far, there are
two levels of stress: stressed syllable and unstressed syllable.

Words that have three syllables can have three levels of stress: primary stress, secondary stress, weak stress. If a word
has more than three syllables, then it can have four levels of stress: primary, secondary, tertiary and weak. It is one
thing to identify levels of stress in a word that is pronounced for experimental purposes; quite another to say a word in
a sentence as part of the flow of a conversation. The above-mentioned levels are usually valid for the former kind.
Although dictionaries can display up to three levels of stress relatively comfortably, they can hardly show a fourth level of stress. Remember that dictionaries examine words in isolation, not in context.

As opposed to isolated pronunciation of words, in connected speech, usually two levels of stress appear to be
perceptible, to non-native speakers in particular, regardless of the number of syllables: stressed and unstressed. What
is known as the primary stress is accepted as the stressed syllable while the rest (i.e. secondary, tertiary, and weak) are
rendered as unstressed syllables.

2. SENTENTIAL STRESS:

An examination of a large number of utterances (i.e. sentences) in English (both British and American) reveals that almost every utterance has a main stress, located in one of its words. Such stress is known as 'sentence stress' in the literature. Which word in the following sentence has the sentence stress?

I'm a student.

Your guess is right: it is on student. Because student has more than one syllable, we need to take another step and identify the stressed syllable. Which syllable in student gets the stress? The answer is the first syllable: STU-dent.

Under normal circumstances, words that carry higher information content in the utterance are given higher stress than those carrying lower information and those that are predictable from the context. It is generally the case that one word is stressed more than any other since it possesses the highest information content for an utterance in a discourse; the stressed item informs the hearer most.

All words can be divided into two groups according to what they do in a sentence: content words and function words. Content words are nouns, verbs, adjectives, and adverbs, while function words are articles, prepositions, conjunctions, and modal auxiliaries.

Generally speaking, then, it is the content words that receive the sentential stress in English.

Table 3.19 Stressable lexical items in normal discourse

In other words, the items on the left-hand column are (potentially) stressable in unmarked utterances whereas the ones
on the right column are not stressable in the normal course of speaking. What we have so far explained is in fact one
of the four types of sentence stress, known as 'tonic stress'.

Types of Sentence Stress:

Four major types of stress are identified at the utterance level:

• Tonic Stress (i.e. Sentential),
• New Information Stress,
• Emphatic Stress, and
• Contrastive Stress.

Tonic stress is characteristic of reporting, proclaiming statements.

(a) She dislikes lectures.
(b) Show them how they can get out of the building.

Both sentences have tonic stress, which typically falls on a content word in utterance-final position (here lectures and building). Tonic stress is regarded as the unmarked sentential stress as it is the most widespread one in English. The following three types of stress are so-called marked stress types, as they deviate from the usual tonic stress.

The second type of stress is called 'New Information Stress'. As the name suggests, the new information in an utterance is stressed. New information can typically be found in the answers to wh-questions.
Consider the following, in which the word carrying new information stress is capitalized:

A) Where are you from?
B) I'm from BONN.

The third type of stress is called 'Emphatic Stress', which can be common in so-called emphatic speech.
Emphatic stress can generally be seen on
(a) emphatic reflexives such as himself and own
(b) adverbs such as very, so, and such
(c) emphatic words by nature such as indeed, terribly, and utterly.

(i) I cook my own dinner.
(ii) She was so furious.
(iii) I'm terribly sorry.

The final type of stress is 'Contrastive Stress', which is used to show contrast in the same sentence or across sentences.
Study the exchange below to see how contrastive stress can be used (the contrasted word is capitalized):

A) How are you?
B) I'm OK. How are YOU?
A) I'm not too bad.

3. TONES:

Tones are another paralinguistic tool for speakers to send several indirect messages about their strategies in
conversation, revealing their attitudes and so forth. Imagine you are asked to pronounce Yes in as many different ways
as you can: what changes can you possibly apply? Contemplate the following and try to produce a pronunciation that
can match the meaning given:

(Table)

Let us get to know the terms for these four basic tones:
I. /  / indicates the fall tone
II. /  / indicates the low-rise tone
III. /  / indicates high-rise tone
IV. /   / indicates the fall-rise tone

OVERVIEW
PROSODY:

Prosody in phonology refers to the patterns of rhythm, stress, and intonation in spoken language. It encompasses the
aspects of speech that go beyond the actual words and sounds (phonemes) to include how those sounds are delivered.
Prosody plays a crucial role in conveying meaning, emotion, and structure in speech.

Key elements of prosody include:


1. Intonation: The variation in pitch during speech. It helps convey questions, statements, emphasis, and
emotional tones. For example, a rising intonation at the end of a sentence often signals a question.
2. Stress: The emphasis placed on certain syllables or words. Stress can change the meaning of a sentence or
distinguish between words that otherwise look the same. For instance, in English, the word "record" can be a
noun (with stress on the first syllable) or a verb (with stress on the second syllable).
3. Rhythm: The pattern of sounds and silences in speech. Rhythm is influenced by the alternation of stressed
and unstressed syllables, as well as the timing of speech segments.
4. Tempo: The speed at which speech is delivered. Variations in tempo can reflect different meanings or
emotions, such as urgency, calmness, or hesitation.
5. Pauses: Breaks or silences in speech that can help structure sentences, emphasize points, or indicate a change
in thought.

Prosody is essential in communication because it helps listeners interpret the speaker's intent, emotion, and the
structure of the information being conveyed. It also contributes to the naturalness and fluency of speech.

COMPUTATIONAL PHONOLOGY

COMPUTATIONAL PHONOLOGY:

Computational phonology is an interdisciplinary field that combines elements of phonology, the branch of linguistics
concerned with the study of sound systems in languages, with computational techniques. Its primary goal is to use
algorithms, formal models, and computer programs to understand, model, and analyze the phonological structure of
languages.

CORE CONCEPTS OF PHONOLOGY:

Before diving into computational phonology, it's important to understand some basic concepts in traditional
phonology:

1. Phonemes: The smallest units of sound that can distinguish meaning in a language. For example, in English,
the sounds /p/ and /b/ in "pat" and "bat" represent different phonemes.
2. Allophones: Variations of a phoneme that do not change word meaning. For instance, the /p/ sound in "spin"
(unaspirated) and "pin" (aspirated) are allophones of the same phoneme in English.
3. Phonological Rules: Systematic processes that govern how phonemes are realized as different allophones in specific linguistic environments. For example, in German, the phonological rule that turns a voiced obstruent into its voiceless counterpart at the end of a word is a form of devoicing.
4. Phonotactics: The rules governing the permissible combinations of phonemes in a particular language. For
example, in English, a word can start with the cluster /str/ (as in "street"), but not with /sr/.
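
A phonotactic constraint of this kind can be sketched as a membership test on word-initial clusters. The `PERMITTED_ONSETS` set below is a deliberately tiny, illustrative sample rather than a full inventory of English onsets, and the function name is hypothetical.

```python
# Illustrative sample of permitted word-initial consonant clusters.
PERMITTED_ONSETS = {
    ("s", "t", "r"),  # street
    ("s", "p", "r"),  # spring
    ("s", "t"),       # stop
    ("t", "r"),       # tree
    ("p", "l"),       # play
    ("t", "w"),       # twelfths
}


def onset_is_permitted(cluster):
    """Return True if the consonant cluster is in the permitted sample."""
    return tuple(cluster) in PERMITTED_ONSETS


print(onset_is_permitted(["s", "t", "r"]))  # True
print(onset_is_permitted(["s", "r"]))       # False
```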

COMPUTATIONAL PHONOLOGY: OVERVIEW:

Computational Phonology refers to the application of computational methods to model and analyze these
phonological concepts. It allows linguists to formalize phonological rules, test theories, and simulate language
processes using computer algorithms.

GOALS OF COMPUTATIONAL PHONOLOGY:

 Modeling Phonological Processes: Create computational models that can simulate how phonological rules
apply in different languages. These models can be used to predict the surface forms of words given their
underlying representations.
 Speech Recognition and Synthesis: Improve systems that convert spoken language into text (speech
recognition) and text into spoken language (speech synthesis) by incorporating accurate phonological models.
 Language Learning and Acquisition: Use computational models to simulate and study how humans learn
the phonological systems of their native languages or second languages.
 Testing Phonological Theories: Provide tools to linguists to test and refine phonological theories by
comparing predicted outcomes from computational models with actual linguistic data.
 Cross-Linguistic Phonology: Analyze and compare phonological patterns across different languages to
identify universal trends or language-specific characteristics.

TASKS OF COMPUTATIONAL PHONOLOGY:

TECHNIQUES AND TOOLS IN COMPUTATIONAL PHONOLOGY:

1. Finite-State Automata and Transducers (FSTs):
o Finite-State Automata (FSA) are abstract machines used to model phonological patterns by
representing possible sequences of sounds (phonemes) in a language.
o Finite-State Transducers (FSTs) extend FSAs by allowing transformations between different levels
of representation (e.g., underlying phonemes to surface allophones). They are useful for modeling
phonological rules that change one sound into another in specific contexts.
2. Constraint-Based Models:
o Optimality Theory (OT): A prominent theory in phonology that can be implemented
computationally. OT posits that surface forms of words are determined by a set of competing
constraints. These constraints are ranked differently in different languages, which explains cross-
linguistic variation in phonological patterns. Computational models of OT evaluate and rank possible
word forms to select the most optimal one according to the given constraints.
3. Feature-Based Models:
o Phonemes can be represented by distinctive features (e.g., [+voice], [-nasal]), and computational
models can manipulate these features to generate or analyze phonological forms. These models allow
for the encoding and manipulation of phonological knowledge in a more detailed and structured way.
4. Machine Learning and Statistical Models:
o Used to identify patterns in large phonological corpora or to predict phonological phenomena based
on statistical regularities in the data. For example, machine learning can be used to automatically
induce phonological rules from a dataset of word forms.
5. Corpus Phonology:
o Involves the use of large databases of spoken or transcribed language data to analyze phonological
patterns on a large scale. Computational tools can be used to search for, analyze, and model these
patterns, providing insights into how sounds behave across different languages and dialects.
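
The idea of a finite-state transducer that maps underlying forms to surface forms can be sketched with two states. This is a hand-written toy rather than a toolkit-based implementation, and it rests on assumptions made for this example: the plural suffix is written as the abstract input symbol "S", the `VOICED` set is an illustrative list, and stems ending in sibilants (which take a different plural form) are not handled.

```python
# Illustrative set of voiced sounds (consonants and vowels).
VOICED = {"b", "d", "g", "v", "z", "m", "n", "l", "r",
          "æ", "e", "ɪ", "ɒ", "ʌ", "ʊ", "ə"}


def transduce(underlying):
    """Map an underlying form to a surface form, one symbol at a time.

    The state records whether the previous output sound was voiced.
    The suffix symbol "S" surfaces as [z] after a voiced sound and
    as [s] otherwise; every other symbol is copied unchanged.
    """
    state = "after_voiceless"  # start state
    surface = []
    for symbol in underlying:
        if symbol == "S":
            output = "z" if state == "after_voiced" else "s"
        else:
            output = symbol
        surface.append(output)
        state = "after_voiced" if output in VOICED else "after_voiceless"
    return surface


print("".join(transduce(["b", "æ", "g", "S"])))  # bags
print("".join(transduce(["k", "æ", "t", "S"])))  # cats
```

Because the output depends only on the current input symbol and the current state, the same behaviour could be written down as a transition table with two states.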

APPLICATIONS OF COMPUTATIONAL PHONOLOGY:

1. Speech Technology:
o Speech Recognition: Computational phonology contributes to more accurate recognition of spoken
words by modeling how sounds vary depending on context (e.g., different pronunciations of the same
word).
o Speech Synthesis: Enhances text-to-speech systems by ensuring that generated speech sounds natural
and adheres to the phonological rules of the target language.
2. Linguistic Research:
o Provides tools for linguists to test hypotheses about phonological rules and patterns, using
computational simulations to validate theoretical models.
3. Language Learning:
o Assists in the development of language learning tools by modeling how phonological knowledge is
acquired and by creating exercises that help learners internalize the sound patterns of a new language.
4. Dialect and Language Variation:
o Analyzes differences in phonological systems across dialects or languages, providing insights into
how sound systems evolve and differ across regions and communities.
5. Clinical Linguistics:
o Used in the study of speech disorders by modeling normal and atypical phonological processes, aiding
in diagnosis and therapy design.

CHALLENGES AND FUTURE DIRECTIONS:

1. Complexity of Human Language:
o Human phonological systems are incredibly complex, and fully capturing this complexity in
computational models is a significant challenge.
2. Data Availability:
o High-quality, annotated phonological data for many languages is scarce, making it difficult to develop
and test models for less-studied languages.
3. Integration with Other Linguistic Levels:
o Integrating phonology with other levels of linguistic analysis (such as morphology, syntax, and
semantics) in computational models is an ongoing area of research.
4. Advances in Machine Learning:
o With the rise of deep learning and neural networks, new approaches to modeling phonology are
emerging, potentially offering more powerful and flexible tools but also raising questions about
interpretability and linguistic relevance.

Computational phonology is an evolving field that sits at the intersection of linguistics and computer science. It offers
powerful tools for understanding the sound systems of languages, testing linguistic theories, and developing
applications in speech technology and language education. As computational methods and linguistic theories continue
to advance, the potential for this field to contribute to our understanding of language and to practical applications in
technology will continue to grow.

WORDS AND MORPHEMES

WORDS:
 Words are the smallest free-standing units of meaning in a language.
 They can stand alone as independent units and convey meaning.
 Examples: "book," "run," "quickly."

MORPHEMES:

 Morphemes are the smallest units of meaning in a language.
 They may or may not stand alone as independent words.
 Morphemes can be classified into two main types: free morphemes and bound morphemes.

WHAT IS A MORPHEME?

A morpheme is the minimal grammatical unit within a language. Every word comprises one or more morphemes. A standalone morpheme is identical to a word, but when a root is modified by the addition of affixes, the result is a word made up of more than one morpheme.

Look at the examples:

 Listen, listener, listened, listening

The root listen is a standalone morpheme and a word at the same time. When the root is modified with an affix such as -er, -ed or -ing, it becomes a word consisting of two morphemes.
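
The listen examples can be segmented with a simple suffix-stripping sketch. The suffix list and the set of known roots are assumptions supplied for this example, and the approach ignores spelling changes (such as doubled consonants) that a real morphological analyser would need to handle.

```python
# Hypothetical suffix inventory for this example, longest first.
SUFFIXES = ["ing", "ed", "er"]


def segment(word, roots):
    """Split a word into [root, -suffix] if the root is known, else [word]."""
    if word in roots:
        return [word]  # a free morpheme standing alone
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in roots:
            return [stem, "-" + suffix]
    return [word]  # no analysis found


known_roots = {"listen"}
for form in ["listen", "listener", "listened", "listening"]:
    print(form, "->", segment(form, known_roots))
```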

TYPES OF MORPHEMES:

There are two main types of morphemes:

1. Free morpheme
2. Bound morpheme

FREE MORPHEMES:

A morpheme that can stand alone as a single word (as a meaningful unit) is called a free morpheme. Free morphemes are roots that are identical to words. They form the set of separate English word forms such as basic nouns, verbs, adjectives, etc. When a free morpheme is used with bound morphemes, the basic word form is technically known as a stem or root.
Examples of free morphemes:
Sun (noun), dog (noun), walk (verb), and happy (adjective)
A free morpheme can stand alone and cannot be subdivided further. 'Sun' and 'dog' are free morphemes because they cannot be split up any further; stems that cannot be divided further are also called roots.

Free morphemes are divided into two categories: Lexical morphemes and functional morphemes.

 Lexical morphemes (Open Class):

Lexical morphemes are the set of content words such as nouns, verbs, adjectives, and adverbs. They can be understood fully on their own, e.g. run, blue, slow, paper, small, throw, and now. Lexical morphemes carry the dictionary meaning of a word, which is attributed to a specific referent.

 Functional Morphemes (Closed Class):

Functional morphemes are the set of function words such as conjunctions, prepositions, articles, pronouns, auxiliary verbs, modals and quantifiers. Some examples of functional morphemes are and, near, when, on, because, but, it, in, that, the, and above. Functional morphemes express a relationship between one lexical morpheme and another. A functional morpheme modifies the meaning, rather than supplying the root meaning of the word. It encodes grammatical meaning, e.g. "the players entered the ground". In this sentence, 'the' is a functional morpheme, which specifies players and ground.

BOUND MORPHEMES:

Segments that cannot standalone and occur with another root/stem are called bound morphemes. Bound morphemes
are also called affixes (prefixes, suffixes and infixes) in English. Two bound morpheme cannot occur together, but it is
necessary for a bound morpheme to occur with a root/stem.

Examples of bound morphemes are :


 Opened: (Open + ed) = root + suffix
 Reopen: (Re + open) = Prefix + root
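 Men: (Man + plural) = root + internal vowel change (loosely called an infix here; strictly, the plural is marked by changing the vowel inside the root, not by inserting an affix)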
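The root + affix analyses above can be sketched as a simple affix-stripping routine. This is a toy illustration, not a full morphological analyzer: PREFIXES, SUFFIXES and ROOTS are small hypothetical lists chosen for this example, and split_affixes is an assumed name.

```python
# Toy lexicon (hypothetical, for illustration only).
PREFIXES = ("re", "un")
SUFFIXES = ("ed", "ing", "s")
ROOTS = {"open", "walk", "dog", "happy"}

def split_affixes(word):
    """Return (prefix, root, suffix) if the word decomposes over the toy lexicon, else None."""
    word = word.lower()
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            root = rest[:len(rest) - len(suffix)]
            if root in ROOTS:
                return (prefix, root, suffix)
    return None

analyses = {w: split_affixes(w) for w in ("Opened", "Reopen", "Men")}
```

Here "Opened" is analysed as root + suffix and "Reopen" as prefix + root, while "Men" returns None: an irregular form cannot be recovered by stripping affixes and would need a separate exception list.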

The set of affixes that make up the category of bound morphemes can also be divided into two types. Derivational
morphemes and inflectional morphemes.

 Derivational Morphemes:

Derivational morphemes create new words from existing ones and often change the grammatical category of the
word. For example, the word ‘bake’ (verb) is a root word (free morpheme); when we add the bound morpheme ‘-er’
(a suffix) to the stem, it becomes ‘baker’ (a noun), so the grammatical category changes from verb to noun.
Derivational morphemes can be either class-changing or class-maintaining, depending on the change in the
word. Class-maintaining morphemes keep the grammatical category of the base word, while class-changing
morphemes produce a word of a different grammatical category.

Class Maintaining: These prefixes do not change the grammatical category of the base word.

Examples:
 a- (amoral)
 ex- (ex-member)
 non- (nonsense)
 mini- (miniskirt)

Class Changing: These prefixes can change the grammatical category of the base word.

Examples:
 a- (asleep)
 re- (renew)
 en- (enrich)

Class Maintaining: These suffixes do not change the grammatical category of the base word.
Examples:
 -ess (actress)
 -let (booklet)
 -ling (duckling)

Class Changing: These suffixes change the grammatical category of the base word.

Examples:
 -ate (passionate)
 -er (reader)
 -ee (employee)
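The class-maintaining versus class-changing distinction amounts to comparing the category of the base with the category of the derived word. The sketch below does this for some of the examples above; the part-of-speech labels in DERIVATIONS are annotations added for this illustration, and derivation_type is an assumed name.

```python
# (base, base category, affix, derived word, derived category)
# Category labels are annotations added for this sketch.
DERIVATIONS = [
    ("moral", "adjective", "a-", "amoral", "adjective"),
    ("skirt", "noun", "mini-", "miniskirt", "noun"),
    ("rich", "adjective", "en-", "enrich", "verb"),
    ("new", "adjective", "re-", "renew", "verb"),
    ("book", "noun", "-let", "booklet", "noun"),
    ("read", "verb", "-er", "reader", "noun"),
    ("employ", "verb", "-ee", "employee", "noun"),
    ("passion", "noun", "-ate", "passionate", "adjective"),
]

def derivation_type(base_pos: str, derived_pos: str) -> str:
    """Class-maintaining if the category is unchanged, otherwise class-changing."""
    return "class-maintaining" if base_pos == derived_pos else "class-changing"

report = {derived: derivation_type(bp, dp)
          for _, bp, _, derived, dp in DERIVATIONS}
```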

 Inflectional Morphemes:

An inflectional morpheme is a suffix that is added to a word to assign a particular grammatical property to
that word. For example, listen + ing = listening or boy + s = boys. Inflectional morphemes do not change the
essential meaning or the grammatical category of a word. They serve as grammatical markers that indicate tense,
number, possession, or comparison.

 Bound Base:

Bound bases serve as roots for derivational morphemes only and cannot stand alone. An example is given
below, where -ceive is a bound base that cannot stand alone but can form a word when used with
prefixes:

-ceive (receive, perceive, deceive)

ALLOMORPHS:

Allomorphs are variants of a morpheme that differ in form but not in meaning. For example, the -s plural takes three
distinct phonological forms, [s], [z], and [ɪz], in the words books [bʊks], boys [bɔɪz], and dishes [dɪʃɪz]. These
phonological distinctions do not change the meaning, making them allomorphs of the -s plural morpheme.
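The choice among the three plural allomorphs is predictable from the final sound of the stem, which can be sketched as a small rule. This is a simplified illustration: the sound sets SIBILANTS and VOICELESS are assumptions covering standard English consonants, the caller is assumed to supply the stem's final sound as an IPA symbol, and plural_allomorph is an assumed name.

```python
# Simplified sound classes (assumed for this sketch).
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}

def plural_allomorph(final_sound: str) -> str:
    """Pick the plural allomorph from the stem's final sound (IPA symbol)."""
    if final_sound in SIBILANTS:
        return "ɪz"   # dish -> dishes
    if final_sound in VOICELESS:
        return "s"    # book -> books
    return "z"        # boy -> boys (voiced sounds, including vowels)

examples = {
    "book": plural_allomorph("k"),
    "boy": plural_allomorph("ɔɪ"),
    "dish": plural_allomorph("ʃ"),
}
```

The sibilant check comes first because some sibilants are also voiceless and would otherwise receive [s].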

SEGMENTATION

SEGMENTATION:

Segmentation in NLP (Natural Language Processing) refers to the process of dividing a continuous sequence of text
into smaller, meaningful units or segments. These segments can be words, phrases, sentences, or even larger chunks,
depending on the specific task or context. Segmentation plays a crucial role in various NLP tasks, and the choice of
segmentation depends on the specific requirements of the task at hand. Here are a few examples of segmentation
techniques commonly used in NLP:

TYPES OF SEGMENTATION:

1. Word Segmentation: Word segmentation involves dividing a sentence or a continuous string of text into individual
words. In languages like English, words are generally separated by spaces, making word segmentation relatively
straightforward. However, in languages like Chinese or Thai, which do not have explicit word delimiters, word
segmentation becomes more challenging.

2. Sentence Segmentation: Sentence segmentation involves dividing a paragraph or a document into individual
sentences. This is particularly useful for tasks such as text summarization, machine translation, or sentiment analysis,
where the meaning of the text is often tied to the sentence level.

3. Phrase Segmentation: Phrase segmentation involves identifying and extracting meaningful phrases or multi-word
expressions from a text. This can be useful in tasks like named entity recognition or extracting collocations, where
certain phrases carry specific information or have a different semantic meaning than their individual words.

4. Chunking: Chunking involves identifying and grouping together syntactically related words in a sentence. This
helps in extracting higher-level information such as noun phrases, verb phrases, or prepositional phrases, which can be
useful for tasks like information extraction or parsing.

5. Subword Segmentation: Subword segmentation involves dividing words into smaller subunits or subword units.
This technique is often used in morphologically rich languages or for handling out-of-vocabulary words. Subword
segmentation methods, such as Byte Pair Encoding (BPE) or WordPiece, can capture morphological variations and
improve the performance of various NLP models.

Segmentation techniques are often language-dependent, and different languages may require different approaches.
Additionally, the choice of segmentation can also vary depending on the specific NLP task and the available resources
or tools.
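For languages without explicit word delimiters, one simple dictionary-based approach to word segmentation is forward maximum matching: at each position, take the longest dictionary entry that matches. The sketch below is only an illustration of that idea; the vocabulary is a tiny hypothetical word list and max_match is an assumed name.

```python
def max_match(text, vocab):
    """Greedy forward maximum matching over a word list.

    Characters that start no dictionary word are emitted as single-character tokens.
    """
    max_len = max(len(w) for w in vocab)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens

# Hypothetical mini-dictionary for a Chinese example.
vocab = {"我", "喜欢", "自然", "语言", "处理", "自然语言"}
tokens = max_match("我喜欢自然语言处理", vocab)
# tokens == ["我", "喜欢", "自然语言", "处理"]
```

Greedy matching can be wrong when a longer entry hides the intended split, which is one reason statistical and learned segmenters are used in practice.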

APPLICATIONS OF SEGMENTATION IN NLP:

1. Machine Translation: Proper segmentation is crucial for translating text from one language to another,
especially for languages where word boundaries are not explicit.

2. Speech Recognition: Segmentation helps in identifying where words or sentences start and end in continuous
speech, which is critical for accurate transcription.

3. Information Retrieval: Segmenting text into meaningful units allows search engines to retrieve and rank
documents based on relevance more effectively.

4. Text Summarization: Segmentation aids in identifying the most important sentences or paragraphs to include
in a summary.

5. Sentiment Analysis: Sentence segmentation is crucial in determining the sentiment of individual sentences,
especially in reviews or social media posts.

TECHNIQUES FOR SEGMENTATION:

1. Rule-based Methods: These involve using predefined rules, such as punctuation marks for sentence
segmentation or spaces for word segmentation in languages like English. Rule-based methods are simple but
can be limited in handling exceptions or complex cases.

2. Statistical Methods: These methods use probabilistic models like Hidden Markov Models (HMMs) to predict
the most likely segmentation points in a text. They can adapt to various contexts but require large annotated
datasets.

3. Machine Learning-based Methods: These involve training models, such as Conditional Random Fields
(CRFs) or deep learning models, to automatically learn segmentation patterns from data. These methods can
handle complex cases but require substantial computational resources and labeled data.

4. Hybrid Methods: These combine rule-based and statistical/machine learning approaches to leverage the
strengths of each. For example, rules can be used for initial segmentation, with machine learning models
refining the results.

5. Subword Tokenization: Techniques like Byte Pair Encoding (BPE) or WordPiece are used to segment words
into subwords, especially useful in languages with rich morphology or for handling rare words in machine
translation.
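The core of Byte Pair Encoding can be sketched as repeatedly merging the most frequent adjacent pair of symbols. The code below is a minimal teaching sketch under stated assumptions, not the implementation used by any particular library: the function name learn_bpe and the toy word frequencies are hypothetical, and details such as end-of-word markers are omitted.

```python
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn up to num_merges BPE merges from a {word: frequency} table."""
    # Start with every word split into single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab

# Toy corpus statistics (hypothetical).
merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 2)
```

With these toy counts, two merges are enough to form the subword "est", which is shared by "newest" and "widest"; a rare or unseen word ending in "est" could then reuse that unit.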

CHALLENGES IN SEGMENTATION:

1. Ambiguity: Text can be ambiguous, making it challenging to determine the correct segmentation. For
example, abbreviations, decimal points, or ellipses can confuse sentence segmentation.

2. Language Variability: Different languages have different rules and structures, making it hard to develop a
one-size-fits-all segmentation approach. For instance, Chinese or Japanese text segmentation is more complex
due to the lack of spaces between words.

3. Out-of-vocabulary Words: Words that are not in the training data are hard to segment correctly. Subword
segmentation reduces this problem, but rare or unseen words can still be split into unnatural pieces.

4. Context Dependency: Segmentation often depends on the context, requiring models to understand the
surrounding text to make accurate decisions. This is particularly challenging in informal or creative writing,
such as social media posts.

5. Resource Requirements: Advanced segmentation techniques, particularly those based on machine learning,
require large amounts of labeled data and computational power, which may not always be available.
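The ambiguity problem for sentence segmentation can be made concrete with a small rule-based splitter. This is a sketch under stated assumptions: ABBREVIATIONS is a short hypothetical list, split_sentences is an assumed name, and only abbreviations and decimal points are handled (a period inside a number is not followed by whitespace, so it is never treated as a boundary).

```python
import re

# Hypothetical abbreviation list; a real system would need a much larger one.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}

def split_sentences(text):
    """Split at ., ! or ? followed by whitespace, unless the last token is a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]+(?=\s)", text):
        chunk = text[start:match.end()]
        words = chunk.split()
        if words and words[-1].lower() in ABBREVIATIONS:
            continue  # e.g. "Dr." does not end a sentence
        sentences.append(chunk.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

sentences = split_sentences("Dr. Rao paid 3.50 dollars. She left early. Was it raining?")
# sentences == ["Dr. Rao paid 3.50 dollars.", "She left early.", "Was it raining?"]
```

Rules like these break down quickly, for example when an abbreviation really does end a sentence, which is why the statistical and machine learning-based methods described earlier are often preferred.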

Segmentation is a crucial step in NLP that impacts the performance of various downstream tasks. While simple rule-
based methods may suffice for certain applications, more complex tasks often require sophisticated statistical or
machine learning-based techniques. Despite advances, challenges such as ambiguity, language variability, and
resource constraints continue to make segmentation a non-trivial problem in NLP.