NLP3 - Lecture 3

Finite State Transducers

Transducers
• Recognisers either accept or reject a
word.
• Although this is useful, networks can
actually return more substantial
information.
• This is achieved by providing networks
with the ability to write as well as to read.

Basic Transducer
• Each transition of a transducer is labelled with a
pair of symbols rather than with a single symbol.
• Analysis proceeds as before, except that input
symbols are matched against the lower-side
symbols on transitions.
• If analysis succeeds, return the string of upper-
side symbols on the path to the final state.
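The analysis procedure above can be sketched in Python (a toy illustration, not the lecture's code; the transition format and the CASA/CASE example transducer are assumptions):

```python
# Minimal sketch of transducer analysis. Transitions are
# (state, lower, upper, next_state) tuples; analysis matches the input word
# against lower-side symbols and, on success, returns the concatenated
# upper-side symbols along the accepting path.

def analyze(transitions, finals, word, state=0):
    """Return the upper-side string for `word`, or None if it is rejected."""
    if not word:
        return "" if state in finals else None
    for (src, lower, upper, dst) in transitions:
        if src == state and lower == word[0]:
            rest = analyze(transitions, finals, word[1:], dst)
            if rest is not None:
                return upper + rest
    return None

# Toy transducer pairing surface (lower) "case" with lexical (upper) "casa",
# as in the CASE/CASA example later in the lecture.
T = [(0, "c", "c", 1), (1, "a", "a", 2), (2, "s", "s", 3), (3, "e", "a", 4)]
print(analyze(T, {4}, "case"))  # casa
print(analyze(T, {4}, "cast"))  # None
```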
Finite State Transducers
• The simple story
– Add another tape
– Add extra symbols to the transitions
– On one tape we read "cats", on the other we
write "cat +N +PL"
Confusing Terminology
• Lower side = surface side.
• Upper side = "deep" side.
• Analysis proceeds from lower to upper.
• Synthesis (generation) proceeds from
upper to lower.
Finite-State Transducers (FST)
• The symbols of the FST are complex: they're
really pairs of symbols, one for each of two
"tapes" or levels.
• Recognizer: decides if a given pair of
representations fits together "OK"
• Generator: generates pairs of representations that
fit together
• Translator: takes a representation on one level
and produces the appropriate representation on
the other level
Finite state transducers
• can be inverted, or
• composed, and you get another FST.
Four-Fold View of FSTs
• As a recognizer
• As a generator
• As a translator
• As a set relater –
a machine that computes a relation between
a set of input strings and a set of output strings
Formally, a finite transducer T is a 6-tuple (Q, Σ, Δ, E, I, F) such
that:
• Q is a finite set, the set of states;
• Σ is a finite set, called the input labels;
• Δ is a finite set, called the output labels;
• I is a subset of Q, the set of initial states;
• F is a subset of Q, the set of final states; and
• E ⊆ Q × (Σ ∪ {ε}) × (Δ ∪ {ε}) × Q (where ε is the empty string) is the transition relation.
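The 6-tuple can be written out directly as data. A toy instantiation (every set below is invented for illustration, with ε modeled as the empty string ""):

```python
# Toy instantiation of the 6-tuple T = (Q, Sigma, Delta, E, I, F).
Q     = {0, 1, 2}                          # states
Sigma = {"a", "b"}                         # input labels
Delta = {"x", "y"}                         # output labels
I     = {0}                                # initial states
F     = {2}                                # final states
# E is a subset of Q x (Sigma + eps) x (Delta + eps) x Q, the transition relation
E     = {(0, "a", "x", 1), (1, "b", "y", 2), (1, "", "y", 2)}

T = (Q, Sigma, Delta, E, I, F)

# well-formedness checks straight from the definition
assert I <= Q and F <= Q
assert all(q in Q and p in Q and a in Sigma | {""} and b in Delta | {""}
           for (q, a, b, p) in E)
```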
Finite State Transducers – Black Box View

w2 ∈ L2 (upper string)
       ↕
Finite State Transducer T
       ↕
w1 ∈ L1 (lower string)

• We say T transduces the lower string w1 into the
upper string w2 in the upward direction (lookup).
• We say T transduces the upper string w2 into the
lower string w1 in the downward direction (lookdown).
• A given string may map to zero or more strings.
Lexical Transducers
• In common parlance, a transducer is a
device which converts one form of energy
into another, e.g. a microphone converts
from sound to electrical signals.
• Lexical transducers convert one
string of symbols into another.
Example
• A lexical transducer is a specialized finite-state
automaton that maps lexical forms and
morphological specifications to corresponding
inflected forms, and vice versa.
• For example, a lexical transducer for English
might relate the words:
dining – dine+PresPart
swam – swim+Past
Lexical Transducer Example

lexical string:  C A S A
surface string:  C A S E

• Input: CASE
• Output: CASA
Morphological Analysis

lexical string:  C O N T A R E +V +1P +SG
surface string:  C O N T O ε ε  ε   ε   ε

• Input: CONTO
• Output: CONTARE +V +1P +SG
Nominal Inflection FST
Remarks
• ε stands for "epsilon". During analysis, epsilon
transitions are taken freely without consuming any
input.
• Note also single symbols with multi-character
print names (e.g. +SG).
• The order of these symbols, and the choice of
infinitive as baseform, is determined by linguists.
Synthesis
• Transducers are reversible. This means
that they can be used to perform the
inverse transduction.
• The process of synthesis is the inverse of
analysis.
The Process of Synthesis
• Start at the start state and at the beginning
of the input string.
• Match the input symbols against the
upper-side symbols of the arcs,
consuming symbols until a final state is
reached.
• If successful, return the string of lower-
side symbols (else nothing).
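The synthesis procedure can be sketched the same way as analysis, with the matching side swapped (a toy illustration, not the lecture's code; the (state, lower, upper, next_state) transition format is an assumption):

```python
# Sketch of synthesis as the inverse of analysis: match the input against
# the upper-side symbols of the transitions and collect the lower-side
# symbols along an accepting path.

def synthesize(transitions, finals, lexical, state=0):
    """Return the lower-side string for `lexical`, or None if rejected."""
    if not lexical:
        return "" if state in finals else None
    for (src, lower, upper, dst) in transitions:
        if src == state and upper == lexical[0]:
            rest = synthesize(transitions, finals, lexical[1:], dst)
            if rest is not None:
                return lower + rest
    return None

# Toy CASA/CASE transducer run in the downward direction.
T = [(0, "c", "c", 1), (1, "a", "a", 2), (2, "s", "s", 3), (3, "e", "a", 4)]
print(synthesize(T, {4}, "casa"))  # case
```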
Morphological Synthesis

lexical string:  C O N T A R E +V +1P +SG
surface string:  C O N T O ε ε  ε   ε   ε

• Input: CONTARE +V +1P +SG
• Output: CONTO
• N.B. ε symbols are ignored on output
Analysis and Synthesis
• Upper Side Language (Lexical Strings).
• Lower Side Language (Surface Strings).
• Transducer maps between the two.
• However large the lexical transducer may
become, analysis and synthesis are
performed by the same language-
independent matching techniques.

FSTs
Transitions
c:c a:a t:t +N:ε +PL:s
• c:c means read a c on one tape and write a c on the
other
• +N:ε means read a +N symbol on one tape and write
nothing on the other
• +PL:s means read +PL and write an s
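These pair-labelled transitions can be illustrated with a small sketch (toy code, not the lecture's implementation; note that symbols such as +N are single multi-character symbols, so the tapes are lists rather than raw strings):

```python
# The c:c a:a t:t +N:eps +PL:s arcs as (upper, lower) pairs, with the
# empty string "" standing in for epsilon.
arcs = [("c", "c"), ("a", "a"), ("t", "t"), ("+N", ""), ("+PL", "s")]

def generate(arcs, lexical_tape):
    """Read the lexical tape, write the surface string (eps writes nothing)."""
    # this toy machine is a single linear path, so the lexical tape must
    # match the upper-side labels exactly
    assert [up for (up, low) in arcs] == lexical_tape
    surface = []
    for (up, low) in arcs:
        if low:                      # epsilon contributes nothing to output
            surface.append(low)
    return "".join(surface)

print(generate(arcs, ["c", "a", "t", "+N", "+PL"]))  # cats
```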
English Plural

surface   lexical
cat       cat+N+Sg
cats      cat+N+Pl
foxes     fox+N+Pl
mice      mouse+N+Pl
sheep     sheep+N+Pl
sheep     sheep+N+Sg
Morphological Analyser
To build a morphological analyser we need:
• lexicon: the list of stems and affixes, together
with basic information about them
• morphotactics: the model of morpheme
ordering (e.g. the English plural morpheme
follows the noun rather than a verb)
• orthographic rules: these spelling rules are
used to model the changes that occur in a word,
usually when two morphemes combine (e.g.,
fly+s = flies)
Lexicon & Morphotactics
• Typically the list of word parts (lexicon) and
the models of ordering can be combined
together into an FSA which will recognise
all the valid word forms.
• For this to be possible the word parts must
first be classified into sublexicons.
• The FSA defines the morphotactics
(ordering constraints).
Sublexicons
To classify the list of word parts:

reg-noun   irreg-pl-noun   irreg-sg-noun   plural
cat        mice            mouse           -s
fox        sheep           sheep
           geese           goose
FSA Expresses Morphotactics
(ordering model)
Intermediate Form to Surface
• The reason we need to have an
intermediate form is that funny things
happen at morpheme boundaries, e.g.
cat^s → cats
fox^s → foxes
fly^s → flies
• The rules which describe these changes
are called orthographic rules or "spelling
rules".
More English Spelling Rules
• consonant doubling: beg / begging
• y replacement: try/tries
• k insertion: panic/panicked
• e deletion: make/making
• e insertion: watch/watches
• Each rule can be stated in more detail ...
Spelling Rules
• Chomsky & Halle (1968) invented a
special notation for spelling rules.
• A very similar notation is embodied in the
"conditional replacement" rules of xfst.
E -> F || L _ R
which means replace E with F when it
appears between left context L and right
context R
A Particular Spelling Rule
This rule does e-insertion

^ -> e || x _ s#
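As a rough illustration, the effect of this rule on an intermediate string can be imitated with a regular-expression replacement (a sketch only, assuming # corresponds to end-of-string and that remaining boundaries are deleted by a separate default rule):

```python
import re

# e-insertion, ^ -> e || x _ s#: rewrite the morpheme boundary ^ as e when
# it sits between x and a word-final s.
def e_insertion(intermediate):
    return re.sub(r"(?<=x)\^(?=s$)", "e", intermediate)

# Default cleanup: any surviving boundary symbol is simply deleted.
def drop_boundary(s):
    return s.replace("^", "")

print(drop_boundary(e_insertion("fox^s")))  # foxes
print(drop_boundary(e_insertion("cat^s")))  # cats
```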
Typical Uses
• Typically, we’ll read from one tape using
the first symbol on the machine transitions
(just as in a simple FSA).
• And we’ll write to the second tape using
the other symbols on the transitions.
Composing Transducers – Example

English Numeral to Number Transducer:
  One thousand two hundred seventy three → 1273
Number to Turkish Numeral Transducer:
  1273 → Bin iki yüz yetmiş üç
Composed (English Numeral to Turkish Numeral Transducer):
  One thousand two hundred seventy three → Bin iki yüz yetmiş üç
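The idea of composition can be illustrated with the transducers modeled as plain lookup tables (the entries below are toy stand-ins, not full numeral grammars):

```python
# Toy stand-ins for the two numeral transducers.
english_to_number = {"one thousand two hundred seventy three": 1273}
number_to_turkish = {1273: "bin iki yüz yetmiş üç"}

def compose(f, g):
    """Relation composition: keep x -> g(f(x)) wherever both maps apply."""
    return {x: g[y] for x, y in f.items() if y in g}

english_to_turkish = compose(english_to_number, number_to_turkish)
print(english_to_turkish["one thousand two hundred seventy three"])
# bin iki yüz yetmiş üç
```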
Multi-Tape Machines
• To deal with this we can simply add more
tapes and use the output of one tape
machine as the input to the next.
• So to handle irregular spelling changes
we'll add intermediate tapes with
intermediate symbols.
Multi-Level Tape Machines
• We use one machine to transduce between the
lexical and the intermediate level, and another to
handle the spelling changes to the surface tape.
Stage 1:
Lexical  Intermediate Levels
• Example:
– g o o s e +N +PL (lexical)
– g e e s e # (intermediate)
• Example:
– g o o s e +N +SG (lexical)
– g o o s e # (intermediate)
• Example:
– m o u s e +N +PL (lexical)
– m i ε c e # (intermediate)
• Example:
– s h e e p +N +PL (lexical)
– s h e e p # (intermediate)
Morphological Analysis
• Morphological analysis can be seen
as a finite state transduction.
• A finite state transducer T maps the surface string
happiest ∈ English_Words to the lexical string
happy+Adj+Sup.
Morphological Analysis as FS
Transduction
• First approximation
• Need to describe
– Lexicon (of free and bound morphemes)
– Spelling change rules in a finite state
framework.
The Lexicon as a Finite State
Transducer
• Assume words have the form prefix+root+suffix where the prefix
and the suffix are optional.
So:
Prefix = [P1 | P2 | … | Pk]
Root = [R1 | R2 | … | Rm]
Suffix = [S1 | S2 | … | Sn]
Lexicon = (Prefix) Root (Suffix)
(R) = [R | ε], that is, R is optional.
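Treating each optional part as a set containing ε (the empty string), the (Prefix) Root (Suffix) template can be sketched as a cross product (toy word lists, invented for illustration):

```python
import itertools

# Optionality (R) = [R | eps] is modeled by including "" in each set.
prefixes = {"", "un+", "dis+"}
roots    = {"tie", "embark"}
suffixes = {"", "+ing", "+ed"}

# Lexicon = (Prefix) Root (Suffix): concatenate every combination.
lexicon = {p + r + s for p, r, s in itertools.product(prefixes, roots, suffixes)}

print("un+tie+ed" in lexicon)   # True
print("tie" in lexicon)         # True
```

As the following slides note, this first approximation overgenerates (e.g. it also accepts un+embark).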


The Lexicon as a Finite State
Transducer
• Prefix = [[u n +] | [d i s +] | [i n +]]
Root = [[t i e] | [e m b a r k] | [h a p p y] | [d e c e n t] | [f a s t e n]]
Suffix = [[+ s] | [+ i n g] | [+ e r] | [+ e d]]

tie, embark, happy, un+tie, dis+embark+ing,
in+decent, un+happy √
un+embark, in+happy+ing …. ✗
The Lexicon as a Finite State
Transducer
• Lexicon =
[ ([ u n +]) [ [t i e] | [ f a s t e n]] ([[+e d] | [ + i n g] | [+ s]]) ]
|
[ ([d i s +]) [ e m b a r k ] ([[+e d] | [ + i n g] | [+ s]])]
|
[ ([u n +]) [ h a p p y] ([+ e r])]
|
[ (i n +) [ d e c e n t] ]

Note that some patterns are now emerging: tie, fasten, embark are verbs, but
differ in prefixes; happy and decent are adjectives, but behave differently.
The Lexicon
• The lexicon structure can be refined to a
point so that all and only valid forms are
accepted and others rejected.
• This is very painful to do manually for any
(natural) language.
Describing Lexicons
• Currently available systems for morphology
provide a simple scheme for describing finite
state lexicons.
– Xerox Finite State Tools
– PC-KIMMO
• Roots and affixes are grouped and linked to
each other as required by the morphotactics.
• A compiler (lexc) converts this description to a
finite state transducer.
Lexicon as a FS Transducer

[Figure: a lexicon transducer whose paths pair lexical strings with surface
strings, sharing arcs where morphemes overlap, e.g.
  h a p p y +Adj +Sup : h a p p y + e s t
  s a v e +Verb +Past : s a v e + e d
  t a b l e +Noun +Pl : t a b l e + s
  s a v e +Verb +Pres +3sg : s a v e + s ]

A typical lexicon will be represented with 10^5 to 10^6 states.
Lexicon as a FS Transducer
(Nondeterminism)

[Figure: the same lexicon transducer; because paths share arcs, the machine
is nondeterministic.]
The Lexicon Transducer
• Note that the lexicon transducer solves
part of the problem.
– It maps from a sequence of morphemes to
root and features.
– Where do we get the sequence of
morphemes?
Morphological Analyzer Structure

happiest
  → Morphographemic Transducer (????????)
  → happy+est
  → Lexicon Transducer
  → happy+Adj+Sup
Sneak Preview (of things to come)

[Figure: composing the Morphographemic Transducer (happiest → happy+est)
with the Lexicon Transducer (happy+est → happy+Adj+Sup) yields a single
Morphological Analyzer/Generator mapping happiest directly to
happy+Adj+Sup.]
The Morphographemic
Transducer
• The morphographemic transducer
generates
– all possible ways the input word can be
segmented and “unmangled”
– As sanctioned by the alternation rules of the
language
• Graphemic conventions
• Morphophonological processes (reflected in the
orthography)
The Morphographemic
Transducer
• The morphographemic transducer thinks:
– There may be a morpheme boundary
between i and e, so let me mark that with a +.
– There is an i+e situation now, and
– there is a rule that says: change the i to a
y in this context.
– So let me output happy+est.

happiest → (Morphographemic Transducer) → happy+est
The Morphographemic
Transducer
• However, the morphographemic
transducer is oblivious to the lexicon:
– it does not really know about words and
morphemes,
– but rather about what happens when you
combine them.

happiest → (Morphographemic Transducer) → happy+est, h+ap+py+e+st, happiest, …

Only some of these will actually be sanctioned by the lexicon.
What kind of changes does the MG
Transducer handle?
• Insertions
– brag+ed → bragged
• Deletions
– (T) koy+nHn → koyun (of the bay)
– (T) alın+Hm+yA → alnıma (to my forehead)
• Changes
– happy+est → happiest
– (T) tarak+sH → tarağı (his comb)
– (G) Mann+er → Männer
