Module Lecture2 2x2
[Figure: compiler pipeline. Source code → Lexical Analysis → Tokens → Parsing → AST → Semantic Analysis → Decorated AST, with a “We are here” marker.]
Last modified: Thu Mar 8 15:13:16 2018 CS164: Lecture #2
• Token consists of syntactic category (like “noun” or “adjective”) plus semantic information (like a particular name).
• Parsing (the “customer”) only needs syntactic category:
– “Joe went to the store” and “Harry went to the beach” have same grammatical structure.
• For programming, semantic information might be text of identifier or numeral.
• Example from Notes:

• Regular expressions denote formal languages, which are sets of strings (of symbols from some alphabet).
• Appropriate since internal structure not all that complex yet.
• Expression R denotes language L(R):
– L(ε) = L("") = {""}.
– If c is a character, L(c) = {"c"}.
– If R1, R2 are r.e.s, L(R1R2) = {x1x2 | x1 ∈ L(R1), x2 ∈ L(R2)}.
– L(R1|R2) = L(R1) ∪ L(R2).
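
The rules for L(R) can be tried out directly on small expressions. The following is a minimal sketch, not code from the lecture or the Notes: the node names Eps, Char, Cat, and Alt and the function language are hypothetical, and it covers only the four rules listed here (empty string, single character, concatenation, alternation), so every language it computes is finite.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


# Hypothetical AST node names chosen for this sketch only.
@dataclass(frozen=True)
class Eps:
    """The empty-string expression (epsilon)."""


@dataclass(frozen=True)
class Char:
    """A single-character expression c."""
    c: str


@dataclass(frozen=True)
class Cat:
    """Concatenation R1R2."""
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:
    """Alternation R1|R2."""
    left: "Regex"
    right: "Regex"


Regex = Union[Eps, Char, Cat, Alt]


def language(r: Regex) -> FrozenSet[str]:
    """Return L(r) as a finite set of strings, one case per rule."""
    if isinstance(r, Eps):
        return frozenset({""})                     # L(epsilon) = {""}
    if isinstance(r, Char):
        return frozenset({r.c})                    # L(c) = {"c"}
    if isinstance(r, Cat):                         # {x1x2 | x1 in L(R1), x2 in L(R2)}
        return frozenset(x1 + x2
                         for x1 in language(r.left)
                         for x2 in language(r.right))
    if isinstance(r, Alt):                         # L(R1) union L(R2)
        return language(r.left) | language(r.right)
    raise TypeError(f"not a regular expression node: {r!r}")


# (a|b)(c|epsilon)
example = Cat(Alt(Char("a"), Char("b")), Alt(Char("c"), Eps()))
print(sorted(language(example)))  # ['a', 'ac', 'b', 'bc']
```

Each branch of language mirrors one line of the definition, which is why the sketch stops at finite languages: an operator that denotes an infinite set could not be returned as an explicit set of strings.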
Abbreviations
• Character lists, such as [abcf-mxy] in Java, Perl, or Python.
• Negative character lists, such as [^aeiou].
• Character classes such as . (dot), \d, \s in Java, Perl, Python.
• L(R+) = L(RR∗).
• L(R?) = L(ε|R).

Extensions
• “Capture” parenthesized expressions:
– After m = re.match(r'\s*(\d+)\s*,\s*(\d+)\s*', '12,34'), have m.group(1) == '12', m.group(2) == '34'.
• Lazy vs. greedy quantifiers:
– re.match(r'(\d+).*', '1234ab') makes group(1) match '1234'.
– re.match(r'(\d+?).*', '1234ab') makes group(1) match '1'.
• Boundaries:
– re.search(r'(^abc|qef)', L) matches abc only at beginning of string, and qef anywhere.
– re.search(r'(?m)(^abc|qef)', L) matches abc only at beginning of string or of any line.
– re.search(r'rowr(?=baz)', L) matches an instance of ‘rowr’, but only if ‘baz’ follows (does not match baz).
– re.search(r'(?<=rowr)baz', L) matches an instance of ‘baz’, but only if immediately preceded by ‘rowr’ (does not match rowr).
• Non-linear patterns: re.search(r'(\S+),\1', L) matches a word followed by the same word after a comma.
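
The abbreviations and extensions can be checked in Python's standard re module. The sketch below reuses the patterns from the slides; the sample input strings other than '12,34' and '1234ab' (for example "xyz\nabc" and "say hello,hello now") are assumptions made up for illustration.

```python
import re

# Abbreviations: R+ accepts what RR* accepts, and R? accepts what (epsilon|R) accepts.
plus_same = bool(re.fullmatch(r"a+", "aaa")) and bool(re.fullmatch(r"aa*", "aaa"))
opt_same = bool(re.fullmatch(r"a?", "")) and bool(re.fullmatch(r"(|a)", ""))

# Capture groups: each parenthesized sub-pattern records the text it matched.
m = re.match(r"\s*(\d+)\s*,\s*(\d+)\s*", "12,34")
captures = (m.group(1), m.group(2))                 # ('12', '34')

# Greedy + takes as many digits as possible; lazy +? takes as few as possible.
greedy = re.match(r"(\d+).*", "1234ab").group(1)    # '1234'
lazy = re.match(r"(\d+?).*", "1234ab").group(1)     # '1'

# Boundaries: without (?m), ^ anchors only at the start of the whole string.
text = "xyz\nabc"                                   # assumed sample input
single = re.search(r"(^abc|qef)", text)             # None: abc starts line 2, not the string
multi = re.search(r"(?m)(^abc|qef)", text)          # matches abc at the start of line 2

# Lookahead and lookbehind test the context without including it in the match.
ahead = re.search(r"rowr(?=baz)", "rowrbaz")        # matches 'rowr' only
behind = re.search(r"(?<=rowr)baz", "rowrbaz")      # matches 'baz' only

# Backreference: \1 must repeat exactly the text captured by group 1.
repeat = re.search(r"(\S+),\1", "say hello,hello now")

print(captures, greedy, lazy)
print(single, multi.group(1))
print(ahead.group(0), behind.group(0), repeat.group(0))
```

The lookaround and backreference cases are the ones that go beyond the basic operators: the match for rowr(?=baz) stops before ‘baz’, and \1 makes the pattern depend on text matched earlier in the same string.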
An Example
Problems
Some Problem Solutions