Syntax Analysis

The parser obtains tokens from the lexical analyzer and verifies that the string of tokens can be generated by the grammar of the source program. It constructs a parse tree and passes it to the rest of the compiler. There are three main types of parsers: universal, top-down, and bottom-up. Top-down parsers build the parse tree from the top down, while bottom-up parsers build from the leaves up. Context-free grammars are used to formally describe the syntax or structure of a language. They consist of terminals, non-terminals, production rules, and a start symbol. Derivations apply production rules to generate strings from the start symbol. Parse trees provide a graphical representation of derivations. Ambiguous grammars produce more than one parse tree for some sentence.

Uploaded by

Nakib Ahsan
Copyright
© All Rights Reserved

Syntax Analysis

Md Mehrab Hossain Opi


Role of the Parser
 The parser
 Obtains a string of tokens from the lexical analyzer.
 Verifies that the string can be generated by the grammar of the
source program.
 Reports any syntax errors.
 Recovers from commonly occurring errors.
Role of the Parser
 The parser constructs a parse tree and passes it to
the rest of the compiler.
source program → [Lexical Analyzer] —token→ [Parser] —parse tree→ [Rest of Front End] → intermediate representation

(The Parser requests each token from the Lexical Analyzer via "get next token"; both components consult the Symbol Table.)

Fig 1: Position of Parser in Compiler Model.


Role of the Parser
 There are three general types of parsers for grammars
 Universal
 Top-down
 Bottom-up
 Universal methods, like the CYK or Earley algorithms, can
parse any grammar.
 Too slow for use in a compiler.
Role of the Parser
 Top-down
 Builds the parse tree from the top (root) to the bottom (leaves).
 Bottom-up
 Starts from the leaves and works its way up to the root.
 In both cases, the input is scanned from left to right.
 The most efficient top-down and bottom-up methods work
only for sub-classes of grammars.
Syntax Error Handling
 Goal of error handler
 Report the presence of errors clearly and accurately.
 Recover from each error quickly enough to detect subsequent
errors.
 Add minimal overhead to the processing of correct programs.
Error-Recovery Strategies
 Common recovery strategies
 Panic-Mode Recovery
 Phrase-Level Recovery
 Error-Productions
 Global-Correction.
Panic-Mode Recovery
 On discovering an error
 The parser discards input symbols one at a time
 Until one of a designated set of synchronizing tokens is found.
 Synchronizing tokens are usually delimiters
 Semicolon or }, whose role is clear and unambiguous.
 Simple, and guaranteed not to go into an infinite loop.
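As a rough illustration (not from these slides), the skip-until-a-synchronizing-token step can be sketched in a few lines of Python; the token list, error position, and synchronizing set below are all made up for the example:

```python
# A minimal sketch of panic-mode recovery, assuming the token stream is
# a list of strings. Names and tokens are illustrative only.

SYNC_TOKENS = {";", "}"}  # designated synchronizing tokens (delimiters)

def panic_mode_recover(tokens, pos):
    """On an error at tokens[pos], discard input symbols one at a time
    until a synchronizing token is found; return the position just
    after it so parsing can resume."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1          # discard one input symbol
    return pos + 1        # skip past the synchronizing token itself

# e.g. an error detected at '@' inside "x = @ # ! ; y = 1"
tokens = ["x", "=", "@", "#", "!", ";", "y", "=", "1"]
resume = panic_mode_recover(tokens, 2)
print(tokens[resume:])    # parsing resumes at ['y', '=', '1']
```

Because the scan only ever moves forward and stops at or past the end of input, this cannot loop forever, which is the guarantee the slide refers to.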
Phrase-Level Recovery
 On discovering an error
 Perform local correction on the remaining input.
 Replace a prefix of the remaining input so parsing can continue.
 The replacement must not lead to an infinite loop.
 Cannot cope when the actual error occurred before the point of
detection.
Error Productions
 Anticipate common errors.
 Augment the grammar with productions that generate the
erroneous constructs.
Global Correction
 There are algorithms for choosing a minimal sequence
of changes to obtain a globally least-cost correction.
 Given an incorrect string x and a grammar G
 These algorithms find a parse tree for a related string y,
such that the number of insertions, deletions, and changes of
tokens required to transform x into y is as small as possible.
 Too costly to implement in terms of time and space.
Context-Free Grammars
 A formal notation to describe the syntax or structure
of a formal language.
 Formally, a CFG consists of
 A finite set of Terminals
 A finite set of Non-terminals
 A finite set of production rules
 A start symbol.
Context-Free Grammars
 Terminals
 The basic symbols from which strings are formed.
 "Token name" is a synonym for terminal.
 Terminals are the first components of the tokens output
by the lexical analyzer.
 Non-terminals
 Syntactic variables that denote sets of strings.
 Help define the language generated by the grammar.
 Impose a hierarchical structure on the language.
Context-Free Grammars
 Production rules
 Specify the manner in which the terminals and non-terminals
can be combined.
 Each production consists of
 A non-terminal called the head or left side of the production
 The symbol →
 A body or right side consisting of zero or more terminals and
non-terminals.
 One non-terminal is distinguished as the start symbol.
Notational Conventions
 Terminals
 Lowercase letters early in the alphabet: a, b, c.
 Operator symbols such as +, -, *, etc.
 Punctuation symbols: parentheses, comma, etc.
 Digits 0, 1, …, 9.
 Boldface strings such as id or if.
Notational Conventions
 Non-terminals
 Uppercase letters early in the alphabet: A, B, C.
 The letter S, normally the start symbol.
 Lowercase, italic names such as expr or stmt.
 Uppercase letters late in the alphabet (X, Y, Z)
represent grammar symbols (terminals or non-terminals).
 Lowercase letters late in the alphabet (x, y, z)
represent strings of terminals.
 Greek letters (α, β, γ) represent strings of grammar symbols.
Notational Convention
 A set of productions with a common head A,
A → α1, A → α2, …, A → αk,
can be written as A → α1 | α2 | … | αk.
 Unless stated otherwise, the head of the first
production is the start symbol.
Example
 We will use this grammar throughout:

expression → expression + term
expression → expression – term
expression → term
term → term * factor
term → term / factor
term → factor
factor → ( expression )
factor → id
Example
 Using the notational convention

E → E + T | E – T | T
T → T * F | T / F | F
F → ( E ) | id
Derivations
 Begin with the start symbol.
 At each step, replace a non-terminal by the body of
one of its productions.
 Consider the grammar

E → E+E | E*E | -E | (E) | id


Derivations
 For the derivation step E ⇒ -E, we say
 E derives -E.
 A sequence of replacements is called a derivation.

E ⇒ -E    [ E → -E ]
  ⇒ -(E)  [ E → (E) ]
  ⇒ -(id) [ E → id ]

 This is a derivation of -(id) from E.
 It proves that -(id) is an instance of an expression.
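The derivation above can be carried out mechanically; the following Python sketch (the tuple-of-symbols representation and helper name are our own, not from the slides) performs one replacement per step:

```python
# A sketch of a derivation as successive replacement of one
# non-terminal by a production body, following the grammar
# E -> E+E | E*E | -E | (E) | id. Representation is illustrative.

def apply_production(sentential, index, body):
    """Replace the symbol at `index` (a non-terminal) by `body`."""
    return sentential[:index] + body + sentential[index + 1:]

step0 = ("E",)
step1 = apply_production(step0, 0, ("-", "E"))       # E    => -E
step2 = apply_production(step1, 1, ("(", "E", ")"))  # -E   => -(E)
step3 = apply_production(step2, 2, ("id",))          # -(E) => -(id)
print(step3)  # ('-', '(', 'id', ')')
```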
Derivations
 For a sequence of derivation steps
α1 ⇒ α2 ⇒ . . . ⇒ αn
 We say α1 derives αn in zero or more steps.
 We write α1 ⇒* αn.
 Similarly, α1 ⇒+ αn means α1 derives αn in one or more steps.


Derivations
 If S ⇒* α, where S is the start symbol of grammar G,
then α is a sentential form of G.
 A sentence of G is a sentential form with no non-
terminals.
 The language generated by a grammar is its set of
sentences.
Derivations
 At each step of a derivation we make two choices
 Which non-terminal to replace.
 Which production of that non-terminal to use.
 Leftmost derivations
 The leftmost non-terminal is always chosen.
 Written α ⇒lm β.
 Rightmost derivations
 The rightmost non-terminal is always chosen.
 Written α ⇒rm β.
 Also called canonical derivations.
Parse Tree
 A graphical representation of a derivation.
 Each interior node represents the application of a
production.
 Interior node is labeled with the non-terminal A in the
head of the production.
 The children of the node are labeled from left to right,
by the symbols in the body of the production.
Parse Tree
Parse tree for the derivation of -(id + id):

         E
        / \
       -   E
         / | \
        (  E  )
         / | \
        E  +  E
        |     |
        id    id
Ambiguity
 A grammar that produces more than one parse tree
for some sentence is said to be ambiguous.
 Consider the two leftmost derivations for the sentence

id + id * id
Ambiguity

      E                      E
    / | \                  / | \
   E  +  E                E  *  E
   |    /|\              /|\    |
   id  E * E            E + E   id
       |   |            |   |
       id  id           id  id
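One way to see why the two trees matter: if the ids stood for numbers, the two groupings would evaluate differently. A small illustrative sketch (the tuple encoding and the stand-in values 1, 2, 3 are assumptions for the demo, not part of the slides):

```python
# The two parse trees for id+id*id correspond to different groupings.
# Represent each tree as nested tuples (op, left, right) and evaluate
# with stand-in values for the ids to show the trees disagree.
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree):
    """Evaluate a parse tree given as a number or (op, left, right)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left), evaluate(right))
    return tree

# id + id * id with ids 1, 2, 3:
tree_a = ("+", 1, ("*", 2, 3))   # '*' grouped below '+'  ->  1+(2*3)
tree_b = ("*", ("+", 1, 2), 3)   # '+' grouped below '*'  ->  (1+2)*3
print(evaluate(tree_a), evaluate(tree_b))  # 7 9
```

This is exactly why ambiguity is a problem for compilers: the parse tree chosen determines the meaning of the program.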
CFG vs Regular Expression
 CFGs are more powerful than regular expressions.
 Every construct that can be described by a regular
expression can also be described by a grammar.
 But not vice versa.
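The classic example of the "not vice versa" direction is nesting: the grammar P → ( P ) P | ε generates balanced parentheses, which no regular expression can describe, since recognition requires unbounded counting. A minimal checker sketch (function name and counter-based approach are illustrative; the counter recognizes the same language the grammar generates):

```python
# Recognizer for the language of balanced parentheses, generated by
# the CFG  P -> ( P ) P | eps.  A regular expression cannot do this,
# because matching requires counting arbitrarily deep nesting.

def balanced(s):
    """Return True iff s is a balanced string over ( and )."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # closing with nothing open
                return False
        else:
            return False         # alphabet is only ( and )
    return depth == 0

print(balanced("(()())"), balanced("(()"))  # True False
```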
Lexical vs Syntactic Analysis
 Why use both regular expressions and CFGs?
 The separation modularizes the front end of a compiler
into two manageable-sized components.
 Lexical rules are quite simple
 No need for the power of CFGs.
 REs provide a more concise and easier-to-understand
notation for tokens than grammars.
Eliminating Ambiguity
 Rewriting an ambiguous grammar can sometimes resolve the
ambiguity.
 Consider the grammar

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

Here, other stands for any other statement.
Eliminating Ambiguity
 The grammar is ambiguous.
 Consider the sentence

if E1 then if E2 then S1 else S2

 One parse tree matches the else with the inner if:

stmt
└── if expr (E1) then stmt
                      └── if expr (E2) then stmt (S1) else stmt (S2)
Eliminating Ambiguity
 Another parse tree for the same sentence matches the else
with the outer if:

stmt
└── if expr (E1) then stmt else stmt (S2)
                      └── if expr (E2) then stmt (S1)
Dangling else
 Which parse tree should we consider the correct one?
 The first parse tree is preferred in programming
languages.
 The rule is: "Match each else with the closest preceding
unmatched then."
Eliminating Ambiguity
 We can convert the grammar into an unambiguous one:

stmt → matched_stmt
     | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
             | other
open_stmt → if expr then stmt
          | if expr then matched_stmt else open_stmt

 The idea: a statement appearing between a then and its
matching else must be matched, i.e., must not end with an
unmatched then.
Left Recursion
 A grammar is left recursive if it has a non-terminal A
such that there is a derivation A ⇒+ Aα for some string α.
 Immediate left recursion occurs when there is a
production A → Aα.
 Top-down parsing methods cannot handle left recursion.
 How do we resolve it?
Immediate Left Recursion Elimination
 A production pair A → Aα | β can be replaced with

A → βA'
A' → αA' | ε

 To eliminate any number of immediate left recursions
 First group the productions as

A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn

 No βi begins with an A.
 Replace the A-productions by

A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
Immediate Left Recursion Elimination
 Consider the example:

E → E + T        E  → T E'
E → E – T        E' → + T E' | – T E' | ε
E → T            T  → F T'
T → T * F        T' → * F T' | / F T' | ε
T → T / F        F  → ( E ) | id
T → F
F → ( E ) | id
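The transformation can be sketched in code. This is an illustrative Python function (the names and the tuple-of-symbols representation are assumptions, not from the slides) that eliminates immediate left recursion for a single non-terminal; the empty tuple () plays the role of ε:

```python
# Immediate left-recursion elimination:  A -> Aα | β  becomes
# A -> βA',  A' -> αA' | ε.  Bodies are tuples of symbols; () is ε.

def eliminate_immediate(head, bodies):
    """Split the head's productions into left-recursive ones (A -> A α)
    and the rest (A -> β), then rewrite per the transformation."""
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: bodies}              # nothing to do
    new = head + "'"
    return {
        head: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [()],  # () is ε
    }

# E -> E+T | E-T | T   becomes   E -> TE',  E' -> +TE' | -TE' | ε
print(eliminate_immediate("E", [("E", "+", "T"), ("E", "-", "T"), ("T",)]))
```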
Left Recursion Problem
 Look at the following grammar:

S → Aa | b
A → Ac | Sd | ε

 The non-terminal S is left recursive because

S ⇒ Aa ⇒ Sda

 But it is not immediately left recursive.
 How do we eliminate this?
Elimination of Left Recursion
 Algorithm to remove left recursion.

Input: Grammar G with no cycles or ε-productions.
Output: An equivalent grammar with no left recursion.
Method:
1. Arrange the non-terminals in some order A1, A2, …, An.
2. for (each i from 1 to n) {
3.   for (each j from 1 to i-1) {
4.     replace each production of the form Ai → Aj γ by the
       productions Ai → δ1 γ | δ2 γ | … | δk γ, where
       Aj → δ1 | δ2 | … | δk are all the current Aj-productions
5.   }
6.   eliminate the immediate left recursion among the Ai-productions
7. }
Elimination of Left Recursion
 Let's go back to our previous grammar:

S → Aa | b
A → Ac | Sd | ε

 We have non-terminals S and A.
 Let's order them as S, A.
 There is no immediate left recursion in the S-productions,
so nothing happens on the first iteration of the outer loop.
 For i = 2, substitute for S in A → Sd:

A → Ac | Aad | bd | ε

 Now eliminate the immediate left recursion:

A → bdA' | A'
A' → cA' | adA' | ε

Elimination of Left Recursion
 Finally we get

S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε
Left Factoring
 A grammar transformation
 Useful for producing a grammar suitable for predictive, or top-
down, parsing.
 Consider the grammar

stmt → if expr then stmt else stmt
     | if expr then stmt

 We cannot decide which production to choose upon
seeing if.
Left Factoring
 In general, suppose A → αβ1 | αβ2, where α is non-empty.
 We do not know which production to use to expand A upon
seeing an input beginning with α.
 However, deferring the decision by expanding A to αA' helps.
 Rewriting the grammar we get

A → αA'
A' → β1 | β2

 Now we can expand A to αA' upon finding input that
begins with α, and choose between β1 and β2 afterwards.
Left Factoring
 Algorithm to left factor a grammar

Input: Grammar G
Output: An equivalent left-factored grammar.
Method:
For each non-terminal A, find the longest prefix α common to two or more of its
alternatives. If α ≠ ε, replace all of the A-productions A → αβ1 | αβ2 | … | αβn | γ,
where γ represents all alternatives that do not begin with α, by

A → αA' | γ
A' → β1 | β2 | … | βn

Repeatedly apply this transformation until no two alternatives for a non-terminal have a
common prefix.
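One round of this algorithm can be sketched as follows; the representation (production bodies as tuples of symbols), the function name, and the single-step scope are all illustrative choices, not from the slides:

```python
# One round of left factoring: find the longest prefix common to two
# or more alternatives of a non-terminal and pull it out into a new
# non-terminal.  Bodies are tuples of symbols; () plays the role of ε.
from os.path import commonprefix  # works element-wise on sequences

def left_factor_once(head, bodies):
    """Factor the longest common prefix of the head's alternatives."""
    best = ()
    for i in range(len(bodies)):
        for j in range(i + 1, len(bodies)):
            p = tuple(commonprefix([bodies[i], bodies[j]]))
            if len(p) > len(best):
                best = p
    if not best:
        return {head: bodies}              # no common prefix: unchanged
    new = head + "'"
    factored = [b[len(best):] for b in bodies if b[:len(best)] == best]
    rest = [b for b in bodies if b[:len(best)] != best]
    return {head: [best + (new,)] + rest, new: factored}

# S -> iEtS | iEtSeS | a   becomes   S -> iEtSS' | a,  S' -> ε | eS
print(left_factor_once("S",
      [("i", "E", "t", "S"), ("i", "E", "t", "S", "e", "S"), ("a",)]))
```

Applying this repeatedly, as the algorithm states, terminates when no non-terminal has two alternatives sharing a prefix.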
Left Factoring Example
 Consider the dangling-else example:

S → iEtS | iEtSeS | a
E → b

 Here i, t, and e stand for if, then, and else.
 E and S stand for conditional expression and
statement.
 Left-factored, we get:

S → iEtSS' | a
S' → eS | ε
E → b
To be Continued.
