Csc3205-Syntax - Analysis PDF

The document discusses syntax analysis in compiler design. It covers: - Syntax analysis parses a program by verifying it can be generated by the grammar rules of the language. It produces an internal representation called a parse tree. - Context-free grammars like BNF are used to describe the syntax of programming languages. Grammars benefit language designers and compiler writers by precisely defining a language's structure. - Parsing involves determining if a string of tokens can be generated by a grammar. Top-down and bottom-up parsers are two common approaches to parsing. Top-down builds the tree from the root node down, while bottom-up builds from the leaves up.


Syntax Analysis

CSC 3205: Compiler Design

Marriette Katarahweire

26th February 2020

CSC 3205: Compiler Design 1/63




Phases of a Compiler

The analysis phase of a compiler breaks up a source program into constituent pieces and produces an internal representation for it, called intermediate code.
The synthesis phase translates the intermediate code into the target program.
The syntax of a programming language describes the proper form of its programs.
The semantics of the language defines what its programs mean; that is, what each program does when it executes.



Syntax Analysis - Basics

Second phase of the compilation process
Part of the front-end analysis
Also known as parsing
The parser obtains a string of tokens from the lexical analyzer and verifies that the string of token names can be generated by the grammar for the source language.
Where lexical analysis splits the input into tokens, syntax analysis recombines these tokens: not back into a list of characters, but into a data structure that reflects the structure of the text, typically called the syntax tree.
The parser should report any syntax errors in an intelligible fashion and recover from commonly occurring errors so that it can continue processing the remainder of the program.



Syntax Analysis - Basics ...

Every programming language (PL) has precise rules that prescribe the syntactic structure of well-formed programs.
For example, in C a program is made up of functions, a function of declarations and statements, a statement of expressions, and so on.
The syntax of PL constructs can be specified by context-free grammars, commonly written in BNF (Backus-Naur Form) notation.
What benefits do grammars offer to both language designers and compiler writers?



Syntax Analysis

For well-formed programs, the parser constructs a parse tree and passes it to the rest of the compiler for further processing.
Parsing is the process of determining how a string of terminals can be generated by a grammar.



Context-Free Grammar

The syntax of a programming language is described by a context-free grammar (CFG).
We will use BNF (Backus-Naur Form) notation to describe CFGs.
The parser checks whether a given source program satisfies the rules implied by a context-free grammar.
If it does, the parser creates the parse tree of that program.
Otherwise, the parser reports error messages.
There are three general types of parsers for grammars: universal, top-down, and bottom-up.
The methods commonly used in compilers are either top-down or bottom-up.



Parsers
Top-Down Parser: the parse tree is created top to bottom, starting from the root and working down to the leaves.
Bottom-Up Parser: the parse tree is created bottom to top, starting from the leaves and working up to the root.
Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).
Efficient top-down and bottom-up parsers can be implemented only for subclasses of context-free grammars, but these subclasses are expressive enough to describe most programming-language constructs:
LL for top-down parsing (Left-to-right scanning of input, Leftmost derivation)
LR for bottom-up parsing (Left-to-right scanning of input, Rightmost derivation in reverse)
Parsers implemented by hand often use LL grammars.
Parsers for the larger class of LR grammars are usually constructed using automated tools.
Context Free Grammars

Inherently recursive structures of a programming language are defined by a context-free grammar. In a context-free grammar, we have:
A finite set of terminals (in our case, this will be the set of tokens)
A finite set of non-terminals (syntactic variables)
A finite set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (possibly the empty string)
A start symbol (one of the non-terminal symbols)
Example:
E → E + E | E − E | E ∗ E | E / E | − E
E → ( E )
E → id
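
As a minimal sketch, the four components of the example grammar can be written down as plain Python data. The names `Grammar`, `EXPR_GRAMMAR`, and `is_well_formed` are illustrative assumptions and not part of the course material; a body is a tuple of symbols and the empty tuple stands for the empty string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset
    nonterminals: frozenset
    productions: tuple  # pairs (head, body); body is a tuple of symbols
    start: str

    def is_well_formed(self) -> bool:
        """Check the constraints listed in the definition of a CFG."""
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals
            and not (self.terminals & self.nonterminals)
            and all(
                head in self.nonterminals and all(s in symbols for s in body)
                for head, body in self.productions
            )
        )


# The example grammar from the slide, one production per alternative.
EXPR_GRAMMAR = Grammar(
    terminals=frozenset({"+", "-", "*", "/", "(", ")", "id"}),
    nonterminals=frozenset({"E"}),
    productions=(
        ("E", ("E", "+", "E")),
        ("E", ("E", "-", "E")),
        ("E", ("E", "*", "E")),
        ("E", ("E", "/", "E")),
        ("E", ("-", "E")),
        ("E", ("(", "E", ")")),
        ("E", ("id",)),
    ),
    start="E",
)
```

Calling `EXPR_GRAMMAR.is_well_formed()` checks only the shape of the definition; it says nothing about ambiguity, which this particular grammar does exhibit.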



Context Free Grammars

Recall (Sections 2.2 and 4.2 in the Dragon Book):
Derivations: leftmost and rightmost
Grammar ambiguity and syntax trees
Operator precedence and associativity



Bottom-up Parsing

Parsing starts with the input symbols and tries to construct the parse tree up to the start symbol.
Input string: a + b * c
Production rules:
S → E
E → E + T
E → E ∗ T
E → T
T → id



Bottom-up Parsing

Read the input and, whenever the right-hand side of a production matches, replace it with the left-hand side (here each of a, b, and c is an id token):
a + b ∗ c
T + b ∗ c
E + b ∗ c
E + T ∗ c
E ∗ c
E ∗ T
E
S
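
The reduction sequence above can be reproduced with a deliberately naive shift-reduce loop. This is only a sketch for this one small grammar, not an LR parser: it assumes that reducing greedily, trying longer right-hand sides first, happens to be correct here, and it applies S → E only once the input is exhausted. The names `RULES` and `reduce_trace` are made up for illustration, and every identifier is written as the token `id`.

```python
# Productions from the slide, longest right-hand sides first so that
# "E + T" is reduced as a whole before the lone "T" is considered.
RULES = [
    ("E", ["E", "+", "T"]),
    ("E", ["E", "*", "T"]),
    ("E", ["T"]),
    ("T", ["id"]),
]


def reduce_trace(tokens):
    """Return the list of sentential forms; the last one is 'S' on success."""
    stack, rest = [], list(tokens)
    forms = [" ".join(rest)]
    while True:
        for head, body in RULES:
            if stack[-len(body):] == body:
                stack[-len(body):] = [head]      # reduce
                break
        else:
            if rest:
                stack.append(rest.pop(0))        # shift the next token
                continue
            if stack == ["E"]:                   # S -> E only at end of input
                forms.append("S")
            return forms
        forms.append(" ".join(stack + rest))


for form in reduce_trace("id + id * id".split()):
    print(form)
```

Because the grammar gives + and ∗ the same precedence, the trace groups the input as (a + b) ∗ c; a practical bottom-up parser decides when to shift and when to reduce from a table rather than greedily.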



Top-Down Parsing
The parse tree is constructed from the top and from left to right.
Pick a production and try to match the input
• Bad “pick”: may need to backtrack
• Some grammars are backtrack-free (predictive parsing)
Terminals are seen in order of appearance in the token stream:
t2 t5 t6 t8 t9
Methods: recursive descent and predictive parsing



Top-Down Parsing Methods



Top-Down Parsing

The grammar in the figure generates a subset of the statements of C or Java.



Top-Down Parsing

The top-down construction of the parse tree.



Top-Down Parsing

The top-down construction of the parse tree starts with the root, labeled with the starting nonterminal stmt, and repeatedly performs the following two steps:
1: At node N, labeled with nonterminal A, select one of the productions for A and construct children at N for the symbols in the production body.
2: Find the next node at which a subtree is to be constructed, typically the leftmost unexpanded nonterminal of the tree.
The key is picking the right production in step 1. That choice should be guided by the input string.
The current terminal being scanned in the input is frequently referred to as the lookahead symbol.



Parse Tree Construction

Given the string: for ( ; expr ; expr ) other



Top-Down Parsing - Example

Expression grammar and the input x - 2 * y

0 Goal → Expr
1 Expr → Expr + Term
2 | Expr - Term
3 | Term
4 Term → Term * Factor
5 | Term / Factor
6 | Factor
7 Factor → ( Expr )
8 | Number
9 | id



Recursive Descent Parsing

A top-down method of syntax analysis
Uses recursive procedures to model the parse tree to be constructed
For each nonterminal in the grammar, a procedure that parses that nonterminal is constructed.
Each of these procedures may read input, match terminal symbols, or call other procedures to read input and match terminals in the right-hand side of a production.
Recursively parses the input to make a parse tree, which may or may not require backtracking
It is called recursive because the procedures call one another recursively, mirroring the recursive structure of the context-free grammar.



Backtracking

If one alternative of a production fails to match, the syntax analyzer restarts the process using a different alternative of the same production.
This technique may process the input string more than once to determine the right production.
It would be better if we always knew the correct action to take.
Backtracking is time-consuming and therefore inefficient.
Thus a special case, predictive parsing, was developed in which no backtracking is required.
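
To make the cost of a bad pick concrete, here is a small backtracking recognizer sketch. The toy grammar S → a b | a c is a hypothetical example chosen so that the first alternative can fail after consuming input; `GRAMMAR`, `parse`, `parse_sequence`, and `accepts` are illustrative names. The `attempts` list records every alternative that was tried.

```python
# Hypothetical toy grammar with a shared prefix:  S -> a b | a c
GRAMMAR = {"S": [["a", "b"], ["a", "c"]]}


def parse(symbol, tokens, pos, attempts):
    """Yield every input position reachable after matching `symbol` at `pos`."""
    if symbol not in GRAMMAR:                       # terminal symbol
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for body in GRAMMAR[symbol]:                    # try alternatives in order
        attempts.append((symbol, tuple(body), pos))
        yield from parse_sequence(body, tokens, pos, attempts)


def parse_sequence(body, tokens, pos, attempts):
    """Match the symbols of `body` one after another, starting at `pos`."""
    if not body:
        yield pos
        return
    for mid in parse(body[0], tokens, pos, attempts):
        yield from parse_sequence(body[1:], tokens, mid, attempts)


def accepts(tokens):
    """Return (accepted?, list of alternatives tried)."""
    attempts = []
    ok = any(end == len(tokens) for end in parse("S", tokens, 0, attempts))
    return ok, attempts


print(accepts(["a", "c"]))
```

For the input `a c` the recognizer first tries S → a b, fails on the second token, and only then tries S → a c, so the token `a` is examined twice. This sketch would loop forever on a left-recursive grammar.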



Left Recursion

A production is left recursive if the same nonterminal that appears on the LHS appears first on the RHS of the production.
Recursive descent parsers cannot deal with left recursion: the procedure for the nonterminal would call itself again without consuming any input.
However, we can rewrite the grammar to represent the same language without the need for left recursion.



Removing Left Recursion

In general, we can eliminate all immediate left recursion:
A → Ax | y
By changing the grammar to:
A → yA’
A’ → xA’ | ε
Not all left recursion is immediate; it may be hidden across multiple production rules:
A → BC | D
B → AE | F
There is a general approach for removing indirect left recursion, but we’ll not worry about it for this course.



Removing Left Recursion

Given the grammar:
Fee → Fee α | β
Introduce a new nonterminal, Fee’, and transfer the recursion onto Fee’.
Fee → β Fee’
Fee’ → α Fee’
Add the rule Fee’ → ε, where ε represents the empty string
Final grammar:
Fee → β Fee’
Fee’ → α Fee’ | ε
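
The transformation above is mechanical, so it can be sketched as a short function. This handles immediate left recursion only; the function name, the tuple representation of bodies, and the convention of naming the new nonterminal by appending `'` are assumptions made for illustration.

```python
EPSILON = ()  # the empty string, written as an empty tuple of symbols


def remove_immediate_left_recursion(head, bodies):
    """Rewrite head -> head a1 | ... | b1 | ... without immediate left recursion.

    `bodies` is a list of tuples of symbols. Returns a dict that maps each
    nonterminal (the original and the new primed one) to its list of bodies.
    """
    new_head = head + "'"
    # The alphas: what follows the leading `head` in each recursive alternative.
    recursive = [b[1:] for b in bodies if b and b[0] == head]
    # The betas: alternatives that do not start with `head`.
    others = [b for b in bodies if not b or b[0] != head]
    if not recursive:
        return {head: list(bodies)}
    return {
        head: [beta + (new_head,) for beta in others],
        new_head: [alpha + (new_head,) for alpha in recursive] + [EPSILON],
    }


# Expr -> Expr + Term | Expr - Term | Term
print(remove_immediate_left_recursion(
    "Expr", [("Expr", "+", "Term"), ("Expr", "-", "Term"), ("Term",)]
))
```

Applied to the Expr productions of the earlier expression grammar, this yields Expr → Term Expr' and Expr' → + Term Expr' | - Term Expr' | ε.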



Left Factoring

Another useful grammar transformation technique used in parsing
It consists of "factoring out" prefixes which are common to two or more productions
For example, from A → αβ | αγ
to:
A → α A’
A’ → β | γ



Left Factoring

Given the grammar: A → ab1 | ab2 | ab3
Every production has the common prefix a, so whichever production we choose, there is no guarantee that we will not need to backtrack.
The choice is nondeterministic: we cannot pick a production and be sure that it leads to the desired string and the correct parse tree.
We rewrite the grammar so that the choice is deterministic while it still generates the same strings, so no backtracking is needed.
Becomes: A → aA’
A’ → b1 | b2 | b3



Left Factoring

Suppose a grammar, S → abS | aSb
Rewrite the production to:
S → aS’
S’ → bS | Sb
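
One step of left factoring can be sketched as follows. The approach assumed here is to group the alternatives of a nonterminal by their first symbol and pull out the longest prefix shared by each group; a full implementation would repeat the step on the new nonterminals until nothing changes. The function names and the primed naming scheme are illustrative.

```python
from collections import defaultdict


def common_prefix(bodies):
    """Longest sequence of symbols shared by the start of every body."""
    prefix = []
    for column in zip(*bodies):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return tuple(prefix)


def left_factor_once(head, bodies):
    """Factor one step; bodies are tuples of symbols, () is the empty string."""
    groups = defaultdict(list)
    for body in bodies:
        groups[body[:1]].append(body)          # key: first symbol (or empty)
    result = {head: []}
    primes = 0
    for first, group in groups.items():
        if len(group) == 1 or not first:       # nothing to factor out
            result[head].extend(group)
            continue
        primes += 1
        new_head = head + "'" * primes
        prefix = common_prefix(group)
        result[head].append(prefix + (new_head,))
        result[new_head] = [body[len(prefix):] for body in group]
    return result


print(left_factor_once("S", [("a", "b", "S"), ("a", "S", "b")]))
```

For S → abS | aSb this prints the factored productions S → a S' and S' → b S | S b, matching the rewrite above.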



Question

What is the difference between Left Factoring and Left Recursion?
What is the result of left factoring the grammar given below?
S → if E then S | if E then S else S | a
E → b



Predictive Parsing

A simple form of recursive-descent parsing that does not require any backtracking
The parser can “predict” which production to use
A predictive parser always knows which production to use, so it avoids backtracking
The lookahead symbol unambiguously determines the flow of control through the procedure body for each nonterminal.
The sequence of procedure calls during the analysis of an input string implicitly defines a parse tree for the input, and can be used to build an explicit parse tree, if desired.
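
A minimal predictive recursive-descent sketch follows, with one method per nonterminal. It assumes the right-recursive expression grammar used later in these slides (E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id | num) and that the input is already a list of token names. The class and method names are illustrative, and the parse tree is built as nested tuples.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]    # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.lookahead != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1
        return terminal

    def E(self):                              # E -> T E'
        return ("E", self.T(), self.E_prime())

    def E_prime(self):                        # E' -> + T E' | epsilon
        if self.lookahead == "+":
            return ("E'", self.match("+"), self.T(), self.E_prime())
        return ("E'",)                        # epsilon on any other lookahead

    def T(self):                              # T -> F T'
        return ("T", self.F(), self.T_prime())

    def T_prime(self):                        # T' -> * F T' | epsilon
        if self.lookahead == "*":
            return ("T'", self.match("*"), self.F(), self.T_prime())
        return ("T'",)

    def F(self):                              # F -> ( E ) | id | num
        if self.lookahead == "(":
            return ("F", self.match("("), self.E(), self.match(")"))
        if self.lookahead in ("id", "num"):
            return ("F", self.match(self.lookahead))
        raise SyntaxError(f"unexpected {self.lookahead!r}")


def parse(tokens):
    parser = Parser(tokens)
    tree = parser.E()
    parser.match("$")                         # all input must be consumed
    return tree


print(parse("id + num * id".split()))
```

Each method inspects only the single lookahead token before committing to an alternative, so no input is ever re-read. As a simplification, the ε alternatives are taken on any lookahead other than the operator; an unexpected token is then reported by a later `match` call rather than at the earliest possible point.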



Predictive Parsing

Example: for the productions

stmt -> if ( expr ) stmt else stmt
      | while ( expr ) stmt
      | for ( stmt expr stmt ) stmt

A recursive descent parser would always know which production to use, depending on the input token: each alternative begins with a different keyword (if, while, or for).



Predictive Parsing

If it picks the wrong production, a top-down parser may have to backtrack.
The alternative is to look ahead in the input and use context to pick correctly.
Predictive parsers accept LL(k) grammars
The first L means “left-to-right” scan of input
The second L means “leftmost derivation”
k means “predict based on k tokens of lookahead”
In practice, LL(1) is used



LL(1) Grammars

A class of grammars that can be used to construct predictive parsers, that is, recursive-descent parsers needing no backtracking
The first "L" in LL(1) stands for scanning the input from left to right
The second "L" stands for producing a leftmost derivation
The "1" stands for using one input symbol of lookahead at each step to make parsing action decisions
In recursive descent, for each non-terminal and input token there may be a choice of production.
LL(1) means that for each non-terminal and token there is at most one production.



Backtrack Free Grammars

Need to formalize the property that makes the right-recursive expression grammar backtrack-free.
At each point in the parse, the choice of an expansion is obvious because each alternative for the leftmost nonterminal leads to a distinct terminal symbol.
Comparing the next word in the input stream against those choices reveals the correct expansion.
The intuition is clear, but formalizing it will require some notation.



FIRST and FOLLOW Sets

FIRST and FOLLOW computations for a grammar help to construct "predictive parsing tables"
"Predictive parsing tables" make explicit the choice of production during top-down parsing
The construction of both top-down and bottom-up parsers is aided by two functions, FIRST and FOLLOW, associated with a grammar G.
During top-down parsing, FIRST and FOLLOW allow us to choose which production to apply, based on the next input symbol
During panic-mode error recovery, sets of tokens produced by FOLLOW can be used as synchronizing tokens
To build the parsing table, we need the notion of nullability and the two functions FIRST and FOLLOW



FIRST and FOLLOW Sets

Given a grammar G, we may define the functions FIRST and FOLLOW on the strings of symbols of G.
For a grammar symbol α,
FIRST(α) is the set of terminal symbols that can appear as the first word in some string derived from α
FOLLOW(α) is the set of all terminals that may immediately follow α in a derivation.



Nullability

A nonterminal A is nullable if A ⇒∗ ε
Clearly, A is nullable if it has a production A → ε
But A is also nullable if there are, for example, productions
A → BC
B → A | aC | ε
C → aB | Cb | ε



Nullability

In other words, A is nullable if there is a production A → ε, or there is a production A → B1 B2 . . . Bn, where B1, B2, ..., Bn are all nullable.
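
This recursive definition translates directly into a fixed-point iteration: keep adding nonterminals that have a body made up entirely of already-nullable symbols until nothing changes. The sketch below assumes a grammar stored as a dict from nonterminal to a list of bodies, with `[]` standing for ε; `nullable_set` and `EXAMPLE` are illustrative names.

```python
def nullable_set(productions):
    """Fixed-point computation of the nullable nonterminals.

    `productions` maps each nonterminal to a list of bodies; a body is a
    list of symbols and [] stands for the empty string.
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head in nullable:
                continue
            # all(...) over an empty body is True, which covers A -> epsilon.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


# The example from the Nullability slide:
#   A -> B C,   B -> A | a C | epsilon,   C -> a B | C b | epsilon
EXAMPLE = {
    "A": [["B", "C"]],
    "B": [["A"], ["a", "C"], []],
    "C": [["a", "B"], ["C", "b"], []],
}

print(sorted(nullable_set(EXAMPLE)))
```

In the first pass B and C are found nullable through their ε alternatives; A follows because its body B C then consists only of nullable symbols.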



Nullability

In the grammar
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → ∗FT’ | ε
F → (E) | id | num
E’ and T’ are nullable.
E, T, and F are not nullable.



Nullability

Nonterminal Nullable?
E ??
E’ ??
T ??
T’ ??
F ??



FIRST Set

For a grammar symbol X, FIRST(X) is defined as follows:
For every terminal X, FIRST(X) = {X}.
For every nonterminal X, if X → Y1 Y2 . . . Yn is a production, then FIRST(Y1) ⊆ FIRST(X).
Furthermore, if Y1, Y2, . . . , Yk are nullable, then FIRST(Yk+1) ⊆ FIRST(X).



FIRST Set

We are concerned with FIRST(X) only for the nonterminals of the grammar.
FIRST(X) for terminals is trivial.
According to the definition, to determine FIRST(A), we must inspect all productions that have A on the left.



FIRST Set Example

Find FIRST(E) given the grammar below
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → ∗FT’ | ε
F → (E) | id | num



FIRST Set Example

Find FIRST(E).
E occurs on the left in only one production
E → TE’.
Therefore, FIRST(T) ⊆ FIRST(E).
Furthermore, T is not nullable. Therefore, FIRST(E) =
FIRST(T).
We have yet to determine FIRST(T).



FIRST Set Example

Find FIRST(T).
T occurs on the left in only one production
T → FT’
Therefore, FIRST(F) ⊆ FIRST(T).
Furthermore, F is not nullable. Therefore, FIRST(T) =
FIRST(F).
We have yet to determine FIRST(F).



FIRST Set Example

Find FIRST(F).
FIRST(F) = {(, id, num}.
Therefore, FIRST(E) = {(, id, num}.
FIRST(T) = {(, id, num}.



FIRST Set Example

Find FIRST(E’).
FIRST(E’) = {+}.
Find FIRST(T’).
FIRST(T’) = {*}.



Summary

Nonterminal  Nullable  FIRST
E            No        {(, id, num}
E’           Yes       {+}
T            No        {(, id, num}
T’           Yes       {*}
F            No        {(, id, num}
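
The FIRST column of this summary can be checked with a short fixed-point computation. This is a sketch under the same convention as the slides: nullability is tracked separately, so FIRST sets contain terminals only and never ε. The dict representation of the grammar and the function names are assumptions made for illustration.

```python
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],          # [] stands for epsilon
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"], ["num"]],
}


def compute_nullable(grammar):
    nullable, changed = set(), True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def compute_first(grammar, nullable):
    """FIRST for every nonterminal; any symbol that is not a key is a terminal."""
    first = {head: set() for head in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for sym in body:
                    new = first[sym] if sym in grammar else {sym}
                    if not new <= first[head]:
                        first[head] |= new
                        changed = True
                    if sym not in nullable:    # later symbols cannot come first
                        break
    return first


FIRST = compute_first(GRAMMAR, compute_nullable(GRAMMAR))
for head in GRAMMAR:
    print(head, sorted(FIRST[head]))
```

The inner loop moves on to the next symbol of a body only while every symbol before it is nullable, which is exactly the "Furthermore" clause of the FIRST definition.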



FOLLOW Set

For a grammar symbol X, FOLLOW(X) is defined as follows:
If S is the start symbol, then $ ∈ FOLLOW(S).
If A → α B β is a production, then FIRST(β) ⊆ FOLLOW(B).
If A → α B is a production, or A → α B β is a production and β is nullable, then FOLLOW(A) ⊆ FOLLOW(B).



FOLLOW Set

We are concerned about FOLLOW(X) only for the nonterminals of the grammar.
According to the definition, to determine FOLLOW(A), we must inspect all productions that have A on the right.



FOLLOW Set Example

Let the grammar be
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → ∗FT’ | ε
F → (E) | id | num



FOLLOW Set Example

Find FOLLOW(E)
E is the start symbol, therefore $ ∈ FOLLOW(E).
E occurs on the right in only one production, F → (E), where it is followed by ).
Therefore FOLLOW(E) = {$, )}.



FOLLOW Set Example

Find FOLLOW(E’).
E’ occurs on the right in two productions.
E → T E’
E’ → + T E’
Therefore, FOLLOW(E’) = FOLLOW(E) = {$, )}.



FOLLOW Set Example

Find FOLLOW(T).
T occurs on the right in two productions.
E → T E’
E’ → + T E’
Therefore, FOLLOW(T) contains FIRST(E’) = {+}
However, E’ is nullable, therefore it also contains FOLLOW(E)
= {$, )} and FOLLOW(E’) = {$, )}
Therefore, FOLLOW(T) = {+, $, )}



FOLLOW Set Example

Find FOLLOW(T’).
T’ occurs on the right in two productions.
T → F T’
T’ → * F T’
Therefore, FOLLOW(T’) = FOLLOW(T) = {$, ), +}



FOLLOW Set Example

Find FOLLOW(F).
F occurs on the right in two productions.
T → F T’
T’ → * F T’
Therefore, FOLLOW(F) contains FIRST(T’) = {*}
However, T’ is nullable, therefore it also contains
FOLLOW(T) = {+, $, )} and FOLLOW(T’) = {$, ), +}
Therefore, FOLLOW(F) = {*, $, ), +}.



Summary

Nonterminal  Nullable  FIRST         FOLLOW
E            No        {(, id, num}  {$, )}
E’           Yes       {+}           {$, )}
T            No        {(, id, num}  {$, ), +}
T’           Yes       {*}           {$, ), +}
F            No        {(, id, num}  {*, $, ), +}
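
The FOLLOW column can be reproduced the same way. To keep the sketch short, the nullable set and the FIRST sets are copied from the summary table rather than recomputed; the grammar representation and the name `compute_follow` are illustrative assumptions.

```python
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],          # [] stands for epsilon
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"], ["num"]],
}
# Values taken from the summary table.
NULLABLE = {"E'", "T'"}
FIRST = {
    "E": {"(", "id", "num"}, "E'": {"+"},
    "T": {"(", "id", "num"}, "T'": {"*"},
    "F": {"(", "id", "num"},
}


def compute_follow(grammar, nullable, first, start):
    follow = {head: set() for head in grammar}
    follow[start].add("$")                           # rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:           # skip terminals
                        continue
                    new = set()
                    for nxt in body[i + 1:]:         # rule 2: FIRST of what follows
                        new |= first[nxt] if nxt in grammar else {nxt}
                        if nxt not in nullable:
                            break
                    else:                            # rule 3: the rest is nullable or empty
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, NULLABLE, FIRST, start="E")
for head in GRAMMAR:
    print(head, sorted(FOLLOW[head]))
```

The `for ... else` branch runs only when the scan over the trailing symbols was never cut short by a non-nullable symbol, which covers both A → α B and A → α B β with nullable β.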



Example 2

Consider the grammar
Z → d | XYZ
Y → c | ε
X → Y | a
Find:
Nullable Set?
Nonnullable Set?
FIRST(X), FIRST(Y), FIRST(Z)?
FOLLOW(X), FOLLOW(Y), FOLLOW(Z)?



Example 2

Nullable Set: {X, Y}
Nonnullable Set: {Z}
FIRST(X) = {a, c}
FIRST(Y) = {c}
FIRST(Z) = {d, a, c}
FOLLOW(X) = {a, c, d}
FOLLOW(Y) = {a, c, d}
FOLLOW(Z) = {$}



Exercise

Consider the grammar
E → TX
X → +E | ε
T → int Y | (E)
Y → ∗T | ε
Find:
Nullable Set?
Nonnullable Set?
FIRST(E), FIRST(X), FIRST(Y), FIRST(T)?
FOLLOW(E), FOLLOW(X), FOLLOW(Y), FOLLOW(T)?



Predictive Parsing Table

Grammar:
S → E$
E → TE’
E’ → ε | "+" T E’
T → FT’
T’ → ε | "*" F T’
F → id | num | "(" E ")"



Predictive Parsing Table

Nonterminal  Nullable  FIRST       FOLLOW
S            False     ( id num
E            False     ( id num    ) $
E’           True      +           ) $
T            False     ( id num    ) + $
T’           True      *           ) + $
F            False     ( id num    ) * + $



Predictive Parsing Table

Rows: Non-terminals
Columns: Terminals
Entries: Productions
Enter production X → α in row X, column t for each t in FIRST(α). If α is nullable, also enter the production in row X, column t for each t in FOLLOW(X).

    +              *              id        num       (              )        $
S   -              -              S → E$    S → E$    S → E$         -        -
E   -              -              E → TE’   E → TE’   E → TE’        -        -
E’  E’ → "+" T E’  -              -         -         -              E’ → ε   E’ → ε
T   -              -              T → FT’   T → FT’   T → FT’        -        -
T’  T’ → ε         T’ → "*" F T’  -         -         -              T’ → ε   T’ → ε
F   -              -              F → id    F → num   F → "(" E ")"  -        -
(- marks an empty entry, which signals a syntax error)
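
The table-filling rule and the way a parser consumes the table can be sketched together. The nullable, FIRST, and FOLLOW values below are copied from the preceding slide; the list-of-pairs grammar encoding, the function names, and the explicit stack are assumptions made for illustration, not the only way to drive an LL(1) table.

```python
GRAMMAR = [
    ("S",  ["E", "$"]),
    ("E",  ["T", "E'"]),
    ("E'", []),                               # [] stands for epsilon
    ("E'", ["+", "T", "E'"]),
    ("T",  ["F", "T'"]),
    ("T'", []),
    ("T'", ["*", "F", "T'"]),
    ("F",  ["id"]),
    ("F",  ["num"]),
    ("F",  ["(", "E", ")"]),
]
NONTERMINALS = {head for head, _ in GRAMMAR}
NULLABLE = {"E'", "T'"}
START = {"(", "id", "num"}
FIRST = {"S": START, "E": START, "E'": {"+"}, "T": START, "T'": {"*"}, "F": START}
FOLLOW = {
    "S": set(), "E": {")", "$"}, "E'": {")", "$"},
    "T": {")", "+", "$"}, "T'": {")", "+", "$"}, "F": {")", "*", "+", "$"},
}


def first_of_body(body):
    """FIRST of a string of symbols, plus whether the whole string is nullable."""
    result = set()
    for sym in body:
        result |= FIRST[sym] if sym in NONTERMINALS else {sym}
        if sym not in NULLABLE:
            return result, False
    return result, True


def build_table():
    table = {}
    for head, body in GRAMMAR:
        columns, body_nullable = first_of_body(body)
        if body_nullable:
            columns = columns | FOLLOW[head]
        for t in columns:
            if (head, t) in table:
                raise ValueError(f"conflict at ({head}, {t}): grammar is not LL(1)")
            table[head, t] = body
    return table


def parse(tokens, table):
    """Table-driven predictive parse; returns the productions applied, in order."""
    stack, pos, applied = ["S"], 0, []
    while stack:
        top = stack.pop()
        look = tokens[pos] if pos < len(tokens) else None
        if top in NONTERMINALS:
            if (top, look) not in table:
                raise SyntaxError(f"no table entry for ({top}, {look})")
            body = table[top, look]
            applied.append((top, tuple(body)))
            stack.extend(reversed(body))      # leftmost symbol ends up on top
        elif top == look:
            pos += 1                          # terminal on the stack matches input
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    if pos != len(tokens):
        raise SyntaxError("unconsumed input after $")
    return applied


TABLE = build_table()
for head, body in parse("id + num * id $".split(), TABLE):
    print(head, "->", " ".join(body) or "epsilon")
```

The input is expected to end with the `$` token, since the augmented production S → E$ consumes it. The printed productions, read top to bottom, form a leftmost derivation of the input.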



Next Topic

Read about:
- Error Handling
- Bottom-up Parsing

