
Lec 4

The document discusses the concepts of parsing and syntax analysis in programming languages, highlighting the role of parsers in recognizing language structures and constructing parse trees. It addresses various types of errors in programs, including lexical, syntactic, semantic, and logical errors, and emphasizes the importance of effective error detection and recovery strategies. Additionally, it introduces context-free grammars (CFG) and the challenges of ambiguous grammars in programming language parsing.

Uploaded by

desk.nishat
Copyright
© All Rights Reserved

Lecture 04

Parsing
Part I
Language and Grammars
• Every (programming) language has precise rules
– In English:
• Subject Verb Object
– In C:
• Programs are made of functions
» Functions are made of statements, etc.
Parsing

• A.K.A. Syntax Analysis
– Recognize sentences in a language.
– Discover the structure of a document/program.
– Construct (implicitly or explicitly) a tree (called a parse tree) to represent the structure.
– The parse tree is used later to guide translation.
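
As a rough illustration of "recognize, discover structure, build a tree", here is a minimal recursive-descent sketch in Python. The grammar (expr, term, factor with + and *), the `tokenize` helper, and the tuple-based tree are assumptions made for this example, not something the slides specify. The tree it builds is a simplified one, closer to an abstract syntax tree than to a full parse tree.

```python
def tokenize(text):
    """Split an expression such as 'a + b * c' into tokens (hypothetical helper)."""
    for symbol in "()+*":
        text = text.replace(symbol, f" {symbol} ")
    return text.split()


class Parser:
    """Recursive-descent parser for an assumed grammar:
         expr   -> term   { '+' term }
         term   -> factor { '*' factor }
         factor -> NAME | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expr(self):
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            node = self.expr()
            if self.peek() != ")":
                raise SyntaxError(f"expected ')' at token {self.pos}")
            self.pos += 1
            return node
        if tok is not None and tok.isalnum():
            self.pos += 1
            return tok
        raise SyntaxError(f"unexpected token {tok!r} at token {self.pos}")


def parse(text):
    """Return a nested-tuple tree, or raise SyntaxError if the input is not a sentence."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r} at token {parser.pos}")
    return tree


print(parse("a + b * c"))  # ('+', 'a', ('*', 'b', 'c'))
```

The nesting of the result shows the structure the parser discovered: `*` binds tighter than `+` because `term` sits below `expr` in the assumed grammar.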
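Role of the parser

[Figure: Source program → Scanner (lexical analysis) → tokens → Parser (syntax analysis) → parse tree → Rest of front end → intermediate representation. The parser asks the scanner for the next token ("get next token"); a Symbol Table sits alongside these phases.]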

• Verifies that the string of tokens can be generated from the grammar
• Error?
– Report it with a descriptive, helpful message
– Recover and continue parsing!
• Build a parse tree!
Rest of Front End

• Collecting token information
• Type checking
• Intermediate code generation
Errors in Programs
• Lexical (“typos”)
if x<1 thenn y = 5:
• Syntactic
if ((x<1) & (y>5))) ...
{ ... { ... ... }
• Semantic (type errors, undefined IDs, etc.)
if (x+5) then ...
• Logical errors (bugs)
if (i<9) then ...
Should be <= not <
The compiler cannot detect these
Error Detection
• Much responsibility falls on the parser
– Many errors are syntactic in nature
– Precision and efficiency of modern parsing methods
– Detect the error as soon as possible

• Challenges for the error handler in the parser
– Report errors clearly and accurately
– Recover from an error and continue
– Be efficient in processing

• The good news is
– Simple mechanisms can catch most common errors

• Errors don’t occur that frequently!


• 60% of programs are syntactically and semantically correct
• 80% of erroneous statements have only 1 error; 13% have 2
• Most errors are trivial: 90% are single-token errors
• 60% punctuation, 20% operator, 15% keyword, 5%
Adequate Error Reporting is Not a Trivial Task
• It is difficult to generate clear and accurate error messages.
Example
function foo () {
  ...
  if (...) {
    ...
  } else {
    ...
    ...        <- Missing } here
}
<eof>          <- Not detected until here
Example
int myVarr;    <- Misspelled ID here
...
x = myVar;     <- Not detected until here
...
Error Recovery
• After recovering from the first error
– The compiler must go on!
• Restore to some state and process the rest of the input

• Error-Correcting Compilers
– Issue an error message
– Fix the problem
– Produce an executable

Example
Error on line 23: “myVarr” undefined. “myVar” was used.

• May not be a good idea!
– Guessing the programmer’s intention is not easy!
Error Recovery May Trigger More Errors!
• Inadequate recovery may introduce more errors
– Those were not the programmer’s errors
• Example:
int myVar flag ;     <- Declaration of flag is discarded
...
x := flag;           <- Variable flag is undefined
...
...
while (flag==0)      <- Variable flag is undefined
...

• Too many error messages may obscure the real problem
– May bury the real message
– Remedy:
• Allow 1 message per token or per statement
• Quit after a maximum number (e.g. 100) of errors
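
The remedy above (one message per statement, stop after a maximum) can be sketched in a few lines of Python. The class name `ErrorReporter`, the `TooManyErrors` exception, and the idea of identifying a statement by an integer id are all assumptions made for this illustration.

```python
class TooManyErrors(Exception):
    """Raised when the error limit is reached, so the compiler can stop."""


class ErrorReporter:
    def __init__(self, max_errors=100):
        self.max_errors = max_errors
        self.messages = []
        self._reported_statements = set()

    def report(self, statement_id, message):
        """Record at most one message per statement; return True if it was recorded."""
        if statement_id in self._reported_statements:
            return False  # suppress follow-on messages for the same statement
        self._reported_statements.add(statement_id)
        self.messages.append(f"statement {statement_id}: {message}")
        if len(self.messages) >= self.max_errors:
            raise TooManyErrors(f"giving up after {self.max_errors} errors")
        return True
```

A parser would call `report` from its error handler and let `TooManyErrors` propagate to the driver, which prints the collected messages and exits.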
Error Recovery Approaches: Panic Mode

• Discard tokens until we see a “synchronizing” token.
Example
Skip to the next occurrence of
} end ;
Resume by parsing the next statement
• The key...
– A good set of synchronizing tokens
– Knowing what to do then
• Advantages
– Simple to implement
– Does not go into an infinite loop
– Commonly used
• Disadvantage
– May skip over large sections of source that contain some errors
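
A minimal Python sketch of panic-mode recovery follows. It assumes a toy language in which every statement has the form `NAME = NAME ;` and uses `;` and `}` as the synchronizing tokens; the grammar, the `ParseError` class, and the function names are hypothetical and chosen only for this example.

```python
SYNC_TOKENS = {";", "}"}  # assumed synchronizing set for this toy language


class ParseError(Exception):
    def __init__(self, pos, message):
        super().__init__(message)
        self.pos = pos  # index of the offending token


def parse_statement(tokens, pos):
    """statement -> NAME '=' NAME ';'   Returns ((target, source), next_pos)."""
    for offset, want in enumerate(["NAME", "=", "NAME", ";"]):
        i = pos + offset
        tok = tokens[i] if i < len(tokens) else "<eof>"
        ok = tok.isidentifier() if want == "NAME" else tok == want
        if not ok:
            raise ParseError(i, f"token {i}: expected {want}, found {tok!r}")
    return (tokens[pos], tokens[pos + 2]), pos + 4


def parse_program(tokens):
    """Parse all statements, recovering from errors in panic mode."""
    statements, errors = [], []
    pos = 0
    while pos < len(tokens):
        try:
            stmt, pos = parse_statement(tokens, pos)
            statements.append(stmt)
        except ParseError as err:
            errors.append(str(err))
            pos = err.pos
            # Panic mode: discard tokens until a synchronizing token is seen.
            while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
                pos += 1
            pos += 1  # consume the synchronizing token; pos always advances


    return statements, errors


stmts, errs = parse_program("x = y ; a = = b ; c = d ;".split())
print(stmts)  # [('x', 'y'), ('c', 'd')]
print(errs)   # ["token 6: expected NAME, found '='"]
```

Because the recovery step always moves past at least one token, the loop cannot stall, which matches the "does not go into an infinite loop" advantage. The cost is also visible: everything between the error and the next `;` is skipped without being checked.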
Error Recovery Approaches: Phrase-Level Recovery
• Compiler corrects the program by deleting or inserting tokens
...so it can proceed to parse from where it was.
Example
while (x==4) y := a + b     <- Insert do to fix the statement
• The key...
– Don’t get into an infinite loop
...constantly inserting tokens and never scanning the actual source
• Generally used for error-repairing compilers
– Difficulty: the point of error detection might be much later than the point of error occurrence
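
The "insert do" repair can be sketched as follows in Python. The statement shape `while ( cond ) do body`, the `expect` helper, and the dictionary result are assumptions for illustration; the condition and body are kept as raw token lists rather than parsed further.

```python
def expect(tokens, pos, want, errors):
    """Match `want` at tokens[pos]. If it is missing, report the problem and
    carry on as if the token had been inserted (no input is consumed)."""
    if pos < len(tokens) and tokens[pos] == want:
        return pos + 1
    found = tokens[pos] if pos < len(tokens) else "<eof>"
    errors.append(f"token {pos}: inserted missing {want!r} before {found!r}")
    return pos


def parse_while(tokens):
    """while_stmt -> 'while' '(' cond ')' 'do' body"""
    errors = []
    pos = expect(tokens, 0, "while", errors)
    pos = expect(tokens, pos, "(", errors)
    # The condition runs up to the next ')' (or to the end if none is found).
    close = tokens.index(")", pos) if ")" in tokens[pos:] else len(tokens)
    cond = tokens[pos:close]
    pos = expect(tokens, close, ")", errors)
    pos = expect(tokens, pos, "do", errors)
    body = tokens[pos:]
    return {"cond": cond, "body": body}, errors


tree, errs = parse_while("while ( x == 4 ) y := a + b".split())
print(errs)  # ["token 6: inserted missing 'do' before 'y'"]
```

Each call to `expect` inserts at most one token and the parser makes a fixed number of such calls, so this sketch cannot loop forever. A real phrase-level scheme needs a similar guarantee that input is eventually consumed.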
Error Recovery Approaches: Error Productions

• Augment the CFG with “Error Productions”
• Now the CFG accepts anything!
• If “error productions” are used... Their actions:
{ print (“Error...”) }

• Used with...
– LR (Bottom-up) parsing
– Parser Generators
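
The idea can be imitated by hand in Python, even though error productions are normally written in a parser generator's grammar file. The sketch below assumes a single assignment rule whose correct operator is `:=`, plus one extra "error production" that accepts the common mistake `=` and runs an error action. The rule, the messages, and the function name are hypothetical.

```python
def parse_assignment(tokens, errors):
    """
    assign -> NAME ':=' NAME      (normal production)
            | NAME '='  NAME      (error production: '=' written for ':=')
    """
    if len(tokens) == 3 and tokens[0].isidentifier() and tokens[2].isidentifier():
        if tokens[1] == ":=":
            return ("assign", tokens[0], tokens[2])
        if tokens[1] == "=":
            # Action attached to the error production: report, then continue
            # as if the correct operator had been written.
            errors.append("Error: '=' used where ':=' was expected")
            return ("assign", tokens[0], tokens[2])
    raise SyntaxError(f"cannot parse assignment: {' '.join(tokens)}")


errors = []
print(parse_assignment(["x", "=", "y"], errors))  # ('assign', 'x', 'y')
print(errors)  # ["Error: '=' used where ':=' was expected"]
```

The erroneous input is accepted by the augmented grammar, so parsing never gets stuck on it; the only trace of the mistake is the message produced by the action.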
Error Recovery Approaches: Global Correction

• Theoretical Approach
• Find the minimum change to the source to yield a valid program
– Insert tokens, delete tokens, swap adjacent tokens
• Global Correction Algorithm
Input: grammatically incorrect input string x; grammar G
Output: grammatically correct string y
Algorithm: converts x → y using the minimum number of changes (insertion, deletion etc.)
• Impractical algorithms - too time consuming
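
To make the cost concrete, here is a brute-force Python sketch of global correction. It is not the algorithm from any textbook: it simply does a breadth-first search over token strings that are one insertion, deletion, or adjacent swap apart, and stops at the first grammatical one, which is therefore a minimum-edit correction. The "grammar" is a stand-in (balanced parentheses), and `is_valid`, `neighbours`, and `max_edits` are assumptions for the example.

```python
from collections import deque


def is_valid(tokens):
    """Stand-in for 'generated by grammar G': parentheses must balance."""
    depth = 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def neighbours(tokens, alphabet):
    """Yield every token string exactly one edit away."""
    for i in range(len(tokens) + 1):          # insert a token
        for sym in alphabet:
            yield tokens[:i] + (sym,) + tokens[i:]
    for i in range(len(tokens)):              # delete a token
        yield tokens[:i] + tokens[i + 1:]
    for i in range(len(tokens) - 1):          # swap adjacent tokens
        yield tokens[:i] + (tokens[i + 1], tokens[i]) + tokens[i + 2:]


def global_correction(tokens, alphabet=("(", ")"), max_edits=3):
    """Return (corrected_tokens, edit_count), or (None, None) if the limit is hit."""
    start = tuple(tokens)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        current, edits = queue.popleft()
        if is_valid(current):
            return list(current), edits
        if edits == max_edits:
            continue
        for nxt in neighbours(current, alphabet):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, edits + 1))
    return None, None
```

The number of candidate strings grows exponentially with the number of edits allowed, which is one way to see why this approach stays theoretical.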
Context Free Grammars (CFG)

• A context free grammar is a formal model that consists of:
• Terminals
– Keywords
– Token Classes
– Punctuation
• Non-terminals
– Any symbol appearing on the left-hand side of any rule
• Start Symbol
– Usually the non-terminal
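
The components listed above can be written down as plain data. The Python sketch below uses a made-up two-rule grammar; the rule set, the dictionary layout, and the convention of taking the first rule's left-hand side as the start symbol are assumptions for this example only.

```python
# Each key is a non-terminal; each value lists its alternative right-hand sides.
GRAMMAR = {
    "stmt": [["if", "expr", "then", "stmt"], ["id", ":=", "expr"]],
    "expr": [["expr", "+", "id"], ["id"]],
}


def nonterminals(grammar):
    """Every symbol that appears on the left-hand side of some rule."""
    return set(grammar)


def terminals(grammar):
    """Every right-hand-side symbol that is not a non-terminal."""
    return {
        symbol
        for alternatives in grammar.values()
        for alternative in alternatives
        for symbol in alternative
        if symbol not in grammar
    }


def start_symbol(grammar):
    """Assumed convention: the left-hand side of the first rule."""
    return next(iter(grammar))


print(sorted(nonterminals(GRAMMAR)))  # ['expr', 'stmt']
print(sorted(terminals(GRAMMAR)))     # ['+', ':=', 'id', 'if', 'then']
print(start_symbol(GRAMMAR))          # stmt
```

Here `if` and `then` are keywords, `id` is a token class, and `:=` and `+` stand in for punctuation and operators, matching the three kinds of terminal named in the list.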
Rule Alternative Notations
Notational Conventions
Derivations
Leftmost Derivation
Rightmost Derivation
Parse Tree
Ambiguous Grammar
• More than one parse tree for some sentence.
– The grammar for a programming language may be ambiguous
– Need to modify it for parsing.

• Also: the grammar may be left recursive.
– Need to modify it for parsing.
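
Ambiguity can be checked on small inputs by counting parse trees. The Python sketch below assumes the usual textbook grammar E → E + E | E * E | id, which the slides do not spell out, and counts how many distinct trees derive a given token string.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Number of parse trees for `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Ways in which E can derive tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # Try every operator position as the root of the tree.
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees("id + id".split()))       # 1
print(count_parse_trees("id + id * id".split()))  # 2
```

The sentence `id + id * id` has two trees, one with `+` at the root and one with `*` at the root, so this grammar is ambiguous. Rewriting it with separate levels for `+` and `*` is the kind of modification the slide refers to.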
