Compiler Lab Manual
CDGI'S
CHAMELIDEVI SCHOOL OF ENGINEERING, INDORE.
DEPARTMENT OF COMPUTER SCIENCE
COURSE FILE CONTENT
Year
2013-14
Class/Sem
Sem-VII
CS-A &B
Branch
CSE
Subject
Compiler
Design
Faculty Name
Mr. Ajay Jaiswal
Content
1.
2.
3.
4.
5.
6.
List of Practicals
7.
LINUX O/S
8.
C++ / JAVA program backup
2013-2014)
Name: ________________________________________
Roll No.: _______________________________________
Branch: _______________________________________
Semester:_______________________________________
Section: ________________________________________
Subject: _______________________________________
Certified by:
Total Practical :
Practicals performed:
Faculty Name/Signature
Name of Practical
1.
Implement a program to count the characters of a given string, both with and
without spaces. (A handle of a string is a substring that matches the right
side of a production rule.)
2.
Create a file (Compiler.cc) and implement a program that reads its contents and
reports how many lines, words and characters the file contains.
3.
Write a program for implementation of Deterministic Finite Automata (DFA)
for the strings accepted by (abbb, abb, ab,a).
4.
Construct and minimize a Deterministic Finite Automaton for the given
diagram, which recognizes the strings of (aa + b)*ab(bb)*.
5.
Write a program to compute the FIRST() and FOLLOW() sets for an
LL(1) grammar, given its context-free grammar.
6.
Construct an Operator Precedence Parser for the given grammar
and also compute the Leading() and Trailing() symbols of the grammar.
7.
Program using LEX to count the number of characters, words, spaces and
lines in a given input file.
8.
Program using LEX to count the numbers of comment lines in a given C
program. Also eliminate them and copy the resulting program into separate
file.
Practical List
S.No.
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
Practical
Practical 1
Practical 2
Practical 3
Practical 4
Practical 5
Practical 6
Practical 7
Practical 8
Practical 9
Practical 10
Practical 11
Practical 12
Practical 13
Practical 14
Practical 15
Practical 16
Practical 17
Practical 18
Practical 19
Practical 20
Practical 21
Practical 22
Practical 23
Practical 24
Practical 25
Head of Department
Faculty
LAB MANUAL
Course scope
Aim:
To learn techniques of a modern compiler
Main reference:
Compilers: Principles, Techniques, and Tools, Second Edition, by Alfred V. Aho, Monica S. Lam,
Ravi Sethi and Jeffrey D. Ullman
Supplementary references:
Modern compiler construction in Java, 2nd edition
Implementation by Muchnick.
Subjects
Lexical analysis (Scanning)
Syntax Analysis (Parsing)
Syntax Directed Translation
Intermediate Code Generation
Run-time environments
Code Generation
Machine Independent Optimization
Compiler learning
Terminology
Compiler:
A program that translates an executable program in one language into an executable program
in another language. We expect the program produced by the compiler to be better, in some
way, than the original.
Interpreter:
A program that reads an executable program and produces the results of running that
program. Usually, this involves executing the source program in some fashion. Our course is
mainly about compilers, but many of the same issues arise in interpreters.
Disciplines involved
Algorithms
Operating systems
Computer architectures
Abstract view
Compilers translate from a source language (typically a high level language) to a functionally
equivalent target language (typically the machine code of a particular machine or a machine-independent virtual machine).
Compilers for high level programming languages are among the larger and more complex
pieces of software.
Current focus is on optimization and smart use of resources for modern RISC (reduced
instruction set computer) architectures.
[Figure: Source code -> Compiler -> Machine code; the compiler also reports errors]
1. Lexical Analysis
Review of lexical analysis: alphabet, token, lexical error, Block schematic of lexical
analyser, Automatic construction of lexical analyser (LEX), LEX specification details.
2. Syntax Analysis
Introduction: Role of parsers, Parsing technique: Top down-RD parser, Predictive LL
(k) parser, Bottom up-shift-Reduce, SLR, LR(k), LALR etc. using ambiguous grammars,
Error detection and recovery, Automatic construction of parser (YACC), YACC
specifications.
Semantic analysis
Need of semantic analysis, type checking and type conversion.
3. Syntax directed translation
Syntax directed definitions, construction of syntax trees, bottom-up evaluation of S-attributed
definitions, L-attributed definitions, top-down translation, bottom-up evaluation of
inherited attributes.
Intermediate code Generation: Intermediate code generation for declaration,
assignment, iterative statements, case statements, arrays, structures, conditional
statements, Boolean expressions, procedure calls, Intermediate code Generation using
YACC
4. Run Time Storage Organisation
Storage allocation strategies, static, dynamic storage allocation, allocation strategies for
block structured and non-block structured languages; O.S. support required for IO
statements. (e.g. printf, scanf) and memory allocation deallocation related statement.
(e.g. new, malloc)
Definition
A compiler is a computer program (or set of programs) that transforms source code written in
a programming language (the source language) into another computer language (the target
language, often having a binary form known as object code).
Interpreters
Grouping of phases
Incremental compiler
The term incremental compiler may refer to two different types of compiler.
Imperative programming
Interactive Programming
In imperative programming and software development, an incremental compiler is one that
when invoked, takes only the changes of a known set of source files and updates any
corresponding output files (in the compiler's target language, often bytecode) that may
already exist from previous compilations. By effectively building upon previously compiled
output files, the incremental compiler avoids the wasteful recompilation of entire source files.
Lab Manual of Compiler Design
Cross compiler
A cross compiler is a compiler capable of creating executable code for a platform other than
the one on which the compiler is run.
Cross compiler tools are used to generate executables for embedded system or multiple
platforms.
It is used to compile for a platform upon which it is not feasible to do the compiling, like micro
controllers that don't support an operating system.
Phases of a Compiler
[Figure: phases of a compiler]
Source Program
-> Lexical Analyzer
-> Syntax Analyzer
-> Semantic Analyzer
-> Intermediate Code Generator
-> Code Optimizer
-> Code Generator
-> Target Program
The Symbol-table Manager and the Error Handler interact with all of the phases.
Program #01
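[Figure: the token stream and syntax tree of the statement  val = 10 * val + i]
token number   token    token value
1              ident    "val"
3              assign   -
2              number   10
4              times    -
1              ident    "val"
5              plus     -
1              ident    "i"
Syntax tree nodes: Statement, Expression, Term, over the token sequence
ident = number * ident + ident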
Lexical Analysis
Stream of characters is grouped into tokens
Examples of tokens are identifiers, reserved words, integers, doubles or floats, delimiters,
operators and special symbols
int a; a = a + 2;
Lexeme   Token
int      reserved word
a        identifier
;        special symbol
a        identifier
=        operator
a        identifier
+        operator
2        integer constant
;        special symbol
Examples of Token
Token: A sequence of characters to be treated as a
single unit.
Examples of tokens.
Reserved words (e.g. begin, end, struct, if etc.)
Keywords (integer, true etc.)
Operators (+, &&, ++ etc)
Identifiers (variable names, procedure names, parameter names)
Literal constants (numeric, string, character constants etc.)
Punctuation marks (:, , etc.)
Input:
Type any strings with combinations of letters
Output:
Total No. of letters/ characters of given string.
Input:
Type any strings with combinations of letters / characters of sentence..
Output:
Total No. of letters/ characters of given string without spaces (Excluding white spaces.).
Program #02
/* Create a file (Compiler.cc) and implement a program that reads its
contents and reports how many lines, words and characters the file
contains. */
Source Code:
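#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<ctype.h>
int main(void)
{
int noc=0,now=0,nol=0; /* characters, words, lines */
int ch,inword=0; /* ch is an int so that it can hold EOF */
FILE *fr;
char fname[64];
printf("\n enter the source file name (e.g. Compiler.cc): ");
if(fgets(fname,sizeof fname,stdin)==NULL)
return 1;
fname[strcspn(fname,"\n")]='\0'; /* strip the newline kept by fgets */
fr=fopen(fname,"r");
if(fr==NULL)
{
printf("\n error: cannot open %s\n",fname);
exit(1);
}
while((ch=fgetc(fr))!=EOF)
{
noc++;
if(ch=='\n')
nol++;
if(isspace(ch))
inword=0;
else if(!inword)
{
inword=1; /* first character of a new word */
now++;
}
}
fclose(fr);
printf("\n total no of character=%d",noc);
printf("\n total no of words=%d",now);
printf("\n total no of lines=%d",nol);
return 0;
}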
char ch;
fstream fileName;
characters ++;
fileName.clear();
fileName.close();
Output:
Sequences of transition states with accepting states.
Program #03
/*Write a program for implementation of Deterministic Finite Automata
(DFA) for the strings accepted by (abbb, abb, ab,a).*/
Deterministic finite automata (DFA) :
A deterministic finite automaton (DFA) is a 5-tuple: (S, Σ, T, s, A)
an alphabet (Σ)
a set of states (S)
a transition function (T : S × Σ → S)
a start state (s ∈ S)
a set of accept states (A ⊆ S)
The machine starts in the start state and reads in a string of symbols from its alphabet. It
uses the transition function T to determine the next state using the current state and the
symbol just read. If, when it has finished reading, it is in an accepting state, it is said to accept
the string, otherwise it is said to reject the string. The set of strings it accepts form a
language, which is the language the DFA recognizes.
Non-Deterministic Finite Automaton (NFA):
A Non-Deterministic Finite Automaton (NFA) is a 5-tuple: (S, Σ, T, s, A)
an alphabet (Σ)
a set of states (S)
a transition function (T : S × (Σ ∪ {ε}) → P(S))
a start state (s ∈ S)
a set of accept states (A ⊆ S)
Where P(S) is the power set of S and ε is the empty string. The machine starts in the start
state and reads in a string of symbols from its alphabet. It uses the transition relation T to
determine the next state(s) using the current state and the symbol just read or the empty
string. If, when it has finished reading, it is in an accepting state, it is said to accept the string,
otherwise it is said to reject the string. The set of strings it accepts form a language, which is
the language the NFA recognizes.
Source Code:
//standard C++ headers replace the Turbo C++ <iostream.h> and <conio.h>
#include<cstdio>
#include<cstdlib>
int main()
{int n,m,start,nf,ps,i; //i is shared by the loops that do not declare their own
char str[20];
printf("enter no of states");
scanf("%d",&n);
printf("enter no of inputs");
scanf("%d",&m);
//constructing buffers
int **tran=new int* [n];
for(int i=0;i<n;i++)
{
tran[i]=new int[m];
}
for(i=0;i<n;i++)
{
for(int j=0;j<m;j++)
{
printf("enter next state for present state %d on input%d ",i,j);
scanf("%d",&tran[i][j]);
}
}
printf("enter starting state");
scanf("%d",&start);
printf("enter no of final states");
scanf("%d",&nf);
int* final=new int[nf];
for(i=0;i<nf;i++)
{
printf("enter the state");
scanf("%d",&final[i]);
}
printf("enter string");
scanf("%s",str);
i=0;
ps=start;
Program #04
/*Construction of minimization of Deterministic Finite Automata
for the given diagram & recognize the string (aa + b)*ab(bb)*. */
Minimizing Finite Automata
Consider the finite automaton shown in figure 1 which accepts the regular set denoted by the
regular expression (aa + b)*ab(bb)*. Accepting states are colored yellow while rejecting states
are blue.
[Table: for each state of group B (s0, s1, s3, s4, s5, s6), the group that input a leads to and the group that input b leads to]
Looking at the table we find that the input b helps us distinguish between two of the states (s1
and s6) and the rest of the states in the group since it leads to group A for these two instead
of group B. Thus the states in the set {s0, s3, s4, s5} cannot be equivalent to those in the set
{s1, s6} and we must partition B into two groups. Now we have the groups:
A = {s2, s7}, B = { s0, s3, s4, s5}, C = { s1, s6}
and the next examination of where the inputs lead shows us that s3 is not equivalent to the
rest of group B. We must partition again.
The complexity of this algorithm is O(n^2) since we check all of the states each time we
execute the repeat loop and might have to execute the loop n times since it might take an
input of length n to distinguish between two states. A faster algorithm was later developed by
Hopcroft.
Source Code:
//standard C++ headers replace the Turbo C++ <iostream.h> and <conio.h>
#include<cstdio>
#include<cstring>
#include<cstdlib>
int main()
{
int nstates,minputs,start,nf,ps,i; //i is shared by the loops that do not declare their own
char str[20];
printf("enter no of states");
scanf("%d",&nstates);
printf("enter no of inputs");
scanf("%d",&minputs);
//constructing buffers
int **tran=new int* [nstates];
for(int i=0;i<nstates;i++)
{
tran[i]=new int[minputs];
}
for(i=0;i<nstates;i++)
{
for(int j=0;j<minputs;j++)
{
printf("enter next state for present state %d on input%d ",i,j);
scanf("%d",&tran[i][j]);
}
}
printf("enter starting state");
scanf("%d",&start);
printf("enter no of final states");
scanf("%d",&nf);
int* final=new int[nf];
for(i=0;i<nf;i++)
{
printf("enter the state");
scanf("%d",&final[i]);
}
int *stategroup=new int[nstates],**groupgroup=new int*[nstates];
memset(stategroup,-1,nstates*sizeof(int));
int **groupstate=new int*[nstates];
for(i=0;i<nstates;i++)
{
groupstate[i]=new int[nstates+1];
groupgroup[i]=new int[2];
printf("\n\nGroups\n\n");
for(i=0;i<groupcount;i++)
{
printf("%d ",i);
for(int j=0;j<nstates;j++)
INPUT:
Recognizer for (aa + b)*ab(bb)*
OUTPUT:
A Minimal Automaton for (aa + b)*ab(bb)*
Program #05
/*Construct a program for how to calculate FIRST () & FOLLOW () symbol
for LL(1 ) grammar, if the Context free grammar for LL(1) Construction is
S/aBDh
B/cC
C/bC/@
D/E/F
E/g/@
F/f/@
Compute FIRST () & FOLLOW ( ) symbol for LL(1 ) grammar ? */
The construction of a predictive parser is aided by two functions associated with a grammar
G. These functions, FIRST and FOLLOW, allow us to fill in the entries of a predictive parsing
table for G, whenever possible. Sets of tokens yielded by the FOLLOW function can also be
used as synchronizing tokens during panic-mode error recovery. As an intuition, suppose
that you are an LL(1) parser with the power to see one symbol ahead in the input
string.
FIRST(α)
If α is any string of grammar symbols, let FIRST(α) be the set of terminals that begin the
strings derived from α. If α ⇒* ε then ε is also in FIRST(α).
To compute FIRST(X) for all grammar symbols X, apply the following rules until no more
terminals or ε can be added to any FIRST set:
1. If X is terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is nonterminal and X → Y1 Y2 ... Yk is a production, then place a in FIRST(X) if for
some i, a is in FIRST(Yi), and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); that is, Y1 ... Yi-1 ⇒* ε.
If ε is in FIRST(Yj) for all j = 1, 2, ..., k, then add ε to FIRST(X). For example, everything in
FIRST(Y1) is surely in FIRST(X). If Y1 does not derive ε, then we add nothing more to
FIRST(X), but if Y1 ⇒* ε, then we add FIRST(Y2) and so on.
Now, we can compute FIRST for any string X1X2 ... Xn as follows. Add to FIRST(X1X2 ... Xn)
all the non-ε symbols of FIRST(X1). Also add the non-ε symbols of FIRST(X2) if ε is in
FIRST(X1), the non-ε symbols of FIRST(X3) if ε is in both FIRST(X1) and FIRST(X2), and so
on. Finally, add ε to FIRST(X1X2 ... Xn) if, for all i, FIRST(Xi) contains ε.
FOLLOW(A)
Define FOLLOW(A), for nonterminal A, to be the set of terminals a that can appear
immediately to the right of A in some sentential form, that is, the set of terminals a such that
there exists a derivation of the form S ⇒* αAaβ for some α and β. Note that there may, at some
time during the derivation, have been symbols between A and a, but if so, they derived ε and
disappeared. If A can be the rightmost symbol in some sentential form, then $, representing
the input right endmarker, is in FOLLOW(A).
To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be
added to any FOLLOW set:
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β), except for ε, is placed in
FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e.,
β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).
EXAMPLE:
Consider the expression grammar :
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
Then:
FIRST(E) = FIRST(T) = FIRST(F) = {( , id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
FOLLOW(E) = FOLLOW(E') = {) , $}
Algorithm:
FIRST :
1. If the first symbol on the right side of a production is a terminal, that terminal is in first.
eg. first(abAb)={a};
2. If a production is of the type A->BCD... (all symbols are variables, i.e. non-terminals) then
if first(B) does not contain null then first(A)=first(B)
stop here.
else
first(A)=(first(B) - {null}) + first(C), and the same check is repeated for C, then D, and so on.
Stop at the first symbol whose first set does not contain null; if every symbol can
derive null, then null is also in first(A).
FOLLOW:
Once the first sets are known, follow can be computed from them.
1. If a variable is the start symbol then $ is in its follow.
2. If a production is of the form A->(any string1)B(any string2) then follow(B) contains first(any string2) - {null}.
3. If a production is of the form A->(any string)B then follow(B) contains follow(A).
stop
Source Code:
#include"stdio.h"
#include<conio.h>
char array[10][20],temp[10];
int c,n;void fun(int,int[]);
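int fun2(int i,int j,int p[],int key)
{
int k;
if(!key)
{ //nonterminal: find its production and expand it
for(k=0;k<n;k++) if(array[i][j]==array[k][0]) break;
p[0]=i;p[1]=j+1;
fun(k,p);
return 0;
}
else
{ //terminal: return 1 if it is not yet in temp[0..c]
for(k=0;k<=c;k++) if(array[i][j]==temp[k]) break;
if(k>c)return 1;
else return 0;
}
}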
void fun(int i,int p[])
{
int j,k,key;
for(j=2;array[i][j]!='\0';j++)
{
if(array[i][j-1]=='/')
{
if(array[i][j]>='A'&&array[i][j]<='Z')
{
key=0;
fun2(i,j,p,key);
}
else
{
key=1;
if(fun2(i,j,p,key))
temp[++c]=array[i][j];
if(array[i][j]=='@'&&p[0]!=-1)
{ //taking ,@, as null symbol.
if(array[p[0]][p[1]]>='A'&&array[p[0]][p[1]]<='Z')
{
key=0;
fun2(p[0],p[1],p,key);
}
else
if(array[p[0]][p[1]]!='/'&&array[p[0]][p[1]]!='\0')
{
if(fun2(p[0],p[1],p,key))
temp[++c]=array[p[0]][p[1]];
}
}
}
}
}
}
INPUT:
S/aBDh
B/cC
C/bC/@
D/E/F
E/g/@
F/f/@
OUTPUT:
Enter the no. of productions :6
Enter the productions :
S/aBDh
B/cC
C/bC/@
D/E/F
E/g/@
F/f/@
First(S) : [ a ].
First(B) : [ c ].
First(C) : [ b,@ ].
First(D) : [ g,@,f ].
First(E) : [ g,@ ].
First(F) : [ f,@ ].
Program #06
/*Construct a Operator Precedence Parser for the following given
grammar and also compute Leading () and trailing () symbols of
the given grammar. */
Operator-Precedence Parser
Operator grammar
Operator grammars are a small but important class of grammars; for an operator grammar we
can build an efficient operator precedence parser (a shift-reduce parser).
In an operator grammar, no production rule can have:
- ε at the right side
- two adjacent non-terminals at the right side.
Precedence Relations
stack     input        action
$         id+id*id$    $ <. id    shift
$id       +id*id$      id .> +    reduce E → id
$         +id*id$      shift
$+        id*id$       shift
$+id      *id$         id .> *    reduce E → id
$+        *id$         shift
$+*       id$          shift
$+*id     $            id .> $    reduce E → id
$+*       $            * .> $     reduce E → E*E
$+        $            + .> $     reduce E → E+E
$         $            accept
Program #07
Program using LEX to count the number of characters, words, spaces and
Lines in a given input file.
Lexical Analyzer
The main task of the lexical analyzer is to read the input source program, scanning the
characters, and produce a sequence of tokens that the parser can use for syntactic analysis.
The interface may be to be called by the parser to produce one token at a time:
Maintain internal state of reading the input program (with lines)
Have a function getNextToken that will read some characters at the current state of the
input and return a token to the parser
Other tasks of the lexical analyzer include:
Skipping or hiding whitespace and comments
Keeping track of line numbers for error reporting (sometimes it can also produce the
annotated lines for error reports)
Producing the value of the token
Optional: inserting identifiers into the symbol table
Source Code:
%{
#include<stdio.h>
int ch=0, bl=0, ln=0, wr=0;
%}
%%
[\n] {ln++;}
[ \t] {bl++;}
[^ \t\n]+ {wr++;ch+=yyleng;}
%%
int main()
{
FILE *fp;
char file[64];
printf("Enter the filename: ");
scanf("%63s", file);
fp=fopen(file,"r");
if(fp==NULL)
{
printf("Cannot open %s\n", file);
return 1;
}
yyin=fp;
yylex();
fclose(fp);
printf("Character=%d\nBlank=%d\nLines=%d\nWords=%d", ch, bl, ln, wr);
return 0;
}
INPUT:
An input text file whose characters, words, spaces and lines are to be counted.
OUTPUT:
$cat > input
Girish rao salanke
$lex p1a.l
$cc lex.yy.c -ll
$./a.out
Enter the filename: input
Character=16
Blank=2
Lines=1
Words=3
Program #08
Program using LEX to count the numbers of comment lines in a given C/
C++/JAVA program. Also eliminate them and copy the resulting program
into separate file.
Compiler-construction tools
Originally, compilers were written from scratch, but now the situation is quite different. A
number of tools are available to ease the burden.
We will study tools that generate scanners and parsers. This will involve us in some theory,
regular expressions for scanners and various grammars for parsers. These techniques are
fairly successful. One drawback can be that they do not execute as fast as hand-crafted
scanners and parsers.
We will also see tools for syntax-directed translation and automatic code generation. The
automation in these cases is not as complete.
Finally, there is the large area of optimization. This is not automated; however, a basic
component of optimization is data-flow analysis (how values are transmitted between parts
of a program) and there are tools to help with this task.
but not
x 3 = y + 3;
Source Code:
%{
#include<stdio.h>
int com=0;
%}
%%
"/*"([^*]|\*+[^*/])*\*+"/" {com++;fprintf(yyout, " ");}
"//".* {com++;}
%%
int main()
{
printf("Write a C program\n");
yyout=fopen("output", "w");
if(yyout==NULL)
return 1;
yylex();
fclose(yyout);
printf("Comment=%d\n",com);
return 0;
}
Program #09
Program using LEX to recognize a valid arithmetic expression and to
recognize the identifiers and operators present. Print them separately.
Some Regular Expressions for Flex
\"[^"]*\"                  string
"\t"|"\n"|" "
[a-zA-Z]
[a-zA-Z_][a-zA-Z0-9_]*     identifier: allows a, aX, a45__
[0-9]*"."[0-9]+
[0-9]+"."[0-9]*
[0-9]*"."[0-9]*            allows . by itself !!
The user must supply a lexical analyzer to read the input stream and communicate tokens
(with values, if desired) to the parser. The lexical analyzer is an integer-valued function called
yylex. The function returns an integer, the token number, representing the kind of token read.
If there is a value associated with that token, it should be assigned to the external variable
yylval.
The parser and the lexical analyzer must agree on these token numbers in order for
communication between them to take place. The numbers may be chosen by Yacc, or chosen
by the user. In either case, the ``# define'' mechanism of C is used to allow the lexical
analyzer to return these numbers symbolically. For example, suppose that the token name
DIGIT has been defined in the declarations section of the Yacc specification file. The relevant
portion of the lexical analyzer might look like:
yylex(){
extern int yylval;
int c;
...
c = getchar();
...
switch( c ) {
Source Code:
%{
#include<stdio.h>
int a=0,s=0,m=0,d=0,ob=0,cb=0;
int flaga=0, flags=0, flagm=0, flagd=0;
%}
id [a-zA-Z]+
%%
{id} {printf("\n %s is an identifier\n",yytext);}
[+] {a++;flaga=1;}
[-] {s++;flags=1;}
[*] {m++;flagm=1;}
[/] {d++;flagd=1;}
[(] {ob++;}
[)] {cb++;}
%%
int main()
{
printf("Enter the expression\n");
yylex();
if(ob-cb==0)
{
printf("Valid expression\n");
}
else
{
printf("Invalid expression");
}
printf("\nAdd=%d\nSub=%d\nMul=%d\nDiv=%d\n",a,s,m,d);
printf("Operators are: \n");
OUTPUT:
$lex p2a.l
$cc lex.yy.c -ll
$./a.out
Enter the expression
(a+b*c)
a is an identifier
b is an identifier
c is an identifier
[Ctrl-d]
Valid expression
Add=1
Sub=0
Mul=1
Div=0
Operators are:
+
*
Program #13
Program using LEX to recognize whether a given sentence is simple or
compound.
%{
int flag=0;
%}
%%
(" "[aA][nN][dD]" ")|(" "[oO][rR]" ")|(" "[bB][uU][tT]" ") {flag=1;}
%%
int main()
{
printf("Enter the sentence\n");
yylex();
if(flag==1)
printf("\nCompound sentence\n");
else
printf("\nSimple sentence\n");
return 0;
}
OUTPUT:
$lex p2b.l
$cc lex.yy.c -ll
$./a.out
Enter the sentence
I am Pooja
I am Pooja
[Ctrl-d]
Simple sentence
$./a.out
Enter the sentence
CSE or ISE
CSE or ISE
[Ctrl-d]
Compound sentence
Program #14
Program using LEX to recognize and count the number of identifiers in a
given input file.
Lex helps write programs whose control flow is directed by instances of regular expressions in
the input stream. It is well suited for editor-script type transformations and for segmenting
input in preparation for a parsing routine.
Lex source is a table of regular expressions and corresponding program fragments. The table
is translated to a program which reads an input stream, copying it to an output stream and
partitioning the input into strings which match the given expressions. As each such string is
recognized the corresponding program fragment is executed. The recognition of the
expressions is performed by a deterministic finite automaton generated by Lex. The program
fragments written by the user are executed in the order in which the corresponding regular
expressions occur in the input stream.
Source Code:
%{
#include<stdio.h>
int count=0;
%}
op [-+*/]
letter [a-zA-Z]
digitt [0-9]
id {letter}+|({letter}{digitt})+
notid ({digitt}{letter})+
%%
[\t\n]+ ;
("int")|("float")|("char")|("case")|("default")|("if")|("for")|("printf")|("scanf") {printf("%s is a keyword\n", yytext);}
{id} {printf("%s is an identifier\n", yytext); count++;}
{notid} {printf("%s is not an identifier\n", yytext);}
%%
int main()
{
FILE *fp;
char file[10];
printf("\nEnter the filename: ");
Program #15
YACC
Basic Specifications
Names refer to either tokens or nonterminal symbols. Yacc requires token names to be
declared as such. In addition, for reasons discussed in Section 3, it is often desirable to
include the lexical analyzer as part of the specification file; it may be useful to include other
programs as well. Thus, every specification file consists of three sections: the declarations,
(grammar) rules, and programs. The sections are separated by double percent ``%%'' marks.
(The percent ``%'' is generally used in Yacc specifications as an escape character.)
In other words, a full specification file looks like
declarations
%%
rules
%%
programs
The declaration section may be empty. Moreover, if the programs section is omitted, the
second %% mark may be omitted also;
thus, the smallest legal Yacc specification is
%%
rules
Blanks, tabs, and newlines are ignored except that they may not appear in names or multi-character reserved symbols. Comments may appear wherever a name is legal; they are
enclosed in /* . . . */, as in C and PL/I.
The rules section is made up of one or more grammar rules. A grammar rule has the form:
A : BODY ;
A represents a nonterminal name, and BODY represents a sequence of zero or more names
and literals. The colon and the semicolon are Yacc punctuation.
Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``_'',
and non-initial digits. Upper and lower case letters are distinct. The names used in the body of
a grammar rule may represent tokens or nonterminal symbols.
A : B C D ;
A : E F ;
A : G ;
can be given to Yacc as
A : B C D
  | E F
  | G
  ;
It is not necessary that all grammar rules with the same left side appear together in the
grammar rules section, although it makes the input much more readable, and easier to
change.
If a nonterminal symbol matches the empty string, this can be indicated in the obvious way:
empty : ;
Names representing tokens must be declared; this is most simply done by writing
%token name1 name2 . . .
in the declarations section. (See Sections 3 , 5, and 6 for much more discussion). Every name
not defined in the declarations section is assumed to represent a nonterminal symbol. Every
nonterminal symbol must appear on the left side of at least one rule.
Of all the nonterminal symbols, one, called the start symbol, has particular importance. The
parser is designed to recognize the start symbol; thus, this symbol represents the largest,
most general structure described by the grammar rules. By default, the start symbol is taken
to be the left hand side of the first grammar rule in the rules section. It is possible, and in fact
A : '(' B ')'
{ hello( 1, "abc" ); }
and
XXX : YYY ZZZ
{ printf("a message\n");
flag = 25; }
OUTPUT:
$lex p4a.l
$yacc -d p4a.y
$cc lex.yy.c y.tab.c -ll
$./a.out
Enter the expression
(a*b+5)
Expression is valid
$./a.out
Enter the expression
(a+6-)
Expression is invalid
Program #15
YACC (Yet Another Compiler Compiler ) program to recognize a valid
variable, which starts with a letter, followed by any number of letters or
digits.
Yacc turns the specification file into a C program, which parses the input according to
the specification given. The algorithm used to go from the specification to the parser is
complex, and will not be discussed here (see the references for more information). The
parser itself, however, is relatively simple, and understanding how it works, while not
strictly necessary, will nevertheless make treatment of error recovery and ambiguities
much more comprehensible.
Source Code:
LEX
%{
#include<stdlib.h>
#include"y.tab.h"
extern int yylval;
%}
%%
[0-9]+ {yylval=atoi(yytext); return DIGIT;}
[a-zA-Z]+ {return LETTER;}
[\t] ;
\n return 0;
. {return yytext[0];}
%%
YACC
%{
#include<stdio.h>
%}
%token LETTER DIGIT
%%
variable: LETTER|LETTER rest
;
rest: LETTER rest
|DIGIT rest
|LETTER
|DIGIT
;
%%
OUTPUT:
$lex p4b.l
$yacc -d p4b.y
$cc lex.yy.c y.tab.c -ll
$./a.out
input34
The string is a valid variable
$./a.out
89file
This is not a valid variable
Program #16
Implement a program of YACC (Yet Another Compiler Compiler) to recognize
strings aaab, abbb, ab and a using the grammar (a^n b^n, n >= 0).
Yacc: Yet Another Compiler-Compiler
Yacc provides a general tool for imposing structure on the input to a computer program. The
Yacc user prepares a specification of the input process; this includes rules describing the
input structure, code to be invoked when these rules are recognized, and a low-level routine
to do the basic input. Yacc then generates a function to control the input process. This
function, called a parser, calls the user-supplied low-level input routine (the lexical analyzer)
to pick up the basic items (called tokens) from the input stream. These tokens are organized
according to the input structure rules, called grammar rules; when one of these rules has
been recognized, then user code supplied for this rule, an action, is invoked; actions have the
ability to return values and make use of the values of other actions.
Yacc is written in a portable dialect of C[1] and the actions, and output subroutine, are in C as
well. Moreover, many of the syntactic conventions of Yacc follow C.
The heart of the input specification is a collection of grammar rules. Each rule describes an
allowable structure and gives it a name. For example, one grammar rule might be
date : month_name day ',' year ;
Here, date, month_name, day, and year represent structures of interest in the input process;
presumably, month_name, day, and year are defined elsewhere. The comma ``,'' is enclosed
in single quotes; this implies that the comma is to appear literally in the input. The colon and
semicolon merely serve as punctuation in the rule, and have no significance in controlling the
input. Thus, with proper definitions, the input
July 4, 1776
might be matched by the above rule.
Source Code:
LEX
%{
#include"y.tab.h"
%}
%%
[a] return A;
[b] return B;
%%
YACC
%{
#include <stdio.h>
int yylex(void);
%}
%token A B
%%
S : A S B
  | /* empty: covers n = 0 */
  ;
Program #17
Program to recognize the context-free grammar (a^n b, n >= 10), where a and b
are the input symbols of the grammar.
A context-free grammar (CFG) is a set of recursive rewriting rules (or productions)
used to generate patterns of strings. A CFG consists of the following components:
a set of terminal symbols, which are the characters of the alphabet that appear in the strings
generated by the grammar;
a set of nonterminal symbols, which are placeholders for patterns of terminal symbols that can
be generated by the nonterminal symbols;
a set of productions, which are rules for replacing (or rewriting) nonterminal symbols (on the
left side of the production) in a string with other nonterminal or terminal symbols (on the right
side of the production);
a start symbol, which is a special nonterminal symbol that appears in the initial string generated
by the grammar.
The set of terminal strings we can generate with at most two productions is therefore {s, wcds}.
3. Applying at most three productions, we can generate:
{wcdwcdwcd<S>, wcdwcdb<L>e, wcdwcds, wcdb<L>;<S>e,
wcdb<S>e, bwcd<S>e, bb<L>ee, bse, b<L>;<S>Se,
b<S><S>e, b<L>wcd<S>e, b<L>b<L>ee, b<L>se }
Source Code:
LEX
%{
#include"y.tab.h"
%}
%%
[a] return A;
[b] return B;
%%
YACC
%{
#include <stdio.h>
int yylex(void);
int yyerror(char *s);
%}
%token A B
%%
stat : exp B
     ;
exp  : A A A A A A A A A A   /* the first ten a's */
     | exp A                 /* any further a's */
     ;
%%
int main(void)
{
printf("Enter the string\n");
if(yyparse()==0)
{
printf("Valid\n");
}
return 0;
}
int yyerror(char *s)
{
printf("error\n");
return 0;
}
OUTPUT:
$lex p6.l
$yacc -d p6.y
$cc lex.yy.c y.tab.c -ll
$./a.out
Enter the string
aaaaaaaaaaab
Valid
$./a.out
Enter the string
aab
error
Program #18
Write a C program to implement the syntax-directed definition of if E then
S1 and if E then S1 else S2.
/* Input to the program is assumed to be syntactically correct. The expression of the if
statement, the statement for the true condition, and the statement for the false condition
are each enclosed in parentheses. */
Some programming languages permit the user to use words like "if", which are normally
reserved, as label or variable names, provided that such use does not conflict with the legal
use of these names in the programming language. This is extremely hard to do in the
framework of Yacc; it is difficult to pass information to the lexical analyzer telling it "this
instance of 'if' is a keyword, and that instance is a variable". The user can make a stab at it,
using the mechanism described in the last subsection, but it is difficult.
A number of ways of making this easier are under advisement. Until then, it is better that the
keywords be reserved; that is, be forbidden for use as variable names. There are powerful
stylistic reasons for preferring this, anyway.
10: Advanced Topics
This section discusses a number of advanced features of Yacc.
Simulating Error and Accept in Actions
The parsing actions of error and accept can be simulated in an action by use of macros
YYACCEPT and YYERROR. YYACCEPT causes yyparse to return the value 0; YYERROR
causes the parser to behave as if the current input symbol had been a syntax error; yyerror is
called, and error recovery takes place. These mechanisms can be used to simulate parsers
with multiple endmarkers or context-sensitive syntax checking.
Accessing Values in Enclosing Rules.
An action may refer to values returned by actions to the left of the current rule. The
mechanism is simply the same as with ordinary actions, a dollar sign followed by a digit, but in
this case the digit may be 0 or negative. Consider
sent : adj noun verb adj noun
       { look at the sentence ... }
     ;
adj  : THE   { $$ = THE; }
     | YOUNG { $$ = YOUNG; }
     ...
     ;
noun : DOG
       { $$ = DOG; }
     | CRONE
       { if( $0 == YOUNG ){
           printf( "what?\n" );
         }
         $$ = CRONE;
       }
     ;
     ...
In the action following the word CRONE, a check is made that the preceding token shifted
was not YOUNG. Obviously, this is only possible when a great deal is known about what
might precede the symbol noun in the input. There is also a distinctly unstructured flavor
about this. Nevertheless, at times this mechanism will save a great deal of trouble, especially
when a few combinations are to be excluded from an otherwise regular structure.
Source Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int parsecondition(char[],int,char*,int);
void gen(char [],char [],char[],int);
int main()
{
int counter = 0,stlen =0,elseflag=0;
char stmt[60]; // contains the input statement
char strB[54]; // holds the expression for the 'if' condition
char strS1[50]; // holds the statement for the true condition
char strS2[45]; // holds the statement for the false condition
printf("Format of if statement \n Example...\n");
printf("if (a<b) then (s=a);\n");
printf("if (a<b) then (s=a) else (s=b);\n\n");
printf("Enter the statement \n");
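if(fgets(stmt,sizeof stmt,stdin)==NULL) // gets() is unsafe and was removed in C11
return 1;
stmt[strcspn(stmt,"\n")]='\0'; // drop the trailing newline kept by fgets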
stlen = strlen(stmt);
counter = counter + 2; // increment over 'if'
counter = parsecondition(stmt,counter,strB,stlen);
if(stmt[counter]==')')