Compiler Lab Manual
CDGI'S
Content

1. Scope of the course
2. Disciplines involved in it
3. Abstract view for a compiler
4. Front-end and back-end tasks
5. Modules
6. List of Practicals
7. LINUX O/S
8. C++ / JAVA program backup

(Year 2013-2014)
Name: ________________________________________
Branch: _______________________________________
Semester:_______________________________________
Section: ________________________________________
Subject: _______________________________________
Certified by:
Total Practical :
Practicals performed:
Faculty Name/Signature
PRACTICAL LIST

2. Create a file (Compiler.cc) and implement a program to read all the content of
Compiler.cc (how many lines, how many words and how many characters are in
the file).
3. Write a program for implementation of a Deterministic Finite Automaton (DFA)
for the strings accepted by ("abbb", "abb", "ab", "a").
4. Construction of a minimized Deterministic Finite Automaton for the given
diagram, and recognize the string (aa + b)*ab(bb)*.
5. Construct a program to compute the FIRST() and FOLLOW() symbols for an
LL(1) grammar, given a context-free grammar for LL(1) construction.
6. Construct an Operator Precedence Parser for the following given grammar,
and also compute the LEADING() and TRAILING() symbols of the given grammar.
7. Program using LEX to count the number of characters, words, spaces and
lines in a given input file.
8. Program using LEX to count the number of comment lines in a given C
program. Also eliminate them and copy the resulting program into a separate
file.
9. Program using LEX to recognize a valid arithmetic expression and to
recognize the identifiers and operators present. Print them separately.
10. Program using LEX to recognize whether a given sentence is simple or
compound.
11. Program using LEX to recognize and count the number of identifiers in a
given input file.
12. Implement a YACC (Yet Another Compiler Compiler) program to recognize a
valid arithmetic expression that uses the operators +, -, * and /.
13. Implement a YACC (Yet Another Compiler Compiler) program to recognize a
valid variable, which starts with a letter, followed by any number of letters or
digits.
14. YACC (Yet Another Compiler Compiler) program to recognize the strings
'aaab', 'abbb', 'ab' and 'a' using the grammar (a^n b^n, n >= 0).
15. Program to recognize the context-free grammar (a^n b^n, n >= 10), where a and b
are input symbols of the grammar.
16. Write a C program to implement the syntax-directed definition of "if E then
S1" and "if E then S1 else S2".
Practical List
S.No. Practical Date of Experiment Date of Submission Signature & Remarks
1. Practical 1
2. Practical 2
3. Practical 3
4. Practical 4
5. Practical 5
6. Practical 6
7. Practical 7
8. Practical 8
9. Practical 9
10. Practical 10
11. Practical 11
12. Practical 12
13. Practical 13
14. Practical 14
15. Practical 15
16. Practical 16
17. Practical 17
18. Practical 18
19. Practical 19
20. Practical 20
21. Practical 21
22. Practical 22
23. Practical 23
24. Practical 24
25. Practical 25
LAB MANUAL
SUBJECT NAME--------------
SUBJECT CODE------------------
CLASS------------------------------
SEMESTER-------------------------
Course scope
Aim:
To learn techniques of a modern compiler
Main reference:
Compilers – Principles, Techniques and Tools, Second Edition, by Alfred V. Aho, Ravi Sethi,
Jeffrey D. Ullman
Supplementary references:
Modern Compiler Implementation in Java, 2nd edition; Advanced Compiler Design and
Implementation by Muchnick.
Subjects
Lexical analysis (Scanning)
Syntax Analysis (Parsing)
Syntax Directed Translation
Intermediate Code Generation
Run-time environments
Code Generation
Machine Independent Optimization
Compiler learning
Isn't it an old discipline?
Yes, it is a well-established discipline.
Algorithms, methods and techniques were researched and developed in the early stages of
computer science's growth.
There are many compilers around and many tools to generate them automatically.
So, why do we need to learn it?
Although you may never write a full compiler, the techniques we learn are useful in many
tasks, like writing an interpreter for a scripting language, validation checking for forms, and
so on.
Terminology
Compiler:
A program that translates an executable program in one language into an executable program
in another language. We expect the program produced by the compiler to be better, in some
way, than the original.
Interpreter:
A program that reads an executable program and produces the results of running that
program. Usually, this involves executing the source program in some fashion. Our course is
mainly about compilers, but many of the same issues arise in interpreters.
Disciplines involved
Algorithms
Languages and machines
Operating systems
Computer architectures
Why Study Compilers?
General background information for a good software engineer.
Increases understanding of language semantics.
Seeing the machine code generated for language constructs helps understand performance
issues for languages.
Teaches good language design.
New devices may need device-specific languages.
New business fields may need domain-specific languages.
Abstract view
Compilers translate from a source language (typically a high level language) to a functionally
equivalent target language (typically the machine code of a particular machine or a machine-
independent virtual machine).
Compilers for high level programming languages are among the larger and more complex
pieces of software.
• Original languages included Fortran and Cobol
• Often multi-pass compilers (to facilitate memory reuse)
• Compiler development helped in better programming language design
• Early development focused on syntactic analysis and optimization
• Commercially, compilers are developed by very large software groups
• Current focus is on optimization and smart use of resources for modern RISC (reduced
instruction set computer) architectures.
Source code → Compiler → Target code
Introduction to Compiler:
Translator issues, why to write compiler, compilation process in brief, front end and backend
model, compiler construction tools, Interpreter and the related issues, Cross compiler,
Incremental compiler, Bootstrapping.
1. Lexical Analysis
Review of lexical analysis: alphabet, token, lexical error, Block schematic of lexical
analyser, Automatic construction of lexical analyser (LEX), LEX specification details.
2. Syntax Analysis
Introduction: Role of parsers, Parsing technique: Top down-RD parser, Predictive LL
(k) parser, Bottom up-shift-Reduce, SLR, LR(k), LALR etc. using ambiguous grammars,
Error detection and recovery, Automatic construction of parser (YACC), YACC
specifications.
Semantic Analysis
Need of semantic analysis, type checking and type conversion.
3. Syntax directed translation
Syntax directed definitions, construction of syntax trees, bottom-up evaluation of S-
attribute definition, L-attributed definition , Top-down translation, Bottom-up evaluation of
inherited attributes.
Intermediate code Generation: Intermediate code generation for declaration,
assignment, iterative statements, case statements, arrays, structures, conditional
statements, Boolean expressions, procedure calls, Intermediate code Generation using
YACC
4. Run Time Storage Organisation
Storage allocation strategies, static, dynamic storage allocation, allocation strategies for
block structured and non-block structured languages; O.S. support required for IO
statements. (e.g. printf, scanf) and memory allocation deallocation related statement.
(e.g. new, malloc)
Definition
A compiler is a computer program (or set of programs) that transforms source code written in
a programming language (the source language) into another computer language (the target
language, often having a binary form known as object code).
Other translators and interpreters
• Text formatters (e.g. TeX and LaTeX)
• Silicon compilers (e.g. VHDL)
• Query interpreters/compilers (Databases)
Grouping of phases
Incremental compiler
The term incremental compiler may refer to two different types of compiler.
• Imperative programming
• Interactive Programming
In imperative programming and software development, an incremental compiler is one that
when invoked, takes only the changes of a known set of source files and updates any
corresponding output files (in the compiler's target language, often bytecode) that may
already exist from previous compilations. By effectively building upon previously compiled
output files, the incremental compiler avoids the wasteful recompilation of entire source files,
where most of the code remains unchanged. For most incremental compilers, compiling a
program with small changes to its source code is usually near instantaneous. It can be said
that an incremental compiler reduces the granularity of a language's traditional compilation
units while maintaining the language's semantics, such that the compiler can append and
replace smaller parts.
Cross compiler
A cross compiler is a compiler capable of creating executable code for a platform other than
the one on which the compiler is run.
Cross compiler tools are used to generate executables for embedded system or multiple
platforms.
It is used to compile for a platform on which it is not feasible to do the compiling, such as
microcontrollers that don't support an operating system.
Phases of a Compiler
Source Program
↓
1. Lexical Analyzer
↓
2. Syntax Analyzer
↓
3. Semantic Analyzer
↓
4. Intermediate Code Generator
↓
5. Code Optimizer
↓
6. Code Generator
↓
Target Program
Program #01
/* Implement a program to find the length of a given string. */
Lexical Analysis
Stream of characters is grouped into tokens.
Examples of tokens are identifiers, reserved words, integers, doubles or floats, delimiters,
operators and special symbols. For int a; a = a + 2; the tokens are:

int    reserved word
a      identifier
;      special symbol
a      identifier
=      operator
a      identifier
+      operator
2      integer constant
;      special symbol
Examples of Token
Token: A sequence of characters to be treated as a single unit.
• Examples of tokens.
– Reserved words (e.g. begin, end, struct, if etc.)
– Keywords (integer, true etc.)
– Operators (+, &&, ++ etc)
– Identifiers (variable names, procedure names, parameter names)
– Literal constants (numeric, string, character constants etc.)
– Punctuation marks (such as : and ,)
Source Code:
#include <iostream>
using namespace std;
int main()
{
char c[30];
int n=0;
cout<<"Enter the String"<<"\n";
cin>>c;
for(int i=0;c[i]!='\0';i++)
{
n=n+1;
}
cout<<"Length of the string is "<<n<<endl;
return 0;
}
Input:
Output:
Source Code:
#include <iostream>
#include <cstring>
#include <cctype>
using namespace std;

const int SIZE = 100;

void input(char *enter);
void wordCount(char *word2);
void longWord(char *temp);
void numbers(char *word2);
void letterCounts(char *s, int letterCount[]);
void outputLetterCounts(int letterCount[]);

int main()
{
char string[SIZE] = {'\0'};
int letterCount[26] = {0};
input(string);
wordCount(string);
longWord(string);
numbers(string);
letterCounts(string, letterCount);
outputLetterCounts(letterCount);
return 0;
}
void input(char *enter)
{
cout<<"Enter sentence(s) "<<endl;
cin.getline(enter, SIZE);
}
void wordCount(char *word2)
{
int cnt = 0;
while(*word2 != '\0')
{
while(isspace(*word2))
{
++word2;
}
if(*word2 != '\0')
{
++cnt;
while(!isspace(*word2) && *word2 != '\0')
++word2;
}
}
cout<<"Number of words: "<<cnt<<endl;
}
void longWord(char *temp)
{
int counter = 0;
int max_word = 0;
for(;; temp++)
{
if(*temp != '\0' && !isspace(*temp))
{
counter++;
}
else
{
if(counter > max_word)
{
max_word = counter;
}
counter = 0;
if(*temp == '\0')
break;
}
}
cout<<"Longest word: "<<max_word<<endl;
}
void numbers(char *word2)
{
int num = 0;
int amount = strlen(word2);
for(int i = 0; i < amount; i++)
if(isdigit(word2[i]))
num++;
cout<<"Digits: "<<num<<endl;
}
void letterCounts(char *s, int letterCount[])
{
for(int i = 0; s[i] != '\0'; i++)
if(isalpha(s[i]))
letterCount[tolower(s[i]) - 'a']++;
}
void outputLetterCounts(int letterCount[])
{
for (int l = 0; l < 26; l++)
{
if (letterCount[l] > 0)
{
cout << letterCount[l] << " " << char('a' + l) << endl;
}
}
}
std::cout <<"Longest word:" << max_word;
Input:
Output:
Total no. of letters/characters of the given string, excluding white spaces.
Program #02
/* Create a file (Compiler.cc) and implement a program to read all the content
of Compiler.cc (how many lines, how many words and how many
characters are in the file). */
Source Code:
#include<stdio.h>
#include<stdlib.h>
int main()
{
int noc=0,now=0,nol=0;
FILE *fr;
char fname[20];
int ch;
printf("\n enter the source file name 'Compiler.cc': ");
scanf("%19s",fname);
fr=fopen(fname,"r");
if(fr==NULL)
{
printf("\n error \n");
exit(0);
}
ch=fgetc(fr);
while(ch!=EOF)
{
noc++;
if(ch==' ')
now++;
if(ch=='\n')
{
nol++;
now++;
}
ch=fgetc(fr);
}
fclose(fr);
printf("\n total no of characters=%d",noc);
printf("\n total no of words=%d",now);
printf("\n total no of lines=%d",nol);
return 0;
}
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
ifstream fileName;
string name; // name of the file to be opened
int count = 0; // how many attempts so far
// Prompt for user input and open the specified file
do
{
if (count == 0)
{
cout << "Enter the name of a file: ";
}
else
{
cout << "File Not Found!\nEnter the name of a file: ";
}
cin >> name;
cin.ignore(); // ignore the next character in the buffer
fileName.clear();
fileName.open(name.c_str()); // convert name to a C-style string
count++;
} while (!fileName);

int chars = 0, words = 0, lines = 0;
char ch, prevChar = ' ';
while (fileName.get(ch))
{
chars++;
if ((ch == ' ' || ch == '\n') && !(prevChar == ' ' || prevChar == '\n'))
words++;
if (ch == '\n')
lines++;
prevChar = ch;
}
if (!(prevChar == ' ' || prevChar == '\n') && chars > 0)
words++; // last word if the file does not end in whitespace
cout << "Characters: " << chars << "\nWords: " << words
     << "\nLines: " << lines << endl;
return 0;
}
Input:
Output:
Program #03
A Deterministic Finite Automaton (DFA) consists of:
· a finite set of states (Q)
· an alphabet (Σ)
· a transition function T : Q × Σ → Q
· a start state
· a set of accepting states
The machine starts in the start state and reads in a string of symbols from its alphabet. It
uses the transition function T to determine the next state using the current state and the
symbol just read. If, when it has finished reading, it is in an accepting state, it is said to accept
the string, otherwise it is said to reject the string. The set of strings it accepts form a
language, which is the language the DFA recognizes.
A Nondeterministic Finite Automaton (NFA) consists of:
· a finite set of states (Q)
· an alphabet (Σ)
· a transition relation T : Q × (Σ ∪ {ε}) → P(Q)
· a start state
· a set of accepting states
where P(Q) is the power set of Q and ε is the empty string. The machine starts in the start
state and reads in a string of symbols from its alphabet. It uses the transition relation T to
determine the next state(s) using the current state and the symbol just read or the empty
string. If, when it has finished reading, it is in an accepting state, it is said to accept the string,
otherwise it is said to reject the string. The set of strings it accepts form a language, which is
the language the NFA recognizes.
Source Code:
#include<stdio.h>
#include<stdlib.h>
int main()
{
int n,m,start,nf,ps;
char str[20];
printf("enter no of states: ");
scanf("%d",&n);
printf("enter no of inputs: ");
scanf("%d",&m);
//constructing buffers
int **tran=new int*[n];
for(int i=0;i<n;i++)
{
tran[i]=new int[m];
}
for(int i=0;i<n;i++)
{
for(int j=0;j<m;j++)
{
printf("enter next state for present state %d on input %d: ",i,j);
scanf("%d",&tran[i][j]);
}
}
printf("enter starting state: ");
scanf("%d",&start);
printf("enter no of final states: ");
scanf("%d",&nf);
int *final=new int[nf];
for(int i=0;i<nf;i++)
{
printf("enter final state %d: ",i);
scanf("%d",&final[i]);
}
printf("enter string: ");
scanf("%s",str);
ps=start;
for(int i=0;str[i]!='\0';i++)
{
ps=tran[ps][str[i]-'0'];   //input symbols are read as digits 0,1,...
}
int accepted=0;
for(int i=0;i<nf;i++)
{
if(ps==final[i])
{
accepted=1;
break;
}
}
printf(accepted?"accepted\n":"rejected\n");
//deleting buffers
delete[] final;
for(int i=0;i<n;i++)
{
delete[] tran[i];
}
delete[] tran;
return 0;
}
Program #04
Closer examination reveals that states s2 and s7 are really the same since they are both
accepting states and both go to s6 under the input b and both go to s3 under an a. So, why
not merge them and form a smaller machine? In the same manner, we could argue for
merging states s0 and s5. Merging states like this should produce a smaller automaton that
accomplishes exactly the same task as our original one.
From these observations, it seems that the key to making finite automata smaller is to
recognize and merge equivalent states. To do this, we must agree upon the definition of
equivalent states. Here is one formulation of what Moore defined as indistinguishable states.
Definition. Two states in a finite automaton M are equivalent if and only if for
every string x, if M is started in either state with x as input, it either accepts in
both cases or rejects in both cases.
Another way to say this is that the machine does the same thing when started in either state.
This is especially necessary when finite automata produce output.
Two questions remain. First, how does one find equivalent states, and then, exactly how
valuable is this information? We shall answer the second question first by providing a
corollary to a famous theorem proven long ago by Myhill [3] and Nerode [4].
With one more observation, we shall be able to present an algorithm for transforming an
automaton into its smallest equivalent machine.
Now we know that if we can find the equivalence classes (or groups of equivalent states) for
an automaton, then we can use these as the states of the smallest equivalent machine. The
machine shown in figure 1 will be used as an example for the intuitive discussion that follows.
Let us first divide the machine's states into two groups: accepting and rejecting states. These
groups are: A = {s2, s7} and B = {s0, s1, s3, s4, s5, s6}. Note that these are equivalent under
the empty string as input.
Then, let us find out if the states in these groups go to the same group under inputs a and b.
As we noted at the beginning of this discussion, the states of group A both go to states in
group B under both inputs. Things are different for the states of group B. The following table
shows the result of applying the inputs to these states. (For example, the input a leads from
s1 to s5 in group B and input b leads to s2 in group A.)
in state: s0 s1 s3 s4 s5 s6
a leads to: B B B B B B
b leads to: B A B B B A
Looking at the table we find that the input b helps us distinguish between two of the states (s1
and s6) and the rest of the states in the group since it leads to group A for these two instead
of group B. Thus the states in the set {s0, s3, s4, s5} cannot be equivalent to those in the set
{s1, s6} and we must partition B into two groups. Now we have the groups:
A = {s2, s7}, {s1, s6}, and {s0, s3, s4, s5},
and the next examination of where the inputs lead shows us that s3 is not equivalent to the
rest of its group. We must partition again.
Continuing this process until we cannot distinguish between the states in any group by
employing our input tests, we end up with the five groups:
{s2, s7}, {s1, s6}, {s0, s5}, {s3}, and {s4}.
In view of the above theoretical definitions and results, it is easy to argue that all of the states
in each group are equivalent because they all go to the same groups under the inputs a and
b. Thus in the sense of Moore the states in each group are truly indistinguishable. We also
can claim that due to the corollary to the Myhill-Nerode theorem, any automaton that accepts
(aa + b)*ab(bb)* must have at least five states. Building the minimum state finite automaton is
now rather straightforward. We merely use the equivalence classes (our groups) as states
and provide the proper transitions. This gives us the finite automaton pictured in figure 2.
The complexity of this algorithm is O(n²) since we check all of the states each time we
execute the repeat loop and might have to execute the loop n times since it might take an
input of length n to distinguish between two states. A faster algorithm was later developed by
Hopcroft.
Source Code:
#include<stdio.h>
#include<iostream.h>
#include<string.h>
#include<stdlib.h>
#include<conio.h>
void main()
{
int nstates,minputs,start,nf,ps;
char str[20];
clrscr();
printf("enter no of states");
scanf("%d",&nstates);
printf("enter no of inputs");
scanf("%d",&minputs);
//constructing buffers
int **tran=new int* [nstates];
for(int i=0;i<nstates;i++)
{
tran[i]=new int[minputs];
}
for(i=0;i<nstates;i++)
{
for(int j=0;j<minputs;j++)
{
printf("enter next state for present state %d on input%d ",i,j);
scanf("%d",&tran[i][j]);
}
}
printf("enter starting state");
scanf("%d",&start);
printf("enter no of final states");
scanf("%d",&nf);
int *final=new int[nf];
for(i=0;i<nf;i++)
{
printf("enter final state %d ",i);
scanf("%d",&final[i]);
}
//group bookkeeping buffers (reconstructed to match the deletes at the end)
int *stategroup=new int[nstates];
int **groupstate=new int*[2*nstates];
int **groupgroup=new int*[2*nstates];
for(i=0;i<2*nstates;i++)
{
groupstate[i]=new int[nstates+1];
groupgroup[i]=new int[2];
memset(groupgroup[i],-1,2*sizeof(int));
memset(groupstate[i],-1,(nstates+1)*sizeof(int));
}
for(i=0;i<nf;i++)
{
stategroup[final[i]]=0;
groupstate[0][final[i]]=1;
}
for(i=0;i<nstates;i++)
{
if(stategroup[i]!=0)
{
stategroup[i]=1;
groupstate[1][i]=1;
}
}
{
change=1;
//find for any group going to presentgroup
int flag=0;
for(int
anygroup=0;anygroup<latestgroupcount;anygroup++)
{
if(groupgroup[anygroup]
[0]==presentgroup
&&groupgroup[anygroup]
[1]==groupgroup[group][1])
{flag=1;
break;
}
}
//change groupgroup
//change stategroup
//change groupstate and groupcount
anygroup=flag==1?
anygroup:latestgroupcount++;
groupgroup[anygroup][0]=presentgroup;
groupgroup[anygroup][1]=groupgroup[group]
[1];
stategroup[count]=anygroup;
groupstate[anygroup][count]=1;
groupstate[group][count]=-1;
groupstate[anygroup]
[nstates]=groupgroup[group][1];
}
}
count++;
}//end of while
if(maxgroupcount<latestgroupcount){maxgroupcount=latestgroupcount;}
}//checking all the groups for loop
groupcount=maxgroupcount;
}//checking all the inputs for loop
}while(change!=0);
/////////////////////////////end of minimization////////////////////////////////////////////////
printf("\n\nGroups\n\n");
for(i=0;i<groupcount;i++)
{
printf("%d ",i);
for(int j=0;j<nstates;j++)
{
if(groupstate[i][j]!=-1)
printf(" %d ",j);
}
printf("\n");
}
//deleting buffer
delete stategroup;
delete final;
for(i=0;i<nstates;i++)
{
delete tran[i];
delete groupstate[i];
delete groupgroup[i];
}
delete groupgroup;
delete tran;
delete groupstate;
}
INPUT:
OUTPUT:
Program #05
/*Construct a program for how to calculate FIRST () & FOLLOW () symbol
for LL(1 ) grammar, if the Context free grammar for LL(1) Construction is
S/aBDh
B/cC
C/bC/@
D/E/F
E/g/@
F/f/@ */
The construction of a predictive parser is aided by two functions associated with a grammar
G. These functions, FIRST and FOLLOW, allow us to fill in the entries of a predictive parsing
table for G, whenever possible. Sets of tokens yielded by the FOLLOW function can also be
used as synchronizing tokens during panic-mode error recovery. Just suppose for a second
that you are an LL(1) parser and you have the supernatural power of seeing the future of the
string one step ahead.
FIRST(α)
If α is any string of grammar symbols, let FIRST(α) be the set of terminals that begin the
strings derived from α. If α ⇒ ε then ε is also in FIRST(α).
To compute FIRST(X) for all grammar symbols X, apply the following rules until no more
terminals or ε can be added to any FIRST set:
1. If X is a terminal, then FIRST(X) = {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a nonterminal and X → Y1Y2 … Yk is a production, then add everything in
FIRST(Y1) except ε to FIRST(X). If Y1 does not derive ε, then we add nothing more to
FIRST(X), but if Y1 ⇒ ε, then we add FIRST(Y2), and so on.
Now, we can compute FIRST for any string X1X2 … Xn as follows. Add to FIRST(X1X2 …
Xn) all the non-ε symbols of FIRST(X1). Also add the non-ε symbols of FIRST(X2) if ε is in
FIRST(X1), the non-ε symbols of FIRST(X3) if ε is in both FIRST(X1) and FIRST(X2), and so
on. Finally, add ε to FIRST(X1X2 … Xn) if, for all i, FIRST(Xi) contains ε.
FOLLOW(A)
Define FOLLOW(A), for nonterminal A, to be the set of terminals a that can appear
immediately to the right of A in some sentential form, that is, the set of terminals a such that
there exists a derivation of the form S⇒αΑaβ for some α and β. Note that there may, at some
time during the derivation, have been symbols between A and a, but if so, they derived ε and
disappeared. If A can be the rightmost symbol in some sentential form, then $, representing
the input right endmarker, is in FOLLOW(A).
To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be
added to any
FOLLOW set:
1.Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2.If there is a production A ⇒ αΒβ, then everything in FIRST(β), except for ε, is placed in
FOLLOW(B).
3.If there is a production A ⇒ αΒ, or a production A ⇒ αΒβ where FIRST(β) contains ε (i.e., β
⇒ ε), then everything in FOLLOW(A) is placed in FOLLOW(B).
EXAMPLE:
E → T E’
E’→ + T E’ | ε
T → F T’
T’→ * F T’ | ε
F → ( E ) | id
Then:
FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}
FOLLOW(E) = FOLLOW(E’) = {) , $}
FOLLOW(F) = {+, *, ), $}
Algorithm:
FIRST:
1. If a production begins with a terminal, then FIRST of that production is that terminal.
e.g. first(abAb) = {a}.
2. If a production is of the type A->BCD..., i.e. it begins with non-terminals, then:
if first(B) does not contain null, then first(A) = first(B); stop here.
else also check the next non-terminal (C here) in the same way as the above step,
and first(A) = first(B) + first(C).
If we get null, stop there.
FOLLOW:
1. Place $ in FOLLOW(S), where S is the start symbol.
2. For a production A -> αBβ, everything in FIRST(β) except ε is placed in FOLLOW(B).
3. For a production A -> αB, or A -> αBβ where FIRST(β) contains ε, everything in
FOLLOW(A) is placed in FOLLOW(B).
Source Code:
#include<stdio.h>
char array[10][20],temp[10];
int c,n;
void fun(int,int[]);

/* key==0 : array[i][j] is a non-terminal -> recurse into its productions.
 * key==1 : array[i][j] is a terminal -> return 1 if it is not yet in temp. */
int fun2(int i,int j,int p[],int key)
{
int k;
if(!key)
{
for(k=0;k<n;k++)
if(array[k][0]==array[i][j]) break;
p[0]=i; p[1]=j+1;   /* remember where to resume if this symbol derives @ */
fun(k,p);
return 0;
}
else
{
for(k=0;k<=c;k++)
if(temp[k]==array[i][j]) break;
if(k>c) return 1;
else return 0;
}
}
void fun(int i,int p[])
{
int j,key;
for(j=2;array[i][j]!='\0';j++)
{
if(array[i][j-1]=='/')
{
if(array[i][j]>='A'&&array[i][j]<='Z')
{
key=0;
fun2(i,j,p,key);
}
else
{
key=1;
if(fun2(i,j,p,key))
temp[++c]=array[i][j];
if(array[i][j]=='@'&&p[0]!=-1)
{ /* '@' is the null symbol */
if(array[p[0]][p[1]]>='A'&&array[p[0]][p[1]]<='Z')
{
key=0;
fun2(p[0],p[1],p,key);
}
else
if(array[p[0]][p[1]]!='/'&&array[p[0]][p[1]]!='\0')
{
if(fun2(p[0],p[1],p,key))
temp[++c]=array[p[0]][p[1]];
}
}
}
}
}
}
int main()
{
int p[2],i,j;
printf("Enter the no. of productions :");
scanf("%d",&n);
printf("Enter the productions :\n");
for(i=0;i<n;i++)
scanf("%s",array[i]);
for(i=0;i<n;i++)
{
c=-1,p[0]=-1,p[1]=-1;
fun(i,p);
printf("First(%c) : [ ",array[i][0]);
for(j=0;j<=c;j++)
printf("%c,",temp[j]);
printf("\b ].\n");
}
return 0;
}
INPUT:
S/aBDh
B/cC
C/bC/@
D/E/F
E/g/@
F/f/@
OUTPUT:
Program #06
Operator-Precedence Parser
Operator grammar
A small, but important class of grammars. We may have an efficient operator precedence
parser (a shift-reduce parser) for an operator grammar.
Precedence Relations
In operator-precedence parsing, we define three disjoint precedence relations between
certain pairs of terminals:
a <· b   b has higher precedence than a
a =· b   b has the same precedence as a
a ·> b   b has lower precedence than a
The determination of correct precedence relations between terminals is based on the
traditional notions of associativity and precedence of operators. (Unary minus causes a
problem.)
The intention of the precedence relations is to find the handle of a right-sentential form, with
<· marking the left end,
=· appearing in the interior of the handle, and
·> marking the right end.
In our input string $a1 a2 ... an $, we insert the precedence relation between each pair of
terminals (the precedence relation holds between the terminals in that pair).
Then the input string id+id*id with the precedence relations inserted will be:
$ <· id ·> + <· id ·> * <· id ·> $
Algorithm:
set p to point to the first symbol of w$ ;
repeat forever
if ( $ is on top of the stack and p points to $ ) then return
else {
let a be the topmost terminal symbol on the stack and let b be the symbol pointed to
by p;
if ( a <. b or a =· b ) then { /* SHIFT */
push b onto the stack;
advance p to the next input symbol;
}
else if ( a .> b ) then /* REDUCE */
repeat pop stack
until ( the top of stack terminal is related by <. to the terminal most recently popped
);
else error();
}
Advantages :
simple
Source Code:
#include<iostream.h>
#include<stdio.h>
#include<conio.h>
#include<ctype.h>
#include<string.h>
int *flagstate;// to see whether leading has already been found for a Non terminal
char **foundlead;//contains the already found leading for a Non terminal
int *NtSymbols;//used to reduce time complexity by storing where the
//productions for a non terminal are stored in arr
char **foundtrail;//contains the already found trailing for a Non terminal
int **trailgoesto;//to tell which Non Terminals trailing goes to whose trailing
int **leadgoesto;//to tell which Non Terminals leading goes to whose leading
char **arr;//the productions themselves (declaration reconstructed from main)

//merge the symbols of src into dest, keeping them unique;
//returns 1 if dest changed (body reconstructed from its uses below)
int strmergeunique(char *dest,char *src)
{
for(int i=0;src[i]!='\0';i++)
{
if(strchr(dest,src[i])==NULL)
{
int len=strlen(dest);
dest[len]=src[i];
dest[len+1]='\0';
change=1;
}
}
return change;
}
void leading(int no_of_nonterminals)
{
int nonterminals=0;
char Gamma,str[10]={'\0'};
while(nonterminals<no_of_nonterminals)
{
for(int eachletter=1;arr[nonterminals][eachletter]!='\0';eachletter++)
{
Gamma=arr[nonterminals][eachletter];
if(isupper(Gamma))
{
leadgoesto[ NtSymbols[toascii(Gamma)-65] ][
leadgoesto[NtSymbols[toascii(Gamma)-65]][0]+1 ]=nonterminals;
leadgoesto[ NtSymbols[toascii(Gamma)-65] ][0]++;
continue;
}
else
{
if(Gamma=='\x0')
{break;}
if(Gamma=='/')
{continue;}
str[0]=Gamma;
str[1]='\0';
strmergeunique(foundlead[nonterminals],str);
while(arr[nonterminals][eachletter+1]!='\x0'&&
arr[nonterminals][eachletter+1]!='/')
{
eachletter++;
}
}
}
nonterminals++;
}
int change=0;
/* Lab Manual of Compiler Design - Chameli Devi School Of Engineering, Indore */
do
{
change=0;
for(int i=0;i<nonterminals;i++)
{
for(int j=1;j<=leadgoesto[i][0];j++)
{
change|=strmergeunique(foundlead[leadgoesto[i][j]],foundlead[i]);
}
}
}
while(change);
}
void trailing(int no_of_nonterminals)
{
int nonterminals=0;
char Delta,str[10]={'\0'};
while(nonterminals<no_of_nonterminals)
{
int eachletter=strlen(arr[nonterminals])-1;
for(;eachletter>0;eachletter--)
{
Delta=arr[nonterminals][eachletter];
// *******alpha B
if(isupper(Delta))
{
trailgoesto[ NtSymbols[toascii(Delta)-65] ][
trailgoesto[NtSymbols[toascii(Delta)-65]][0]+1 ]=nonterminals;
trailgoesto[ NtSymbols[toascii(Delta)-65] ][0]++;
if(arr[nonterminals][eachletter-1]!='/'&&
eachletter-1>0)
{Delta=arr[nonterminals][eachletter-1];
if(!isupper(Delta))
{
str[0]=Delta;
str[1]='\0';
strmergeunique(foundtrail[nonterminals],str);
}
}
}
// B alpha
// ***** alpha
else
{
if(Delta=='/')
{continue;}
str[0]=Delta;
str[1]='\0';
strmergeunique(foundtrail[nonterminals],str);
Delta=arr[nonterminals][eachletter-1];
if(isupper(Delta)&&eachletter-1>0)
{
trailgoesto[ NtSymbols[toascii(Delta)-65] ][
trailgoesto[NtSymbols[toascii(Delta)-65]][0]+1 ]=nonterminals;
trailgoesto[ NtSymbols[toascii(Delta)-65] ][0]++;
}
}
while(eachletter-1>0&&
arr[nonterminals][eachletter-1]!='/')
{
eachletter--;
}
}
nonterminals++;
}
int change=0;
do
{
change=0;
for(int i=0;i<nonterminals;i++)
{
for(int j=1;j<=trailgoesto[i][0];j++)
{
change|=strmergeunique(foundtrail[trailgoesto[i][j]],foundtrail[i]);
}
}
}
while(change);
}
void main()
{
int nt;
clrscr();
printf("Enter no.of nonterminals :");
scanf("%d",&nt);
arr=new char*[nt];
foundlead=new char*[nt];
foundtrail=new char*[nt];
flagstate=new int[nt];
leadgoesto=new int*[nt];
trailgoesto=new int*[nt];
NtSymbols=new int[26];
for (int i=0;i<nt;i++)
{
arr[i]=new char[100];
foundlead[i]=new char[10];
memset(foundlead[i],'\0',10);
foundtrail[i]=new char[10];
memset(foundtrail[i],'\0',10);
flagstate[i]=0;
leadgoesto[i]=new int[nt];
leadgoesto[i][0]=0;
trailgoesto[i]=new int[nt];
trailgoesto[i][0]=0;
printf("Enter non terminal ");
cin>>arr[i][0];
flushall();
printf("Enter Production for %c------>",arr[i][0]);
gets(arr[i]+1);
NtSymbols[toascii(arr[i][0])-65]=i;
}
leading(nt);
trailing(nt);
cout<<endl<<endl;
for(i=0;i<nt;i++)
{
printf("leading (%c)--> { %s }\n",arr[i][0],foundlead[i]);
printf("trailing(%c)--> { %s }\n",arr[i][0],foundtrail[i]);
}
getch();
}
Program #07
Program using LEX to count the number of characters, words, spaces and
lines in a given input file.
Lexical Analyzer
The main task of the lexical analyzer is to read the input source program, scanning the
characters, and produce a sequence of tokens that the parser can use for syntactic analysis.
The interface may be to be called by the parser to produce one token at a time: maintain
internal state of reading the input program (with lines) and have a function "getNextToken"
that will read some characters at the current state of the input and return a token to the
parser.
Other tasks of the lexical analyzer include: skipping or hiding whitespace and comments;
keeping track of line numbers for error reporting (sometimes it can also produce the
annotated lines for error reports); producing the value of the token; and, optionally, inserting
identifiers into the symbol table.
For example, we don't want invisible characters in error messages. For every end-of-line,
keep track of line numbers for error reporting. Skip over or hide whitespace and comments;
if comments are nested (not common), the scanner must keep track of nesting to find the end
of the comments. It may produce hidden tokens, for convenience of scanner structure.
Always produce an end-of-file token. It is important that quoted strings and comments don't
get stuck if an unexpected end of file occurs.
Source Code:
%{
int ch=0, bl=0, ln=0, wr=0;
%}
%%
[\n] {ln++;wr++;}
[\t] {bl++;wr++;}
[" "] {bl++;wr++;}
[^\n\t] {ch++;}
%%
int main()
{
FILE *fp;
char file[20];
printf("Enter the filename: ");
scanf("%s", file);
fp=fopen(file,"r");
if(fp==NULL)
{
printf("could not open %s\n", file);
return 1;
}
yyin=fp;
yylex();
printf("Characters=%d\nBlanks=%d\nLines=%d\nWords=%d\n", ch, bl, ln, wr);
return 0;
}
int yywrap() { return 1; }
INPUT:
An input file (any text format); the program counts the number of characters, words, spaces
and lines in the given input file.
OUTPUT:
Program #08
Program using LEX to count the number of comment lines in a given C program, eliminate
them, and copy the resulting program into a separate file.
Compiler-construction tools
Originally, compilers were written “from scratch”, but now the situation is quite different. A
number of tools are available to ease the burden.
We will study tools that generate scanners and parsers. This will involve us in some theory,
regular expressions for scanners and various grammars for parsers. These techniques are
fairly successful. One drawback can be that they do not execute as fast as “hand-crafted”
scanners and parsers.
We will also see tools for syntax-directed translation and automatic code generation. The
automation in these cases is not as complete.
Finally, there is the large area of optimization. This is not automated; however, a basic
component of optimization is “data-flow analysis” (how values are transmitted between parts
of a program) and there are tools to help with this task.
The character stream input is grouped into meaningful units called lexemes, which are then
mapped into tokens, the latter constituting the output of the lexical analyzer. For example,
any one of the following
x3 = y + 3;
x3 = y + 3 ;
x3 =y+ 3 ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for
identifier. The value 1 is the index of the entry for x3 in the symbol table produced by
the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a
pair, whose second component is ignored. The point is that there are many different
identifiers so we need the second component, but there is only one assignment symbol
=.
3. The lexeme y is mapped to the token <id,2>
Note that non-significant blanks are normally removed during scanning. In C, most blanks are
non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation
without using recursion (compare with parsing below).
Source Code:
%{
int com=0;
%}
%%
"/*"[^\n]+"*/" {com++;fprintf(yyout, " ");}
%%
int main()
{
printf("Write a C program\n");
yyout=fopen("output", "w");
yylex();
printf("Comment=%d\n",com);
return 0;
}
OUTPUT:
$lex p1b.l
$cc lex.yy.c -ll
$./a.out
Write a C program
#include<stdio.h>
int main()
{
int a, b;
/*float c;*/
printf("Hai");
/*printf("Hello");*/
}
[Ctrl-d]
Comment=1
$cat output
#include<stdio.h>
int main()
{
int a, b;
printf("Hai");
}
Program #09
Program using LEX to recognize a valid arithmetic expression and to
recognize the identifiers and operators present. Print them separately.
The user must supply a lexical analyzer to read the input stream and communicate tokens
(with values, if desired) to the parser. The lexical analyzer is an integer-valued function called
yylex. The function returns an integer, the token number, representing the kind of token read.
If there is a value associated with that token, it should be assigned to the external variable
yylval.
The parser and the lexical analyzer must agree on these token numbers in order for
communication between them to take place. The numbers may be chosen by Yacc, or chosen
by the user. In either case, the ``# define'' mechanism of C is used to allow the lexical
analyzer to return these numbers symbolically. For example, suppose that the token name
DIGIT has been defined in the declarations section of the Yacc specification file. The relevant
portion of the lexical analyzer might look like:
yylex(){
extern int yylval;
int c;
...
c = getchar();
...
switch( c ) {
...
case '0':
case '1':
...
case '9':
yylval = c-'0';
return( DIGIT );
...
}
...
The intent is to return a token number of DIGIT, and a value equal to the numerical value of
the digit. Provided that the lexical analyzer code is placed in the programs section of the
specification file, the identifier DIGIT will be defined as the token number associated with the
token DIGIT.
This mechanism leads to clear, easily modified lexical analyzers; the only pitfall is the need to
avoid using any token names in the grammar that are reserved or significant in C or the
parser; for example, the use of token names if or while will almost certainly cause severe
difficulties when the lexical analyzer is compiled. The token name error is reserved for error
handling, and should not be used naively.
As mentioned above, the token numbers may be chosen by Yacc or by the user. In the default
situation, the numbers are chosen by Yacc. The default token number for a literal character is
the numerical value of the character in the local character set. Other names are assigned
token numbers starting at 257.
To assign a token number to a token (including literals), the first appearance of the token
name or literal in the declarations section can be immediately followed by a nonnegative
integer. This integer is taken to be the token number of the name or literal. Names and literals
not defined by this mechanism retain their default definition. It is important that all token
numbers be distinct.
For historical reasons, the endmarker must have token number 0 or negative. This token
number cannot be redefined by the user; thus, all lexical analyzers should be prepared to
return 0 or negative as a token number upon reaching the end of their input.
A very useful tool for constructing lexical analyzers is the Lex program developed by Mike
Lesk.[8] These lexical analyzers are designed to work in close harmony with Yacc parsers.
The specifications for these lexical analyzers use regular expressions instead of grammar
rules. Lex can be easily used to produce quite complicated lexical analyzers, but there remain
some languages (such as FORTRAN) which do not fit any theoretical framework, and whose
lexical analyzers must be crafted by hand.
Source Code:
%{
#include<stdio.h>
int a=0,s=0,m=0,d=0,ob=0,cb=0;
int flaga=0, flags=0, flagm=0, flagd=0;
%}
id [a-zA-Z]+
%%
{id} {printf("\n %s is an identifier\n",yytext);}
[+] {a++;flaga=1;}
[-] {s++;flags=1;}
[*] {m++;flagm=1;}
[/] {d++;flagd=1;}
[(] {ob++;}
[)] {cb++;}
%%
int main()
{
printf("Enter the expression\n");
yylex();
if(ob-cb==0)
{
printf("Valid expression\n");
}
else
{
printf("Invalid expression");
}
printf("\nAdd=%d\nSub=%d\nMul=%d\nDiv=%d\n",a,s,m,d);
printf("Operators are: \n");
if(flaga)
printf("+\n");
if(flags)
printf("-\n");
if(flagm)
printf("*\n");
if(flagd)
printf("/\n");
return 0;
}
OUTPUT:
$lex p2a.l
$cc lex.yy.c -ll
$./a.out
Enter the expression
(a+b*c)
a is an identifier
b is an identifier
c is an identifier
[Ctrl-d]
Valid expression
Add=1
Sub=0
Mul=1
Div=0
Operators are:
+
*
Program #13
Program using LEX to recognize whether a given sentence is simple or compound.
%{
int flag=0;
%}
%%
(" "[aA][nN][dD]" ")|(" "[oO][rR]" ")|(" "[bB][uU][tT]" ") {flag=1;}
%%
int main()
{
printf("Enter the sentence\n");
yylex();
if(flag==1)
printf("\nCompound sentence\n");
else
printf("\nSimple sentence\n");
return 0;
}
OUTPUT:
$lex p2b.l
$cc lex.yy.c -ll
$./a.out
Enter the sentence
I am Pooja
I am Pooja
[Ctrl-d]
Simple sentence
$./a.out
Enter the sentence
CSE or ISE
CSE or ISE
[Ctrl-d]
Compound sentence
Program #14
Program using LEX to recognize and count the number of identifiers in a given input file.
Lex helps write programs whose control flow is directed by instances of regular expressions in
the input stream. It is well suited for editor-script type transformations and for segmenting
input in preparation for a parsing routine.
Lex source is a table of regular expressions and corresponding program fragments. The table
is translated to a program which reads an input stream, copying it to an output stream and
partitioning the input into strings which match the given expressions. As each such string is
recognized the corresponding program fragment is executed. The recognition of the
expressions is performed by a deterministic finite automaton generated by Lex. The program
fragments written by the user are executed in the order in which the corresponding regular
expressions occur in the input stream.
Source Code:
%{
#include<stdio.h>
int count=0;
%}
op [-+*/]
letter [a-zA-Z]
digitt [0-9]
id {letter}+|({letter}{digitt})+
notid ({digitt}{letter})+
%%
[ \t\n]+ ;
("int")|("float")|("char")|("case")|("default")|("if")|("for")|("printf")|("scanf") {printf("%s is a keyword\n", yytext);}
{id} {printf("%s is an identifier\n", yytext); count++;}
{notid} {printf("%s is not an identifier\n", yytext);}
%%
int main()
{
FILE *fp;
char file[10];
printf("\nEnter the filename: ");
scanf("%s", file);
fp=fopen(file,"r");
yyin=fp;
yylex();
printf("Total identifiers are: %d\n", count);
return 0;
}
OUTPUT:
$cat > input
int
float
78f
90gh
a
d
are case
default
printf
scanf
$lex p3.l
$cc lex.yy.c -ll
$./a.out
Enter the filename: input
int is a keyword
float is a keyword
78f is not an identifier
90g is not an identifier
h is an identifier
a is an identifier
d is an identifier
are is an identifier
case is a keyword
default is a keyword
printf is a keyword
scanf is a keyword
Total identifiers are: 4
Program #15
YACC (Yet Another Compiler Compiler ) program to recognize a valid
arithmetic expression that uses operators +, -, * and /.
Basic Specifications
Names refer to either tokens or nonterminal symbols. Yacc requires token names to be
declared as such. In addition, for reasons discussed in Section 3, it is often desirable to
include the lexical analyzer as part of the specification file; it may be useful to include other
programs as well. Thus, every specification file consists of three sections: the declarations,
(grammar) rules, and programs. The sections are separated by double percent ``%%'' marks.
(The percent ``%'' is generally used in Yacc specifications as an escape character.)
declarations
%%
rules
%%
programs
The declaration section may be empty. Moreover, if the programs section is omitted, the
second %% mark may be omitted also;
%%
rules
Blanks, tabs, and newlines are ignored except that they may not appear in names or multi-
character reserved symbols. Comments may appear wherever a name is legal; they are
enclosed in /* . . . */, as in C and PL/I.
The rules section is made up of one or more grammar rules. A grammar rule has the form:
A : BODY ;
A represents a nonterminal name, and BODY represents a sequence of zero or more names
and literals. The colon and the semicolon are Yacc punctuation.
Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``_'',
and non-initial digits. Upper and lower case letters are distinct. The names used in the body of
a grammar rule may represent tokens or nonterminal symbols.
A literal consists of a character enclosed in single quotes ``'''. As in C, the backslash ``\'' is an
escape character within literals, and all the C escapes are recognized. Thus
'\n' newline
'\r' return
'\'' single quote ``'''
'\\' backslash ``\''
'\t' tab
'\b' backspace
'\f' form feed
'\xxx' ``xxx'' in octal
For a number of technical reasons, the NUL character ('\0' or 0) should never be used in
grammar rules.
If there are several grammar rules with the same left hand side, the vertical bar ``|'' can be
used to avoid rewriting the left hand side. In addition, the semicolon at the end of a rule can
be dropped before a vertical bar. Thus the grammar rules
A : B C D ;
A : E F ;
A : G ;
can be given to Yacc as
A : B C D
| E F
| G
;
It is not necessary that all grammar rules with the same left side appear together in the
grammar rules section, although it makes the input much more readable, and easier to
change.
If a nonterminal symbol matches the empty string, this can be indicated in the obvious way:
empty : ;
Names representing tokens must be declared; this is most simply done by writing
%token name1 name2 . . .
in the declarations section. (See Sections 3, 5, and 6 for much more discussion.) Every name
not defined in the declarations section is assumed to represent a nonterminal symbol. Every
nonterminal symbol must appear on the left side of at least one rule.
Of all the nonterminal symbols, one, called the start symbol, has particular importance. The
parser is designed to recognize the start symbol; thus, this symbol represents the largest,
most general structure described by the grammar rules. By default, the start symbol is taken
to be the left hand side of the first grammar rule in the rules section. It is possible, and in fact
desirable, to declare the start symbol explicitly in the declarations section using the %start
keyword:
%start symbol
The end of the input to the parser is signaled by a special token, called the endmarker. If the
tokens up to, but not including, the endmarker form a structure which matches the start
symbol, the parser function returns to its caller after the endmarker is seen; it accepts the
input. If the endmarker is seen in any other context, it is an error.
It is the job of the user-supplied lexical analyzer to return the endmarker when appropriate;
see section 3, below. Usually the endmarker represents some reasonably obvious I/O status,
such as ``end-of-file'' or ``end-of-record''.
2: Actions
With each grammar rule, the user may associate actions to be performed each time the rule is
recognized in the input process. These actions may return
values, and may obtain the values returned by previous actions. Moreover, the lexical
analyzer can return values for tokens, if desired.
An action is an arbitrary C statement, and as such can do input and output, call subprograms,
and alter external vectors and variables. An action is specified by one or more statements,
enclosed in curly braces ``{'' and ``}''. For example,
A : '(' B ')'
{ hello( 1, "abc" ); }
is a grammar rule with an action.
Source Code:
LEX
%{
#include"y.tab.h"
extern int yylval;
%}
%%
[0-9]+ {yylval=atoi(yytext); return NUMBER;}
[a-zA-Z]+ {return ID;}
[ \t]+ ;
\n {return 0;}
. {return yytext[0];}
%%
YACC
%{
#include<stdio.h>
%}
%token NUMBER ID
%left '+' '-'
%left '*' '/'
%%
expr: expr '+' expr
|expr '-' expr
|expr '*' expr
|expr '/' expr
|'-'NUMBER
|'-'ID
|'('expr')'
|NUMBER
|ID
;
%%
int main()
{
printf("Enter the expression\n");
yyparse();
printf("\nExpression is valid\n");
exit(0);
}
int yyerror(char *s)
{
printf("\nExpression is invalid");
exit(0);
}
OUTPUT:
$lex p4a.l
$yacc -d p4a.y
$cc lex.yy.c y.tab.c -ll
$./a.out
Enter the expression
(a*b+5)
Expression is valid
$./a.out
Enter the expression
(a+6-)
Expression is invalid
Program #16
YACC program to recognize a valid variable, which starts with a letter followed by any
number of letters or digits.
Yacc turns the specification file into a C program, which parses the input according to
the specification given. The algorithm used to go from the specification to the parser is
complex, and will not be discussed here (see the references for more information). The
parser itself, however, is relatively simple, and understanding how it works, while not
strictly necessary, will nevertheless make treatment of error recovery and ambiguities
much more comprehensible.
Source Code:
LEX
%{
#include"y.tab.h"
extern int yylval;
%}
%%
[0-9]+ {yylval=atoi(yytext); return DIGIT;}
[a-zA-Z]+ {return LETTER;}
[ \t]+ ;
\n return 0;
. {return yytext[0];}
%%
YACC
%{
#include<stdio.h>
%}
%token LETTER DIGIT
%%
variable: LETTER|LETTER rest
;
rest: LETTER rest
|DIGIT rest
|LETTER
|DIGIT
;
%%
int main()
{
yyparse();
printf("The string is a valid variable\n");
}
int yyerror(char *s)
{
printf("this is not a valid variable\n");
exit(0);
}
OUTPUT:
$lex p4b.l
$yacc -d p4b.y
$cc lex.yy.c y.tab.c -ll
$./a.out
input34
The string is a valid variable
$./a.out
89file
this is not a valid variable
Program #17
YACC program to recognize strings of the form anbn (n >= 0), generated by the grammar
S -> a S b | empty.
The heart of the input specification is a collection of grammar rules. Each rule describes an
allowable structure and gives it a name. For example, one grammar rule might be
date : month_name day ',' year ;
Here, date, month_name, day, and year represent structures of interest in the input process;
presumably, month_name, day, and year are defined elsewhere. The comma ``,'' is enclosed
in single quotes; this implies that the comma is to appear literally in the input. The colon and
semicolon merely serve as punctuation in the rule, and have no significance in controlling the
input. Thus, with proper definitions, the input
July 4, 1776
might be matched by the above rule.
An important part of the input process is carried out by the lexical analyzer. This user routine
reads the input stream, recognizing the lower level structures, and communicates these
tokens to the parser. For historical reasons, a structure recognized by the lexical analyzer is
called a terminal symbol, while the structure recognized by the parser is called a nonterminal
symbol. To avoid confusion, terminal symbols will usually be referred to as tokens.
There is considerable leeway in deciding whether to recognize structures using the lexical
analyzer or grammar rules. For example, the rules
...
might be used in the above example. The lexical analyzer would only need to recognize
individual letters, and month_name would be a nonterminal symbol. Such low-level rules tend
to waste time and space, and may complicate the specification beyond Yacc's ability to deal
with it. Usually, the lexical analyzer would recognize the month names, and return an
indication that a month_name was seen; in this case, month_name would be a token.
Literal characters such as ``,'' must also be passed through the lexical analyzer, and are also
considered tokens.
Specification files are very flexible. It is relatively easy to add to the above example the rule
allowing
7 / 4 / 1776
as a synonym for
July 4, 1776
In most cases, this new rule could be ``slipped in'' to a working system with minimal effort,
and little danger of disrupting existing input.
The input being read may not conform to the specifications. These input errors are detected
as early as is theoretically possible with a left-to-right scan; thus, not only is the chance of
reading and computing with bad input data substantially reduced, but the bad data can usually
be quickly found. Error handling, provided as part of the input specifications, permits the
reentry of bad data, or the continuation of the input process after skipping over the bad data.
In some cases, Yacc fails to produce a parser when given a set of specifications. For
example, the specifications may be self contradictory, or they may require a more powerful
recognition mechanism than that available to Yacc. The former cases represent design errors;
the latter cases can often be corrected by making the lexical analyzer more powerful, or by
rewriting some of the grammar rules. While Yacc cannot handle all possible specifications, its
power compares favorably with similar systems; moreover, the constructions which are
difficult for Yacc to handle are also frequently difficult for human beings to handle. Some
users have reported that the discipline of formulating valid Yacc specifications for their input
revealed errors of conception or design early in the program development.
Source Code:
LEX
%{
#include"y.tab.h"
%}
%%
[a] return A;
[b] return B;
%%
YACC
%{
#include<stdio.h>
%}
%token A B
%%
S:A S B
|
;
%%
int main()
{
printf("Enter the string\n");
if(yyparse()==0)
{
printf("Valid\n");
}
}
int yyerror(char *s)
{
printf("%s\n",s);
return 0;
}
OUTPUT:
$lex p5b.l
$yacc -d p5b.y
$cc lex.yy.c y.tab.c -ll
$./a.out
Enter the string
aabb
[Ctrl-d]
Valid
$./a.out
Enter the string
aab
syntax error
Program #18
Program to recognize the Context free grammar (anbn, n>= 10), Where a & b
are input symbols of the grammar.
• a set of terminal symbols, which are the characters of the alphabet that appear in the strings
generated by the grammar.
• a set of nonterminal symbols, which are placeholders for patterns of terminal symbols that can
be generated by the nonterminal symbols.
• a set of productions, which are rules for replacing (or rewriting) nonterminal symbols (on the
left side of the production) in a string with other nonterminal or terminal symbols (on the right
side of the production).
• a start symbol, which is a special nonterminal symbol that appears in the initial string generated
by the grammar.
• Apply one of the productions with the start symbol on the left hand size, replacing the start
symbol with the right hand side of the production;
• Repeat the process of selecting nonterminal symbols in the string, and replacing them with the
right hand side of some corresponding production, until all nonterminals have been replaced by
terminal symbols.
1. Applying at most one production (starting with the start symbol) we can generate {wcd<S>, b<L>e,
s}. Only one of these strings consists entirely of terminal symbols, so the set of terminal strings we can
generate using at most one production is {s}.
2. Applying at most two productions, we can generate all the strings we can generate with one
production, plus any additional strings we can generate with an additional production.
The set of terminal strings we can generate with at most two productions is therefore {s, wcds}.
We can repeat this process for an arbitrary number of steps N, and find all the strings the grammar can
generate by applying N productions.
Source Code:
LEX
%{
#include"y.tab.h"
%}
%%
[a] return A;
[b] return B;
%%
YACC
%{
#include<stdio.h>
%}
%token A B
%%
stat:exp B
;
exp:A A A A A A A A A exp1
;
exp1:A exp2
|A
|A A exp2
|A A A exp2
|A A A A exp2
;
exp2:A
;
%%
int main()
{
printf("Enter the string\n");
if(yyparse()==0)
{
printf("Valid\n");
}
}
int yyerror(char *s)
{
printf("error\n");
return 0;
}
OUTPUT:
$lex p6.l
$yacc -d p6.y
$cc lex.yy.c y.tab.c -ll
$./a.out
Enter the string
aaaaaaaaaaab
Valid
$./a.out
Enter the string
aab
error
Program #19
Program to generate three address code for an 'if' / 'if-else' statement.
/* Input to the program is assumed to be syntactically correct. The expression of ‘if’ statement,
for true condition and statement for false condition are enclosed in parenthesis */
Some programming languages permit the user to use words like ``if'', which are normally
reserved, as label or variable names, provided that such use does not conflict with the legal
use of these names in the programming language. This is extremely hard to do in the
framework of Yacc; it is difficult to pass information to the lexical analyzer telling it ``this
instance of `if' is a keyword, and that instance is a variable''. The user can make a stab at it,
using the mechanism described in the last subsection, but it is difficult.
A number of ways of making this easier are under advisement. Until then, it is better that the
keywords be reserved; that is, be forbidden for use as variable names. There are powerful
stylistic reasons for preferring this, anyway.
The parsing actions of error and accept can be simulated in an action by use of macros
YYACCEPT and YYERROR. YYACCEPT causes yyparse to return the value 0; YYERROR
causes the parser to behave as if the current input symbol had been a syntax error; yyerror is
called, and error recovery takes place. These mechanisms can be used to simulate parsers
with multiple endmarkers or context-sensitive syntax checking.
An action may refer to values returned by actions to the left of the current rule. The
mechanism is simply the same as with ordinary actions, a dollar sign followed by a digit, but in
this case the digit may be 0 or negative. Consider
noun : DOG
{ $$ = DOG; }
| CRONE
{ if( $0 == YOUNG ){
printf( "what?\n" );
}
$$ = CRONE;
}
;
...
In the action following the word CRONE, a check is made that the preceding token shifted
was not YOUNG. Obviously, this is only possible when a great deal is known about what
might precede the symbol noun in the input. There is also a distinctly unstructured flavor
about this. Nevertheless, at times this mechanism will save a great deal of trouble, especially
when a few combinations are to be excluded from an otherwise regular structure.
Source Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int parsecondition(char[],int,char*,int);
void gen(char [],char [],char[],int);
int main()
{
int counter = 0,stlen =0,elseflag=0;
char stmt[60];  // contains the input statement
char strB[54];  // holds the expression for 'if' condition
char strS1[50]; // holds the statement for true condition
char strS2[45]; // holds the statement for false condition
printf("Enter the statement: ");
gets(stmt);                        // input is assumed syntactically correct
stlen = strlen(stmt);
counter = parsecondition(stmt,counter,strB,stlen); // extract the 'if' condition
counter++;
counter = counter + 3; // increment over 'then'
counter = parsecondition(stmt,counter,strS1,stlen);
if(stmt[counter+1]==';')
{ //reached end of statement, generate the output
printf("\n Parsing the input statement....");
gen(strB,strS1,strS2,elseflag);
return 0;
}
if(stmt[counter]==')')
counter++; // increment over ')'
counter = counter + 3; // increment over 'else'
counter = parsecondition(stmt,counter,strS2,stlen);
counter = counter + 2; // move to the end of the statement
if(counter == stlen)
{ //generate the output
elseflag = 1;
printf("\n Parsing the input statement....");
gen(strB,strS1,strS2,elseflag);
return 0;
}
return 0;
}
/* Function : parsecondition
Description : This function parses the statement
from the given index to get the statement enclosed
in ()
Input : Statement, index to begin search, string
to store the condition, total string length
Output : Returns 0 on failure, Non zero counter
value on success
*/
int parsecondition(char input[],int cntr,char
*dest,int totallen)
{
int index = 0,pos = 0;
while(input[cntr]!= '(' && cntr <= totallen)
cntr++;
if(cntr >= totallen)
return 0;
index = cntr;
while (input[cntr]!=')' && cntr <= totallen)
cntr++;
if(cntr >= totallen)
return 0;
while(index<=cntr)
dest[pos++] = input[index++];
dest[pos]='\0'; //null terminate the string
return cntr; //non zero value
}
/* Function : gen ()
Description : This function generates three
address code
Input : Expression, statement for true condition,
statement for false condition, flag to denote if
the 'else' part is present in the statement
output :Three address code
*/
void gen(char B[],char S1[],char S2[],int elsepart)
{
int Bt =101,Bf = 102,Sn =103;
printf("\n\tIf %s goto %d",B,Bt);
printf("\n\tgoto %d",Bf);
printf("\n%d: ",Bt);
printf("%s",S1);
if(!elsepart)
printf("\n%d: ",Bf);
else
{ printf("\n\tgoto %d",Sn);
printf("\n%d: %s",Bf,S2);
printf("\n%d:",Sn);
}
}
OUTPUT