0% found this document useful (0 votes)

158 views

Compiler Lab Manual PDF

This document contains information about the Compiler Design course offered at Chameli Devi School of Engineering, Indore in the year 2013-2014. It includes details like the scope of the course, disciplines involved, terminology used, why study compilers, applications of compiler technology and tools, and a list of 16 practical assignments for students. The course was taught by Mr. Ajay Jaiswal to semester 7 Computer Science students.

Uploaded by

Anonymous shXs4coZfy

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

158 views

Compiler Lab Manual PDF

Uploaded by

Anonymous shXs4coZfy

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 82

Chameli Devi School Of Engineering, Indore

CDGI'S

CHAMELIDEVI SCHOOL OF ENGINEERING, INDORE.

DEPARTMENT OF COMPUTER SCIENCE

COURSE FILE CONTENT

Year Class/Sem Branch Subject Faculty Name

2013-14 Sem-VII CSE Compiler Mr. Ajay Jaiswal
CS-A &B Design

Content
1.
Scope of the course

2.
Disciplines involved in it

3.
Abstract view for a compiler

4.
Front-end and back-end tasks

5.
Modules

6.
List of Practicals

7.
LINUX O/S

8.
C++ / JAVA program backup

Lab Manual of Compiler Design Page 1

Chameli Devi School Of Engineering, Indore

CHAMELI DEVI GROUP OF INSTITUTES, INDORE.

Chameli Devi School Of Engineering.

Department of Computer Science & Information Technology

Compiler Design Laboratory

Compiler Design [CS-701] Practical

( Year 2013-2014)

Name: ________________________________________

Roll No.: _______________________________________

Branch: _______________________________________

Semester:_______________________________________

Section: ________________________________________

Subject: _______________________________________

Certified by:

Total Practical :
Practicals performed:

Faculty Name/Signature

Lab Manual of Compiler Design Page 2

Chameli Devi School Of Engineering, Indore

CHAMELI DEVI GROUP OF INSTITUTES, INDORE

DEPARTMENT OF COMPUTER SCIENCE & INFORMATION TECHNOLOGY

COMPILER DESIGN LABORATORY (CS-701)

PRACTICAL LIST

S.NO Name of Practical

1.
Implement a Program to count character of a given string without using
space & with using space for the string “a handle of a string is a substring
that matches the right side of a production rule. ”.

2.
Create a file (Comiler.cc) & Implement a Program to read all the content of a
Compiler.cc (how many lines, how many words and how many character in
the file) .

3.
Write a program for implementation of Deterministic Finite Automata (DFA)
for the strings accepted by (“abbb”, “abb”, “ab”,”a”).

4.
Construction of Minimization of Deterministic Finite Automata for the given
diagram & recognize the string (aa + b)*ab(bb)*.

5.
Construct a program for how to Compute FIRST () & FOLLOW () symbol for
LL(1 ) grammar, if the Context free grammar for LL(1) Construction is……..?

6.
Construct a Operator Precedence Parser for the following given grammar
and also compute Leading () and trailing () symbols of the given grammar.

7.
Program using LEX to count the number of characters, words, spaces and
lines in a given input file.

8.
Program using LEX to count the numbers of comment lines in a given C
program. Also eliminate them and copy the resulting program into separate
file.

Lab Manual of Compiler Design Page 3

Chameli Devi School Of Engineering, Indore

9.
Program using LEX to recognize a valid arithmetic expression and to
recognize the identifiers and operators present. Print them separately.

10.
Program using LEX to recognize whether a given sentence is simple or
compound.

11.
Program using LEX to recognize and count the number of identifiers in a
given input file.

12.
Implement a YACC (Yet Another Compiler Compiler ) program to recognize a
valid arithmetic expression that uses operators +, -, * and /.

13.
Implement YACC (Yet Another Compiler Compiler ) program to recognize a
valid variable, which starts with a letter, followed by any number of letters or
digits.

14.
YACC (Yet Another Compiler Compiler ) program to recognize strings
‘aaab’, ‘abbb’, ‘ab’ and ‘a’ using the grammar (anbn, n>= 0).

15.
Program to recognize the Context free grammar (anbn, n>= 10), Where a & b
are input symbols of the grammar.

16.
Write a C program to implement the syntax-directed definition of “if E then
S1” and “if E then S1 else S2”.

Lab Manual of Compiler Design Page 4

Chameli Devi School Of Engineering, Indore

CHAMELI DEVI GROUP OF INSTITUTES, INDORE

Chameli Devi School Of Engineering

Department of Computer Science & Information Technology

Basic Computer Engineering Laboratory

Practical List
S.No. Practical Date of Experiment Date of Submission Signature & Remarks

1. Practical 1
2. Practical 2
3. Practical 3
4. Practical 4
5. Practical 5
6. Practical 6
7. Practical 7
8. Practical 8
9. Practical 9
10. Practical 10
11. Practical 11
12. Practical 12
13. Practical 13
14. Practical 14
15. Practical 15
16. Practical 16
17. Practical 17
18. Practical 18
19. Practical 19
20. Practical 20
21. Practical 21
22. Practical 22
23. Practical 23
24. Practical 24
25. Practical 25

Head of Department Faculty

Lab Manual of Compiler Design Page 5

Chameli Devi School Of Engineering, Indore

LAB MANUAL

SUBJECT NAME--------------

SUBJECT CODE------------------

CLASS------------------------------

SEMESTER-------------------------

FACULTY NAME / SIGNATURE-------------------

FACULTY NAME / SIGNATURE--------------------

Lab Manual of Compiler Design Page 6

Chameli Devi School Of Engineering, Indore

Course scope
Aim:
To learn techniques of a modern compiler

Main reference:
Compilers – Principles, Techniques and Tools, Second Edition by Alfred V. Aho, Ravi Sethi,
Jeffery D. Ullman

Supplementary references:
nd
Modern compiler construction in Java 2 edition Advanced Compiler Design and
Implementation by Muchnick.
Subjects
Lexical analysis (Scanning)
Syntax Analysis (Parsing)
Syntax Directed Translation
Intermediate Code Generation
Run-time environments
Code Generation
Machine Independent Optimization
Compiler learning
Isn’t it an old discipline?
Yes, it is a well-established discipline
Algorithms, methods and techniques are researched and developed in early stages of
computer science growth
There are many compilers around and many tools to generate them automatically
So, why we need to learn it?
Although you may never write a full compiler
But the techniques we learn is useful in many tasks like writing an interpreter for a scripting
language, validation checking for forms and so on.

Lab Manual of Compiler Design Page 7

Chameli Devi School Of Engineering, Indore

Terminology
Compiler:
a program that translates an executable program in one language into an executable program
in another language. we expect the program produced by the compiler to be better, in some
way, than the original

Interpreter:
a program that reads an executable program and produces the results of running that
program. usually, this involves executing the source program in some fashion. Our course is
mainly about compilers but many of the same issues arise in interpreters.

Disciplines involved
 Algorithms
 Languages and machines
 Operating systems
 Computer architectures
Why Study Compilers?
General background information for good software engineer.
Increases understanding of language semantics.
Seeing the machine code generated for language constructs helps understand performance
issues for languages.
Teaches good language design.
New devices may need device-specific languages.
New business fields may need domain-specific languages.

Applications of Compiler Technology & Tools

 Processing XML/other to generate documents, code, etc.

 Processing domain-specific and device-specific languages.
 Implementing a server that uses a protocol such as http or imap
 Natural language processing, for example, spam filter, search, document
comprehension, summary generation
 Translating from a hardware description language to the schematic of a circuit

Lab Manual of Compiler Design Page 8

Chameli Devi School Of Engineering, Indore

 Automatic graph layout (graphviz, for example)

 Extending an existing programming language
 Program analysis and improvement tools

Abstract view
Compilers translate from a source language (typically a high level language) to a functionally
equivalent target language (typically the machine code of a particular machine or a machine-
independent virtual machine).
Compilers for high level programming languages are among the larger and more complex
pieces of software
• Original languages included Fortran and Cobol
• Often multi-pass compilers (to facilitate memory reuse)
• Compiler development helped in better programming language design
• Early development focused on syntactic analysis and optimization
• Commercially, compilers are developed by very large software groups
• Current focus is on optimization and smart use of resources for modern RISC (reduced
instruction set computer) architectures.

Source
code Compiler

errors Mach ine

code

Recognizes legal (and illegal) programs

• Generate correct code
• Manage storage of all variables and code
• Agreement on format for object (or assembly) code

Lab Manual of Compiler Design Page 9

Chameli Devi School Of Engineering, Indore

Principles of Compiler Design Syllabus

Introduction to Compiler:
Translator issues, why to write compiler, compilation process in brief, front end and backend
model, compiler construction tools, Interpreter and the related issues, Cross compiler,
Incremental compiler, Boot strapping.

1. Lexical Analysis

Review of lexical analysis: alphabet, token, lexical error, Block schematic of lexical
analyser, Automatic construction of lexical analyser (LEX), LEX specification details.
2. Syntax Analysis
Introduction: Role of parsers, Parsing technique: Top down-RD parser, Predictive LL
(k) parser, Bottom up-shift-Reduce, SLR, LR(k), LALR etc. using ambiguous grammars,
Error detection and recovery, Automatic construction of parser (YACC), YACC
specifications.
semantic analysis
Need of semantic analysis, type checking and type conversation.
3. Syntax directed translation
Syntax directed definitions, construction of syntax trees, bottom-up evaluation of S-
attribute definition, L-attributed definition , Top-down translation, Bottom-up evaluation of
inherited attributes.
Intermediate code Generation: Intermediate code generation for declaration,
assignment, iterative statements, case statements, arrays, structures, conditional
statements, Boolean expressions, procedure calls, Intermediate code Generation using
YACC
4. Run Time Storage Organisation
Storage allocation strategies, static, dynamic storage allocation, allocation strategies for
block structured and non-block structured languages; O.S. support required for IO
statements. (e.g. printf, scanf) and memory allocation deallocation related statement.
(e.g. new, malloc)

5. Code GenerationIntroduction: Issues in code generation, Target machine description,

Basic blocks and flow graphs, next use representation of basic blocks, Peephole optimisation,

Lab Manual of Compiler Design Page 10

Chameli Devi School Of Engineering, Indore

DAG generating code from a DAG, Dynamic programming, Code generator-generator

concept.
6. Code OptimisationIntroduction, classification of Optimisation, principle sources of
Optimisation, m/c dependent Optimisation, m/c independent optimisation, Optimisation
of basic blocks, loops in flowgraphs, Optimising transformation: compile time evaluation,
Common sub-expression elimination, variable propagation, code Movement, strength
reduction, dead code elimination and loop optimisation, local optimisation, DAG based
local optimisation. Global optimisation: control and data flow analysis, control flow
analysis-concepts and definition, data flow analysis, data flow analysis, Computing data
flow information, meet over paths,Data flow equations. Iterative data flow analysis:
Available exprns, live range identification.

Definition

A compiler is a computer program (or set of programs) that transforms source code written in
a programming language (the source language) into another computer language (the target
language, often having a binary form known as object code).

The Analysis-Synthesis Model of Compilation

There are two parts to compilation:
Analysis determines the operations implied by the source program which are recorded in a
tree structure
Synthesis takes the tree structure and translates the operations therein into the target
program

Other Tools that Use the Analysis-Synthesis Model

 Editors (syntax highlighting)
 Pretty printers (e.g. Doxygen)
 Static checkers (e.g. Lint and Splint)

Interpreters
• Text formatters (e.g. TeX and LaTeX)
• Silicon compilers (e.g. VHDL)
• Query interpreters/compilers (Databases)

Lab Manual of Compiler Design Page 11

Chameli Devi School Of Engineering, Indore

Grouping of phases

Incremental compiler
The term incremental compiler may refer to two different types of compiler.

• Imperative programming
• Interactive Programming
In imperative programming and software development, an incremental compiler is one that
when invoked, takes only the changes of a known set of source files and updates any
corresponding output files (in the compiler's target language, often bytecode) that may
already exist from previous compilations. By effectively building upon previously compiled
output files, the incremental compiler avoids the wasteful recompilation entire source files,

Lab Manual of Compiler Design Page 12

Chameli Devi School Of Engineering, Indore

where most of the code remains unchanged. For most incremental compilers, compiling a
program with small changes to its source code is usually near instantaneous. It can be said
that an incremental compiler reduces the granularity of a language's traditional compilation
units while maintaining the language's semantics, such that the compiler can append and
replace smaller parts.

Cross compiler
A cross compiler is a compiler capable of creating executable code for a platform other than
the one on which the compiler is run.
Cross compiler tools are used to generate executables for embedded system or multiple
platforms.
It is used to compile for a platform upon which it is not feasible to do the compiling, like micro
controllers that don't support an operating system.

Phases of a Compiler

Source Program

1
Lexical Analyzer

2
Syntax Analyzer

3
Semantic Analyzer

Symbol-table Error Handler

Manager
4 Intermediate Code
Generator

5
Code Optimizer

6
Code Generator

Target Program

Lab Manual of Compiler Design Page 13

Chameli Devi School Of Engineering, Indore

Program #01

1. Implement a Program to count character of a given string without

using space & with using space for the string “a handle of a string is a
substring that matches the right side of a production rule. ”/*

lexical analysis (scanning)

token stream 1 3 2 4 1 5 1 token number

ident assign number times ident plus ident
"val" - 10 - "val" - "i" token value

syntax analysis (parsing)

Statement
syntax tree
Expression

Term

ident = number * ident + ident

Lexical Analysis
Stream of characters is grouped into tokens
Examples of tokens are identifiers, reserved words, integers, doubles or floats, delimiters,
operators and special symbols int a; a = a + 2; int reserved word a
identifier ; special symbol a identifier = operator a
identifier + operator 2 integer constant ; special symbol

Examples of Token
Token: A sequence of characters to be treated as a
single unit.
• Examples of tokens.
– Reserved words (e.g. begin, end, struct, if etc.)
– Keywords (integer, true etc.)
– Operators (+, &&, ++ etc)
– Identifiers (variable names, procedure names, parameter names)
– Literal constants (numeric, string, character constants etc.)
– Punctuation marks (:, , etc.)

Lab Manual of Compiler Design Page 14

Chameli Devi School Of Engineering, Indore

A ./* string with using space.*/

Source Code:
#include<iostream.h>
void main()
{
char c[30];
int n=0;
cout<<"Enter the String" <<"\n";
cin>>c;
for(int i=0;c[i]!='\0';i++)
{
n=n+1;
}
cout<<"Length of the string is"<<n;
getch();
}

Input:

Type any strings with combinations of letters

Output:

Total No. of letters/ characters of given string.

Lab Manual of Compiler Design Page 15

Chameli Devi School Of Engineering, Indore

B ./* string without using space.*/

Source Code:
#include <iostream>
using namespace std;
#include <conio.h>
#include <iomanip>
# include <string>

const int SIZE = 100;

void input (char*);
void wordCount (char*);
void longWord (char*, int);
void numbers (char*);
void outputLetterCounts (int letterCount[]);

int main()
{
char string [100] = {'/0'};
char word[100] = {'/0'};
int x = 0; int n = 0;
int letterCount[100] = {0};

input(string);
wordCount(string);
longWord(string, x);
numbers (string);
outputLetterCounts(letterCount);
return 0;
}
void input (char *enter)
{
cout<<"Enter sentence(s) "<< std::endl;
std::cin.getline(enter, SIZE);
int len = strlen( enter );

}
void wordCount (char *word2)
{
int cnt = 0;

while(*word2 != '\0')
{
while(isspace(*word2))
{
++word2;
}

Lab Manual of Compiler Design Page 16

Chameli Devi School Of Engineering, Indore

if(*word2 != '\0')
{
++cnt;
while(!isspace(*word2) && *word2 != '\0')
++word2;
}
}
std::cout<<"Number of words: "<<cnt<<endl;

}
void outputLetterCounts(int letterCount[])
{
for (int l = 0; l < 26; l++)
{
if (letterCount[l] > 0)
{
cout << letterCount[l] << " " << char('a' + l) << endl;
}
}
}

void numbers (char *word2)

{

int num = 0;
int amount = strlen(word2);

for(int i = 0; i < amount; i++)

{
if(isdigit(word2[i]))
num++;

}
std::cout<<"Digits: "<<num<<endl;

}
void longWord (char *temp, int x )
{
int counter = 0;
int max_word = -1;

int length = int(strlen(temp));

for(int i=0; i<length; i++)

{
if(temp[i] !=' ')
{
counter++;
}
else if(temp[i]==' ')

Lab Manual of Compiler Design Page 17

Chameli Devi School Of Engineering, Indore

{
if(counter > max_word)
{
max_word = counter;
}
counter = 0;
}

}
std::cout <<"Longest word:" << max_word;

Input:

Type any strings with combinations of letters / characters of sentence..

Output:

Total No. of letters/ characters of given string without spaces (Excluding white spaces.).

Lab Manual of Compiler Design Page 18

Chameli Devi School Of Engineering, Indore

Program #02

/* Create a file (Comiler.cc) & Implement a Program to read all the content
of a Compiler.cc (how many lines, how many words and how many
character in the file) .*/

Source Code:
#include<stdio.h>
int main()

{
int noc=0,now=0,nol=0;
FILE *fw,*fr;
char fname[20],ch;
printf("\n enter the source file name’ Comiler.cc’");
gets(fname);
fr=fopen(fname,"r");
if(fr==NULL)
{
printf("\n error \n");
exit(0);
}
ch=fgetc(fr);
while(ch!=EOF)
{
noc++;
if(ch==' ');
now++;
if(ch=='\n')
{
nol++;
now++;
}
ch=fgetc(fr);
}
fclose(fr);
printf("\n total no of character=%d",noc);
printf("\n total no of words=%d",now);
printf("\n total no of lines=%d",nol);
return 0;
}

Lab Manual of Compiler Design Page 19

Chameli Devi School Of Engineering, Indore

#include <fstream>
#include <iostream>
#include <string>
//#include <cctype>

using namespace std;

int main()
{
string name; // name of the file to be opened

char ch; // character to be read into the loop

fstream fileName; // declare the fstream object

// list of integers to hold the count information

int characters = 0,
words = 0,
sentences = 0,
lines = 0;

// list of constants used to define words, lines, and sentences

const char EOLN = '\n', // end of line character
SENT = '.', // end of sentence character
BLANK = ' '; // end of word character

do
{
int count; // just a counter
// Prompt for user input and open the specified file
if (count != 1)
{
cout << "Enter the name of a file: ";
}
else
{
cout << "File Not Found!\nEnter the name of a file: ";
}
cin >> name;
cin.ignore(); // ignore the next character in the buffer
fileName.open(name.c_str()); // convert name to a c-style string

count = 1;

} while (!fileName);

// use a while loop to perform the required operations

while ( !fileName.eof())
{

Lab Manual of Compiler Design Page 20

Chameli Devi School Of Engineering, Indore

char prevChar; // track the last character analyzed

fileName.get(ch); // get each character from the file

cout << ch; // and print it to the screen

characters ++; // count the characters in the file

// count the words in the file

if ((ch == BLANK) && (prevChar != BLANK))

{
words ++;
}

if ( ch == SENT ) // count the sentences in the file

{
sentences ++;
} // end of the sentence if

if ( ch == EOLN ) // count the lines in the file

{
lines ++;
words ++; // count the next word here
} // end of end-of-line if

prevChar = ch;

} // end of while loop

fileName.clear(); // clear the fail state

fileName.close(); // close the file

// display a summary of the file analysis

cout << "\nThere are " << characters << " characters in this file.\n";
cout << "There are " << words << " words in this file.\n";
cout << "There are " << sentences << " sentences in this file.\n";
cout << "There are " << lines << " lines in this file.\n";

return 0;
}

Lab Manual of Compiler Design Page 21

Chameli Devi School Of Engineering, Indore

Input:

Type any strings with combinations of letters / characters of sentence..Construct automata.

Output:

Sequences of transition states with accepting states.

Lab Manual of Compiler Design Page 22

Chameli Devi School Of Engineering, Indore

Program #03

/*Write a program for implementation of Deterministic Finite Automata

(DFA) for the strings accepted by (“abbb”, “abb”, “ab”,”a”).*/

Deterministic finite automata (DFA) :

A deterministic finite automaton (DFA) is a 5-tuple: (S, Σ, T, s, A)

· an alphabet (Σ)

· a set of states (S)

· a transition function (T : S × Σ → S).

· a start state

· a set of accept states

The machine starts in the start state and reads in a string of symbols from its alphabet. It
uses the transition function T to determine the next state using the current state and the
symbol just read. If, when it has finished reading, it is in an accepting state, it is said to accept
the string, otherwise it is said to reject the string. The set of strings it accepts form a
language, which is the language the DFA recognizes.

Non-Deterministic Finite Automaton (N-DFA):

A Non-Deterministic Finite Automaton (NFA) is a 5-tuple: (S, Σ, T, s, A)

· an alphabet (Σ)

· a set of states (S)

· a transition function (T: S × Σ → S).

· a start state

· a set of accept states

Where P(S) is the power set of S and ε is the empty string. The machine starts in the start
state and reads in a string of symbols from its alphabet. It uses the transition relation T to
determine the next state(s) using the current state and the symbol just read or the empty
string. If, when it has finished reading, it is in an accepting state, it is said to accept the string,
otherwise it is said to reject the string. The set of strings it accepts form a language, which is

Lab Manual of Compiler Design Page 23

Chameli Devi School Of Engineering, Indore

the language the NFA recognizes.

Source Code:
#include<stdio.h>
#include<iostream.h>
#include<stdlib.h>
#include<conio.h>
void main()
{int n,m,start,nf,ps;
char str[20];
clrscr();
printf("enter no of states");
scanf("%d",&n);
printf("enter no of inputs");
scanf("%d",&m);
//constructing buffers
int **tran=new int* [n];
for(int i=0;i<n;i++)
{
tran[i]=new int[m];
}
for(i=0;i<n;i++)
{
for(int j=0;j<m;j++)
{
printf("enter next state for present state %d on input%d ",i,j);
scanf("%d",&tran[i][j]);
}
}
printf("enter starting state");
scanf("%d",&start);

printf("enter no of final states");

scanf("%d",&nf);
int* final=new int[nf];
for(i=0;i<nf;i++)
{
printf("enter the state");
scanf("%d",&final[i]);
}

printf("enter string");
scanf("%s",str);

i=0;
ps=start;

Lab Manual of Compiler Design Page 24

Chameli Devi School Of Engineering, Indore

while(str[i]!='\0')
{
ps=tran[ps][str[i]-48];
i++;
}
for(i=0;i<n;i++)
{
if(ps==final[i])
{
printf("accepted");
break;
}
}
//deleting buffer
delete final;
for(i=0;i<n;i++)
{
delete tran[i];
}
delete tran;
}

Lab Manual of Compiler Design Page 25

Chameli Devi School Of Engineering, Indore

Program #04

/*Construction of minimization of Deterministic Finite Automata

for the given diagram & recognize the string (aa + b)*ab(bb)*. */
Minimizing Finite Automata
Consider the finite automaton shown in figure 1 which accepts the regular set denoted by the
regular expression (aa + b)*ab(bb)*. Accepting states are colored yellow while rejecting states
are blue.

Figure 1 - Recognizer for (aa + b)ab(bb)

Closer examination reveals that states s2 and s7 are really the same since they are both
accepting states and both go to s6 under the input b and both go to s3 under an a. So, why
not merge them and form a smaller machine? In the same manner, we could argue for
merging states s0 and s5. Merging states like this should produce a smaller automaton that
accomplishes exactly the same task as our original one.

From these observations, it seems that the key to making finite automata smaller is to
recognize and merge equivalent states. To do this, we must agree upon the definition of
equivalent states. Here is one formulation of what Moore defined as indistinguishable states.

Definition. Two states in a finite automaton M are equivalent if and only if for
every string x, if M is started in either state with x as input, it either accepts in
both cases or rejects in both cases.

Another way to say this is that the machine does the same thing when started in either state.
This is especially necessary when finite automata produce output.

Lab Manual of Compiler Design Page 26

Chameli Devi School Of Engineering, Indore

Two questions remain. First, how does one find equivalent states, and then, exactly how
valuable is this information? We shall answer the second question first by providing a
corollary to a famous theorem proven long ago by Myhill [3] and Nerode [4].

Corollary. For a deterministic finite automaton M, the minimum number of

states in any equivalent deterministic finite automaton is the same as the
number of equivalence classes of M's states.

With one more observation, we shall be able to present an algorithm for transforming an
automaton into its smallest equivalent machine.

Fact. Equivalent states go to equivalent states under all inputs.

Now we know that if we can find the equivalence classes (or groups of equivalent states) for
an automaton, then we can use these as the states of the smallest equivalent machine. The
machine shown in figure 1 will be used as an example for the intuitive discussion that follows.

Let us first divide the machine's states into two groups: accepting and rejecting states. These
groups are: A = {s2, s7} and B = {s0, s1, s3, s4, s5, s6}. Note that these are equivalent under
the empty string as input.

Then, let us find out if the states in these groups go to the same group under inputs a and b.
As we noted at the beginning of this discussion, the states of group A both go to states in
group B under both inputs. Things are different for the states of group B. The following table
shows the result of applying the inputs to these states. (For example, the input a leads from
s1 to s5 in group B and input b leads to to s2 in group A.)

in state: s0 s1 s3 s4 s5 s6

a leads to: B B B B B B

b leads to: B A B B B A

Looking at the table we find that the input b helps us distinguish between two of the states (s1
and s6) and the rest of the states in the group since it leads to group A for these two instead
of group B. Thus the states in the set {s0, s3, s4, s5} cannot be equivalent to those in the set
{s1, s6} and we must partition B into two groups. Now we have the groups:

A = {s2, s7}, B = { s0, s3, s4, s5}, C = { s1, s6}

and the next examination of where the inputs lead shows us that s3 is not equivalent to the
rest of group B. We must partition again.

Lab Manual of Compiler Design Page 27

Chameli Devi School Of Engineering, Indore

Continuing this process until we cannot distinguish between the states in any group by
employing our input tests, we end up with the groups:

A = {s2, s7}, B = {s0, s4, s5}, C = {s1}, D = {s3}, E = { s6}.

In view of the above theoretical definitions and results, it is easy to argue that all of the states
in each group are equivalent because they all go to the same groups under the inputs a and
b. Thus in the sense of Moore the states in each group are truly indistinguishable. We also
can claim that due to the corollary to the Myhill-Nerode theorem, any automaton that accepts
(aa + b)*ab(bb)* must have at least five states.Building the minimum state finite automaton is
now rather straightforward. We merely use the equivalence classes (our groups) as states
and provide the proper transitions. This gives us the finite automaton pictured in figure 2.

Figure 2 - A Minimal Automaton

Here is the state minimization algorithm.

The complexity of this algorithm is O(n2) since we check all of the states each time we
execute the repeat loop and might have to execute the loop n times since it might take an
input of length n to distinguish between two states. A faster algorithm was later developed by
Hopcroft.

Lab Manual of Compiler Design Page 28

Chameli Devi School Of Engineering, Indore

Source Code:
#include<stdio.h>
#include<iostream.h>
#include<string.h>
#include<stdlib.h>
#include<conio.h>
void main()
{
int nstates,minputs,start,nf,ps;
char str[20];
clrscr();
printf("enter no of states");
scanf("%d",&nstates);
printf("enter no of inputs");
scanf("%d",&minputs);
//constructing buffers
int **tran=new int* [nstates];
for(int i=0;i<nstates;i++)
{
tran[i]=new int[minputs];
}
for(i=0;i<nstates;i++)
{
for(int j=0;j<minputs;j++)
{
printf("enter next state for present state %d on input%d ",i,j);
scanf("%d",&tran[i][j]);
}
}
printf("enter starting state");
scanf("%d",&start);

printf("enter no of final states");

scanf("%d",&nf);
int* final=new int[nf];
for(i=0;i<nf;i++)
{
printf("enter the state");
scanf("%d",&final[i]);
}
int *stategroup=new int[nstates],**groupgroup=new int*[nstates];
memset(stategroup,-1,nstates*sizeof(int));

int **groupstate=new int*[nstates];

for(i=0;i<nstates;i++)
{
groupstate[i]=new int[nstates+1];
groupgroup[i]=new int[2];

Lab Manual of Compiler Design Page 29

Chameli Devi School Of Engineering, Indore

memset(groupgroup[i],-1,2*sizeof(int));
memset(groupstate[i],-1,(nstates+1)*sizeof(int));
}

for(i=0;i<nf;i++)
{
stategroup[final[i]]=0;
groupstate[0][final[i]]=1;
}

groupstate[0][nstates]=10;//means a partition having final states

for(i=0;i<nstates;i++)
{
if(stategroup[i]!=0)
{
stategroup[i]=1;
groupstate[1][i]=1;
}
}

groupstate[1][nstates]=100;//means a partition having no final states

int groupcount=2;
int count=0,change;
///////////////////////////////minimization starts here/////////////////////////////////////////////////
do
{
change=0;
for(int j=0;j<minputs;j++)
{
int lastgroup,presentgroup,state,latestgroupcount,maxgroupcount=0;
for(int group=0;group<groupcount;group++)
{ count=0;
while(groupstate[group][count]!=1&&count<nstates)
{count++;
}
groupgroup[group][0]=stategroup[tran[ count ][j] ];
groupgroup[group][1]=groupstate[group][nstates];
}
for( group=0;group<groupcount;group++)
{
latestgroupcount=groupcount;
lastgroup=groupgroup[group][0];
count=0;
while(count<nstates)
{
if((state=groupstate[group][count])==1)
{
presentgroup=stategroup[tran[count][j]];
if(presentgroup!=lastgroup)

Lab Manual of Compiler Design Page 30

Chameli Devi School Of Engineering, Indore

{
change=1;
//find for any group going to presentgroup
int flag=0;
for(int
anygroup=0;anygroup<latestgroupcount;anygroup++)
{
if(groupgroup[anygroup]
[0]==presentgroup
&&groupgroup[anygroup]
[1]==groupgroup[group][1])
{flag=1;
break;
}
}
//change groupgroup
//change stategroup
//change groupstate and groupcount
anygroup=flag==1?
anygroup:latestgroupcount++;
groupgroup[anygroup][0]=presentgroup;
groupgroup[anygroup][1]=groupgroup[group]
[1];

stategroup[count]=anygroup;

groupstate[anygroup][count]=1;
groupstate[group][count]=-1;
groupstate[anygroup]
[nstates]=groupgroup[group][1];
}
}

count++;
}//end of while
if(maxgroupcount<latestgroupcount){maxgroupcount=latestgroupcount;}
}//checking all the groups for loop
groupcount=maxgroupcount;
}//checking all the inputs for loop

}while(change!=0);
/////////////////////////////end of minimization////////////////////////////////////////////////

printf("\n\nGroups\n\n");
for(i=0;i<groupcount;i++)
{
printf("%d ",i);
for(int j=0;j<nstates;j++)

Lab Manual of Compiler Design Page 31

Chameli Devi School Of Engineering, Indore

{
if(groupstate[i][j]!=-1)
printf(" %d ",j);
}
printf("\n");
}
//deleting buffer
delete stategroup;
delete final;
for(i=0;i<nstates;i++)
{
delete tran[i];
delete groupstate[i];
delete groupgroup[i];
}
delete groupgroup;
delete tran;
delete groupstate;
}

Lab Manual of Compiler Design Page 32

Chameli Devi School Of Engineering, Indore

INPUT:

Recognizer for (aa + b)ab(bb)

OUTPUT:

A Minimal Automaton for (aa + b)ab(bb)

Lab Manual of Compiler Design Page 33

Chameli Devi School Of Engineering, Indore

Program #05
/*Construct a program for how to calculate FIRST () & FOLLOW () symbol
for LL(1 ) grammar, if the Context free grammar for LL(1) Construction is

S/aBDh
B/cC
C/bC/@
D/E/F
E/g/@
F/f/@

Compute FIRST () & FOLLOW ( ) symbol for LL(1 ) grammar ? */

The construction of a predictive parser is aided by two functions associated with a grammar
G. These functions, FIRST and FOLLOW, allow us to fill in the entries of a predictive parsing
table for G, whenever possible. Sets of tokens yielded by the FOLLOW function can also be
used as synchronizing tokens during panic-mode error recovery.just suppose for a sec that
you r ll(1) parser and you have n supernatural power of seeing the future of string by one
step.

FIRST(α)

If α is any string of grammar symbols, let FIRST(α) be the set of terminals that begin the
strings derived from α. If α ⇒ ε then ε is also in FIRST(α).

To compute FIRST(X) for all grammar symbols X, apply the following rules until no more
terminals or ε can be added to any FIRST set:

1. If X is terminal, then FIRST(X) is {X}.

2.If X → ε is a production, then add ε to FIRST(X).

3.If X is nonterminal and X →Y1 Y2 … Yk . is a production, then place a in FIRST(X) if for

some i, a is in FIRST(Yi), and ε is in all of FIRST(Y1), … , FIRST(Yi-1); that is, Y1, … ,Yi-1 ⇒
ε. If ε is in FIRST(Yj) for all j = 1, 2, … , k, then add ε to FIRST(X). For example, everything in
FIRST(Y1) is surely in

FIRST(X). If Y1 does not derive ε, then we add nothing more to FIRST(X), but if Y1⇒ ε, then
we add FIRST(Y2) and so on.

Now, we can compute FIRST for any string X1X2 . . . Xn as follows. Add to FIRST(X1X2 …
Xn) all the non-ε symbols of FIRST(X1). Also add the non-ε symbols of FIRST(X2) if ε is in
FIRST(X1), the non-ε symbols of FIRST(X 3) if ε is in both FIRST(X 1) and FIRST(X2), and so
on. Finally, add ε to FIRST(X1X2 … Xn) if, for all i, FIRST(X i) contains ε.

Lab Manual of Compiler Design Page 34

Chameli Devi School Of Engineering, Indore

FOLLOW(A)

Define FOLLOW(A), for nonterminal A, to be the set of terminals a that can appear
immediately to the right of A in some sentential form, that is, the set of terminals a such that
there exists a derivation of the form S⇒αΑaβ for some α and β. Note that there may, at some
time during the derivation, have been symbols between A and a, but if so, they derived ε and
disappeared. If A can be the rightmost symbol in some sentential form, then $, representing
the input right endmarker, is in FOLLOW(A).

To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be
added to any

FOLLOW set:

1.Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.

2.If there is a production A ⇒ αΒβ, then everything in FIRST(β), except for ε, is placed in
FOLLOW(B).

3.If there is a production A ⇒ αΒ, or a production A ⇒ αΒβ where FIRST(β) contains ε (i.e., β
⇒ε),

then everything in FOLLOW(A) is in FOLLOW(B).

EXAMPLE:

Consider the expression grammar :

E → T E’

E’→ + T E’ | ε

T → F T’

T’→ * F T’ | ε

F → ( E ) | id

Then:

FIRST(E) = FIRST(T) = FIRST(F) = {( , id}

FIRST(E’) = {+, ε}

FIRST(T’) = {*, ε}

FOLLOW(E) = FOLLOW(E’) = {) , $}

Lab Manual of Compiler Design Page 35

Chameli Devi School Of Engineering, Indore

FOLLOW(T) = FOLLOW(T’) = {+, ), $}

FOLLOW(F) = {+, *, ), $

Algorithm:

FIRST :

1. If first character of production is terminal then becomes first.

eg. first(abAb)={a};

2.If a production of this type A->BCD... means all are variable or non-terminal then
if first(B) donot contains null then first(A)=first(B)
stop here.
else
then also check for next non terminal like C here same as above step and
first(A)=First(B)+first(C).
if we get null stop there.
FOLLOW:

if you know first then you can easily go with follow.

1.if a variable is start symbol then follow=$.

2.if a production is of A->(any string1)B(any string2) then follow(B)=first(any string2) -

{null}

3.if a production is of A->(any string)B then follow(B)=follow(A) .

stop
it is very simple, please try to understand!

Lab Manual of Compiler Design Page 36

Chameli Devi School Of Engineering, Indore

Source Code:
#include"stdio.h"
#include<conio.h>
char array[10][20],temp[10];
int c,n;void fun(int,int[]);
int fun2(int i,int j,int p[],int key)
{
int k;
if(!key)
{
for(k=0;kc)return 1;
else return 0;
}
}
void fun(int i,int p[])
{
int j,k,key;
for(j=2;array[i][j]!='';j++)
{
if(array[i][j-1]=='/')
{
if(array[i][j]>='A'&&array[i][j]<='Z')
{
key=0;
fun2(i,j,p,key);
}
else
{
key=1;
if(fun2(i,j,p,key))
temp[++c]=array[i][j];
if(array[i][j]=='@'&&p[0]!=-1)
{ //taking ,@, as null symbol.
if(array[p[0]][p[1]]>='A'&&array[p[0]][p[1]]<='Z')
{
key=0;
fun2(p[0],p[1],p,key);
}
else
if(array[p[0]][p[1]]!='/'&&array[p[0]][p[1]]!='')
{
if(fun2(p[0],p[1],p,key))
temp[++c]=array[p[0]][p[1]];
}
}
}
}
}
}

Lab Manual of Compiler Design Page 37

Chameli Devi School Of Engineering, Indore

void main()
{
int p[2],i,j;
clrscr();
printf("Enter the no. of productions :");
scanf("%d",&n);
printf("Enter the productions :\n");
for(i=0;i<n;i++)
scanf("%s",array[i]);
for(i=0;i<n;i++)
{
c=-1,p[0]=-1,p[1]=-1;
fun(i,p);
printf("First(%c) : [ ",array[i][0]);
for(j=0;j<=c;j++)
printf("%c,",temp[j]);
printf("\b ].\n");
getch();
}
}

INPUT:

S/aBDh
B/cC
C/bC/@
D/E/F
E/g/@
F/f/@

OUTPUT:

Enter the no. of productions :6

Enter the productions :
S/aBDh
B/cC
C/bC/@
D/E/F
E/g/@
F/f/@
First(S) : [ a ].
First(B) : [ c ].
First(C) : [ b,@ ].
First(D) : [ g,@,f ].
First(E) : [ g,@ ].
First(F) : [ f,@ ].

Lab Manual of Compiler Design Page 38

Chameli Devi School Of Engineering, Indore

Program #06

/*Construct a Operator Precedence Parser for the following given

grammar and also compute Leading () and trailing () symbols of
the given grammar. */

Operator-Precedence Parser
Operator grammar
small, but an important class of grammars we may have an efficient operator precedence
parser (a shift-reduce parser) for an operator grammar.

In an operator grammar, no production rule can have:

-> at the right side
-> two adjacent non-terminals at the right side.

Precedence Relations
In operator-precedence parsing, we define three disjoint precedence relations between
certain pairs of terminals.

.
a < b b has higher precedence than a
a =· b b has same precedence as a
.
a > b b has lower precedence than a

The determination of correct precedence relations between terminals are based on the
traditional notions of associativity and precedence of operators. (Unary minus causes a
problem).
The intention of the precedence relations is to find the handle of a right-sentential form,
.
< with marking the left end,
=· appearing in the interior of the handle, and
.
> marking the right hand.

In our input string $a a ...a $, we insert the precedence relation between the pairs of
1 2 n
terminals (the precedence relation holds between the terminals in that pair).

Lab Manual of Compiler Design Page 39

Chameli Devi School Of Engineering, Indore

Using Operator -Precedence Relations

E -> E+E | E-E | E*E | E/E | E^E | (E) | -E | id

Then the input string id+id*id with the precedence relations inserted will be:

. . . . . .
$ < id > + < id > * < id > $

Operator-Precedence Parsing Algorithm

The input string is w$, the initial stack is $ and a table holds precedence relations
between certain terminals

Algorithm:
set p to point to the first symbol of w$ ;
repeat forever
if ( $ is on top of the stack and p points to $ ) then return
else {
let a be the topmost terminal symbol on the stack and let b be the symbol pointed to
by p;
if ( a <. b or a =· b ) then { /* SHIFT */
push b onto the stack;
advance p to the next input symbol;
}
else if ( a .> b ) then /* REDUCE */
repeat pop stack
until ( the top of stack terminal is related by <. to the terminal most recently popped
);
else error();
}

Lab Manual of Compiler Design Page 40

Chameli Devi School Of Engineering, Indore

Operator-Precedence Parsing Algorithm – Example

stack input action
.
$ id+id*id$ $ < id shift
.
$id +id*id$ id > + reduce E  id
$ +id*id$ shift
$+ id*id$ shift
.
$+id *id$ id > * reduce E  id
$+ *id$ shift
$+* id$ shift
.
$+*id $ id > $ reduce E  id
.
$+* $ * > $ reduce E  E*E
.
$+ $ + > $ reduce E  E+E
$ $ accept

Disadvantages of Operator Precedence Parsing

Disadvantages :
It cannot handle the unary minus (the lexical analyzer should handle the unary minus).
Small class of grammars.

Difficult to decide which language is recognized by the grammar.

Advantages :
simple

powerful enough for expressions in programming languages

Lab Manual of Compiler Design Page 41

Chameli Devi School Of Engineering, Indore

Source Code:
#include<iostream.h>
#include<stdio.h>
#include<conio.h>
#include<ctype.h>
#include<string.h>

char **arr;// contains productions for different Non terminals

//having non terminals at arr[i][0] and rest contains productions
//************************IMPORTANT******************************
//remember symbol @ is just another terminal there
/****************************************************************/

//examples of productions used by program

//S->a/^/(T) '/' is used to define multiple productions for the same Non
terminal
//T->T,S/S

//arr will contain the productions as follows

// 0 1 2 3 4 5 6 7 8 9 10
//
//arr[0] S a / ^ / ( T )
//arr[1] T T , S / S

int *flagstate;// to see whether leading has already been found for a Non terminal
char **foundlead;//contains the already found leading for a Non terminal
int *NtSymbols;//used to reduce time complexity by storing where the
productions for a non terminal are stored in arr
char **foundtrail;//contains the already found trailing for a Non terminal
int **trailgoesto;//to tell which Non Terminals trailing goes to whose trailing
int **leadgoesto;//to tell which Non Terminals leading goes to whose leading

int strmergeunique(chardest,const char source)

{
int strlength=strlen(source),change=0;
for(int i=0;i<strlength;i++)
{
if(!strchr(dest,source[i]))
{
dest[strlen(dest)+1]='\0';
dest[strlen(dest)]=source[i];
change=1;
Lab Manual of Compiler Design Page 42
Chameli Devi School Of Engineering, Indore

}
}
return change;
}
void leading(int no_of_nonterminals)
{
int nonterminals=0;
char Gamma,str[10]={'\0'};
while(nonterminals<no_of_nonterminals)
{
for(int eachletter=1;arr[nonterminals][eachletter]!='\0';eachletter++)
{
Gamma=arr[nonterminals][eachletter];
if(isupper(Gamma))
{
leadgoesto[ NtSymbols[toascii(Gamma)-65] ][
leadgoesto[NtSymbols[toascii(Gamma)-65]][0]+1 ]=nonterminals;
leadgoesto[ NtSymbols[toascii(Gamma)-65] ][0]++;
continue;
}
else
{
if(Gamma=='\x0')
{break;}
if(Gamma=='/')
{continue;}

str[0]=Gamma;
str[1]='\0';
strmergeunique(foundlead[nonterminals],str);

while(arr[nonterminals][eachletter+1]!='\x0'&&
arr[nonterminals][eachletter+1]!='/')
{
eachletter++;
}
}
nonterminals++;
}

int change=0;
Lab Manual of Compiler Design Page 43
Chameli Devi School Of Engineering, Indore

do
{
change=0;
for(int i=0;i<nonterminals;i++)
{
for(int j=1;j<=leadgoesto[i][0];j++)
{
change|=strmergeunique(foundlead[leadgoesto[i][j]],foundlead[i]);
}
}
}
while(change);

}
void trailing(int no_of_nonterminals)
{
int nonterminals=0;
char Delta,str[10]={'\0'};
while(nonterminals<no_of_nonterminals)
{
int eachletter=strlen(arr[nonterminals])-1;
for(;eachletter>0;eachletter--)
{

Delta=arr[nonterminals][eachletter];
// *******alpha B
if(isupper(Delta))
{
trailgoesto[ NtSymbols[toascii(Delta)-65] ][
trailgoesto[NtSymbols[toascii(Delta)-65]][0]+1 ]=nonterminals;
trailgoesto[ NtSymbols[toascii(Delta)-65] ][0]++;

if(arr[nonterminals][eachletter-1]!='/'&&
eachletter-1>0)
{Delta=arr[nonterminals][eachletter-1];
if(!isupper(Delta))
{
str[0]=Delta;
str[1]='\0';
strmergeunique(foundtrail[nonterminals],str);

}
}
Lab Manual of Compiler Design Page 44
Chameli Devi School Of Engineering, Indore

}
// B alpha
// ***** alpha
else
{
if(Delta=='/')
{continue;}

str[0]=Delta;
str[1]='\0';
strmergeunique(foundtrail[nonterminals],str);
Delta=arr[nonterminals][eachletter-1];
if(isupper(Delta)&&eachletter-1>0)
{

trailgoesto[ NtSymbols[toascii(Delta)-65] ][
trailgoesto[NtSymbols[toascii(Delta)-65]][0]+1 ]=nonterminals;
trailgoesto[ NtSymbols[toascii(Delta)-65] ][0]++;

}
}

while(eachletter-1>0&&
arr[nonterminals][eachletter-1]!='/')
{
eachletter--;
}
}
nonterminals++;
}
int change=0;
do
{
change=0;
for(int i=0;i<nonterminals;i++)
{
for(int j=1;j<=trailgoesto[i][0];j++)
{
change|=strmergeunique(foundtrail[trailgoesto[i][j]],foundtrail[i]);
}
}
}
while(change);
Lab Manual of Compiler Design Page 45
Chameli Devi School Of Engineering, Indore

}
void main()
{
int nt;
clrscr();
printf("Enter no.of nonterminals :");
scanf("%d",&nt);
arr=new char*[nt];
foundlead=new char*[nt];
foundtrail=new char*[nt];
flagstate=new int[nt];
leadgoesto=new int*[nt];
trailgoesto=new int*[nt];
NtSymbols=new int[26];
for (int i=0;i<nt;i++)
{
arr[i]=new char[100];
foundlead[i]=new char[10];
memset(foundlead[i],'\0',10);
foundtrail[i]=new char[10];
memset(foundtrail[i],'\0',10);
flagstate[i]=0;
leadgoesto[i]=new int[nt];
leadgoesto[i][0]=0;
trailgoesto[i]=new int[nt];
trailgoesto[i][0]=0;
printf("Enter non terminal ");
cin>>arr[i][0];
flushall();
printf("Enter Production for %c------>",arr[i][0]);
gets(arr[i]+1);

NtSymbols[toascii(arr[i][0])-65]=i;

char prod[50];
leading(nt);
trailing(nt);
cout<<endl<<endl;
for(i=0;i<nt;i++)
{
printf("leading (%c)--> { %s }\n",arr[i][0],foundlead[i]);
Lab Manual of Compiler Design Page 46
Chameli Devi School Of Engineering, Indore

printf("trailing (%c)--> { %s }\n",arr[i][0],foundtrail[i]);

delete arr[i];
delete foundlead[i];
delete foundtrail[i];
delete leadgoesto[i];
delete trailgoesto[i];
}
delete arr;
delete flagstate;
delete foundlead;
delete NtSymbols;
delete foundtrail;
delete trailgoesto;
delete leadgoesto;
}

Lab Manual of Compiler Design Page 47

Chameli Devi School Of Engineering, Indore

Program #07

Program using LEX to count the number of characters, words, spaces and
Lines in a given input file.

Lexical Analyzer

The main task of the lexical analyzer is to read the input source program, scanning the
characters, and produce a sequence of tokens that the parser can use for syntactic analysis.
The interface may be to be called by the parser to produce one token at a time Maintain
internal state of reading the input program (with lines) Have a function “getNextToken” that
will read some characters at the current state of the input and return a token to the parser
Other tasks of the lexical analyzer include Skipping or hiding whitespace and comments
Keeping track of line numbers for error reporting Sometimes it can also produce the
annotated lines for error reports Produce the value of the token Optional: Insert identifiers into
the symbol table

Character Level Scanning

The lexical analyzer needs to have a well-defined valid character set Produce invalid
character errors Delete invalid characters from token stream so as not to be used in the
parser analysis

E.g. don’t want invisible characters in error messages For every end-of-line, keep track of line
numbers for error reporting Skip over or hide whitespace and comments If comments are
nested (not common), must keep track of nesting to find end of comments May produce
hidden tokens, for convenience of scanner structure Always produce an end-of-file token
Important that quoted strings and comments don’t get stuck if an unexpected end of file
occurs

Lab Manual of Compiler Design Page 48

Chameli Devi School Of Engineering, Indore

Source Code:

%{
int ch=0, bl=0, ln=0, wr=0;
%}
%%
[\n] {ln++;wr++;}
[\t] {bl++;wr++;}
[" "] {bl++;wr++;}
[^\n\t] {ch++;}
%%
int main()
{
FILE *fp;
char file[10];
printf("Enter the filename: ");
scanf("%s", file);
yyin=fp;
yylex();
printf("Character=%d\nBlank=%d\nLines=%d\nWords=%d", ch, bl, ln, wr);
return 0;
}

Lab Manual of Compiler Design Page 49

Chameli Devi School Of Engineering, Indore

INPUT:
A input file (.doc or any format), counts number of characters, words, spaces and Lines in a
given input file.

OUTPUT:

$cat > input

Girish rao salanke
$lex p1a.l
$cc lex.yy.c –ll
$./a.out
Enter the filename: input
Character=16
Blank=2
Lines=1
Word=3

Lab Manual of Compiler Design Page 50

Chameli Devi School Of Engineering, Indore

Program #08

Program using LEX to count the numbers of comment lines in a given C/

C++/JAVA program. Also eliminate them and copy the resulting program
into separate file.

Compiler-construction tools
Originally, compilers were written “from scratch”, but now the situation is quite different. A
number of tools are available to ease the burden.

We will study tools that generate scanners and parsers. This will involve us in some theory,
regular expressions for scanners and various grammars for parsers. These techniques are
fairly successful. One drawback can be that they do not execute as fast as “hand-crafted”
scanners and parsers.

We will also see tools for syntax-directed translation and automatic code generation. The
automation in these cases is not as complete.

Finally, there is the large area of optimization. This is not automated; however, a basic
component of optimization is “data-flow analysis” (how values are transmitted between parts
of a program) and there are tools to help with this task.

Lexical Analysis (or Scanning)

The character stream input is grouped into meaningful units called lexemes, which are then
mapped into tokens, the latter constituting the output of the lexical analyzer. For example,
any one of the following

x3 = y + 3;
x3 = y + 3 ;
x3 =y+ 3 ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.

A token is a <token-name,attribute-value> pair. For example

1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for
identifier. The value 1 is the index of the entry for x3 in the symbol table produced by
the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a
pair, whose second component is ignored. The point is that there are many different
identifiers so we need the second component, but there is only one assignment symbol
=.
3. The lexeme y is mapped to the token <id,2>

Lab Manual of Compiler Design Page 51

Chameli Devi School Of Engineering, Indore

4. The lexeme + is mapped to the token <+>.

5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters.
It is mapped to <number,something>, but what is the something. On the one hand
there is only one 3 so we could just use the token <number,3>. However, there can be
a difference between how this should be printed (e.g., in an error message produced
by subsequent phases) and how it should be stored (fixed vs. float vs double). Perhaps
the token should point to the symbol table where an entry for this kind of 3 is stored.
Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.

Note that non-significant blanks are normally removed during scanning. In C, most blanks are
non-significant. Blanks inside strings are an exception.

Note that we can define identifiers, numbers, and the various symbols and punctuation
without using recursion (compare with parsing below).

Lab Manual of Compiler Design Page 52

Chameli Devi School Of Engineering, Indore

Source Code:

%{
int com=0;
%}
%%
"/*"[^\n]+"*/" {com++;fprintf(yyout, " ");}
%%
int main()
{
printf("Write a C program\n");
yyout=fopen("output", "w");
yylex();
printf("Comment=%d\n",com);
return 0;
}

Lab Manual of Compiler Design Page 53

Chameli Devi School Of Engineering, Indore

OUTPUT:

$lex p1b.l
$cc lex.yy.c –ll
$./a.out
Write a C program
#include<stdio.h>
int main()
{
int a, b;
/*float c;*/
printf(“Hai”);
/*printf(“Hello”);*/
}
[Ctrl-d]
Comment=1

$cat output
#include<stdio.h>
int main()
{
int a, b;
printf(“Hai”);
}

Lab Manual of Compiler Design Page 54

Chameli Devi School Of Engineering, Indore

Program #09
Program using LEX to recognize a valid arithmetic expression and to
recognize the identifiers and operators present. Print them separately.

Some Regular Expressions for Flex

\"[^"]*\" string
"\t"|"\n"\" " whitespace (most common forms)
[a-zA-Z]
[a-zA-Z_][a-zA-Z0-9_]* identifier: allows a, aX, a45__
[0-9]*"."[0-9]+ allows .5 but not 5.
[0-9]+"."[0-9]* allows 5. but not .5
[0-9]*"."[0-9]* allows . by itself !!

The user must supply a lexical analyzer to read the input stream and communicate tokens
(with values, if desired) to the parser. The lexical analyzer is an integer-valued function called
yylex. The function returns an integer, the token number, representing the kind of token read.
If there is a value associated with that token, it should be assigned to the external variable
yylval.

The parser and the lexical analyzer must agree on these token numbers in order for
communication between them to take place. The numbers may be chosen by Yacc, or chosen
by the user. In either case, the ``# define'' mechanism of C is used to allow the lexical
analyzer to return these numbers symbolically. For example, suppose that the token name
DIGIT has been defined in the declarations section of the Yacc specification file. The relevant
portion of the lexical analyzer might look like:

yylex(){
extern int yylval;
int c;
...
c = getchar();
...
switch( c ) {

Lab Manual of Compiler Design Page 55

Chameli Devi School Of Engineering, Indore

...
case '0':
case '1':
...
case '9':
yylval = c-'0';
return( DIGIT );
...
}
...

The intent is to return a token number of DIGIT, and a value equal to the numerical value of
the digit. Provided that the lexical analyzer code is placed in the programs section of the
specification file, the identifier DIGIT will be defined as the token number associated with the
token DIGIT.

This mechanism leads to clear, easily modified lexical analyzers; the only pitfall is the need to
avoid using any token names in the grammar that are reserved or significant in C or the
parser; for example, the use of token names if or while will almost certainly cause severe
difficulties when the lexical analyzer is compiled. The token name error is reserved for error
handling, and should not be used naively.

As mentioned above, the token numbers may be chosen by Yacc or by the user. In the default
situation, the numbers are chosen by Yacc. The default token number for a literal character is
the numerical value of the character in the local character set. Other names are assigned
token numbers starting at 257.

To assign a token number to a token (including literals), the first appearance of the token
name or literal in the declarations section can be immediately followed by a nonnegative
integer. This integer is taken to be the token number of the name or literal. Names and literals
not defined by this mechanism retain their default definition. It is important that all token
numbers be distinct.

Lab Manual of Compiler Design Page 56

Chameli Devi School Of Engineering, Indore

For historical reasons, the endmarker must have token number 0 or negative. This token
number cannot be redefined by the user; thus, all lexical analyzers should be prepared to
return 0 or negative as a token number upon reaching the end of their input.

A very useful tool for constructing lexical analyzers is the Lex program developed by Mike
Lesk.[8] These lexical analyzers are designed to work in close harmony with Yacc parsers.
The specifications for these lexical analyzers use regular expressions instead of grammar
rules. Lex can be easily used to produce quite complicated lexical analyzers, but there remain
some languages (such as FORTRAN) which do not fit any theoretical framework, and whose
lexical analyzers must be crafted by hand.

Source Code:

%{
#include<stdio.h>
int a=0,s=0,m=0,d=0,ob=0,cb=0;
int flaga=0, flags=0, flagm=0, flagd=0;
%}
id [a-zA-Z]+
%%
{id} {printf("\n %s is an identifier\n",yytext);}
[+] {a++;flaga=1;}
[-] {s++;flags=1;}
[*] {m++;flagm=1;}
[/] {d++;flagd=1;}
[(] {ob++;}
[)] {cb++;}
%%
int main()
{
printf("Enter the expression\n");
yylex();
if(ob-cb==0)
{
printf("Valid expression\n");
}
else
{
printf("Invalid expression");
}
printf("\nAdd=%d\nSub=%d\nMul=%d\nDiv=%d\n",a,s,m,d);
printf("Operators are: \n");

Lab Manual of Compiler Design Page 57

Chameli Devi School Of Engineering, Indore

if(flaga)
printf("+\n");
if(flags)
printf("-\n");
if(flagm)
printf("*\n");
if(flagd)
printf("/\n");
return 0;
}

OUTPUT:

$lex p2a.l
$cc lex.yy.c –ll
$./a.out
Enter the expression
(a+b*c)
a is an identifier
b is an identifier
c is an identifier

[Ctrl-d]
Valid expression
Add=1
Sub=0
Mul=1
Div=0
Operators are:
+
*

Lab Manual of Compiler Design Page 58

Chameli Devi School Of Engineering, Indore

Program #13

Program using LEX to recognize whether a given sentence is simple or

compound.

%{
int flag=0;
%}
%%
(""[aA][nN][dD]"")|(""[oO][rR]"")|(""[bB][uU][tT]"") {flag=1;}
%%
int main()
{
printf("Enter the sentence\n");
yylex();
if(flag==1)
printf("\nCompound sentence\n");
else
printf("\nSimple sentence\n");
return 0;
}

Lab Manual of Compiler Design Page 59

Chameli Devi School Of Engineering, Indore

OUTPUT:

$lex p2b.l
$cc lex.yy.c –ll
$./a.out
Enter the sentence
I am Pooja
I am Pooja
[Ctrl-d]
Simple sentence

$./a.out
Enter the sentence
CSE or ISE
CSE or ISE
[Ctrl-d]
Compound sentence

Lab Manual of Compiler Design Page 60

Chameli Devi School Of Engineering, Indore

Program #14

Program using LEX to recognize and count the number of identifiers in a

given input file.

Lex helps write programs whose control flow is directed by instances of regular expressions in
the input stream. It is well suited for editor-script type transformations and for segmenting
input in preparation for a parsing routine.

Lex source is a table of regular expressions and corresponding program fragments. The table
is translated to a program which reads an input stream, copying it to an output stream and
partitioning the input into strings which match the given expressions. As each such string is
recognized the corresponding program fragment is executed. The recognition of the
expressions is performed by a deterministic finite automaton generated by Lex. The program
fragments written by the user are executed in the order in which the corresponding regular
expressions occur in the input stream.

Source Code:

%{
#include<stdio.h>
int count=0;
%}
op [+-*/]
letter [a-zA-Z]
digitt [0-9]
id {letter}*|({letter}{digitt})+
notid ({digitt}{letter})+
%%
[\t\n]+
("int")|("float")|("char")|("case")|("default")| ("if")|("for")|("printf")|("scanf") {printf("%s is a
keyword\n", yytext);}
{id} {printf("%s is an identifier\n", yytext); count++;}
{notid} {printf("%s is not an identifier\n", yytext);}
%%
int main()
{
FILE *fp;
char file[10];
printf("\nEnter the filename: ");
Lab Manual of Compiler Design Page 61
Chameli Devi School Of Engineering, Indore

scanf("%s", file);
fp=fopen(file,"r");
yyin=fp;
yylex();
printf("Total identifiers are: %d\n", count);
return 0;
}

OUTPUT:
$cat > input
int
float
78f
90gh
a
d
are case
default
printf
scanf

$lex p3.l
$cc lex.yy.c –ll
$./a.out
Enter the filename: input
int is a keyword
float is a keyword
78f is not an identifier
90g is not an identifier
h is an identifier
a is an identifier
d is an identifier
are is an identifier
case is a keyword
default is a keyword
printf is a keyword
scanf is a keyword
total identifiers are: 4

Lab Manual of Compiler Design Page 62

Chameli Devi School Of Engineering, Indore

Program #15
YACC (Yet Another Compiler Compiler ) program to recognize a valid
arithmetic expression that uses operators +, -, * and /.

Basic Specifications
Names refer to either tokens or nonterminal symbols. Yacc requires token names to be
declared as such. In addition, for reasons discussed in Section 3, it is often desirable to
include the lexical analyzer as part of the specification file; it may be useful to include other
programs as well. Thus, every specification file consists of three sections: the declarations,
(grammar) rules, and programs. The sections are separated by double percent ``%%'' marks.
(The percent ``%'' is generally used in Yacc specifications as an escape character.)

In other words, a full specification file looks like

declarations
%%
rules
%%
programs

The declaration section may be empty. Moreover, if the programs section is omitted, the
second %% mark may be omitted also;

thus, the smallest legal Yacc specification is

%%
rules

Blanks, tabs, and newlines are ignored except that they may not appear in names or multi-
character reserved symbols. Comments may appear wherever a name is legal; they are
enclosed in /* . . . */, as in C and PL/I.

The rules section is made up of one or more grammar rules. A grammar rule has the form:

A : BODY ;

A represents a nonterminal name, and BODY represents a sequence of zero or more names
and literals. The colon and the semicolon are Yacc punctuation.

Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``_'',
and non-initial digits. Upper and lower case letters are distinct. The names used in the body of
a grammar rule may represent tokens or nonterminal symbols.

Lab Manual of Compiler Design Page 63

Chameli Devi School Of Engineering, Indore

A literal consists of a character enclosed in single quotes ``'''. As in C, the backslash ``\'' is an
escape character within literals, and all the C escapes are recognized. Thus

'\n' newline
'\r' return
'\'' single quote ``'''
'\\' backslash ``\''
'\t' tab
'\b' backspace
'\f' form feed
'\xxx' ``xxx'' in octal

For a number of technical reasons, the NUL character ('\0' or 0) should never be used in
grammar rules.

If there are several grammar rules with the same left hand side, the vertical bar ``|'' can be
used to avoid rewriting the left hand side. In addition, the semicolon at the end of a rule can
be dropped before a vertical bar. Thus the grammar rules

A : B C D ;
A : E F ;
A : G ;

can be given to Yacc as

A : B C D
| E F
| G
;

It is not necessary that all grammar rules with the same left side appear together in the
grammar rules section, although it makes the input much more readable, and easier to
change.

If a nonterminal symbol matches the empty string, this can be indicated in the obvious way:

empty : ;

Names representing tokens must be declared; this is most simply done by writing

%token name1 name2 . . .

in the declarations section. (See Sections 3 , 5, and 6 for much more discussion). Every name
not defined in the declarations section is assumed to represent a nonterminal symbol. Every
nonterminal symbol must appear on the left side of at least one rule.

Of all the nonterminal symbols, one, called the start symbol, has particular importance. The
parser is designed to recognize the start symbol; thus, this symbol represents the largest,
most general structure described by the grammar rules. By default, the start symbol is taken
to be the left hand side of the first grammar rule in the rules section. It is possible, and in fact

Lab Manual of Compiler Design Page 64

Chameli Devi School Of Engineering, Indore

desirable, to declare the start symbol explicitly in the declarations section using the %start
keyword:

%start symbol

The end of the input to the parser is signaled by a special token, called the endmarker. If the
tokens up to, but not including, the endmarker form a structure which matches the start
symbol, the parser function returns to its caller after the endmarker is seen; it accepts the
input. If the endmarker is seen in any other context, it is an error.

It is the job of the user-supplied lexical analyzer to return the endmarker when appropriate;
see section 3, below. Usually the endmarker represents some reasonably obvious I/O status,
such as ``end-of-file'' or ``end-of-record''.

2: Actions

With each grammar rule, the user may associate actions to be

performed each time the rule is recognized in the input process. These actions may return
values, and may obtain the values returned by previous actions. Moreover, the lexical
analyzer can return values for tokens, if desired.

An action is an arbitrary C statement, and as such can do input and output, call subprograms,
and alter external vectors and variables. An action is specified by one or more statements,
enclosed in curly braces ``{'' and ``}''. For example,

A : '(' B ')'
{ hello( 1, "abc" ); }

and

XXX : YYY ZZZ

{ printf("a message\n");
flag = 25; }

are grammar rules with actions.

Lab Manual of Compiler Design Page 65

Chameli Devi School Of Engineering, Indore

Source Code:

LEX
%{
#include"y.tab.h"
extern yylval;
%}
%%
[0-9]+ {yylval=atoi(yytext); return NUMBER;}
[a-zA-Z]+ {return ID;}
[\t]+ ;
\n {return 0;}
. {return yytext[0];}
%%

YACC
%{
#include<stdio.h>
%}
%token NUMBER ID
%left '+' '-'
%left '*' '/'
%%
expr: expr '+' expr
|expr '-' expr
|expr '*' expr
|expr '/' expr
|'-'NUMBER
|'-'ID
|'('expr')'
|NUMBER
|ID
;
%%
main()
{
printf("Enter the expression\n");
yyparse();
printf("\nExpression is valid\n");
exit(0);
}
int yyerror(char *s)
{
printf("\nExpression is invalid");
exit(0);
}

Lab Manual of Compiler Design Page 66

Chameli Devi School Of Engineering, Indore

OUTPUT:

$lex p4a.l
$yacc –d p4a.y
$cc lex.yy.c y.tab.c –ll
$./a.out
Enter the expression
(a*b+5)
Expression is valid

$./a.out
Enter the expression
(a+6-)
Expression is invalid

Lab Manual of Compiler Design Page 67

Chameli Devi School Of Engineering, Indore

Program #15

YACC (Yet Another Compiler Compiler ) program to recognize a valid

variable, which starts with a letter, followed by any number of letters or
digits.

Yacc turns the specification file into a C program, which parses the input according to
the specification given. The algorithm used to go from the specification to the parser is
complex, and will not be discussed here (see the references for more information). The
parser itself, however, is relatively simple, and understanding how it works, while not
strictly necessary, will nevertheless make treatment of error recovery and ambiguities
much more comprehensible.

Source Code:

LEX
%{
#include"y.tab.h"
extern yylval;
%}
%%
[0-9]+ {yylval=atoi(yytext); return DIGIT;}
[a-zA-Z]+ {return LETTER;}
[\t] ;
\n return 0;
. {return yytext[0];}
%%

YACC
%{
#include<stdio.h>
%}
%token LETTER DIGIT
%%
variable: LETTER|LETTER rest
;
rest: LETTER rest
|DIGIT rest
|LETTER
|DIGIT
;
%%
Lab Manual of Compiler Design Page 68
Chameli Devi School Of Engineering, Indore

main()
{
yyparse();
printf("The string is a valid variable\n");
}
int yyerror(char *s)
{
printf("this is not a valid variable\n");
exit(0);
}

OUTPUT:

$lex p4b.l
$yacc –d p4b.y
$cc lex.yy.c y.tab.c –ll
$./a.out
input34
The string is a valid variable

$./a.out
89file
This is not a valid variable

Lab Manual of Compiler Design Page 69

Chameli Devi School Of Engineering, Indore

Program #16

Implement a program of YACC (Yet Another Compiler Compiler ) to recognize

strings ‘aaab’, ‘abbb’, ‘ab’ and ‘a’ using the grammar (anbn, n>= 0).

Yacc: Yet Another Compiler-Compiler

Yacc provides a general tool for imposing structure on the input to a computer program. The
Yacc user prepares a specification of the input process; this includes rules describing the
input structure, code to be invoked when these rules are recognized, and a low-level routine
to do the basic input. Yacc then generates a function to control the input process. This
function, called a parser, calls the user-supplied low-level input routine (the lexical analyzer)
to pick up the basic items (called tokens) from the input stream. These tokens are organized
according to the input structure rules, called grammar rules; when one of these rules has
been recognized, then user code supplied for this rule, an action, is invoked; actions have the
ability to return values and make use of the values of other actions.
Yacc is written in a portable dialect of C[1] and the actions, and output subroutine, are in C as
well. Moreover, many of the syntactic conventions of Yacc follow C.

The heart of the input specification is a collection of grammar rules. Each rule describes an
allowable structure and gives it a name. For example, one grammar rule might be

date : month_name day ',' year ;

Here, date, month_name, day, and year represent structures of interest in the input process;
presumably, month_name, day, and year are defined elsewhere. The comma ``,'' is enclosed
in single quotes; this implies that the comma is to appear literally in the input. The colon and
semicolon merely serve as punctuation in the rule, and have no significance in controlling the
input. Thus, with proper definitions, the input

July 4, 1776

might be matched by the above rule.

Lab Manual of Compiler Design Page 70

Chameli Devi School Of Engineering, Indore

An important part of the input process is carried out by the lexical analyzer. This user routine
reads the input stream, recognizing the lower level structures, and communicates these
tokens to the parser. For historical reasons, a structure recognized by the lexical analyzer is
called a terminal symbol, while the structure recognized by the parser is called a nonterminal
symbol. To avoid confusion, terminal symbols will usually be referred to as tokens.

There is considerable leeway in deciding whether to recognize structures using the lexical
analyzer or grammar rules. For example, the rules

month_name : 'J' 'a' 'n' ;

month_name : 'F' 'e' 'b' ;

...

month_name : 'D' 'e' 'c' ;

might be used in the above example. The lexical analyzer would only need to recognize
individual letters, and month_name would be a nonterminal symbol. Such low-level rules tend
to waste time and space, and may complicate the specification beyond Yacc's ability to deal
with it. Usually, the lexical analyzer would recognize the month names, and return an
indication that a month_name was seen; in this case, month_name would be a token.

Literal characters such as ``,'' must also be passed through the lexical analyzer, and are also
considered tokens.

Specification files are very flexible. It is realively easy to add to the above example the rule

date : month '/' day '/' year ;

allowing

7 / 4 / 1776

as a synonym for

July 4, 1776

Lab Manual of Compiler Design Page 71

Chameli Devi School Of Engineering, Indore

In most cases, this new rule could be ``slipped in'' to a working system with minimal effort,
and little danger of disrupting existing input.

The input being read may not conform to the specifications. These input errors are detected
as early as is theoretically possible with a left-to-right scan; thus, not only is the chance of
reading and computing with bad input data substantially reduced, but the bad data can usually
be quickly found. Error handling, provided as part of the input specifications, permits the
reentry of bad data, or the continuation of the input process after skipping over the bad data.

In some cases, Yacc fails to produce a parser when given a set of specifications. For
example, the specifications may be self contradictory, or they may require a more powerful
recognition mechanism than that available to Yacc. The former cases represent design errors;
the latter cases can often be corrected by making the lexical analyzer more powerful, or by
rewriting some of the grammar rules. While Yacc cannot handle all possible specifications, its
power compares favorably with similar systems; moreover, the constructions which are
difficult for Yacc to handle are also frequently difficult for human beings to handle. Some
users have reported that the discipline of formulating valid Yacc specifications for their input
revealed errors of conception or design early in the program development.

Source Code:

LEX
%{
#include"y.tab.h"
%}
%%
[a] return A;
[b] return B;
%%

YACC
%{
#include<stdio.h>
%}
%token A B
%%
S:A S B
|

Lab Manual of Compiler Design Page 72

Chameli Devi School Of Engineering, Indore

;
%%
main()
{
printf("Enter the string\n");
if(yyparse()==0)
{
printf("Valid\n");
}
}
yyerror(char *s)
{
printf("%s\n",s);
}

OUTPUT:

$lex p5b.l
$yacc –d p5b.y
$cc lex.yy.c y.tab.c –ll
$./a.out
Enter the string
aabb
[Ctrl-d]
Valid

$./a.out
Enter the string
aab
syntax error

Lab Manual of Compiler Design Page 73

Chameli Devi School Of Engineering, Indore

Program #17

Program to recognize the Context free grammar (anbn, n>= 10), Where a & b
are input symbols of the grammar.

A context-free grammar (CFG) is a set of recursive rewriting rules (or productions)

used to generate patterns of strings.

A CFG consists of the following components:

• a set of terminal symbols, which are the characters of the alphabet that appear in the strings
generated by the grammar.

• a set of nonterminal symbols, which are placeholders for patterns of terminal symbols that can
be generated by the nonterminal symbols.

• a set of productions, which are rules for replacing (or rewriting) nonterminal symbols (on the
left side of the production) in a string with other nonterminal or terminal symbols (on the right
side of the production).

• a start symbol, which is a special nonterminal symbol that appears in the initial string generated
by the grammar.

To generate a string of terminal symbols from a CFG, we:

• Begin with a string consisting of the start symbol;

• Apply one of the productions with the start symbol on the left hand size, replacing the start
symbol with the right hand side of the production;

• Repeat the process of selecting nonterminal symbols in the string, and replacing them with the
right hand side of some corresponding production, until all nonterminals have been replaced by
terminal symbols.

Lab Manual of Compiler Design Page 74

Chameli Devi School Of Engineering, Indore

Finding all the Strings Generated by a CFG

There are several ways to generate the (possibly infinite) set of strings generated by a grammar. We
will show a technique based on the number of productions used to generate the string.

Find the strings generated by the following CFG:

<S> --> w c d <S> | b <L> e | s

0. Applying at most zero productions, we cannot generate any strings.

1. Applying at most one production (starting with the start symbol) we can generate {wcd<S>, b<L>e,
s}. Only one of these strings consists entirely of terminal symbols, so the set of terminal strings we can
generate using at most one production is {s}.

2. Applying at most two productions, we can generate all the strings we can generate with one
production, plus any additional strings we can generate with an additional production.

{wcdwcd<S>, wcdb<L>e, wcds, b<S>e, b<L>;<S>e,s}

The set of terminal strings we can generate with at most two productions is therefore {s, wcds}.

3. Applying at most three productions, we can generate:

{wcdwcdwcd<S>, wcdwcdb<L>e, wcdwcds, wcdb<L>;<S>e,

wcdb<S>e, bwcd<S>e, bb<L>ee, bse, b<L>;<S>Se,
b<S><S>e, b<L>wcd<S>e, b<L>b<L>ee, b<L>se }

The set of terminal strings we can generate with at most three

productions is therefore {s, wcds, wcdwcds, bse}.

Lab Manual of Compiler Design Page 75

Chameli Devi School Of Engineering, Indore

We can repeat this process for an arbitrary number of steps N, and find all the strings the grammar can
generate by applying N productions.

Source Code:

LEX
%{
#include"y.tab.h"
%}
%%
[a] return A;
[b] return B;
%%

YACC
%{
#include<stdio.h>
%}
%token A B
%%
stat:exp B
;
exp:A A A A A A A A A exp1
;
exp1:A exp2
|A
|A A exp2
|A A A exp2
|A A A A exp2
;
exp2:A
;
%%
main()
{
printf("Enter the string\n");
if(yyparse()==0)
{
printf("Valid\n");
}
}
yyerror(char *s)
{
printf("error\n");

Lab Manual of Compiler Design Page 76

Chameli Devi School Of Engineering, Indore

OUTPUT:

$lex p6.l
$yacc –d p6.y
$cc lex.yy.c y.tab.c –ll
$./a.out
Enter the string
aaaaaaaaaaab
Valid

$./a.out
Enter the string
aab
error

Lab Manual of Compiler Design Page 77

Chameli Devi School Of Engineering, Indore

Program #18

Write a C program to implement the syntax-directed definition of “if E then

S1” and “if E then S1 else S2”.

/* Input to the program is assumed to be syntactically correct. The expression of ‘if’ statement,
for true condition and statement for false condition are enclosed in parenthesis */

Some programming languages permit the user to use words like ``if'', which are normally
reserved, as label or variable names, provided that such use does not conflict with the legal
use of these names in the programming language. This is extremely hard to do in the
framework of Yacc; it is difficult to pass information to the lexical analyzer telling it ``this
instance of `if' is a keyword, and that instance is a variable''. The user can make a stab at it,
using the mechanism described in the last subsection, but it is difficult.

A number of ways of making this easier are under advisement. Until then, it is better that the
keywords be reserved; that is, be forbidden for use as variable names. There are powerful
stylistic reasons for preferring this, anyway.

10: Advanced Topics

This section discusses a number of advanced features of Yacc.

Simulating Error and Accept in Actions

The parsing actions of error and accept can be simulated in an action by use of macros
YYACCEPT and YYERROR. YYACCEPT causes yyparse to return the value 0; YYERROR
causes the parser to behave as if the current input symbol had been a syntax error; yyerror is
called, and error recovery takes place. These mechanisms can be used to simulate parsers
with multiple endmarkers or context-sensitive syntax checking.

Accessing Values in Enclosing Rules.

An action may refer to values returned by actions to the left of the current rule. The
mechanism is simply the same as with ordinary actions, a dollar sign followed by a digit, but in
this case the digit may be 0 or negative. Consider

sent : adj noun verb adj noun

{ look at the sentence . . . }
;

adj : THE { $$ = THE; }

| YOUNG { $$ = YOUNG; }
...
;

Lab Manual of Compiler Design Page 78

Chameli Devi School Of Engineering, Indore

noun : DOG
{ $$ = DOG; }
| CRONE
{ if( $0 == YOUNG ){
printf( "what?\n" );
}
$$ = CRONE;
}
;
...

In the action following the word CRONE, a check is made that the preceding token shifted
was not YOUNG. Obviously, this is only possible when a great deal is known about what
might precede the symbol noun in the input. There is also a distinctly unstructured flavor
about this. Nevertheless, at times this mechanism will save a great deal of trouble, especially
when a few combinations are to be excluded from an otherwise regular structure.

Source Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int parsecondition(char[],int,char*,int);
void gen(char [],char [],char[],int);
int main()
{
int counter = 0,stlen =0,elseflag=0;
char stmt[60]; // contains the input statement
char strB[54]; // holds the expression for 'if'
condition
char strS1[50]; // holds the statement for true
condition
char strS2[45]; // holds the statement for false
condition

printf("Format of ‘if’ statement \n Example...\n");

printf("if (a<b) then (s=a);\n");
printf("if (a<b) then (s=a) else (s=b);\n\n");
printf("Enter the statement \n");
gets(stmt);
stlen = strlen(stmt);
counter = counter + 2; // increment over 'if'
counter = parsecondition(stmt,counter,strB,stlen);
if(stmt[counter]==')')

Lab Manual of Compiler Design Page 79

Chameli Devi School Of Engineering, Indore

counter++;
counter = counter + 3; // increment over 'then'
counter = parsecondition(stmt,counter,strS1,stlen);
if(stmt[counter+1]==';')
{ //reached end of statement, generate the output
printf("\n Parsing the input statement....");
gen(strB,strS1,strS2,elseflag);

return 0;
}
if(stmt[counter]==')')
counter++; // increment over ')'
counter = counter + 3; // increment over 'else'
counter = parsecondition(stmt,counter,strS2,stlen);
counter = counter + 2; // move to the end of
statement
if(counter == stlen)
{ //generate the output
elseflag = 1;
printf("\n Parsing the input statement....");
gen(strB,strS1,strS2,elseflag);
return 0;
}
return 0;
}
/* Function : parsecondition
Description : This function parses the statement
from the given index to get the statement enclosed
in ()
Input : Statement, index to begin search, string
to store the condition, total string length
Output : Returns 0 on failure, Non zero counter
value on success
*/
int parsecondition(char input[],int cntr,char
*dest,int totallen)
{
int index = 0,pos = 0;
while(input[cntr]!= '(' && cntr <= totallen)
cntr++;
if(cntr >= totallen)
return 0;
index = cntr;
while (input[cntr]!=')')
cntr++;
if(cntr >= totallen)

Lab Manual of Compiler Design Page 80

Chameli Devi School Of Engineering, Indore

return 0;
while(index<=cntr)
dest[pos++] = input[index++];
dest[pos]='\0'; //null terminate the string
return cntr; //non zero value
}
/* Function : gen ()
Description : This function generates three
address code
Input : Expression, statement for true condition,
statement for false condition, flag to denote if
the 'else' part is present in the statement
output :Three address code
*/
void gen(char B[],char S1[],char S2[],int elsepart)
{
int Bt =101,Bf = 102,Sn =103;
printf("\n\tIf %s goto %d",B,Bt);
printf("\n\tgoto %d",Bf);
printf("\n%d: ",Bt);
printf("%s",S1);
if(!elsepart)
printf("\n%d: ",Bf);
else
{ printf("\n\tgoto %d",Sn);
printf("\n%d: %s",Bf,S2);
printf("\n%d:",Sn);
}
}

Lab Manual of Compiler Design Page 81

Chameli Devi School Of Engineering, Indore

OUTPUT

Format of ‘if’ statement

Example ...
if (a<b) then (s=a);
if (a<b) then (s=a) else (s=b);

Enter the statement

if (a<b) then (x=a) else (x=b);

Parsing the input statement....

If (a<b) goto 101
goto 102
101: (x=a)
goto 103
102: (x=b)
103:

Lab Manual of Compiler Design Page 82

What Would Mrs Herring Say
No ratings yet
What Would Mrs Herring Say
3 pages
Compiler Lab Manual
No ratings yet
Compiler Lab Manual
82 pages
Java Lab Manual Jan-June 2023 for Student
No ratings yet
Java Lab Manual Jan-June 2023 for Student
11 pages
SPL LAB 1
No ratings yet
SPL LAB 1
15 pages
Te Androidlabmanual Sdl-II
No ratings yet
Te Androidlabmanual Sdl-II
48 pages
26 Owais KCS552
No ratings yet
26 Owais KCS552
86 pages
3160715-System Software Lab-Manual
No ratings yet
3160715-System Software Lab-Manual
51 pages
CPL 2013
No ratings yet
CPL 2013
68 pages
MLA LabManual1
No ratings yet
MLA LabManual1
52 pages
Data Structure and Algorithm Lab Manual
No ratings yet
Data Structure and Algorithm Lab Manual
46 pages
Programming Languages: Practical Workbook
No ratings yet
Programming Languages: Practical Workbook
60 pages
Case Lab Manual
100% (2)
Case Lab Manual
43 pages
314317 (1)
No ratings yet
314317 (1)
152 pages
09ce1104pdf 2020 01 01 10 52 27
No ratings yet
09ce1104pdf 2020 01 01 10 52 27
5 pages
Labreportsampleofcprogramming 240106103456 E3c12f1c
No ratings yet
Labreportsampleofcprogramming 240106103456 E3c12f1c
11 pages
ICT Lab Manual
No ratings yet
ICT Lab Manual
180 pages
Lab Manual Fcse
No ratings yet
Lab Manual Fcse
35 pages
Final Lab Manual CPPS
No ratings yet
Final Lab Manual CPPS
92 pages
SSOS LAB MANUAL2020-21 New
No ratings yet
SSOS LAB MANUAL2020-21 New
72 pages
Lab Manual: Sri Sri University
No ratings yet
Lab Manual: Sri Sri University
96 pages
Lab Manual: Sri Sri University
No ratings yet
Lab Manual: Sri Sri University
97 pages
Java Programming-II - CM4109
No ratings yet
Java Programming-II - CM4109
55 pages
CD Lab FPP
No ratings yet
CD Lab FPP
12 pages
OOPS LAB MANUAL
No ratings yet
OOPS LAB MANUAL
47 pages
COSC_408_-_Compiler_Construction
No ratings yet
COSC_408_-_Compiler_Construction
324 pages
SS & OS LAB Manual-1 PDF
No ratings yet
SS & OS LAB Manual-1 PDF
73 pages
Lab Manual CP Lab Final
No ratings yet
Lab Manual CP Lab Final
80 pages
IT Java Programming Lab Manual
No ratings yet
IT Java Programming Lab Manual
61 pages
ML_LAB_Mannual-1
No ratings yet
ML_LAB_Mannual-1
79 pages
SE Lab Manual 2024
No ratings yet
SE Lab Manual 2024
36 pages
ACFrOgCWX5jBJGFd SuuOtQjjm32Yz Ttc1Ie8vYvnRM4Rg1yVIS B4pGHj CysuktdEM5WsiVxndxRLcIpv78ig6ulyP MQJZs3m5B3xW7PPpQkLWguPXCaqL94R0xshvJXa1 5r7gNGowvK01E
No ratings yet
ACFrOgCWX5jBJGFd SuuOtQjjm32Yz Ttc1Ie8vYvnRM4Rg1yVIS B4pGHj CysuktdEM5WsiVxndxRLcIpv78ig6ulyP MQJZs3m5B3xW7PPpQkLWguPXCaqL94R0xshvJXa1 5r7gNGowvK01E
97 pages
Programming in C Dip Sem-I Wef 01082011
No ratings yet
Programming in C Dip Sem-I Wef 01082011
3 pages
Module 1
No ratings yet
Module 1
185 pages
Comsats University Islamabad: Department of Computer Science
No ratings yet
Comsats University Islamabad: Department of Computer Science
14 pages
Jawaharlal Nehru Engineering College: Laboratory Manual
No ratings yet
Jawaharlal Nehru Engineering College: Laboratory Manual
58 pages
Javalab Manual
No ratings yet
Javalab Manual
58 pages
CG Se
No ratings yet
CG Se
80 pages
SEN Experiment Printout
No ratings yet
SEN Experiment Printout
71 pages
Java Programming Journal Guidelines
No ratings yet
Java Programming Journal Guidelines
6 pages
MAD L M 22-23 I sem.docx
No ratings yet
MAD L M 22-23 I sem.docx
45 pages
CD Lab Manual_updated
No ratings yet
CD Lab Manual_updated
70 pages
Compiler Design Lab Manual
No ratings yet
Compiler Design Lab Manual
39 pages
Zhilal Afwu Rahman D Modul6
No ratings yet
Zhilal Afwu Rahman D Modul6
9 pages
BOOP GPP Final-3
No ratings yet
BOOP GPP Final-3
105 pages
Industry Elective
No ratings yet
Industry Elective
9 pages
OOP_LabManual_Fall24_IBIT
No ratings yet
OOP_LabManual_Fall24_IBIT
143 pages
Cloud-Computing-BE-PART-I
No ratings yet
Cloud-Computing-BE-PART-I
79 pages
Institute of Engineering & Technology, Devi Ahilya University, Indore, (M.P.), India. (Scheme Effective From July 2016)
No ratings yet
Institute of Engineering & Technology, Devi Ahilya University, Indore, (M.P.), India. (Scheme Effective From July 2016)
2 pages
02introduction To Programming Edip Senyureksyllabus20232024
No ratings yet
02introduction To Programming Edip Senyureksyllabus20232024
3 pages
Compiler Design Lab Manual (2024-25) (23-Aug-2024)
No ratings yet
Compiler Design Lab Manual (2024-25) (23-Aug-2024)
72 pages
EC8711 EMBEDDED LABORATORY Lab Manual
83% (6)
EC8711 EMBEDDED LABORATORY Lab Manual
126 pages
Programming Fundamental Lab Manual Updated (1)
No ratings yet
Programming Fundamental Lab Manual Updated (1)
39 pages
OOPC09CE2301pdf 2023 07 24 14 32 36
No ratings yet
OOPC09CE2301pdf 2023 07 24 14 32 36
6 pages
Finalised Modified Copy of PCD Lab Manual
No ratings yet
Finalised Modified Copy of PCD Lab Manual
64 pages
Stm Lab Manual
No ratings yet
Stm Lab Manual
24 pages
Student Attendance Master
100% (1)
Student Attendance Master
43 pages
IIT LAB MANUAL
No ratings yet
IIT LAB MANUAL
82 pages
CD Lab Manual
No ratings yet
CD Lab Manual
83 pages
C# Manual
No ratings yet
C# Manual
71 pages
CD Lab Manual_updated12
No ratings yet
CD Lab Manual_updated12
70 pages
Computer Practices Using C++
From Everand
Computer Practices Using C++
Ramkrishna Ghosh
No ratings yet
The Zenger-Folkman 17 Competencies: Leadership Team
No ratings yet
The Zenger-Folkman 17 Competencies: Leadership Team
1 page
Marketing Compendium 2024
No ratings yet
Marketing Compendium 2024
100 pages
Reviewer For Grade 10 Math
No ratings yet
Reviewer For Grade 10 Math
8 pages
Network Address Translation
No ratings yet
Network Address Translation
17 pages
China's Governance Model - Flexibility and Durability of Pragmatic Authoritarianism (کتاب دوست)
100% (1)
China's Governance Model - Flexibility and Durability of Pragmatic Authoritarianism (کتاب دوست)
369 pages
Rib-Knit Crop Tank Top SHEIN USA
No ratings yet
Rib-Knit Crop Tank Top SHEIN USA
1 page
Maintenance en
No ratings yet
Maintenance en
23 pages
Eagles Waterloo SOLO Variant
No ratings yet
Eagles Waterloo SOLO Variant
5 pages
Sales and Distribution Management
No ratings yet
Sales and Distribution Management
34 pages
movement-whitepaper_en
No ratings yet
movement-whitepaper_en
22 pages
1.0 Overview of Operating System Deployment
No ratings yet
1.0 Overview of Operating System Deployment
7 pages
1686631159.8276825 - RINGSHINE 2021-2022 Annual
No ratings yet
1686631159.8276825 - RINGSHINE 2021-2022 Annual
82 pages
TLE GRADE 7 4TH QUARTER INDUSTRIAL ARTS - TOPIC 1234
No ratings yet
TLE GRADE 7 4TH QUARTER INDUSTRIAL ARTS - TOPIC 1234
3 pages
GIOS-SAMAR Inc. Vs DOTC-1
No ratings yet
GIOS-SAMAR Inc. Vs DOTC-1
8 pages
Lecture On Database by Miss Aysha (GCUF)
No ratings yet
Lecture On Database by Miss Aysha (GCUF)
6 pages
Puntos de Tierra Sentra 2016
No ratings yet
Puntos de Tierra Sentra 2016
1 page
Chapter 1 PPT Quiz
No ratings yet
Chapter 1 PPT Quiz
9 pages
10MCA17 UNIX Programs (MCA SEM 2, VTU)
No ratings yet
10MCA17 UNIX Programs (MCA SEM 2, VTU)
54 pages
Fintech Term 6 AJ PA
No ratings yet
Fintech Term 6 AJ PA
5 pages
Usage of Social Media
No ratings yet
Usage of Social Media
34 pages
Patricia Benner
No ratings yet
Patricia Benner
2 pages
Detecting Fake News Using Machine Learning: Gaurav Kumar Choubey (21mca1061) Guide Name: DR Rajarajeswari S
No ratings yet
Detecting Fake News Using Machine Learning: Gaurav Kumar Choubey (21mca1061) Guide Name: DR Rajarajeswari S
29 pages
Walk in Interview - TO-II
No ratings yet
Walk in Interview - TO-II
2 pages
Bond 710
No ratings yet
Bond 710
2 pages
Unsupervised Learning of Video Representations Using Lstms
No ratings yet
Unsupervised Learning of Video Representations Using Lstms
12 pages
Mechanical Interview Questions and Answers: Search Question and Answer
No ratings yet
Mechanical Interview Questions and Answers: Search Question and Answer
4 pages
Vanessa Redgrave - Wikipedia
No ratings yet
Vanessa Redgrave - Wikipedia
11 pages
Panjar Obat Oktober
No ratings yet
Panjar Obat Oktober
1 page
Maranatha Baptist Church Constitution
No ratings yet
Maranatha Baptist Church Constitution
24 pages