Name – Siddhant Bokil
Div - TY-CSA
Roll No - 44
PRN No - 12111169
Assignment 1
Title
Implement LEX/FLEX code to count the number of characters, words and lines
in an input file.
Theory
In the domain of compiler design, lexical analysis plays a crucial role in
converting the input stream of characters into meaningful tokens. LEX/FLEX is
a tool that assists in generating lexical analyzers based on regular expressions.
The task of counting the number of characters, words, and lines in an input file
demonstrates a fundamental application of lexical analysis.
1. Lexical Analysis:
Lexical analysis is the initial phase of the compilation process. It involves
breaking down the input stream into a sequence of tokens or lexemes, which are
meaningful units for the compiler. Tokens may include identifiers, keywords,
operators, literals, etc.
2. LEX/FLEX:
LEX/FLEX is a tool for generating lexical analyzers based on regular
expressions. It takes a set of regular expressions and corresponding actions as
input, and generates C code that recognizes the specified patterns in the input
text.
3. Regular Expressions:
Regular expressions are a concise and powerful way to describe patterns within
text. In this context, regular expressions are used to define the patterns of
characters that constitute words, spaces, and lines in the input file.
4. Character Count:
Counting characters involves scanning each character in the input stream. This
count includes all characters, including whitespace and special characters.
5. Word Count:
Words are typically separated by whitespace characters (spaces, tabs, etc.) or
punctuation. To count words, the lexical analyzer identifies sequences of
characters separated by whitespace or punctuation.
6. Line Count:
Lines are delimited by newline characters (\n). Counting lines involves
identifying each occurrence of a newline character.
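The counting rules in points 4-6 can be sketched in plain C, independent of LEX/FLEX (the helper name `count_text` is hypothetical):

```c
#include <ctype.h>

/* Hypothetical helper: scans a buffer once and reports character,
 * word, and line counts, mirroring the counting rules above. */
static void count_text(const char *s, int *chars, int *words, int *lines)
{
    int in_word = 0;
    *chars = *words = *lines = 0;
    for (; *s; s++) {
        (*chars)++;                      /* every character counts     */
        if (*s == '\n') (*lines)++;      /* a newline delimits a line  */
        if (isspace((unsigned char)*s)) {
            in_word = 0;                 /* whitespace ends a word     */
        } else if (!in_word) {
            in_word = 1;
            (*words)++;                  /* first character of a word  */
        }
    }
}
```

For the input `"hello world\nfoo\n"` this yields 16 characters, 3 words, and 2 lines.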
7. Implementation:
In the implementation using LEX/FLEX, regular expressions are defined to
match characters, words, and lines in the input stream.
Each regular expression is associated with an action, which updates the
corresponding count variables.
The lexical analyzer scans the input file character by character, applying the
defined rules and updating counts accordingly.
Code -
%{
#include <stdio.h>
#undef yywrap
#define yywrap() 1
int charCount = 0;
int wordCount = 0;
int spaceCount = 0;
int lineCount = 0;
%}
%%
[^ \t\n]+ {
    wordCount++;
    charCount += yyleng;
}
[ \t]+ {
    spaceCount++;
    charCount += yyleng;
}
\n {
    lineCount++;
    charCount++;
}
%%
int main()
{
    yyin = fopen("assignment1_input.txt", "r");
    if (!yyin) {
        fprintf(stderr, "Error opening input file.\n");
        return 1;
    }
    yylex();
    fclose(yyin);
    yyout = fopen("assignment1_output.txt", "w");
    if (!yyout) {
        fprintf(stderr, "Error opening output file.\n");
        return 1;
    }
    fprintf(yyout, "Word Count: %d\n", wordCount);
    fprintf(yyout, "Space Count: %d\n", spaceCount);
    fprintf(yyout, "Line Count: %d\n", lineCount);
    fprintf(yyout, "Char Count: %d\n", charCount);
    fclose(yyout);
    return 0;
}
Input -
Output -
Assignment 2
Title –
Tokenization of C program
Theory -
Tokenization is a critical phase in the compilation process, where the source
code is analyzed and divided into meaningful units called tokens. Each token
represents a specific element in the code, such as keywords, identifiers, literals,
operators, etc. The provided code utilizes LEX/FLEX to tokenize a C program,
identifying and categorizing different elements according to their lexical
patterns.
1. Lexical Analysis:
Lexical analysis is the initial phase of compilation where the source code
is scanned to recognize tokens. Each token represents a distinct syntactic
unit in the programming language.
2. LEX/FLEX:
LEX/FLEX is a tool used to generate lexical analyzers based on regular
expressions. It takes a set of rules defined by regular expressions and
corresponding actions and generates C code for tokenizing input text.
3. Regular Expressions:
Regular expressions are patterns used to describe the lexical structure of
tokens in the source code. The provided code defines regular expressions
for various token types.
4. Token Definitions:
DIGIT: Matches sequences of digits.
FLOAT: Matches floating-point numbers.
OPERATOR: Matches various operators such as arithmetic, assignment,
and comparison operators.
SPACE: Matches space and tab characters.
NEWLINE: Matches newline characters.
DATATYPE: Matches data types like int, float, char.
KEYWORDS: Matches reserved words like if, for, while, etc.
IDENTIFIER: Matches variable and function names.
HEADER: Matches include statements.
SPECIAL: Matches special characters like brackets, commas, etc.
QUOTE: Matches quotation marks.
5. Tokenization Process:
Each regular expression is associated with an action that prints the token
type along with the matched text.
The lexer scans the input code and applies the rules defined in the regular
expressions.
When a match is found, the corresponding action is executed, printing the
token type.
If an unexpected character is encountered, an error message is printed.
6. Implementation:
The main function opens the input file for reading and invokes the lexer
using `yylex()`.
After tokenizing the input, the file is closed.
7. Example Output:
Tokens are printed with their corresponding types, e.g., "10 is integer", "if
is keyword", "x is identifier", etc.
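The keyword-versus-identifier decision that the separate KEYWORDS, DATATYPE, and IDENTIFIER rules make can also be sketched as a table lookup in plain C (the `classify` helper and table names are hypothetical; the word lists come from the token definitions above):

```c
#include <string.h>

/* Hypothetical sketch: once a scanner has matched an identifier-shaped
 * lexeme, a table lookup decides whether it is a keyword, a datatype,
 * or an ordinary identifier. */
static const char *keywords[]  = { "if", "for", "while", "printf", "return" };
static const char *datatypes[] = { "int", "float", "char" };

static const char *classify(const char *lexeme)
{
    size_t i;
    for (i = 0; i < sizeof keywords / sizeof *keywords; i++)
        if (strcmp(lexeme, keywords[i]) == 0) return "keyword";
    for (i = 0; i < sizeof datatypes / sizeof *datatypes; i++)
        if (strcmp(lexeme, datatypes[i]) == 0) return "datatype";
    return "identifier";
}
```

In the LEX rules below the same effect is obtained by listing the KEYWORDS and DATATYPE patterns before the general IDENTIFIER pattern, since on equal-length matches the earlier rule wins.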
Code -
%{
#include <stdio.h>
#include <string.h>
#undef yywrap
#define yywrap() 1
%}
DIGIT [0-9]+
FLOAT [0-9]+"."[0-9]+
OPERATOR "="|"-"|"+"|"++"|"--"|"+="|"-="|"*"|"/"|"*="|"/="|"=="|">"|"<"|">="|"<="
SPACE [ \t]
NEWLINE \n
DATATYPE int|float|char
KEYWORDS if|for|while|printf|return
IDENTIFIER [a-zA-Z_][a-zA-Z_0-9]*
HEADER "#include<"[a-zA-Z]+\.[a-zA-Z]+">"
SPECIAL [()[\]{}.,;:?!%\\#]
QUOTE \"
%%
{DIGIT} {
printf("%s is integer\n", yytext);
}
{FLOAT} {
printf("%s is float\n", yytext);
}
{OPERATOR} {
printf("%s is operator\n",yytext);
}
{KEYWORDS} {
printf("%s is Keyword\n", yytext);
}
{DATATYPE} {
printf("%s is datatype\n", yytext);
}
{IDENTIFIER} {
printf("%s is identifier\n",yytext);
}
{HEADER} {
printf("%s is header\n",yytext);
}
{SPECIAL} {
printf("%s is special character\n", yytext);
}
{QUOTE} {
printf("%s is quote\n", yytext);
}
{SPACE}+ ;
{NEWLINE}+ ;
. {
    printf("Unexpected character: %s\n", yytext);
}
%%
int main()
{
    yyin = fopen("[Link]", "r");
    if (!yyin) {
        fprintf(stderr, "Error opening input file.\n");
        return 1;
    }
    yylex();
    fclose(yyin);
    return 0;
}
Input -
Output -
Assignment 3
Title - Convert all uppercase to lowercase letters and summation of digits if a
number is found.
Theory:
In this assignment, we utilize LEX/FLEX to convert all uppercase letters to
lowercase and calculate the summation of digits if a number is encountered in
the input. This functionality is achieved by defining rules for recognizing
uppercase letters, lowercase letters, and numbers, along with appropriate actions
to perform the required operations.
1. Lexical Analysis:
Lexical analysis is the process of breaking down the input stream into tokens or
lexemes, which are the smallest meaningful units in the language. In this code,
tokens include individual characters and numbers.
2. LEX/FLEX:
LEX/FLEX is a tool used to generate lexical analyzers based on regular
expressions. It enables defining patterns and corresponding actions to tokenize
input text.
3. Regular Expressions:
Regular expressions are used to describe patterns in the input text that
correspond to different token types.
4. Token Definitions:
Lowercase Letters (`[a-z]`): Matches lowercase letters in the input. These letters
are simply printed as they are.
Uppercase Letters (`[A-Z]`): Matches uppercase letters and converts them to
lowercase using the `tolower()` function. The converted letters are then printed.
Digits (`[0-9]+`): Matches sequences of digits. The sum of these digits is
calculated and printed along with the original number.
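The digit summation described above reduces to one loop over the lexeme. A minimal plain-C sketch (the name `digit_sum` is hypothetical):

```c
/* Hypothetical helper: sums the decimal digits of a lexeme,
 * exactly as the [0-9]+ action does with yytext and yyleng. */
static int digit_sum(const char *digits)
{
    int sum = 0;
    while (*digits)
        sum += *digits++ - '0';  /* ASCII digit to its numeric value */
    return sum;
}
```

For example, `digit_sum("123")` gives 6.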
5. Implementation:
The `main()` function opens the input file for reading and the output file for
writing.
It sets the input and output streams for the lexer (`yyin` and `yyout`).
`yylex()` is called to tokenize the input text.
After tokenization, the input and output files are closed.
6. Example Output:
Uppercase letters are converted to lowercase.
If a number is found, the sum of its digits is printed along with the original
number.
7. Usefulness:
This code is useful for performing basic text transformations, such as converting
case and extracting numeric data.
8. Limitations:
The code assumes numbers are contiguous digits. It won't handle cases with
non-numeric characters in between digits.
It doesn't handle negative numbers, floating-point numbers, or numbers in
scientific notation.
In summary, the provided code demonstrates how to use LEX/FLEX to convert
uppercase letters to lowercase and calculate the sum of digits in numbers found
in the input. It showcases the practical application of lexical analysis for text
manipulation tasks.
Code -
%{
#include <stdio.h>
#include <ctype.h>
%}
%%
[0-9]+ {
    /* This rule must come before the .|\n catch-all: on an equal-length
     * match (a single digit), LEX picks the earlier rule. */
    int sum = 0;
    for (int i = 0; i < yyleng; ++i) {
        sum += yytext[i] - '0';
    }
    fprintf(yyout, "Sum of digits in number %s is %d\n", yytext, sum);
}
[a-z] {
    fprintf(yyout, "%s", yytext);
}
[A-Z] {
    yytext[0] = tolower(yytext[0]);
    fprintf(yyout, "%s", yytext);
}
.|\n {
    fprintf(yyout, "%s", yytext);
}
%%
int main() {
FILE *input_file = fopen("[Link]", "r");
FILE *output_file = fopen("[Link]", "w");
if (input_file == NULL || output_file == NULL) {
fprintf(stderr, "Error opening files.\n");
return 1;
}
yyin = input_file;
yyout = output_file;
yylex();
fclose(input_file);
fclose(output_file);
return 0;
}
int yywrap() {
return 1;
}
Input -
Output -
Assignment 4
Title:
Write a program in Lex and Yacc to recognize a sentence as a simple statement or a
compound statement.
Theory:
The objective is to develop a program using Lex and Yacc that categorizes input
sentences into either simple statements or compound statements based on their
structure.
Lex rules and actions: Lex generates lexical analyzers, identifying patterns in
input text and returning tokens, while Yacc creates parsers, analyzing input
structure according to a defined grammar and executing user-defined actions.
• [ \t] ;: Matches whitespace characters (space or tab) and ignores them.
• am|is|are|have|has|can|will|shall|eat|sing|go|goes { return VERB;}:
Matches verbs and returns the token VERB.
• very|simply|gently { return ADVERB; }: Matches adverbs and returns
the token ADVERB.
• and|or|also|so|but|if|then { return CONJUNCTION;}: Matches
conjunctions and returns the token CONJUNCTION.
• fast|good|honest { return ADJECTIVE;}: Matches adjectives and
returns the token ADJECTIVE.
• I|he|she|we|they|you|this { return PRONOUN;}: Matches pronouns
and returns the token PRONOUN.
• in|on|to { return PREPOSITION;}: Matches prepositions and returns
the token PREPOSITION.
• [a-zA-Z]+ { return NOUN;}: Matches nouns and returns the token
NOUN.
• .: Matches any other character.
Yacc Code:
• Defines grammar rules for sentences, simple statements, and compound
statements.
• Specifies terminals using %token and non-terminals implicitly.
• Grammar rules outline sentence structure and include actions to print
whether a sentence is simple or compound.
Lex Code:
• Recognizes different parts of speech (nouns, verbs, adverbs,
conjunctions, pronouns, adjectives, prepositions) using regular
expressions.
• Tokenizes input sentences by matching patterns and returning
corresponding tokens.
• Ignores whitespace characters and treats unmatched characters as
tokens.
Explanation:
• Yacc code defines two types of sentences, simple and compound, with
distinct structures.
• Lex code tokenizes input sentences, recognizing various parts of speech
and returning tokens.
• The main function opens the input file, invokes the Yacc parser, and
determines whether the input sentence is a simple or compound statement
based on its parsed structure.
• Error handling is implemented through the yyerror() function, which
displays error messages in case of parsing errors.
Lex Code :
%{
#include "[Link].h"
%}
%%
[\t ] ;
am|is|are|have|has|can|will|shall|eat|sing|go|goes { return VERB;}
very|simply|gently { return (ADVERB); }
and|or|also|so|but|if|then { return (CONJUNCTION);}
fast|good|honest { return (ADJECTIVE);}
I|he|she|we|they|you|this { return (PRONOUN);}
in|on|to { return (PREPOSITION);}
[a-zA-Z]+ { return (NOUN);}
. ;
%%
int yywrap()
{
return 1;
}
Yacc Code:
%{
#include<stdio.h>
void yyerror(char*);
int yylex();
%}
%token NOUN PRONOUN ADJECTIVE VERB ADVERB CONJUNCTION PREPOSITION
%%
sentence: compound { printf("COMPOUND SENTENCE\n");}
|
simple {printf("SIMPLE SENTENCE\n");}
;
simple: subject VERB object;
compound: subject VERB object CONJUNCTION subject VERB object;
subject: NOUN|PRONOUN;
object: NOUN|ADJECTIVE NOUN|ADVERB NOUN|PREPOSITION NOUN;
%%
void yyerror(char *s)
{
printf("ERROR: %s\n",s);
}
int main(int argc,char* argv[])
{
extern FILE* yyin;
yyin=fopen(argv[1],"r");
if(!yyin){
printf("Cannot open input file\n");
return 1;
}
yyparse();
fclose(yyin);
return 0;
}
Input :
Assignment 5
Title
Implement a code optimizer for C/C++ subset.
Theory:
Implementing a code optimizer for a subset of C/C++ involves analyzing and
modifying the code to improve its efficiency, reduce redundancy, and minimize
resource usage without altering its functionality. The provided code
demonstrates a basic code optimizer implemented using Yacc (parser generator)
and Lex (lexical analyzer generator).
● Code Optimization:
Code optimization aims to improve code quality, execution speed, and
resource utilization while maintaining the program's functionality. It
involves various techniques such as constant folding, common
subexpression elimination, and dead code elimination.
● Yacc (Parser Generator):
Yacc generates parsers based on a formal grammar specification. It parses
input according to the grammar rules and invokes user-defined actions.
● Lex (Lexical Analyzer Generator):
Lex generates lexical analyzers that recognize patterns in the input text
and perform corresponding actions.
● Yacc Code:
The Yacc code defines grammar rules for expressions and statements.
It includes actions to build a representation of the parsed code (an array
of `struct expr`) and calls optimization functions.
● Lex Code:
The Lex code recognizes numbers, identifiers, and other characters.
It returns tokens to the Yacc parser.
● Optimization Techniques:
Common Subexpression Elimination (CSE): Identifies repeated
computations and replaces them with a single computation.
Dead Code Elimination: Removes code that has no effect on the program
output.
Expression Simplification: Simplifies expressions to reduce the number
of operations.
● Optimization Function (`opt()`):
The `opt()` function performs optimization on the generated intermediate
code.
It iterates over the array of expressions and applies optimization
techniques.
In this code, common subexpressions are detected and eliminated by
updating operands and marking redundant expressions.
● Intermediate Representation:
The array `arr` holds the intermediate representation of the parsed code.
Each element of `arr` represents an expression (`struct expr`) with
operands, operator, and result.
● Printing Optimized Code:
The `quad()` function prints the original and optimized intermediate code.
Original code is printed before optimization, and optimized code is
printed afterward.
● Input and Output:
The main function prompts the user to enter an expression.
The entered expression is parsed, optimized, and printed.
● Error Handling:
`yyerror()` function handles parsing errors and displays error messages.
`yywrap()` function indicates the end of input.
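The common subexpression elimination described above can be sketched in isolation in plain C (the `struct quad` type and the `cse` name are hypothetical stand-ins for the `struct expr` array and the `opt()` function used in the code below):

```c
/* Hypothetical quadruple: res := op1 <op> op2.
 * An op of '!' marks an eliminated (dead) entry. */
struct quad { char op1, op2, op, res; };

/* Fold duplicate computations: if quad j repeats quad i, rewrite
 * later uses of j's result to i's result and kill quad j. */
static void cse(struct quad *q, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (q[j].op != '!' && q[j].op == q[i].op &&
                q[j].op1 == q[i].op1 && q[j].op2 == q[i].op2) {
                for (int k = j + 1; k < n; k++) {
                    if (q[k].op1 == q[j].res) q[k].op1 = q[i].res;
                    if (q[k].op2 == q[j].res) q[k].op2 = q[i].res;
                }
                q[j].op = '!';  /* mark the duplicate as dead */
            }
}
```

For the quads `A := a + b`, `B := a + b`, `C := A + B`, the second quad is killed and the third becomes `C := A + A`.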
Code -
Yacc code -
%{
#include"[Link].h"
#include<stdio.h>
#include<stdlib.h>
char temp ='A'-1;
int index1=0;
char addtotable(char, char, char);
void quad();
void opt();
void yyerror(char *);
int yylex();
struct expr{
char operand1;
char operand2;
char operator;
char result;
};
%}
%union{
char symbol;
}
%left '+''-'
%left '*''/'
%token <symbol> NUMBER ID
%type <symbol> exp
%%
st: ID '=' exp ';' {addtotable((char)$1,(char)$3,'=');};
exp: exp '+' exp {$$ = addtotable((char)$1,(char)$3,'+');}
|exp '-' exp {$$ = addtotable((char)$1,(char)$3,'-');}
|exp '/' exp {$$ = addtotable((char)$1,(char)$3,'/');}
|exp '*' exp {$$ = addtotable((char)$1,(char)$3,'*');}
|'(' exp ')' {$$ = (char)$2;}
|NUMBER {$$ = (char)$1;}
|ID {$$=(char)$1;};
%%
struct expr arr[20];
void quad(){
int i;
for(i=0;i<index1;i++){
if(arr[i].operator=='!') continue;
printf("%c:=\t",arr[i].result);
printf("%c\t",arr[i].operand1);
printf("%c\t",arr[i].operand2);
printf("%c\n",arr[i].operator);
}
}
int main(){
temp='A'-1;
printf("Enter the expression\n");
yyparse();
quad();
opt();
printf("After Optimization\n");
quad();
return 0;
}
int yywrap(){
return 1;
}
void yyerror(char *s){
printf("Error: %s\n",s);
}
char addtotable(char a, char b, char c){
temp++;
arr[index1].operand1=a;
arr[index1].operand2=b;
arr[index1].operator=c;
arr[index1].result=temp;
index1++;
return temp;
}
void opt(){
for(int i=0;i<index1;i++)
for(int j=i+1;j<index1;j++){
if(arr[i].operator==arr[j].operator && arr[i].operand1==arr[j].operand1
&& arr[i].operand2==arr[j].operand2){
for(int z=j+1;z<index1;z++){
if(arr[z].operand1==arr[j].result) arr[z].operand1=arr[i].result;
if(arr[z].operand2==arr[j].result) arr[z].operand2=arr[i].result;
}
arr[j].operator='!';
}
}
}
Lex code -
%{
#include"[Link].h"
%}
%%
[0-9]+ {yylval.symbol = (char)yytext[0]; return NUMBER;}
[a-zA-Z]+ {yylval.symbol = (char)yytext[0]; return ID;}
. {return yytext[0];}
\n {return 0;}
%%
Output -
Assignment 6
Title –
Implement a code generator for C/C++ subset.
Theory-
Implementing a code generator for a subset of C/C++ involves crafting a system
capable of analyzing and refining code to enhance efficiency and resource usage
without altering functionality. The provided code illustrates a fundamental code
generator implemented using Yacc (Yet Another Compiler Compiler) and Lex
(Lexical Analyzer Generator).
The provided code implements a basic code generator for a subset of C/C++
using Yacc and Lex. Here's an explanation of the components and functionality:
Yacc Code:
• The Yacc code defines grammar rules and actions for parsing input
expressions and statements.
• It includes productions for statements (`st`) like variable assignments,
conditional blocks, keywords, and inbuilt functions.
• Each production has corresponding actions to generate C code based
on the parsed input.
• The `finalst` production collects and prints the final generated code.
• Grammar rules for expressions (`exp`) handle arithmetic operations,
variable references, and constants.
Lex Code:
• The Lex code tokenizes the input stream, recognizing keywords,
inbuilt functions, numbers, variables, and comparison operators.
• It returns tokens to the Yacc parser along with associated values stored
in the `yylval` union.
• Regular expressions match patterns such as keywords, numbers, and
operators.
Code Generation:
• The Yacc parser generates C code based on the parsed input
expressions and statements.
• It prints C code corresponding to variable assignments, conditional
blocks, keywords, and inbuilt functions.
• The generated code is printed as part of the `main()` function, which
reads input from a file (`[Link]`), parses it, generates C code, and
prints the final C program.
In summary, the code demonstrates a basic implementation of a code generator
using Yacc and Lex, capable of converting a subset of C/C++ expressions and
statements into equivalent C code.
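The string-building pattern used throughout the actions below, allocating a buffer sized from the operand strings and formatting into it, can be sketched in isolation (the `emit_binary` helper name is hypothetical):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds the C text "(lhs op rhs)" on the heap,
 * the same malloc/sprintf pattern the exp productions use. */
static char *emit_binary(const char *lhs, char op, const char *rhs)
{
    /* "(" + lhs + " " + op + " " + rhs + ")" + NUL  ->  lens + 6 */
    char *buf = malloc(strlen(lhs) + strlen(rhs) + 6);
    sprintf(buf, "(%s %c %s)", lhs, op, rhs);
    return buf;
}
```

Because each result is itself a string, nesting composes naturally: `emit_binary(emit_binary("a", '+', "b"), '*', "2")` produces `"((a + b) * 2)"`, which is how the parser's bottom-up reductions parenthesize subexpressions.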
Yacc code -
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
extern FILE *yyin;
char *buffer;
int i = 0;
int yylex();
void yyerror();
%}
%union
{
int num;
char* var;
char* expr;
}
%left '+' '-'
%left '*' '/'
%token <num> NUMBER
%token <var> VARIABLE COMPARISON BOOLOPR
%token <expr> KEYWORD INBUILT
%type <expr> exp st finalst condition condblock
%%
finalst: finalst st {
printf("%s", $2);
}
| {}
;
st: VARIABLE '=' exp ';' {
buffer = (char *)malloc(strlen($1) + strlen($3) + 7);
sprintf(buffer, "%s = %s;\n", $1, $3);
$$ = buffer;
}
| condblock {
buffer = (char *)malloc(strlen($1) + 2);
sprintf(buffer, "%s\n", $1);
$$ = buffer;
}
| KEYWORD {
buffer = (char *)malloc(strlen($1) + 2);
sprintf(buffer, "%s ", $1);
$$ = buffer;
}
| INBUILT {
buffer = (char *)malloc(strlen($1) + 2);
sprintf(buffer, "%s\n", $1);
$$ = buffer;
}
| '{' {
buffer = (char *)malloc(4);
sprintf(buffer, "{ \n");
$$ = buffer;
}
| '}' {
buffer = (char *)malloc(4);
sprintf(buffer, "} \n");
$$ = buffer;
}
;
condblock: KEYWORD '(' condition ')'
{
buffer = (char *)malloc(strlen($1) + strlen($3) + 5);
sprintf(buffer, "%s (%s)", $1, $3);
$$ = buffer;
}
;
condition : exp COMPARISON exp
{
buffer = (char *)malloc(strlen($1) + strlen($2) + strlen($3) + 5);
sprintf(buffer, "%s %s %s ", $1, $2, $3);
$$ = buffer;
}
;
exp: exp '+' exp {
buffer = (char *)malloc(strlen($1) + strlen($3) + 5);
sprintf(buffer, "(%s + %s)", $1, $3);
$$ = buffer;
}
| exp '-' exp {
buffer = (char *)malloc(strlen($1) + strlen($3) + 5);
sprintf(buffer, "(%s - %s)", $1, $3);
$$ = buffer;
}
| exp '*' exp {
buffer = (char *)malloc(strlen($1) + strlen($3) + 5);
sprintf(buffer, "(%s * %s)", $1, $3);
$$ = buffer;
}
| exp '/' exp {
buffer = (char *)malloc(strlen($1) + strlen($3) + 5);
sprintf(buffer, "(%s / %s)", $1, $3);
$$ = buffer;
}
| exp '%' exp {
buffer = (char *)malloc(strlen($1) + strlen($3) + 5);
sprintf(buffer, "(%s %% %s)", $1, $3);
$$ = buffer;
}
| '(' exp ')' {
buffer = (char *)malloc(strlen($2) + 3);
sprintf(buffer, "(%s)", $2);
$$ = buffer;
}
| NUMBER {
buffer = (char *)malloc(20);
sprintf(buffer, "%d", $1);
$$ = buffer;
}
| VARIABLE {
buffer = (char *)malloc(strlen($1) + 1);
strcpy(buffer, $1);
$$ = buffer;
}
;
%%
int main()
{
printf("#include<stdio.h>\n\n");
printf("int main(){\n");
yyin = fopen("[Link]", "r");
while(!feof(yyin))
{
yyparse();
}
fclose(yyin);
printf("return 0;\n}");
return 0;
}
void yyerror(char *s)
{
printf("%s\n", s);
}
int yywrap() {
return 1;
}
Lex code -
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "[Link].h"
%}
%%
"int"|"char"|"string"|"float"|"if"|"else"|"else if" { yylval.expr = strdup(yytext); return KEYWORD; }
"printf""(".*");" { yylval.expr = strdup(yytext); return INBUILT; }
[0-9]+ { yylval.num = atoi(yytext); return NUMBER; }
[a-zA-Z]+ { yylval.var = strdup(yytext); return VARIABLE; }
"=="|"!="|"<"|">"|"<="|">=" { yylval.var = strdup(yytext); return COMPARISON; }
[-+*/=();{}%] { return yytext[0]; }
[ \t\n] { }
. { }
%%
Input -
Output -