Assignment Writeup

Compiler Design Assignment Solution

Uploaded by Pro Coder
Copyright © All Rights Reserved

Assignment Writeup

Prepared By
Mayank Bohra 2021BTech072

Submitted To
Mr. Ankur Sharma

Course CS1112: Compiler Design

April 2024
Assignment-1

1.a) Use Flex to generate a lexer for the simple language.

The Lex code above generates a program that analyzes input text or code. It identifies
different elements such as operators (+, -, *, /), numbers, and identifiers (such as
variable names), and each time it finds one of these elements it prints what it is
(e.g., "Operator: +").

The main function opens an input file, reads its contents, and runs the Lex-generated
scanner over them to perform the lexical analysis.
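Since the Flex source is not reproduced here, the following minimal Python sketch illustrates the same idea: each token class is a regular expression, and every match is printed as "Kind: lexeme". The token classes, their patterns and the `scan` function are assumptions for illustration, not the author's Flex rules. One difference worth knowing: Flex picks the longest match (ties go to the earliest rule), whereas Python's `re` alternation takes the first alternative that matches, so the order of `TOKEN_SPEC` matters here.

```python
import re

# Hypothetical token classes for the simple language (not the author's Flex rules).
TOKEN_SPEC = [
    ("Number", r"\d+(?:\.\d+)?"),
    ("Identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("Operator", r"[+\-*/]"),
    ("Whitespace", r"\s+"),
]
# One regex with a named group per token class; lastgroup tells us which one matched.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Return 'Kind: lexeme' strings, like the lexer's printed output."""
    results = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:  # no rule matches: report the character and skip it
            results.append(f"Unknown: {text[pos]}")
            pos += 1
            continue
        if m.lastgroup != "Whitespace":  # whitespace is matched but not reported
            results.append(f"{m.lastgroup}: {m.group()}")
        pos = m.end()
    return results


for line in scan("x1 + 42 * 3.5 / y"):
    print(line)
```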

Input file
Output

1.b) Your scanner should define a struct called token containing a field called type, which
is an enum giving the token type, and a union called value, which can be either a string
called name, an int called intVal, or a float called floatVal, depending on the type of
token.

This Lex program analyzes input text, categorizing elements such as operators,
parentheses, integers, floats, dates, and identifiers. For each recognized token, it
prints the token's type and value, both on the console and to an output file. It
distinguishes between different kinds of numeric values, such as integers and floats, and
handles special cases such as dates and identifiers.
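To make the token layout concrete, here is a rough Python analogue of the required C struct: an `Enum` plays the role of the `type` field, and a `value` annotated as `Union[str, int, float]` stands in for the C union of `name`, `intVal` and `floatVal`. The token kinds, the dd/mm/yyyy date format and the `tokenize` function are assumptions chosen for illustration; the author's actual Lex rules are not reproduced in this writeup.

```python
import re
from dataclasses import dataclass
from enum import Enum, auto
from typing import Union


class TokenType(Enum):
    OPERATOR = auto()
    LPAREN = auto()
    RPAREN = auto()
    INT = auto()
    FLOAT = auto()
    DATE = auto()
    IDENTIFIER = auto()


@dataclass
class Token:
    type: TokenType
    # Stands in for the C union: a str (name), an int (intVal) or a float (floatVal).
    value: Union[str, int, float]


# Order matters: DATE is tried before FLOAT and INT, and FLOAT before INT.
PATTERNS = [
    (TokenType.DATE, re.compile(r"\d{2}/\d{2}/\d{4}")),  # assumed dd/mm/yyyy
    (TokenType.FLOAT, re.compile(r"\d+\.\d+")),
    (TokenType.INT, re.compile(r"\d+")),
    (TokenType.IDENTIFIER, re.compile(r"[A-Za-z_]\w*")),
    (TokenType.OPERATOR, re.compile(r"[+\-*/]")),
    (TokenType.LPAREN, re.compile(r"\(")),
    (TokenType.RPAREN, re.compile(r"\)")),
]


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for ttype, pattern in PATTERNS:
            m = pattern.match(text, pos)
            if m:
                lexeme = m.group()
                # Fill the "union" member that matches the token type.
                if ttype is TokenType.INT:
                    value = int(lexeme)
                elif ttype is TokenType.FLOAT:
                    value = float(lexeme)
                else:
                    value = lexeme
                tokens.append(Token(ttype, value))
                pos = m.end()
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens


for tok in tokenize("rate * (x + 2.5) / 4 due 01/04/2024"):
    print(tok.type.name, tok.value)
```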
2) In any programming language of your choice
a) Implement a DFA recognizer.
b) Implement an NFA recognizer.
In the above code, I designed two classes called NFA and DFA. Each class stores
states as integers and includes a counter variable named "steps" for error handling,
as well as functions for verification and transition.

The transition function uses an integer called "current_state" to track the automaton's
current state. It iterates over the characters of the input string and matches each one
against the automaton's transition rules; when a rule matches, "current_state" advances
to that rule's target state. After the entire string has been processed, the "verify"
function checks whether "current_state" equals the accepting state.
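A DFA recognizer of this kind can be written compactly with a transition table. The sketch below is illustrative rather than the author's code: the class layout, the `accepts` method, and the example language (binary strings ending in "01") are assumptions. A missing table entry is treated as a dead state, so the string is rejected immediately.

```python
class DFA:
    """Table-driven DFA: delta maps (state, symbol) -> next state."""

    def __init__(self, delta, start, accepting):
        self.delta = delta
        self.start = start
        self.accepting = set(accepting)

    def accepts(self, text):
        current_state = self.start
        for ch in text:
            key = (current_state, ch)
            if key not in self.delta:  # no transition defined: dead state, reject
                return False
            current_state = self.delta[key]
        # Accept only if the run ends in an accepting state.
        return current_state in self.accepting


# Hypothetical example: binary strings that end in "01".
# State 0: no progress; 1: last symbol was '0'; 2: last two symbols were "01".
ends_in_01 = DFA(
    delta={
        (0, "0"): 1, (0, "1"): 0,
        (1, "0"): 1, (1, "1"): 2,
        (2, "0"): 1, (2, "1"): 0,
    },
    start=0,
    accepting={2},
)

for s in ["01", "1101", "10", ""]:
    print(repr(s), ends_in_01.accepts(s))
```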

The NFA class represents states as objects of a "state" class, with attributes such as
"name" and "position". These attributes drive a heuristic that chooses the next state
when several states are possible for the same input character: it computes the distance
between the current state and each candidate and selects the candidate farthest from the
current state, on the assumption that it is closest to the accepting state. The variable
"curr", which holds the current state, is then updated to the name of the chosen state.
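A heuristic that commits to a single successor at each step can reject a string that a different path would accept. For comparison, the standard way to simulate an NFA (a different technique from the distance heuristic above) is to track the set of all states the automaton could currently be in, following ε-moves via the ε-closure. In this minimal sketch, the class layout, the use of `None` as the ε symbol, and both example automata are assumptions for illustration.

```python
class NFA:
    """NFA simulated by tracking the set of all states reachable so far.

    delta maps (state, symbol) -> set of states; the symbol None marks an epsilon move.
    """

    def __init__(self, delta, start, accepting):
        self.delta = delta
        self.start = start
        self.accepting = set(accepting)

    def _epsilon_closure(self, states):
        # Add every state reachable through epsilon moves alone.
        stack, closure = list(states), set(states)
        while stack:
            s = stack.pop()
            for t in self.delta.get((s, None), set()):
                if t not in closure:
                    closure.add(t)
                    stack.append(t)
        return closure

    def accepts(self, text):
        current = self._epsilon_closure({self.start})
        for ch in text:
            moved = set()
            for s in current:
                moved |= self.delta.get((s, ch), set())
            current = self._epsilon_closure(moved)
            if not current:  # every path has died
                return False
        # Accept if any surviving path ends in an accepting state.
        return bool(current & self.accepting)


# Hypothetical example: binary strings ending in "01". On '0', state 0 may either
# stay in 0 or guess that the final "01" has started (state 1).
ends_in_01 = NFA({(0, "0"): {0, 1}, (0, "1"): {0}, (1, "1"): {2}}, start=0, accepting={2})

for s in ["1101", "10", "0101"]:
    print(repr(s), ends_in_01.accepts(s))
```

Keeping every candidate state alive is exactly what the subset construction does ahead of time when converting an NFA to a DFA; here it happens on the fly, one input symbol at a time.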
Assignment-2
1) Write code to take a grammar as input and
a) compute the FIRST and FOLLOW sets of all the non-terminals
b) create the LL(1) parsing table for the grammar.

grammar.csv

In this code, I work with a grammar defined in a CSV file named "grammar.csv". First,
I compute the first and follow sets of each non-terminal symbol in the grammar using a
function called find_first_follow_sets().

After that, I print the computed first and follow sets, which show how the different
symbols of the grammar relate to one another.

Then, I read the grammar from the CSV file into a data structure that will be used to
construct the parsing table.

Next, I build the parsing table from the grammar and the first and follow sets obtained
earlier. The parser consults this table to decide which grammar rule to apply when
parsing an input string.

Finally, I print out the parsing table so that I can see how different input symbols
correspond to different actions during the parsing process. It helps me understand how the
grammar can be used to parse strings.
In this code, I've defined a Python class called ProductionRule. This class represents a
production rule in a grammar. Each production rule consists of a left-hand side (the non-
terminal symbol) and a right-hand side (a sequence of symbols).

The __init__ method is the constructor of the class. It initializes the left_side and
right_side attributes of the production rule object with the values provided as arguments.

The read_grammar function takes a file path (grammar_file) as input and returns a list of
ProductionRule objects representing the grammar read from that file.

Within the function, I open the grammar file in read mode and iterate over each line in the
file using a for loop. For each line, I strip any leading or trailing whitespace and then split
the line by commas using the split(',') method. The left-hand side of the production rule is
assigned to left_side, while the right-hand side is unpacked into right_side.

Then, I create a new ProductionRule object using the left_side and right_side, and append
it to the grammar list.
Finally, I return the grammar list containing all the production rules parsed from the file.
This function essentially reads the grammar from a file and converts it into a list of
ProductionRule objects, which can be further processed or analyzed.
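A minimal sketch of this reading step follows. It assumes a CSV layout in which the first column is the left-hand side and the remaining columns are the right-hand-side symbols (for example `S,A,1,S`, with `~` standing for ε); the real grammar.csv is not reproduced, so treat that layout as an assumption. The hypothetical helper `parse_grammar_lines` is split out so the parsing can be tested without a file, and the `csv` module is used instead of a bare `split(',')` so quoted fields are handled; blank cells (such as trailing commas from a spreadsheet export) are dropped.

```python
import csv


class ProductionRule:
    """One production: left_side -> right_side (a list of grammar symbols)."""

    def __init__(self, left_side, right_side):
        self.left_side = left_side
        self.right_side = right_side

    def __repr__(self):
        return f"{self.left_side} -> {' '.join(self.right_side)}"


def parse_grammar_lines(lines):
    """Hypothetical helper: turn CSV rows such as 'S,A,1,S' into ProductionRule objects."""
    grammar = []
    for row in csv.reader(lines):
        fields = [field.strip() for field in row if field.strip()]  # drop blank cells
        if not fields:  # skip empty lines
            continue
        left_side, *right_side = fields
        grammar.append(ProductionRule(left_side, right_side))
    return grammar


def read_grammar(grammar_file):
    # newline="" is how the csv module recommends opening files it will read.
    with open(grammar_file, newline="") as f:
        return parse_grammar_lines(f)


rules = parse_grammar_lines(["S,A,1,S", "S,~", "A,0,1,A", "A,1,1,", ""])
print(rules)
```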
In the above function, first, I read the grammar from the file using the read_grammar
function, which we discussed earlier. This gives me a list of production rules that define the
grammar.

Then, I initialize a set called non_terminals to store all the non-terminal symbols in the
grammar. I iterate through each production rule, and if the left-hand side of the rule is an
uppercase letter (indicating a non-terminal symbol), I add it to the non_terminals set.
Next, I initialize empty dictionaries first_set and follow_set to store the first and follow sets
for each non-terminal symbol, respectively. I iterate through each non-terminal symbol in
non_terminals, and for each symbol, I initialize an empty set in both first_set and
follow_set.

Now comes the calculation. First, I calculate the first sets for all non-terminal symbols.
I set updated to True and then enter a while loop that continues as long as updated is
True. Inside this loop, I iterate through each production rule and call the
calculate_first_set function to compute the first set for the rule's left-hand side. If any
first set changes during a pass, updated is set to True, so the loop ends only once a
complete pass makes no changes (a fixed point).

After calculating the first sets, I calculate the follow sets in the same way. Again, I set
updated to True and enter a while loop that continues as long as updated is True. Inside
this loop, I iterate through each non-terminal symbol; for the start symbol S, I add the
end-of-input marker $ to its follow set. Then, I call the calculate_follow_set function to
compute the follow set for the current non-terminal symbol. If any updates occur during
this process, updated is set to True.

Finally, I return the first_set and follow_set dictionaries containing the computed first and
follow sets for each non-terminal symbol in the grammar.
In the function ‘calculate_first_set’, I'm taking in several parameters: the non_terminal
symbol we're calculating the first set for, a list of symbols (which represent the right-hand
side of a production rule), the grammar itself, the first_set dictionary which stores the first
sets for each non-terminal symbol, and an optional parameter epsilon which represents the
empty string.

Next, I initialize a boolean variable, updated, to keep track of whether the first set for
the current non-terminal symbol has been updated during this function call.

I then iterate through each symbol in the symbols list, which represents the right-hand
side of a production rule.

If the symbol is epsilon (the empty string), I add it to the first set of the non_terminal if
it is not already present and set updated to True. Then, I break out of the loop, because
epsilon is not followed by any further symbols.

If the symbol is an uppercase letter (indicating a non-terminal symbol), I gather the first
symbol of the right-hand side of each production rule whose left-hand side matches the
current symbol. Then, I recursively call calculate_first_set with this symbol and the
corresponding right-hand side symbols, setting updated if any changes occur, and I union
the resulting first set into the first set of the non_terminal.

If the symbol is a terminal symbol, I add it to the first set of the non_terminal if it's not
already present, and I set updated to True. Then, I break out of the loop because terminals
don't lead to further symbols.

Finally, I return updated, indicating whether the first set for the non_terminal has been
updated during this function call.
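For comparison, FIRST sets can also be computed without recursion by repeating simple passes over all productions until no set grows, which also avoids infinite recursion on left-recursive rules. In this minimal sketch, the grammar is assumed to be a dict from each non-terminal to a list of right-hand sides, non-terminals are simply the dict keys (instead of an uppercase check), `~` is the assumed ε marker, and the expression grammar is a hypothetical example.

```python
EPSILON = "~"  # assumed epsilon marker


def first_of_sequence(symbols, first, nonterminals):
    """FIRST of a symbol string; contains EPSILON only if every symbol can vanish."""
    result = set()
    for sym in symbols:
        if sym == EPSILON:
            continue
        if sym not in nonterminals:  # a terminal starts the string: stop here
            result.add(sym)
            return result
        result |= first[sym] - {EPSILON}
        if EPSILON not in first[sym]:  # this non-terminal cannot vanish: stop here
            return result
    result.add(EPSILON)  # every symbol was nullable (or the sequence is empty)
    return result


def compute_first(grammar):
    """grammar maps each non-terminal to a list of right-hand sides (lists of symbols)."""
    nonterminals = set(grammar)
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no FIRST set grows (fixed point)
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_sequence(rhs, first, nonterminals)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


grammar = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPSILON]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPSILON]],
    "F": [["(", "E", ")"], ["id"]],
}
for nt, s in compute_first(grammar).items():
    print(nt, sorted(s))
```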
In the function ‘calculate_follow_set’, I start by taking in parameters: the non_terminal
symbol for which we're calculating the follow set, the grammar, the first_set, and the
follow_set dictionaries.

I set up a boolean variable updated to keep track of whether the follow set for the current
non-terminal symbol has been updated during this function call.
Now, I iterate through each production rule in the grammar. If the non_terminal symbol is
found in the right-hand side of the current rule, I proceed to handle it.

I check if the non_terminal is the last symbol in the right-hand side of the rule. If it is, and
if the left-hand side of the rule is different from the non_terminal, I recursively call
calculate_follow_set for the left-hand side of the rule. If any updates occur during this call,
I set updated to True. Then, I union the follow set of the non_terminal with the follow set of
the left-hand side of the rule.

If the non_terminal is not the last symbol in the right-hand side, I handle the next symbol
in the right-hand side of the rule.

If the next symbol is a non-terminal, I union the follow set of the non_terminal with the
first set of the next symbol. If the empty string is in the first set of the next symbol, I
recursively call calculate_follow_set for the next symbol. If any updates occur during this
call, I set updated to True. Then, I union the follow set of the non_terminal with the follow
set of the next symbol.

If the next symbol is a terminal, I add it to the follow set of the non_terminal.
Finally, I return updated, indicating whether the follow set for the non_terminal has been
updated during this function call.
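The textbook FOLLOW rules are: $ goes into FOLLOW of the start symbol; for a production A -> α B β, everything in FIRST(β) except ε goes into FOLLOW(B); and if β is empty or can derive ε, everything in FOLLOW(A) goes into FOLLOW(B). Note that in the last rule it is FOLLOW of the production's left-hand side that is added. The minimal sketch below applies these rules as a fixed-point iteration rather than by recursion. It assumes the same dict-based grammar layout, `~` as ε and `$` as the end marker; the FIRST sets are passed in (written out by hand for the hypothetical expression grammar).

```python
EPSILON = "~"  # assumed epsilon marker
END = "$"      # end-of-input marker


def compute_follow(grammar, first, start):
    """FOLLOW sets by fixed-point iteration.

    grammar: non-terminal -> list of right-hand sides; first: non-terminal -> FIRST set.
    """
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:  # only non-terminals have FOLLOW sets
                        continue
                    new = set()
                    for nxt in rhs[i + 1:]:
                        if nxt == EPSILON:
                            continue
                        if nxt not in grammar:  # terminal: it follows sym, then stop
                            new.add(nxt)
                            break
                        new |= first[nxt] - {EPSILON}
                        if EPSILON not in first[nxt]:
                            break
                    else:
                        # Everything after sym can vanish, so FOLLOW(lhs) follows sym too.
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


grammar = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPSILON]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPSILON]],
    "F": [["(", "E", ")"], ["id"]],
}
first = {"E": {"(", "id"}, "E'": {"+", EPSILON}, "T": {"(", "id"},
         "T'": {"*", EPSILON}, "F": {"(", "id"}}
for nt, s in compute_follow(grammar, first, "E").items():
    print(nt, sorted(s))
```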
In the above function, firstly, I initialize an empty dictionary called parsing_table which
will eventually hold the parsing table constructed from the grammar, first sets, and follow
sets.

Then, I iterate through each production rule in the grammar. For each rule, I extract the
left-hand side (non-terminal symbol) and check if it exists as a key in the parsing_table
dictionary. If not, I initialize an empty dictionary for that non-terminal symbol in the
parsing_table.

Now, I loop through each symbol in the right-hand side of the current rule. If the symbol is
not the empty string (epsilon), I handle it accordingly.

If the symbol is a non-terminal (an uppercase letter), I iterate through each terminal
symbol in its first set, taken from first_set. For each terminal in that first set, I add an
entry to parsing_table mapping the current non-terminal and that terminal to the current
production rule.

If the symbol is a terminal symbol, I directly add an entry in the parsing_table mapping the
current non-terminal to the terminal symbol, with the value being the current production
rule.

Next, if the empty string (epsilon) is in the first set of the current non-terminal, I iterate
through each terminal symbol in its follow set obtained from follow_set. For each terminal
symbol in the follow set, I add an entry in the parsing_table mapping the current non-
terminal to the terminal symbol, with the value being the current production rule.
Finally, I return the constructed parsing_table which represents the parsing table derived
from the grammar, first sets, and follow sets. This table is essential for predictive parsing of
strings based on the given grammar.
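In the textbook LL(1) construction, for each production A -> α the entry M[A, a] receives that production for every terminal a in FIRST(α) other than ε, where FIRST(α) is taken over the right-hand side as a whole (scanning left to right and stopping at the first symbol that cannot derive ε); if α can derive ε, M[A, b] receives it for every b in FOLLOW(A). A cell that would receive two different productions means the grammar is not LL(1). The sketch below follows these rules and reports such conflicts; the dict-based layout, `~` as ε, and the hand-written FIRST/FOLLOW sets for the hypothetical expression grammar are assumptions.

```python
EPSILON = "~"  # assumed epsilon marker
END = "$"      # assumed end-of-input marker


def build_ll1_table(grammar, first, follow):
    """table[A][a] = right-hand side to use when A is on the stack and a is the lookahead.

    Raises ValueError if two productions compete for one cell (grammar is not LL(1)).
    """
    table = {nt: {} for nt in grammar}

    def add(lhs, terminal, rhs):
        existing = table[lhs].get(terminal)
        if existing is not None and existing != rhs:
            raise ValueError(f"LL(1) conflict at [{lhs}, {terminal}]")
        table[lhs][terminal] = rhs

    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            nullable = True  # does rhs derive the empty string?
            for sym in rhs:
                if sym == EPSILON:
                    continue
                if sym not in grammar:  # a terminal starts rhs
                    add(lhs, sym, rhs)
                    nullable = False
                    break
                for t in first[sym] - {EPSILON}:
                    add(lhs, t, rhs)
                if EPSILON not in first[sym]:
                    nullable = False
                    break
            if nullable:  # lhs -> rhs can vanish: use it on every token in FOLLOW(lhs)
                for t in follow[lhs]:
                    add(lhs, t, rhs)
    return table


grammar = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPSILON]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPSILON]],
    "F": [["(", "E", ")"], ["id"]],
}
first = {"E": {"(", "id"}, "E'": {"+", EPSILON}, "T": {"(", "id"},
         "T'": {"*", EPSILON}, "F": {"(", "id"}}
follow = {"E": {")", END}, "E'": {")", END}, "T": {"+", ")", END},
          "T'": {"+", ")", END}, "F": {"+", "*", ")", END}}

table = build_ll1_table(grammar, first, follow)
for nt, row in table.items():
    for terminal, rhs in sorted(row.items()):
        print(f"M[{nt}, {terminal}] = {nt} -> {' '.join(rhs)}")
```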
In the above function, first, I initialize an empty PrettyTable object, which I'll use to neatly
display the parsing table.

Then, I create an empty set called terminals to store all unique terminal symbols present in
the parsing table.

Next, I iterate through each non-terminal symbol in the parsing_table. For each non-
terminal, I iterate through the dictionary of terminals associated with it. If a terminal
symbol exists, I add it to the terminals set.

Now that I have all the unique terminal symbols, I set the field names of the PrettyTable
object. The field names consist of "Non-terminal" followed by the sorted list of terminals.

Moving on to populating the table, I iterate again through each non-terminal symbol in the
parsing_table and construct a row for it. For each terminal in the sorted list of terminals,
I check whether it exists in the current non-terminal's dictionary. If it does, I retrieve the
corresponding production rule; if the rule's right-hand side is non-empty and does not
consist solely of the empty string, I add the rule to the row in the form
"left_side -> right_side". Otherwise, I add an empty string.

Finally, I add the constructed row to the PrettyTable object.


Once all rows are added, I print the PrettyTable, displaying the parsing table in a visually
appealing format. This table provides a clear representation of which production rule to
apply for each combination of non-terminal and terminal symbols during the parsing
process.
c) Implement a top-down parser in a recursive and a non-recursive way.

Recursive

In the code above, two global variables, idx and error, are used throughout the parsing
process: idx keeps track of the current position in the input string, and error flags any
syntax error encountered.

Now, let's look at the match function. When called with a character c and the input
string, it first prints that it is trying to match c. It then checks whether the current
character of the input equals c. If it does, it reports the match and advances to the next
character; if there is no match, or the end of the input has been reached, it reports a
syntax error.

Moving on to the A and S functions, they represent non-terminal symbols in my grammar.


In A, I start by saying I'm parsing A. Then, if the current character is '0', I match '0', then '1',
and then recursively call myself to parse A again. If the current character is '1', I match it
twice. If it's neither '0' nor '1', I report a syntax error.

In S, I announce that I'm parsing S. Then, if the current character is '0', I call the A function,
match '1', and recursively call myself to parse S again. If the current character is '~'
(epsilon), I match it. If it's anything else, I report a syntax error.

After defining the functions, I set up idx and error. I ask for an input string to parse and
then kick off the parsing process by calling the S function.
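The recursive approach described above can be sketched in Python as follows. The grammar itself is not reproduced in this writeup, so the sketch assumes S -> A 1 S | ε and A -> 0 1 A | 1 1, which follows the A rules described above; treat it as illustrative. Instead of global idx and error variables, the position lives on a parser object and syntax errors are raised as exceptions, and the end of the input is detected by position rather than by an explicit '~' symbol.

```python
class ParseError(Exception):
    pass


class RecursiveDescentParser:
    """Recursive-descent parser for the assumed grammar
         S -> A 1 S | epsilon
         A -> 0 1 A | 1 1
    """

    def __init__(self, text):
        self.text = text
        self.idx = 0

    def peek(self):
        return self.text[self.idx] if self.idx < len(self.text) else None

    def match(self, c):
        if self.peek() != c:
            raise ParseError(f"expected {c!r} at position {self.idx}")
        self.idx += 1

    def S(self):
        if self.peek() in ("0", "1"):  # FIRST(A 1 S) = {0, 1}
            self.A()
            self.match("1")
            self.S()
        # Otherwise choose S -> epsilon; leftover input is caught by parse().

    def A(self):
        if self.peek() == "0":
            self.match("0")
            self.match("1")
            self.A()
        elif self.peek() == "1":
            self.match("1")
            self.match("1")
        else:
            raise ParseError(f"unexpected {self.peek()!r} while parsing A")

    def parse(self):
        self.S()
        return self.idx == len(self.text)  # the whole input must be consumed


def is_valid(text):
    try:
        return RecursiveDescentParser(text).parse()
    except ParseError:
        return False


for s in ["111", "01111", "", "0", "1110"]:
    print(repr(s), "Valid Expression" if is_valid(s) else "Invalid Expression")
```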
Finally, I check if I've parsed the entire input string without any syntax errors. If everything
went smoothly, I proudly declare "Valid Expression"; otherwise, I admit defeat with
"Invalid Expression".
Non-Recursive
In the above code, at the beginning, I set up two global variables idx and error which I'll be
using throughout the parsing process. idx keeps track of where I am in the input string, and
error flags if I encounter any syntax errors.

Inside the parse function, I initialize a stack with the starting symbol 'S', representing the
start of the parsing process.

Now, I enter a loop that continues as long as there are symbols on the stack and I haven't
reached the end of the input string.

In each iteration of the loop, I pop a symbol from the stack and print that I'm parsing that
symbol.
Then, I have a series of conditions to handle different cases based on the popped symbol.
If the symbol is 'S', indicating the start symbol, I check if I'm at the beginning of the
input string. If so, I push symbols onto the stack according to the grammar rules. If the
input string starts with '0', I push '0', '1', and 'A'. If it starts with '' (epsilon), I just print
that I've matched ''. If it's anything else, I report a syntax error.
If the symbol is 'A', I check if the next character in the input string matches the expected
pattern according to the grammar rules. If it's '0', I push '0', '1', and 'A' onto the stack. If
it's '1', I push '1' twice onto the stack. If it's neither, I report a syntax error.
If the symbol is '0' or '1', representing terminals, I check if the current character in the
input string matches. If it does, I print that I've matched the character and move to the
next one. If it doesn't match, I report a syntax error.
If none of the above conditions match, it means there's an unknown symbol on the
stack, which shouldn't happen in a valid parsing process. So, I report an error.

Finally, I return the remaining symbols on the stack after the parsing process.
After defining the function, I set up idx and error, prompt the user to enter the string to
parse, and then start the parsing process by calling the parse function.

Once the parsing is done, I check if the stack is empty, if I've reached the end of the input
string, and if no syntax errors occurred. If all conditions are met, I proudly declare "Valid
Expression"; otherwise, I admit defeat with "Invalid Expression".
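A table-driven version of the same idea is sketched below. It assumes the grammar S -> A 1 S | ε, A -> 0 1 A | 1 1 (the original grammar is not reproduced here) and a hand-written LL(1) table for it. Two conventions are worth noting: the right-hand side is pushed in reverse so that its leftmost symbol ends up on top of the stack, and an end marker `$` sits at the bottom of the stack and at the end of the input, so acceptance means both are consumed together.

```python
END = "$"

# Hand-written LL(1) table for the assumed grammar S -> A 1 S | epsilon, A -> 0 1 A | 1 1.
TABLE = {
    ("S", "0"): ["A", "1", "S"],
    ("S", "1"): ["A", "1", "S"],
    ("S", END): [],  # S -> epsilon
    ("A", "0"): ["0", "1", "A"],
    ("A", "1"): ["1", "1"],
}
NONTERMINALS = {"S", "A"}


def parse(text, trace=False):
    tokens = list(text) + [END]
    stack = [END, "S"]  # the top of the stack is the end of the list
    idx = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[idx]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:
                return False  # no table entry: syntax error
            if trace:
                print(f"{top} -> {' '.join(rhs) or 'ε'}")
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == lookahead:
            idx += 1  # matched a terminal (or the END marker)
        else:
            return False  # terminal mismatch
    return idx == len(tokens)


for s in ["111", "01111", "0"]:
    print(repr(s), "Valid Expression" if parse(s) else "Invalid Expression")
```

Calling `parse("111", trace=True)` prints the productions in the order they are applied, which mirrors the "Parsing S" / "Parsing A" messages described above.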
Assignment-3

In the following Lex code, I have defined lexical rules for tokenizing input, where tokens
represent arithmetic operators, numbers, parentheses, mathematical functions like square
root and logarithm, and end-of-line markers. It skips whitespace characters and matches
numeric inputs as integers, returning the corresponding token.
In the following Bison code, I have defined a grammar and corresponding actions for
parsing arithmetic expressions, using the tokens produced by the Lex scanner. The grammar
specifies rules for evaluating expressions involving addition, multiplication,
exponentiation, square root, and logarithm operations, along with parentheses for
grouping.

I have declared the precedence of the operators so that expressions are grouped correctly,
and the semantic actions compute the value of each expression using the appropriate
mathematical functions defined in the Bison code.

If an error is encountered during parsing, the yyerror function is called to report it.
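To show what the precedence declarations achieve, here is a rough hand-written Python analogue using precedence climbing (a different technique from Bison's LALR parsing with %left/%right declarations). The operator set, the precedence levels, `^` being right-associative, and `log` meaning the natural logarithm are all assumptions; the Bison grammar itself is not reproduced in this writeup. As in the Lex rules described above, numbers are integers only.

```python
import math
import re


class CalcError(Exception):
    pass


def tokenize(text):
    """Integers become ints; 'sqrt', 'log' and single characters stay strings."""
    tokens = []
    for number, func, other in re.findall(r"(\d+)|(sqrt|log)|(\S)", text):
        tokens.append(int(number) if number else (func or other))
    return tokens


# (precedence, associativity, operation), mirroring declarations such as
#   %left '+' '-'    %left '*' '/'    %right '^'
BINARY = {
    "+": (1, "left", lambda a, b: a + b),
    "-": (1, "left", lambda a, b: a - b),
    "*": (2, "left", lambda a, b: a * b),
    "/": (2, "left", lambda a, b: a / b),
    "^": (3, "right", lambda a, b: a ** b),
}
FUNCTIONS = {"sqrt": math.sqrt, "log": math.log}  # log assumed to be the natural log


class Calculator:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.advance() != tok:
            raise CalcError(f"expected {tok!r}")

    def primary(self):
        tok = self.advance()
        if isinstance(tok, int):
            return tok
        if tok in FUNCTIONS:  # sqrt(expr) or log(expr)
            self.expect("(")
            value = self.expression(0)
            self.expect(")")
            return FUNCTIONS[tok](value)
        if tok == "(":
            value = self.expression(0)
            self.expect(")")
            return value
        raise CalcError(f"unexpected token {tok!r}")

    def expression(self, min_prec):
        """Precedence climbing: only consume operators binding at least min_prec."""
        left = self.primary()
        while self.peek() in BINARY:
            prec, assoc, apply = BINARY[self.peek()]
            if prec < min_prec:
                break
            self.advance()
            # Left-associative operators need a strictly tighter right operand.
            right = self.expression(prec + 1 if assoc == "left" else prec)
            left = apply(left, right)
        return left


def evaluate(text):
    calc = Calculator(text)
    value = calc.expression(0)
    if calc.peek() is not None:
        raise CalcError(f"unexpected trailing token {calc.peek()!r}")
    return value


print(evaluate("2 + 3 * 4"), evaluate("2 ^ 3 ^ 2"), evaluate("sqrt(16) + log(1)"))
```

Raising `CalcError` plays the role that yyerror plays in the Bison version: parsing stops at the first token that does not fit the grammar.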
