A) Explain the stages involved in analyzing a source program.
What are the key activities
performed during the lexical, syntactic, and semantic analysis stages?
Cognitive Level: Understand / Remember
Marks: 6
---
A compiler transforms a high-level source program into machine code through several
stages of analysis and synthesis. The analysis phase breaks down the program and gathers
information to prepare it for code generation.
---
Main Stages in Analyzing a Source Program:
1. Lexical Analysis
2. Syntactic Analysis (Parsing)
3. Semantic Analysis
---
1. Lexical Analysis:
Also called: Scanner or Tokenizer
Purpose:
Converts a stream of characters into a stream of tokens.
Tokens are the smallest meaningful elements like identifier, keyword, number, operator, etc.
Key Activities:
Removing whitespace and comments
Recognizing keywords (e.g., if, while)
Recognizing identifiers (e.g., x, count)
Generating tokens (e.g., ID(x), NUM(10))
Error handling for invalid characters
Example:
For input: int a = 5;
Tokens: KEYWORD(int) ID(a) ASSIGN(=) NUM(5) SEMICOLON(;)
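For illustration, a minimal scanner along these lines can be sketched in C. The token names and the overall structure are illustrative only, not taken from any real compiler:
#include <stdio.h>
#include <ctype.h>
#include <string.h>

/* Toy scanner: turns the character stream "int a = 5;" into a token stream. */
int main(void) {
    const char *p = "int a = 5;";
    char buf[32];
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) { p++; continue; }     /* skip whitespace */
        if (isalpha((unsigned char)*p)) {                      /* keyword or identifier */
            int n = 0;
            while (isalnum((unsigned char)*p) && n < 31) buf[n++] = *p++;
            buf[n] = '\0';
            printf(strcmp(buf, "int") == 0 ? "KEYWORD(%s) " : "ID(%s) ", buf);
        } else if (isdigit((unsigned char)*p)) {               /* numeric constant */
            int n = 0;
            while (isdigit((unsigned char)*p) && n < 31) buf[n++] = *p++;
            buf[n] = '\0';
            printf("NUM(%s) ", buf);
        } else {                                               /* single-character symbols */
            if (*p == '=') printf("ASSIGN(=) ");
            else if (*p == ';') printf("SEMICOLON(;) ");
            else printf("SYMBOL(%c) ", *p);
            p++;
        }
    }
    printf("\n");   /* output: KEYWORD(int) ID(a) ASSIGN(=) NUM(5) SEMICOLON(;) */
    return 0;
}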
---
2. Syntactic Analysis (Parsing):
Purpose:
Checks whether the token sequence follows the grammar of the programming language.
Builds a parse tree (syntax tree).
Key Activities:
Apply context-free grammar rules
Construct the tree structure representing nested program constructs
Report syntax errors like missing brackets, semicolons
Example:
Checks whether the statement if (x > 0) x = x - 1; is syntactically valid.
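As a sketch of how a parser applies context-free grammar rules, here is a small recursive-descent recognizer in C for the toy grammar E → T { (+|-) T }, T → F { (*|/) F }, F → digit | ( E ). The grammar and the function names are illustrative only:
#include <stdio.h>
#include <ctype.h>

/* Recursive-descent recognizer for:  E -> T {(+|-) T},  T -> F {(*|/) F},
   F -> digit | ( E ).  Each grammar rule becomes one function. */
static const char *p;                 /* current position in the input */
static int expr(void);

static int factor(void) {
    if (isdigit((unsigned char)*p)) { p++; return 1; }
    if (*p == '(') { p++; if (!expr()) return 0; if (*p != ')') return 0; p++; return 1; }
    return 0;                          /* syntax error: unexpected character */
}
static int term(void) {
    if (!factor()) return 0;
    while (*p == '*' || *p == '/') { p++; if (!factor()) return 0; }
    return 1;
}
static int expr(void) {
    if (!term()) return 0;
    while (*p == '+' || *p == '-') { p++; if (!term()) return 0; }
    return 1;
}
int main(void) {
    p = "(1+2)*3";
    printf("%s\n", expr() && *p == '\0' ? "syntactically valid" : "syntax error");
    return 0;
}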
---
3. Semantic Analysis:
Purpose:
Ensures that the program is semantically meaningful.
Deals with type checking, scope resolution, and symbol table management.
Key Activities:
Type checking (e.g., assigning a float to an int variable)
Detect undeclared variables
Ensure function call arguments match parameters
Build abstract syntax tree (AST) for further compilation
Example:
Ensures int x = "Hello"; is flagged as an error.
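A minimal sketch of these checks in C, assuming a hypothetical one-entry symbol table and made-up helper names (lookup, check_assign):
#include <stdio.h>
#include <string.h>

/* Illustrative semantic check: a tiny symbol table plus a type-compatibility
   test for an assignment such as  x = "Hello";  */
typedef enum { T_INT, T_FLOAT, T_STRING } Type;
typedef struct { const char *name; Type type; } Symbol;

static Symbol table[] = { { "x", T_INT } };            /* from the declaration: int x; */

static const Symbol *lookup(const char *name) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].name, name) == 0) return &table[i];
    return NULL;                                        /* undeclared variable */
}

static void check_assign(const char *name, Type rhs) {
    const Symbol *s = lookup(name);
    if (!s)                  printf("error: '%s' undeclared\n", name);
    else if (s->type != rhs) printf("error: assigning incompatible type to '%s'\n", name);
    else                     printf("ok\n");
}

int main(void) {
    check_assign("x", T_STRING);   /* models  x = "Hello";  -> type mismatch   */
    check_assign("y", T_INT);      /* models  y = 1;        -> undeclared name */
    return 0;
}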
---
Conclusion:
These stages ensure that the source program is syntactically and semantically correct
before moving to code generation. Each stage performs specific checks that build upon the
results of the previous stage.
---
B) What is code optimization? Describe its role in the compilation process and provide
examples of common optimization techniques.
Cognitive Level: Remember / Understand
Marks: 6
---
What is Code Optimization?
Definition:
Code optimization is the process of improving the intermediate or final code to make it run
faster, consume less memory, or use fewer resources, without changing its output.
It is typically performed on the intermediate code, after intermediate code generation and before target code generation.
---
Role in Compilation:
Improves performance of the generated code
Makes the program more efficient
Can result in smaller binary size
Minimizes runtime and power consumption
---
Types of Code Optimization:
1. Machine-Independent Optimization:
Works on the intermediate code, independent of hardware.
2. Machine-Dependent Optimization:
Applied during code generation, based on specific processor architecture.
---
Common Optimization Techniques:
1. Constant Folding:
Evaluate constant expressions at compile time (a small implementation sketch follows this list).
Example: int x = 3 * 4; becomes int x = 12;
2. Dead Code Elimination:
Remove code that will never be executed.
Example:
if (false) {
x = 10;
}
3. Common Subexpression Elimination:
Avoid recomputing expressions.
Example:
x = a * b + c * d;
y = a * b + e;
Optimize:
t = a * b;
x = t + c * d;
y = t + e;
4. Loop Optimization:
Improve performance of loops (e.g., loop unrolling, loop invariant code motion).
5. Strength Reduction:
Replace expensive operations with cheaper ones.
Example: Replace x = x * 2 with x = x << 1;
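To make the first technique concrete, here is a minimal sketch of how a constant-folding pass might operate on a toy expression tree. The Node structure and the fold function are illustrative, not taken from any real compiler:
#include <stdio.h>

/* Toy constant folding: a node is either a constant or a '+'/'*' operation. */
typedef struct Node {
    char op;                  /* '\0' for a constant leaf, otherwise '+' or '*' */
    int value;                /* used when op == '\0' */
    struct Node *left, *right;
} Node;

/* If both operands are constants, replace the operation by its result. */
static void fold(Node *n) {
    if (n->op == '\0') return;
    fold(n->left);
    fold(n->right);
    if (n->left->op == '\0' && n->right->op == '\0') {
        n->value = (n->op == '+') ? n->left->value + n->right->value
                                  : n->left->value * n->right->value;
        n->op = '\0';         /* the subtree is now a single constant */
    }
}

int main(void) {
    /* Represents  3 * 4  from the constant-folding example above. */
    Node three = { '\0', 3, NULL, NULL }, four = { '\0', 4, NULL, NULL };
    Node mul = { '*', 0, &three, &four };
    fold(&mul);
    printf("folded to constant %d\n", mul.value);   /* prints 12 */
    return 0;
}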
---
Conclusion:
Optimization makes programs efficient and practical for real-world use, especially in systems
with limited resources. It is a critical phase in the compiler for performance-critical
applications.
---
C) Describe the middle-end of a compiler. What is its function, and why is it important to
separate optimization from other phases?
Cognitive Level: Understand / Remember
Marks: 6
---
What is the Middle-End of a Compiler?
The middle-end of a compiler is the phase that lies between the front-end (lexical, syntax,
semantic analysis) and the back-end (code generation). Its main focus is on optimizing the
intermediate code.
---
Functions of the Middle-End:
1. Intermediate Code Generation:
Translates the parse tree or AST into an intermediate representation (IR), such as:
Three-address code (TAC)
Static Single Assignment (SSA)
Control Flow Graph (CFG)
2. Code Optimization:
Performs machine-independent optimizations to improve code performance.
Ensures no change in the program's meaning/output.
3. Type Checking and Control Flow Analysis:
Verifies type safety and logical correctness.
Determines flow of control using flow graphs.
---
Importance of Separating Optimization from Other Phases:
1. Modularity:
Clear separation makes the compiler easier to maintain and update.
2. Reusability:
The same middle-end (optimization phase) can be used with different front-ends (languages)
and back-ends (architectures).
3. Target Independence:
Intermediate optimizations don’t depend on the target machine, allowing code to be
retargeted easily.
4. Easier Debugging and Testing:
Errors can be isolated to a specific phase (e.g., if the generated code runs slowly, investigate
the middle-end).
5. Enhances Compiler Efficiency:
Reduces the complexity of the code generation phase by handling heavy transformations
earlier.
---
Example:
For code:
x = (a + b) * (a + b);
The middle-end might:
1. Convert to IR
2. Detect a + b is repeated
3. Optimize:
t = a + b;
x = t * t;
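Intermediate code of this kind is commonly represented as quadruples (operator, two operands, result). A minimal illustrative sketch in C, assuming a made-up Quad structure:
#include <stdio.h>

/* Illustrative three-address code: each instruction has an operator,
   two operands, and a result, e.g.  t1 = a + b. */
typedef struct { char op; const char *arg1, *arg2, *result; } Quad;

int main(void) {
    /* IR for  x = (a + b) * (a + b);  after common-subexpression elimination. */
    Quad code[] = {
        { '+', "a",  "b",  "t1" },   /* t1 = a + b   (computed once)     */
        { '*', "t1", "t1", "x"  },   /* x  = t1 * t1 (reuses the result) */
    };
    for (size_t i = 0; i < sizeof code / sizeof code[0]; i++)
        printf("%s = %s %c %s\n", code[i].result, code[i].arg1, code[i].op, code[i].arg2);
    return 0;
}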
---
Conclusion:
The middle-end plays a vital role in bridging high-level language constructs with low-level
hardware implementation. By focusing on optimization and intermediate representation, it
contributes significantly to program performance while maintaining language-independence
and machine-independence.
---
A) Phases of a Compiler and Difference Between Interpreter, Compiler, and Hybrid Compiler
Cognitive Level: Remembering
Marks: 6
---
Phases of a Compiler (With Neat Diagram):
A compiler converts a high-level source program into machine code through multiple
structured phases, each handling a specific task.
---
Diagram:
Source Program
↓
+------------------+
| Lexical Analysis | → Tokens
+------------------+
↓
+------------------+
| Syntax Analysis | → Parse Tree
+------------------+
↓
+------------------+
| Semantic Analysis| → Annotated Tree
+------------------+
↓
+----------------------------+
| Intermediate Code Generator| → IR Code
+----------------------------+
↓
+----------------------+
| Code Optimization | → Optimized IR
+----------------------+
↓
+----------------------+
| Code Generation | → Target Code
+----------------------+
↓
+------------------+
| Code Linking & |
| Assembly | → Executable
+------------------+
---
Explanation of Compiler Phases:
1. Lexical Analysis (Scanner):
Converts character stream to tokens.
Removes whitespace and comments.
2. Syntax Analysis (Parser):
Analyzes token structure using grammar.
Builds parse tree.
3. Semantic Analysis:
Ensures semantic correctness (e.g., type checking, scope resolution).
4. Intermediate Code Generation:
Generates language-independent intermediate code.
5. Code Optimization:
Improves code efficiency (speed/memory) without altering output.
6. Code Generation:
Converts optimized IR into target machine code.
7. Code Linking and Assembly:
Links different object files and prepares final executable.
---
Difference Between Interpreter, Compiler, and Hybrid Compiler:
Aspect | Interpreter | Compiler | Hybrid Compiler
Execution | Line-by-line execution | Translates entire code first | Combination of compilation & interpretation
Output | No permanent binary | Creates executable file | May generate intermediate code
Speed | Slower execution | Faster after compilation | Medium speed
Error Handling | Stops at first error | Lists all errors after compiling | Depends on implementation
Example Languages | Python, JavaScript | C, C++, Rust | Java (Bytecode + JVM), C# (.NET CLR)
---
B) Define Token, Pattern, and Lexeme with Suitable Examples. Identify Tokens for a C
Program.
Cognitive Level: Remembering
Marks: 6
---
Definitions:
1. Token:
A token is a class of input strings treated as a single unit during compilation.
Example: keyword (e.g., int), identifier (e.g., main), operator (e.g., =), punctuation symbol (e.g., ;), constant.
2. Lexeme:
A lexeme is an actual character sequence in the source code that matches the pattern for a
token.
Example: main is a lexeme for the token identifier.
3. Pattern:
A pattern is a rule (usually regular expression) that defines the form of lexemes for a
particular token.
Example: [a-zA-Z_][a-zA-Z0-9_]* is the pattern for an identifier.
---
Sample C Program:
int main()
{
int a = 10, b = 20;
printf("Sum is : %d", a + b);
return (0);
}
---
Tokens, Lexemes, and Attribute Values:
Token Type | Lexeme | Pattern / Description | Attribute
Keyword | int | Keyword | type=int
Identifier | main | Identifier | function name
Symbol | ( | Left parenthesis | -
Symbol | ) | Right parenthesis | -
Symbol | { | Opening brace | -
Keyword | int | Keyword | type=int
Identifier | a | Identifier | variable
Symbol | = | Assignment | -
Constant | 10 | Integer constant | value=10
Symbol | , | Comma | -
Identifier | b | Identifier | variable
Symbol | = | Assignment | -
Constant | 20 | Integer constant | value=20
Symbol | ; | Semicolon | -
Identifier | printf | Identifier (function) | library function
Symbol | ( | Left parenthesis | -
String Literal | "Sum is : %d" | String constant | format string
Symbol | , | Comma | -
Identifier | a | Identifier | variable
Symbol | + | Arithmetic operator | addition
Identifier | b | Identifier | variable
Symbol | ) | Right parenthesis | -
Symbol | ; | Semicolon | -
Keyword | return | Keyword | -
Symbol | ( | Left parenthesis | -
Constant | 0 | Integer constant | value=0
Symbol | ) | Right parenthesis | -
Symbol | ; | Semicolon | -
Symbol | } | Closing brace | -
---
C) i) Explain Regular Expression and Regular Definition with Example
ii) Write Regular Definitions for Given Languages
Cognitive Level: Remembering
Marks: 6
---
i) Regular Expression and Regular Definition
Regular Expression:
A regular expression (RE) describes a set of strings defined by certain syntactic rules using:
Symbols (a, b, 0, 1)
Operators:
| (alternation)
* (Kleene star)
() (grouping)
+ (one or more times)
? (zero or one time)
Example:
a(b|c)* matches strings: a, ab, ac, abc, acc, etc.
---
Regular Definition:
Regular definitions give names to regular expressions so that the names can be reused in later definitions.
Example:
digit → 0|1|2|3|4|5|6|7|8|9
number → digit+
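Such definitions can be tried out with the POSIX regular-expression library; a minimal sketch in C, assuming a POSIX system that provides <regex.h>:
#include <regex.h>
#include <stdio.h>

/* Tests a few strings against the regular definition  number -> digit+  */
int main(void) {
    regex_t re;
    const char *tests[] = { "123", "42a", "" };
    /* ^ and $ anchor the pattern so the whole string must be a number. */
    if (regcomp(&re, "^[0-9]+$", REG_EXTENDED) != 0) return 1;
    for (int i = 0; i < 3; i++)
        printf("%-4s : %s\n", tests[i],
               regexec(&re, tests[i], 0, NULL, 0) == 0 ? "number" : "not a number");
    regfree(&re);
    return 0;
}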
---
ii) Regular Definitions for Specific Languages
1. All strings of lowercase letters that contain the five vowels (a, e, i, o, u) in order:
Explanation: Each vowel must appear at least once, in order, possibly separated by other
lowercase letters.
Regular Definition:
[a-z]*a[a-z]*e[a-z]*i[a-z]*o[a-z]*u[a-z]*
Matches: aeiou, baceidofug, xayaebizomuk
---
2. All strings of lowercase letters in ascending lexicographic order:
Explanation: Each next character should be greater than or equal to the previous one.
Example: abc, aee, mno
Regular Definition:
a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*
Note: Each character appears zero or more times, and the order is maintained.
Matches: ab, aabbcc, xyz, aee
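The property this definition captures (each character greater than or equal to the previous one) can also be checked directly in C; a minimal sketch, with the ascending function being illustrative only:
#include <stdio.h>

/* Checks that every character is >= the previous one (ascending order). */
static int ascending(const char *s) {
    for (int i = 1; s[i] != '\0'; i++)
        if (s[i] < s[i - 1]) return 0;
    return 1;
}

int main(void) {
    const char *tests[] = { "aabbcc", "xyz", "acb" };
    for (int i = 0; i < 3; i++)
        printf("%s -> %s\n", tests[i], ascending(tests[i]) ? "matches" : "does not match");
    return 0;
}
---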
A) Explain the major activities performed during the lexical, syntactic, and semantic analysis
phases. How is a source program analyzed?
Cognitive Level: Understand / Remember
Marks: 6
---
Introduction:
A compiler transforms a high-level programming language (source code) into machine code.
The transformation involves multiple analysis phases that ensure the correctness and
structure of the input program. Three important analysis phases are:
Lexical Analysis
Syntax Analysis
Semantic Analysis
---
1. Lexical Analysis (Scanner Phase):
Main Function:
Breaks the input source code into tokens.
Major Activities:
Tokenization: Converts characters into valid tokens such as keywords, identifiers, operators,
numbers, etc.
Lexeme recognition: Matches strings against token patterns.
Removing whitespace and comments.
Error detection: Identifies invalid characters or malformed lexemes.
Symbol table creation: Adds identifiers and literals with relevant information (type, value,
etc.).
Example:
Input: int a = 5;
Tokens: KEYWORD(int), ID(a), ASSIGN(=), CONST(5), SEMICOLON(;)
---
2. Syntax Analysis (Parser Phase):
Main Function:
Checks whether the sequence of tokens forms valid statements based on grammar rules.
Major Activities:
Parsing: Builds a parse tree or syntax tree using a context-free grammar.
Detecting syntax errors: Such as missing brackets, semicolons, or misplaced keywords.
Ensuring language constructs: like if, while, function follow proper structure.
Example:
Detects syntax errors in:
if (x > 5 // Missing closing parenthesis
x = 10;
---
3. Semantic Analysis:
Main Function:
Ensures that the syntax tree follows language semantics, i.e., the meaning of statements is
correct.
Major Activities:
Type checking: Ensures type compatibility (e.g., assigning a float to an int variable produces a warning).
Scope resolution: Ensures that variables are declared before use.
Function checks: Verifies correct function usage (parameters, return type).
Builds annotated syntax trees or intermediate representation.
Example:
Detects errors like:
int x;
x = "Hello"; // Type mismatch
---
How is a Source Program Analyzed?
1. The source program is given as input to the compiler.
2. Lexical analysis converts it into a stream of tokens.
3. Syntax analysis constructs the syntactic structure using grammar.
4. Semantic analysis verifies that the syntax follows the language’s meaning rules.
Each phase passes structured data to the next stage, leading to an intermediate code that is
semantically and syntactically correct.
---
Conclusion:
The lexical, syntax, and semantic phases work sequentially to convert unstructured source
code into a well-defined, meaningful intermediate representation. Each phase focuses on a
specific layer of correctness—from character level to logic level.
---
B) How many phases are there in a compiler? Explain each phase in detail.
Cognitive Level: Understand / Remember
Marks: 6
---
Introduction:
A compiler consists of multiple phases that perform analysis (checking correctness) and
synthesis (generating code). These phases can be grouped into two main categories:
Analysis Phase: Understands and checks the program.
Synthesis Phase: Translates and optimizes the program.
Typically, a compiler has 6 main phases plus two supporting components.
---
Phases of a Compiler:
1. Lexical Analysis (Scanner):
Breaks input into tokens.
Removes whitespace and comments.
Builds the symbol table with variable/function names.
Example: Converts int x = 5; into tokens: KEYWORD(int), ID(x), CONST(5), etc.
2. Syntax Analysis (Parser):
Constructs the parse tree from token stream.
Uses context-free grammar.
Reports syntax errors (missing brackets, operators).
Example: Detects whether an if statement has unbalanced parentheses or braces, or misplaced expressions.
3. Semantic Analysis:
Ensures logical correctness.
Performs:
Type checking
Scope resolution
Function parameter validation
Updates and uses the symbol table.
4. Intermediate Code Generation:
Converts syntax tree into an intermediate representation (IR).
Makes it easier to optimize and generate machine code.
Example IR:
t1 = a + b
t2 = t1 * c
5. Code Optimization:
Improves intermediate code to make it faster or smaller.
Does dead code elimination, loop optimization, constant folding.
Example: Converts x = 2 * 3; to x = 6;
6. Code Generation:
Converts optimized IR to target machine code.
Allocates registers and memory.
Ensures correct instruction formats.
---
Supporting Phases:
1. Symbol Table Management:
Stores variable names, types, scopes, function info.
Used by all phases, especially semantic analysis and optimization.
2. Error Handling:
Each phase detects and recovers from different types of errors:
Lexical errors: Invalid characters
Syntax errors: Invalid token sequences
Semantic errors: Type mismatch, undeclared variables
---
Diagram of Compiler Phases:
Source Code
↓
Lexical Analysis → Tokens
↓
Syntax Analysis → Parse Tree
↓
Semantic Analysis → Annotated Tree
↓
Intermediate Code Generation → IR Code
↓
Code Optimization → Optimized IR
↓
Code Generation → Target Machine Code
---
Conclusion:
A compiler typically has six main phases, each responsible for a specific transformation of
the source code. These phases work in sequence to ensure that the input code is valid,
optimized, and converted into executable machine code.
---