Name: Suyash Sunil Dongre
Roll No: 87
Class: TY CS-B
PRN: 12320236
Assignment No: 1
1. Write the Specification of a LEX/FLEX Program.
1. Purpose:
The purpose of this LEX/FLEX program is to generate a lexical analyzer (scanner) that reads
input text and identifies tokens using regular expressions, then processes those tokens based on
defined actions. This can be used for applications such as text processing, building compilers,
or analyzing structured input data.
2. Input:
• The program reads a stream of characters from standard input or a file.
• The input consists of text that may contain words, numbers, punctuation, whitespace,
or other symbols.
3. Output:
• The program outputs the recognized tokens and the results of their associated actions.
• If unrecognized characters are found, the program outputs an error message.
• The output format for recognized tokens is customizable (e.g., printing the token,
storing it, or processing it further).
4. Functional Requirements:
1. Tokenization:
o Recognizes various tokens (e.g., numbers, identifiers, operators) based on
predefined regular expressions.
o Tokens are categorized into types such as keywords, identifiers, numbers, and
operators.
2. Actions:
o For each recognized token, an action is executed. Common actions include:
▪ Printing the matched token.
▪ Storing the token for later use.
▪ Counting occurrences of specific tokens (a counting sketch follows this list).
3. Error Handling:
o The program must handle invalid input by printing an error message for
unrecognized characters.
4. End of Input:
o The program should recognize the end of input and stop processing.
5. Whitespace Handling:
o Whitespace characters (spaces, tabs, newlines) can either be ignored or
recognized depending on the requirements.
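As an illustration of the actions listed above, here is a minimal sketch of a scanner that counts the number tokens in its input; the counter name num_count is made up for this example, and %option noyywrap is a flex-specific convenience:
%{
#include <stdio.h>
int num_count = 0; /* hypothetical counter for number tokens */
%}
%option noyywrap
%%
[0-9]+ { num_count++; /* count each number token */ }
.|\n   { /* ignore everything else */ }
%%
int main() {
    yylex();                                  /* scan until end of input */
    printf("Numbers seen: %d\n", num_count);  /* report the count */
    return 0;
}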
5. Regular Expressions and Tokens:
The regular expressions define the patterns used to recognize the various tokens in the input. Some
common token types are listed below, followed by a sketch that places these patterns in a rules section:
• Identifiers (e.g., variable names):
o Regular Expression: [A-Za-z_][A-Za-z0-9_]*
• Integer Numbers:
o Regular Expression: [0-9]+
• Floating-point Numbers:
o Regular Expression: [0-9]+\.[0-9]+
• Operators (e.g., +, -, *):
o Regular Expression: \+|\-|\*|\/
• Whitespace (spaces, tabs, newlines):
o Regular Expression: [ \t\n]+ (can be ignored or processed)
• Comments (for languages with comments):
o Regular Expression (multi-line comment): \/\*[^*]*\*+([^/*][^*]*\*+)*\/ (the slashes are escaped so LEX/FLEX treats them literally rather than as the trailing-context operator)
• Errors (any unrecognized character):
o Regular Expression: . (matches any single character except newline)
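The patterns above can be combined into a small scanner. The sketch below pairs them with simple print actions; the messages are illustrative, and the comment pattern is written with quoted "/*" and "*/" so the slashes are taken literally:
%{
#include <stdio.h>
%}
%option noyywrap
%%
[0-9]+\.[0-9]+              { printf("Float: %s\n", yytext); }
[0-9]+                      { printf("Integer: %s\n", yytext); }
[A-Za-z_][A-Za-z0-9_]*      { printf("Identifier: %s\n", yytext); }
\+|\-|\*|\/                 { printf("Operator: %s\n", yytext); }
"/*"([^*]|\*+[^*/])*\*+"/"  { /* skip multi-line comment */ }
[ \t\n]+                    { /* ignore whitespace */ }
.                           { printf("Error: Unknown character %s\n", yytext); }
%%
int main() {
    yylex();
    return 0;
}
Flex always prefers the longest possible match (so 3.14 is matched by the floating-point rule rather than being split into two integers); when two rules match input of the same length, the rule listed first wins.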
6. Main Program Structure:
A typical LEX/FLEX program consists of the following sections:
1. Definitions Section:
o Appears before the first %%; C code such as header includes and declarations is placed between %{ and %}.
Example:
%{
#include <stdio.h>
%}
2. Rules Section (between the two %% delimiters):
o Defines the regular expressions and their associated actions.
Example:
[0-9]+ { printf("Number: %s\n", yytext); }
[A-Za-z]+ { printf("Identifier: %s\n", yytext); }
. { printf("Error: Unknown character %s\n", yytext); }
3. User Code Section:
o Contains the main function and any necessary supporting functions.
o The main() function typically calls the yylex() function to start the lexical analysis
process.
Example:
int main() {
    yylex(); // Begin lexical analysis
    return 0;
}
7. Error Handling:
If no other pattern matches, a catch-all rule prints an error message:
. { printf("Error: Unrecognized character %s\n", yytext); }
8. Example of a Simple LEX/FLEX Program:
%{
#include <stdio.h>
%}
%%
[0-9]+ { printf("Number: %s\n", yytext); }
[A-Za-z]+ { printf("Word: %s\n", yytext); }
. { printf("Error: Unknown character %s\n", yytext); }
%%
int main() {
    yylex(); // Start lexical analysis
    return 0;
}
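One flex-specific detail worth noting: because the program above defines main() but no yywrap() function, it must either be linked with the flex library (-lfl, which supplies a default yywrap()) or declare %option noyywrap in the definitions section, for example:
%{
#include <stdio.h>
%}
%option noyywrap /* tell flex not to require a yywrap() function */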
9. Compilation and Execution:
1. Generate C Code: Use flex to generate the C source file (named lex.yy.c by default).
o Command: flex program.l
2. Compile C Code: Use a C compiler to compile the generated file (add -lfl if neither a yywrap() function nor %option noyywrap is provided).
o Command: gcc lex.yy.c -o program
3. Run the Program: Execute the compiled program to perform lexical analysis on an input
file.
o Command: ./program < input.txt
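A hedged end-to-end illustration, assuming the example program from Section 8 is saved as program.l (the input string is made up, and -lfl supplies the default yywrap() as noted above):
$ flex program.l
$ gcc lex.yy.c -o program -lfl
$ printf "sum99+" | ./program
Word: sum
Number: 99
Error: Unknown character +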
10. Performance Considerations:
• Efficiency: Flex compiles the regular expressions into a finite state machine (a deterministic
automaton), so scanning time grows roughly linearly with the size of the input; keeping the
regular expressions simple also keeps the generated tables small.
• Memory Management: Ensure proper handling of large inputs and consider using flex's
buffer-management facilities to process large files or in-memory data efficiently (a sketch follows below).
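As one illustration of buffer management, flex provides yy_scan_string() and yy_delete_buffer() for scanning an in-memory string instead of standard input; the sketch below is minimal and the input string is made up:
%{
#include <stdio.h>
%}
%option noyywrap
%%
[0-9]+ { printf("Number: %s\n", yytext); }
.|\n   { /* ignore everything else */ }
%%
int main() {
    /* yy_scan_string() creates a buffer over the string and makes it the current input */
    YY_BUFFER_STATE buf = yy_scan_string("order 12 of 7");
    yylex();
    yy_delete_buffer(buf); /* release the buffer when scanning is done */
    return 0;
}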
Conclusion:
This LEX/FLEX program is designed to tokenize input text based on regular expressions,
perform specified actions (such as printing or counting tokens), and handle errors gracefully. It is
a useful tool for text-processing tasks such as lexical analysis, compiler design, and building
search engines.