Symbol Table
Prepared by : Sorabh Gupta
Definition
• A symbol table is a data structure that provides an effective and
efficient way of storing information about the various names
appearing in the source program.
Contents of Symbol table
• Data is stored in the symbol table in the form of (name, information)
pairs
Name    Information
X       REAL
Y       INT
Z       CHAR
• Operations which a symbol table must support :
– Searching for a particular name in the table
– Editing the information for a particular name
– Deleting particular entries
– Adding more information about a name
• Contents in Symbol table
– Variable Names
– Constants
– Procedure names
– Function Names
– Strings
– Compiler generated temporaries
– Labels
• A compiler may use a single symbol table to store all information,
or it may use separate symbol tables.
• E.g. int a=5;
float b;
b=a+3;
Name    Information
a       Var, Int
b       Var, Real
5       const
3       const
• Similarly, we can have a separate symbol table for each data
type.
Symbol table for int
Name    Information
a       Var, Int
Symbol table for real
Name    Information
b       Var, Real
Symbol table for constants
Name    Information
5       const
3       const
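The operations listed earlier (search, edit, delete, add information) can be sketched with a dictionary. This is only an illustrative sketch: the class name SymbolTable and its method names are hypothetical and do not come from the slides.

```python
class SymbolTable:
    """Toy symbol table mapping a name to its information (hypothetical API)."""

    def __init__(self):
        self._entries = {}

    def insert(self, name, info):
        # Store a new name together with a list of information items.
        self._entries[name] = list(info)

    def lookup(self, name):
        # Search for a name; returns None when the name is absent.
        return self._entries.get(name)

    def add_info(self, name, extra):
        # Add more information about an existing name.
        self._entries[name].append(extra)

    def delete(self, name):
        # Remove an entry; does nothing if the name is absent.
        self._entries.pop(name, None)


# Entries for the example: int a=5; float b; b=a+3;
table = SymbolTable()
table.insert("a", ["Var", "Int"])
table.insert("b", ["Var", "Real"])
table.insert("5", ["const"])
table.insert("3", ["const"])

print(table.lookup("b"))   # ['Var', 'Real']
table.delete("3")
print(table.lookup("3"))   # None
```

A compiler that keeps separate tables per data type could simply hold one such object per type.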
How to store Names in Symbol table?
• The most common approach is an array representation of the symbol
table
1. Fixed Length Array representation
2. Variable Length Array representation
3. Two Array representation
4. Array Name Representation
a) Fixed number of Dimensions
b) No limit on Dimensions
Fixed Length Array Representation
• It is one of the simplest methods of representing names: they are
stored in contiguous array records.
• The Name and Information fields of the symbol table are of fixed
size.
• Consider the IBM 370 representation
1. Length of identifier (Name) = 8 characters
2. Amount of information = 16 characters
• Suppose each block can hold only 4 characters.
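Each record then has 2 blocks for the identifier and 4 blocks for the
information:
Identifier (2 blocks)    Information (4 blocks)
Let us consider the names MANISH and AMIT:
Identifier               Information
MANI | SH                (4 blocks)
AMIT | (unused)          (4 blocks)
Summing up the above diagrams :
   NAME             INFORMATION
1  M A N I S H
2  A M I T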
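The fixed-length layout can be sketched as follows. The field widths (8 and 16 characters) come from the slide; the information strings "INT" and "REAL" and the helper names make_record and find are assumptions made only for illustration.

```python
NAME_LEN = 8    # characters reserved for the identifier
INFO_LEN = 16   # characters reserved for the information


def make_record(name, info):
    """Build one fixed-size record; unused positions are padded with blanks."""
    if len(name) > NAME_LEN or len(info) > INFO_LEN:
        raise ValueError("field does not fit in the fixed-size record")
    return name.ljust(NAME_LEN) + info.ljust(INFO_LEN)


def find(table, name):
    """Sequential search; returns the record index, or -1 if not found."""
    key = name.ljust(NAME_LEN)
    for index, record in enumerate(table):
        if record[:NAME_LEN] == key:
            return index
    return -1


table = [make_record("MANISH", "INT"), make_record("AMIT", "REAL")]

# Every record occupies 24 characters regardless of the name length.
wasted = sum(record.count(" ") for record in table)
print(len(table[0]), wasted)   # 24 31
```

The padding counted in `wasted` is the memory wastage that this representation suffers from.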
Advantages
• Easy to understand
• Easy to implement and access
Disadvantages
• Wastage of a large amount of memory
• Sequential access
• Insertion and deletion of records are comparatively slow.
• Requires a large amount of processing and memory
Variable Length Array Representation
• We save all names in a separate array of
characters and store only the start index and
length of each name in the symbol table.
Symbol table:
Identifier (start index, length)    Attributes
start of MANISH, length 6           attributes
start of AMIT, length 4             attributes
Character array:
M A N I S H A M I T ………….
(6 characters) (4 characters)
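A minimal sketch of this scheme, assuming a hypothetical NamePool class: all names live in one shared character array, and each symbol table entry keeps only a start index, a length, and its attributes.

```python
class NamePool:
    """Names stored back to back in one character array (illustrative sketch)."""

    def __init__(self):
        self.chars = []     # shared character array
        self.entries = []   # one (start, length, attributes) tuple per name

    def add(self, name, attributes):
        start = len(self.chars)
        self.chars.extend(name)               # append the characters of the name
        self.entries.append((start, len(name), attributes))
        return len(self.entries) - 1          # index of the new entry

    def name_of(self, index):
        # Follow the indirection: slice the shared array by start and length.
        start, length, _ = self.entries[index]
        return "".join(self.chars[start:start + length])


pool = NamePool()
pool.add("MANISH", "INT")    # attribute strings are assumed values
pool.add("AMIT", "REAL")

print("".join(pool.chars))   # MANISHAMIT
print(pool.entries[1][:2])   # (6, 4)
print(pool.name_of(1))       # AMIT
```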
Advantages
• Names can now be of any length
• Ease of access
• No wastage of memory in the symbol table
• The symbol table is now more organized
Disadvantages
• The indirection (start index and length) has to be maintained
• An extra array is required for storing the names
Two array Representation of Symbol table
• Instead of storing records in one array, we use two arrays.
• One array is used to store the identifiers and the other array is used
to store their attributes (information).
• The two arrays are connected by an indexing rule.
• An identifier takes 1 word of memory and its information takes 4
words of memory.
• We store the identifier at position 2i and the corresponding
information at position 4i.
Name array            Attribute array
2i     Name 1         4i     Attributes of
2i+1   Name 2         4i+1   Name 1
2i+2   Name 3         4i+2   stored
                      4i+3   here
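The indexing rule can be sketched with two parallel lists. The strides 2 and 4 follow the rule stated in the text (identifier at 2i, information at 4i); the class name TwoArrayTable, the capacity argument, and the sample attribute words are assumptions for illustration only.

```python
NAME_STRIDE = 2   # identifier of entry i is stored at position 2*i
INFO_STRIDE = 4   # information of entry i occupies positions 4*i .. 4*i+3


class TwoArrayTable:
    def __init__(self, capacity):
        self.names = [None] * (NAME_STRIDE * capacity)
        self.info = [None] * (INFO_STRIDE * capacity)
        self.count = 0

    def add(self, name, words):
        if len(words) > INFO_STRIDE:
            raise ValueError("information does not fit in 4 words")
        if NAME_STRIDE * self.count >= len(self.names):
            raise OverflowError("table is full")
        i = self.count
        self.names[NAME_STRIDE * i] = name
        base = INFO_STRIDE * i
        self.info[base:base + len(words)] = words   # same-length slice update
        self.count += 1
        return i

    def lookup(self, name):
        # Find entry i in the name array, then jump straight to 4*i.
        for i in range(self.count):
            if self.names[NAME_STRIDE * i] == name:
                base = INFO_STRIDE * i
                return self.info[base:base + INFO_STRIDE]
        return None


t = TwoArrayTable(capacity=4)
t.add("x", ["REAL"])
t.add("y", ["INT", "local"])
print(t.lookup("y"))   # ['INT', 'local', None, None]
```

Once the index i of a name is known, its information is reached by arithmetic alone, with no further searching.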
Advantages
• Faster access to information
• Low wastage of memory
Disadvantages
• Requirement of additional array
• Complexity of symbol table increases
Array Names Representation
• Fixed number of dimensions
• Variable number (no limit) of dimensions
Fixed number of dimensions
• Example: FORTRAN, where the number of dimensions of an array is
limited to 3
UL1 UL2
UL3 B
• The field A represents the number of dimensions of the
array.
• A is 2 bits long.
Bit 1   Bit 2   Description of array
0       0       Not an array
0       1       1D
1       0       2D
1       1       3D
• UL1, UL2, UL3 represent the upper limit of each dimension
of the array.
• We have provided 3 bits for B, which indicates whether the name is a
constant or a formal parameter
– 0 -> Constant
– 1 -> Formal parameter
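The 2-bit dimension field A and the remaining fields can be sketched as below. The dictionary layout and the function names are hypothetical; the slides do not give field widths for UL1 to UL3, so they are kept as plain integers here.

```python
def encode_dimension_bits(dimensions):
    """Return (bit1, bit2) for 0 to 3 dimensions, following the table above."""
    if not 0 <= dimensions <= 3:
        raise ValueError("only 0 to 3 dimensions are supported")
    return (dimensions >> 1) & 1, dimensions & 1


def decode_dimension_bits(bit1, bit2):
    """Inverse of encode_dimension_bits."""
    return bit1 * 2 + bit2


def make_descriptor(upper_limits, b=0):
    """Build an array descriptor; b is 0 for a constant, 1 for a formal parameter."""
    bit1, bit2 = encode_dimension_bits(len(upper_limits))
    limits = list(upper_limits) + [None] * (3 - len(upper_limits))
    return {"A": (bit1, bit2), "UL1": limits[0], "UL2": limits[1],
            "UL3": limits[2], "B": b}


d = make_descriptor([10, 20])   # a 2D array with upper limits 10 and 20
print(d["A"])                   # (1, 0)
print(decode_dimension_bits(*d["A"]))   # 2
```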
Linked array representation of array with no limit
on dimension
• The symbol table stores the number of dimensions of the array.
• It also contains a pointer to an array holding the lower and upper
limits of the first dimension; this array is linked to another
array holding the lower and upper limits of the second
dimension, and so on.
Symbol table entry for A:
Dimension -> [LL1, UL1] -> [LL2, UL2] -> [LL3, UL3]
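A minimal sketch of the linked representation, assuming hypothetical names DimensionNode and build_array_entry. Each node holds the lower and upper limit of one dimension and points to the node for the next dimension, so any number of dimensions can be described.

```python
class DimensionNode:
    """Lower and upper limit of one dimension, linked to the next dimension."""

    def __init__(self, lower, upper):
        self.lower = lower
        self.upper = upper
        self.next = None


def build_array_entry(name, bounds):
    """Create a symbol table entry from a list of (lower, upper) pairs."""
    head = None
    for lower, upper in reversed(bounds):   # build the chain back to front
        node = DimensionNode(lower, upper)
        node.next = head
        head = node
    return {"name": name, "dimensions": len(bounds), "first": head}


def element_count(entry):
    """Walk the chain and multiply the extent of every dimension."""
    total = 1
    node = entry["first"]
    while node is not None:
        total *= node.upper - node.lower + 1
        node = node.next
    return total


entry = build_array_entry("A", [(1, 10), (0, 4), (1, 3)])   # assumed bounds
print(entry["dimensions"], element_count(entry))           # 3 150
```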
Data Structure for Symbol Table
• The information in a symbol table can be stored using
various data structures.
• The most common data structures used for symbol tables :
– Lists
– Self Organizing Lists
– Trees
– Hash tables
Lists
• A linear list of records is conceptually the simplest data structure
for a symbol table and is easy to implement.
NAME 1
INFO 1
NAME 2
INFO 2
.
.
.
.
NAME N
INFO N
• We can use a single array to store each name and its associated
information.
• The method of access is sequential.
• New names are added to the end of the list.
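The list organization can be sketched as follows; the class name ListSymbolTable is hypothetical. The lookup also returns the number of comparisons it made, to show the cost of sequential access.

```python
class ListSymbolTable:
    """Linear list of [name, info] records, searched sequentially."""

    def __init__(self):
        self.records = []

    def add(self, name, info):
        # New names are appended to the end of the list.
        self.records.append([name, info])

    def lookup(self, name):
        # Returns (info, comparisons); info is None when the name is absent.
        comparisons = 0
        for record_name, info in self.records:
            comparisons += 1
            if record_name == name:
                return info, comparisons
        return None, comparisons


names = ListSymbolTable()
for n in ["sum", "i", "j", "avg"]:
    names.add(n, "INT")          # "INT" is an assumed information value

print(names.lookup("sum"))   # ('INT', 1)
print(names.lookup("avg"))   # ('INT', 4)
```

A failed search has to compare against every record, which is why the work grows with the length of the list.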
Advantages
• Minimum space required
• Easy to understand and implement
Disadvantages
• Sequential access
• A large amount of work is required to search a long list
Self Organizing List
• This approach is used to decrease the search time in a list
organization.
• A special attribute, the link, is added to the information of each name;
it allows the order of the list to change dynamically.
• It promotes reusability
         NAME 1 | INFO 1 | LINK 1
         NAME 2 | INFO 2 | LINK 2
FIRST -> NAME 3 | INFO 3 | LINK 3
         NAME 4 | INFO 4 | LINK 4
         Available
Order of the records when the links are followed from FIRST:
3 -> 1 -> 4 -> 2
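The link field can be used to reorder the list without moving any record. The sketch below assumes the move-to-front heuristic (a name that has just been found is relinked to the head of the list); the slides do not say which reordering rule is used, so this choice and the class name are assumptions.

```python
class SelfOrganizingList:
    """Records in parallel arrays, chained through a link field (sketch)."""

    def __init__(self):
        self.names, self.infos, self.links = [], [], []
        self.first = None   # index of the first record in link order

    def add(self, name, info):
        # A new record is physically appended and linked in at the front.
        self.names.append(name)
        self.infos.append(info)
        self.links.append(self.first)
        self.first = len(self.names) - 1

    def lookup(self, name):
        prev, cur = None, self.first
        while cur is not None:
            if self.names[cur] == name:
                if prev is not None:
                    # Unlink the record and move it to the front.
                    self.links[prev] = self.links[cur]
                    self.links[cur] = self.first
                    self.first = cur
                return self.infos[cur]
            prev, cur = cur, self.links[cur]
        return None

    def order(self):
        # Names in the order in which the links are followed.
        result, cur = [], self.first
        while cur is not None:
            result.append(self.names[cur])
            cur = self.links[cur]
        return result


lst = SelfOrganizingList()
for n in ["NAME1", "NAME2", "NAME3", "NAME4"]:
    lst.add(n, "info")

lst.lookup("NAME2")
print(lst.order())   # ['NAME2', 'NAME4', 'NAME3', 'NAME1']
```

Frequently used names drift towards the front, so later searches for them need fewer comparisons.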
Advantages
• Increases search efficiency
• Promotes reusability
Disadvantages
• Difficult to maintain so many links and connections
• Extra space required
Hash Tables
• Hashing is an important technique used to search the records
of a symbol table.
• This method is superior to list organization.
• In a hashing scheme two tables are maintained
– Hash table
– Symbol table
• The hash table consists of K entries, numbered 0 to K-1. These entries are
pointers into the symbol table, pointing to the names stored there.
Hash table (indexed by H(name))     Symbol table
Index   Points to                   Name   Information   Hash Link
0       Sum                         Sum
1       i                           i
2       j                           j
.                                   avg
.
.
K-1     avg
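The two-table scheme can be sketched as below. The hash function h (sum of character codes modulo K), the value of K, and the class name are assumptions for illustration; the hash link field chains together names that fall into the same hash table entry.

```python
K = 8   # number of hash table entries (assumed value)


def h(name):
    # Illustrative hash function: sum of character codes modulo K.
    return sum(ord(c) for c in name) % K


class HashedSymbolTable:
    def __init__(self):
        self.hash_table = [None] * K   # entry -> index of first symbol in chain
        self.symbols = []              # rows of [name, information, hash_link]

    def insert(self, name, info):
        bucket = h(name)
        # The new row links to the previous head of this bucket's chain.
        self.symbols.append([name, info, self.hash_table[bucket]])
        self.hash_table[bucket] = len(self.symbols) - 1

    def lookup(self, name):
        index = self.hash_table[h(name)]
        while index is not None:
            entry_name, info, link = self.symbols[index]
            if entry_name == name:
                return info
            index = link               # follow the hash link
        return None


st = HashedSymbolTable()
for n, info in [("sum", "REAL"), ("i", "INT"), ("j", "INT"), ("avg", "REAL")]:
    st.insert(n, info)

print(st.lookup("avg"))    # REAL
print(st.lookup("total"))  # None
```

Only the names in one chain are compared during a search, instead of every name in the table.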
Types of hash function
• Mid square function
– Consider a key K
• Let K = 3230
– Find the square of K
• K^2 = (3230)^2 = 10432900
– Apply the hash function
– The hash function eliminates a few digits from both ends of K^2 and keeps
the middle digits.
– Hence H(K) = 32
• Division Method
– Let Key = K
– Hash function H(K) = K mod m
where m is a prime number (the largest prime not exceeding the table size)
– Suppose we have to index 100 records. The largest
prime number below 100 is 97, hence we choose m = 97
– Let K = 3230
– H(3230) = 3230 mod 97 = 29
• Folding Method
– We may use this method for large key values
– There are two methods to get hash value
• Shift folding
• Boundary folding
• Shift Folding
– Let K = 327 987 626 191 532
– Divide this key value into parts K1 K2 K3 K4 K5
K1 = 327
K2 = 987
K3 = 626
K4 = 191
K5 = 532
– H(K) = K1 + K2 + K3 + K4 + K5 = 2663
• Boundary Folding
– Some of the sub-keys are reversed before adding, to generate new simple
hash values.
– Let K = 327 987 626 191 532
K1 = 327
K2 = 789 (987 reversed)
K3 = 626
K4 = 191
K5 = 235 (532 reversed)
– H(K) = K1 + K2(reversed) + K3 + K4 + K5(reversed)
= 327 + 789 + 626 + 191 + 235 = 2168
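The worked examples above can be checked with a short sketch. The function names are hypothetical. For the folding methods the key is passed as its parts K1 to K5 (327, 987, 626, 191, 532), and the positions of the reversed parts (second and fifth) follow the example rather than a general rule.

```python
def mid_square(key, digits=2):
    """Square the key and keep the middle `digits` digits."""
    squared = str(key * key)
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])


def division(key, m=97):
    """Division method: remainder of the key modulo a prime m."""
    return key % m


def shift_fold(parts):
    """Shift folding: add the parts of the key as they are."""
    return sum(parts)


def boundary_fold(parts, reversed_positions=(1, 4)):
    """Boundary folding: reverse the digits of selected parts, then add."""
    total = 0
    for position, part in enumerate(parts):
        if position in reversed_positions:
            part = int(str(part)[::-1])   # e.g. 987 becomes 789
        total += part
    return total


parts = [327, 987, 626, 191, 532]
print(mid_square(3230))       # 32
print(division(3230))         # 29
print(shift_fold(parts))      # 2663
print(boundary_fold(parts))   # 2168
```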
Trees
• A symbol table can also be stored in the form of a binary search tree.
• A binary search tree has the following properties :
– Each node can have at most two children
• Left child
• Right child
– The key stored in a parent node is greater than the key stored in its left
child and less than the key stored in its right child.
          32
     23        36
   20  26    34  42
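A minimal binary search tree sketch that rebuilds the tree shown above. The names Node, insert, search, and inorder are illustrative; in a real symbol table the keys would be identifier names rather than numbers.

```python
class Node:
    def __init__(self, key, info=None):
        self.key = key
        self.info = info
        self.left = None
        self.right = None


def insert(root, key, info=None):
    """Insert a key and return the (possibly new) root of the subtree."""
    if root is None:
        return Node(key, info)
    if key < root.key:
        root.left = insert(root.left, key, info)
    elif key > root.key:
        root.right = insert(root.right, key, info)
    else:
        root.info = info          # key already present: update its information
    return root


def search(root, key):
    """Return the node holding key, or None if it is absent."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root


def inorder(root):
    """Keys in ascending order (left subtree, node, right subtree)."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


root = None
for key in [32, 23, 36, 20, 26, 34, 42]:
    root = insert(root, key)

print(inorder(root))                    # [20, 23, 26, 32, 34, 36, 42]
print(search(root, 26) is not None)     # True
```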
Representing Scope Information
• Scope describes the lifetime of a variable in a particular block.
• Names are represented in the symbol table along with an indicator of the
block in which they appear.
• Scope representation reflects the visibility of a variable name in the
source program.
• There are two ways of representing scope information
– Scope by Number
– Scope by Location
Scope Representation by Number
• The same name can be declared many times, as distinct names,
in different blocks or procedures.
• Each procedure or block is given a unique number.
• The symbol table contains not only the name of an identifier:
each entry contains a pair (name,
procedure/block_number).
Example
Procedure 1 ()
{
    Var a, b;
    Procedure 2 ()
    {
        Var a, x;
    }
    Procedure 3 ()
    {
        Var c, x;
    }
}
Name          Block number(s)
a             1, 2
b             1
c             3
x             2, 3
Procedure 2   1
Procedure 3   1
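Scope by number can be sketched with a single table keyed by (name, block_number) pairs. The parent map, which records that blocks 2 and 3 are nested inside block 1, and the function names are assumptions added so that a lookup can fall back to the enclosing block.

```python
# Block 1 is Procedure 1; blocks 2 and 3 are nested inside it.
parent = {1: None, 2: 1, 3: 1}

table = {}   # (name, block_number) -> information


def declare(name, block, info="var"):
    table[(name, block)] = info


def resolve(name, block):
    """Return the number of the innermost block that declares name, or None."""
    while block is not None:
        if (name, block) in table:
            return block
        block = parent[block]    # not found here: try the enclosing block
    return None


for name in ["a", "b"]:
    declare(name, 1)
for name in ["a", "x"]:
    declare(name, 2)
for name in ["c", "x"]:
    declare(name, 3)

print(resolve("a", 2))   # 2
print(resolve("b", 2))   # 1
print(resolve("x", 1))   # None
```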
Scope Representation by Location
• A separate table is created for each scope (block or procedure).
• Multiple identifiers with the same name can easily be inserted,
because each one is stored in the table of its own block.
• We cannot have two variables with the same name in the same block
or procedure.
• This approach is more reliable and easier to understand.
Procedure 1 ()
{
    Var a, b;
    Procedure 2 ()
    {
        Var a, x;
    }
    Procedure 3 ()
    {
        Var c, x;
    }
}
Table for Procedure 1 : a, b
Table for Procedure 2 : a, x
Table for Procedure 3 : c, x
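Scope by location can be sketched with one table per procedure. The Scope class and its parent link are hypothetical; the parent link is an assumption that lets a lookup continue in the enclosing procedure when a name is not declared locally.

```python
class Scope:
    """One symbol table per block or procedure (illustrative sketch)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.symbols = {}

    def declare(self, name, info="var"):
        # Two variables with the same name are not allowed in the same block.
        if name in self.symbols:
            raise KeyError(f"{name} is already declared in {self.name}")
        self.symbols[name] = info

    def resolve(self, name):
        """Return the innermost scope that declares name, or None."""
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope
            scope = scope.parent
        return None


p1 = Scope("Procedure 1")
p2 = Scope("Procedure 2", parent=p1)
p3 = Scope("Procedure 3", parent=p1)
for n in ["a", "b"]:
    p1.declare(n)
for n in ["a", "x"]:
    p2.declare(n)
for n in ["c", "x"]:
    p3.declare(n)

print(p2.resolve("a").name)   # Procedure 2
print(p3.resolve("a").name)   # Procedure 1
print(p1.resolve("x"))        # None
```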