Construction of Syntax Trees

This document discusses syntax trees and directed acyclic graphs (DAGs) used to represent the structure of expressions in a programming language. It describes how syntax trees and DAGs are constructed bottom-up using syntax-directed definitions. Nodes in the trees and graphs represent operators and operands, with interior nodes for operators pointing to child nodes for operands. The document provides examples of constructing the syntax tree and DAG for an expression like "a - 4 + c".

Uploaded by

Shammer Sha
Copyright
© All Rights Reserved

CONSTRUCTION OF SYNTAX TREES


**************************************

Introduction

A syntax tree is a condensed form of parse tree that is useful for representing language constructs. Syntax trees are
created with the help of syntax-directed definitions. Using syntax trees as an intermediate form helps to dissociate
translation from parsing. Translation routines that are invoked during parsing must operate under two kinds of
restrictions. First, a grammar that is suited for parsing may not reflect the natural hierarchical structure of the
constructs in the language. Second, the parsing method constrains the order in which nodes in a parse tree are
considered. This order may not match the order in which information about a construct becomes available.

Syntax Trees

In a syntax tree, operators and keywords do not appear as leaves, but rather are associated with the interior node that
would be the parent of those leaves in the parse tree. Another simplification found in syntax trees is that chains of
single productions may be collapsed (see Figure 1 and Figure 2). Syntax-directed translation can be based on syntax
trees as well as on parse trees. The approach is the same in each case; we attach attributes to the nodes as in a parse
tree.

Figure 1
Figure 2

Constructing Syntax Trees for Expressions

The construction of a syntax tree for an expression is similar to the translation of the expression into postfix form.
We construct subtrees for the subexpressions by creating a node for each operator and operand. The children of an
operator node are the roots of the subtrees representing the subexpressions that are the operands of that operator.

Each node in a syntax tree can be implemented as a record with several fields. In the node for an operator, one field
identifies the operator and the remaining fields contain pointers to the nodes for the operands. The operator is often
called the label of the node. When used for translation, the nodes in a syntax tree may have additional fields to hold
the values of attributes attached to the node. Usually there are a number of functions defined to create the nodes of
syntax trees. Each function returns a pointer to a newly created node.

Consider, for example, the expression a - 4 + c. Here, we make use of the following functions to create the nodes of
syntax trees for expressions with binary operators.

a) mknode(op, left, right) creates an operator node with label op and two fields containing pointers to left and right.

b) mkleaf(id, entry) creates an identifier node with label id and a field containing entry, a pointer to the
symbol-table entry for the identifier.

c) mkleaf(num, val) creates a number node with label num and a field containing val, the value of the number.

The following sequence of function calls creates the syntax tree for the expression a - 4 + c. In this sequence, p1,
p2, p3, p4, and p5 are pointers to nodes, and entrya and entryc are pointers to the symbol-table entries for
identifiers a and c, respectively.

i) p1 := mkleaf(id , entrya);
ii) p2 := mkleaf(num , 4);
iii) p3 := mknode(' - ', p1 , p2);
iv) p4 := mkleaf(id , entryc);
v) p5 := mknode('+' , p3 , p4);

The tree is constructed bottom up. The function calls mkleaf(id, entrya) and mkleaf(num, 4) construct the leaves for a
and 4; the pointers to these nodes are saved in p1 and p2. The call mknode(' - ', p1, p2) then constructs the interior
node with the leaves for a and 4 as children. After two more steps, p5 is left pointing to the root.
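
The call sequence above can be sketched in Python as follows. This is only an illustration under assumptions that are
not in the original: nodes are modeled as a small Node record, plain strings stand in for the symbol-table pointers
entrya and entryc, and the helper postfix is hypothetical, added to show the connection to postfix form.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    """One syntax-tree record: a label plus a leaf value or two children."""
    label: str                            # operator symbol, "id", or "num"
    value: Union[str, int, None] = None   # entry or number (leaves only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def mknode(op: str, left: Node, right: Node) -> Node:
    """Create an operator node labeled op with the given children."""
    return Node(op, None, left, right)


def mkleaf(label: str, value: Union[str, int]) -> Node:
    """Create a leaf labeled "id" or "num" holding an entry or a value."""
    return Node(label, value)


def postfix(n: Node) -> str:
    """Hypothetical helper: list the tree in postfix order."""
    if n.left is None or n.right is None:
        return str(n.value)
    return f"{postfix(n.left)} {postfix(n.right)} {n.label}"


# Stand-ins for pointers to symbol-table entries (an assumption).
entrya, entryc = "a", "c"

p1 = mkleaf("id", entrya)
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", entryc)
p5 = mknode("+", p3, p4)   # p5 points to the root

print(postfix(p5))  # a 4 - c +
```

Reading the finished tree in postfix order gives back a 4 - c +, which matches the order in which the nodes were
created.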

Figure 3

A Syntax-Directed Definition for Constructing Syntax Trees

Figure 4 contains an S-attributed definition for constructing a syntax tree for an expression containing the operators
+ and -. It uses the underlying productions of the grammar to schedule the calls of the functions mknode and mkleaf to
construct the tree. The synthesized attribute nptr for E and T keeps track of the pointers returned by the function
calls.

Figure 4

An annotated parse tree depicting the construction of a syntax tree for the expression a - 4 + c is shown in Figure 5.
The parse tree is shown dotted. The parse-tree nodes labeled by the nonterminals E and T use the synthesized attribute
nptr to hold a pointer to the syntax-tree node for the expression represented by the nonterminal.

Figure 5

The semantic rules associated with the productions T ---> id and T ---> num define attribute T.nptr to be a pointer to
a new leaf for an identifier and a number, respectively. Attributes id.entry and num.val are the lexical values assumed
to be returned by the lexical analyzer with the tokens id and num.

In Figure 5, when an expression E is a single term, corresponding to a use of the production E ---> T, the attribute
E.nptr gets the value of T.nptr. When the semantic rule E.nptr := mknode(' - ', E1.nptr, T.nptr) associated with the
production E ---> E1 - T is invoked, previous rules have set E1.nptr and T.nptr to be pointers to the leaves for a and
4, respectively.
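
Figure 4 is not reproduced in this text, so the following Python sketch assumes the rules described above: E ---> E1 + T
and E ---> E1 - T call mknode, E ---> T copies the pointer, and T ---> id and T ---> num call mkleaf. It is not the
parser the definition was written for. A simple left-to-right loop stands in for the bottom-up parser, and the
tokenizer, the tuple representation of nodes, and the names parse_E and parse_T are assumptions made for illustration.

```python
import re


def mknode(op, left, right):
    return (op, left, right)


def mkleaf(label, value):
    return (label, value)


def tokenize(text):
    """Split the input into (token, lexical value) pairs."""
    tokens = []
    for lexeme in re.findall(r"\d+|[A-Za-z_]\w*|[+-]", text):
        if lexeme.isdigit():
            tokens.append(("num", int(lexeme)))   # plays the role of num.val
        elif lexeme in ("+", "-"):
            tokens.append((lexeme, None))
        else:
            tokens.append(("id", lexeme))         # name stands in for id.entry
    return tokens


def parse_T(tokens, pos):
    """T ---> id | num. Returns (T.nptr, position after the token)."""
    if pos >= len(tokens) or tokens[pos][0] not in ("id", "num"):
        raise SyntaxError("expected id or num")
    kind, value = tokens[pos]
    return mkleaf(kind, value), pos + 1           # T.nptr := mkleaf(...)


def parse_E(tokens):
    """E ---> E1 + T | E1 - T | T. Returns E.nptr."""
    nptr, pos = parse_T(tokens, 0)                # E ---> T: E.nptr := T.nptr
    while pos < len(tokens):
        op, _ = tokens[pos]
        if op not in ("+", "-"):
            raise SyntaxError("expected + or -")
        t_nptr, pos = parse_T(tokens, pos + 1)
        nptr = mknode(op, nptr, t_nptr)           # E.nptr := mknode(op, E1.nptr, T.nptr)
    return nptr


tree = parse_E(tokenize("a - 4 + c"))
print(tree)  # ('+', ('-', ('id', 'a'), ('num', 4)), ('id', 'c'))
```

Each nptr value is computed only from the values of the symbols to its right-hand side in the production, which is what
makes the attribute synthesized.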

Directed Acyclic Graphs for Expressions

A directed acyclic graph (dag) for an expression identifies the common subexpressions in the expression. Like a syntax
tree, a dag has a node for every subexpression of the expression; an interior node represents an operator and its
children represent its operands. The difference is that a node in a dag representing a common subexpression has more
than one "parent"; in a syntax tree, the common subexpression would be represented as a duplicated subtree.

Figure 6 shows a dag for the expression a + a * ( b - c ) + ( b - c ) * d.


Figure 6

The leaf for a has two parents because a is common to the two subexpressions a and a * ( b - c ). Likewise, both
occurrences of the common subexpression b - c are represented by the same node, which also has two parents.

The syntax-directed definition of Figure 4 will construct a dag instead of a syntax tree if we modify the operations
for constructing nodes. A dag is obtained if the function constructing a node first checks to see whether an identical
node already exists. For example, before constructing a new node with label op and fields with pointers to left and
right, mknode(op, left, right) can check whether such a node has already been constructed. If so,
mknode(op, left, right) can return a pointer to the previously constructed node. The leaf-constructing functions mkleaf
can behave similarly.

The sequence of instructions for constructing the dag in Figure 6 is listed below. The functions construct the dag
provided mknode and mkleaf create new nodes only when necessary, returning pointers to existing nodes with the correct
label and children whenever possible.

(1) p1 := mkleaf(id,a);
(2) p2 := mkleaf(id,a);
(3) p3 := mkleaf(id,b);
(4) p4 := mkleaf(id,c);
(5) p5 := mknode(' - ' ,p3,p4);
(6) p6 := mknode(' * ' ,p2,p5);
(7) p7 := mknode(' + ' ,p1,p6);
(8) p8 := mkleaf(id,b);
(9) p9 := mkleaf(id,c);
(10) p10 := mknode(' - ' ,p8,p9);
(11) p11 := mkleaf(id,d);
(12) p12 := mknode(' * ' ,p10,p11);
(13) p13 := mknode(' + ' ,p7,p12);

When the call mkleaf(id,a) is repeated on line 2, the node constructed by the previous call mkleaf(id,a) is returned,
so p1 = p2. Similarly, the nodes returned on lines 8 and 9 are the same as those returned on lines 3 and 4,
respectively. Hence, the node returned on line 10 must be the same one constructed by the call of mknode on line 5.

In many applications, nodes are implemented as records stored in an array, as in Figure 7. In the figure, each record
has a label field that determines the nature of the node. We can refer to a node by its index in the array. The integer
index of a node is often called a value number. For example, using value numbers, we can say node 3 has label +, its
left child is node 1, and its right child is node 2. The following algorithm can be used to create nodes for a dag
representation of an expression.

Figure 7

Algorithm: Value-number method for constructing a node in a dag.

Suppose that nodes are stored in an array and that each node is referred to by its value number. Let the signature of
an operator node be a triple <op, l, r> consisting of its label op, left child l, and right child r.

Input. Label op, node l, and node r.

Output. A node with signature <op, l, r>.

Method. Search the array for a node m with label op, left child l, and right child r. If there is such a node, return
m; otherwise, create a new node n with label op, left child l, right child r, and return n.
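
The method can be sketched directly in Python. The sketch makes assumptions beyond the algorithm as stated: value
numbers start at 0 because Python lists are zero-based, each record is a plain tuple, and a leaf is stored as a
signature whose third component is None. The name value_number is chosen for illustration.

```python
nodes = []   # the array of records; a node's index is its value number


def value_number(op, l, r):
    """Return the value number of the node with signature <op, l, r>."""
    signature = (op, l, r)
    for m, existing in enumerate(nodes):   # linear search of the whole array
        if existing == signature:
            return m
    nodes.append(signature)                # not found: create a new node n
    return len(nodes) - 1


b = value_number("id", "b", None)
c = value_number("id", "c", None)
first = value_number("-", b, c)
second = value_number("-", b, c)   # same signature, so the same node comes back

print(b, c, first, second)   # 0 1 2 2
print(len(nodes))            # 3
```

Because children are referred to by value number, two signatures are equal exactly when the label and both child
indices are equal, so no deep comparison of subtrees is needed.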

An obvious way to determine if node m is already in the array is to keep all previously created nodes on a list and to
check each node on the list to see if it has the desired signature. The search for m can be made more efficient by
using k lists, called buckets, and using a hashing function h to determine which bucket to search.

The hash function h computes the number of a bucket from the values of op, l, and r. It will always return the same
bucket number, given the same arguments. If m is not in the bucket h(op, l, r), then a new node n is created and added
to this bucket, so subsequent searches will find it there. Several signatures may hash into the same bucket number, but
in practice we expect each bucket to contain a small number of nodes.

Each bucket can be implemented as a linked list, as shown in Figure 8. Each cell in a linked list represents a node.
The bucket headers, consisting of pointers to the first cell in a list, are stored in an array. The bucket number
returned by h(op, l, r) is an index into this array of bucket headers.
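
A hedged Python sketch of the bucket scheme follows. The number of buckets K, the use of Python's built-in hash for h,
and Python lists in place of linked lists are all assumptions made for illustration; leaves are again stored as
signatures whose third component is None, and value numbers start at 0.

```python
K = 8                                # number of buckets (an arbitrary choice)
nodes = []                           # node records, indexed by value number
buckets = [[] for _ in range(K)]     # bucket headers; each bucket lists value numbers


def h(op, l, r):
    """Bucket number for a signature; stable for equal arguments within one run."""
    return hash((op, l, r)) % K


def find_or_create(op, l, r):
    """Return the value number of the node with signature <op, l, r>."""
    bucket = buckets[h(op, l, r)]
    for m in bucket:                 # search only the one bucket
        if nodes[m] == (op, l, r):
            return m
    nodes.append((op, l, r))         # not in the bucket: create node n
    n = len(nodes) - 1
    bucket.append(n)                 # later searches will find it here
    return n


def leaf(name):
    return find_or_create("id", name, None)


# a + a * ( b - c ) + ( b - c ) * d
left = find_or_create("+", leaf("a"),
                      find_or_create("*", leaf("a"),
                                     find_or_create("-", leaf("b"), leaf("c"))))
right = find_or_create("*", find_or_create("-", leaf("b"), leaf("c")), leaf("d"))
root = find_or_create("+", left, right)

print(len(nodes))   # 9
```

Each lookup now compares the signature against the nodes in one bucket only, rather than against every node created so
far.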

Figure 8

This algorithm can be adapted to apply to nodes that are not allocated sequentially from an array. In many compilers,
nodes are allocated as they are needed, to avoid preallocating an array that may hold too many nodes most of the time
and not enough nodes some of the time. In this case, we cannot assume that nodes are in sequential storage, so we have
to use pointers to refer to nodes. If the hash function can be made to compute the bucket number from a label and
pointers to children, then we can number the nodes in any way and use this number as the value number of the node.
