
Data Structures using ‘C’ Unit 2
Unit 2 Overview of Data Structures
Structure:
2.1 Introduction
2.1.1 What is a Data Structure?
2.1.2 Definition of data structure
2.1.3 The Abstract Level
2.1.4 The Application Level
2.1.5 Implementation Level
Self Assessment Questions
2.2 Data Types and Structured Data Type
2.2.1 Common Structures
2.2.2 Abstract Data Types
2.2.2.1 Properties of Abstract Data Types
2.2.2.2 Generic Abstract Data Types
2.2.2.3 Programming with Abstract Data Types
2.3 Pre and Post Conditions
2.3.1 Preconditions
2.3.2 Postconditions
2.3.3 Checking Pre & Post Conditions
2.3.4 Implementation Checks Preconditions
2.4 Linear Data Structure
2.4.1 The Array Data Structure
2.4.2 Using an Array and Lists as a Data Structure
2.4.3 Elementary Data Structures
Sikkim Manipal University Page No.: 26

2.5 What the application needs?

2.6 Implementation methods
2.7 Non Linear Data Structures
2.7.1 Trees
2.7.2 Binary Tree
2.7.3 Hash Tables
2.8 Summary
2.9 Terminal Questions
2.1 Introduction
Data structures represent places to store data for use by a computer
program. As you would imagine, this describes a spectrum of data storage
techniques, from the very simple to the very complex. We can look at this
progression, from the simple to the complex, in the following way.
At the lowest level, there are data structures supplied and supported by the
CPU (or computer chip), itself. These vary from chip to chip, but are almost
always of the very primitive sort. They typically include the simple data types,
such as integers, characters, floating point numbers, and bit strings. To
some extent, the data types supported by a chip reflect the hardware design
of the chip. Things such as, how wide (how many bits) are the registers,
how wide is the data bus, does the ALU have an accumulator, does the ALU
support floating point operations?
At the second level of the data structures spectrum are the data structures
supported by particular programming languages. These vary a lot from
language to language. Most languages offer arrays, and many offer arrays
of arrays (matrices). Most of the popular languages provide support for
some sort of record structure. In C these are structs and in Pascal these are
records. A few offer strings as a first class data type (e.g. C++ and Java). A
few languages support linked lists directly in the language (e.g. Lisp and
Scheme). Object oriented languages often offer general lists, stacks, and
even trees.
At the top level of this taxonomy are those data structures that are created
by the programmer, using a particular programming language. In this regard,
it is important to note what tools are provided by a language to facilitate the
implementation of complex data structures envisioned by a programmer.
Things such as arrays, arrays of arrays, pointers, and record structures are all
helpful in this regard. Using the available tools, a programmer can build
general lists, stacks, queues, deques, trees (of many types), graphs, sets,
and much, much more.
In this book we will focus on those data structures in the top level, those that
are usually created by the application programmer. These are the data
structures that, generally, impact the problem solution and implementation in
the most dramatic ways: size, efficiency, readability, and maintainability.
Objectives
At the end of this unit, you will be able to understand:
- The meaning of a data structure and its definition
- The abstract, application and implementation levels of a data structure
- Abstract data types and their properties
- Preconditions and postconditions, and how they are checked
- The concepts of linear and non-linear data structures
2.1.1 What is a Data Structure?

A data structure is the organization of data in a computer's memory or in a
file.
The proper choice of a data structure can lead to more efficient programs.

Some example data structures are: array, stack, queue, linked list, binary
tree, hash table, heap, and graph. Data structures are often used to build
databases. Typically, data structures are manipulated using various
algorithms.
Based on the concept of Abstract Data Types (ADT), we define a data
structure by the following three components.
1) Operations: Specifications of external appearance of a data structure
2) Storage Structures: Organizations of data implemented in lower-level
data structures
3) Algorithms: Description on how to manipulate information in the
storage structures to obtain the results defined for the operations
When you work with and collect information on any subject, it doesn't take very
long before you have more data than you know how to handle. Enter the
data structure. In his book Algorithms, Data Structures, and Problem Solving
with C++, Mark Allen Weiss writes "A data structure is a representation of
data and the operations allowed on that data." Webopedia states, "the
term data structure refers to a scheme for organizing related pieces of
information."
2.1.2 Definition of data structure

A data structure is "a specification, an application and an implementation view of a collection of
one or more items of data, and the operations necessary and sufficient to
interact with the collection. The specification is the definition of the data
structure as an abstract data type. The specification forms the programming
interface for the data structure. The application level is a way of modeling
real-life data in a specific context. The implementation is a concrete data
type expressed in a programming language. There may be intermediate
levels of implementation, but ultimately the data structure implementation
must be expressed in terms of the source language primitive data types."

2.1.3 The Abstract Level

The abstract (or logical) level is the specification of the data structure - the
"what" but not the "how." At this level, the user or data structure designer is
free to think outside the bounds of any one programming language. For
instance, a linear list type would consist of a collection of list nodes such
that they formed a sequence. The operations defined for this list might be
insert, delete, sort and retrieve.
2.1.4 The Application Level

At the application or user level, the user is modeling real-life data in a
specific context. In our list example, we might specify what kind of items
are stored in the list and how long the list is. The context will determine the
definitions of the operations. For example, if the list were a list of character
data, the operations would have a different meaning than if we were talking
about a grocery list.
2.1.5 Implementation Level

The implementation level is where the model becomes compilable,
executable code. We need to determine where the data will reside and
allocate space in that storage area. We also need to create the sequence of
instructions that will cause the operations to perform as specified.

Self Assessment Questions
1. Define data structure. Explain its three components.
2. Discuss the data structure implementation in terms of the source
language primitive data types.
2.2 Data Types and Structured Data Type

A data type (and, likewise, a structured data type) consists of:
- a domain (= a set of values)
- a set of operations.
Example: the Boolean or logical data type provided by most programming
languages has:
- two values: true, false.
- many operations, including AND, OR, NOT, etc.
Structural and Behavioral Definitions

There are two different approaches to specifying a domain : we can give a
structural definition or can give a behavioral definition. Let us see what
these two are like.
Behavioral Definition of the domain for ‘Fraction’

The alternative approach to defining the set of values for fractions does not
impose any internal structure on them. Instead it just adds an operation that
creates fractions out of other things, such as CREATE_FRACTION(N,D)
where N is any integer and D is any non-zero integer.
The values of type fraction are defined to be the values that are produced by
this function for any valid combination of inputs. The parameter names were
chosen to suggest its intended behavior: CREATE_FRACTION(N,D) should
return a value representing the fraction N/D (N for numerator, D for
denominator).
You are probably thinking, this is crazy: CREATE_FRACTION could be any
old random function. How do we guarantee that CREATE_FRACTION(N,D)
actually returns the fraction N/D? The answer is that we have to constrain
the behavior of this function by relating it to the other operations on
fractions. For example, one of the key properties of multiplication is that,
for non-zero N and D:
NORMALIZE((N/D) * (D/N)) = 1/1
This turns into a constraint on CREATE_FRACTION:
NORMALIZE(CREATE_FRACTION(N,D) * CREATE_FRACTION(D,N)) = CREATE_FRACTION(1,1)
So you see CREATE_FRACTION cannot be any old function; its behavior is
highly constrained, because we can write down lots and lots of constraints
like this. And that's the reason we call this sort of definition behavioral:
the definition is strictly in terms of a set of operations and
constraints or axioms relating the behavior of the operations to one another.
In this style of definition, the domain of a data type - the set of permissible
values - plays an almost negligible role. Any set of values will do, as long as
we have an appropriate set of operations to go along with it.
2.2.1 Common Structures

Let us stick with structural definitions for the moment, and briefly survey
the main kinds of data types, from a structural point of view.
- Atomic Data Types
First of all, there are atomic data types. These are data types that are
defined without imposing any structure on their values. Boolean, our first
example, is an atomic type. So are characters, as these are typically
defined by enumerating all the possible values that exist on a given
computer.
- Structured Data Types
The opposite of atomic is structured. A structured data type has a
definition that imposes structure upon its values. As we saw above,
fractions normally are a structured data type. In many structured data
types, there is an internal structural relationship, or organization, that
holds between the components. For example, if we think of an array as a
structured type, with each position in the array being a component, then
there is a structural relationship of 'followed by': we say that component
N is followed by component N+1.
- Structural Relationships
Not all structured data types have this sort of internal structural
relationship. Fractions are structured, but there is no internal relationship
between the sign, numerator, and denominator. But many structured
data types do have an internal structural relationship, and these can be
classified according to the properties of this relationship.
- Linear Structure:
The most common organization for components is a linear structure. A
structure is linear if it has these 2 properties:
Property P1: Each element is 'followed by' at most one other element.
Property P2: No two elements are 'followed by' the same element.
An array is an example of a linearly structured data type. We generally
write a linearly structured data type like this:
A->B->C->D (this is one value with 4 parts).
- counter example 1 (violates P1): A points to both B and C: B<-A->C
- counter example 2 (violates P2): A and B both point to C: A->C<-B
2.2.2 Abstract Data Types

Handling Problems
The first thing with which one is confronted when writing programs is the
problem. Typically you are confronted with "real-life" problems and you want
to make life easier by providing a program for the problem. However, real-life
problems are nebulous and the first thing you have to do is to try to
understand the problem to separate necessary from unnecessary details: you
try to obtain your own abstract view, or model, of the problem. This process
of modeling is called 'abstraction' and is illustrated in the figure. The
model defines an abstract view of the problem.
[Figure: Create a model from a problem with abstraction.]
This implies that the model focuses only on problem-related aspects and that
you try to define properties of the problem. These properties include:
- the data which are affected and
- the operations which are identified by the problem
It is said that "computer science is the science of abstraction." But what
exactly is abstraction? Abstraction is "the idea of a quality thought of apart
from any particular object or real thing having that quality." For example, we
can think about the size of an object without knowing what that object is.
Similarly, we can think about the way a car is driven without knowing its
model or make.
As an example consider the administration of employees in an institution.
The head of the administration comes to you and asks you to create a
program which allows them to administer the employees. Well, this is not very
specific. For example, what employee information is needed by the
administration? What tasks should be allowed? Employees are real persons
who can be characterized by many properties; a few are: name, size,
date of birth, shape, social number, room number, hair color, hobbies.
Certainly not all of these properties are necessary to solve the
administration problem. Only some of them are problem specific.
Consequently you create a model of an employee for the problem. This
model includes only the properties which are needed to fulfill the requirements of
the administration, for instance name, date of birth and social number.
These properties are called the data of the (employee) model. Now you
have described real persons with the help of an abstract employee.
Of course, the pure description is not enough. There must be some
operations defined with which the administration is able to handle the
abstract employees. For example, there must be an operation which allows
you to create a new employee once a new person enters the institution.
Consequently, you have to identify the operations which can be
performed on an abstract employee. You also decide to allow access to
the employees' data only with associated operations. This allows you to
ensure that data elements are always in a proper state. For example, you
are able to check if a provided date is valid.
Abstraction is used to suppress irrelevant details while at the same time
emphasizing relevant ones. The benefit of abstraction is that it makes it
easier for the programmer to think about the problem to be solved.
To sum up, abstraction is the structuring of a nebulous problem into well-defined
entities by defining their data and operations. Consequently, these
entities combine data and operations. They are not decoupled from each
other.
- Abstract Data Types
A variable in a procedural programming language such as Fortran,
Pascal, C, etc. is an abstraction. The abstraction comprises a number of
attributes: name, address, value, lifetime, scope, type, and size. Each
attribute has an associated value. For example, if we declare an integer
variable in C or C++, int x, we say that the name attribute has value "x"
and that the type attribute has value "int".
Unfortunately, the terminology can be somewhat confusing: the word
"value" has two different meanings. In one instance it denotes one of the
attributes and in the other it denotes the quantity assigned to an attribute.
For example, after the assignment statement x = 5, the value attribute
has the value five.
The name of a variable is the textual label used to refer to that variable
in the text of the source program. The address of a variable denotes its
location in memory. The value attribute is the quantity which that
variable represents. The lifetime of a variable is the interval of time
during the execution of the program in which the variable is said to exist.
The scope of a variable is the set of statements in the text of the source
program in which the variable is said to be visible. The type of a variable
denotes the set of values which can be assigned to the value attribute
and the set of operations which can be performed on the variable.
Finally, the size attribute denotes the amount of storage required to
represent the variable.
The process of assigning a value to an attribute is called binding. When
a value is assigned to an attribute, that attribute is said to be bound to
the value. Depending on the semantics of the programming language,
and on the attribute in question, the binding may be done statically by
the compiler or dynamically at run-time. For example, in Java the type of
a variable is determined at compile time (static binding). On the other
hand, the value of a variable is usually not determined until run-time
(dynamic binding).
Here we are concerned primarily with the type attribute of a variable.
The type of a variable specifies two sets:
- a set of values; and,
- a set of operations.
For example, when we declare a variable, say x, of type int, we know that
x can represent an integer in the range [-2^31, 2^31-1] (assuming a 32-bit int)
and that we can perform operations on x such as addition, subtraction,
multiplication, and division.
The type int is an abstract data type in the sense that we can think about the
qualities of an int apart from any real thing having that quality. In other
words, we don't need to know how ints are represented nor how the
operations are implemented to be able to use them or reason
about them.
In designing object-oriented programs, one of the primary concerns of the
programmer is to develop an appropriate collection of abstractions for the
application at hand, and then to define suitable abstract data types to
represent those abstractions. In so doing, the programmer must be
conscious of the fact that defining an abstract data type requires the
specification of both a set of values and a set of operations on those values.
Indeed, it has been only since the advent of the so-called object-oriented
programming languages that we see programming languages which
provide the necessary constructs to properly declare abstract data types.
For example, in Java, the class construct is the means by which both a set
of values and an associated set of operations are declared. Compare this with
the struct construct of C or Pascal's record, which only allow the
specification of a set of values!
2.2.2.1 Properties of Abstract Data Types

The employee example above shows that with abstraction you create a
well-defined entity which can be properly handled. These entities define the
data structure of a set of items. For example, each administered employee
has a name, date of birth and social number. The data structure can only be
accessed with defined operations. This set of operations is called the interface
and is exported by the entity. An entity with the properties
just described is called an abstract data type (ADT).
[Figure: An abstract data type (ADT), made up of an abstract data structure and operations; the operations form the interface.]
The figure shows an ADT which consists of an abstract data structure and
operations. Only the operations are viewable from the outside and
define the interface. Once a new employee is "created" the data structure is
filled with actual values: you now have an instance of an abstract employee.
You can create as many instances of an abstract employee as needed to
describe every real employed person.
Let's try to put the characteristics of an ADT in a more formal way:
Definition: An abstract data type (ADT) is characterized by the following
properties:
1. It exports a type.
2. It exports a set of operations. This set is called the interface.
3. Operations of the interface are the one and only access mechanism to
the type's data structure.
4. Axioms and preconditions define the application domain of the type.
With the first property it is possible to create more than one instance of an
ADT, as illustrated by the employee example.
Returning to the example of the fraction data type, how might we actually
implement this data type in C?
Implementation 1:
/* A fraction stored as a record with two named fields. */
typedef struct { int numerator, denominator; } fraction;

int main(void)
{
    fraction f;
    f.numerator = 1;
    f.denominator = 2;
    /* …………… */
    return 0;
}
Implementation 2:
/* A fraction stored as a two-element array; the macros name the slots. */
#define numerator 0
#define denominator 1
typedef int fraction[2];

int main(void)
{
    fraction f;
    f[numerator] = 1;
    f[denominator] = 2;
    /* …………… */
    return 0;
}
These are just 2 of many different possibilities. Obviously these differences
are in some sense extremely trivial - they do not affect the domain of values
or meaning of the operations of fractions.
2.2.2.2 Generic Abstract Data Types

ADTs are used to define a new type from which instances can be created.
For instance, one can think of lists of apples, cars or even lists. Semantically, the
definition of a list is always the same. Only the type of the data elements
changes according to what type the list should operate on.
This additional information could be specified by a generic parameter which
is specified at instance creation time. Thus an instance of a generic ADT is
actually an instance of a particular variant of the ADT. A list of apples can
therefore be declared as follows:
List<Apple> listOfApples;
The angle brackets now enclose the data type for which a variant of the
generic ADT List should be created. listOfApples offers the same interface
as any other list, but operates on instances of type Apple.
Notation:
As ADTs provide an abstract view to describe properties of sets of entities,
their use is independent from a particular programming language. We
therefore introduce a notation here. Each ADT description consists of two
parts:
- Data: This part describes the structure of the data used in the ADT in an
informal way.
- Operations: This part describes valid operations for this ADT, hence, it
describes its interface. We use the special operation constructor to
describe the actions which are to be performed once an entity of this
ADT is created and destructor to describe the actions which are to be
performed once an entity is destroyed. For each operation the provided
arguments as well as preconditions and postconditions are given.
As an example, the description of the ADT Integer is presented. Let k be an
integer expression:
ADT Integer is
Data
A sequence of digits optionally prefixed by a plus or minus sign. We refer to
this signed whole number as N.
Operations
Constructor
Creates a new integer.
add(k)
Creates a new integer which is the sum of N and k.
Consequently, the postcondition of this operation is sum = N+k. Don't
confuse this with assignment statements as used in programming languages. It
is rather a mathematical equation which yields "true" for each value sum, N
and k after add has been performed.
sub(k)
Similar to add, this operation creates a new integer which is the difference of both
integer values. Therefore the postcondition for this operation is sum = N-k.
set(k)
Set N to k. The postcondition for this operation is N = k.
……
end
The description above is a specification for the ADT Integer. Please notice,
that we use words for names of operations such as "add". We could use the
more intuitive "+" sign instead, but this may lead to some confusion: You
must distinguish the operation "+" from the mathematical use of "+" in the
postcondition. The name of the operation is just syntax whereas the
semantics is described by the associated pre- and postconditions. However,
it is always a good idea to combine both to make reading of ADT
specifications easier.
Real programming languages are free to choose an arbitrary
implementation for an ADT. For example, they might implement the
operation add with the infix operator "+", leading to a more intuitive look for
the addition of integers.
2.2.2.3 Programming with Abstract Data Types

By organizing our program this way - i.e. by using abstract data types - we
can change implementations extremely quickly: all we have to do is
re-implement three very trivial functions, no matter how large our application
is.
In general terms, an abstract data type is a specification of the values and
the operations that has 2 properties:
1. it specifies everything you need to know in order to use the datatype
2. it makes absolutely no reference to the manner in which the datatype
will be implemented.

[Figure: The Application uses the ADT, the Specification defines the ADT, and the Implementation implements the ADT.]
When we use abstract data types, our programs divide into two pieces:
The Application: The part that uses the abstract data type.
The Implementation: The part that implements the abstract data type.
These two pieces are completely independent. It should be possible to take
the implementation developed for one application and use it for a completely
different application with no changes.
If programming in teams, implementers and application-writers can work
completely independently once the specification is set.
Specification
Let us now look in detail at how we specify an abstract datatype. We will use
'stack' as an example. The data structure stack is based on the everyday
notion of a stack, such as a stack of books, or a stack of plates. The defining
property of a stack is that you can only access the top element of the stack,
all the other elements are underneath the top one and can't be accessed
except by removing all the elements above them one at a time.
The notion of a stack is extremely useful in computer science; it has many
applications, and is so widely used that microprocessors often are stack-based
or at least provide hardware implementations of the basic stack
operations.
First, let us see how we can define, or specify, the abstract concept of a
stack. The main thing to notice here is how we specify everything needed in
order to use stacks without any mention of how stacks will be implemented.

Self Assessment Questions
1. Define structural and behavioral definitions.
2. Define abstract data type.
3. Discuss the properties of an ADT.
2.3 Pre and Post Conditions

2.3.1 Preconditions
These are properties about the inputs that are assumed by an operation. If
they are satisfied by the inputs, the operation is guaranteed to work properly.
If the preconditions are not satisfied, the operation's behavior is unspecified:
it might work properly (by chance), it might return an incorrect answer, it
might crash.
2.3.2 Postconditions
These specify the effects of an operation. They are the only things you may
assume have been done by the operation. They are only guaranteed to hold
if the preconditions are satisfied.
Note: The definition of the values of type 'stack' makes no mention of an
upper bound on the size of a stack. Therefore, the implementation must
support stacks of any size. In practice, there is always an upper bound - the
amount of computer storage available. This limit is not explicitly mentioned,
but is understood - it is an implicit precondition on all operations that there is
storage available, as needed. Sometimes this is made explicit, in which
case it is advisable to add an operation that tests if there is sufficient storage
available for a given operation.
Operations
The operations specified before are core operations - any other operation on
stacks can be defined in terms of these. These are the operations that
we must implement in order to implement 'stacks'; everything else in our
program can be independent of the implementation details.
It is useful to divide operations into four kinds of functions:
1. Those that create stacks out of non-stacks, e.g. CREATE_STACK,
READ_STACK, CONVERT_ARRAY_TO_STACK
2. Those that 'destroy' stacks (opposite of create), e.g. DESTROY_STACK
3. Those that 'inspect' or 'observe' a stack, e.g. TOP, IS_EMPTY,
WRITE_STACK
4. Those that take stacks (and possibly other things) as input and produce
other stacks as output, e.g. PUSH, POP
A specification must say what an operation's input and outputs are, and
definitely must mention when an input is changed. This falls short of
completely committing the implementation to procedures or functions (or
whatever other means of creating 'blocks' of code might be available in the
programming language). Of course, these details eventually need to be
decided in order for code to actually be written. But these details do not
need to be decided until code-generation time; throughout the earlier stages
of program design, the exact interface (at code level) can be left unspecified.
2.3.3 Checking Pre Conditions

It is very important to state in the specification whether each precondition
will be checked by the user or by the implementer. For example, the
precondition for POP may be checked either by the procedure(s) that call
POP or within the procedure that implements POP. Either way is possible.
Here are the pros and cons of the 2 possibilities:
User Guarantees Preconditions

The main advantage, if the user checks preconditions - and therefore
guarantees that they will be satisfied when the core operations are invoked -
is efficiency. For example, consider the following:
PUSH(S, 1);
POP(S);
It is obvious that there is no need to check whether S is empty before the POP:
the precondition of POP (S is not empty) is guaranteed to be satisfied because
it is a postcondition of PUSH.
2.3.4 Implementation Checks Preconditions

There are several advantages to having the implementation check its own
preconditions:
1. It sometimes has access to information not available to the user (e.g.
implementation details about space requirements), although this is often
a sign of a poorly constructed specification.
2. Programs won't bomb mysteriously - errors will be detected (and
reported?) at the earliest possible moment. This is not true when the
user checks preconditions, because the user is human and occasionally
might forget to check, or might think that checking was unnecessary
when in fact it was needed.
3. Most important of all, if we ever change the specification, and wish to
add, delete, or modify preconditions, we can do this easily, because the
precondition occurs in exactly one place in our program.
There are arguments on both sides. The literature specifies that
procedures should signal an error if their preconditions are not satisfied.
This means that these procedures must check their own preconditions.
That's what our model solutions will do too. We will thereby sacrifice some
efficiency for a high degree of maintainability and robustness.
An additional possibility is to selectively include or exclude the
implementation's condition checking code, e.g. using #ifdef:
#ifdef SAFE
if (!condition) error("condition not satisfied");
#endif
This code will get included only if we supply the -DSAFE argument to the
compiler (or otherwise define SAFE). Thus, in an application where the user
checks carefully for all preconditions, we have the option of omitting all
checks by the implementation.

Self Assessment Questions
1. Explain pre and post conditions with a suitable example.
2. Discuss the advantages of having the implementation check preconditions.
2.4 Linear Data Structure

2.4.1 The Array Data Structure
As an example, most programming languages have an array type as one of
the built-in types. We will define an array as a homogeneous, ordered, finite,
fixed-length list of elements. To further define these terms in the context of
an array:
a) homogeneous - every element is of the same type
b) ordered - there is a next and previous in the natural order of the structure
c) finite - there is a first and last element
d) fixed-length - the list size is constant

Mapping the array to the three levels of a data structure:

1. At the abstract level
- Accessing mechanism is direct, random access
- Construction operator
- Storage operator
- Retrieval operator
2. At the application level
- Used to model lists (characters, employees, etc.).
3. At the implementation level
- Allocate memory through static or dynamic declarations
- Accessing functions provided: [ ] and =.
2.4.2 Using an Array and Lists as a Data Structure

An array can be used to implement containers.
Given an index (i.e. subscript), values can be quickly fetched and/or stored in
an array. Adding a value to the end of an array is fast (particularly if a variable
is used to indicate the end of the array); however, inserting a value into an
array can be time consuming because existing elements must be shifted.
Since array elements are typically stored in contiguous memory locations,
looping through an array can be done easily and efficiently.
When elements of an array are sorted, then binary searching can be used to
find particular values in the array. If the array elements are not sorted, then
a linear search must be used. After an array has been defined, its length (i.e.
number of elements) cannot be changed.
Arrays: Fast and Slow

The following are some comments on the efficiency of arrays:
a) Changing the length of an array can be slow.
b) Inserting elements at the end of an array is fast (assuming the index of
the end-of-array is stored; if you have to search for the end-of-array,
then this operation is slow).
c) Inserting elements near the beginning of an array can be slow.
d) Accessing an array element using an index is fast.
e) Searching a non-sorted array for a value can be slow.
f) Searching a sorted array for a value can be fast.
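Points (b), (c) and (f) can be made concrete with a short sketch. The function names insert_at and binary_search are chosen for this example; the caller is assumed to guarantee that the array has room for one more element.

```c
#include <stddef.h>

/* Insert value at position pos in a[0..*len-1], shifting the later
   elements one place to the right. Precondition: pos <= *len and the
   array has room for one more element. Inserting at pos == *len (the
   end) moves nothing; inserting at pos == 0 moves every element. */
void insert_at(int a[], size_t *len, size_t pos, int value)
{
    for (size_t i = *len; i > pos; i--)
        a[i] = a[i - 1];
    a[pos] = value;
    (*len)++;
}

/* Binary search in an array sorted in ascending order.
   Returns the index of key, or -1 if key is absent. */
long binary_search(const int a[], size_t len, int key)
{
    size_t lo = 0, hi = len;              /* search the range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] == key) return (long)mid;
        if (a[mid] < key) lo = mid + 1;
        else hi = mid;
    }
    return -1;
}
```

Each step of the binary search halves the range still to be examined, which is why searching a sorted array is fast, while an insertion near the beginning has to move almost every element.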
2.4.3 Elementary Data Structures

"Mankind's progress is measured by the number of things we can do without
thinking." Elementary data structures such as stacks, queues, lists, and
heaps will be the "off-the-shelf" components we build our algorithms from.
There are two aspects to any data structure:
1) The abstract operations which it supports.
2) The implementation of these operations.
The fact that we can describe the behavior of our data structures in terms of
abstract operations explains why we can use them without thinking, while
the fact that we have different implementations of the same abstract
operations enables us to optimize performance.
In this book we consider a variety of abstract data types (ADTs), including
stacks, queues, deques, ordered lists, sorted lists, hash tables, trees, and
priority queues. In just about every case, we have the option of
implementing the ADT using an array or using some kind of linked data
structure.
Because they are the base upon which almost all of the ADTs are built, we
call the array and the linked list the foundational data structures. It is
important to understand that we do not view the array or the linked list as
ADTs, but rather as alternatives for the implementation of ADTs.

Arrays
Probably the most common way to aggregate data is to use an array. In C
an array is a variable that contains a collection of objects, all of the same
type.
For example, int a[5]; declares an array of five integers named a.
The elements of an array are accessed using integer-valued indices. In C
the first element of an array always has index zero. Thus, the five elements
of array a are a[0], a[1], ..., a[4]. Every array in C has a length, the value of
which is equal to the number of array elements.
How are C arrays represented in the memory of the computer? The
elements of an array occupy consecutive memory locations, as Figure
illustrates. That way, given i, it is possible to find the position of a[i] in
constant time. On the basis of Figure, we can now estimate the total storage
required to represent an array. Let S(n) be the total storage (memory) needed
to represent an array of n ints. S(n) is given by
$$S(n) \ge \mathrm{sizeof}(\mathtt{int}[n]) \ge (n+1)\,\mathrm{sizeof}(\mathtt{int})$$

where the function sizeof(x) is the number of bytes used for the memory
representation of an instance of an object of type x.
In C the sizes of the primitive data types are fixed constants for a given
compiler. Hence sizeof(int) = O(1).
In practice, an array object may contain additional fields. For example, it is
reasonable to expect that there is a field which records the position in
memory of the first array element. In any event the overhead associated
with a fixed number of fields is O(1). Therefore, S(n) = O(n).
Multi-Dimensional Arrays
A multi-dimensional array of dimension n (i.e. an n-dimensional array or
simply n-D array) is a collection of items which is accessed via n subscript
expressions. For example, in a language that supports it, the (i, j)th element
of the two-dimensional array x is accessed by writing x[i,j].
The C programming language does not really support multi-dimensional
arrays. It does, however, support arrays of arrays. In C a two-dimensional
array x is really an array of one-dimensional arrays:
int x[3][5];
The expression x[i] selects the ith one-dimensional array; the expression
x[i][j] selects the jth element from that array.
The built-in multi-dimensional arrays suffer the same indignities that simple
one-dimensional arrays do: Array indices in each dimension range from zero
to length –1, where length is the array length in the given dimension. There
is no array assignment operator. The number of dimensions and the size of
each dimension is fixed once the array has been allocated.
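The following sketch shows what "array of arrays" means for the declaration int x[3][5]. The names ROWS, COLS, fill and byte_offset are introduced for this example only.

```c
#include <stddef.h>

#define ROWS 3
#define COLS 5

/* Fill x so that x[i][j] holds 10*i + j. */
void fill(int x[ROWS][COLS])
{
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            x[i][j] = (int)(10 * i + j);
}

/* Distance in bytes from the start of x to the element x[i][j].
   Because each x[i] is itself an array of COLS ints and the rows
   are stored one after another, this equals
   (i * COLS + j) * sizeof(int). */
size_t byte_offset(int x[ROWS][COLS], size_t i, size_t j)
{
    return (size_t)((char *)&x[i][j] - (char *)x);
}
```

For an array declared as int x[ROWS][COLS], sizeof x[0] is the size of one row of COLS ints, and sizeof x is ROWS times that.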

Self Assessment Questions
1. Write the advantages of linear data structures.
2. Write points on the efficiency of arrays in the context of data structures.

2.5 What the application needs ?

The following terms describe a data structure from the point of view of the
application, which only cares how it behaves and not how it is implemented.
List
Generic term for a collection of objects. May or may not contain duplicates.
Application may or may not require that it be kept in a specified order.
Ordered list
A list in which the order matters to the application. Therefore, for example,
the implementer cannot scramble the order to improve efficiency.
Set
List where the order does not matter to the application (implementer can
pick order so as to optimize performance) and in which there are no
duplicates.
Multi-set
Like a set but may contain duplicates.
Double-ended queue (deque)
An ordered list in which insertion and deletion occur only at the two ends of
the list. That is, elements cannot be inserted into the middle of the list or
deleted from the middle of the list.
Stack
An ordered list in which insertion and deletion both occur only at one end
(e.g. at the start).
Queue
An ordered list in which insertion always occurs at one end and deletion
always occurs at the other end.
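The three restricted lists above differ only in which ends they use. As a sketch, the code below implements a deque on a circular array; the type name Deque, the capacity and the function names are assumptions made for this example. A stack is obtained by using push_back and pop_back only, and a queue by using push_back and pop_front only.

```c
#include <assert.h>

#define DEQUE_CAPACITY 16   /* hypothetical fixed capacity */

typedef struct {
    int items[DEQUE_CAPACITY];
    int front;   /* index of the first element  */
    int count;   /* number of stored elements   */
} Deque;

void deque_init(Deque *d) { d->front = 0; d->count = 0; }

void push_back(Deque *d, int v)
{
    assert(d->count < DEQUE_CAPACITY);   /* precondition: not full */
    d->items[(d->front + d->count) % DEQUE_CAPACITY] = v;
    d->count++;
}

void push_front(Deque *d, int v)
{
    assert(d->count < DEQUE_CAPACITY);   /* precondition: not full */
    d->front = (d->front + DEQUE_CAPACITY - 1) % DEQUE_CAPACITY;
    d->items[d->front] = v;
    d->count++;
}

int pop_front(Deque *d)
{
    assert(d->count > 0);                /* precondition: not empty */
    int v = d->items[d->front];
    d->front = (d->front + 1) % DEQUE_CAPACITY;
    d->count--;
    return v;
}

int pop_back(Deque *d)
{
    assert(d->count > 0);                /* precondition: not empty */
    d->count--;
    return d->items[(d->front + d->count) % DEQUE_CAPACITY];
}
```

None of the four operations moves existing elements, so each takes constant time.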

Ordered Lists and Sorted Lists

The simplest, yet one of the most versatile, containers is the list. In this
section we consider lists as abstract data types. A list is a series of items. In
general, we can insert and remove items from a list and we can visit all the
items in a list in the order in which they appear.
In this section we consider two kinds of lists: ordered lists and sorted lists. In
an ordered list the order of the items is significant. For example, consider a
list of the chapter titles of a book. The order of the items in
the list corresponds to the order in which they appear in the book. However,
since the chapter titles are not sorted alphabetically, we cannot consider the
list to be sorted. Since it is possible to change the order of the chapters in
the book, we must be able to do the same with the items of the list. As a result,
we may insert an item into an ordered list at any position.
On the other hand, a sorted list is one in which the order of the items is
defined by some collating sequence. For example, the index of this book is
a sorted list. The items in the index are sorted alphabetically. When an item
is inserted into a sorted list, it must be inserted at the correct position.
Ordered Lists
An ordered list is a list in which the order of the items is significant. However,
the items in an ordered list are not necessarily sorted. Consequently, it is
possible to change the order of items and still have a valid ordered list.
An ordered list is a searchable container. A searchable container is a
container that supports the following additional operations:
1) insert: used to put objects into the container;
2) withdraw: used to remove objects from the container;
3) find: used to locate objects in the container;
4) isMember: used to test whether a given object instance is in the
container.
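A minimal sketch of these four operations for an array-based ordered list of ints is given below. The type OrderedList, its capacity and the function names are assumptions made for this example. With plain int values, find and isMember amount to the same search; they differ only when the container holds objects that can be equal without being the same instance.

```c
#include <stdbool.h>
#include <stddef.h>

#define LIST_CAPACITY 32   /* hypothetical fixed capacity */

typedef struct {
    int    items[LIST_CAPACITY];
    size_t count;
} OrderedList;

/* insert: put a value at the end of the list. Returns false if full. */
bool list_insert(OrderedList *l, int v)
{
    if (l->count == LIST_CAPACITY) return false;
    l->items[l->count++] = v;
    return true;
}

/* find: linear search. Returns the position of v, or -1 if absent. */
long list_find(const OrderedList *l, int v)
{
    for (size_t i = 0; i < l->count; i++)
        if (l->items[i] == v) return (long)i;
    return -1;
}

/* isMember: test whether v is in the list. */
bool list_is_member(const OrderedList *l, int v)
{
    return list_find(l, v) >= 0;
}

/* withdraw: remove the first occurrence of v, closing the gap so that
   the order of the remaining items is preserved. */
bool list_withdraw(OrderedList *l, int v)
{
    long pos = list_find(l, v);
    if (pos < 0) return false;
    for (size_t i = (size_t)pos; i + 1 < l->count; i++)
        l->items[i] = l->items[i + 1];
    l->count--;
    return true;
}
```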

Sorted Lists
The next type of searchable container that we consider is a sorted list. A
sorted list is like an ordered list: It is a searchable container that holds a
sequence of objects. However, the position of an item in a sorted list is not
arbitrary. The items in the sequence appear in order, say, from the smallest
to the largest. Of course, for such an ordering to exist, the relation used to
sort the items must be a total order.
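For an array of ints sorted from smallest to largest, insertion at the correct position can be sketched as follows. The function name sorted_insert is chosen for this example, and the caller is assumed to provide room for one more element.

```c
#include <stddef.h>

/* Insert value into a[0..*len-1], which is sorted in ascending order,
   so that the array stays sorted. Larger items are shifted one place
   to the right until the correct position is found. */
void sorted_insert(int a[], size_t *len, int value)
{
    size_t i = *len;
    while (i > 0 && a[i - 1] > value) {
        a[i] = a[i - 1];
        i--;
    }
    a[i] = value;
    (*len)++;
}
```

The comparison a[i - 1] > value relies on the usual ordering of integers, which is a total order: any two values can be compared.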
Lists: Array-Based Implementation
Deleting an item requires moving the following items up, and inserting an
item requires pushing them down (O(n) in the worst case).
Linked Lists
A linked list makes use of pointers and is dynamic. It is made up of a series of
objects called nodes. Each node contains a pointer to the next node. To
remove a node, the pointer in its predecessor is redirected to its successor
(insertion works in the opposite way).
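The pointer manipulation involved in removal and insertion can be sketched as follows, assuming a singly-linked node that stores one int. The names Node, insert_after and remove_after are chosen for this example.

```c
#include <stdlib.h>

typedef struct Node {
    int          value;
    struct Node *next;
} Node;

/* Insert a new node holding value right after prev.
   Returns the new node, or NULL if memory could not be allocated. */
Node *insert_after(Node *prev, int value)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->value = value;
    n->next = prev->next;   /* new node points to the old successor */
    prev->next = n;         /* predecessor points to the new node   */
    return n;
}

/* Remove the node that follows prev, if there is one. */
void remove_after(Node *prev)
{
    Node *victim = prev->next;
    if (victim == NULL) return;
    prev->next = victim->next;   /* bypass the removed node */
    free(victim);
}
```

No other nodes are touched, so both operations take constant time once the predecessor is known.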
Comparison of List Implementations

Array-Based Lists: [Average and worst cases]
• Insertion and deletion are O(n)
• Direct access is O(1)
• Array must be allocated in advance
• No overhead if all array positions are full
Linked Lists:
• Insertion and deletion are O(1) once the position is known
• Direct access is O(n)
• Finding the predecessor is O(n)
• Space grows with the number of elements
• Every element requires overhead
Arrays versus Linked Lists
Elements of an array are connected by contiguity
• Reside in contiguous memory
• Static (compile time) allocation (typically)
Elements of a linked list are connected by pointers
• Reside anywhere in memory
• Dynamic (run time) allocation
2.6 Implementation methods

There are a variety of options for the person implementing a list (or set or
stack or whatever).
a) array
We all know what arrays are. Arrays are included here because a list can be
implemented using a 1-D array. If the maximum length of the list is not
known in advance, code must be provided to detect array overflow and
expand the array. Expanding requires allocating a new, longer array, copying
the contents of the old array, and deallocating the old array.
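The three expansion steps can be sketched as below. The type DynArray and the function names are assumptions made for this example, and doubling the capacity on each expansion is one common policy chosen here for illustration, not something the text prescribes.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int   *items;
    size_t count;      /* number of elements in use    */
    size_t capacity;   /* number of elements allocated */
} DynArray;

/* Append value, expanding the array first if it is full.
   Returns false if memory could not be allocated. */
bool dyn_append(DynArray *d, int value)
{
    if (d->count == d->capacity) {                      /* overflow detected */
        size_t new_capacity = d->capacity ? 2 * d->capacity : 4;
        int *bigger = malloc(new_capacity * sizeof *bigger);   /* allocate */
        if (bigger == NULL) return false;
        if (d->count > 0)
            memcpy(bigger, d->items, d->count * sizeof *bigger); /* copy   */
        free(d->items);                                        /* release */
        d->items = bigger;
        d->capacity = new_capacity;
    }
    d->items[d->count++] = value;
    return true;
}

/* Release the storage and reset the array to empty. */
void dyn_free(DynArray *d)
{
    free(d->items);
    d->items = NULL;
    d->count = d->capacity = 0;
}
```

An empty array is represented by { NULL, 0, 0 }. The standard library function realloc could replace the allocate, copy and release steps; they are written out here to mirror the description above.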

Arrays are commonly used when two conditions hold. First the maximum
length of the list can be accurately estimated in advance (so array
expansion is rarely needed). Second, insertion and deletion occur only at
the ends of the list. (Insertion and deletion in the middle of an array-based
list is slow.)
b) linked list
A list implemented by a set of nodes, each of which points to the next. An
object of class (or struct) "node" contains a field pointing to the next node,
as well as any number of fields of data. Optionally, there may be a second
"list" class (or struct) used as a header for the list. One field of the list class
is a pointer to the first node in the list. Other fields may also be included in
the "list" object, such as a pointer to the last node in the list, the length of the
list, etc.
Linked lists are commonly used when the length of the list is not known in
advance and/or when it is frequently necessary to insert and/or delete in the
middle of the list.
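A sketch of the "node" and "list" structs described above follows, assuming each node stores a single int. The field and function names are chosen for this example. Keeping a pointer to the last node in the header makes appending a constant-time operation.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct Node {
    int          data;
    struct Node *next;    /* pointer to the next node */
} Node;

typedef struct {
    Node  *first;         /* pointer to the first node in the list */
    Node  *last;          /* pointer to the last node in the list  */
    size_t length;        /* number of nodes in the list           */
} List;

static Node *make_node(int data)
{
    Node *n = malloc(sizeof *n);
    if (n != NULL) { n->data = data; n->next = NULL; }
    return n;
}

/* Add data at the front of the list. */
bool list_prepend(List *list, int data)
{
    Node *n = make_node(data);
    if (n == NULL) return false;
    n->next = list->first;
    list->first = n;
    if (list->last == NULL) list->last = n;
    list->length++;
    return true;
}

/* Add data at the end of the list. */
bool list_append(List *list, int data)
{
    Node *n = make_node(data);
    if (n == NULL) return false;
    if (list->last == NULL) list->first = n;
    else list->last->next = n;
    list->last = n;
    list->length++;
    return true;
}

/* Free every node and leave the list empty. */
void list_clear(List *list)
{
    Node *n = list->first;
    while (n != NULL) {
        Node *next = n->next;
        free(n);
        n = next;
    }
    list->first = list->last = NULL;
    list->length = 0;
}
```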
c) doubly-linked vs. singly-linked lists

In a doubly-linked list, each node points to the next node and also to the
previous node. In a singly-linked list, each node points to the next node but
not back to the previous node.
d) circular list
A linked list in which the last node points to the first node. If the list is
doubly-linked, the first node must also point back to the last node.
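Both ideas can be combined in a circular doubly-linked list, sketched below. The names DNode, ring_init, ring_insert_after and ring_remove are chosen for this example, and the caller is assumed to own the storage of the nodes.

```c
typedef struct DNode {
    int           data;
    struct DNode *prev;   /* previous node */
    struct DNode *next;   /* next node     */
} DNode;

/* Make node a circular list of one element: it is its own
   predecessor and its own successor. */
void ring_init(DNode *node)
{
    node->prev = node;
    node->next = node;
}

/* Link node into the list right after pos. */
void ring_insert_after(DNode *pos, DNode *node)
{
    node->prev = pos;
    node->next = pos->next;
    pos->next->prev = node;
    pos->next = node;
}

/* Unlink node from its list. The predecessor is reached through
   node->prev, so no search is needed. */
void ring_remove(DNode *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->prev = node;
    node->next = node;
}
```

Because the list is circular, there is no NULL at either end and the two functions need no special cases for the first or last node.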
2.7 Non Linear Data Structures

2.7.1 Trees
We now consider one of the most important non-linear information structures:
trees. A tree is often used to represent a hierarchy. This is because the
relationships between the items in the hierarchy suggest the branches of a
botanical tree.
For example, a tree-like organization chart is often used to represent the lines
of responsibility in a business, as shown in Figure. The president of the
company is shown at the top of the tree and the vice-presidents are
indicated below her. Under the vice-presidents we find the managers, and
below the managers the rest of the clerks. Each clerk reports to a manager,
each manager reports to a vice-president, and each vice-president reports
to the president.
It just takes a little imagination to see the tree in Figure. Of course, the tree
is upside-down. However, this is the usual way the data structure is drawn.
The president is called the root of the tree and the clerks are the leaves.
A tree is extremely useful for certain kinds of computations. For example,
suppose we wish to determine the total salaries paid to employees by
division or by department. The total of the salaries in division A can be found
by computing the sum of the salaries paid in departments A1 and A2 plus the
salary of the vice-president of division A. Similarly, the total of the salaries
paid in department A1 is the sum of the salaries of the manager of
department A1 and of the two clerks below her.
Clearly, in order to compute all the totals, it is necessary to consider the
salary of every employee. Therefore, an implementation of this computation
must visit all the employees in the tree. An algorithm that systematically
visits all the items in a tree is called a tree traversal.
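The salary computation can be sketched as a recursive traversal. The struct Employee and its field names are assumptions made for this example; each employee points to the first person reporting to them and to the next person with the same superior, which is one common way to store a tree whose nodes may have any number of children.

```c
#include <stddef.h>

typedef struct Employee {
    const char      *title;
    long             salary;
    struct Employee *first_report;   /* first child in the tree  */
    struct Employee *next_peer;      /* next child of the parent */
} Employee;

/* Total of the salaries in the subtree rooted at e, including e.
   Every employee in the subtree is visited exactly once. */
long total_salary(const Employee *e)
{
    if (e == NULL) return 0;
    long total = e->salary;
    for (const Employee *r = e->first_report; r != NULL; r = r->next_peer)
        total += total_salary(r);
    return total;
}
```

Calling total_salary on a manager gives the total for that department; calling it on a vice-president gives the total for the whole division.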
In the same chapter we consider several different kinds of trees as well as
several different tree traversal algorithms. In addition, we show how trees
can be used to represent arithmetic expressions and how we can evaluate
an arithmetic expression by doing a tree traversal. The following is a
mathematical definition of a tree:
Definition (Tree) A tree T is a finite, non-empty set of nodes,
T = {r} ∪ T1 ∪ T2 ∪ ... ∪ Tn, with the following properties:
1. A designated node of the set, r, is called the root of the tree; and
2. The remaining nodes are partitioned into n ≥ 0 subsets T1, T2, ..., Tn, each
of which is a tree. For convenience, we shall use the notation T = {r, T1,
T2, ..., Tn} to denote the tree T.
Notice that Definition is recursive: a tree is defined in terms of itself!
Fortunately, we do not have a problem with infinite recursion because every
tree has a finite number of nodes and because in the base case a tree has
n = 0 subtrees.
It follows from Definition that the minimal tree is a tree comprised of a single
root node. For example, Ta = {A} is such a tree. Tb = {B, {C}} is also a tree.
Finally, the following is also a tree:
Tc = {D, {E, {F}}, {G, {H, {I}}, {J, {K}, {L}}, {M}}}
How do Ta, Tb and Tc resemble their arboreal namesake? The similarity
becomes apparent when we consider the graphical representation of these
trees shown in Figure. To draw such a pictorial representation of a tree T =
{r, T1, T2, ..., Tn}, we first draw the root r. Then the subtrees T1, T2, ..., Tn
are drawn beside each other below the root. Finally, lines are drawn
from r to the roots of each of the subtrees T1, T2, ..., Tn.

Figure : Examples of trees.
Of course, trees drawn in this fashion are upside down. Nevertheless, this is
the conventional way in which tree data structures are drawn. In fact, it is
understood that when we speak of “up” and “down,” we do so with respect
to this pictorial representation. For example, when we move from a root to a
subtree, we will say that we are moving down the tree.
The inverted pictorial representation of trees is probably due to the way that
genealogical lineal charts are drawn. A lineal chart is a family tree that
shows the descendants of some person. And it is from genealogy that much
of the terminology associated with tree data structures is taken.
Figure shows one representation of the tree Tc defined in Equation. In this
case, the tree is represented as a set of nested regions in the plane. In fact,
what we have is a Venn diagram which corresponds to the view that a tree
is a set of sets.
Figure: An alternate graphical representation for trees.
2.7.2 Binary Tree

Used to implement lists whose elements have a natural order (e.g. numbers)
and either (a) the application would like the list kept in this order or (b) the
order of elements is irrelevant to the application (e.g. this list is
implementing a set).
Each element in a binary tree is stored in a "node" class (or struct). Each
node contains pointers to a left child node and a right child node. In some
implementations, it may also contain a pointer to the parent node. A tree
may also have an object of a second "tree" class (or struct) which acts as a
header for the tree. The "tree" object contains a pointer to the root of the
tree (the node with no parent) and whatever other information the
programmer wants to squirrel away in it (e.g. number of nodes currently in
the tree).
In a binary tree used in this way (a binary search tree), elements are kept
sorted in left to right order across the tree. That is, if N is a node, then the
value stored in N must be larger than every value stored in the left subtree of
N and less than every value stored in the right subtree of N.
Variant trees may have the opposite order (smaller values to the right rather
than to the left) or may allow two different nodes to contain equal values.
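The ordering rule can be sketched with an insertion and a search routine. The names TreeNode, tree_insert, tree_contains and tree_free are chosen for this example, the nodes store ints, and this variant simply ignores a value that is already present.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct TreeNode {
    int              value;
    struct TreeNode *left;    /* subtree of smaller values */
    struct TreeNode *right;   /* subtree of larger values  */
} TreeNode;

/* Insert value into the tree whose root pointer is *root.
   Returns false only if memory could not be allocated. */
bool tree_insert(TreeNode **root, int value)
{
    while (*root != NULL) {
        if (value == (*root)->value) return true;   /* already present */
        root = (value < (*root)->value) ? &(*root)->left : &(*root)->right;
    }
    TreeNode *n = malloc(sizeof *n);
    if (n == NULL) return false;
    n->value = value;
    n->left = n->right = NULL;
    *root = n;
    return true;
}

/* Search for value, going left or right at each node. */
bool tree_contains(const TreeNode *root, int value)
{
    while (root != NULL) {
        if (value == root->value) return true;
        root = (value < root->value) ? root->left : root->right;
    }
    return false;
}

/* Free every node of the tree. */
void tree_free(TreeNode *root)
{
    if (root == NULL) return;
    tree_free(root->left);
    tree_free(root->right);
    free(root);
}
```

Each comparison discards one whole subtree, so a search follows a single path from the root downwards.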
2.7.3 Hash Tables

A very common paradigm in data processing involves storing information in
a table and then later retrieving the information stored there. For example,
consider a database of driver's license records. The database contains one
record for each driver's license issued. Given a driver's license number, we
can look up the information associated with that number. Similar operations
are done by the C compiler. The compiler uses a symbol table to keep track
of the user-defined symbols in a C program. As it compiles a program,
the compiler inserts an entry in the symbol table every time a new symbol is
declared. In addition, every time a symbol is used, the compiler looks up the
attributes associated with that symbol to see that it is being used correctly.
Typically the database comprises a collection of key-and-value pairs.
Information is retrieved from the database by searching for a given key. In
the case of the driver's license database, the key is the driver's license
number, and in the case of the symbol table, the key is the name of the
symbol.
In general, an application may perform a large number of insertion and/or
look-up operations. Occasionally it is also necessary to remove items from
the database. Because a large number of operations will be done we want
to do them as quickly as possible.
Hash tables are a very practical way to maintain a dictionary. As with bucket
sort, hashing assumes that the distribution of keys is fairly well-behaved.
The idea is that looking up an item in an array takes constant time once you
have its index. A hash function is a mathematical function which maps keys
to integers.
In bucket sort, our hash function mapped the key to a bucket based on the
first letters of the key. "Collisions" were the sets of keys mapped to the same
bucket. If the keys are uniformly distributed, then each bucket contains
very few keys!
The resulting short lists were easily sorted, and could just as easily be
searched.
We examine data structures which are designed specifically with the
objective of providing efficient insertion and find operations. In order to meet
the design objective certain concessions are made. Specifically, we do not
require that there be any specific ordering of the items in the container. In
addition, while we still require the ability to remove items from the container,
it is not our primary objective to make removal as efficient as the insertion
and find operations.

Ideally we would build a data structure for which both the insertion and find
operations are O(1) in the worst case. However, this kind of performance
can only be achieved with complete a priori knowledge. We need to know
beforehand specifically which items are to be inserted into the container.
Unfortunately, we do not have this information in the general case. So, if we
cannot guarantee O(1) performance in the worst case, then we make it our
design objective to achieve O(1) performance in the average case.
The constant time performance objective immediately leads us to the
following conclusion: Our implementation must be based in some way on an
array rather than a linked list. This is because we can access the kth
element of an array in constant time, whereas the same operation in a
linked list takes O(k) time.
In the previous section, we considered two searchable containers: the ordered
list and the sorted list. In the case of an ordered list, the cost of an insertion
is O(1) and the cost of the find operation is O(n). For a sorted list the cost of
insertion is O(n) and the cost of the find operation is O(log n) for the array
implementation.
Clearly, neither the ordered list nor the sorted list meets our performance
objectives. The essential problem is that a search, either linear or binary, is
always necessary. In the ordered list, the find operation uses a linear search
to locate the item. In the sorted list, a binary search can be used to locate
the item because the data is sorted. However, in order to keep the data
sorted, insertion becomes O(n).
In order to meet the performance objective of constant time insert and find
operations, we need a way to do them without performing a search. That is,
given an item x, we need to be able to determine directly from x the array
position where it is to be stored.
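A minimal sketch of this idea for unsigned integer keys follows. The array position is computed directly from the key as key mod M. Keys that land on the same position are kept in a short linked list; this way of handling collisions (chaining) is an assumption made for the example, as are the names HashTable, Entry, table_insert and table_find and the table size of 101.

```c
#include <stdbool.h>
#include <stdlib.h>

#define TABLE_SIZE 101   /* hypothetical table size M */

typedef struct Entry {
    unsigned      key;
    int           value;
    struct Entry *next;   /* next entry stored at the same position */
} Entry;

typedef struct {
    Entry *buckets[TABLE_SIZE];   /* all NULL in an empty table */
} HashTable;

/* Determine the array position directly from the key. */
static size_t position_of(unsigned key)
{
    return key % TABLE_SIZE;
}

/* Insert a key-and-value pair, or update the value if the key exists.
   Returns false only if memory could not be allocated. */
bool table_insert(HashTable *t, unsigned key, int value)
{
    size_t pos = position_of(key);
    for (Entry *e = t->buckets[pos]; e != NULL; e = e->next)
        if (e->key == key) { e->value = value; return true; }
    Entry *e = malloc(sizeof *e);
    if (e == NULL) return false;
    e->key = key;
    e->value = value;
    e->next = t->buckets[pos];
    t->buckets[pos] = e;
    return true;
}

/* Find the value stored for key. Returns NULL if the key is absent. */
int *table_find(HashTable *t, unsigned key)
{
    for (Entry *e = t->buckets[position_of(key)]; e != NULL; e = e->next)
        if (e->key == key) return &e->value;
    return NULL;
}
```

Only the entries stored at one position are examined, so both operations take constant time on average as long as the keys spread evenly over the table.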

Hash Functions
It is the job of the hash function to map keys to integers. A good hash
function:
1. Is cheap to evaluate
2. Tends to use all positions from 0 ... M-1 with uniform frequency.
3. Tends to put similar keys in different parts of the table (Remember the
Shifletts!!)
The first step is usually to map the key to a big integer, for example
$$h = \sum_{i=0}^{\text{keylength}} 128^{i} \times \mathrm{char}(\text{key}[i])$$
This last number must be reduced to an integer between 0 and M-1, where
M is the size of our hash table. One way is by h(k) = k mod M, where M is
best a large prime not too close to 2^i - 1, which would just mask off the high
bits. This works on the same principle as a roulette wheel!
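The two steps can be combined in one function, sketched below for C strings. Since the big integer would overflow for all but very short keys, the sketch reduces modulo M after every term, which gives the same final result as reducing once at the end. The function name hash_string is chosen for this example, and M is assumed to be smaller than 2^24 so that the intermediate products fit in an unsigned long.

```c
#include <stddef.h>

/* Map a string key to a table position in the range 0 .. m-1.
   Computes the sum of 128^i * key[i] modulo m. The terminating
   '\0' of a C string contributes zero, so the loop stops there.
   Assumption: 0 < m < 2^24. */
unsigned long hash_string(const char *key, unsigned long m)
{
    unsigned long h = 0;       /* running sum, kept reduced modulo m */
    unsigned long power = 1;   /* 128^i modulo m                     */
    for (size_t i = 0; key[i] != '\0'; i++) {
        h = (h + power * (unsigned char)key[i]) % m;
        power = (power * 128) % m;
    }
    return h;
}
```

For example, with m = 101 the key "AB" gives 65 + 128 × 66 = 8513, and 8513 mod 101 = 29.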

Self Assessment Questions
1. Define trees. Discuss their usage in different applications.
2. Write notes on:
a) Binary Tree b) Hash Tables
2.8 Summary
This unit gives an overview of the concepts of data structures and their
applications. Data structures represent places to store data for use by a
computer program. As you would imagine, this describes a spectrum of data
storage techniques, from the very simple to the very complex. At the lowest
level, there are data structures supplied and supported by the CPU (or
computer chip) itself. These vary from chip to chip, but are almost always of
the very primitive sort. They typically include the simple data types, such as
integers, characters, floating point numbers, and bit strings. In this context,
we discussed the various structured data types, abstract data types, and
linear and non-linear data structures.
2.9 Terminal Questions

1. Define data structure. Explain the types of structured data type.
2. Explain abstract data types and their characteristics.
3. Discuss linear data structures with a suitable example.
4. Discuss the various types of data structure applications.
5. Write notes on:
a) Elementary Data Structures
b) Ordered list
c) Linked list
d) Queue
e) Stack
f) Binary tree
g) Hash tables
