02-Unit2
Structure:
2.1 Introduction
2.1.1 What is a Data Structure?
2.1.2 Definition of data structure
2.1.3 The Abstract Level
2.1.4 The Application Level
2.1.5 Implementation Level
Self Assessment Questions
2.2 Data Types and Structured Data Type
2.2.1 Common Structures
2.2.2 Abstract Data Types
2.2.2.1 Properties of Abstract Data Types
2.2.2.2 Generic Abstract Data Types
2.2.2.3 Programming with Abstract Data Types
Self Assessment Questions
2.3 Pre and Post Conditions
2.3.1 Preconditions
2.3.2 Postconditions
2.3.3 Checking Pre & Post Conditions
2.3.4 Implementation Checks Preconditions
Self Assessment Questions
2.4 Linear Data Structure
2.4.1 The Array Data Structure
2.4.2 Using an Array and Lists as a Data Structure
2.4.3 Elementary Data Structures
Self Assessment Questions
2.1 Introduction
Data structures represent places to store data for use by a computer
program. As you would imagine, this describes a spectrum of data storage
techniques, from the very simple to the very complex. We can look at this
progression, from the simple to the complex, in the following way.
At the lowest level, there are data structures supplied and supported by the
CPU (or computer chip) itself. These vary from chip to chip, but are almost
always of the very primitive sort. They typically include the simple data types,
such as integers, characters, floating point numbers, and bit strings. To
some extent, the data types supported by a chip reflect the hardware design
of the chip: how wide (how many bits) the registers are, how wide the data
bus is, whether the ALU has an accumulator, and whether the ALU supports
floating point operations.
At the second level of the data structures spectrum are the data structures
supported by particular programming languages. These vary a lot from
language to language. Most languages offer arrays, and many offer arrays
of arrays (matrices). Most of the popular languages provide support for
some sort of record structure. In C these are structs and in Pascal these are
records. A few offer strings as a first class data type (e.g. C++ and Java). A
few languages support linked lists directly in the language (e.g. Lisp and
Scheme). Object oriented languages often offer general lists, stacks, and
even trees.
At the top level of this taxonomy are those data structures that are created
by the programmer, using a particular programming language. In this regard,
it is important to note what tools are provided by a language to facilitate the
implementation of complex data structures envisioned by a programmer.
Arrays, arrays of arrays, pointers, and record structures are all
helpful in this regard. Using the available tools, a programmer can build
general lists, stacks, queues, deques, trees (of many types), graphs, sets,
and much, much more.
In this book we will focus on those data structures in the top level, those that
are usually created by the application programmer. These are the data
structures that, generally, impact the problem solution and implementation in
the most dramatic ways: size, efficiency, readability, and maintainability.
Objectives
At the end of this unit, you will be able to understand:
The meaning of, and a brief introduction to, data structures
The various levels of abstraction
Abstract data types and their properties
The operations and implementation of pre and post conditions
The concepts and methods of linear and non-linear data structures
The proper choice of a data structure can lead to more efficient programs.
Some example data structures are: array, stack, queue, linked list, binary
tree, hash table, heap, and graph. Data structures are often used to build
databases. Typically, data structures are manipulated using various
algorithms.
When you work with and collect information on any subject, it doesn't take very
long before you have more data than you know how to handle. Enter the
data structure. In his book Algorithms, Data Structures, and Problem Solving
with C++, Mark Allen Weiss writes, "A data structure is a representation of
data and the operations allowed on that data." Webopedia states, "the
term data structure refers to a scheme for organizing related pieces of
information."
The values of type fraction are defined to be the values that are produced by
this function for any valid combination of inputs. The parameter names were
chosen to suggest its intended behavior: CREATE_FRACTION(N, D) should
return a value representing the fraction N/D (N for numerator, D for
denominator).
In this style of definition, the domain of a data type (the set of permissible
values) plays an almost negligible role. Any set of values will do, as long as
we have an appropriate set of operations to go along with it.
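A minimal sketch of this style of definition in C may help. It assumes a struct representation and a nonzero denominator as the precondition; the lowercase name create_fraction mirrors CREATE_FRACTION, while fraction_equal is a hypothetical helper added only to show that callers can work through operations alone.

```c
#include <assert.h>

/* One possible representation; callers are meant to rely only on the
   operations below, not on these fields. */
typedef struct {
    int numerator;
    int denominator;
} fraction;

/* CREATE_FRACTION(N, D): returns a value representing N/D.
   Precondition (an assumption of this sketch): d != 0. */
fraction create_fraction(int n, int d)
{
    assert(d != 0);
    fraction f = { n, d };
    return f;
}

/* Hypothetical helper: a/b and c/d denote the same value
   exactly when a*d == c*b. */
int fraction_equal(fraction a, fraction b)
{
    return (long long)a.numerator * b.denominator
        == (long long)b.numerator * a.denominator;
}
```

With these two operations, create_fraction(1, 2) and create_fraction(2, 4) compare as equal even though their stored fields differ, which is the sense in which the operations, not the stored values, define the type.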
Structural Relationships
Not all structured data types have this sort of internal structural
relationship. Fractions are structured, but there is no internal relationship
between the sign, numerator, and denominator. But many structured
data types do have an internal structural relationship, and these can be
classified according to the properties of this relationship.
Linear Structure:
The most common organization for components is a linear structure. A
structure is linear if it has these two properties:
Property P1: Each element is 'followed by' at most one other element.
Property P2: No two elements are 'followed by' the same element.
An array is an example of a linearly structured data type. We generally
write a linearly structured data type like this:
A->B->C->D (this is one value with 4 parts).
- Counterexample 1 (violates P1): A points to both B and C: B<-A->C
- Counterexample 2 (violates P2): A and B both point to C: A->C<-B
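The two properties can be checked mechanically. The sketch below assumes the elements are numbered 0 to n-1 and that each 'followed by' link is given as a pair (from[i], to[i]); the function name is_linear and the bound MAX_ELEMENTS are illustrative choices, not part of the text. It tests only P1 and P2, exactly as stated above.

```c
#include <assert.h>

#define MAX_ELEMENTS 16   /* illustrative bound, not from the text */

/* Each pair (from[i], to[i]) records that element from[i] is
   'followed by' element to[i]. Elements are numbered 0 .. n-1.
   Returns 1 if properties P1 and P2 both hold, otherwise 0. */
int is_linear(int n, int n_links, const int from[], const int to[])
{
    int followers[MAX_ELEMENTS] = {0};     /* links leaving each element  */
    int predecessors[MAX_ELEMENTS] = {0};  /* links entering each element */

    assert(n >= 0 && n <= MAX_ELEMENTS);
    for (int i = 0; i < n_links; i++) {
        assert(from[i] >= 0 && from[i] < n);
        assert(to[i] >= 0 && to[i] < n);
        if (++followers[from[i]] > 1)
            return 0;                      /* violates P1 */
        if (++predecessors[to[i]] > 1)
            return 0;                      /* violates P2 */
    }
    return 1;
}
```

Numbering A, B, C, D as 0, 1, 2, 3, the chain A->B->C->D passes, while both counterexamples are rejected.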
This implies that the model focuses only on problem-related aspects and that
you try to define properties of the problem. These properties include:
the data which are affected, and
the operations which are identified by the problem.
The name of a variable is the textual label used to refer to that variable
in the text of the source program. The address of a variable denotes is
Indeed, it has been only since the advent of the so-called object-oriented
programming languages that we see programming languages which
provide the necessary constructs to properly declare abstract data types.
For example, in Java, the class construct is the means by which both a set
of values and an associated set of operations are declared. Compare this with
the struct construct of C or Pascal's record, which only allow the
specification of a set of values.
With the first property it is possible to create more than one instance of an
ADT as exemplified with the employee example.
Returning to the example of the fraction data type, how might we actually
implement this data type in C?
Implementation 1:
typedef struct { int numerator, denominator; } fraction;

int main(void)
{
    fraction f;          /* the two parts are named fields of a struct */
    f.numerator = 1;
    f.denominator = 2;
    /* …………… */
    return 0;
}
Implementation 2:
#define numerator 0
#define denominator 1
typedef int fraction[2];

int main(void)
{
    fraction f;          /* the two parts are elements of a 2-element array */
    f[numerator] = 1;
    f[denominator] = 2;
    /* …………… */
    return 0;
}
Sikkim Manipal University: Data Structures using ‘C’, Unit 2
List<Apple> listOfApples;
The angle brackets now enclose the data type for which a variant of the
generic ADT List should be created. listOfApples offers the same interface
as any other list, but operates on elements of type Apple.
Notation:
As ADTs provide an abstract view to describe properties of sets of entities,
their use is independent from a particular programming language. We
therefore introduce a notation here. Each ADT description consists of two
parts:
- Data: This part describes the structure of the data used in the ADT in an
informal way.
- Operations: This part describes valid operations for this ADT, hence, it
describes its interface. We use the special operation constructor to
describe the actions which are to be performed once an entity of this
ADT is created and destructor to describe the actions which are to be
performed once an entity is destroyed. For each operation the provided
arguments as well as preconditions and postconditions are given.
Data
A sequence of digits optionally prefixed by a plus or minus sign. We refer to
this signed whole number as N.
Operations
constructor
Creates a new integer.
add(k)
Creates a new integer which is the sum of N and k. Consequently, the
postcondition for this operation is sum = N+k.
sub(k)
Similar to add, this operation creates a new integer which is the difference
of the two integer values. Therefore the postcondition for this operation is
sum = N-k.
set(k)
Set N to k. The postcondition for this operation is N = k.
……
end
The description above is a specification for the ADT Integer. Please notice
that we use words for names of operations such as "add". We could use the
more intuitive "+" sign instead, but this may lead to some confusion: You
must distinguish the operation "+" from the mathematical use of "+" in the
postcondition. The name of the operation is just syntax whereas the
semantics is described by the associated pre- and postconditions. However,
it is always a good idea to combine both to make reading of ADT
specifications easier.
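To make the specification concrete, here is one way it might be rendered in C. This is only a sketch: the type name Integer follows the ADT, but the function names, the use of long for N, the initial value of zero, and the decision to ignore arithmetic overflow are all assumptions of the example. Each postcondition is checked with assert.

```c
#include <assert.h>

/* The data part: the signed whole number N. */
typedef struct {
    long n;
} Integer;

/* constructor: creates a new integer (initial value 0 is an assumption). */
Integer integer_new(void)
{
    Integer i = { 0 };
    return i;
}

/* add(k): returns a new integer sum with postcondition sum = N + k. */
Integer integer_add(Integer self, long k)
{
    Integer sum = { self.n + k };
    assert(sum.n - k == self.n);     /* check the postcondition */
    return sum;
}

/* Difference operation: returns a new integer equal to N - k. */
Integer integer_sub(Integer self, long k)
{
    Integer diff = { self.n - k };
    assert(diff.n + k == self.n);    /* check the postcondition */
    return diff;
}

/* set(k): changes N in place; postcondition N = k. */
void integer_set(Integer *self, long k)
{
    self->n = k;
    assert(self->n == k);            /* check the postcondition */
}
```

Note that integer_add and integer_sub leave their argument untouched and return a new value, as the specification says they create a new integer, whereas integer_set modifies N itself.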
[Figure: Application, Specification (defines the ADT), Implementation]
When we use abstract data types, our programs divide into two pieces:
The application: The part that uses the abstract data type.
The implementation: The part that implements the abstract data type.
Specification
Let us now look in detail at how we specify an abstract data type. We will use
'stack' as an example. The data structure stack is based on the everyday
notion of a stack, such as a stack of books, or a stack of plates. The defining
property of a stack is that you can only access the top element of the stack;
all the other elements are underneath the top one and can't be accessed
except by removing all the elements above them one at a time.
First, let us see how we can define, or specify, the abstract concept of a
stack. The main thing to notice here is how we specify everything needed in
order to use stacks without any mention of how stacks will be implemented.
2.3.2 Postconditions
Postconditions specify the effects of an operation. These are the only things you may
assume have been done by the operation. They are only guaranteed to hold
if the preconditions are satisfied.
Operations
The operations specified before are core operations: any other operation on
stacks can be defined in terms of these ones. These are the operations that
we must implement in order to implement 'stacks'; everything else in our
program can be independent of the implementation details.
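As an illustration, a small set of core stack operations could look like the following in C. The operation names, the array-based representation, and the fixed capacity are assumptions made for this sketch and are not taken from the specification itself; the comments state each operation's preconditions and postconditions.

```c
#include <assert.h>

#define STACK_CAPACITY 100          /* illustrative limit */

typedef struct {
    int items[STACK_CAPACITY];
    int count;                      /* number of elements currently stored */
} stack;

/* Postcondition: the stack is empty. */
void stack_create(stack *s)
{
    s->count = 0;
}

/* Returns 1 if the stack holds no elements, otherwise 0. */
int stack_is_empty(const stack *s)
{
    return s->count == 0;
}

/* Precondition: the stack is not full.
   Postcondition: x is the top element. */
void stack_push(stack *s, int x)
{
    assert(s->count < STACK_CAPACITY);
    s->items[s->count++] = x;
}

/* Precondition: the stack is not empty.
   Postcondition: the former top element has been removed and returned. */
int stack_pop(stack *s)
{
    assert(!stack_is_empty(s));
    return s->items[--s->count];
}

/* Precondition: the stack is not empty.
   Postcondition: the stack is unchanged; the top element is returned. */
int stack_top(const stack *s)
{
    assert(!stack_is_empty(s));
    return s->items[s->count - 1];
}
```

Code that uses only these five functions never touches items or count directly, so the array could later be swapped for a linked list without changing the rest of the program.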
A specification must say what an operation's inputs and outputs are, and
definitely must mention when an input is changed. This falls short of
completely committing the implementation to procedures or functions (or
whatever other means of creating 'blocks' of code might be available in the
programming language). Of course, these details eventually need to be
decided in order for code to actually be written. But these details do not
need to be decided until code-generation time; throughout the earlier stages
of program design, the exact interface (at code level) can be left unspecified.
POP or within the procedure that implements POP? Either way is possible.
Here are the pros and cons of the 2 possibilities:
That's what our model solutions will do too. We will thereby sacrifice some
efficiency for a high degree of maintainability and robustness.
This code will get included only if we supply the -DSAFE argument to the
compiler (or otherwise define SAFE). Thus, in an application where the user
checks carefully for all preconditions, we have the option of omitting all
checks by the implementation.
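A sketch of how such optional checks might be written follows. The stack type, its capacity, and the use of assert inside the guarded region are assumptions of this example; only the SAFE macro comes from the text.

```c
#include <assert.h>

#define CAPACITY 16                 /* illustrative limit */

typedef struct {
    int items[CAPACITY];
    int count;                      /* number of elements currently stored */
} int_stack;

void push(int_stack *s, int x)
{
#ifdef SAFE
    assert(s->count < CAPACITY);    /* precondition: stack is not full */
#endif
    s->items[s->count++] = x;
}

int pop(int_stack *s)
{
#ifdef SAFE
    assert(s->count > 0);           /* precondition: stack is not empty */
#endif
    return s->items[--s->count];
}
```

Compiling with a command such as cc -DSAFE file.c keeps the checks; compiling without -DSAFE removes them entirely, so a caller that already verifies the preconditions pays nothing for them.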
Given an index (i.e. subscript), values can be quickly fetched and/or stored in
an array. Adding a value to the end of an array is fast (particularly if a variable
is used to indicate the end of the array); however, inserting a value into an
array can be time consuming because existing elements must be shifted.
When elements of an array are sorted, then binary searching can be used to
find particular values in the array. If the array elements are not sorted, then
a linear search must be used. After an array has been defined, its length (i.e.
number of elements) cannot be changed.
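The two costs described here can be seen in a short sketch. The function names array_insert and binary_search, and the convention of passing the current length and the capacity explicitly, are assumptions of this example.

```c
#include <assert.h>

/* Inserts value at position pos in a[0 .. *len-1], moving every later
   element one place to the right.
   Precondition: 0 <= pos <= *len and *len < capacity. */
void array_insert(int a[], int *len, int capacity, int pos, int value)
{
    assert(pos >= 0 && pos <= *len && *len < capacity);
    for (int i = *len; i > pos; i--)
        a[i] = a[i - 1];            /* this loop is what makes insertion slow */
    a[pos] = value;
    (*len)++;
}

/* Binary search in a[0 .. len-1], which must be sorted in ascending order.
   Returns the index of key, or -1 if key is not present. */
int binary_search(const int a[], int len, int key)
{
    int lo = 0, hi = len - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == key)
            return mid;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}
```

Inserting at the end (pos equal to *len) moves nothing, while inserting at the front moves every element; the search halves the remaining range at each step, which is only valid because the array is sorted.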
The fact that we can describe the behavior of our data structures in terms of
abstract operations explains why we can use them without thinking, while
the fact that we have different implementations of the same abstract
operations enables us to optimize performance.
Because they are the base upon which almost all of the ADTs are built, we
call the array and the linked list the foundational data structures. It is
important to understand that we do not view the array or the linked list as
ADTs, but rather as alternatives for the implementation of ADTs.
Arrays
Probably the most common way to aggregate data is to use an array. In C
an array is a variable that contains a collection of objects, all of the same
type.
For example, int a[5]; allocates an array of five integers and assigns it to the
variable a.
where sizeof(x) is the number of bytes used for the memory
representation of an instance of an object of type x.
In C the sizes of the primitive data types are fixed constants for a given
compiler and platform. Hence sizeof(int) = O(1).
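The relationship between an array's storage and the size of its element type can be checked directly. In this sketch the macro ARRAY_LENGTH and the function int_array_bytes are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_LENGTH(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Bytes occupied by an array of n ints: n * sizeof(int).
   Precondition: the product fits in a size_t. */
size_t int_array_bytes(size_t n)
{
    assert(n <= (size_t)-1 / sizeof(int));
    return n * sizeof(int);
}
```

For the declaration int a[5]; the expression sizeof(a) equals int_array_bytes(5), and ARRAY_LENGTH(a) recovers the element count 5.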
Multi-Dimensional Arrays
A multi-dimensional array of dimension n (i.e. an n-dimensional array or
simply n-D array) is a collection of items which is accessed via n subscript
expressions. For example, in a language that supports it, the (i, j)th element
of the two-dimensional array x is accessed by writing x[i,j].
In C, a two-dimensional array is declared as an array of arrays:
int x[3][5];
The expression x[i] selects the ith one-dimensional array; the expression
x[i][j] selects the jth element from that array.
The built-in multi-dimensional arrays suffer the same indignities that simple
one-dimensional arrays do: Array indices in each dimension range from zero
to length - 1, where length is the array length in the given dimension. There
is no array assignment operator. The number of dimensions and the size of
each dimension are fixed once the array has been allocated.
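A short sketch shows both forms of selection on a 3 by 5 array. The names ROWS, COLS, fill_row_major, and row_sum are assumptions of this example.

```c
#include <assert.h>

#define ROWS 3
#define COLS 5

/* Stores i * COLS + j in x[i][j], so each value equals the element's
   position when the rows are laid out one after another in memory. */
void fill_row_major(int x[ROWS][COLS])
{
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            x[i][j] = i * COLS + j;
}

/* x[i] is itself a one-dimensional array of COLS ints, so a single
   row can be passed to a function that expects a plain int array. */
int row_sum(const int row[COLS])
{
    int total = 0;
    assert(row != 0);
    for (int j = 0; j < COLS; j++)
        total += row[j];
    return total;
}
```

After fill_row_major(x), the element x[2][4] holds 14, and row_sum(x[1]) adds 5, 6, 7, 8, and 9 to give 35.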
List
Generic term for a collection of objects. May or may not contain duplicates.
Application may or may not require that it be kept in a specified order.
Ordered list
A list in which the order matters to the application. Therefore, for example,
the implementer cannot scramble the order to improve efficiency.
Set
List where the order does not matter to the application (implementer can
pick order so as to optimize performance) and in which there are no
duplicates.
Multi-set
Like a set but may contain duplicates.
Stack
An ordered list in which insertion and deletion both occur only at one end
(e.g. at the start).
Queue
An ordered list in which insertion always occurs at one end and deletion
always occurs at the other end.
In this section we consider two kinds of lists: ordered lists and sorted lists. In
an ordered list the order of the items is significant. Consider, for example, a
list of the chapter titles of this book. The order of the items in
the list corresponds to the order in which they appear in the book. However,
since the chapter titles are not sorted alphabetically, we cannot consider the
list to be sorted. Since it is possible to change the order of the chapters in
the book, we must be able to do the same with the items of the list. As a result,
we may insert an item into an ordered list at any position.
On the other hand, a sorted list is one in which the order of the items is
defined by some collating sequence. For example, the index of this book is
a sorted list. The items in the index are sorted alphabetically. When an item
is inserted into a sorted list, it must be inserted at the correct position.
Ordered Lists
An ordered list is a list in which the order of the items is significant. However,
the items in an ordered list are not necessarily sorted. Consequently, it is
possible to change the order of items and still have a valid ordered list.
Sorted Lists
The next type of searchable container that we consider is a sorted list. A
sorted list is like an ordered list: It is a searchable container that holds a
sequence of objects. However, the position of an item in a sorted list is not
arbitrary. The items in the sequence appear in order, say, from the smallest
to the largest. Of course, for such an ordering to exist, the relation used to
sort the items must be a total order.
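The difference from an ordered list shows up at insertion time: the caller does not choose the position, the ordering relation does. A minimal array-based sketch, assuming int items, ascending order, and the hypothetical name sorted_insert:

```c
#include <assert.h>

/* Inserts value into a[0 .. *len-1], which is sorted in ascending order,
   at the position that keeps the array sorted.
   Precondition: *len < capacity.
   Returns the index at which value was stored. */
int sorted_insert(int a[], int *len, int capacity, int value)
{
    assert(*len < capacity);
    int pos = *len;
    while (pos > 0 && a[pos - 1] > value) {
        a[pos] = a[pos - 1];        /* shift larger items to the right */
        pos--;
    }
    a[pos] = value;
    (*len)++;
    return pos;
}
```

Inserting 5, then 1, then 3 into an empty list leaves the contents as 1, 3, 5 regardless of the order of the calls.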
Linked Lists
A linked list makes use of pointers and is dynamic. It is made up of a series
of objects called nodes. Each node contains a pointer to the next node.
[Figure: the remove process (insertion works in the opposite way).]
Linked Lists:
Insertion and deletion are O(1)
Direct access is O(n)
Finding the predecessor is O(n)
Space grows with the number of elements
Every element requires overhead.
Arrays:
Elements of an array are connected by contiguity
Reside in contiguous memory
Static (compile time) allocation (typically)
a) array
We all know what arrays are. Arrays are included here because a list can be
implemented using a 1-D array. If the maximum length of the list is not
known in advance, code must be provided to detect array overflow and
expand the array. Expanding requires allocating a new, longer array, copying
the contents of the old array, and deallocating the old array.
Arrays are commonly used when two conditions hold. First the maximum
length of the list can be accurately estimated in advance (so array
expansion is rarely needed). Second, insertion and deletion occur only at
the ends of the list. (Insertion and deletion in the middle of an array-based
list is slow.)
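The expansion step for an array-based list (allocate a longer array, copy, deallocate the old one) might be sketched as follows. The type array_list, its function names, and the choice to double the capacity on overflow are assumptions of this example, not something the text prescribes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int *items;
    size_t length;                  /* elements in use       */
    size_t capacity;                /* elements allocated    */
} array_list;

/* Returns 1 on success, 0 if memory could not be allocated. */
int array_list_init(array_list *l, size_t capacity)
{
    assert(capacity > 0);
    l->items = malloc(capacity * sizeof *l->items);
    l->length = 0;
    l->capacity = (l->items != NULL) ? capacity : 0;
    return l->items != NULL;
}

/* Appends value, expanding the array when it is full.
   Precondition: l was initialised successfully.
   Returns 1 on success, 0 if memory could not be allocated. */
int array_list_append(array_list *l, int value)
{
    assert(l->items != NULL);
    if (l->length == l->capacity) {                 /* overflow detected */
        size_t new_capacity = l->capacity * 2;      /* assumed growth policy */
        int *longer = malloc(new_capacity * sizeof *longer);
        if (longer == NULL)
            return 0;
        memcpy(longer, l->items, l->length * sizeof *longer);  /* copy old contents */
        free(l->items);                             /* deallocate old array */
        l->items = longer;
        l->capacity = new_capacity;
    }
    l->items[l->length++] = value;
    return 1;
}
```

Starting with a capacity of 2 and appending five values triggers two expansions, to 4 and then to 8 elements. The caller releases the storage with free(l.items) when the list is no longer needed.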
b) linked list
A list implemented by a set of nodes, each of which points to the next. An
object of class (or struct) "node" contains a field pointing to the next node,
as well as any number of fields of data. Optionally, there may be a second
"list" class (or struct) used as a header for the list. One field of the list class
is a pointer to the first node in the list. Other fields may also be included in
the "list" object, such as a pointer to the last node in the list, the length of the
list, etc.
Linked lists are commonly used when the length of the list is not known in
advance and/or when it is frequently necessary to insert and/or delete in the
middle of the list.
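The node and header structures described above could be declared as in the following sketch. The field and function names are assumptions of this example, and the list stores plain int values for simplicity. Both operations touch only a constant number of pointers, which is why insertion and deletion in the middle are cheap once the position is known.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int data;
    struct node *next;      /* pointer to the next node, NULL at the end */
} node;

typedef struct {
    node *first;            /* header: pointer to the first node */
    size_t length;          /* header: number of nodes in the list */
} list;

/* Inserts value after pos, or at the front when pos is NULL.
   Returns the new node, or NULL if memory could not be allocated. */
node *list_insert_after(list *l, node *pos, int value)
{
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->data = value;
    if (pos == NULL) {
        n->next = l->first;
        l->first = n;
    } else {
        n->next = pos->next;
        pos->next = n;
    }
    l->length++;
    return n;
}

/* Removes the node after pos, or the first node when pos is NULL.
   Precondition: such a node exists. */
void list_remove_after(list *l, node *pos)
{
    node **link = (pos == NULL) ? &l->first : &pos->next;
    node *victim = *link;
    assert(victim != NULL);
    *link = victim->next;   /* bypass the removed node */
    free(victim);
    l->length--;
}
```

A list is started with list l = { NULL, 0 }; after which list_insert_after(&l, NULL, 1) creates the first node.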
d) circular list
A linked list in which the last node points to the first node. If the list is
doubly-linked, the first node must also point back to the last node.
For example, tree-like organization charts are often used to represent the lines
of responsibility in a business, as shown in the figure. The president of the
company is shown at the top of the tree and the vice-presidents are
indicated below her. Under the vice-presidents we find the managers and
below the managers the clerks. Each clerk reports to a manager.
Each manager reports to a vice-president, and each vice-president reports
to the president.
It just takes a little imagination to see the tree in the figure. Of course, the tree
is upside-down. However, this is the usual way the data structure is drawn.
The president is called the root of the tree and the clerks are the leaves.
must visit all the employees in the tree. An algorithm that systematically
visits all the items in a tree is called a tree traversal.
It follows from Definition that the minimal tree is a tree comprised of a single
root node. For example, Ta = {A}.
Tb = {B, {C}} is also a tree, comprised of the root node B and the subtree {C}.
Finally, the following is also a tree:
Tc = {D, {E, {F}}, {G, {H}, {I}}, {J, {K}, {L}}, {M}}
Of course, trees drawn in this fashion are upside down. Nevertheless, this is
the conventional way in which tree data structures are drawn. In fact, it is
understood that when we speak of “up” and “down,” we do so with respect
to this pictorial representation. For example, when we move from a root to a
subtree, we will say that we are moving down the tree.
The inverted pictorial representation of trees is probably due to the way that
genealogical lineal charts are drawn. A lineal chart is a family tree that
shows the descendants of some person. And it is from genealogy that much
of the terminology associated with tree data structures is taken.
Each element in a binary tree is stored in a "node" class (or struct). Each
node contains pointers to a left child node and a right child node. In some
implementations, it may also contain a pointer to the parent node. A tree
may also have an object of a second "tree" class (or struct) which acts as a
header for the tree. The "tree" object contains a pointer to the root of the
tree (the node with no parent) and whatever other information the
programmer wants to squirrel away in it (e.g. number of nodes currently in
the tree).
In a binary search tree, elements are kept sorted in left to right order across the tree.
That is if N is a node, then the value stored in N must be larger than the
value stored in left-child(N) and less than the value stored in right-child(N).
Variant trees may have the opposite order (smaller values to the right rather
than to the left) or may allow two different nodes to contain equal values.
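The ordering rule can be sketched in C as follows. The names tnode, bst_insert, bst_in_order, and bst_free are assumptions of this example, and this variant simply ignores a value that is already present. An in-order traversal (left subtree, node, right subtree) then visits the values in ascending order.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct tnode {
    int value;
    struct tnode *left, *right;     /* child pointers, NULL when absent */
} tnode;

/* Inserts value so that smaller values lie to the left and larger values
   to the right. Duplicates are ignored. Returns the root of the tree;
   if memory cannot be allocated the tree is left unchanged. */
tnode *bst_insert(tnode *root, int value)
{
    if (root == NULL) {
        tnode *n = malloc(sizeof *n);
        if (n != NULL) {
            n->value = value;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (value < root->value)
        root->left = bst_insert(root->left, value);
    else if (value > root->value)
        root->right = bst_insert(root->right, value);
    return root;
}

/* In-order traversal: writes the values in ascending order into out[],
   starting at index count. Returns the index after the last value written.
   Precondition: out has room for every node in the tree. */
int bst_in_order(const tnode *root, int out[], int count)
{
    assert(out != NULL);
    if (root == NULL)
        return count;
    count = bst_in_order(root->left, out, count);
    out[count++] = root->value;
    return bst_in_order(root->right, out, count);
}

/* Releases every node of the tree. */
void bst_free(tnode *root)
{
    if (root == NULL)
        return;
    bst_free(root->left);
    bst_free(root->right);
    free(root);
}
```

Inserting 50, 30, 70, 20, 40 in that order puts 50 at the root with 30 and 70 as its children, and the traversal yields 20, 30, 40, 50, 70.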
the case of the driver's license database, the key is the driver's license
number and in the case of the symbol table, the key is the name of the
symbol.
Hash tables are a very practical way to maintain a dictionary. As with bucket
sort, they assume we know that the distribution of keys is fairly well-behaved.
The idea is that looking an item up in an array takes constant time
once you have its index. A hash function is a mathematical function which
maps keys to integers.
In bucket sort, our hash function mapped the key to a bucket based on the
first letters of the key. "Collisions" were the set of keys mapped to the same
bucket. If the keys were uniformly distributed, then each bucket contains
very few keys!
The resulting short lists were easily sorted, and could just as easily be
searched.
Ideally we would build a data structure for which both the insertion and find
operations are O(1) in the worst case. However, this kind of performance
can only be achieved with complete a priori knowledge. We need to know
beforehand specifically which items are to be inserted into the container.
Unfortunately, we do not have this information in the general case. So, if we
cannot guarantee O(1) performance in the worst case, then we make it our
design objective to achieve O(1) performance in the average case.
Clearly, neither the ordered list nor the sorted list meets our performance
objectives. The essential problem is that a search, either linear or binary, is
always necessary. In the ordered list, the find operation uses a linear search
to locate the item. In the sorted list, a binary search can be used to locate
the item because the data is sorted. However, in order to keep the data
sorted, insertion becomes O(n).
In order to meet the performance objective of constant time insert and find
operations, we need a way to do them without performing a search. That is,
given an item x, we need to be able to determine directly from x the array
position where it is to be stored.
Hash Functions
It is the job of the hash function to map keys to integers. A good hash
function:
1. Is cheap to evaluate
2. Tends to use all positions from 0...M with uniform frequency.
3. Tends to put similar keys in different parts of the tables (Remember the
Shifletts!!)
The first step is usually to map the key to a big integer, for example
$$h = \sum_{i=0}^{\text{keylength}} 128^{i} \times \text{char}(key[i])$$
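A sketch of such a hash function in C is given below. It assumes the intended mapping is the sum of 128^i times the character code of key[i], and it reduces the result modulo a table size m at every step so that the "big integer" never overflows; the name hash_key, the parameter m, and the upper bound on m are assumptions of this example.

```c
#include <assert.h>
#include <string.h>

/* Computes (sum over i of 128^i * key[i]) mod m for a NUL-terminated key.
   The sum is evaluated with Horner's rule, starting from the last
   character (the highest power of 128), and reduced mod m at each step.
   Precondition: 0 < m <= 1000000, which keeps h * 128 + 255 well within
   the range of unsigned long. */
unsigned long hash_key(const char *key, unsigned long m)
{
    assert(m > 0 && m <= 1000000UL);
    unsigned long h = 0;
    size_t len = strlen(key);
    for (size_t i = len; i > 0; i--)
        h = (h * 128 + (unsigned char)key[i - 1]) % m;
    return h;
}
```

For the key "ab" and m = 101, the unreduced sum is 97 + 128 × 98 = 12641, and hash_key returns 12641 mod 101 = 16. Choosing m as the number of table positions makes the result directly usable as an array index.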
2.8 Summary
This unit gives an overview of data structures, their concepts, and their
applications. Data structures represent places to store data for use by a
computer program. As you would imagine, this describes a spectrum of data
storage techniques, from the very simple to the very complex. We can look
at this progression from the simple to the complex. At the lowest level, there
are data structures supplied and supported by the CPU (or computer chip)
itself. These vary from chip to chip, but are almost always of the very
primitive sort. They typically include the simple data types, such as integers,
characters, floating point numbers, and bit strings. In this context, the unit
discussed the various structured data types, abstract data types, and linear
and non-linear data structures.