Module 1 - Data Representation, and Data Structures-1
1.1 Data
Data are simply values or sets of values. A data item refers to a single unit
of values and is either the value of a variable or a constant. For example, a
data item may be a row in a database table, which is described by a data type.
A data item that does not have subordinate data items is called an elementary
item. A data item that is composed of one or more subordinate data items is
called a group item. A record can be either an elementary item or a group item.
For example, an employee's name may be divided into three sub-items (first
name, middle name and last name), but the social_security_number would
normally be treated as a single item.
In the diagram above, (ID, Age, Gender, First, Middle, Last, Street, Area)
are elementary data items, whereas (Name, Address) are group data items.
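The distinction between elementary and group items can be sketched in C++ with
nested structures. This is only an illustration: the type names (Name, Address,
Employee), the helper fullName and the field types are assumptions, since the
text gives only the item names.

```cpp
#include <string>

// Group item: composed of three elementary sub-items.
struct Name {
    std::string first;
    std::string middle;
    std::string last;
};

// Group item: composed of two elementary sub-items.
struct Address {
    std::string street;
    std::string area;
};

// A record mixing elementary items with group items.
struct Employee {
    int id;           // elementary item
    int age;          // elementary item
    char gender;      // elementary item
    Name name;        // group item
    Address address;  // group item
};

// Builds a display name from the sub-items of the Name group item.
std::string fullName(const Employee& e) {
    return e.name.first + " " + e.name.middle + " " + e.name.last;
}
```

An elementary item such as e.age is used directly, while a group item is reached
through its sub-items, for example e.name.first.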
Data are frequently organized into a hierarchy of fields, records and files. In
order to understand these terms, let us see the following example.
The term "information" is sometimes used for data with given attributes, or,
in other words, meaningful or processed data.
The way that data are organized into the hierarchy of fields, records and files
reflects the relationship between attributes, entities and entity sets. That is,
a field is a single elementary unit of information representing an attribute of
an entity. A record is the collection of field values of a given entity, and a
file is the collection of records of the entities in a given entity set. Each
record in a file may contain many field items, but the value in a certain field
may uniquely determine the record in the file. Such a field K is called a
Primary Key, and the values K1, K2, ..., Kn in such a field are called keys or
key values.
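As a rough sketch of how a primary key identifies a record, the C++ fragment
below models a file as a vector of records and looks a record up by its key. The
names StudentRecord, StudentFile and findByKey are hypothetical, and the choice
of an integer id as the key field is an assumption made for illustration.

```cpp
#include <string>
#include <vector>

// One record: the field values of a single entity.
struct StudentRecord {
    int id;            // primary key field K (assumed unique in the file)
    std::string name;  // ordinary field
};

// A "file" is modelled here as a collection of records.
using StudentFile = std::vector<StudentRecord>;

// Returns a pointer to the record whose key equals `key`,
// or nullptr when no record in the file has that key.
const StudentRecord* findByKey(const StudentFile& file, int key) {
    for (const StudentRecord& r : file) {
        if (r.id == key) {
            return &r;
        }
    }
    return nullptr;
}
```

Because the key is unique, the search can stop at the first match.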
Records may also be classified according to length. A file can have fixed-length
or variable-length records. In fixed-length records, all the records contain the
same data items with the same amount of space assigned to each data item.
In variable-length records, file records may have different lengths. For
example, student records usually have variable lengths, since different
students take different numbers of courses. Usually, variable-length records
have a minimum and maximum length.
The above organization of data into fields, records and files may not be complex
enough to maintain and efficiently process certain collections of data. For this
reason, data are also organized into more complex types of structures. The
study of such data structures, which forms the subject matter of the text,
includes the following three steps:
(a) Logical or mathematical description of the structure
(b) Implementation of the structure on a computer
(c) Quantitative analysis of the structure, which includes determining the
amount of memory needed to store the structure and the time required
to process the structure.
A Data type simply refers to a defined kind of data, that is, a set of possible
values and basic operations on those values. A data type consists of:
✓ a domain (= a set of values)
✓ A set of operations that may be applied to the values.
Computer memory is filled with zeros and ones. If we have a problem and
want to code it, it is very difficult to provide the solution in terms of zeros
and ones. To help users, programming languages and compilers provide
the facility of data types.
For example, an integer takes 2 bytes (the actual value depends on the
compiler), a float takes 4 bytes, etc. This means that in memory we combine 2
bytes (16 bits) and call the result an integer. Similarly, we combine 4 bytes
(32 bits) and call the result a float. Data types reduce coding effort.
The number of bits allocated for each primitive data type depends on the
programming language, compiler and operating system. Different languages
may use different sizes for each of the primitive data types. Depending on the
size of the data type, the total available values (domain) will also change. For
example, “int” may take 2 bytes or 4 bytes. If it takes 2 bytes (16 bits), then
the possible values are -32,768 to +32,767 (-2^15 to 2^15 - 1). If it takes 4
bytes (32 bits), then the possible values are -2,147,483,648 to
+2,147,483,647 (-2^31 to 2^31 - 1).
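These ranges can be checked with the C++ standard library. The sketch below
uses the fixed-width types std::int16_t and std::int32_t so that the sizes do
not depend on the compiler; the constant names are illustrative only.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Smallest and largest values of a 16-bit signed integer: -2^15 and 2^15 - 1.
constexpr long long min16 = std::numeric_limits<std::int16_t>::min();
constexpr long long max16 = std::numeric_limits<std::int16_t>::max();

// Smallest and largest values of a 32-bit signed integer: -2^31 and 2^31 - 1.
constexpr long long min32 = std::numeric_limits<std::int32_t>::min();
constexpr long long max32 = std::numeric_limits<std::int32_t>::max();

// Size in bytes of a plain int on the compiler being used.
constexpr std::size_t intBytes = sizeof(int);
```

The value of intBytes varies between compilers, which is why the range of a
plain int is not the same everywhere.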
A “struct” in C and C++ is an example of a composite type, a data type that
composes a fixed set of labelled fields or members. It is so called because of
the struct keyword used in declaring it, which is short for structure.
A struct declaration consists of a list of fields, each of which can have any
type. Structures are declared in C and C++ as shown below:
struct Account {
    int account_number;
    char *first_name;
    char *last_name;
    float balance;
} a1;
A member of the structure can be accessed by using the instance of
the structure followed by the dot (.) operator and then the field of the
structure. For example,
a1.balance = 20000.00;
#include <string>
using namespace std;

class Student
{
public:
    int id;        // field or data member
    float salary;  // field or data member
    string name;   // field or data member
public:
    void display(string staffName, int staffId, float pay); // function
};
Java Classes are created using the Java keyword class as shown in the
example below:
class Bicycle {
    // state or field
    private int gear = 5;

    // behavior or method
    public void braking() {
        System.out.println("Working of Braking");
    }
}
In other words, we can say that abstract data types are the entities that are
definitions of data and operations but do not have implementation details. In
this case, we know the data that we are storing and the operations that can
be performed on the data, but we do not know about the implementation
details. The reason for not having implementation details is that every
programming language has a different implementation strategy; for example,
a C data structure is implemented using structures while a C++ data
structure is implemented using objects and classes.
An ADT does not specify how data will be organized in memory and what
algorithms will be used for implementing the operations. It is called
“abstract” because it gives an implementation-independent view. The
process of providing only the essentials and hiding the details is known as
abstraction. Commonly used ADTs include Linked Lists, Stacks, Queues,
Priority Queues, Binary Trees, Disjoint Sets (Union and Find), Hash Tables,
Graphs, etc.
An ADT is built from primitive data types, but the logic of its operations is
hidden. It has basic operations such as insertion, deletion, or updating. For
example, some ADT operations used with a Stack are:
▪ isFull(): checks whether the stack is full
▪ isEmpty(): checks whether the stack is empty
▪ push(x): pushes x onto the stack
▪ pop(): deletes one element from the top of the stack
▪ peek(): gets the topmost element of the stack
▪ size(): gets the number of elements present in the stack
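One possible implementation of these operations is sketched below in C++. It is
not the only way to realise the Stack ADT: the class name IntStack, the fixed
capacity of 100, the choice of int elements, and the decision to make pop()
return the removed element and to throw on overflow or underflow are all
assumptions made for this example.

```cpp
#include <cstddef>
#include <stdexcept>

// Fixed-capacity stack of ints backed by a plain array.
class IntStack {
public:
    static const std::size_t kCapacity = 100;  // assumed limit

    bool isFull() const { return count_ == kCapacity; }
    bool isEmpty() const { return count_ == 0; }
    std::size_t size() const { return count_; }

    // Adds x on top of the stack.
    void push(int x) {
        if (isFull()) throw std::overflow_error("stack is full");
        items_[count_++] = x;
    }

    // Removes the top element and returns it.
    int pop() {
        if (isEmpty()) throw std::underflow_error("stack is empty");
        return items_[--count_];
    }

    // Returns the top element without removing it.
    int peek() const {
        if (isEmpty()) throw std::underflow_error("stack is empty");
        return items_[count_ - 1];
    }

private:
    int items_[kCapacity];
    std::size_t count_ = 0;
};
```

Code that uses IntStack calls only these six operations and never touches the
array, which is the implementation-independent view that the ADT describes.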
1.3.4 Enumerated Data Types
An Enumerated type is a type whose legal values consist of a fixed set of
constants. Common examples include compass directions, which take the
values North, South, East and West and days of the week, which take the
values Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, and
Saturday.
C++ enums can be thought of as classes that have a fixed set of constants.
An enumeration is declared with the enum keyword, similar to that of Java. A
simple example of the enum data type used in a C++ program is shown below.
#include <iostream>
using namespace std;

enum week {Monday, Tuesday, Wednesday, Thursday, Friday,
           Saturday, Sunday };

int main()
{
    week day;
    day = Friday;
    cout << "Day: " << day+1<<endl;
    return 0;
}
#include <iostream>
#include <string>
using namespace std;

int main () {
    // Create a string variable
    string greeting = "Hello";
    // Output string value
    cout << greeting;
    return 0;
}
(b) Pointer data type: A Pointer is a variable that stores the memory
address of an object. The pointer then simply “points” to the object.
The type of the object must correspond with the type of the pointer.
Pointers are used extensively in both C and C++ for three main
purposes:
A C++ program that illustrates how a Pointer data type is used in C++ is
shown below:
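// C++ program to store the address of a variable in a pointer
#include <iostream>
using namespace std;

int main ()
{
    int *ptr, var; // ptr is a pointer to int; * is the dereference operator
    var = 5;
    // Assign address of var to ptr
    ptr = &var; // & is the address-of (reference) operator
    // Minimal completion of the truncated listing: read var through ptr
    cout << "Value of var through ptr: " << *ptr << endl;
    return 0;
}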
(c) Union data types: A Union is a special data type that allows different
data types to be stored in the same memory location. You can
define a union with many members, but only one member can contain
a value at any given time. Unions provide an efficient way of using the
same memory location for multiple purposes. With a union, all members
share the same memory. The program below illustrates how a Union is
used in C++.
#include <iostream>
using namespace std;

// Creating Union
union Job {
    float salary;
    int workerNo;
} j;

int main () {
    // Assigning values to member of the Union
    j.salary = 12.3;
    // when j.workerNo is assigned a value,
    // j.salary will no longer hold 12.3
    j.workerNo = 100;
    cout << "Salary = " << j.salary << endl;
    cout << "Number of workers = " << j.workerNo << endl;
    return 0;
}
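Typical output (reading j.salary after j.workerNo has been assigned is not
well defined in C++; on common platforms it prints a value that is effectively
zero):
Salary = 1.4013e-43
Number of workers = 100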
(d) Text data types: A Text data type can hold any letter, number, symbol
or punctuation mark. It is a variable-length data type that can store
long character strings. It is sometimes referred to as 'alphanumeric' or
'string'. The data can be pure text or a combination of text, numbers
and symbols. An SQL Text data type can hold up to 2,147,483,647
bytes of data. There are variants of the Text data type; for example, in
MySQL, the variants are TINYTEXT, TEXT, MEDIUMTEXT and
LONGTEXT.
A Data structure is a structured set of variables associated with one another
in different ways, cooperatively defining components in the system and
capable of being operated upon in the program. Data structures are the basis
of programming tools and the choice of data structures should provide the
following:
(a) The data structures should satisfactorily represent the relationship
between data elements.
(b) The data structures should be simple, so that the programmer can easily
process the data.
As applications become complex and data rich, there are three common
problems that applications face nowadays.
(a) Data Search − Consider an inventory of 1 million (10^6) items in a store.
If the application has to search for an item, it has to search among 1
million (10^6) items every time, slowing down the search. As data grows,
search will become slower.
(b) Processor Speed − Processor speed, although very high, becomes a
limiting factor if the data grows to a billion records.
(c) Multiple Requests − As thousands of users can search data
simultaneously on a web server, even a fast server can fail while
searching the data.
(a) Linear Data Structures: In linear data structures, values are arranged
in linear fashion. Arrays, linked lists, stacks and queues are examples
of linear data structures in which values are stored in a sequence.
(b) Non-Linear Data Structures: This type is the opposite of linear. The data
values in this structure are not arranged in a sequence. Trees, graphs,
tables and sets are examples of non-linear data structures.
Other Classifications are:
(a) Homogenous and Non-Homogenous Data Structures
✓ Homogenous: In this type of data structures, values of the same types
of data are stored, as in an array.
✓ Non-homogenous: In this type of data structures, data values of
different types are grouped, as in structures and classes.
(b) Dynamic and Static
✓ Dynamic: In dynamic data structures, such as those built with references
and pointers, size and memory locations can be changed during program
execution.
✓ Static: A static data structure is one designed for a certain number and
type of elements. It is designed for one particular use (e.g. one particular
application), and elements are never added to or deleted from it.
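The difference between static and dynamic structures can be illustrated with two
standard C++ containers. This is a sketch under the assumption that std::array
stands in for a static structure and std::vector for a dynamic one; the function
names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Static: the number of elements (5) is fixed when the program is compiled.
std::size_t staticSize() {
    std::array<int, 5> fixed = {{1, 2, 3, 4, 5}};
    return fixed.size();
}

// Dynamic: the vector grows and shrinks while the program runs.
std::size_t dynamicSizeAfter(std::size_t additions, std::size_t removals) {
    std::vector<int> items;
    for (std::size_t i = 0; i < additions; ++i) {
        items.push_back(static_cast<int>(i));  // size increases by one
    }
    for (std::size_t i = 0; i < removals && !items.empty(); ++i) {
        items.pop_back();                      // size decreases by one
    }
    return items.size();
}
```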
1.6.2 Stack
A Stack is a particular kind of abstract data type or collection in which the
principal (or only) operations on the collection are the addition of an entity to
the collection, known as push and removal of an entity, known as pop. The
relation between the push and pop operations is such that the stack is a Last-
In-First-Out (LIFO) data structure. In a LIFO data structure, the last element
added to the structure must be the first one to be removed. This is equivalent
to the requirement that, considered as a linear data structure, or more
abstractly a sequential collection, the push and pop operations occur only at
one end of the structure, referred to as the top of the stack. Often a peek or
top operation is also implemented, returning the value of the top element
without removing it.
1.6.3 Queue
A Queue is a particular kind of collection in which the entities in the collection
are kept in order and the principal (or only) operations on the collection are
the addition of entities to the rear terminal position and removal of entities
from the front terminal position. This makes the queue a First-In-First-Out
(FIFO) data structure. In a FIFO data structure, the first element added to the
queue will be the first one to be removed. This is equivalent to the requirement
that once an element is added, all elements that were added before have to be
removed before the new element can be removed. A queue is an example of a
linear data structure.
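The FIFO behaviour can be observed with the standard std::queue container. The
helper name drainFifo is hypothetical; it simply adds every value at the rear
and then removes values from the front.

```cpp
#include <queue>
#include <vector>

// Enqueues every value at the rear, then dequeues from the front,
// returning the values in the order in which they leave the queue.
std::vector<int> drainFifo(const std::vector<int>& arrivals) {
    std::queue<int> q;
    for (int v : arrivals) {
        q.push(v);                    // add at the rear
    }
    std::vector<int> served;
    while (!q.empty()) {
        served.push_back(q.front());  // read the front element
        q.pop();                      // remove it from the front
    }
    return served;
}
```

The values leave the queue in exactly the order in which they arrived.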
1.6.4 Deque
A double-ended queue (dequeue, often abbreviated to deque, pronounced
deck) is an abstract data type that generalizes a queue, for which elements
can be added to or removed from either the front (head) or back (tail). It is also
often called a head-tail linked list, though properly this refers to a specific
data structure implementation.
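A short sketch using the standard std::deque container shows additions and
removals at both ends. The function name dequeDemo and the values used are
illustrative only.

```cpp
#include <deque>

// Demonstrates insertion and removal at both ends of a deque.
std::deque<int> dequeDemo() {
    std::deque<int> d;
    d.push_back(2);   // add at the back:   [2]
    d.push_back(3);   //                    [2, 3]
    d.push_front(1);  // add at the front:  [1, 2, 3]
    d.push_front(0);  //                    [0, 1, 2, 3]
    d.pop_front();    // remove from front: [1, 2, 3]
    d.pop_back();     // remove from back:  [1, 2]
    return d;
}
```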
associated with it. In a priority queue, an element with high priority is served
before an element with low priority. If two elements have the same priority,
they are served according to their order in the queue.
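The rule for equal priorities needs some care in C++, because
std::priority_queue on its own does not promise any order among equal elements.
The sketch below, built on assumed names (Job, ServedLater, JobQueue), records
an arrival number for each element and uses it to break ties.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

struct Job {
    int priority;       // larger value = served earlier
    std::size_t order;  // arrival number, used to break ties
    std::string name;
};

// Returns true when `a` should be served AFTER `b`.
struct ServedLater {
    bool operator()(const Job& a, const Job& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.order > b.order;  // equal priority: earlier arrival first
    }
};

class JobQueue {
public:
    void add(int priority, const std::string& name) {
        heap_.push(Job{priority, next_++, name});
    }

    // Removes and returns the next job to serve. Precondition: !empty().
    std::string serve() {
        std::string name = heap_.top().name;
        heap_.pop();
        return name;
    }

    bool empty() const { return heap_.empty(); }

private:
    std::priority_queue<Job, std::vector<Job>, ServedLater> heap_;
    std::size_t next_ = 0;
};
```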
A linked list is a linear data structure where each element is a separate object.
Each element (we will call it a node) of a list comprises two items: the
data and a reference to the next node. The last node has a reference to null.
The entry point into a linked list is called the head of the list. It should be
noted that the head is not a separate node, but the reference to the first node.
If the list is empty then the head is a null reference.
A linked list is a dynamic data structure. The number of nodes in a list is not
fixed and can grow and shrink on demand. An application which has to deal
with an unknown number of objects can benefit from using a linked list.
One disadvantage of a linked list is that it does not allow direct access to the
individual elements. If you want to access a particular item, then you have to
start at the head and follow the references until you get to that item. Another
disadvantage is that a linked list uses more memory compared with an array:
each node needs an extra 4 bytes (on a 32-bit CPU) to store a reference to the
next node.
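A minimal singly linked list in C++ might look like the sketch below. The names
Node, pushFront, nodeAt, length and destroy are assumptions for this example,
as is the choice of int data.

```cpp
#include <cstddef>

// A node holds the data and a reference (pointer) to the next node.
struct Node {
    int data;
    Node* next;
};

// Inserts a new node at the front and returns the new head.
Node* pushFront(Node* head, int value) {
    return new Node{value, head};
}

// Walks from the head to position `index` (0-based). There is no direct
// access, so reaching item i takes i steps. Returns nullptr if out of range.
Node* nodeAt(Node* head, std::size_t index) {
    Node* cur = head;
    while (cur != nullptr && index > 0) {
        cur = cur->next;
        --index;
    }
    return cur;
}

// Counts the nodes by following references until the null reference.
std::size_t length(const Node* head) {
    std::size_t n = 0;
    for (const Node* cur = head; cur != nullptr; cur = cur->next) {
        ++n;
    }
    return n;
}

// Frees every node of the list.
void destroy(Node* head) {
    while (head != nullptr) {
        Node* next = head->next;
        delete head;
        head = next;
    }
}
```

An empty list is represented by a head equal to nullptr, matching the null
reference described in the text.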
Another important type of linked list is the circular linked list, where the
last node of the list points back to the first node (or the head) of the list.
1.6.6 Record
A Record (also called a tuple or struct) is an aggregate data structure. A record
is a value that contains other values, typically in fixed number and sequence
and typically indexed by names. The elements of records are usually called
fields or members. A record is a special type of data structure that, unlike
arrays, collects different data types that define a particular structure such
as a book, product, person and many others. The programmer defines the data
structure as a user-defined type under the Type declaration, as shown below.
Type
    Str25 = String[25];
    TBookRec = Record
        Title, Author,
        ISBN : Str25;
        Price : Real;
    end;
Var
    myBookRec : TBookRec;
1.6.7 Trees
A Tree is a widely used abstract data type (ADT) that simulates a hierarchical
tree structure, with a root value and subtrees of children, represented as a
set of linked nodes. A tree data structure can be defined recursively (locally)
as a collection of nodes (starting at a root node), where each node is a data
structure consisting of a value, together with a list of references to nodes (the
"children"), with the constraints that no reference is duplicated, and none
points to the root.
child's parent node (or ancestor node, or superior). A node has at most one
parent.
An internal node (also known as an inner node, inode for short, or branch
node) is any node of a tree that has child nodes. Similarly, an external node
(also known as an outer node, leaf node, or terminal node) is any node that
does not have child nodes.
The topmost node in a tree is called the root node. Depending on definition,
a tree may be required to have a root node (in which case all trees are non-
empty), or may be allowed to be empty, in which case it does not necessarily
have a root node. Being the topmost node, the root node will not have a parent.
It is the node at which algorithms on the tree begin, since as a data structure,
one can only pass from parents to children. Note that some algorithms (such
as post-order depth-first search) begin at the root, but first visit leaf nodes
(access the value of leaf nodes), only visit the root last (i.e., they first access
the children of the root, but only access the value of the root last). All other
nodes can be reached from it by following edges or links.
The height of a node is the length of the longest downward path to a leaf from
that node. The height of the root is the height of the tree. The depth of a node
is the length of the path to its root (i.e., its root path). This is commonly needed
in the manipulation of the various self-balancing trees, AVL Trees in
particular. The root node has depth zero, leaf nodes have height zero, and a
tree with only a single node (hence both a root and leaf) has depth and height
zero. Conventionally, an empty tree (tree with no nodes, if such are allowed)
has depth and height −1.
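The definitions of height and depth translate directly into recursive functions.
The sketch below assumes a node type TreeNode that stores a value and a list of
child pointers; the function names height and depth are likewise illustrative.

```cpp
#include <algorithm>
#include <vector>

// Each node stores a value and a list of references to its children.
struct TreeNode {
    int value;
    std::vector<TreeNode*> children;
};

// Height: length of the longest downward path from the node to a leaf.
// By the convention in the text, an empty tree has height -1.
int height(const TreeNode* node) {
    if (node == nullptr) return -1;
    int tallest = -1;  // a leaf has no children, which gives height 0
    for (const TreeNode* child : node->children) {
        tallest = std::max(tallest, height(child));
    }
    return tallest + 1;
}

// Depth of `target`: number of edges on the path from `root` down to it.
// Returns -1 when `target` is not in the tree.
int depth(const TreeNode* root, const TreeNode* target) {
    if (root == nullptr) return -1;
    if (root == target) return 0;
    for (const TreeNode* child : root->children) {
        int d = depth(child, target);
        if (d >= 0) return d + 1;
    }
    return -1;
}
```

For a tree with a single node, both functions return zero for that node.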
1.6.8 Graph
A Graph is an abstract data type that is meant to implement the graph and
hypergraph concepts from mathematics. A graph data structure consists of a
finite (and possibly mutable) set of ordered pairs, called edges or arcs, of
certain entities called nodes or vertices. As in mathematics, an edge (x,y) is
said to point or go from x to y. The nodes may be part of the graph structure,
or may be external entities represented by integer indices or references. A
graph data structure may also associate to each edge some edge value, such
as a symbolic label or a numeric attribute (cost, capacity, length, etc.).
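One common way to store such a graph is an adjacency list. The sketch below
assumes that nodes are represented by integer indices and that each edge carries
a numeric cost; the class name Graph and its member functions are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Directed graph whose nodes are the integer indices 0 .. n-1.
class Graph {
public:
    explicit Graph(std::size_t nodeCount) : adj_(nodeCount) {}

    // Adds the edge (x, y), which goes from x to y, with an edge value.
    void addEdge(std::size_t x, std::size_t y, double cost) {
        if (y >= adj_.size()) throw std::out_of_range("no such node");
        adj_.at(x).push_back(std::make_pair(y, cost));
    }

    // True when an edge from x to y exists.
    bool hasEdge(std::size_t x, std::size_t y) const {
        for (const auto& e : adj_.at(x)) {
            if (e.first == y) return true;
        }
        return false;
    }

    // Number of edges leaving x.
    std::size_t outDegree(std::size_t x) const { return adj_.at(x).size(); }

private:
    // adj_[x] lists (target node, edge value) for every edge leaving x.
    std::vector<std::vector<std::pair<std::size_t, double>>> adj_;
};
```

Because edges are ordered pairs, adding (0, 1) does not create an edge from 1
to 0.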
1.6.10 Heap Data Structures
A heap is a special case of a balanced binary tree data structure where the
root-node key is compared with its children and arranged accordingly. If α has
child node β then, in a max heap,
key(α) ≥ key(β)
A heap data structure is a complete binary tree that satisfies the heap
property, where any given node is either:
(a) always greater than or equal to its child node(s), so that the key of the
root node is the largest among all nodes. This property is called the max
heap property.
(b) always smaller than or equal to its child node(s), so that the key of the
root node is the smallest among all nodes. This property is called the min
heap property.
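The max heap property can be checked mechanically. The sketch below assumes the
usual array layout of a complete binary tree, in which the children of the
element at index i are stored at indices 2i+1 and 2i+2; this layout and the
function names are assumptions, not something stated in the text.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Checks the max heap property on an array-stored complete binary tree.
bool isMaxHeap(const std::vector<int>& a) {
    for (std::size_t child = 1; child < a.size(); ++child) {
        std::size_t parent = (child - 1) / 2;
        if (a[parent] < a[child]) {
            return false;  // violates key(parent) >= key(child)
        }
    }
    return true;
}

// Rearranges the values into a max heap using the standard library.
std::vector<int> buildMaxHeap(std::vector<int> values) {
    std::make_heap(values.begin(), values.end());
    return values;
}
```

After buildMaxHeap, the largest key is at the front of the vector, which
corresponds to the root of the tree.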
There are ten basic operations that can be performed on a data structure,
which are:
▪ Create
▪ Traversing
▪ Searching
▪ Sorting
▪ Inserting
▪ Deleting
▪ Merging
▪ Display
▪ Updating
▪ Destroying
(a) Create: The create operation results in reserving memory for program
elements. This can be done by a declaration statement. Creation of a data
structure may take place either at compile time or at run time. In C, the
malloc() function can be used for creation at run time.
(c) Searching: Searching is finding out the location of a given element in
a set of elements.
(d) Sorting: Sorting is the process of arranging a list of elements in a
sequential order. The sequential order may be descending order or an
ascending order according to the requirements of the data structure.
(g) Merging: The process of combining the elements of two data structures
into a single data structure is called merging.
(i) Display: This operation displays all the elements in the entire array
using a print statement.
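Three of the operations above (searching, sorting and merging) can be sketched
in C++ on a vector of integers. The function names are hypothetical, the search
shown is a simple linear search, and the merge assumes that both inputs are
already sorted in ascending order.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

// Searching: index of the first element equal to `target`, or -1 if absent.
long findIndex(const std::vector<int>& items, int target) {
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (items[i] == target) return static_cast<long>(i);
    }
    return -1;
}

// Sorting: returns a copy of `values` in ascending or descending order.
std::vector<int> sorted(std::vector<int> values, bool ascending) {
    if (ascending) {
        std::sort(values.begin(), values.end());
    } else {
        std::sort(values.begin(), values.end(), std::greater<int>());
    }
    return values;
}

// Merging: combines two ascending-sorted vectors into one sorted vector.
std::vector<int> mergeSorted(const std::vector<int>& a,
                             const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(),
               std::back_inserter(out));
    return out;
}
```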