
DATABASE SYSTEM

MCA
Second Semester

Bharathidasan University
Centre for Distance and Online Education
Chairman:
Dr. M. Selvam
Vice-Chancellor
Bharathidasan University
Tiruchirappalli-620 024
Tamil Nadu
Co-Chairman:
Dr. G. Gopinath
Registrar
Bharathidasan University
Tiruchirappalli-620 024
Tamil Nadu

Course Co-Ordinator:
Dr. A. Edward William Benjamin
Director-Centre for Distance and Online Education
Bharathidasan University
Tiruchirappalli-620 024
Tamil Nadu
The Syllabus is Revised from 2021-22 onwards

Author:

Dr. T. Dheepak, Asst Professor, Dept of Computer Science, CDOE


Reviewer
Bharathidasan University, Trichy
Mrs. T. Lucia Agnes Beena, Asst. Professor & Head, Dept of Information Technology, St. Joseph's College, Trichy - 620 002

Authors
Manas Ghosh and Sudipta Pathak (Units: 1.4-1.7.3, 1.10-1.12, 1.13.4-1.14, 2.0-2.2.3, 2.4, 2.7, 3.2-3.5, 3.11-3.15, 4.3-4.3.1, 4.5-4.9, 5.0-5.4.2,
5.5, 5.6.3, 5.8-5.10.4)
Vikas Publishing House (Units: 1.0-1.3.3, 1.8-1.9, 1.13-1.13.3, 1.15-1.19, 2.3-2.3.3, 2.5-2.6, 2.8-2.17, 3.0-3.1, 3.6-3.10, 4.0-4.2, 4.4, 5.4.3,
5.6-5.6.2, 5.7, 5.11-5.19)

"The copyright shall be vested with Bharathidasan University"


All rights reserved. No part of this publication which is material protected by this copyright notice
may be reproduced or transmitted or utilized or stored in any form or by any means now known or
hereinafter invented, electronic, digital or mechanical, including photocopying, scanning, recording
or by any information storage or retrieval system, without prior written permission from the Publisher.

Information contained in this book has been published by VIKAS® Publishing House Pvt. Ltd. and has
been obtained by its Authors from sources believed to be reliable and are correct to the best of their
knowledge. However, the Publisher, its Authors shall in no event be liable for any errors, omissions
or damages arising out of the use of this information and specifically disclaim any implied warranties
of merchantability or fitness for any particular use.

Vikas® is the registered trademark of Vikas® Publishing House Pvt. Ltd.


VIKAS® PUBLISHING HOUSE PVT LTD
E-28, Sector-8, Noida - 201301 (UP)
Phone: 0120-4078900 • Fax: 0120-4078999
Regd. Office: A-27, 2nd Floor, Mohan Co-operative Industrial Estate, New Delhi - 110 044
Website: www.vikaspublishing.com • Email: helpline@vikaspublishing.com
SYLLABI-BOOK MAPPING TABLE
Database System

Module I: Introduction to Database Systems and E-R Model
Overview, A Historical Perspective, File System versus DBMS, Advantages of DBMS, Describing and Storing Data in a DBMS, Transaction Management, Structure of a DBMS, People Who Work with Databases, Overview of Database Design, Entities, Attributes and Entity Sets, Relationships and Relationship Sets, Additional Features of E-R Model: Key Constraints, Conceptual Design with the E-R Model, Overview of Unified Modeling Languages
Mapping in Book: Unit 1: Introduction to Database Management Systems and E-R Model (Pages 3-36)

Module II: Relational Model and Query Languages
Introduction to the Relational Model, Integrity Constraints over Relations: Primary Key, Foreign Key and General Constraints, E-R Model to Relational Model: Entity Sets to Tables, Relationship Sets to Tables, Translating Relationship Sets with Key Constraints, Translating Relationship Sets with Participation Constraints, Translating Weak Entity Sets, Translating Class Hierarchies, Translating E-R Diagrams with Aggregation, Introduction to Query Languages, Relational Algebra: Selection and Projection Operations, Set Operations, Renaming, Joins, Division
Mapping in Book: Unit 2: Relational Model and Query Languages (Pages 37-73)

Module III: Structured Query Language
Overview of SQL, Basic Queries in SQL, Union, Intersect and Except, Nested Queries, Aggregate Operators, Null Values, Complex Integrity Constraints in SQL, Triggers and Views in SQL, Embedded SQL and Cursors
Mapping in Book: Unit 3: Structured Query Language (Pages 75-129)

Module IV: Relational Database Design
Introduction to Schema Refinement, Functional Dependencies, Properties of Decomposition, Normal Forms: First Normal Form, Second Normal Form, Third Normal Form, Boyce-Codd Normal Form, Fourth Normal Form, Fifth Normal Form
Mapping in Book: Unit 4: Relational Database Design (Pages 131-153)

Module V: Transaction Management, Concurrency Control and Distributed System
The ACID Properties, Transactions and Schedules, Concurrent Execution of Transactions: Serialisability, Anomalies Due to Interleaved Execution, Schedules Involving Aborted Transactions, Lock-Based Concurrency Control: 2PL – Basic and Strict, Deadlocks, Introduction to Crash Recovery, Concurrency Control: 2PL, Serialisability and Recoverability, Introduction to Locking Management: Dealing with Deadlock, Introduction to ARIES, Introduction to Distributed Databases: Advantages and Disadvantages, Transaction Types – Local and Global Transactions, Distributed DBMS Architectures, Design of Distributed Databases: Replication, Fragmentation
Mapping in Book: Unit 5: Transaction Management, Concurrency Control and Distributed System (Pages 155-217)
CONTENTS
INTRODUCTION 1
UNIT 1 INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS AND E-R MODEL 3-36
1.0 Introduction
1.1 Unit Objectives
1.2 Overview of Database Management System
1.3 A Historical Perspective
1.3.1 Navigational DBMS of the 1960s
1.3.2 Relational DBMS of the 1970s
1.3.3 Later Trends
1.4 File System versus DBMS
1.5 Advantages of DBMS
1.6 Describing and Storing Data in DBMS Transaction Management
1.7 Structure of a DBMS
1.7.1 Logical DBMS Architecture
1.7.2 Mapping between the Levels
1.7.3 Physical DBMS Architecture
1.8 People who work with Databases
1.9 Overview of Database Design
1.10 Entities
1.11 Attributes and Entity Sets
1.12 Relationships and Relationship Sets
1.13 Additional Features of E–R Model
1.13.1 Works-in-Relationship
1.13.2 Maintaining Ternary Relationships
1.13.3 Working with Participation Constraints
1.13.4 Key Constraints
1.14 Conceptual Design with E–R Model
1.15 Overview of Unified Modelling Languages
1.15.1 Software Development Methods
1.15.2 Modelling
1.15.3 Diagrams Overview
1.15.4 Structure Diagrams
1.15.5 Behaviour Diagrams
1.15.6 Meta Modelling
1.15.7 Criticism of UML
1.16 Summary
1.17 Key Terms
1.18 Answers to ‘Check Your Progress’
1.19 Questions and Exercises
1.20 Learning Outcomes
UNIT 2 RELATIONAL MODEL AND QUERY LANGUAGES 37-73
2.0 Introduction
2.1 Unit Objectives
2.2 Integrity Constraints over Relations
2.2.1 Primary Key
2.2.2 Foreign Key
2.2.3 General Constraints
2.3 E–R Model to Relational Model
2.3.1 Entity Sets to Tables
2.3.2 Translating
2.3.3 Relationship Sets with Key Constraints
2.4 Translating Relationship Sets with Participation Constraints
2.5 Translating Weak Entity Sets
2.6 Translating Class Hierarchies
2.7 Translating E–R Diagrams with Aggregation
2.8 Introduction to Query Languages
2.9 Relational Algebra
2.9.1 Primitive Operations
2.9.2 Selection Operation
2.9.3 Projection Operation
2.10 Set Operations
2.10.1 Joins
2.11 Renaming
2.12 Joins
2.13 Division
2.14 Summary
2.15 Key Terms
2.16 Answers to ‘Check Your Progress’
2.17 Questions and Exercises
2.18 Learning Outcomes
UNIT 3 STRUCTURED QUERY LANGUAGE 75-129
3.0 Introduction
3.1 Unit Objectives
3.2 Overview of SQL
3.2.1 SQL in Oracle
3.2.2 Different Interfaces for SQL Commands in Oracle
3.2.3 SQL Statements
3.3 Basic Queries in SQL
3.4 Union
3.5 Intersect and Except
3.6 Nested Queries
3.7 Aggregate Operators
3.8 Null Values
3.9 Complex Integrity Constraints in SQL
3.10 Triggers and Views in SQL
3.11 Embedded SQL and Cursors
3.12 Summary
3.13 Key Terms
3.14 Answers to ‘Check Your Progress’
3.15 Questions and Exercises
3.16 Learning Outcomes
UNIT 4 RELATIONAL DATABASE DESIGN 131-153
4.0 Introduction
4.1 Unit Objectives
4.2 Introduction to Schema Refinement
4.3 Functional Dependencies (FD)
4.3.1 Dependency Types
4.4 Properties of Decomposition
4.5 Normal Forms
4.5.1 First Normal Form
4.5.2 Second Normal Form
4.5.3 Third Normal Form
4.5.4 Boyce-Codd Normal Form
4.5.5 Fourth Normal Form
4.5.6 Fifth Normal Form
4.6 Summary
4.7 Key Terms
4.8 Answers to ‘Check Your Progress’
4.9 Questions and Exercises
4.10 Learning Outcomes
UNIT 5 TRANSACTION MANAGEMENT, CONCURRENCY CONTROL
AND DISTRIBUTED SYSTEM 155-217
5.0 Introduction
5.1 Unit Objectives
5.2 The ACID Properties
5.3 Transactions and Schedules
5.3.1 Transaction
5.3.2 Transaction Processing Steps
5.3.3 Interleaved vs Simultaneous vs Serial Transaction
5.3.4 Transaction Action
5.3.5 Transaction States and Additional Operations
5.4 Concurrent Execution of Transactions
5.4.1 Motivation for Concurrent Execution of Transactions
5.4.2 Serializability
5.4.3 Anomalies due to Interleaved Execution
5.5 Schedules Involving Aborted Transactions
5.6 Lock-Based Concurrency Control
5.6.1 Binary Locks
5.6.2 Shared or Exclusive Locks
5.6.3 Two-Phase Locking: Basic and Strict
5.7 Introduction to Crash Recovery
5.8 Concurrency Control
5.8.1 The Lost Update Problem
5.8.2 Uncommitted Dependency: The Dirty Read Problem
5.8.3 Unrepeatable Read or Inconsistent Retrievals Problem
5.8.4 Phantom Reads
5.8.5 Recoverability
5.9 Introduction to Locking Management
5.9.1 Types of Locks
5.10 Dealing with Deadlock
5.10.1 Deadlock
5.10.2 Deadlock Prevention Protocol
5.10.3 Deadlock Detection
5.10.4 Starvation
5.11 Introduction to ARIES
5.12 Introduction to Distributed Databases
5.12.1 Distributed Database
5.12.2 Distributed Data Transparency Features
5.12.3 Advantages and Disadvantages of Distributed Databases
5.12.4 Catalogue Management
5.12.5 Update Propagation
5.12.6 Recovery Control
5.12.7 Concurrency Control
5.13 Transaction Types
5.13.1 Implicit Transactions
5.13.2 Explicit Transactions
5.13.3 Local and Global Transactions
5.14 Distributed DBMS Architectures
5.14.1 Homogeneous Distributed Database Systems
5.14.2 Heterogeneous Distributed Database Systems
5.15 Design of Distributed Databases
5.15.1 Levels of Data and Processing
5.15.2 Design
5.15.3 Replication
5.15.4 Characteristics of DBMS Replication
5.15.5 Fragmentation
5.16 Summary
5.17 Key Terms
5.18 Answers to ‘Check Your Progress’
5.19 Questions and Exercises
5.20 Learning Outcomes
INTRODUCTION
A database management system or DBMS simply helps to manage databases. It is
a set of software that facilitates the storage and retrieval of data in a structured
manner.
Most modern enterprises find DBMSs indispensable. The smooth functioning of
an enterprise and its very existence may be heavily dependent on the proper selection
of a DBMS. A wrong choice may have an adverse effect which will be reflected
in the company’s performance and productivity for years to come. If sufficient
thought is not given to the selection of an appropriate DBMS, the enterprise using
it may end up wasting precious time and money in using and maintaining an unsuitable
system. Also, once installed and implemented, a DBMS is not only difficult to replace or
change, but doing so will definitely be an expensive proposition.
This book, Database System, gives the reader an insight into all aspects of a
DBMS. It begins with an introduction to database management systems, its advantages
and its structure in the first unit. Here, it also gives the reader an idea of database
design, relationships and relationship sets and the features of the E–R model.
The second unit deals with integrity constraints over relations where it discusses
the primary and foreign keys. It also discusses the translation of relationship sets with
participation constraints; and translating weak entity sets and class hierarchies.
Primitive, selection and projection operations in relational algebra have also been
discussed in the second unit.
The third unit is devoted to discussing structured query language. Among other
things, it covers the basic queries in SQL, nested queries, aggregate operators,
triggers and views in SQL.
The fourth unit focusses on relational database design with special emphasis on
functional dependencies, properties of decomposition and the normal forms.
The last unit explains transaction management with emphasis on the ACID
properties, transaction processing steps, transaction action, transaction states and
additional operations. This unit also discusses concurrency control, locking management,
deadlock detection and prevention, as well as the design of distributed databases.
In accordance with the self-instructional mode format, each unit begins with an
Introduction to the topic and is followed by an outline of the Unit Objectives. The
detailed content is then presented in a simple and structured form, interspersed with
Check Your Progress questions to test the student’s understanding. A Summary of
the content, along with a list of Key Terms and a set of Questions and Exercises is
provided at the end of each unit for recapitulation. Relevant examples/illustrations
have been included for better understanding of the topics.

UNIT 1 INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS AND E–R MODEL
Structure
1.0 Introduction
1.1 Unit Objectives
1.2 Overview of Database Management System
1.3 A Historical Perspective
1.3.1 Navigational DBMS of the 1960s
1.3.2 Relational DBMS of the 1970s
1.3.3 Later Trends
1.4 File System versus DBMS
1.5 Advantages of DBMS
1.6 Describing and Storing Data in DBMS Transaction Management
1.7 Structure of a DBMS
1.7.1 Logical DBMS Architecture
1.7.2 Mapping between the Levels
1.7.3 Physical DBMS Architecture
1.8 People who work with Databases
1.9 Overview of Database Design
1.10 Entities
1.11 Attributes and Entity Sets
1.12 Relationships and Relationship Sets
1.13 Additional Features of E–R Model
1.13.1 Works-in-Relationship
1.13.2 Maintaining Ternary Relationships
1.13.3 Working with Participation Constraints
1.13.4 Key Constraints
1.14 Conceptual Design with E–R Model
1.15 Overview of Unified Modelling Languages
1.15.1 Software Development Methods
1.15.2 Modelling
1.15.3 Diagrams Overview
1.15.4 Structure Diagrams
1.15.5 Behaviour Diagrams
1.15.6 Meta Modelling
1.15.7 Criticism of UML
1.16 Summary
1.17 Key Terms
1.18 Answers to ‘Check Your Progress’
1.19 Questions and Exercises
1.20 Learning Outcomes

1.0 INTRODUCTION
In this unit, you will learn about database management systems. Data is correctly
considered a resource because for any organization to take proper decisions at the
proper time, it is imperative to have correct and up-to-date data in hand. It helps in
the proper utilization of organizational resources. The data that an organization
must necessarily maintain for its operations is termed as operational data. Sometimes
data and information are used interchangeably, but it is necessary to understand
that data is a known fact that can be recorded, while information is processed,
organized or summarized data.
You will also get a historical perspective of database management systems
(DBMS). This unit traces the development of DBMS through the 1960s to the present.
The file system versus DBMS and the advantages of DBMS are discussed.
You will also learn how to describe and store data in a DBMS, and about transaction management.
Databases store large amounts of information that are available for long periods of
time. During this period, data is continually accessed and processed. Databases are
generally stored permanently on magnetic disk secondary storage devices. This
unit also explains the structure of a DBMS. These are very complex systems; so, to
understand general database concepts and the structure and capabilities of a DBMS,
it is useful to examine the architecture of a typical database management system.
You will also learn about the people who work with databases and get an
overview of database design. The latter refers to a system that stores and retrieves
data systematically from the database.
Other important topics that you will learn are entity attributes and entity sets.
An entity is described by a set of properties called attributes. An employee entity
might have attributes such as employee code, employee name, salary, date of joining,
etc. Further, this unit explains relationships and relationship sets, additional features
of E–R model and conceptual design of an E–R model. The latter facilitates a database
by enabling the designer to express the logical properties of the database in an
enterprise schema. In the end, you will get an overview of unified modelling
languages. Developed by the end of the 1990s, unified modelling language (UML)
is a standardized general-purpose modelling language. It uses a set of graphical
notations to create abstract models of specific systems.

1.1 UNIT OBJECTIVES


After going through this unit, you will be able to:
• Get a historical perspective of DBMS
• Compare the file system with DBMS
• Explain the advantages of DBMS
• Discuss the structure of a DBMS
• Know the people who work with databases
• Get an overview of database design
• Understand the concept of entities
• Describe attributes and entity sets
• Explain relationships and relationship sets
• Know more about the additional features of the E–R model
• Get an overview of modelling languages

1.2 OVERVIEW OF DATABASE MANAGEMENT SYSTEM
A database is an organized collection of data kept in the form of linked lists or tables.
A database management system (DBMS) is a system of software programs that controls
this organized data, handling its storage, management and retrieval.
DBMSs are categorized according to data types and their structures.
When a request (through an application program) for data is given to a DBMS, an
instruction is passed on to the operating system to transfer the required data. As the
size and type of data grows, these can be added to the database without any
obstruction in the working of the existing system.
Organizations use different kinds of databases for different purposes; for
example, they may use one kind of database for daily transaction processing and send
these details on to another system using another DBMS from which random enquiries
can be made and analysis of data (in the context of business environment) can be
carried out. Database Administrators (DBAs) and systems analysts take decisions related
to overall systems design. The detailing of the database design is done by DBAs.
As the size of data grows, different types of data are kept for different purposes
on a database server or servers, as the case may be. These servers are computers
(keeping the actual databases) running only the DBMS and related software. These
servers are multiprocessor computers with large memory and arrays of disks for
stable storage. Such an array of disks is known as Redundant Array of Inexpensive
Disks (RAID).
The concept of database has also added to the capability of operating systems.
Most database applications have DBMS in their core, and these are connected to
one or more servers using a high-speed channel. In an environment, where large
volumes of transactions are processed, hardware database accelerators are also used.
DBMSs are built around a kernel having multitasking capability with built-in support
for networking. With the growth of technology, such functions are performed by
modern operating systems.
However, all these developments did not take place in a day; they evolved
from one stage to another in due course of time.

1.3 A HISTORICAL PERSPECTIVE


The concept of a database has existed ever since mankind started using records.
Data retrieval was manual, and it was time consuming to search for the required
data. With the advent of the computer system, it became possible to keep data in an
electronic form. Today's systems can be applied to a wide variety of databases
and applications; the older systems, in contrast, were tightly linked to custom
databases. There was a trade-off: to gain speed, flexibility was sacrificed.

1.3.1 Navigational DBMS of the 1960s


The capacity of computers increased and by the mid-1960s a number of general-
purpose database systems entered the market. With the growth of database systems,
the need for standardization was felt, and a product named Integrated Data Store
(IDS), developed by Charles Bachman, was launched by the consortium of IT
professionals—Conference on Data Systems Languages (CODASYL). CODASYL
later founded the ‘Database Task Group’. It also standardized COBOL and in 1971,
the standard was delivered. Thus, a number of commercial products based on it
were developed.
Various data sets were linked to form a large network and CODASYL
‘manually’ navigated such linked data sets. Retrieval of data was slow. A link was
used for the first record in the database that contained pointers to other pieces of
data. These pointers were traversed one after the other until the required record was
found and returned by the system. Even a simple query required searching through
the entire data set till the matching record(s) were found. There was, essentially, no
concept of ‘find’ or ‘search’ because such operations were highly expensive.
In 1968, IBM developed its own DBMS, known as IMS, written for the Apollo
program and tested on the System/360. It was similar in concept to
CODASYL, except that it was a strictly hierarchical model whereas the DBMS by
CODASYL was a network model. Both of these concepts were called navigational
databases models because of their method of accessing data—navigating through
databases. Thus, before the advent of the relational model of databases, the network
model and the hierarchical model were popular. CODASYL’s IDS and IDMS and
Cincom’s TOTAL database were termed network databases, whereas IBM’s
IMS was known as a hierarchical database.
1.3.2 Relational DBMS of the 1970s
Edgar Codd (while working at IBM in California) found that the navigational models
lacked many features and, therefore, outlined a new approach to database construction
in his paper ‘A Relational Model of Data for Large Shared Data Banks’, giving rise
to the Relational Database Management System (RDBMS).
He developed a new system for storage while working with large databases.
He suggested the use of fixed-length tables for this purpose instead of linked
lists. The linked-list system was found to be very inefficient, more so in cases where,
while storing databases, a few fields/records were to be left empty. By using the
relational model, this problem was solved as data was split into a series of normalized
tables, and optional elements were moved out of the main table. In a relational
model, records are linked together using a key.
For example, to track information about user names, login information,
addresses and phone numbers, and so on, data is kept in different normalized tables
like a user table, an address table, a phone number table, and so on. Such separate
tables were not present in the navigational approach, where the entire data was placed
in a single record with no place for unused data.
In the relational model, a few bits of information serve as a key, which uniquely
defines a record. To collect information about a user, the information stored in the related
tables is found by locating this key. For example, if any field is unique, say the
login name of a user, other attributes, like addresses and phone numbers are recorded
with the login name as key. For each login name there is a record which is unique.
In case of a bank account, the user’s name may not be unique, here the account
number is taken as unique and serves as a key.

To avoid looping for collecting records, Codd gave a solution of using a set-oriented
language. RDBMS uses a branch of mathematics known as tuple calculus,
and it supports all the operations of normal databases, such as inserting, updating,
deleting, etc. It also provides a simple system for retrieving sets of data in a single
operation.
In the beginning of 1973, Ingres Corporation developed and tested its first product,
and by 1979 it was widely used. Ingres used a data access language called QUEL,
which was based on the relational model. System R was developed at IBM, and this
led to IBM’s SQL/DS and DB2, and then to Oracle, HP’s Allbase and Tandem’s
Non-Stop SQL. System R used SEQUEL as its query language.
In 1976, P. Chen developed the Entity–Relationship (E–R) model for database
design. This brought about important conceptual data models. Such high-level
modelling enabled designers to focus more on data instead of logical table structures.
1.3.3 Later Trends
Early 1980s: RDBMS was used for commercial purposes in business and by the
mid-1980s, Structured Query Language (SQL) got the status of an established query
language. DB2 became the marketable product of IBM, and the earlier network
and hierarchical models started losing the market. Even today, some legacy systems are using
them. The IBM PC was developed and this caused the emergence of many database
companies and many products, such as RIM, RBASE 5000, PARADOX, OS/2
Database Manager, Dbase III, and IV entered the market. Further development led
to the products like Foxbase, Visual FoxPro, Watcom SQL, etc.
Early 1990s: The popularity of databases spread in the field of business and
commerce and during this period, efforts were focussed on the development of
client tools for more powerful applications, such as PowerBuilder (Sybase), Oracle
Developer, VB (Microsoft), and so on. The client–server model got its place in
the business world, influencing business decisions. Personal productivity tools, such as
spreadsheets and databases, such as Excel/Access (MS) and ODBC became more
and more popular. This was the beginning of prototypes for Object Database
Management Systems (ODBMS).
Mid-1990s: World Wide Web/Internet appeared on the horizon, and its use
was started by people, rather than by developers. Remote access to computer systems
with legacy data became a possibility and client–server became popular and average
users were able to access the Internet through their desktop computers. Desktop
microcomputers became cheaper and got more popular. Web/DB started growing
exponentially.
Late-1990s: With the exponential growth of the Internet and Web technology
and also hardware products, many software tools were also developed. These are:
Active Server Pages, Front Page, Java Servlets, JDBC, Enterprise Java Beans,
ColdFusion, Dream Weaver, Oracle Developer 2000, and so on. Many open-source
software products, like gcc, CGI, Apache, MySQL, and so on, came into widespread
use. During this period, Online Transaction Processing (OLTP) and
Online Analytic Processing (OLAP) became popular.
Early 21st century: The Internet got the status of information super highway.
More and more activities became Net-based and many business processes were
made digital. Growth of database applications was more than that of the Internet.
More and more interactive applications appeared with the availability and use of
personal digital assistants (PDAs), point of sale (POS) transactions and the availability
of vendors for these products. More user-friendly gadgets appeared in the market.
In large database markets, three main companies, IBM (which bought Informix), Microsoft
and Oracle, dominated the scene. Online business transactions were carried out. Net
banking, online fund transfer, etc., have become commonplace. Data mining, data
warehousing and data marts are commonly used techniques today.
Future trends: Huge systems with high memory demand have been launched
in the market. Large scientific and technological databases, such as genome project,
geological, national security and space exploration data need to be handled.
More powerful database applications will come in the future as successors to
SQL. XML is gaining popularity and becoming more and more powerful. XML
with Java for databases is going to be the next big thing of the future. Distributed
transaction processing is emerging for business planning applications.
Object-oriented applications will become increasingly popular. Object
Database Management Group (ODMG) standards will bring products in the market
that may perhaps beat all other previous products.

1.4 FILE SYSTEM VERSUS DBMS


A comparison between a file processing system and a DBMS is given below:

File processing system versus DBMS

• DBMS: It is a shared set of logically related data, describing the data designed to meet the information requirements of an organization.
  File processing system: It is a set of application programs that serves the end-users in various ways, such as by producing reports. Each program defines and manages its own data.
• DBMS: Redundancies and inconsistencies in data are reduced due to a single file format, and duplication of data is eliminated.
  File processing system: Redundancies and inconsistencies in data exist due to differing file formats and duplication of data.
• DBMS: Data is easily accessed due to standard query procedures.
  File processing system: Data cannot be easily accessed, since special application programs are needed to access the data.
• DBMS: Isolation/retrieval of the required data is possible due to the common file format, and there are provisions to retrieve data easily.
  File processing system: Data isolation is difficult due to different file formats and also because new application programs have to be written.
• DBMS: It provides program and data independence.
  File processing system: There is a dependency between application programs and data, because the definition of the data is embedded in the application program rather than stored separately.
• DBMS: Integrity constraints, whether new or old, can be enforced as per need. Data integrity is determined on the data itself.
  File processing system: Introduction of integrity constraints is tedious, and new application programs have to be written in order to determine data integrity in each application.
• DBMS: Atomicity of updates is possible.
  File processing system: Atomicity of updates may not be maintained.
• DBMS: Several users can access data at the same time, i.e., concurrently, without problems.
  File processing system: Concurrent accesses may cause problems such as inconsistencies.
• DBMS: Security features can be enabled very easily.
  File processing system: It may be difficult to enforce security features.
• DBMS: Cost of the system depends on the application.
  File processing system: It involves high cost, as additional hardware is needed and conversion of the existing system is required.
• DBMS: A DBMS is usually a large piece of software, adding to the overhead.
  File processing system: The size of the software is small as compared to a DBMS.

1.5 ADVANTAGES OF DBMS

The following are the advantages of DBMS:


Centralized control of data and data source: In a DBMS, all files are integrated
into one system thus making data management more efficient by providing
centralized control on the operational data. By providing centralized control of data,
a DBMS brings a number of advantages including reducing redundancy, avoiding
inconsistencies, sharing of data, giving better service to users, enforcing standards,
etc.
Data consistency: Minimal data redundancy means improved data
consistency; it also improves data quality.
Data independence: Data independence can be defined as the immunity of
applications to change in the physical representation and access technique. The
provision of data independence is a major objective for database systems. If the
data is well designed, the user can access different combinations of the same data
for query and reporting purposes.
Data integration: Since related data is stored in one single database, enforcing
data integrity is much easier. In a DBMS, the integrity rules can be enforced with
minimum programming in the application programs.
Data sharing: Related data can be shared across programs since data is stored
in a centralized manner. This provides improved data sharing, data accessibility
and responsiveness. Even new applications can be developed to operate against the
same data.
Access control: A DBMS should provide facilities for protecting the security
and privacy of data from unauthorized users.
Ease of application development: A DBMS provides increased productivity
for application development. The application programmer is not required to build
the functions that handle issues such as data integrity, security and concurrent access.
All that a programmer is required to do is implement the application business rules.
This makes the development of application more convenient. It is also easier to add
functional modules than it is in file-based systems.
Reduced maintenance: Though data in a database can be shared by many
applications and systems, maintenance is minimal and easy due to the centralized
nature of the system.

1.6 DESCRIBING AND STORING DATA IN DBMS TRANSACTION MANAGEMENT

Databases store large amounts of information that must be available for long periods
of time. During this period, data is continually accessed and processed. Databases
are generally stored permanently on magnetic disk. This is because databases are
too large to fit entirely in the main memory.
Most database applications will only need a small portion of the database at
once for processing. When a portion of the database is needed, it is first located on
the disk, copied to main memory for processing, and then rewritten to the disk if
any changes are made.
File organization is a way of arranging the records in a file when the file is
stored on the disk. Data files are organized so as to facilitate access to records and
to ensure their efficient storage. There are a number of different primary file
organizations. Actually, these file organizations determine how the records of a file
are physically written to the disk, and how they can be accessed. A heap file
(unordered file) places records on disk by appending new records to the end of the
file, and a sorted file (sequential file) keeps the records ordered by the value of a
particular field, called a sort key (both organizations are sketched after the list
below). Selection of file organization depends on the
following two factors:
• Typical DBMS applications need a small subset of the database at any given
time.
• When a portion of the data is needed it must be located on disk, copied to
memory for processing and rewritten to disk if the data was modified.
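
As a rough illustration of the two organizations described above, the following Python sketch (a minimal in-memory model, not code from any particular DBMS; the record layout is hypothetical) contrasts appending to a heap file with maintaining a sorted file on a sort key:

    import bisect

    # Heap (unordered) file: new records are simply appended at the end.
    # Insertion is cheap, but a search must scan the whole file.
    heap_file = []

    def heap_insert(record):
        heap_file.append(record)

    # Sorted (sequential) file: records stay ordered by a sort key
    # (here the employee code), so lookups can use binary search,
    # but every insertion must preserve the order.
    sorted_file = []

    def sorted_insert(record):
        bisect.insort(sorted_file, record)  # record = (sort_key, fields...)

    heap_insert(("E02", "Hari Nandan Tunga"))
    sorted_insert(("E02", "Hari Nandan Tunga"))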

1.7 STRUCTURE OF A DBMS


Database management systems are very complex systems. To understand general
database concepts and the structure and capabilities of a DBMS, it is useful to
examine the architecture of a typical database management system.
There are two different ways to interpret the architecture of a typical DBMS:
the logical DBMS architecture that deals with the way data is stored and presented
to users and the physical DBMS architecture that is concerned with the software
components that make up a DBMS.
1.7.1 Logical DBMS Architecture
The logical architecture of a DBMS is known as the three-level architecture. It was
suggested by the American National Standards Institute/Standards Planning and
Requirements Committee (ANSI/SPARC).
The logical architecture describes how users perceive data in a database. A
DBMS provides the user with an abstract view of data. Users can access and
manipulate data without worrying about where it is located or how it is actually
stored and maintained. This is done by defining levels of abstraction. There are
three levels of abstraction, which are as follows:
• Physical or internal level is the lowest level of abstraction. It describes
how data is actually stored on physical media. It provides a low-level
description of the physical database.
• Logical or conceptual level is the next higher level of abstraction. It
describes what data is stored and how data is interrelated.
• External or view level is the highest level of abstraction as seen by a user.
It provides a window on the conceptual view, which allows the user to see
only the data of interest to them, that is, this level of abstraction describes
only a part of the entire database or a subset of the database.

Figure 1.1 shows the logical architecture of a typical DBMS.

Fig. 1.1 Logical Architecture of DBMS (external views at the top, the conceptual level beneath them, and the internal level resting on physical storage)
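
To make the three levels concrete, here is a minimal sketch using Python's built-in sqlite3 module (an illustration of the idea only; SQLite is not a system covered in this book, and the table and view names are hypothetical). The table definition plays the role of the conceptual schema, the view acts as an external schema, and the internal level is handled by the storage engine:

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # Conceptual level: what data is stored and how it is interrelated.
    conn.execute("""CREATE TABLE employee (
                        emp_code TEXT PRIMARY KEY,
                        name     TEXT NOT NULL,
                        salary   REAL)""")

    # External level: a view showing only the data of interest to one
    # group of users (here, a directory without salaries).
    conn.execute("""CREATE VIEW emp_directory AS
                    SELECT emp_code, name FROM employee""")

    conn.execute("INSERT INTO employee VALUES ('E01', 'Jayanta Dutta', 42000)")
    print(conn.execute("SELECT * FROM emp_directory").fetchall())

    # Internal level: how rows are laid out in pages on disk is decided
    # by the storage engine, invisible to users of the view or the table.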

The three-level database architecture allows a clear separation of data
representation as the users see it from the physical data structure layout. This separation
of different views of data is flexible and adaptable. This flexibility and adaptability
is known as data independence.
Since a schema defines each view, there exist several schemas in the database
partitioned according to the levels of abstraction. The internal view is expressed by
the internal schema, which contains the definition of the stored record, the method
of representing the data fields and the access aids used. The conceptual schema
defines this conceptual view. There is only one conceptual schema per database.
Each external view is described by means of a schema called an external schema or
a subschema. Any number of external schemas can be defined and they can overlap
each other.
1.7.2 Mapping between the Levels
Mapping is the transformation of requests and results between different levels of
abstraction.
Mapping can be conceptual/internal or external/conceptual.
The conceptual/internal mapping exists between the conceptual and internal
levels. It defines the correspondence between the records and the fields of the
conceptual view and the files and data structures of the internal view.
If a modification is made to the structure of the stored database, then
accordingly, a change must be made in the conceptual/internal mapping to ensure
that the view from the conceptual level remains as it is. In other words, if the physical
structure of the database gets modified, the DBMS has knowledge of these
modifications and continues to provide the same logical view as before the changes.
This is called physical data independence.
The conceptual/external mapping exists between the external and conceptual
levels. This defines the correspondence between a particular external view and the
conceptual view. If the structure of the database is modified at the conceptual level,
then the external/conceptual mapping must also change accordingly so that the
view from the external level remains constant. This mapping provides logical data
independence for the database.
Two types of data independence can be defined with respect to the three-
level architecture, that is, logical data independence and physical data independence.
The ability to modify the conceptual schema without modifying the external schemas
or application programs is called logical data independence. At this level, changes
are usually made to improve performance.
The ability to modify the internal schema without changing the conceptual
schema or external schemas is known as physical data independence.
If the conceptual view is separated from the internal view, it allows the
provision of a logical description of the database without any need for specifying
physical structures.
Modifications to the internal schema may be required because some physical
files need reorganization. This is usually done when the logical database structure
is modified. Sometimes, a change in the logical structure of the data is also needed.
1.7.3 Physical DBMS Architecture
The related and interconnected software components of a DBMS are described by
the physical architecture. At an extremely basic level, the physical DBMS architecture
can be split into two parts: the back end and the front end. Management of the
physical database and provision of relevant support and mappings for the internal,
external and conceptual levels is the responsibility of the back end. In addition, the
back end is also responsible for the other advantages of a DBMS, such as access
control, security and integrity. (See Figure 1.2)
Fig. 1.2 Physical DBMS Architecture (front end: end-user applications and database utilities communicating through an API; back end: query language processor with parser and optimizer, and the DBMS engine with transaction manager and file manager, operating on the physical database)

The front end is really just any application that runs on top of the DBMS and
acts as a user interface. These may be applications provided by the DBMS vendor,
the user or a third party.
The back end can be further divided into the functional software components,
which are as follows:
Data manipulation language (DML) precompiler: It converts DML
statements embedded in an application program to normal procedure calls in a host
language. Through the DML precompiler, DML commands and application programs
written in host language are separated. DML commands are sent to the DML
interpreter for translating into object code for database access and the rest of the
program is sent to the compiler of the host language. Object codes for the commands
and the rest of the program are combined together through linking and sent to the
DBMS engine (also called database manager) for execution. The precompiler must
interact with the query processor in order to generate the appropriate code.
DML compiler: It translates DML statements into low-level instructions that
a query processor understands.
Data definition language (DDL) interpreter or compiler: It interprets DDL
statements and records definitions into data dictionary.
Query evaluation engine: It executes low-level instructions generated by
the DML compiler.
It mainly deals with solving all problems related to queries and query
processing. It helps the database system simplify and facilitate access to data.
Database manager: A program module providing an interface between low-
level data (stored in the database) and the application programs and queries which
are submitted to the system.
The functions of the database manager include:
• Efficient storage, retrieval and updating of data
• Interaction with the file manager
• Ensuring a state of consistency in the database irrespective of system
failures
• Maintenance of integrity controls and user authority for data accession
File manager: It manages allocation of disk space and data structures used
to represent information on disk.
In addition, several data structures are required for physical system
implementation.
• Data files: Data are stored in data files, which store the database itself.
• Data dictionary: This is a critical element in the DBMS.
The result of the compilation of DDL statements is a set of tables
stored in a special file called the data dictionary, which documents the data in a
database. A data dictionary contains metadata (data about data), that is,
data about the storage details of the database.
• Indices: To improve the performance of a DBMS, a set of access aids in
the form of indexes are usually provided in the database systems. An index
is a data structure that helps access data through a given part of its
value. It provides fast access to data. There are several kinds of indexes,
and many techniques are used to implement them; however, each technique
is specific to a particular database application. A DBMS provides
commands to build, maintain and destroy such indexes (see the sketch
after this list).
• Statistical data file: The query processor uses statistical data to optimize
queries.
• Log file: Each log record comprises the values for database items before
and after a modification, and it can be utilized for the purpose of recovery.
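
As a small sketch of the index commands mentioned above (again using Python's sqlite3 module purely for illustration; the table and index names are hypothetical):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (emp_code TEXT, name TEXT, salary REAL)")

    # Build an index: the DBMS maintains an auxiliary data structure keyed
    # on name, so lookups by name need not scan the whole table.
    conn.execute("CREATE INDEX idx_emp_name ON employee (name)")

    # The index is maintained automatically on every insert/update/delete.
    conn.execute("INSERT INTO employee VALUES ('E01', 'Jayanta Dutta', 42000)")

    # Destroy the index when it no longer pays for its upkeep.
    conn.execute("DROP INDEX idx_emp_name")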

1.8 PEOPLE WHO WORK WITH DATABASES


Databases are managed by DBMSs available on popular platforms, such as
Structured Query Language (SQL) servers and others, which interoperate over multiple
operating systems. Network users (‘web people’) are able to access databases
over a network.
There are mainly two types of people who are engaged in work with databases.
They are known as database implementers and database end-users. The implementers
develop DBMS software, whereas the end-users use software written by database
application developers. Basically, database developers work for end-users to develop
software tools, such as DBMS vendors. DBMS vendors facilitate various types of
tools for users, such as report generation tools, spreadsheets providing charts, and
so on. Personal databases are maintained by individuals, whereas enterprise-based
and complex types of databases, such as banking sector databases are maintained
by Database Administrators (DBA). A DBA is responsible for designing the
conceptual and physical schemas, security and authorization, database availability,
recovery from failures and database tuning. In some exceptions, people are not
authorized to access data easily from important databases. Foxpro/Visual Foxpro
allows Macintosh users to get and share database files with individual users over a
network.
These days, the prime question is ‘Why do people work with databases?’
They can ask queries and prepare reports as per database queries. Some people pay
an initial fee to access databases to grow their businesses, such as private banks.
They take complete database comprising names and mobile/phone numbers of people
to sell new products to them.

1.9 OVERVIEW OF DATABASE DESIGN


Database design refers to a system that stores and retrieves data systematically
from the database. However, first you must know the description of a few basic
terms of DBMS so that you can design a suitable database as per the requirement of
an organization. These are as follows:
• Data: Data means known facts that can be recorded and used to produce
useful information, e.g., names, telephone numbers and addresses of the
employees of an organization.
• Database: A database is a collection of interrelated data. For example, you
may create a database by recording the names, telephone numbers and
addresses of employees in an indexed address book or on a diskette, using
a personal computer and software such as MS ACCESS or MS EXCEL.
This recorded information is a collection of related data with an implicit
meaning and hence is called a database.
• Defining a database: This involves specifying the data types, structures and
constraints for the data to be stored in the database (see the sketch following this list).
• Constructing a database: This is a process of storing real data on some
storage medium with the help of DBMS.
• Manipulating a database: This involves performing functions, such as
querying the database to retrieve specific data, updating the database to reflect
changes made by the user and generating reports from data.
• Database system: The database and DBMS software together form a database
system.
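
The defining, constructing and manipulating steps above can be seen in one short sketch (illustrative only; it uses Python's sqlite3 module, and the file, table and column names are hypothetical):

    import sqlite3

    conn = sqlite3.connect("company.db")  # hypothetical database file

    # Defining the database: specify types, structure and constraints.
    conn.execute("""CREATE TABLE IF NOT EXISTS employee (
                        emp_code TEXT PRIMARY KEY,
                        name     TEXT NOT NULL,
                        phone    TEXT)""")

    # Constructing the database: store real data on the storage medium.
    conn.execute("INSERT OR IGNORE INTO employee "
                 "VALUES ('E01', 'Jayanta Dutta', '0431-100000')")
    conn.commit()

    # Manipulating the database: query, update and generate reports.
    conn.execute("UPDATE employee SET phone = '0431-100001' WHERE emp_code = 'E01'")
    for name, phone in conn.execute("SELECT name, phone FROM employee"):
        print(name, phone)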
To understand the basics of DBMS design, you must know the terms and
definitions that are used in DBMS technology. DBMS is a software program that
may run on a user machine or a server computer. DBMS accepts queries from users
and responds to these queries. The DBMS design has the following features:
• Structured data: DBMS enables you to structure the data as tables, records
or objects.
• Query language: DBMS provides a query language, such as SQL to
process the user requests.
• Multi-user access: DBMS allows users to access data stored in a database.
At the same time, it provides security features that restrict some users
from viewing or manipulating data.
• Data dictionary: DBMS provides a data dictionary that contains the
structure of a database.
• Data abstraction: It allows a DBA to logically separate data from the
programs that use the data. There are three levels of abstraction in a
database: external, conceptual and internal. The external level represents
the user view of the database; the conceptual level allows you to map the
internal and external levels; and the internal level represents the operating
system and DBMS level.

1.10 ENTITIES
Any ‘thing’, that is, an object, a being or an event that has an independent existence
in the real world is called an entity. The entity may have a physical existence in the
real world, such as in the case of a house, a bicycle or an employee. It may also be
an object with a conceptual existence, for example, a company, a job or a university
course.

1.11 ATTRIBUTES AND ENTITY SETS


An entity is described by a set of properties called attributes. An employee entity
might have attributes, such as employee code, employee name, salary, date of joining,
etc.

An entity set is a set of entities of the same type that share the same properties
or attributes.
The individual entities that constitute a set are said to be the extension or
instance of the entity set. As all entities in an entity set have the same attributes,
entity sets also share the attributes of the contained entities. Therefore, attributes
are descriptive properties possessed by each member of an entity set.
In other words, an entity is determined by its instantiations. A particular
instance of an attribute is a value. For example, ‘Hari Nandan Tunga’ is one value
of the attribute Name. The value of the attribute can be different for each entity in a
set. A set of permitted values exists for each attribute. This is called domain or a
value set of that attribute. The domain of the attribute employee_name, for instance,
may be the set of all text strings of a particular length.
Types of Attributes
An attribute, as used in the E–R model, can be characterized by the following attribute
types (Figure 1.3 shows the different types of attributes):
• Simple vs composite: A simple attribute has an atomic value, i.e., an attribute
that cannot be decomposed into subparts. For example, employee code, salary
of an employee. In contrast, composite attributes can be divided into subparts
(that is, other attributes). For example, employee-name could be structured
as a composite attribute consisting of first-name, middle-initial and last-name.
Composite attributes help us to group together related attributes making
modelling more readable and flexible. The advantage of defining a composite
attribute for an entity is that a user can refer to the entire composite attribute
as well as only a component of the attribute according to the need.
• Single-valued vs multi-valued: An attribute is said to be single-valued if it
has a single value for any instance. There may be instances where an attribute
has a set of values for a specific entity. This type of attribute is said to be
multi-valued. For example, an employee may have several degrees or he or
she may have several different phones, each with its own number. It is to be
noted that a single-valued attribute that changes value over time does not
imply that it is multi-valued. A single-valued attribute is not necessarily a
simple attribute.
• Stored vs derived: Although most of the attributes will be stored, there will
be some attributes that can be derived or computed from other related
attributes. A derived attribute need not be physically stored within the database;
instead, it is derived by using an algorithm (see the sketch after Figure 1.3). For example, a person’s AGE
attribute can be derived by subtracting the date of birth (DOB) from the current
date. Similarly, the annual salary of an employee can be derived by multiplying
the salary by twelve.
• Null attributes: When a certain entity does not possess a value for an attribute,
a NULL value is used. This will mean ‘not applicable’ or that the ‘value is
unknown’ or ‘non-existent’. A null value for ‘apt_number’ attribute, for
example, could indicate that the address is not inclusive of an apartment
number or that there is an apartment number which is not known to us, or
that it is unknown whether an apartment number is included in the employee’s
address or not.
Fig. 1.3 E–R Diagram showing Different Types of Attributes (an Employee entity with key attribute Emp_id, simple attributes Name and Date of birth, composite attribute Address made up of Door no., Street and City, multivalued attribute Ph no., and derived attribute Age)
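
As a small illustration of a derived attribute (a sketch only; the function name is hypothetical), AGE can be computed from the stored DOB rather than stored itself:

    from datetime import date

    def derive_age(dob: date, today: date) -> int:
        """AGE is a derived attribute: computed from the stored DOB,
        never physically stored in the database."""
        had_birthday = (today.month, today.day) >= (dob.month, dob.day)
        return today.year - dob.year - (0 if had_birthday else 1)

    print(derive_age(date(1990, 6, 15), date(2024, 1, 1)))   # 33
    print(derive_age(date(1990, 6, 15), date(2024, 6, 15)))  # 34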

Attributes of an entity set may be divided into the following two groups:
• Identifiers: A set of one or more attributes that uniquely identifies an instance
of an entity. These are also termed as key attributes; for example, employee
code attribute of the EMPLOYEE entity set.
• Descriptors: These attributes provide a non-unique characteristic of an entity
instance. These are said to be non-key attributes; for example, employee name,
salary, date of join, etc., attributes of the EMPLOYEE entity set.

1.12 RELATIONSHIPS AND RELATIONSHIP SETS


A relationship represents an association between two or more entities. For example,
we can define a relationship that associates an employee, ‘Jayanta Dutta’ whose
employee code is ‘E01’ with project ‘P01’. The relationship specifies that the
employee E01, i.e., ‘Jayanta Dutta’ is assigned on project ‘P01’.
A relationship type or relationship set is a set of ‘similar in kind’ relationships
among one or more entities, that is, a set of relationships of the same type.
Classifying Relationships
The degree of a relationship is the number of entities associated with the relationship.
The n-ary relationship is the general form for degree n. Relationships are unary
(recursive), binary and ternary where the degree is 1, 2 and 3, respectively. (See
Figure 1.4)
The mapping of associated entity instances in a relationship is described by
the connectivity of the relationship.
The actual number of related occurrences for each of the two entities is called
the cardinality of the relationship.

The number of relationships wherein an entity can participate in a particular
relationship type is specified by the cardinality ratio. In the most common case of
binary relationships, the cardinality ratios are as follows:
• One-to-one (1:1)
• One-to-many (1:M or 1:*)
• Many-to-one (M:1 or *:1)
• Many-to-many (M:N)

Fig. 1.4 Binary Relationships (the four mappings between entity sets A and B: one-to-one, one-to-many, many-to-one and many-to-many)

One-to-one: A one-to-one relationship type (1-1 or 1:1) means that there is
at the most one entity from each type participating in the relationship.

Fig. 1.5 E–R Diagram showing Non-uniform Binary 1:1 Relationship (Teacher participates partially and School totally in the 1:1 relationship Head_of)

Figure 1.5 depicts a non-uniform binary 1:1 relationship. As the relationship
Head_of is connected to two entities, Teacher and School, it is a binary
relationship. Again, a school possesses only one single head master who belongs to
the Teacher entity set. On the other hand, one person cannot act as a head master of
more than one school. So at the most one entity from each entity set can be associated
with the relationship Head_of. Observe that not every teacher acts as the head
master of a school; there are some teachers besides the head master. So, the
Teacher entity participates partially in the relationship, but as every school possesses
a head master, the entity School participates totally.
Figure 1.6 is an example of a uniform binary 1:1 relationship:

Fig. 1.6 E–R Diagram showing Uniform Binary 1:1 Relationship (Employee and Chair both participate totally in the 1:1 relationship Sits_on)

In Figure 1.6, we see that Sits_on is a binary relationship as it involves two
entity sets. Since, at a time, only one employee can sit on a chair, only one entity
from the Employee set can participate in the relationship Sits_on; likewise, only
one entity of the Chair entity set is involved, so it is a binary 1:1 relationship.
As each employee has a chair to sit on and every chair is meant for one employee
only, the participation here is total on both sides.
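To see how such a binary 1:1 relationship might be realized in a relational schema,
consider the following sketch; all table and column names are hypothetical, and the
UNIQUE constraints on both foreign keys are what enforce the 1:1 mapping.

-- A minimal sketch of a binary 1:1 relationship (Sits_on).
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY
);
CREATE TABLE chair (
    chair_id INTEGER PRIMARY KEY
);
-- Each employee sits on at most one chair, and each chair is
-- used by at most one employee.
CREATE TABLE sits_on (
    emp_id   INTEGER NOT NULL UNIQUE REFERENCES employee(emp_id),
    chair_id INTEGER NOT NULL UNIQUE REFERENCES chair(chair_id),
    PRIMARY KEY (emp_id, chair_id)
);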
One-to-many: A relationship wherein a single entity of a single entity type
can be related to multiple entities of another type is called a one-to-many
relationship type (1-N or 1:N). However, each entity of the other type is related
to one entity of the first type, at the most (see Figures 1.7 (a) and (b)).
There exists a one-to-many relationship between the entity Manager and the
entity Employee, because there are several employees working under a manager.
The reverse of this relationship would be a many-to-one relationship, that is,
Employee to Manager, because many employees report to a single manager. If there
is an entity set Employee, for instance, there could be just a single instance of
the entity set Manager to whom that employee directly reports (the assumption here
is that no employee reports to more than one manager).

Fig. 1.7 (a) E–R Diagram showing Binary 1:N Relationship (Manager Manages Employees)

Fig. 1.7 (b) E–R Diagram showing Binary N:1 Relationship (Employees Work_under Manager)
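As a relational sketch of a binary 1:N relationship (all names are hypothetical,
and placing the foreign key on the ‘many’ side is one common translation, not the
only one):

CREATE TABLE manager (
    mgr_id INTEGER PRIMARY KEY
);
-- Many employees may reference the same manager, but each
-- employee references at most one manager.
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    mgr_id INTEGER REFERENCES manager(mgr_id)
);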

Many-to-many: A many-to-many relationship type (M-N or M:N) is one in
which a single entity of one entity type is related to N entities of another type
and vice versa. Figure 1.8 shows the M-N relationship.
The relationship between the entity EMPLOYEE and the entity PROJECT
can be understood as follows: each employee could be involved in several projects,
and various employees could be working on a given project. This EMPLOYEE-PROJECT
relationship is a many-to-many relationship.

Fig. 1.8 E–R Diagram showing Binary M:N Relationship (Employee Works_for Project)
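A relational sketch of a binary M:N relationship follows (names are hypothetical);
an M:N relationship needs a table of its own, whose key combines the keys of both
entity sets.

CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY
);
CREATE TABLE project (
    proj_id INTEGER PRIMARY KEY
);
-- An employee may appear with many projects, and a project with
-- many employees.
CREATE TABLE works_for (
    emp_id  INTEGER REFERENCES employee(emp_id),
    proj_id INTEGER REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);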

Unary Relationship
Up to now, we have worked with binary relationships only. Now, we shall examine
the unary relationship, where only one entity set participates in the relationship.
Like the binary relationship, the unary relationship can also be subdivided into
the following categories:
• Unary 1:1
• Unary 1:N
• Unary M:N
Unary 1:1

Fig. 1.9 E–R Diagram showing Unary 1:1 Relationship (Employee Married_to Employee,
in the roles of wife and husband)

In Figure 1.9, an employee is married to another employee. Both employees
here belong to the same Employee entity set, so only the one entity set Employee
is associated with the relationship set Married_to. If we assume that one employee
can get married to only one employee, then it becomes a unary 1:1 relationship.
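A relational sketch of this unary 1:1 relationship (names are hypothetical); the
foreign key refers back to the same table, and UNIQUE keeps the mapping one-to-one.

CREATE TABLE employee (
    emp_id    INTEGER PRIMARY KEY,
    spouse_id INTEGER UNIQUE REFERENCES employee(emp_id)
);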
Unary 1:N

Fig. 1.10 E–R Diagram showing Unary 1:N Relationship (Employee Manager_of Employee,
in the roles of manager and subordinate)

Here, Employee is an entity set to which belongs an employee who is
manager_of other employees belonging to the same Employee set. So here, the
employee entity who is the manager is associated with more than one employee.
Thus, it is a unary 1:N relationship (here we assume that an employee works under
only one manager). Figures 1.10 and 1.11 depict the unary 1:N relationship and the
unary M:N relationship respectively.
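A relational sketch of this unary 1:N relationship (names are hypothetical); without
a UNIQUE constraint, one manager may have many subordinates, while each employee
still records at most one manager.

CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    mgr_id INTEGER REFERENCES employee(emp_id)
);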
Unary M:N

Fig. 1.11 E–R Diagram showing Unary M:N Relationship (Employee Guarantor_of Employee)


Ternary Relationship

Fig. 1.12 E–R Diagram showing Ternary Relationship (Doctor Prescribes Medicine to Patient)

In ternary relationships (see Figure 1.12), more than two entities are involved.
In this example, Prescribes is a ternary relationship that associates three entity sets,
Doctor, Medicine and Patient.
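A relational sketch of this ternary relationship (names are hypothetical); the
relationship table carries one foreign key per participating entity set.

CREATE TABLE doctor   (doc_id INTEGER PRIMARY KEY);
CREATE TABLE patient  (pat_id INTEGER PRIMARY KEY);
CREATE TABLE medicine (med_id INTEGER PRIMARY KEY);
CREATE TABLE prescribes (
    doc_id INTEGER REFERENCES doctor(doc_id),
    pat_id INTEGER REFERENCES patient(pat_id),
    med_id INTEGER REFERENCES medicine(med_id),
    PRIMARY KEY (doc_id, pat_id, med_id)
);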

1.13 ADDITIONAL FEATURES OF E–R MODEL


The E–R model is beneficial for a database designer in various ways. The following
are the additional features of the E–R model:
• The constructs used in the E–R model can easily be transformed into relational
tables.
• The E–R model is simple and easy to understand and is, therefore, used by a
database designer to communicate a database design to users accessing the
database.
• It is used as a design plan by a database developer to implement a data model
in a database management system.
Some of the constructs in the E–R model allow subtle properties of data to be
described. The expressiveness of the E–R model is a big reason for its widespread
use. The other additional features are as follows:
1.13.1 Works-in-Relationship
Consider the works-in-relationship shown in Figure 1.13. An employee can work in
several departments, and a department can have several employees, as illustrated in
the works-in-relationship shown in the figure. Employee 231-31-5368 has worked
in Department 51 since 3/3/93 and in Department 56 since 2/2/92. Department 51
has two employees. Now consider another relationship set called Manages between
the Employees and Departments entity sets such that each department has at the
most one manager, although a single employee is allowed to manage more than one
department. The restriction that each department has at the most one manager is an
example of a key constraint, and it implies that each Department entity appears in
at most one Manages relationship in any allowable instance of Manages. This
restriction is indicated in the E–R diagram of the figure by using an arrow from
Departments to Manages. Intuitively, the arrow states that given a Departments
entity, we can uniquely determine the Manages relationship in which it appears.
Fig. 1.13 Key Constraint in E–R Model (Employees, with attributes ssn, name and lot,
related to Departments, with attributes did, dname and budget, through Manages,
with descriptive attribute since)

A relationship set like Manages is sometimes said to be one-to-many, to
indicate that one employee can be associated with many departments (in the capacity
of a manager), whereas each department can be associated with at most one employee
as its manager. In contrast, the works-in-relationship set, in which an employee is
allowed to work in several departments and a department is allowed to have several
employees, is said to be many-to-many. If we add the restriction that each employee
can manage at most one department to the Manages relationship set, which would
be indicated by adding an arrow from Employees to Manages, we have a one-to-
one relationship set.
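One way the Manages key constraint might be captured relationally (a sketch; the
attribute names follow Figure 1.13, but the types are assumptions): make did the
primary key of the Manages table, so each department can appear in at most one
Manages tuple.

CREATE TABLE employees   (ssn CHAR(11) PRIMARY KEY);
CREATE TABLE departments (did INTEGER  PRIMARY KEY);
CREATE TABLE manages (
    ssn   CHAR(11) REFERENCES employees(ssn),
    did   INTEGER  REFERENCES departments(did),
    since DATE,
    PRIMARY KEY (did)  -- each department has at most one manager
);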
1.13.2 Maintaining Ternary Relationships
If an entity set E has a key constraint in a relationship set R, each entity in an
instance of E appears in at most one relationship in (a corresponding instance of) R.
To indicate a key constraint on entity set E in relationship set R, we draw an arrow
from E to R. Consider, for example, a ternary relationship with key constraints in
which each employee works in at most one department and at a single location. Each
department can be associated with several employees and locations, and each location
can be associated with several departments and employees; however, each employee is
associated with a single department and location.

1.13.3 Working with Participation Constraints


The key constraint on Manages tells us that a department has at the most one manager.
A logical question to ask is whether every department has a manager. Let us say that
every department is required to have a manager. This requirement is an example of
a participation constraint; the participation of the entity set Departments in the
relationship set Manages is said to be total. A participation that is not total is said to
be partial. As an example, the participation of the entity set Employees in Manages
is partial, since not every employee gets to manage a department. Revisiting the
works-in-relationship set, it is natural to expect that each employee works in at
least one department and that each department has at least one employee. It means
that the participation of both Employees and Departments in Works In is total. If
the participation of an entity set in a relationship set is total, a thick line connects
the two; independently, the presence of an arrow indicates a key constraint.

Self-Instructional Material 23
1.13.4 Key Constraints

We can define constraints on supertype-subtype relationships to restrict the
participation of entities in the subtypes.
• Inclusion Constraints
o The disjoint constraint specifies that the subtypes of a supertype are disjoint.
This means that an entity can be a member of only one subtype.
o The non-disjoint (OVERLAPPING) constraint specifies that the subtypes are
overlapping, and an entity may be a member of more than one subtype.
Based on the inclusion constraint, a generalization hierarchy can be either
overlapping or disjoint. In an overlapping hierarchy, an entity instance can be part
of multiple subtypes; it is quite possible that an employee is a manager as well as
a secretary in a company. In a disjoint hierarchy, an entity instance can be in only
one subtype. For example, the entity EMPLOYEE may have two subtypes,
PERMANENT and CONTRACT-BASIS; an employee may be one type or the
other but not both.
• Completeness Constraints
A total specialization constraint specifies that every entity in the super class must
be a member of some of its subclasses. For example, a student must belong to one
of the subclasses of Post-graduate and Undergraduate. (see Figure 1.14).

Fig. 1.14 Completeness Constraints (Employee, with subtypes Secretary, Engineer and
Technician, works for a Department; a Professional belongs to an Organization)

Cardinality constraint
It specifies lower and upper bounds on the number of relationships each entity can
participate in.
Maximum cardinality: The numbers inside the relationship show the
maximum number of entities that can occur on one side of the relationship.
Minimum cardinality: It is the minimum number of entities that can occur
on one side of the relationship.
The mapping cardinality of a relationship set depends on the real-world
relationships it is modelling. A cardinality constraint is denoted by a (minimum
cardinality, maximum cardinality) pair enclosed within parentheses. Figure 1.15
depicts this.

Fig. 1.15 E–R Diagram showing Maximum and Minimum Cardinality (EMPLOYEE Are_in DEPT
with (1,N) cardinalities on both sides; SUPERVISE with (0,N) and (0,1) cardinalities)

1.14 CONCEPTUAL DESIGN WITH E–R MODEL


The E–R model facilitates database design by enabling the designer to express the
logical properties of the database in an enterprise schema. Identification of real-
world objects, referred to as entities, forms the basis of this model. These entities
are described by their attributes and are connected by relationships among them.
The E–R model has its own set of symbols for drawing the E–R diagram that depicts
the logical model of the database.
Regardless of the database system selected, the enterprise schema will be
valid. It is capable of remaining constant even if the DBMS is modified. The E–R
model enables you to express restrictions or constraints on the entities or
relationships.
To take an example, officials of the domestic airport in Delhi faced a problem
of inconsistency in storing the information related to passengers and airplanes
stationed at the airport. The changes made by an official to the information stored in
his own database were not reflected instantaneously to other officials, which in turn
caused inconvenience to passengers because other officials also made different
changes to the same information at the same time in their own databases.
Moreover, it was difficult for the officials to update their databases according
to the changes made by others. To overcome this problem, it was decided to organize
the information using DBMS. Therefore, a centralized database was to be designed
for which a DBA was hired.
Conceptual Design of Centralized Database
Before actually designing the centralized database for airport management, the DBA
was first required to create and finalize the conceptual design of the database. The
DBA used the E–R model to develop this conceptual design. To meet the aforesaid
requirements of the database, the DBA collected and organized all the information
about airplanes stationed and maintained at the airport.

Figure 1.16 shows the E–R model for airport management.

Fig. 1.16 E–R Model for Airport Management

The database for airport management includes the following tables:


• Airport info: This table is used to store the information related to the
departure and arrival of different airplanes at airports. Table 1.1 shows
the structure of the airport information table.
Table 1.1 Airport Information

Field Name                  Description
Airplane_id (Foreign key)   It defines the id of the airplane.
Airline                     It stores the name of the airline to which an airplane belongs.
Weekdays                    It specifies the days of the departure and arrival of an airplane.
Arrival_time                It is the time at which an airplane arrives at an airport.
Departure_time              It is the time at which an airplane departs from an airport.
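A hypothetical DDL sketch of this table follows; the column types are assumptions,
as Table 1.1 specifies only field names and descriptions.

-- A sketch of the airport information table from Table 1.1.
CREATE TABLE airport_info (
    airplane_id    INTEGER,      -- foreign key to an airplane table
    airline        VARCHAR(30),
    weekdays       VARCHAR(30),
    arrival_time   TIME,
    departure_time TIME
);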
Conclusion

The DBA submitted this conceptual design of the database to the airport authority,
which helped the officials to:
• Visualize the flow of processes from one segment to another.
• Analyse the flow hierarchy of the processes and find out the problems that
may arise in the future.
• Design an error-free database that in turn will help to maintain the relationship
among various tables of the database.
• Efficiently maintain the information and update the tables if required.

1.15 OVERVIEW OF UNIFIED MODELLING LANGUAGES
Developed by the end of the 1990s, the unified modelling language (UML) is a
standardized general-purpose modelling language. It uses a set of graphical notations
to create abstract models of specific systems. It is an open method used for
specification, visualization, construction and documentation of the system under
development. It handles artifacts of an object-oriented system. This modelling
language has a standard way to write the blueprints of a system. This includes a few
conceptual components, such as actors, business processes, system components
and activities. These are briefly explained as follows:
• Actors: An actor specifies a role played by a user or a system interacting
with a subject. It models a role played by an entity in interacting with the
subject that is external to the subject. Roles played by human users, external
hardware or other subjects are also represented by actors. The actor need not
represent a specific physical entity. A particular facet (i.e., ‘role’) of some
entity, relevant to the specification, is also known as actor. A single physical
instance may take the role of different actors and a given actor may have
multiple instances.
• Business processes: A customer’s need may cause a business process to start,
and the process ends when that need is fulfilled. But this is a continuous process,
since new customers appear and it starts again. A business process is a
collection of related and structured activities producing either a specific service
or a specific product. The management process handles the business process
and it governs the system operation. Typical management processes include
‘Corporate Governance’, ‘Strategic Management’, ‘Operational Processes’
and ‘Supporting Processes’. Operational processes constitute the core business,
creating primary value for the business. These include purchasing,
manufacturing, marketing and sales. Supporting processes support the core
business processes like accounting, recruitment and technical support.
• System’s components and activities: In UML, a ‘component’ represents a
modular part of a system. This module encapsulates its content and is
replaceable within its environment. An ‘activity’ is a major task that must
take place for fulfilment of an operation contract and activities can be
represented in a diagram known as activity diagram.

Self-Instructional Material 27
Introduction to Database UML also includes certain concrete things such as:
Management Systems and
E–R Model • Programming language statements
• Database schemas
• Reusable software components
NOTES
The approach of UML is to combine the best practice from the concepts of
data modelling. These are: entity relationship diagrams, business modelling, object
modelling and component modelling. Business modelling shows workflow in a
business environment. Such modelling is applicable to all processes, throughout
the software development life cycle and can be used for implementing different
technologies. UML has utilized the concepts of the Object-Modelling Technique
(OMT) and Object-Oriented Software Engineering (OOSE), unifying their common
and widely used concepts into a single modelling language. It is a standard
modelling language capable of modelling concurrent as well as distributed systems.
Although it is not an industrial standard, it is taking shape under the Object
Management Group (OMG).
Using Query/View/Transformation (QVT)-like transformations, a UML model
may be transformed into other representations, for example, Java. QVT, a standard
for model transformation within model-driven architecture, is defined by the Object
Management Group.
To make it capable of supporting all OO methods, concepts from other OO
methods were integrated with UML. For this reason, this language is useful in many
engineering applications. It is applicable from single process, single user applications
to concurrent as well as distributed systems. This has made UML not only rich but
also vast.
This is an international standard, given by ISO/IEC 19501:2005, Information
Technology - Open Distributed Processing - Unified Modelling Language (UML),
Version 1.4.2.
1.15.1 Software Development Methods
UML is a modelling language and not a development method. However, it was
designed in such a way that it is compatible with object-oriented software
development methods. After the evolution of UML, some of these methods have
been reconstructed to match the new notation (for example, that of OMT), and new
methods have been built on UML, of which IBM’s Rational Unified Process (RUP)
is the best known. Other UML-based methods, such as the Abstraction Method and
the Dynamic Systems Development Method, are designed with a view to providing
more specific solutions.

1.15.2 Modelling
The UML model consists of many diagrams, each making a partial graphical
representation of a system’s model. The model also has a ‘semantic backplane’—
written use cases, driving the elements in the model and diagrams.
UML diagrams of a system model have two different views:
(i) Static view: This is also known as structural view. This shows the static
structure of the system that uses objects, attributes, operations and
relationships. The structural view is composed of class diagrams and
composite structure diagrams.
(ii) Dynamic view: This view represents the dynamic behaviour of the
system. This shows collaborations among objects, along with changes
in the internal states of objects. This comprises sequence diagrams,
activity diagrams and state machine diagrams.
Exchange of UML models among UML tools requires the use of the XMI
interchange format.
1.15.3 Diagrams Overview
UML has undergone many revisions, and its version known as UML 2.0 has thirteen
types of diagrams: six of them represent structure and the other seven represent
behaviour, and these are divided into three categories. Four of the seven diagrams
that represent the general types of behaviour represent different aspects of
interaction. The hierarchical view of these diagrams is shown in Figure 1.17.

Fig. 1.17 Hierarchical View of UML Diagrams (Structure diagrams: Class, Component,
Object, Composite Structure, Deployment and Package; Behaviour diagrams: Activity,
Use Case, State Machine and Interaction, the last comprising Sequence, Communication,
Interaction Overview and Timing diagrams)

A UML diagram may contain comments explaining usage, constraints or intent,
like those of engineering drawings.
1.15.4 Structure Diagrams
Structure diagrams show essential things in the system being modelled. There are
six types of structural diagrams, as briefly explained here.
• Class diagram: It shows the classes in the system, their attributes and the
relationships among the classes.
• Component diagram: It shows the way a software system is broken down
into components showing dependencies among them.
• Composite structure diagram: It describes the internal structure of a class
along with the collaborations that this structure makes possible.
• Deployment diagram: It indicates the hardware used in the system for
implementations and the environments in which execution takes place. This
also indicates artifacts deployed on the hardware system.
• Object diagram: It shows either a complete or partial view of the structure at
a particular time.

• Package diagram: It shows the logical groupings into which a system is split
up, showing dependencies among these groupings.
Since these show the structure of a system, they help to document the
architecture of software systems.
1.15.5 Behaviour Diagrams
These diagrams emphasize things that must happen in the modelled system.
These consist of:
• Activity diagram: It shows the business and operational processes. It also
shows the step-by-step workflows in a system. It shows the overall flow of
control.
• State machine diagram: It uses standardized notations for describing systems;
starting from computer programs up to business processes.
• Use case diagram: It shows the functionality in terms of actors. The goals of
actors are denoted as use cases and dependencies among these use cases are
also shown.
Behavioural diagrams are used to describe the functionality of software
systems.
Interaction diagrams
In terms of the set theory, a set of interaction diagrams is a subset of behaviour
diagrams. They emphasize the flow of data and control among the things in the
modelled system.
• Communication diagram: It shows the interactions among objects or parts as
a sequence of messages. It depicts a combination of information derived from
class, sequence and use case diagrams, and describes both the static structure
and the dynamic behaviour of a system.
• Interaction overview diagram: It is a type of activity diagram with nodes
representing interaction.
• Sequence diagram: It shows the way objects communicate in terms of a
sequence of messages and the life spans of objects in relation to those
messages.
• Timing diagram: It is a specific type of interaction diagram showing timing
constraints.
The Protocol State Machine is a subvariant of the State Machine. It may be
used to model network communication protocols.
1.15.6 Meta Modelling
The Object Management Group (OMG) developed an architecture known as meta
modelling for defining the UML. It is called the Meta-Object Facility (MOF) and is
a standard for model-driven engineering. It is designed as a ‘four-layered
architecture’. There is a meta-meta model at the top layer known as M3 layer. This
model is the language used by MOF to build metamodels, known as M2-models.
M2-models describe elements of the M1-layer, and hence the model, known as M1-
models. The last layer is the M0-layer, also called data layer, and it describes real-
world objects.
Beyond the M3-model, MOF describes the means for creating and
manipulating models and meta models by defining a CORBA-based interface to
describe those operations. There are similarities between MOF’s M3-model and the
UML structure model; MOF’s meta models are modelled as UML class diagrams. A
supporting standard for MOF is XMI, which defines an XML-based exchange format
for MOF’s M3-, M2- or M1-layer models.
1.15.7 Criticism of UML
Although widely recognized and used as a modelling standard, it is criticized due to
certain deficiencies. Some of them are:
Language bloat: UML is unusually large and complex, containing many
diagrams and constructs that are not much in use and hence are redundant.
Weak visualization: It employs many line styles that are graphically very
similar; their meanings reside in permutations of line types (dotted or solid) and
arrowheads (open or closed, filled or empty). Thus, the same line style may mean
different things in different diagram types.
Problems in learning and adopting: Problems as stated above can make
learning UML problematic.

1.16 SUMMARY
In this unit, you have learned about DBMS. A database management system provides
software that allows application programs to deal with data field names regardless
of the field’s location within a record; record location within a file and file location
within a device. In a DBMS, all the files are integrated into one system, thus making
data management more efficient by providing centralized control of the operational
data. Database management systems are not only used in commercial applications,
but also in many scientific/engineering applications. Data is stored in data files that
store the database itself. You have also got a historical perspective of DBMS, such
as the navigational DBMS of the 1960s and the relational DBMS of the 1970s. It
familiarized you with the latest trends in DBMS.
You have learned about the features and advantages of DBMS. The three
main features of DBMS are: centralized data management, data independence and
data integration. This unit also discussed the method of describing and storing data
in a DBMS, and transaction management. Databases store large amounts of information
that must be available for long periods of time for continuous access and processing.
In this unit, you have also learned about the structure of a DBMS. One of the
key features of the database approach is to bring about a clear separation between
the logical and physical structures of data. The term logical structure refers to the
way the programmers see it and the physical structure refers to the way data is
actually stored on storage medium. A data dictionary is a critical element in DBMS.
It can be treated as a special file that stores information about the organization and
usage of data contained in the database. This information is called meta data (which
means data about data). It is sometimes termed as a system catalog that documents
the data in the database. The result of compilation of DDL statements is a set of
tables that is stored in the data dictionary that documents data in a database. The
logical architecture of a DBMS is known as the three-level architecture and was
suggested by the American National Standards Institute/Standards Planning and
Requirements Committee (ANSI/SPARC). The process of transforming requests
and results between different levels of abstraction is called mapping. There are
actually two types of mappings: conceptual/internal mapping and external/conceptual
mapping.
The other important topics that were dealt with in this unit are: people who
work with databases, overview of database design, entities and attributes and entity
sets. You have also learned about relationships and relationship sets. While a
relationship represents an association between two or more entities, a relationship
set is a set of ‘similar in kind’ relationships among one or more entities.
This unit also explained the additional features of the E–R model and the
conceptual design with the E–R model. The E–R model is used to facilitate database
design by allowing the designer to express the logical properties of the database in
an enterprise schema. An E–R diagram can graphically represent the overall logical
structure of a database. The relative simplicity and pictorial clarity of this
diagramming technique may well account for the widespread use of the E–R model.
Finally, you have learned about UML or unified modelling languages. It was
developed at the end of the 1990s and is a standardized general-purpose modelling
language. It uses a set of graphical notations to create abstract models of specific
systems. It is an open method used for specification, visualization, construction and
documentation of the system under development. It handles artifacts of an object-
oriented system. This modelling language has a standard of writing the blueprints
of a system and includes a few conceptual components such as actors, business
processes, system components and activities.

1.17 KEY TERMS


• Data item: The smallest unit of data that has meaning for its users.
• File: A collection of all occurrences (instances) of similar type of records,
i.e., it is a group of records of a single type.
• Database: A shared collection of logically-related data along with their
descriptions. It is designed to satisfy the information needs of an organization.
• Database management system: Also called DBMS, it is a collection of
interrelated data (usually called database) and a set of programs to access,
update and manage those data (which form part of a database management
system).
• Database application: A program or a set of programs that interacts with the
database at some point in its execution.
• Data manipulation language: Popularly known as DML, it is a language
that enables users to access or manipulate data.
• DML compiler: It translates DML statements into low-level instructions that
a query processor understands.
• DDL interpreter or compiler: It interprets DDL statements and records
definitions into data dictionary.
• Query evaluation engine: It executes low-level instructions generated by
the DML compiler. It mainly deals with solving all problems related to queries
and query processing. It helps the database system simplify and facilitate
access to data.
• Database manager: It is a program module that provides the interface between
the low-level data stored in the database and the application programs and
queries submitted to the system.

1.18 ANSWERS TO ‘CHECK YOUR PROGRESS’


1. A database management system (DBMS) is a system of software programs that
controls the organization, storage, management and retrieval of data in a
database. DBMSs are categorized according to data types and their structures.
2. Edgar Codd found the navigational models lacking in many features and
outlined a new approach towards database construction in his paper ‘A Relational
Model of Data for Large Shared Data Banks’, which gave rise to the Relational
Database Management System (RDBMS).
3. A DBMS provides increased productivity for application development. The
application programmer is not required to build the functions that handle
issues, such as data integrity, security and concurrent access. All that a
programmer is required to do is implement the application business rules.
This makes the development of applications more convenient. It is also easier
to add functional modules than it is in file-based systems.
4. Following are the factors on which selection of file organization depends:
(i) Typical DBMS applications need a small subset of the database at any
given time and (ii) When a portion of the data is required it must be located
on disk, copied to memory for processing and rewritten to the disk again if
the data was modified.
5. There are three levels of abstraction, which are as follows:
• Physical or internal level is the lowest level of abstraction. It describes
how data is actually stored on a physical media. It provides a low-level
description of the physical database.
• Logical or conceptual level is the next higher level of abstraction. It
describes what data is stored and how data is interrelated.
• External or view level is the highest level of abstraction as seen by a user.
It provides a window to the conceptual view, which allows the user to see
only the data of interest to them. That is, this level of abstraction describes
only a part of the entire database or a subset of the database.
6. A database administrator (DBA) is the person (or a group of people)
responsible for the overall control of the database system. Once the database
is installed and starts functioning properly in a production environment of an
organization, the database administrator takes over charge and performs
specific DBA-related activities including:
• Database maintenance
• Database backup
• Grant of rights to database users
• Management of print jobs
• Ensuring of quality service to all users
7. There are mainly two types of people who work with databases. They are
known as database implementers and database end-users. The implementers
develop DBMS software, whereas the end-users use software written by
database application developers.
8. Data means known facts that can be recorded and used to produce useful
information, for example, names, telephone numbers and addresses of the
employees of an organization.
9. Data abstraction allows a DBA to logically separate data from the programs
that use the data. There are three levels of abstraction in a database: external,
conceptual and internal. The external level represents the user view of the
database; the conceptual level allows you to map the internal and external
levels; and the internal level represents the operating system and DBMS
level.
10. An entity is described by a set of properties called attributes. An employee
entity might have attributes, such as employee code, employee name, salary,
date of joining, and so on.
11. Attributes of an entity set may be divided into the following two groups:
• Identifiers: A set of one or more attributes that uniquely identifies an
instance of an entity. These are also termed as key attributes; For example,
employee code attribute of the EMPLOYEE entity set.
• Descriptors: These attributes provide a non-unique characteristic of an
entity instance. These are said to be non-key attributes. For example,
employee name, salary, date of joining etc., are attributes of the
EMPLOYEE entity set.
12. In ternary relationships, more than two entities are involved. For example,
Prescribes is a ternary relationship that associates three entity sets, Doctor,
Medicine and Patient.
13. The E–R model facilitates database design by enabling the designer to express
the logical properties of the database in an enterprise schema. Identification
of real-world objects, referred to as entities, forms the basis of this model.
These entities are described by their attributes and are connected by
relationships among them.
14. The E–R model is beneficial for a database designer in various ways. The
following are the additional features of the E–R model:
• The constructs used in the E–R model can easily be transformed into
relational tables.
• The E–R model is simple and easy to understand and is, therefore, used
by a database designer to communicate a database design to users accessing
the database.
• It is used as a design plan by a database developer to implement a data model
in the database management system.
15. Developed towards the end of the 1990s, UML is a standardized general-
purpose modelling language. It uses a set of graphical notations to create
abstract models of specific systems. It is an open method used for specification,
visualization, construction and documentation of the system under
development. It handles artifacts of an object-oriented system.

16. Behavioural diagrams emphasize things that must happen in the modelled
system. These consist of:
• Activity diagram: It shows the business and operational processes. It also
shows the step-by-step workflows in a system. It even shows the overall
flow of control.
• State machine diagram: It uses standardized notations for describing
systems; starting from computer programs up to business processes.
• Use case diagram: It shows the functionality in terms of actors. Goals of
actors are denoted as use cases and dependencies among these use cases
are also shown.
Behavioural diagrams are used to describe the functionality of software
systems.
17. Although widely recognized and used as a modelling standard, UML is
criticized due to certain deficiencies. Some of them are:
• Language bloat: UML is unusually large and complex, containing many
diagrams and constructs that are not much in use and hence are redundant.
• Weak visualization: It employs many line styles that are graphically very
similar; their meanings reside in permutations of line types (dotted or solid)
and arrowheads (open or closed, filled or empty). Thus, the same line style
may mean different things in different diagram types.
• Problems in learning and adopting: Problems as stated above can make
learning UML problematic.

1.19 QUESTIONS AND EXERCISES

Short-Answer Questions
1. What is a DBMS? Why do we need a DBMS?
2. What are the functions of DBMS?
3. Write a note on navigational DBMS of the 1960s.
4. What are the latest trends in RDBMS?
5. Why do we store data in a database instead of in a file?
6. What are the advantages of a database management system?
7. What do you understand by logical DBMS architecture?
8. Briefly explain the physical DBMS architecture.
9. What are attributes and entity sets? What are the differences between simple
and composite attributes?
10. Distinguish between single-valued and multi-valued attributes.
11. How are relationships classified in a DBMS?
12. What do you understand by works-in-relationship?
13. How are ternary relationships made?
14. What are unified modelling languages?
15. What is meta modelling?
Long-Answer Questions
1. Discuss the development of DBMS from the historical perspective.
2. Describe the approach for handling of data in the early file-based systems.
Discuss the disadvantages of this approach.
3. Explain the general structure of a DBMS.
4. Write a note on the features of database design.
5. Explain attributes and entity sets.
6. Discuss relationships and relationship sets.
7. Draw a diagram of the three-schema architecture of a DBMS. How can it be
used to explain the concept of data independence?
8. What do you understand by database independence?
9. What are logical and physical data independence? What is the difference
between the two?
10. What are the different types of database users? Discuss the main architecture
of each.
11. Describe any five components of a DBMS environment and discuss how
they relate to each other.
12. Discuss the concept of data independence and explain its importance in a
database environment.
13. Explain the utility of E–R modelling.
14. In the context of an E–R relationship diagram, diagrammatically represent at
least three examples for each of the following types of relationships: (a) One-
One (b) One-Many (c) Many-Many.
15. Draw E–R diagrams of a university, a library and a banking system.
16. Design and draw an E–R diagram for a database given the following: Airplanes
have a model design (Phantom, Tomcat, Crusader, etc.), a unique side number
and total flight hours. Pilots have certifications (pilot, carrier, check pilot, etc.),
social security numbers, total hours and names. Pilots are scheduled to fly
airplanes on specific dates, only the latest of which is recorded.
17. Draw an ER diagram for a database catalog. A database contains tables and
attributes. Each table has a name and a set of attributes. Two tables in the
database cannot have the same name. Each attribute has a name and a type.
Two attributes in the same table cannot have the same name; however, two
attributes in different tables can have the same name.
18. Construct an E–R diagram for a hospital with a set of medical doctors and a
set of patients. Associate with each patient a log of the various tests and
examinations conducted.

1.20 LEARNING OUTCOMES


• Compare the file system with DBMS
• Understand the advantages of DBMS
• Describe the structure of a DBMS
• Know the people who work with databases
• Get an overview of database design
• Understand the concept of entities
36 Self-Instructional Material
UNIT 2 RELATIONAL MODEL AND QUERY LANGUAGES
Structure
2.0 Introduction
2.1 Unit Objectives
2.2 Integrity Constraints over Relations
2.2.1 Primary Key
2.2.2 Foreign Key
2.2.3 General Constraints
2.3 E–R Model to Relational Model
2.3.1 Entity Sets to Tables
2.3.2 Translating
2.3.3 Relationship Sets with Key Constraints
2.4 Translating Relationship Sets with Participation Constraints
2.5 Translating Weak Entity Sets
2.6 Translating Class Hierarchies
2.7 Translating E–R Diagrams with Aggregation
2.8 Introduction to Query Languages
2.9 Relational Algebra
2.9.1 Primitive Operations
2.9.2 Selection Operation
2.9.3 Projection Operation
2.10 Set Operations
2.10.1 Joins
2.11 Renaming
2.12 Joins
2.13 Division
2.14 Summary
2.15 Key Terms
2.16 Answers to ‘Check Your Progress’
2.17 Questions and Exercises
2.18 Learning Outcomes

2.0 INTRODUCTION
In this unit, you will learn about integrity constraints over relations. Integrity
constraints provide a way of ensuring that changes made to a database by authorized
users do not result in data inconsistency. In this regard, the concept of primary key,
foreign key and general constraints will also be explained. You will also be able to
compare the E–R model with the relational model. The E–R model facilitates
database design by enabling the designer to express the logical properties of the
database in an enterprise schema. Identification of real-world objects, referred to as
entities, forms the basis of this model.
You will learn how to translate relationship sets with participation constraints.
The participation constraint specifies whether the existence of an entity is dependent
on the existence of another entity to which it must be related. This unit will also
examine how to translate weak entity sets and class hierarchies. You will learn that
aggregation is a special kind of association that specifies a whole or part relationship
between the aggregate (whole) and a component (part).
You will be introduced to query languages in this unit. An information retrieval
query language queries the database. It is a language of a special type, where the
semantics of a query are not defined by a precise formal syntax; rather, they are
based on an interpretation of the most suitable results of the query put to the
information base.
widely in computer science. It is based on the algebra of sets and is an extension of
the first-order logic. The other topics this unit will deal with are set operators,
renaming, joins and division.

2.1 UNIT OBJECTIVES


After going through this unit, you will be able to:
• Understand integrity constraints over relations
• Compare the E–R model with the relational model
• Translate relationship sets with participation constraints
• Translate weak entity sets
• Translate class hierarchies
• Translate E–R diagrams with aggregation
• Understand query languages
• Explain relational algebra
• Describe set operations
• Explain the concept of renaming
• Know more about joins
• Understand the concept of division

2.2 INTEGRITY CONSTRAINTS OVER RELATIONS


Integrity constraints offer a method of ensuring that changes made to the database,
by authorized users, do not cause a loss of data consistency.
An integrity constraint can be any arbitrary predicate applied to the database.
Constraints may be costly to evaluate, so only those integrity constraints are
considered that can be tested with minimal overhead. Data integrity allows the
definition of certain data quality requirements that the data in the database needs
to meet. The
following are the types of integrity constraints:
• Not Null
A column in a table can be specified not null. It is not possible to insert a null in
such a column. The default is null.
• Unique Key
The unique constraint does not allow duplicate values in a column. If the unique
constraint encompasses two or more columns, no two equal combinations are
allowed. A unique constraint can be extended over multiple columns.
• Primary Key
On a technical level, a primary key combines a unique and a not null constraint.
Additionally, a table can have at the most one primary key. After creating a primary
key, it can be referenced by a foreign key.
• Foreign Key
A foreign key constraint, also called a referential integrity constraint, on a column
ensures that the value in that column is found in the primary key of another table.
• Check
A check constraint allows the stating of a minimum requirement for the value in a
column. If more complicated requirements are desired, an insert trigger must be
used. It is also possible to state a check constraint that checks the value of more
than one column.
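As a concrete illustration, the following sketch gathers these constraint types in
one pair of table definitions; all table and column names here are hypothetical,
not taken from the text.

-- A minimal sketch combining the constraint types described above.
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY
);
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,             -- unique and not null
    email    VARCHAR(50) UNIQUE,              -- no duplicate values
    emp_name VARCHAR(30) NOT NULL,            -- null values rejected
    dept_id  INTEGER REFERENCES department(dept_id),  -- foreign key
    salary   NUMERIC(10,2) CHECK (salary > 0) -- check constraint
);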
These constraints arise frequently. Every relation arising from a relationship
set has integrity constraints.
Fig. 2.1 Relationship with Integrity Constraints (a relationship R among
entities E1, E2, ..., Em, ..., En)

A relationship with n entities can be defined with various constraints. The key of
one of the entities should be designated as the primary key and the keys of the
others as foreign keys.
2.2.1 Primary Key
From the set of candidate keys for a given relation, exactly one is chosen as the
primary key, which serves as the primary means of identifying a tuple in the
relation; the remainder, if any, are called alternate keys.
An alternate key is thus a candidate key that is not the primary key. Primary
keys provide the basic tuple-level addressing mechanism in a relational system.
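A brief hypothetical sketch of the distinction: ssn is chosen as the primary key,
while the remaining candidate key, email, becomes an alternate key declared UNIQUE.

-- Primary versus alternate key (names are assumptions).
CREATE TABLE person (
    ssn   CHAR(11) PRIMARY KEY,        -- the chosen primary key
    email VARCHAR(50) NOT NULL UNIQUE  -- an alternate candidate key
);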
Prime and non-prime attributes: An attribute of relation schema R is called a
prime attribute of R if it is a member of any key of R. An attribute is called non-
prime if it is not a prime attribute; that is, if it is not a member of any candidate key
of R.

2.2.2 Foreign Key


A foreign key is an attribute (may be composite) of one relation R2 whose values are
required to match those of the primary key of some relation R1 (R1 and R2 not
necessarily distinct). We refer to the relation that contains the foreign key as the
referencing relation and the relation that contains the corresponding candidate key
(more strictly primary key) as the referenced relation.
Attribute FK of base relation R2 will qualify as a foreign key only if it satisfies
the following two time-independent properties:
• There exists a base relation R1 with primary key PK such that each non-null
value of FK is identical to the value of PK in some tuple of R1.
• Each value of FK is either wholly null or wholly non-null.
Suppose there is a referencing relation, R1, and a referenced relation, R2.
Tuples in the referencing relation, R1, have attributes FK (called foreign key
attributes) that reference the primary key attributes of the referenced relation, R2. A
tuple, t1, in R1 is said to reference a tuple, t2, in R2 if t1[FK] = t2[PK].
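The following sketch shows a referencing and a referenced relation; the names dept
and emp are hypothetical. Inserting an emp row whose dept_id matches no dept tuple
would violate the foreign key constraint.

-- Referenced relation with primary key dept_id.
CREATE TABLE dept (
    dept_id INTEGER PRIMARY KEY
);
-- Referencing relation: each non-null dept_id in emp must equal
-- the dept_id of some dept tuple, i.e., t1[FK] = t2[PK].
CREATE TABLE emp (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES dept(dept_id)
);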
2.2.3 General Constraints
The accuracy and consistency of the data is referred to as data integrity. It defines
the relationally correct state of a database. Constraints are rules that govern the
behaviour of the data at all times in a database. In order for a database to have
integrity (to ensure that users perform only operations that leave the database in
a correct, consistent state), it needs to obey several rules called ‘Integrity Constraints’.
Integrity constraints are specified on the database and are expected to hold
on every instance of that schema. Integrity constraints provide a means for ensuring
that modifications made to the database by authorized users do not result in data
inconsistency.
Integrity constraints on a database may be divided into two types:
• Static integrity constraints: These constraints define the valid states of
the data, and they include designations of primary keys, etc.
• Dynamic integrity constraints: These constraints define the side effects
of various kinds of transactions (e.g., insertions and deletions).

2.3 E–R MODEL TO RELATIONAL MODEL


The Entity–Relationship model facilitates database design by enabling the designer
to express the logical properties of the database in an enterprise schema. Identification
of real-world objects referred to as entities, forms the basis of this model. These
entities are described by their attributes and are connected by relationships among
them. The E–R model has its own set of symbols for drawing the E–R diagram
which depicts the logical model of the database.
Regardless of the database system selected, the enterprise schema will be
valid. It is capable of remaining constant even if the DBMS is modified. The E–R
model enables you to express restrictions or constraints on the entities or
relationships.
2.3.1 Entity Sets to Tables
Entities are collected to form entity sets. These entities are depicted as rectangles in
E–R diagrams. Their attributes are depicted as ellipses attached to the rectangles by
lines.
A mechanical procedure for converting an E–R diagram to a relational schema
is described here. The student-section-course database is used as an example:

Fig. 2.2 E–R Diagram for Student-Section-Course Database (Student is taking Section;
Faculty is teaching Section; Section is a section of Course)

Create a relation for each entity set. It can use the same name and same set of
attributes as the entity set. The key of the entity set is used as the primary key of the
relation. For example,
Student(ssn, last, first, street address, city, state, country, zip)
Course(dept, number, title, description, credits)
Section(crn, days, time, room)
Faculty(ssn, last, first, title, dept, office, phone)
Relationships with Entity Sets
Many-to-many relationship
The attributes of this relation are the attributes of the relationship, together with the
primary keys of the two related entity sets. The primary key of the relation combines
the primary keys of the related entity sets.
In the database in Figure 2.2, the ‘is taking’ relationship would be converted
into
Enrolled(ssn, crn, gradeOption, credits)
In the case of a recursive many-to-many relationship, it is necessary to rename
the primary keys borrowed from the related entity sets. For example, if we want to
implement the relationship between courses ‘in the prerequisite of’, we would create
a relation as follows:
Prerequisite(prereqDept, prereqNumber, successorDept, successorNumber)
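A DDL sketch of these two relations follows; the column types are assumptions.
Note how Prerequisite renames the borrowed course keys so that a course can play
two different roles.

-- The M:N 'is taking' relationship as a table; the key combines
-- the primary keys of both related entity sets.
CREATE TABLE Enrolled (
    ssn         CHAR(11),
    crn         INTEGER,
    gradeOption CHAR(1),
    credits     INTEGER,
    PRIMARY KEY (ssn, crn)
);
-- A recursive M:N relationship: the course keys are renamed to
-- distinguish the prerequisite role from the successor role.
CREATE TABLE Prerequisite (
    prereqDept      CHAR(5),
    prereqNumber    INTEGER,
    successorDept   CHAR(5),
    successorNumber INTEGER,
    PRIMARY KEY (prereqDept, prereqNumber,
                 successorDept, successorNumber)
);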
One-to-many relationships
There are two ways to handle this type of relationship. One is to follow essentially
the same approach as with the many-to-many relationship, by creating a new relation
to represent the relationship. For example, for the ‘is teaching’ relationship, we
would create a relation
Teaches(crn, ssn)
The primary key of the relation is just the primary key of the section, not the
combination of both primary keys.
An alternative is to simply include the primary key of the ‘one’ entity set in
the relation representing the ‘many’ entity set. In this case, we would simply add
the instructor ssn to the section relation:
Section (crn, days, time, room, instructorSsn)
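In DDL, this alternative might look as follows; the column types are assumptions:

-- One-to-many via a foreign key column in the 'many' relation:
-- each section stores the ssn of its single instructor.
CREATE TABLE Faculty (
    ssn CHAR(11) PRIMARY KEY
);
CREATE TABLE Section (
    crn           INTEGER PRIMARY KEY,
    days          VARCHAR(7),
    time          VARCHAR(11),
    room          VARCHAR(10),
    instructorSsn CHAR(11) REFERENCES Faculty(ssn)
);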
One-to-one relationships
As with the one-to-many relationships, they can be implemented by adding a new
relation or by including the primary key of one entity set in the relation representing
the other one.
n-ary relationships
Create a relation representing the relationship. Include the primary key of each
participating entity set. Any attributes of the relationship become attributes of the
relation. For example, if students, majors and faculty advisors are related by a ternary
‘advises in’ relationship, we would create a relation:
Advises(studentSsn, advisorSsn, major)
Multi-valued attributes
Suppose, you are using a version of the E–R model that permits multi-valued
attributes. Then, you might model ‘major’ as a multi-valued attribute of ‘student’.
The relational model does not permit non-atomic attributes. Therefore, to implement
this attribute you need to create a new relation to represent it. A tuple of the relation
contains a value of the attribute and the primary key of the owning entity.
For example:
MajorsIn (studentSsn, major)
The combination of both attributes forms the primary key of this relation.
In some cases, particularly where it is known that there are at most two values
of the attribute, it may be preferable to define two attributes in the owning entity,
rather than creating a separate relation. For example, a student might have a local
address and a home address.
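A DDL sketch of this relation, with assumed column types:

-- A multi-valued attribute becomes its own relation; the value
-- plus the key of the owning entity form the primary key.
CREATE TABLE MajorsIn (
    studentSsn CHAR(11),
    major      VARCHAR(20),
    PRIMARY KEY (studentSsn, major)
);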
2.3.2 Translating
One of the fastest growing areas in computer science today is the development and
application of DBMSs. These systems collect and maintain data in user-specified
relationships in support of various interrelated applications. They provide a high-
level interface between application programs and the actual data to isolate the user
from such operations as physical I/O, buffering and pointer manipulation. As new
database management systems are being developed with greater capabilities and
features; users of existing DBMS may wish to convert their application programs
and the databases to operate on the new system software. If the decision to upgrade
is made, all of the data and application programs have to be migrated to the new
DBMS. Migration of DBMS involves all aspects connected with the stored data.
The schema of the data, the data definition language, the application procedures as
well as the physical data items must be converted in the same fashion, so that the
new DBMS functions as desired. Migration is an expensive undertaking especially
if a different manufacturer’s hardware is to be acquired. In the past, when users

converted from one file organization to another file organization (for example:
Honeywell’s ISP to IBM’s ISAM), specialized programs were written to translate
the data between file systems. In addition, all the old application programs were re-
written or modified so that they would operate on the new file organization. All
these entail huge expenditure. However, in future, it is foreseeable that even greater NOTES
expenses will be incurred in order to change data from an existing DBMS to a new
DBMS.

2.3.3 Relationship Sets with Key Constraints


You can also represent a relationship set R in an E–R diagram by using a table.
Consider that R is a relationship set and x1, x2, ..., xn is the set of attributes
formed by the union of the primary keys of the entity sets related by R. Also, if
y1, y2, ..., yn are the attributes of R, then R can be represented by a table, r,
consisting of columns formed by the union of x1, x2, ..., xn and y1, y2, ..., yn.
Here, the columns of r = {x1, x2, ..., xn} ∪ {y1, y2, ..., yn}.

For example, Figure 2.3 shows the E–R diagram representing the relationship,
Borrower, between Cust and Loan entity sets.

Fig. 2.3 E–R Diagram representing the Relationship, Borrower, between Cust and Loan
Entity Sets (Cust has attributes Cust_ssn, Cust_name, Cust_city and Cust_phone_no;
Loan has attributes Loan_no and Loan_amount)

In the above ERD, consisting of Cust and Loan entity sets, the primary key of
the Cust entity set is Cust_ssn and the primary key of the Loan entity set is Loan_no.
Also, the relationship does not contain any descriptive attributes. Therefore, you
can represent the relationship Borrower by a borrower table consisting of Cust_ssn
and Loan_no columns.
Table 2.1 shows the tabular representation of the Borrower relationship set.

Table 2.1 Borrower Table

cust_ssn loan_no
S65243 25000
S65244 30000
S65245 35000

Candidate keys in a relation schema need to be unique for every tuple in any
relation instance of that schema. The key constraints can be violated if a key value
in a new tuple t already exists in another tuple of the relation r(R). The approach of
exploiting the key constraint is often superior because it avoids creating a distinct
table for the relationship set. The idea is to include the information about the
relationship set in the table corresponding to the entity set with the key, taking
advantage of the key constraint. This approach eliminates the need for a separate
relationship table and for queries that combine two relations. If a relationship set
involves n entity sets and some m of them are linked via arrows in the E–R diagram,
the relation corresponding to any one of the m sets can be augmented to capture the
relationship.
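As a hedged sketch of the two alternatives (the column types and sizes below are
assumptions, not taken from the text), the Borrower relationship of Figure 2.3 can
be translated as a separate table:
CREATE TABLE Borrower
(cust_ssn CHAR(11),
loan_no CHAR(10),
PRIMARY KEY (cust_ssn, loan_no),
FOREIGN KEY (cust_ssn) REFERENCES Cust,
FOREIGN KEY (loan_no) REFERENCES Loan)
Alternatively, if a key constraint states that each loan has at most one borrower, a
cust_ssn column can simply be added to the Loan table and the separate Borrower
table dropped, as described above.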

2.4 TRANSLATING RELATIONSHIP SETS WITH PARTICIPATION CONSTRAINTS
The participation constraint specifies whether the existence of an entity is dependent
on the existence of another entity to which it must be related. There are two types of
participation constraints:
• Total: It indicates that the existence of an entity depends on being related to
another entity.
• Partial: It indicates that the existence of an entity is not dependent on being
related to another entity.
In E–R diagrams, partial participation is denoted by single lines and total participation
is denoted by double lines (or thick lines).

[Figure: E–R diagram in which Teacher is connected to School through the
relationship Head of; the Teacher side is drawn with a single line (partial
participation) and the School side with a double line (total participation)]

Fig. 2.4 E–R Diagram showing Partial and Total Participation

Figure 2.4 illustrates the concept of total and partial participation. Each school
must have one head master. So, the entity School participates in the relationship
Head of totally. On the contrary, from among many teachers in a school, only one is
chosen to be the head master. So here, the entity Teacher participates partially in the
relationship.
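A minimal SQL sketch of this idea follows; the table and column names are
illustrative, not from the text. The total participation of School in Head of can be
approximated by a NOT NULL foreign key:
CREATE TABLE School
(school_id INTEGER,
head_teacher_id INTEGER NOT NULL, -- total: every school must have a head
PRIMARY KEY (school_id),
FOREIGN KEY (head_teacher_id) REFERENCES Teacher)
Teachers who do not head any school simply never appear in the head_teacher_id
column, which reflects the partial participation of Teacher.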

2.5 TRANSLATING WEAK ENTITY SETS


A weak entity can be identified only by considering the primary key of another
(owner) entity. The owner entity set and weak entity set must participate in a one-
to-many relationship set (one owner, many weak entities). The weak entity set must
have total participation in this identifying relationship set.
A relationship set translates to a table. The steps to translate it are as follows:
• The keys of the connected entity sets become columns.
• The attributes of the relationship set, if any, become columns.
The multiplicity of the relationship set determines the key of the table. Each entity
set is translated into a table with the keys on the following basis:
• Basic cases: Entity set E = relation with the attributes of E; relationship R =
relation with attributes being the keys of the related entity sets + the attributes of R
• Special cases: Combining two relations; translating weak entity sets;
translating is-a relationships and subclasses. A weak entity set cannot exist
alone.
A weak entity set is translated on the following basis:
• It is represented as a table.
• It has a key that becomes a key for the table.
• All single-valued attributes become attributes of the table.
• It creates a new table for any multi-valued attributes; if employee has
many offices, add an extra table.
The following represents the weak entity set:
• To build a table/schema for the weak entity set, construct a table with one
column for each attribute in the weak entity set.
• Remember to include discriminator.
• Augment one extra column on the right side of the table; put in there the
primary key of the strong entity set (the entity set that the weak entity set
depends on).
• Primary key of the weak entity set = Discriminator + foreign key.
Translating entity sets to logical schemas and SQL DDL:
CREATE TABLE STUDENTS
(sid INTEGER,
name VARCHAR(15),
PRIMARY KEY (sid) )
CREATE TABLE COURSES
(cid CHAR(15),
subj VARCHAR(30),
sem INTEGER,
PRIMARY KEY (cid) )
A schema is generated with attributes and key(s) of each associated entity
(foreign keys):
CREATE TABLE Takes
(sid INTEGER,
cid CHAR(15),
exp_grade CHAR(1),
PRIMARY KEY (sid, cid),
FOREIGN KEY (cid) REFERENCES COURSES,
FOREIGN KEY (sid) REFERENCES STUDENTS)
Weak entity set and identifying relationship set are translated into a single
table; when the owner entity is deleted, all owned weak entities must also be deleted.
CREATE TABLE Feed_Pets (
pname VARCHAR(20),
species INTEGER,
weeklyCost REAL,
ssn CHAR(11) NOT NULL,
PRIMARY KEY (pname, ssn),
FOREIGN KEY (ssn) REFERENCES Employees
ON DELETE CASCADE)
The procedure just explained for representing entity sets does not work for
weak entity sets, because they do not contain their own primary keys as attributes.
Some allowance must be made to handle this situation.
The solution is simply to copy the needed key attributes from related entities
into the relation representing the weak entity. For example, if departments are
modelled as entities, and the department code is part of the key of courses, then the
courses become weak entities. The department code must be included as an attribute
of course, so it can be used as part of the primary key.
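A hedged sketch of this departments/courses example follows (all names and types
are assumptions):
CREATE TABLE Courses
(dept_code CHAR(4),
course_no INTEGER, -- discriminator of the weak entity set
title VARCHAR(30),
PRIMARY KEY (dept_code, course_no),
FOREIGN KEY (dept_code) REFERENCES Departments
ON DELETE CASCADE)
Deleting a department then cascades to its courses, as required for weak entities.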

2.6 TRANSLATING CLASS HIERARCHIES


Class hierarchies are translated from the E–R diagram using the correspondence
shown in Table 2.2.
Table 2.2 Factors to Translate Class Hierarchies

A row of a table        A relational instance and tuple
A column of a table     An attribute
A table                 A schema or relation
Cardinality             Number of rows
Degree                  Number of columns

Basically, two general approaches exist, depending on disjointness and
completeness. For a non-disjoint and non-complete class hierarchy:
• Create a table for each super class entity set according to the normal entity
set translation method.
• Create a table for each subclass entity set with a column for each of the
attributes of that entity set plus one for each attribute of the primary key
of the super class entity set.
The primary key taken from the super class entity set can be used as the primary
key for the newly created table (see Figure 2.5).
• Do not create a table for the super class entity set.

• Create a table for each subclass entity set; include all attributes of that
subclass entity set and attributes of the super class entity set. Inheritance
states that one entity is a ‘special kind’ of another entity.
• ‘Subclass’ should be member of ‘base class’.
[Figure: E–R diagram of a class hierarchy. The entity set N (attributes c, d) is
related through the relationship R to the weak entity set M (attributes a, b). O
(attribute e) is a subclass of M connected by an isa link, and O is related through
the many-to-many relationship S (attribute f) to the entity set P (attributes h, g)]

Fig. 2.5 E–R Diagram for Translating Class

Translate the E–R diagram in Figure 2.5 to relations using the first method
and then create a relation for each class and sub-class in the given hierarchy.
Algorithm for Translating the Classes of Figure 2.5
The following are the steps of the algorithm to create the tables and translate the classes:
• N is an entity, so you would create a table for it: N(c,d)
• P is an entity, so you would create a table for it: P(h,g)
• Since M is a weak entity, you create one table for it and R, which contains the
key of N as a key: M_R (a,b,c), where c is a foreign key of N. Because M is
a weak entity, we must delete an M_R tuple if the corresponding N tuple
disappears.
• Now at this stage, you create a relation for O, which must include the key of
M. The key of M includes the key of N since it is a weak entity, resulting in:
O(e,a,c), where a and c are foreign keys of M_R. The c is really a foreign key
of N, but since the requirements are that you must refer to the entire key of a
table, you must have it refer to M_R’s key, rather than N’s.
• S is a many-to-many relationship, so you would create a table for it which
includes the attributes of S and the keys of O and P, which together form the
primary key of S: S(f,a,c,g), where a and c are foreign key references to O,
and g is a foreign key reference to P.
Translating this into the required SQL, you can define the hierarchy as follows:
CREATE TABLE N ( c integer,
d integer,
PRIMARY KEY (c))
CREATE TABLE P ( h integer,
g integer,
PRIMARY KEY (g))
CREATE TABLE M_R ( a integer,
b integer,
c integer,
PRIMARY KEY (a,c),
FOREIGN KEY (c) REFERENCES N ON DELETE CASCADE)
CREATE TABLE O ( e integer,
a integer,
c integer,
PRIMARY KEY (a,c),
FOREIGN KEY (a,c) REFERENCES M_R)
CREATE TABLE S ( f integer,
a integer,
c integer,
g integer,
PRIMARY KEY (a,c,g),
FOREIGN KEY (a,c) REFERENCES O,
FOREIGN KEY (g) REFERENCES P)

2.7 TRANSLATING E–R DIAGRAMS WITH AGGREGATION
One drawback of the E–R model is that there is no way to establish relationships
among relationships (See Figure 2.6).

[Figure: E–R diagram in which the entity sets Employee (attributes Name, Id) and
Project (attribute Number) are connected by the relationship Work (attribute Hours);
the relationship Uses connects to the entity set Machinery (attribute Id)]

Fig. 2.6 E–R Model


It appears as if Work and Uses could be combined into one relationship, but that
would obscure the logical structure of the schema. The best use of aggregation is in
the form of an abstraction where relationships are considered as higher-level entities
(see Figure 2.7).
[Figure: The same E–R diagram with Employee, Work and Project enclosed in an
aggregation box labelled WORK; the relationship Uses connects this aggregate to
the entity set Machinery (attribute Id)]

Fig. 2.7 Work and Uses
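A possible relational translation of Figure 2.7 is sketched below; the column names
and types are assumptions. The aggregate Work becomes a table keyed by the
participating entity keys, and Uses references that key as a whole:
CREATE TABLE Work
(emp_id INTEGER,
proj_number INTEGER,
hours INTEGER,
PRIMARY KEY (emp_id, proj_number))
CREATE TABLE Uses
(emp_id INTEGER,
proj_number INTEGER,
machine_id INTEGER,
PRIMARY KEY (emp_id, proj_number, machine_id),
FOREIGN KEY (emp_id, proj_number) REFERENCES Work,
FOREIGN KEY (machine_id) REFERENCES Machinery)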

2.8 INTRODUCTION TO QUERY LANGUAGES


Query means asking questions or making enquiries. A computer language that makes
a query is a query language. There are many query languages. These languages
make queries to databases as well as information systems. In fact, an information
system is also a special kind of database for storing information that is retrieved
using a query language. Thus, there are two types of query languages—database
query languages and information retrieval query languages (IRQL).
An information retrieval query language makes queries into a database. In such a
language, the semantics of a query are not defined according to a precise formal
syntax; rather, they are based on an interpretation of the most suitable results of the
query put to the information base.
The following are some of the query languages:
Datalog: It is a query language that defines rules for retrieval from deductive
databases. Syntactically, it is a subset of Prolog, which is a language for logic
programming, associated with artificial intelligence and computational linguistics.
This term was used by a group of researchers, in the mid 1980s, working for the
development of a database theory.
QL: It is a proprietary object-oriented query language. This language is for
retrieval of data from relational databases. It is a combination of features of the
standard query language SQL and Java—the object-oriented programming language.
In fact, ‘QL’ is an object-oriented extension of Datalog, which is a programming
language in logic. In this query language, hierarchical data is queried in a recursive
manner.

Common Query Language (CQL): This language, developed and maintained
by the Z39.50 Maintenance Agency, is a formal language for queries into search
engines, bibliographic catalogues, etc., and is based on the semantics of Z39.50.
The design objective of CQL is to make these queries human-readable, writable
and intuitive, yet maintain the expressiveness of complex queries.
D: Although not a language in itself, it comprises a set of requirements given
by Christopher J. Date and Hugh Darwen and is considered a query language for
truly relational database management systems (TRDBMS).
Structured Query Language (SQL): It is a query language, well known for
relational databases.
MDX: It is a query language for online analytical processing (OLAP)
databases. OLAP is part of business intelligence, encompassing relational reporting
and data mining. OLAP is generally applied in: (i) Business reporting for sales and
marketing, (ii) Management reporting, (iii) Business process management (BPM),
(iv) Budgeting and forecasting, (v) Financial reporting and similar areas.
Data Mining Extensions (DMX): It is a query language used in data mining
models. The syntax of DMX is more similar to SQL than MDX.
Entity Relationship Role-Oriented Language (ERROL): It is a declarative
query language for database query and manipulation. It is a query language working
on the Entity–Relationship model (ERM) using constructs like those of natural
languages, English and others. It is the most suited to relational models and based
on E–R diagrams, described by the use of simple natural language (NL) sentences.
These simple NL sentences can be combined to form compound or complex sentences
to describe complex operations upon ERM database.
Gellish English: It is a formal language used for queries in Gellish English
databases. It is used for dialogues and also for modelling information and
knowledge. Gellish English is a structured subset of natural English, formalized
in such a way that it can be interpreted by a computer. It is a standard for modelling
information and knowledge representation.
Information Systems Base Language (ISBL): It is a query language invented
for PRTV, one of the earliest database management systems that implemented
relational model.
Lightweight Directory Access Protocol (LDAP): It is used to query and
modify directory services running TCP/IP. The telephone directory is the most
common example which has a series of names organized alphabetically, with each
name having an address and phone number attached.
Molecular Query Language (MQL): It is a query language for a substructure
search for numerical as well as nominal properties. MQL was designed for complex
and problem-specific search methods.
Object Query Language (OQL): It is a standard for object-oriented databases
modelled after SQL and was developed by the Object Data Management Group
(ODMG). Its overall complexity prevented many vendors from complete
implementation of OQL.
Object Constraint Language (OCL): It is a declarative language used to
describe rules that find application in UML models and are now part of the UML

standard. It is a key component of the new OMG standard recommendation for
Queries/Views/Transformations (QVT) transforming models.
QUEL: It is an access language for relational database, and is similar to
SQL, in most ways.
SPARQL Protocol and RDF Query Language (SPARQL): It is an RDF
query language for graphs, and it has a name which is a recursive acronym.
Standardized by the RDF Data Access Working Group (DAWG) of the World Wide
Web Consortium, it is considered a component of the semantic web.
XQuery: It is a query language for XML data sources; it has a few features of
a programming language and is designed to query collections of XML data. It has
semantic similarity to SQL.
XPath: It is a language designed to navigate XML documents.
XSQL: It is the combination of Extensible Markup Language (XML) and
Structured Query Language (SQL) to provide a language and database independent
means for storing SQL queries, clauses and query results. XSQL development is
still in its infancy and welcomes suggestions for improvement (especially in the
form of patches).

2.9 RELATIONAL ALGEBRA


In pure mathematics, relational algebra has an algebraic structure, having relevance
to set theory and mathematical logic.
Relational algebra became prominent after the relational model of databases was
published in 1970 by Codd, who proposed this algebra as a foundation for
database query languages. Relational algebra is widely used and is now a part
of computer science. It is based on the algebra of sets and is an extension of first-
order logic. It is concerned with a set of relations, closed under operators which
operate on one or more relations, yielding another relation.
Relational algebra has similar power of expression as relational calculus and
first-order logic. To avoid a mismatch between relational calculus and algebra, Codd
restricted the operands of this algebra to finite relations only and provided only
restricted support for the NOT and OR operators. Similar restrictions are also found in other
computer languages that use logic-based constructs. A new term was defined by
Codd as ‘relational completeness’, which refers to attributes of a language that is
complete with respect to first-order predicate calculus, and also follows restrictions
imposed by Codd.
2.9.1 Primitive Operations
There are a few operators in algebraic systems which are primitive, while others are
defined in terms of the primitive ones. Codd made a similar arbitrary choice for his
algebra. There are six primitive operators of relational algebra, proposed by Codd.
These are:
1. Selection
2. Projection
3. Cartesian product (also called the cross product or cross join)

4. Set union
5. Set difference
6. Rename
These six operators are fundamental, and the omission of any one of them loses the
expressive power of relational expressions. Many other operators have been defined
in terms of these six. Among the most important are set operations, division and the
natural join.
So, the following operations of relational algebra will be dealt with:
1. Selection
2. Projection
3. Set operations
4. Renaming
5. Joins
6. Division
2.9.2 Selection Operation
A generalized selection is a unary operation. It is written as a propositional formula
consisting of atoms as in the normal selection, combined with the logical operators
conjunction, disjunction and negation. The selection selects all those tuples in R for
which the formula holds.
In relational algebra, a selection is written as σaθb(R) or σaθv(R) where:
• σ stands for selection
• a and b denote attribute names
• θ denotes a binary (comparison) operation
• v denotes a value constant
• R stands for relation
The selection σaθb(R) selects all tuples in R for which θ holds between the attributes
‘a’ and ‘b’.
The selection σaθv(R) selects all tuples in R for which θ holds between the
attribute ‘a’ and the value v.
For example, consider the following tables: the first is the relation named Person,
the second shows the result of σAge ≥ 34(Person), and the third shows the result of
σAge = Weight(Person).

Person                     σAge ≥ 34(Person)        σAge = Weight(Person)
Name    Age  Weight        Name    Age  Weight      Name    Age  Weight
Rohan   34   70            Rohan   34   70          Sonia   54   54
Manju   25   55            Sonia   54   54
Sohan   29   70
Sonia   54   54
Patel   32   60

Semantics of the selection are expressed mathematically as:
σaθb(R) = { t : t ∈ R, t(a) θ t(b) }
σaθv(R) = { t : t ∈ R, t(a) θ v }
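In SQL, selection corresponds to the WHERE clause. Assuming a Person table with
the columns shown above, the two example selections could be written as:
SELECT * FROM Person WHERE Age >= 34;
SELECT * FROM Person WHERE Age = Weight;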
2.9.3 Projection Operation
A projection is mathematically written as a1…..,an (R), where a1,...,an is a set of
attribute names. It is a unary operation. The resultant set from such an operation, is
a set, obtained when all tuples in R contain the set {a1,...,an}. All other attributes are
discarded.
For example, if we use two attributes—name and age—written as (name,
age), then the projection of the relation {(Raman, 5), (Ratan, 8)} attribute field list
(age), yields {5, 8}and age is discarded. It only gives the value of the field age. If
we project (5, 8) only the second component will give 8.
Projection in relational algebra is the counterpart of existential quantification
in predicate logic. The attributes excluded from a projection correspond to
existentially quantified variables in the predicate whose extension the projected
relation represents. Thus, projection is defined by the attributes it excludes. ISBL
provides notations for both, and other languages have followed ISBL.
An instance of this concept exists in the category of monoids: the removal of
one or more letters of the alphabet from a string is a monoid homomorphism and
is a projection as well.
Let us take an example. The following shows the relation Person and its
projection on the attributes Age and Weight:

Person                         πAge,Weight(Person)
Name     Age   Weight          Age   Weight
Hari     30    70              30    70
Shila    26    62              26    62
Ganesh   27    75              27    75
Sonia    50    55              50    55
Patel    30    70
If the name is N, the age is A and the weight is W, then the predicate is, ‘N is A
years old and weighs W.’ The projection of this is the predicate, ‘There exists N such
that N is A years old and weighs W.’
In this example, Hari and Patel are of the same age and have the same
weight, but the (age, weight) combination appears only once in the result. This is
because the result is a relation.
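In SQL, projection with this duplicate elimination corresponds to SELECT DISTINCT;
for the Person table above:
SELECT DISTINCT Age, Weight FROM Person;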
Mathematically, semantics of projection are expressed as follows:
πa1,...,an(R) = { t[a1, ..., an] : t ∈ R }
where t[a1, ..., an] indicates the restriction of the tuple t to the set {a1, ..., an},
which is mathematically represented as:
t[a1, ..., an] = { (a′, v) | (a′, v) ∈ t, a′ ∈ {a1, ..., an} }
Such a projection, πa1,...,an(R), is defined only if {a1, ..., an} ⊆ Header(R). The
header contains attributes, and Header(R) is the set of all attributes. Projection on
nil attributes is also possible; this yields a relation of degree zero.
Relational algebra has identical power of expression to that of domain relational
calculus or tuple relational calculus, but it has less expressive power than first-order
predicate calculus without function symbols. Expression-wise, relational algebra
corresponds to a subset of first-order logic, namely Horn clauses with no recursion
and no negation.

2.10 SET OPERATIONS


Three basic operators have been taken from the six basic operators of set theory,
but with some differences. Some additional constraints are present in their forms as
adopted in relational algebra. For the operations of union and difference, the two
relations involved must have the same set of attributes, a property known as union-
compatibility. Set intersection, which can be defined in terms of difference, must also
be union-compatible.
The Cartesian product is also adopted differently from set theory. Here,
tuples are taken to be ‘shallow’ for the purposes of the operation. Unlike in set
theory, the 2-tuple of the Cartesian product in relational algebra has been ‘flattened’
into an (n+m)-tuple. Mathematically, R × S is expressed as follows:
R × S = { r ∪ s | r ∈ R, s ∈ S }
In relational algebra, the two relations involved in the Cartesian product must
have disjoint headers (unlike in the union and difference operations). Here, R and
S should be disjoint and, hence, should not have any common attribute.
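In SQL, the Cartesian product corresponds to CROSS JOIN (or to listing the tables
in the FROM clause without a join condition); a minimal sketch with generic table
names R and S:
SELECT * FROM R CROSS JOIN S;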
Rename
A rename operation is mathematically expressed as ρa/b(R). This, too, is a unary
operation. The result of ρa/b(R) is identical to R except that the ‘b’ attribute in all
tuples is renamed to ‘a’. This operation is used to rename an attribute of a relation
or even the relation, R, itself.
2.10.1 Joins

Natural join
It is a binary operator, written as (R ⋈ S), where R and S are two relations. The result
of this operation is the set of all combinations of tuples in R and S that are equal on
their common attribute names. For example, the natural join of the two tables
Employee and Dept is (Employee ⋈ Dept), as shown in Table 2.3.

Table 2.3 Natural Join of Two Tables

Employee                      Dept                    Employee ⋈ Dept
Name     ID    Dept_N         Dept_N      Manager     Name     ID    Dept_N   Manager
Hari     3411  Finance        Finance     Ganesh      Hari     3411  Finance  Ganesh
Shalini  2242  Sales          Sales       Hemant      Shalini  2242  Sales    Hemant
Ganesh   3403  Finance        Production  Charu       Ganesh   3403  Finance  Ganesh
Hemant   2207  Sales                                  Hemant   2207  Sales    Hemant

The natural join is the relational counterpart of logical AND, and has great
importance. The natural join permits combination of relations associated by a foreign
key. For example, in the previous table, a foreign key probably holds from
Employee.Dept_N to Dept.Dept_N, and then the natural join of Employee and Dept
combines every employee with their respective department. Note that this works
because the foreign key holds between attributes with the same name. If this is not
the case, such as in the foreign key from Dept.Manager to Emp.ID, then these columns
have to be renamed before the natural join is taken up. Such a join is also known as an
equijoin. Semantics of the natural join are given as follows:

R ⋈ S = { r ∪ s | r ∈ R, s ∈ S, Fun(r ∪ s) }
where Fun(t) is a predicate that is true for a binary relation t if and only if t is a
function (i.e., it does not map any attribute to two distinct values). One constraint is
usually imposed: R and S should have at least one common attribute. In the absence
of this constraint, the natural join becomes a Cartesian product.
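Many SQL dialects support this operator directly through the NATURAL JOIN
keyword; for the tables of Table 2.3:
SELECT * FROM Employee NATURAL JOIN Dept;
The common column Dept_N then appears only once in the result.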
θ-join and equijoin
Let there be two tables named Car and Boat, which list models of cars and
boats against their respective prices. Suppose that a customer intends to buy a car
and a boat without spending more money on the boat than on the car. A θ-join
using the relation ‘CarPrice ≥ BoatPrice’ will give a table having all the possible
options, as Table 2.4 shows:

Table 2.4 θ-join and equijoin

Car                      Boat                       Car ⋈ Boat (CarPrice ≥ BoatPrice)
CarModel  CarPrice       BoatModel  BoatPrice       CarModel  CarPrice  BoatModel  BoatPrice
CarA      20'000         Boat1      10'000          CarA      20'000    Boat1      10'000
CarB      30'000         Boat2      40'000          CarB      30'000    Boat1      10'000
CarC      50'000         Boat3      60'000          CarC      50'000    Boat1      10'000
                                                    CarC      50'000    Boat2      40'000

If you want to combine tuples from two relations where the combination
condition is not simply the equality of shared attributes, then it is convenient to
have a more general form of join operator: the θ-join (or theta-join). The θ-join
is a binary operator that is written as R ⋈aθb S or R ⋈aθv S, where a and b are
attribute names, θ is a binary relational operator in the set {<, ≤, =, >, ≥}, v is a
value constant, and R and S are relations. This operation gives all combinations of
tuples in R and S satisfying the relation θ. The θ-join is defined only for disjoint
headers of S and R.
A fundamental identity is as follows:
R ⋈aθb S = σaθb(R × S)
The θ-join becomes an equijoin if the operator θ is the equality operator (=). The
θ-join is not needed in a computer language that supports the natural join and the
rename operator, since it can be achieved by selection from the result of a natural
join, which degenerates to a Cartesian product in the absence of shared attributes.
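In SQL, a θ-join is written with an explicit join condition; for the Car and Boat
tables of Table 2.4:
SELECT * FROM Car INNER JOIN Boat
ON Car.CarPrice >= Boat.BoatPrice;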
Semijoin
The semijoin is written as R ⋉ S, where R and S are relations. It is similar to the
natural join, but its result contains only the set of all tuples in R for which there is a
tuple in S that is equal on their common attribute names. For example, Table 2.5
shows the semijoin of the two tables Employee and Dept.

Table 2.5 Semijoin

Employee                          Dept                     Employee ⋉ Dept
Name     EmpId  DeptName          DeptName    Manager      Name     EmpId  DeptName
Hari     3411   Finance           Sales       Hemant       Shalini  2242   Sales
Shalini  2242   Sales             Production  Charu        Hemant   2207   Production
Ganesh   3403   Finance
Hemant   2207   Production

More formally, the semantics of the semijoin are defined as follows:
R ⋉ S = { t : t ∈ R, ∃ s ∈ S, Fun(t ∪ s) }
where Fun( ) is as given in the definition of the natural join.
The semijoin can be derived from the natural join as given here:
R ⋉ S = πa1, ..., an(R ⋈ S)
where a1, ..., an are the attribute names of R.
Since the natural join can be simulated from the basic operators, the same is
also applicable to the semijoin.
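SQL has no semijoin keyword, but the same result can be obtained with EXISTS;
for the tables of Table 2.5:
SELECT Name, EmpId, DeptName
FROM Employee e
WHERE EXISTS
(SELECT * FROM Dept d
WHERE d.DeptName = e.DeptName);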
Antijoin
The antijoin is written as R ▷ S, where R and S are relations. It is similar to the
natural join, but the result of an antijoin contains only those tuples in relation R for
which there is NO tuple in S that is equal on their common attribute names.
For example, Table 2.6 shows the antijoin of the two tables Employee and Dept:
Table 2.6 Antijoin

Employee                          Dept                  Employee ▷ Dept
Name     EmpId  DeptName          DeptName    Manager   Name    EmpId  DeptName
Hari     3411   Finance           Sales       Hari      Hari    3411   Finance
Shalini  2242   Sales             Production  Charu     Ganesh  3403   Finance
Ganesh   3403   Finance
Hemant   2207   Production

The formal definition of the antijoin is:
R ▷ S = { t : t ∈ R, ¬∃ s ∈ S, Fun(t ∪ s) }
where Fun( ) is the same as for the natural join.
The antijoin is the complement of the semijoin, and is expressed as follows:
R ▷ S = R − (R ⋉ S)
Thus, the antijoin is the anti-semijoin, and the antijoin operator is sometimes
written as the semijoin symbol with a bar above it, in place of ▷.
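Likewise, the antijoin of Table 2.6 can be expressed in SQL with NOT EXISTS:
SELECT Name, EmpId, DeptName
FROM Employee e
WHERE NOT EXISTS
(SELECT * FROM Dept d
WHERE d.DeptName = e.DeptName);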

2.11 RENAMING
Renaming column(s) in a table is available in Oracle from 9i Release 2 onwards.
To rename a column in an existing table, the ALTER TABLE syntax is as follows:
ALTER TABLE table_name
RENAME COLUMN <old_column_name> TO <new_column_name>;
For example:
ALTER TABLE emp RENAME COLUMN ename TO empname;
The column ename is thus renamed to empname.
To Rename a Column in SQL
SQL> ALTER TABLE NEW_TEST RENAME COLUMN ID TO TEST_ID;
Table altered.
We can use the following command to rename an index. Let us check the existing
primary key constraint and corresponding index first.
SQL> select constraint_name, constraint_Type from
user_constraints where table_name = ‘NEW_TEST’;
CONSTRAINT_NAME C
PK_NEW P
SQL> select index_name, index_Type from user_indexes where
table_name = ‘NEW_TEST’;
INDEX_NAME INDEX_TYPE
PK_NEW NORMAL
The following constraint is renamed by using the following command:
SQL> ALTER TABLE new_test RENAME CONSTRAINT PK_NEW to
PK_NEW_TEST;
Table altered.
Now, if you re-run the above two SELECT statements to see the value of the
constraint and index, you will notice that the constraint has been renamed but not
the index. In Oracle, we need to rename the primary key constraint and its associated
index individually. You will be able to rename the index now.
SQL> ALTER INDEX PK_NEW RENAME TO PK_NEW_TEST;
Index altered.
When a table is renamed, its dependent objects become invalid. One can find out
the dependent objects of a table, when the table is renamed, by retrieving information
from the user_dependencies view.
The following is the SQL to get the dependent objects so that we know in
which other places we need to make the corresponding change.
SELECT Type, Name
FROM USER_DEPENDENCIES
WHERE Referenced_Name = ‘NEW_TEST’
AND Referenced_Type = ‘TABLE’
/
The output will be displayed as follows:
TYPE NAME
PROCEDURE TEST_PROC
TRIGGER TIB_NEW
It indicates that we need to change trigger and procedure.
When a column is renamed, the name of the procedures, views, triggers, etc.,
is not available through user_dependencies view. For this, you have to search in the
source code of the object for a matching column name. The following is the SQL to
get the dependent objects which depends on a column:
SELECT Type, Name, Line
FROM user_source
WHERE text like ‘%ID%’
/
Output is
TYPE NAME LINE
PROCEDURE TEST_PROC 5
TRIGGER TIB_NEW 6
TRIGGER TIB_NEW 7
The output also indicates the line number you need to look at to make
appropriate changes.

2.12 JOINS
SQL joins are used to query data from two or more tables, based on a relationship
between certain columns in these tables.
Joins between Tables
A query that accesses multiple rows of the same or different tables at one time is
called a join query.
Storage across multiple tables, and hence multitable queries involving more
than one table, are fundamental to relational databases.
To combine relations, i.e., to perform a join, we have to select data from the
tables and relate the tables to each other with conditions on some attributes (often
key attributes).
The JOIN keyword is used in an SQL statement to query data from two or
more tables, based on a relationship between certain columns in these tables.
Tables in a database are often related to each other with keys.

JOINS
• Join conditions should be specified in the WHERE clause where a minimum
of (n–1) conditions are required to join (n) tables.
• You must add table prefixes to columns.
• Using table aliases (30 character max) will save keystrokes!
• Table aliases are specified in the FROM clause.
Types of Joins
• Equijoin
• Non-Equijoin – uses any comparison operator other than ( = )
• Outer Join – (+) can be on either side of the WHERE clause but must be on the
side of the table that is missing the rows in the condition. Only the operators
( = ) & AND can be used.
• Self Join
A primary key is a column (or a combination of columns) with a unique value
for each row. Each primary key value must be unique within the table. The purpose
is to bind data together, across tables, without repeating all of the data in every
table. (See Table 2.7).
Table 2.7 ‘Persons’ Table

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Note that the ‘P_Id’ column is the primary key in the ‘Persons’ table. This
means that no two rows can have the same P_Id. The P_Id distinguishes two persons
even if they have the same name. (See Table 2.8).
Table 2.8 ‘Orders’ Table

O_Id OrderNo P_Id


1 77895 3
2 44678 3
3 22456 1
4 24562 1
5 34764 15

Note that the ‘O_Id’ column is the primary key in the ‘Orders’ table and that
the ‘P_Id’ column refers to the persons in the ‘Persons’ table without using their
names.
Notice that the relationship between the two tables above is the ‘P_Id’ column.
Different SQL JOINs
Before we continue with examples, we will list the types of JOINs you can use, and
the differences between them.
• JOINs: Return rows when there is at least one match in both tables
• LEFT JOINs: Return all rows from the left table, even if there are no matches
in the right table
• RIGHT JOINs: Return all rows from the right table, even if there are no
matches in the left table
• FULL JOINs: Return rows when there is a match in one of the tables
An equijoin is a join based on equality or matching column values. This
equality is indicated with an equal sign (=) as the comparison operator in the WHERE
clause, as Figure 2.8 shows.
SELECT * FROM manufact, stock
WHERE manufact.manu_code = stock.manu_code

Fig. 2.8 Query

Figure 2.8 joins the manufact and stock tables on the manu_code column. It
retrieves only those rows for which the values of the two columns are equal, some
of which are shown in Figure 2.9.
manu_code SMT
manu_name Smith
lead_time 3
stock_num 1
manu_code SMT
description baseball gloves
unit_price $450.00
unit case
unit_descr 10 gloves/case

manu_code SMT
manu_name Smith
lead_time 3
stock_num 5
manu_code SMT
description tennis racquet
unit_price $25.00
unit each
unit_descr each

manu_code SMT
manu_name Smith
lead_time 3
stock_num 6
manu_code SMT
description tennis ball
unit_price $36.00
unit case
unit_descr 24 cans/case

manu_code ANZ
manu_name Anza
lead_time 5
stock_num 5
manu_code ANZ
description tennis racquet
unit_price $19.80
unit each
unit_descr each

...

Fig. 2.9 Query Result


In this equijoin, Figure 2.9 includes the manu_code column from both the
manufact and stock tables because the select list requested every column.
You can also create a join with additional constraints, where the
comparison condition is based on the inequality of values in the joined columns.
These joins use a relational operator in addition to the equal sign (=) in the comparison
condition that is specified in the WHERE clause.
To join tables that contain columns with the same name, qualify each column
name with the name of its table and a period symbol (.), as Figure 2.10 shows.
SELECT order_num, order_date, ship_date, cust_calls.*
FROM orders, cust_calls
WHERE call_dtime >= ship_date
AND cust_calls.customer_num = orders.customer_num
ORDER BY orders.customer_num

Fig. 2.10 Query

Figure 2.10 joins the two tables on the customer_num column and then selects only
those rows where the call_dtime in the cust_calls table is greater than or equal to the
ship_date in the orders table. Figure 2.11 shows the combined rows that it returns.
order_num 1004
order_date 05/22/1998
ship_date 05/30/1998
customer_num 106
call_dtime 1998-06-12 08:20
user_id maryj
call_code D
call_descr Order received okay, but two of the cans of
ANZ tennis balls within the case were empty
res_dtime 1998-06-12 08:25
res_descr Authorized credit for two cans to customer,
issued apology. Called ANZ buyer to report
the qa problem.

order_num 1008
order_date 06/07/1998
ship_date 07/06/1998
customer_num 110
call_dtime 1998-07-07 10:24
user_id richc
call_code L
call_descr Order placed one month ago (6/7) not received.
res_dtime 1998-07-07 10:30
res_descr Checked with shipping (Ed Smith). Order out
yesterday-was waiting for goods from ANZ.
Next time will call with delay if necessary.

order_num 1023
order_date 07/24/1998
ship_date 07/30/1998
customer_num 127
call_dtime 1998-07-31 14:30
user_id maryj
call_code I
call_descr Received Hero watches (item # 304) instead
of ANZ watches
res_dtime
res_descr Sent memo to shipping to send ANZ item 304
to customer and pickup HRO watches. Should
be done tomorrow, 8/1

Fig. 2.11 Query Result


Inner Join = Equijoin: Inner joins or equijoins are the most common type of joins;
they use equality ‘=’ of common attributes to join tables:
SELECT * FROM table-1, table-2 WHERE condition;
or
SELECT * FROM table-1 [ NATURAL ] INNER JOIN table-2
[ ON join_condition | USING ( join_column_list ) ]
join_condition: A qualification condition similar to the WHERE condition.
join_column_list: A USING column list ( a, b, ... ) is shorthand for the ON condition
left_table.a = right_table.a AND left_table.b = right_table.b ...
The join condition determines whether a join is an equijoin or a non-equijoin. An
equijoin is a join with a join condition containing an equality operator. An equi-join
combines rows that have equivalent values for the specified columns. When a join
condition relates two tables by an operator other than equality, it is a non-equijoin.
A query may contain equijoins as well as non-equijoins.
Equijoins are the most commonly used. An example of an equi-join is as
follows:

SELECT e.first_name, d.department_name


FROM employees e INNER JOIN departments d
ON e.department_id = d.department_id
/

FIRST_NAME DEPARTMENT_NAME

Steven Executive
Neena Executive
Lex Executive
Alexander IT
Bruce IT

The syntax will be the same as that of the inner join, but the operator will be not
equal ‘<>’.
For example:
SELECT projects.name AS "Project Name", teachers.name
FROM teachers, projects
WHERE teachers.id != projects.teacher;

Project Name | name
--------------+--------
 game         | Volker
 perl         | Volker
 compiler     | Elke
 xpaint       | Elke
(4 rows)
Non-equijoins are less frequently used. An example of a non-equijoin is as follows:
SELECT zip_codes.zip_code, zones.ID AS zip_zone,
zones.low_zip, zones.high_zip


FROM zones INNER JOIN zip_codes
ON zip_codes.zip_code BETWEEN zones.low_zip
AND zones.high_zip
/

ZIP_CODE   ZIP_ZONE   LOW_ZIP   HIGH_ZIP
--------   --------   -------   --------
57000      1          57000     57999
84006      2          84000     84999

With table aliases, you can join a table to itself. In this case, the same table is given
two different alias names. Each alias then represents a different instance of the
table.
Example ‘projects where students 2 and 4 work together’:
SELECT DISTINCT *
FROM assign a, assign b
WHERE a.stud = 2 AND b.stud = 4 AND a.project = b.project;

project | stud | percentage | project | stud | percentage
--------+------+------------+---------+------+-----------
      1 |    2 |         10 |       1 |    4 |         60
      3 |    2 |         70 |       3 |    4 |         30
(2 rows)

This joins a table to itself. This table appears twice in the FROM clause and
is followed by table aliases that qualify column names in the join condition.
SELECT a.field1, b.field2 FROM R1 a, R1 b WHERE a.field
= b.field
A self join is a join of a table to itself. This table appears twice (or more) in
the FROM clause and is followed by table aliases that qualify column names in the
join condition and the SELECT clause. Take, for example, the employees table,
the manager of one employee is also an employee. The rows for both are in the
same employees table.
To get information about an employee and his/her manager, you have to join
the employee table to itself, thereby treating employees as if it were two separate
tables. The following example query uses a self join to return the name of each
employee along with the name of the employee’s manager:

SELECT e.last_name employee, m.last_name manager
FROM employees e INNER JOIN employees m
ON e.manager_id = m.employee_id;

EMPLOYEE MANAGER

Kochhar King
De Haan King
Hunold De Haan
Ernst Hunold
Austin Hunold
Pataballa Hunold

...

106 row(s) retrieved

But

SELECT COUNT (*)


FROM employees;

COUNT(*)

107

Even though the employees table has 107 rows, the previous query returned
only 106 rows. This is because there is an employee without a manager_id. Oracle
excludes that employee’s row from the result set while performing the self inner
join. To include that employee without a manager, you need an outer join as follows:

SELECT e.last_name employee, m.last_name manager
FROM employees e LEFT OUTER JOIN employees m
ON e.manager_id = m.employee_id;

EMPLOYEE MANAGER

King
Kochhar King
De Haan King
Hunold De Haan
Ernst Hunold
Austin Hunold
Pataballa Hunold

...

107 row(s) retrieved

The previous examples showed self equijoins. However, there are situations
when you need to perform self non-equijoins. For example, let’s assume that you
are in charge of organizing an interdepartmental soccer competition for the following
departments:

SELECT department_name
FROM departments
WHERE department_id IN (10, 20, 30, 40);

DEPARTMENT_NAME

Administration
Marketing
Purchasing
Human Resources

4 row(s) retrieved

You decide that each department plays against the other three departments
only once, which means that the output of the query you are going to write cannot
contain one combination Administration, Marketing and another Marketing,
Administration because that means that each department plays against the other
twice. The solution is to use a self non-equijoin query as follows:

SELECT d1.department_name dept1, d2.department_name dept2
FROM departments d1 INNER JOIN departments d2
ON d1.department_id < d2.department_id
WHERE d1.department_id IN (10, 20, 30, 40)
AND d2.department_id IN (10, 20, 30, 40);

DEPT1 DEPT2

Administration Marketing
Administration Purchasing
Administration Human Resources
Marketing Purchasing
Marketing Human Resources
Purchasing Human Resources

6 row(s) retrieved

OUTER JOIN is used to join two tables even if there is not a match. An
OUTER JOIN can be used to return a list of all the customers and the orders even if
no orders have been placed for some of the customers. A keyword, RIGHT or LEFT,
is used to specify which side of the join returns all possible rows. I like using LEFT
because it makes sense to me to list the most important table first. Except for one
example demonstrating the RIGHT OUTER JOIN, we will use left joins. Just a
note: the keywords INNER and OUTER are optional.
The next example returns a list of all the customers and the SalesOrderID for
the orders that have been placed, if any.

SELECT c.CustomerID, s.SalesOrderID


FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON
c.CustomerID = s.CustomerID

It uses the LEFT keyword because the Sales.Customer table is located on the
left side and we want all rows returned from that table even if there is no match in
the Sales.SalesOrderHeader table. This is an important point. Notice also that the
CustomerID column is the primary key of the Sales.Customer table and a foreign
key in the Sales.SalesOrderHeader table. This means that there must be a valid
customer for every order placed. Writing a query that returns all orders and the
customers, if they match, doesn’t make sense. The LEFT table should always be the
primary key table when performing a LEFT OUTER JOIN.
If the location of the tables in the query are switched, the RIGHT keyword is
used and the same results are returned:

SELECT c.CustomerID, s.SalesOrderID
FROM Sales.SalesOrderHeader s
RIGHT OUTER JOIN Sales.Customer c ON c.CustomerID =
s.CustomerID

Left [Outer] Join


Returns all the rows from R1 even if there are no matches in R2. If there are no
matches in R2 then the R2 values will be shown as null.
SELECT * FROM R1 LEFT [OUTER] JOIN R2 ON R1.field =
R2.field

Right [Outer] Join


Returns all the rows from R2 even if there are no matches in R1. If there are no
matches in R1, then the R1 values will be shown as null.
SELECT * FROM R1 RIGHT [OUTER] JOIN R2 ON R1.field =
R2.field

Full [Outer] Join


Returns all the rows from both tables even if there are no matches in one of the
tables. If there are no matches in one of the tables then its values will be shown as
null.
SELECT * FROM R1 FULL [OUTER] JOIN R2 ON R1.field = R2.field

LEFT/RIGHT/FULL Outer Join:


An inner join excludes rows from either table that do not have a matching row in
the other table. An outer join combines the unmatched row in one of the tables with
an artificial row (all columns set to null) for the other table:
SELECT * FROM
table-1 [ NATURAL ] { LEFT | RIGHT | FULL } OUTER JOIN
table-2
[ ON join_condition | USING ( join_column_list ) ]
Join types:
• LEFT — only unmatched rows from the left side table (table-1) are retained
• RIGHT — only unmatched rows from the right side table (table-2) are retained
• FULL — unmatched rows from both tables (table-1 and table-2) are retained
LEFT OUTER JOIN returns all rows in the qualified Cartesian product (i.e., all
combined rows that pass its ON condition), plus one copy of each row in the left-
hand table for which there was no right-hand row that passed the ON condition.
This left-hand row is extended to the full width of the joined table by inserting
NULLs for the right-hand columns.
RIGHT OUTER JOIN returns all the joined rows, plus one row for each unmatched
right-hand row (extended with nulls on the left). This is just a notational convenience,
since you could convert it to a LEFT OUTER JOIN by switching the left and right
inputs.

FULL OUTER JOIN returns all the joined rows, plus one row for each unmatched
left-hand row (extended with nulls on the right), plus one row for each unmatched
right-hand row (extended with nulls on the left).
(LEFT) OUTER JOIN example ‘project-percentage assignment’:
SELECT assign.project, projects.name, assign.percentage
FROM projects LEFT OUTER JOIN assign
ON projects.id = assign.project ;

project | name     | percentage
--------+----------+-----------
      1 | compiler |         10
      1 | compiler |         60
      1 | compiler |         30
      2 | xpaint   |         50
      2 | xpaint   |         50
      3 | game     |         70
      3 | game     |         30
        | perl     |
(8 rows)

2.13 DIVISION
The division is written as R ÷ S. It is a binary operation. The result consists of the
restrictions of tuples in R to the attribute names unique to R, i.e., those in the header
of R but not in the header of S, such that all their combinations with tuples in S are
present in R. For example, the following two tables named Completed and DBProject
are shown along with their division:

Completed                     DBProject      Completed ÷ DBProject
Student   Task                Task           Student
Farhan    Database1           Database1      Farhan
Farhan    Database2           Database2      Sona
Farhan    Compiler1
Heena     Database1
Heena     Compiler1
Sona      Database1
Sona      Database2

If DBProject has all the tasks of the Database project, then the division of
table named Completed by table DBProject contains all students who have completed
the Database project.

Semantics of the division are expressed mathematically as:
R ÷ S = { t[a1, ..., an] : t ∈ R ∧ ∀ s ∈ S ((t[a1, ..., an] ∪ s) ∈ R) }
where {a1, ..., an} is the set of attribute names unique to R and t[a1, ..., an] is the
restriction of t to this set. The requirement is that the attribute names in the header
of S be a subset of those of R; otherwise the result of the division operation will be empty.
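SQL has no division operator, but the classic double NOT EXISTS formulation
computes it; for the Completed and DBProject tables above:
SELECT DISTINCT c1.Student
FROM Completed c1
WHERE NOT EXISTS
(SELECT * FROM DBProject p
WHERE NOT EXISTS
(SELECT * FROM Completed c2
WHERE c2.Student = c1.Student
AND c2.Task = p.Task));
This returns each student for whom no DBProject task is missing from Completed.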

2.14 SUMMARY
In this unit, you have learned about integrity constraints over relations. These provide
a way of ensuring that changes made to a database by authorized users do not result
in a loss of data consistency. An integrity constraint can be any arbitrary predicate
applied to the database. Data integrity allows defining of certain data quality
requirements that the data in the database needs to meet. In this context, primary
key, foreign key, general constraints and additional integrity constraints were also
explained.
You have also learned about the E–R model, which is a data modelling
technique that creates a graphical representation of entities and the relationships
between entities within an information system. It views the real world as a set of
basic objects (entities) and relationships among these objects. It is intended primarily
for the database design process by allowing the specification of an enterprise scheme.
It represents the overall logical structure of the database.
Translating relationship sets with participation constraints, translating weak
entity sets and translating class hierarchies are the other important topics that you
have learned in this unit. The participation constraint specifies whether the existence
of an entity is dependent on the existence of another entity to which it must be
related. There are two types of participation constraints—total and partial.
You have also learned about translating E–R diagrams with aggregation.
Aggregation is used to deal with a relationship set as an entity set for participation
with other relationships. It is a special kind of association that specifies a whole or
part relationship between the aggregate (whole) and a component part.
In this unit, you were also introduced to query languages and relational algebra.
An information retrieval query language makes queries into a database and is a
language where definition of semantics of the query are not according to a precise
formal syntax. It is based on an interpretation of the most suitable results of the
query put to the information base. Relational algebra is widely used in computer
science. It is based on algebra of sets and is an extension of first-order logic. It is
concerned with a set of relations associated with operators which operate on one or
more relations, yielding another relation. The other important topics that you have
learned in this unit are set operations, renaming, joins and division.

2.15 KEY TERMS

• Integrity constraints: These provide a way of ensuring that changes made


to the database by authorized users do not result in a loss of data consistency.
• Primary key: It is an ordered file whose records are of fixed length with two
fields—primary key and pointer to a disk block.
• Foreign key: It is an attribute of one relation whose values are required to
match those of the primary key of some relation.
• Data integrity: It refers to the accuracy and consistency of the data and
defines the relationally correct state for a database.
• E–R model: It is a modelling technique that creates a graphical representation
of the entities and the relationships between entities.
• Participation constraint: It specifies whether the existence of an entity is
dependent on the existence of another entity to which it must be related.
• Aggregation: It is used to deal with a relationship set as an entity set for
participation with other relationships. It is a special kind of association that
specifies a whole or part relationship between the aggregate (whole) and a
component (part).
• Query languages: These languages make queries into databases as well as
information systems. In fact, information system is also a special kind of
database for storing information that is retrieved using a query language.
• Relational algebra: It is based on the algebra of sets and is an extension of
first-order logic. It is concerned with a set of relations, associated with
operators which operate on one or more relations yielding another relation.
• Joins: SQL joins are used to query data from two or more tables based on a
relationship between certain columns in these tables.

2.16 ANSWERS TO ‘CHECK YOUR PROGRESS’


1. A check constraint allows the stating of a minimum requirement for the value
in a column. If more complicated requirements are desired, an insert trigger
must be used. It is also possible to state a check constraint that checks the
value of more than one column.
2. A set of attributes that uniquely identify the rows/tuples in a database is referred
to as a logical key. The logical key in a data model does not necessarily
become a physical key in a database. Super key, candidate key, primary key
and alternate key are the logical keys common to all conceptual modelling.
Instances of objects/records/tuples/entities are instances that are uniquely
identified by all these keys. Foreign keys are logical keys specific to relational
modelling. In a logical sense, they link relations.
3. Integrity constraints on a database may be divided into two types:
• Static integrity constraints: These constraints define the valid states of
the data, and they include designations of primary keys, etc.
• Dynamic integrity constraints: These constraints define the side effects
of various kinds of transactions (e.g., insertions and deletions).
4. Any ‘thing’, that is, an object, a being, or an event that has an independent
existence in the real world is called an entity. The entity may have a physical
existence in the real world, such as in the case of a house, a bicycle or an
employee. It may also be an object with a conceptual existence, for example,
a company, a job or a university course.
5. Attributes of an entity set may be divided into the following two groups:
• Identifiers: A set of one or more attributes that uniquely identifies an
instance of an entity. These are also termed as key attributes; For example,
employee code attribute of the EMPLOYEE entity set.
• Descriptors: These attributes provide a non-unique characteristic of an
entity instance. These are said to be non-key attributes. For example,
employee name, salary, date of join, etc., attributes of the EMPLOYEE
entity set.
6. This refers to the minimum number of entities that can exist on one side of
the relationship. The mapping cardinality of a relationship set depends on the
real-world relationships. Minimum cardinality and maximum cardinality pairs
enclosed within parenthesis denote cardinality constraint.
7. Aggregation is used to deal with a relationship set as an entity set for
participation with other relationships. It is a special kind of association that
specifies a whole or part relationship between the aggregate (whole) and a
component (part). It is different from an ordinary association as it has the
following distinguishing characteristics:
• It is an asymmetric relationship.
• It is a transitive relationship.
• It implies stronger coupling.
• It propagates the functions such as copy, delete, etc.
8. It is a query language that defines rules for retrieval from deductive databases.
Syntactically, it is a subset of Prolog, which is a language for logic
programming associated with artificial intelligence and computational
linguistics. This term was used by a group of researchers, in the mid 1980s,
working for the development of a database theory.
9. There are six primitive operators of relational algebra, proposed by Codd.
These are:
(i) Selection
(ii) Projection
(iii) Cartesian product (also called the cross product or cross join)
(iv) Set union
(v) Set difference
(vi) Rename
10. A natural join is the relational counterpart of logical AND and has great
importance. The natural join permits combination of relations associated by
a foreign key.

2.17 QUESTIONS AND EXERCISES

Short-Answer Questions
1. Write a short note on the concept of key.
2. What are general constraints?
3. What do you understand by translating?
4. Write short notes on any two: QL, MDX, Entity Relationship Role-Oriented
Language.
5. Briefly explain primitive operations.
6. What are set operations?
7. Write a note on binary joins.
8. What is renaming?
Long-Answer Questions
1. What are the integrity constraints over relations?
2. Explain the INSERT, DELETE and UPDATE operations.
3. Discuss the E–R model in comparison with the relational model.
4. How is the translating of relationship sets with participation constraints done?
5. Explain SQL joins.

2.18 LEARNING OUTCOMES


• Understand integrity constraints over relations
• Compare the E–R model with the relational model
• Translate relationship sets with participation constraints
• Translate weak entity sets
• Translate class hierarchies
• Translate E–R diagrams with aggregation
• Understand query languages
UNIT 3 STRUCTURED QUERY LANGUAGE
Structure
3.0 Introduction
3.1 Unit Objectives
3.2 Overview of SQL
3.2.1 SQL in Oracle
3.2.2 Different Interfaces for SQL Commands in Oracle
3.2.3 SQL Statements
3.3 Basic Queries in SQL
3.4 Union
3.5 Intersect and Except
3.6 Nested Queries
3.7 Aggregate Operators
3.8 Null Values
3.9 Complex Integrity Constraints in SQL
3.10 Triggers and Views in SQL
3.11 Embedded SQL and Cursors
3.12 Summary
3.13 Key Terms
3.14 Answers to ‘Check Your Progress’
3.15 Questions and Exercises
3.16 Learning Outcomes
3.0 INTRODUCTION
In this unit, you will learn about structured query language (SQL) and get an overview of
it. It is a language that interacts with relational database management systems (RDBMSs).
There are many advantages of this language. This unit will also make you understand
the basic queries in SQL and UNION. A UNION, as defined in relational algebra, is the
set union of two relations.
You will also study about INTERSECT and EXCEPT. INTERSECT is the
intersection of two relations; it returns only those rows that are returned by both component
queries. It is the logical product of two sets. Another important topic dealt with in this
unit is nested queries. These are queries within a query. It is similar to the nesting of
conditional statements, one within the other.
You will learn about aggregate operators and null values. This unit will also examine
complex integrity constraints in SQL. The latter is a user-defined rule for a column or
set of columns that allows or disallows inserts, updates or deletes of a row, based on the
value it contains for the column or set of columns.
In this unit, you will also learn about triggers and views in SQL. A trigger is a
special kind of PL/SQL anonymous block. There are database triggers, dropping triggers,
etc. Finally, you will learn about embedded SQL and cursors.
3.1 UNIT OBJECTIVES
After going through this unit, you will be able to:
• Understand structured query language
• Explain how basic queries are made in SQL
• Understand union, intersect and except
• Describe nested queries and aggregate operators
• Explain null values
• Describe complex integrity constraints in SQL
• Understand triggers and views in SQL
• Explain embedded SQL and cursors
3.2 OVERVIEW OF SQL
SQL is a language used for interacting with RDBMS. The prototype for SQL was
originally developed by IBM, based on E.F. Codd’s paper ‘A Relational Model of Data
for Large Shared Data Banks’. In 1979, ORACLE, an SQL product, was released.
Today, SQL has become the most important query language for relational database management systems.
SQL does not depend on the underlying database structure and many different
versions of SQL exist. However, it is the current industry standard query language
for organizing, managing and retrieving data/information from databases. It is not
just a query language. It is not only used for retrieving data but also for managing
all the tasks of database management systems (DBMS) including:
• Data definitions
• Data manipulation
• Access control
• Data sharing
Despite the name, SQL does not provide constructs of languages like C or Pascal.
It lacks branching (IF) and looping (GOTO, DO and FOR) statements. Some database
vendors like ORACLE give extensions to SQL to achieve branching, looping, etc. Oracle
utilizes a version of SQL known as PL/SQL. However, they do not form part of the SQL
standard. SQL can, however, be incorporated into other programming languages, in a form known as Embedded SQL.
Advantages of SQL
The main advantages of SQL are as follows:
• Vendor portability/independence: Generally, SQL permits the user to change
the given brand of database and DBMS without rewriting the SQL code.
• Ease of use: SQL is a ‘declarative language’ (non-procedural). Thus, as
compared to other programming languages, SQL is easy to use. High-level
programming languages like C, COBOL, PASCAL, etc., require the
programmer to convert the program logic into code and to use the data structures
that hold the data. In SQL, the programmer only indicates what data is required,
not how to get it. The DBMS examines the SQL and plans the way to get the
information needed.
• Language for all users: SQL can be used by all types of users, such as DBAs, application programmers, naïve users and others.
• Standardized: The standard is established by the American National Standards Institute (ANSI—an organization that approves certain standards in many different industries). In 1987, the International Standards Organization (ISO) recognized the ANSI SQL standard as the international standard. The standard was revised in 1992 (called SQL 92) and again in 1999; the newest standard, SQL 99, is also referred to as SQL 3, and many vendors now use it.
• Dynamic data definition: For embedded and multi-level database queries, SQL
gives advanced data processing commands.
Due to these factors, SQL has become universally accepted, with many commercial relational DBMS products, such as ORACLE, DB2, INGRES, SYBASE and others, supporting it.
3.2.1 SQL in Oracle
Like other RDBMS packages, SQL in Oracle provides certain elementary functions to define, manipulate and control data. It also helps in defining, storing and managing data in a centralized area so that other users can share it. Some terminologies need to be defined when working with Oracle. A database is a collection of a defined group of database objects like tables, indexes, tablespaces, storage areas, etc. The physical structure of the data sets used to hold the data is defined by a tablespace—a database object.
A table is another database object. Each table comprises a set of rows (horizontally)
and columns (vertically) that symbolize the conceptual view of the data to be stored. A
column is a single non-decomposable data element in a table. A view is a virtual table; generally, it neither stores data nor exists physically. A tablespace owns all the tables contained in it, together with the indexes, views and columns defined for them.
When a view is accessed, the SQL statement that defines it is executed to get
back the desired data. To efficiently access the data stored in a table, indexes are used.
Rather than its physical sequence, an index places table data in its logical sequence.
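As a minimal, hedged sketch of how a view behaves (the view name high_paid is invented here; emp is the sample table used in the examples later in this unit):

CREATE VIEW high_paid AS
    SELECT ecode, ename, salary FROM emp WHERE salary > 5000;

-- the defining query is executed each time the view is accessed
SELECT * FROM high_paid;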
When a query is issued with one or more tables as inputs, SQL outputs a single table known as a result set. The result table may be composed of many columns and rows, or it may contain a single value taken from one column and one row. The most common SQL operation is retrieving data from a database. To issue a query (a database retrieval is called a query), the SELECT command is used. The selection operation retrieves a specified subset of rows from a table, and when selecting information from a table, some primary keywords are used: WHERE, SELECT, FROM and ORDER BY. FROM and SELECT are employed in every query to choose data from tables.
Forms of SQL
There are two forms of SQL:
• Interactive
• Embedded
Interactive SQL operates on a database to produce output on user demand.
In embedded SQL, SQL commands can be put inside a program written in some
other language (called host language) like C, C++, etc. Data is passed to a program
environment through SQL. The combined source code is accepted by a special SQL
precompiler and, along with other tools, it is converted into an executable program.
Types of SQL commands
SQL commands are of various types to suit different purposes.
The primary types are as follows:
• Data Definition Language (DDL)
• Data Manipulation Language (DML)
• Data Query Language (DQL)
• Data Control Language (DCL)
• Transactional control language (TCL)
Data Definition Language (DDL)
Data Definition Language (DDL) is a part of SQL that allows a database user to create
and restructure database objects, such as the creation or deletion of a table. Some of the
most fundamental DDL commands include the following:
CREATE
ALTER
DROP
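As a brief illustration (the table and column names below are invented for this sketch):

-- create a new table
CREATE TABLE dept_demo (
    dno   CHAR(3),
    dname VARCHAR2(20)
);
-- restructure it by adding a column
ALTER TABLE dept_demo ADD (city VARCHAR2(15));
-- remove the table altogether
DROP TABLE dept_demo;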
Data Manipulation Language (DML)
Data Manipulation Language (DML) is the part of SQL used to manipulate data within
objects of a relational database. There are three basic DML commands:
INSERT
UPDATE
DELETE
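For instance, against the emp table that appears in the examples later in this unit (the row values here are hypothetical):

INSERT INTO emp VALUES ('E12', 'ARUN SAHA', 4500, 'D02', 'PROGRAMMER', '01-JAN-98');
UPDATE emp SET salary = 4800 WHERE ecode = 'E12';
DELETE FROM emp WHERE ecode = 'E12';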
Data Query Language (DQL)
Though comprising only one command, Data Query Language (DQL) is the most
concentrated focus of SQL for modern relational database users. The base command
is as follows:
SELECT
This command, accompanied by many options and clauses, is used to compose
queries against a relational database. Queries, from simple to complex, from vague
to specific, can be easily created. A query is an inquiry to the database for information.
A query is usually issued to the database through an application interface or via a
command line prompt.
It is to be noted that, according to some authors, the SELECT command may be
treated as DML as a query is nothing but a part of DML that is used for retrieval of data
from a database.
Data Control Language (DCL)
Data control commands in SQL allow us to control access to data within the database.
These DCL commands are normally used to create objects related to user access and
also control the distribution of privileges among users. Some data control commands are
as follows:
GRANT
REVOKE
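For example, granting and then withdrawing privileges on a table (the user name SCOTT is illustrative):

GRANT SELECT, UPDATE ON emp TO scott;
REVOKE UPDATE ON emp FROM scott;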
Transactional Control Languages (TCL)
In addition to the previously introduced categories of commands, the following commands
allow the user to manage database transactions:
COMMIT
ROLLBACK
SAVEPOINT
SET TRANSACTION
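A short sketch of how these commands combine (the updates themselves are illustrative):

UPDATE emp SET salary = salary * 1.10 WHERE dno = 'D01';
SAVEPOINT before_bonus;
UPDATE emp SET salary = salary + 500 WHERE desg = 'PROGRAMMER';
ROLLBACK TO before_bonus;  -- undoes only the bonus update
COMMIT;                    -- makes the 10 per cent raise permanent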
3.2.2 Different Interfaces for SQL Commands in Oracle
To begin with ORACLE SQL, a user requires an ORACLE database account and access to database objects. As with most Oracle tools, there will be three prompts: Username, Password and, optionally, a host string identifying the database to connect to. The database administrator can supply this information. For instance, SCOTT as username and TIGER as password may be used.
There are four interfaces provided by Oracle:
• SQL *PLUS (available in all versions of Oracle)
• SQL *PLUS command line interface (available in all versions of Oracle)
• SQLPlus Worksheet (introduced in ORACLE 8i)
• iSQL *PLUS (introduced in ORACLE 9i)
Oracle’s SQL*Plus, a command line interface, permits a user to type and execute SQL statements directly. It formats database output and saves often-used commands. SQL*Plus can also be invoked from other Oracle tools or from the operating system prompt.
3.2.3 SQL Statements
Directions to write SQL statements
• SQL statements are case-insensitive.
• They can span one or more lines.
• SQL keywords cannot be split across lines or abbreviated.
• Keywords are generally entered in uppercase; all other words, such as column names and table names, are entered in lowercase. However, this is not a rule.
• SQL keywords are typically aligned in the first column.
• SQL statements are terminated with a semi-colon.
Some more information about SQL, and ORACLE SQL in particular, is as follows. A NAME for a view, table, synonym, index, column or user variable MUST:
• Start with a letter
• Consist of only the characters A–Z, 0–9, _, $, and #
• Not be the same as an ORACLE reserved word
• Be 1 to 30 characters long (database names must not exceed 8 characters)
• Not consist of a quotation mark
Notations used in the syntax of SQL commands
In this unit, lowercase is used to identify names or conditions entered by the user, and uppercase is used to identify SQL keywords.
• User-given data, object names, expressions or conditions are enclosed within ‘<’ and ‘>’
• Square brackets (‘[ ]’) identify optional items. Do not include the brackets when entering a query.
• A vertical bar (‘|’) indicates a choice and underlining indicates a default.
• Ellipses (‘...’) are used to specify items that may repeat.
3.3 BASIC QUERIES IN SQL

Arithmetic Expression in SELECT
It is possible to include arithmetic expressions involving columns and literals with the +, −, * and / operators. Expressions on NUMBER and DATE data types can be formed by means of arithmetic operators.
Query: Display the employee name, salary and annual salary
SELECT ename, salary, salary * 12 FROM emp;

Output:
ENAME SALARY SALARY*12
KOUSHIK GHOSH 5000 60000
JAYANTA DUTTA 3500 42000
HARI NANDAN TUNGA 4000 48000
JAYANTA GANGULY 6000 72000
RAJIB HALDER 4000 48000
JISHNU BANERJEE 6500 78000
RANI BOSE 3000 36000
GOUTAM DEY 5000 60000
PINAKI BOSE 5500 66000
9 rows selected.
Operator precedence
• Multiplication and division take precedence over addition and subtraction.
• Operators of the same priority are evaluated from left to right.
• Parentheses are used to clarify statements and to force prioritized evaluation.
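For example, the two statements below (a sketch against the emp table) return different figures for the same row:

SELECT ename, salary + 500 * 12 FROM emp;    -- multiplication first: salary + 6000
SELECT ename, (salary + 500) * 12 FROM emp;  -- parentheses force the addition first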
Column Aliases in SELECT
To provide an alternate name to a column or expression, aliases are used that appear as
column heading in the result set. For the above query, the following command can be
used:
SELECT ename “ Employee Name”, salary “Salary”, salary*12
“Annual salary” FROM emp;
Output:
Employee Name Salary Annual salary
KOUSHIK GHOSH 5000 60000
JAYANTA DUTTA 3500 42000
HARI NANDAN TUNGA 4000 48000
JAYANTA GANGULY 6000 72000
RAJIB HALDER 4000 48000
JISHNU BANERJEE 6500 78000
RANI BOSE 3000 36000
GOUTAM DEY 5000 60000
PINAKI BOSE 5500 66000
9 rows selected.
A column alias can also be given using ‘AS’ followed by the alias name after the column name. The only restriction is that the alias should not contain spaces.
SELECT ename AS EmployeeName, salary AS Salary, salary*12
AS Annual_salary FROM emp;
Output:
EmployeeName Salary Annual_salary
KOUSHIK GHOSH 5000 60000
JAYANTA DUTTA 3500 42000
HARI NANDAN TUNGA 4000 48000
JAYANTA GANGULY 6000 72000
RAJIB HALDER 4000 48000
JISHNU BANERJEE 6500 78000
RANI BOSE 3000 36000
GOUTAM DEY 5000 60000
PINAKI BOSE 5500 66000
9 rows selected.
The || operator is used to concatenate character expressions, e.g.,
SELECT ‘Designation of ‘ || ename || ‘ is ’ || desg
from emp;
Output:
‘DESIGNATION OF’ || ENAME ||’IS’|| DESG
Designation of KOUSHIK GHOSH is SYSTEM ANALYST
Designation of JAYANTA DUTTA is PROGRAMMER
Designation of HARI NANDAN TUNGA is PROGRAMMER
Designation of JAYANTA GANGULY is ACCOUNTANT
Designation of RAJIB HALDER is CLERK
Designation of JISHNU BANERJEE is SYSTEM MANAGER
Designation of RANI BOSE is PROJECT ASSISTANT
Designation of GOUTAM DEY is PROGRAMMER
Designation of PINAKI BOSE is PROGRAMMER
9 rows selected.
Removing Duplicates—DISTINCT
Strictly speaking, it is incorrect to refer to SQL tables as relations as SQL queries might
result in duplicate tuples. SQL gives a mechanism to do away with duplicates. It is done
by specifying the keyword DISTINCT after SELECT.
The default display of queries is all rows, including duplicate rows.
SELECT dno FROM emp;
Output:
DNO
D01
D01
D02
D03
D03
D02
D01
D01
D02
9 rows selected.
The above command will display all the dept numbers including duplicate
dept numbers.
Instead of the above command, if we use:
SELECT DISTINCT dno FROM emp;
Output:
DNO
D01
D02
D03
Notice that the output contains non-duplicated department numbers. It is to be noted that we can use only one DISTINCT in the select column list.
Filtering Rows by Conditional Selection—WHERE Clause
The WHERE clause specifies the search condition and join criteria on the data that are
selected. If a row fulfils the search conditions, it is returned as part of the result set.
The predicate employed in the WHERE clause is a simple comparison that uses the following (brief sketches of these predicates appear after the list):
• Relational Operators
• BETWEEN …. AND ….
• IN
• IS NULL
• IS NOT NULL
• LIKE
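A few sketches of these predicates against the emp table (the literal values are illustrative):

SELECT ename FROM emp WHERE salary BETWEEN 4000 AND 6000;
SELECT ename FROM emp WHERE dno IN ('D01', 'D02');
SELECT ename FROM emp WHERE ename LIKE 'J%';   -- names beginning with J
SELECT ename FROM emp WHERE dno IS NOT NULL;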
WHERE Clause and Relational Operator
These may be as follows:
=    equal to
<> (or !=)    not equal to
<    less than
>    greater than
<=    less than or equal to
>=    greater than or equal to
The relational operator condition is fulfilled when the expression on either side of
the relational operator satisfies the relation set up by the operator. There are two notable differences between the SQL operators and the operators used in C/C++. First, the equality comparison and assignment operators, both represented by a single equal sign (=), are the same in SQL; ambiguity is resolved by context. Second, the standard SQL inequality operator is represented by angle brackets (<>), though Oracle also supports the C/C++ style (!=).
Comparisons with numeric column
Query: Display all employees getting salary over Rs 5000
SELECT * FROM emp WHERE salary > 5000;
Output:
ECODE ENAME SALARY DNO DESG DT_JN
E04 JAYANTA GANGULY 6000 D03 ACCOUNTANT 12-SEP-96
E06 JISHNU BANERJEE 6500 D02 SYSTEM MANAGER 19-SEP-96
E09 PINAKI BOSE 5500 D02 PROGRAMMER 26-AUG-94
Comparisons with character and date column
String comparisons are performed using the numeric value of the characters that are
determined by the database character set. The character set is generally compatible
with ASCII. The decimal values of the numbers and letters, for example, are found
in Table 3.1.
Table 3.1 Decimal Values of ASCII Numbers and Letters
Character range    Decimal value range
0–9    48–57
A–Z    65–90
a–z    97–122
The most useful simple comparisons for strings can be done with the equality and
inequality operators “=” and “<>”. They are used to select or omit specific rows. It is to
be noted that:
o Date values and character strings are placed within single quotation marks.
o Date values are format-sensitive and character values are case-sensitive.
o The default date format is ‘DD-MON-YY’.
Query: List the details of the employee whose name is ‘JAYANTA DUTTA’
SELECT * FROM emp WHERE ename = ‘JAYANTA DUTTA’;
Output:
ECODE ENAME SALARY DNO DESG DT_JN
E02 JAYANTA DUTTA 3500 D01 PROGRAMMER 15-JAN-94
Query: Find the code, name and salary of employees of the department ‘D01’
SELECT ecode, ename, salary FROM emp WHERE dno=’D01’;
Output:
ECODE ENAME SALARY
E01 KOUSHIK GHOSH 5000
E02 JAYANTA DUTTA 3500
E07 RANI BOSE 3000
E08 GOUTAM DEY 5000
Query: List the information of the employees who are not working in the
department ‘D01’
SELECT * FROM emp WHERE dno<>’D01';
Output:
ECODE ENAME SALARY DNO DESG DT_JN
E03 HARI NANDAN TUNGA 4000 D02 PROGRAMMER 01-JUL-95
E04 JAYANTA GANGULY 6000 D03 ACCOUNTANT 12-SEP-96
E05 RAJIB HALDER 4000 D03 CLERK 07-OCT-95
E06 JISHNU BANERJEE 6500 D02 SYSTEM MANAGER 19-SEP-96
E09 PINAKI BOSE 5500 D02 PROGRAMMER 26-AUG-94
N.B.: The comparison operators, when used with strings, must match case correctly. There are SQL functions that convert strings in the database to all uppercase or all lowercase in order to find a case-insensitive match. The functions are as follows:
• LOWER (CHAR) returns CHAR, with each letter in lowercase
• UPPER (CHAR) returns CHAR, with each letter in uppercase
Query: Find the code, name and salary of employees of the department ‘D01’
SELECT ecode, ename, salary FROM emp WHERE
UPPER(dno)=’D01’;

The equivalent SQL statement is as follows:
SELECT ecode, ename, salary
FROM emp WHERE dno=’D01’ OR dno=’d01’;
Date and time comparisons are similar to comparisons with numbers because
underlying every date and time is a number. In Oracle, there is a single data type,
DATE, that represents both date and time with a single number. If we want to compare
a date column with another date, we can use a string literal in the default date format
and Oracle performs the conversion for us:
Query: List the employees who joined after 20th September, 1996
SELECT * FROM emp WHERE dt_jn>’20-SEP-96';
Output:
ECODE ENAME SALARY DNO DESG DT_JN
E07 RANI BOSE 3000 D01 PROJECT ASSISTANT 17-JUN-97
E08 GOUTAM DEY 5000 D01 PROGRAMMER 23-OCT-97
The current date and time can be obtained by using the pseudo column SYSDATE. Oracle automatically stores information, such as row numbers and row descriptions, that is directly accessible, i.e., not through tables. This information is contained within pseudo columns. These pseudo columns can be retrieved in queries that select data from tables. Available pseudo columns in Oracle include the following:
• ROWNUM: Row number. Order number in which a row value is retrieved.
• ROWID: Physical row (memory or disk address) location, i.e., unique row
identification.
• SYSDATE: System or today’s date.
• UID: User identification number indicating the current user.
• USER: Name of the user currently logged in.
Oracle table ‘DUAL’
Oracle automatically creates a DUAL table along with the data dictionary. DUAL is in the
schema of the user SYS, however, it is accessible by the name DUAL to all users. It has
one column, DUMMY, defined to be VARCHAR2(1), and contains one row with a value
‘X’. Selecting from the DUAL table is useful for computing a constant expression with
the SELECT statement. Since DUAL has only one row, the constant is returned only once. Each Oracle account has access to DUAL; we can query against it to get the current account or the system date/time, and to execute mathematical functions.
The following example shows how to obtain the username used when the user
logged into Oracle:
SELECT USER FROM DUAL;
Output:
USER
——————
MANAS
SELECT SYSDATE FROM DUAL;
Output:
SYSDATE
——————————
15-APR-05
SELECT POWER(4,3) FROM DUAL;
Output:
POWER(4,3)
—————
64
Query: List all of the names of employees in the EMP table, the date they
joined and the current system date
SELECT ename, dt_jn,SYSDATE FROM emp;
Output:
ENAME DT_JN SYSDATE
KOUSHIK GHOSH 10-MAR-93 15-APR-05
JAYANTA DUTTA 15-JAN-94 15-APR-05
HARI NANDAN TUNGA 01-JUL-95 15-APR-05
JAYANTA GANGULY 12-SEP-96 15-APR-05
RAJIB HALDER 07-OCT-95 15-APR-05
JISHNU BANERJEE 19-SEP-96 15-APR-05
RANI BOSE 17-JUN-97 15-APR-05
GOUTAM DEY 23-OCT-97 15-APR-05
PINAKI BOSE 26-AUG-94 15-APR-05
9 rows selected.
Suppose one of the employees, ASOK BASU, joined today. Then SYSDATE and dt_jn should be the same. But if we query for the employees who joined today using SYSDATE, we will find no results. Because dt_jn values were entered using the default date format, which has no time part, the time defaults to 12:00:00 a.m. SYSDATE, the current date and time, does have a time part, even though only the date part is displayed by default.
SELECT * FROM emp WHERE dt_jn=SYSDATE;
The above command will display the message ‘no rows selected’.
The TRUNC function is used to remove the time part of an Oracle DATE.
This provides us a way to compare the dates in the table with today’s date,
disregarding any hours, minutes or seconds.
SELECT * FROM emp WHERE dt_jn=TRUNC(SYSDATE);
Similarly, suppose the date of joining column is filled with SYSDATE as follows:
INSERT INTO emp
VALUES (‘E11’, ‘KUNTAL GHOSH’,5000, ‘D03’, ‘JR ASSISTANT’,
SYSDATE);
Say SYSDATE is 20-JUL-05.
Suppose we later try to recover the row with something like:
SELECT * FROM emp WHERE dt_jn = ‘20-JUL-05’;
No rows will be returned, because the stored value carries the time of insertion while the date literal defaults to 12:00:00 a.m. The row is found only when the time part is removed, for example with WHERE TRUNC(dt_jn) = ‘20-JUL-05’.
WHERE Clause with Logical Operator
AND, OR, NOT are known as logical operators.
Operator Meaning
AND Returns TRUE if both component conditions are TRUE
OR Returns TRUE if either component condition is TRUE
NOT Returns TRUE if the following condition is FALSE
SQL has three logical values, TRUE, FALSE and NULL. Every condition, simple
or compound, evaluates to one of these three values. In a WHERE clause, if this condition
evaluates to TRUE, the row is returned if it is part of a SELECT statement. If it is
FALSE or NULL, it is not.
NOT
FALSE and NULL are not the same. When FALSE is negated, TRUE is obtained. But
when NULL is negated, we still get NULL. Table 3.2 is the truth table for
NOT:
Table 3.2 Truth Table for NOT
NOT
TRUE FALSE
FALSE TRUE
NULL NULL
Query: List the employees who do not work in the department ‘D01’
SELECT * FROM emp WHERE dno <> ‘D01’;
The equivalent command using logical NOT is as follows:
SELECT * FROM emp WHERE NOT dno =‘D01’;
Output:
ECODE ENAME SALARY DNO DESG DT_JN
E03 HARI NANDAN TUNGA 4000 D02 PROGRAMMER 01-JUL-95
E04 JAYANTA GANGULY 6000 D03 ACCOUNTANT 12-SEP-96
E05 RAJIB HALDER 4000 D03 CLERK 07-OCT-95
E06 JISHNU BANERJEE 6500 D02 SYSTEM MANAGER 19-SEP-96
E09 PINAKI BOSE 5500 D02 PROGRAMMER 26-AUG-94
AND
AND is used to combine two conditions. Table 3.3 is the truth table for AND:
Table 3.3 Truth Table for AND
AND TRUE FALSE NULL
TRUE TRUE FALSE NULL
FALSE FALSE FALSE NULL
NULL NULL NULL NULL
The following is a query combining conditions with AND:
Query: List the employees of the department ‘D01’ who are getting salary over
Rs 4000
SELECT * FROM EMP WHERE dno =’D01’ AND salary>4000;
Output:
ECODE ENAME SALARY DNO DESG DT_JN
E01 KOUSHIK GHOSH 5000 D01 SYSTEM ANALYST 10-MAR-93
3.4 UNION

The outer join can be used on only one side of the join condition; using the outer join on both sides will cause an error. Since Oracle issues an error in that case, another way to get the effect of an outer join on both sides is to use UNION.
SELECT ecode, ename, D.dno, dname
FROM emp E, dept D
WHERE E.dno(+)=D.dno
UNION
SELECT ecode, ename, E.dno, dname
FROM emp E, dept D
WHERE E.dno=D.dno(+);
Output:
ECODE ENAME DNO DNAME
E01 KOUSHIK GHOSH D01 PROJECT
E02 JAYANTA DUTTA D01 PROJECT
E03 HARI NANDAN TUNGA D02 RESEARCH
E04 JAYANTA GANGULY D03 PERSONNEL
E05 RAJIB HALDER D03 PERSONNEL
E06 JISHNU BANERJEE D02 RESEARCH
E07 RANI BOSE D01 PROJECT
E08 GOUTAM DEY D01 PROJECT
E09 PINAKI BOSE D02 RESEARCH
E10 RAJA SEN ... ...
... ... D04 EDUCATION
11 rows selected.
Notice the last two rows. The first query gets the
employees of all departments including department ‘D04’ in which no employee
has been assigned. The second query gets all the employees including employee ‘E10’
who has not been assigned to any department. In the case of the first query, any dept
without employee would return NULL values for the two columns, ecode and ename. In
the case of the second query, any employee without dno would return NULL values for
the two columns, dno and dname. The UNION puts both together and eliminates any
repetitions.
UNION, as defined in relational algebra, is the set union of two relations. The
UNION returns all the distinct rows returned by either of the queries it is applied to.
Since a mathematical set contains no duplicates, the set union also contains no duplicates,
and any row appearing in both results must appear only once in the result.
As the UNION operation eliminates duplicates, DISTINCT, while syntactically
correct, is not necessary.
Query: List all employees who are working in department ‘D01’ or ‘D03’
SELECT * FROM emp WHERE dno=’D01’
UNION
SELECT * FROM emp WHERE dno=’D03’
Output:
ECODE ENAME SALARY DNO DESG DT_JN
E01 KOUSHIK GHOSH 5000 D01 SYSTEM ANALYST 10-MAR-93
E02 JAYANTA DUTTA 3500 D01 PROGRAMMER 15-JAN-94
E04 JAYANTA GANGULY 6000 D03 ACCOUNTANT 12-SEP-96
E05 RAJIB HALDER 4000 D03 CLERK 07-OCT-95
E07 RANI BOSE 3000 D01 PROJECT ASSISTANT 17-JUN-97
E08 GOUTAM DEY 5000 D01 PROGRAMMER 23-OCT-97
6 rows selected.
This UNION could have been done more concisely by using an IN clause.
SELECT * FROM emp WHERE dno IN(’D01’,’D03’);
However, this is not as easy if the two component queries are from different tables.
UNION ALL
The UNION ALL operator merges the result sets of two component queries. This
operation returns rows retrieved by either of the component queries without eliminating
duplicates.
Query: List code, name and designation of all employees who are working in
the project ‘P01’ or ‘P03’
SELECT ecode, ename, desg FROM emp WHERE ecode IN (SELECT
ecode FROM assign WHERE pid=’P01’)
UNION ALL
SELECT ecode, ename, desg FROM emp WHERE ecode IN (SELECT
ecode FROM assign WHERE pid=’P03’);
Output:
ECODE ENAME DESG
E01 KOUSHIK GHOSH SYSTEM ANALYST
E02 JAYANTA DUTTA PROGRAMMER
E08 GOUTAM DEY PROGRAMMER
E01 KOUSHIK GHOSH SYSTEM ANALYST
E03 HARI NANDAN TUNGA PROGRAMMER
E06 JISHNU BANERJEE SYSTEM MANAGER
E09 PINAKI BOSE PROGRAMMER
7 rows selected.
Notice that because the ALL clause was used, the duplicates were retained in the
result. The UNION eliminates the duplicates.
SELECT ecode, ename, desg FROM emp WHERE ecode IN (SELECT
ecode FROM assign WHERE pid=’P01’)
UNION
SELECT ecode, ename, desg FROM emp WHERE ecode IN (SELECT
ecode FROM assign WHERE pid=’P03’);
Output:
ECODE ENAME DESG
E01 KOUSHIK GHOSH SYSTEM ANALYST
E02 JAYANTA DUTTA PROGRAMMER
E03 HARI NANDAN TUNGA PROGRAMMER
E06 JISHNU BANERJEE SYSTEM MANAGER
E08 GOUTAM DEY PROGRAMMER
E09 PINAKI BOSE PROGRAMMER
6 rows selected.
3.5 INTERSECT AND EXCEPT
The INTERSECT, as defined in relational algebra, is the set intersection of two relations.
It returns only those rows returned by both component queries. It is the logical product
of two sets. Since a mathematical set contains no duplicates, the set intersection contains
no duplicates, and any element appearing in both the sets must appear only once in the
result.
The EXCEPT returns the distinct data values from the left query (the query on the left side of the operand) that do not exist in the right query (the query on the right side of the operand). In SQL Server 2005, using EXCEPT and INTERSECT, one can also simulate the MERGE command of Oracle.
Example:
CREATE TABLE TEST1 (COL1 VARCHAR(10))
GO
CREATE TABLE TEST2 (COL1 VARCHAR(10))
GO
INSERT INTO TEST1 VALUES(’GEORGIA’)
INSERT INTO TEST1 VALUES(’ALABAMA’)
INSERT INTO TEST1 VALUES(’FLORIDA’)
INSERT INTO TEST1 VALUES(’NEWYORK’)
GO
INSERT INTO TEST2 VALUES(’GEORGIA’)
INSERT INTO TEST2 VALUES(’TEXAS’)
INSERT INTO TEST2 VALUES(’NEVADA’)
INSERT INTO TEST2 VALUES(’NEWYORK’)
GO
Now let us observe the result by running the query using the EXCEPT command.
SELECT COL1 FROM TEST1 EXCEPT SELECT COL1 FROM TEST2
GO
This query will return the following values, i.e., data that exists in the left query but not in the right query:
ALABAMA
FLORIDA
SELECT COL1 FROM TEST2 EXCEPT SELECT COL1 FROM TEST1
GO

This reversed query returns the values that exist in TEST2 but not in TEST1, i.e., TEXAS and NEVADA.
The following things should be remembered when using this operator:
• The number of columns should be the same, and the columns should be in exactly the same order, in all the queries.
• Data types should be compatible for columns in the queries; otherwise data-type conversions will take place.
• The XML, text, ntext and image data types are not comparable.
• Column names in ORDER BY clause must refer to column names mentioned
in the left side query.
• They cannot be used in distributed partitioned view definitions or query
notifications.
• If you are looking at the execution plans, an EXCEPT will be denoted by a
left anti semi join, whereas an INTERSECT will be denoted by a left semi
join.
Query: Retrieve code of the employees who have been assigned to a project
SELECT ecode FROM emp
INTERSECT
SELECT DISTINCT ecode FROM assign;
Output:
ECO DE
——————
E01
E02
E03
E06
E07
E08
E09
————
7 rows selected.
The INTERSECT operation can be implemented using the EXISTS clause as
follows:
SELECT ecode FROM emp
WHERE EXISTS (SELECT * FROM assign WHERE
emp.ecode=assign.ecode);
MINUS
MINUS, as defined in relational algebra, is the set difference of two relations. It
returns only rows returned by the first component query but not by the second. It is the
difference between two sets. Since a mathematical set contains no duplicates, the set
difference contains no duplicates. Since the MINUS operation eliminates duplicates,
DISTINCT, while syntactically correct, is not necessary.
Query: Retrieve code of the employees who are not assigned in any project
SELECT ecode FROM emp
MINUS
SELECT DISTINCT ecode FROM assign;
Output:
ECODE
—————
E04
E05
The above query can be used in a subquery to list the employee details.
Query: Retrieve employee details who are not assigned in any project
SELECT * FROM emp
WHERE ecode IN (SELECT ecode FROM emp
MINUS
SELECT DISTINCT ecode FROM assign);
Output:
ECODE ENAME SALARY DNO DESG DT_JN
E04 JAYANTA GANGULY 6000 D03 ACCOUNTANT 12-SEP-96
E05 RAJIB HALDER 4000 D03 CLERK 07-OCT-95
The MINUS operation can be implemented using the NOT EXISTS clause as
follows:
SELECT ecode FROM emp
WHERE NOT EXISTS ( SELECT * FROM assign WHERE
emp.ecode=assign.ecode);
Output:
ECODE
—————
E04
E05
DIVIDE
In SQL, the division operation is not directly supported. It is, however, defined as a
component of Relational Algebra. According to Date, ‘DIVIDE takes two relations, one
binary and one unary, and builds a relation consisting of all values of one attribute of the
binary relation that match (in the other attribute) all values in the unary relation’. Using
subquery with the existence test, the DIVISION operation can be implemented. Consider
the following query:
Query: List the employees who are working on all the projects
SELECT * FROM emp E WHERE NOT EXISTS(SELECT * FROM project P
WHERE NOT EXISTS(SELECT * FROM assign A
WHERE A.pid=P.pid AND E.ecode=A.ecode));
Alternatively, it can be written as follows:
SELECT * FROM emp WHERE ecode IN( SELECT ecode FROM
assign GROUP BY ecode HAVING COUNT(ecode)=(SELECT
COUNT(pid) FROM project));
Output:
ECODE ENAME SALARY DNO DESG DT_JN
E01 KOUSHIK GHOSH 5000 D01 SYSTEM ANALYST 10-MAR-93
ORDER BY clause in set operations
If ORDER BY is required in a query involving set operations, it must be placed at the end of the entire statement. The ORDER BY clause should appear only once, at the end of the compound query; the component queries cannot have individual ORDER BY clauses.
SELECT dno, dname FROM dept WHERE city=’KOLKATA’
UNION
SELECT ecode, ename FROM emp WHERE dno=’D01’
ORDER BY dno;
Output:
ECODE ENAME
D01 PROJECT
D03 PERSONNEL
E01 KOUSHIK GHOSH
E02 JAYANTA DUTTA
E07 RANI BOSE
E08 GOUTAM DEY
6 rows selected.
Notice that the column name used in the ORDER BY clause of this query is
taken from the first component query. Specifying columns from the second
component query will generate an error as in the following example:
SELECT dno, dname FROM dept WHERE city=’KOLKATA’
UNION
SELECT ecode, ename FROM emp WHERE dno=’D01’
ORDER BY ecode;
Output:
ORDER BY ecode
*
ERROR at line 4:
ORA-00904: invalid column name
The solution is that the column should be specified in the ORDER BY clause
by column position as follows:
SELECT dno, dname FROM dept WHERE city=’KOLKATA’
UNION
SELECT ecode, ename FROM emp WHERE dno=’D01’
ORDER BY 1;
Output:
ECODE ENAME
D01 PROJECT
D03 PERSONNEL
E01 KOUSHIK GHOSH
E02 JAYANTA DUTTA
E07 RANI BOSE
E08 GOUTAM DEY
6 rows selected.
Precedence of Set Operators
If more than two component queries are combined using the set operators, then the
expression is evaluated from left to right.
SELECT ecode, ename FROM emp WHERE dno=’D01’
UNION
SELECT ecode, ename FROM emp WHERE ecode IN (SELECT
ecode FROM assign
WHERE pid=’P02’)
INTERSECT
SELECT ecode, ename FROM emp WHERE desg=’PROGRAMMER’;
Output:
ECODE ENAME
E02 JAYANTA DUTTA
E08 GOUTAM DEY
Parentheses may be used to impose a particular order of evaluation on the set operators. The above query gives a different result if parentheses are used. In that case, the query within the parentheses is evaluated first; the result is then combined with the component queries outside the parentheses.
SELECT ecode, ename FROM emp WHERE dno=’D01’
UNION
(SELECT ecode, ename FROM emp WHERE ecode IN (SELECT
ecode FROM assign
WHERE pid=’P02’)
INTERSECT
SELECT ecode, ename FROM emp WHERE desg=’PROGRAMMER’);
Output:
ECODE ENAME
E01 KOUSHIK GHOSH
E02 JAYANTA DUTTA
E07 RANI BOSE
E08 GOUTAM DEY
3.6 NESTED QUERIES

Nested queries are queries within a query; these are also called subqueries. The idea is similar to the nesting of conditional statements, one within the other.
Subqueries have similarity with chaining of the command, SELECT. Chaining
with SELECT combines one or more SELECT commands on the same level in a query.
But nested queries or subqueries allow embedding of one or more SELECT commands
inside the body of other queries.
A subquery can replace a constant (a scalar subquery); it can return a list of values for a comparison; or it can replace a constant that varies, based on the row being processed (a correlated subquery).
The following subquery expressions return Boolean values, true or false.
EXISTS (subquery)
The argument of EXISTS is an arbitrary SELECT statement. The subquery may return zero or more rows. If at least one row is returned, the result of EXISTS is TRUE; otherwise it is FALSE. The subquery may reference variables from the surrounding query, which act as constants during any one evaluation of the subquery.
One example is being given below:
SELECT col1 FROM tab1
WHERE EXISTS(SELECT 1 FROM tab2 WHERE col2 = tab1.col2);
The above example acts like an inner join on col2, but produces at most one output row for each row of tab1, even though there may be multiple matching rows in tab2.
We take an example in which students have been given projects. It is named
as ‘Students in Projects’. The following nested queries are given:
SELECT name FROM stud WHERE EXISTS (SELECT 1 FROM assign
WHERE assign.stud = stud.id);
The result of the command depends on what is stored in the table named ‘assign’ under the field stud matching a particular stud.id, as specified in the nested query. A typical result may be:
Name
———
Fajal
James
Lily
(3 rows)
[ NOT ] IN or IN [ NOT ]
This query word has two forms, scalar form and subquery form.
IN [ NOT ] (scalar form)
This is used as:
expression IN (value[, ...])
The part on the right of IN is a parenthesized list of scalar expressions. The result is TRUE if the part to the left of IN, written here as expression, equals any of the right-hand expressions.
IN [ NOT ] (subquery form)
expression IN (subquery)
The part on the right side of IN is a parenthesized subquery, which must return
exactly one column. The expression on the left of IN is evaluated and compared to
every row of the subquery result. If any subquery row is found equal, the result of
IN is TRUE.
For example, a query “Is Fazal in Project 1? (—Yes)”:
SELECT id, name FROM stud WHERE id in ( SELECT stud FROM assign
WHERE id = 1 ) ;
id | name
——+——
1 | Fazal
(1 row)
ANY and SOME
These two query words are used in the following way:
expression operator ANY (subquery)
expression operator SOME (subquery)
The right side of ANY is a parenthesized subquery, which must return exactly one column. The expression to the left of ANY is evaluated and compared to each row of the subquery result using the given operator, which must yield a Boolean result; the result of ANY is TRUE if any true result is obtained.
SOME and ANY are synonyms and IN is equivalent to = ANY.
ALL
It is written like:
expression operator ALL (subquery)
The right part of ALL is a subquery put under parenthesis, and it must return only
one column. The expression to the left of ALL is evaluated and then compared to each
row of the subquery result by using the given operator, and this must yield a Boolean
result which is TRUE if all rows yield TRUE. This also includes the special case when
the subquery returns no row.
NOT IN is equivalent to <> ALL.
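A hedged sketch of ANY and ALL against the emp table used earlier in this unit:

-- employees earning more than at least one programmer
SELECT ename, salary FROM emp
WHERE salary > ANY (SELECT salary FROM emp WHERE desg = 'PROGRAMMER');

-- employees earning more than every programmer
SELECT ename, salary FROM emp
WHERE salary > ALL (SELECT salary FROM emp WHERE desg = 'PROGRAMMER');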
Row-wise comparison
The syntax for row-wise comparison is:
(expression, expression[, ...]) operator (subquery)
(expression, expression[, ...]) operator (expression, expression[, ...])
The left part contains a list of scalar expressions. The right side can contain either
a list of scalar expressions of the same length, or it may contain a parenthesized subquery,
returning as many columns as there are expressions on the left-hand side. The subquery must not return more than one row; it may return NULL if no row is found. The left side is
evaluated and compared row-wise to the single subquery result row, or to the right-hand
expression list. Only = and <> operators are allowed in row-wise comparisons. The
result is TRUE if the two rows are equal or unequal, respectively, as the case may be.
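A minimal sketch (reusing the illustrative tables tab1 and tab2 from the EXISTS example above, and assuming both have columns col1 and col2; the literal 'a' is arbitrary):

SELECT * FROM tab1
WHERE (col1, col2) = (SELECT col1, col2 FROM tab2 WHERE col1 = 'a');
-- TRUE only for tab1 rows whose (col1, col2) pair equals the single subquery row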
For example, we give the following command to retrieve information on first
name, last name, city, state and age of persons from a file ‘friend’ which is ordered by
first name.
test=> SELECT * FROM friend ORDER BY firstname;
Following results are displayed:
firstname | lastname | city | state | age
Deepak | Jain | Lucknow | UP | 19
Dinesh | Singh | Noida | UP | 24
Neel | Chatterji | Asansol | WB | 27
Sandip | Roy | Burdwan | WB | 25
4 rows selected.
Nested queries can be made using SELECT, FROM, WHERE and ORDER BY, as in the following examples:
test -> SELECT f1.firstname, f1.lastname, f1.state
test -> FROM friend f1, friend f2
test -> WHERE f1.state <> f2.state AND
test -> f2.firstname = ’Deepak’ AND
test -> f2.lastname = ’Jain’
test -> ORDER BY firstname, lastname;
firstname | lastname | state
Neel | Chatterji | WB
Sandip | Roy | WB
2 rows selected.
test -> SELECT f1.firstname, f1.lastname, f1.state
test -> FROM friend f1
test -> WHERE f1.state = (
test (> SELECT f2.state
test (> FROM friend f2
test (> WHERE f2.firstname = ’Deepak’ AND
test (> f2.lastname = ’Jain’
test (> )
test -> ORDER BY firstname, lastname;
firstname | lastname | state
Deepak | Jain | UP
Dinesh | Singh | UP
2 rows selected.
A subquery may represent a fixed value, a correlated value or a list of values. Any number of queries can be used, and subqueries can be nested inside other subqueries to any depth.
Subqueries, in some cases just provide an alternative way to phrase a query. In
some cases, a nested query is the only way to get the desired result.
3.7 AGGREGATE OPERATORS
Aggregation is the process of consolidating multiple values into a single value. For example,
data can be collected on a daily basis and aggregated into a value for the week, the
weekly data can be aggregated into a value for the month, and so on. Aggregation
allows patterns in the data to emerge, and these patterns are the basis for analysis and
decision-making. When you define a data model with hierarchical dimensions, you are
providing the framework in which aggregate data can be calculated.
Analytic workspaces provide an extensive list of aggregation methods, including
weighted, hierarchical, and weighted hierarchical methods. The following are descriptions
of the basic aggregation operators:
• Average: Adds non-null data values, then divides the sum by the number of data
values that were added together.
• First Non-NA Data Value: Returns the first real data value.
• Last Non-NA Data Value: Returns the last real data value.
• Maximum: Returns the largest data value among the children of each parent.
• Minimum: Returns the smallest non-null data value among the children of each
parent.
• Nonadditive: Does not aggregate the data.
• Sum: Adds data values.
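In ordinary SQL, several of these operators correspond directly to aggregate functions; a sketch against the emp table from earlier sections:

SELECT dno,
       SUM(salary) AS total_sal,  -- the Sum operator
       AVG(salary) AS avg_sal,    -- Average: ignores null salaries
       MIN(salary) AS min_sal,    -- Minimum
       MAX(salary) AS max_sal     -- Maximum
FROM emp
GROUP BY dno;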
Aggregation is frequently called summarization, and aggregate data is called
summary data. While the most frequently used aggregation operator is Sum, there are
many other operators, such as Average, First, Last, Minimum, and Maximum. Oracle
OLAP also supports weighted and hierarchical methods. The following are some
simple diagrams showing how the basic types of operators work. Figure 3.1 shows
a simple hierarchy with four children and one parent value. Three of the children have
values, while the fourth is empty. This empty cell has a null or NA value. The Sum
operator calculates a value of (2 + 4 + 6) = 12 for the parent value.
[Figure: parent value 12 computed over children 2, 4, 6 and NA]
Fig. 3.1 Summary Aggregation in a Simple Hierarchy

The average operator calculates the average of all real data, producing an aggregate
value of ((2 + 4 + 6)/3) = 4, as shown in the Figure 3.2.
[Figure: parent value 4 computed over children 2, 4, 6 and NA]
Fig. 3.2 Average Aggregation in a Simple Hierarchy

The hierarchical operators include null values in the count of cells. In the following
figure, the hierarchical average operator produces an aggregate value of ((2 + 4 + 6
+NA)/4) = 3.
[Figure: parent value 3 computed over children 2, 4, 6 and NA]
Fig. 3.3 Hierarchical Average Aggregation in a Simple Hierarchy

Figure 3.3 shows hierarchical average aggregation in a simple hierarchy. This


figure shows a simple hierarchy with four children and one parent value. Three of the
children have values, while the fourth is empty. This empty cell has a null or NA value.
The hierarchical average operator calculates a value of 3 ((2 + 4 + 6 + 0)/4) for the
parent value.
The weighted operators use the values in another measure to generate weighted
values before performing the aggregation. Figure 3.4 shows how the simple sum of
12 changes to 20 by using weights ((3*2) + (2*4) + (NA*6) +(4*NA)).
[Figure: parent value 20 computed over children 2, 4, 6 and NA with weights 3, 2, NA and 4]
Fig. 3.4 Weighted Sum Aggregation in a Simple Hierarchy
Figure 3.4 shows weighted sum aggregation in a simple hierarchy. It shows a
simple hierarchy with four children and one parent value. Three of the children have
values, while the fourth is empty. This empty cell has a null or NA value. Each cell has
a weight value assigned to it, which changes the values. The first cell has a value of 2
and a weight of 3, giving it a calculated value of 6. The second cell has a value of 4 and
a weight of 2, giving it a calculated value of 8. The third cell has a value of 6 and a
weight of NA, giving it a calculated value of 6 because an NA weight value is calculated
as a 1. The fourth cell is NA with a weight of 4, giving it a calculated value of NA. The
calculated values have a total sum of 20.
Take an example of mixing aggregation operators. Even though you can use the
Sum and Maximum operators alone without ordering the dimensions, you cannot use
them together without specifying the order. The following figures show how they calculate
different results depending on the order of aggregation. Figure 3.5 shows a cube with
two dimensions. The Sum is calculated first across one dimension of the cube, then the
Maximum is calculated down the other dimension.
Fig. 3.5 Sum Method followed by Maximum Method

In Figure 3.6 the Maximum is calculated first down one dimension of the cube,
and then the Sum is calculated across the other dimension. The maximum value of the
sums in Figure 3.5 is 15, while the sum of the maximum values is 19 in Figure 3.6.
Fig. 3.6 Maximum Method followed by Sum Method
3.8 NULL VALUES
In populating a database with data objects, it is not uncommon that some of these objects may not be completely known. For example, when getting information from new customers through forms that customers are requested to fill in, some fields may be left blank. In these cases, rather than having no information at all, it is still better to record the information you do have. Leaving a field blank is not good, as it can be interpreted as an empty string, which may be a valid value for some domains. A value is needed that denotes the unknown and that cannot be confused with valid domain values.
It is here that the Null value is used. It is a special value that is different from any other value from any attribute domain. At the same time, it belongs to every attribute domain in the database, i.e., it may appear as a value for any attribute and not violate any type constraints. Syntactically, different DBMSs use different symbols to denote null values. How do null values affect relational operations? All relational operations involve comparing values in tuples, including projection, which involves comparing result tuples for duplicates, and they all rely on comparisons evaluating categorically to one of two values: TRUE or FALSE. The key to answering the question is in how we evaluate boolean operations involving null values. For example, what does ‘? > 5’ evaluate to? The unknown value could be greater than 5. But then again, it may not be. That is, the value of the boolean expression cannot be determined on the basis of the available information. If the comparison ‘? > 5’ were generated in the process of selection, you would not know whether to include or exclude the associated tuple in the result unless you were to admit a third value, UNKNOWN. If you wanted to do that, you would have to go back and redefine all these operations on the basis of some form of three-valued logic. To avoid this problem, most systems that allow null values simply interpret any comparison involving them as FALSE. The rationale is that even though such comparisons could be true, they are not demonstrably true on the basis of what is known. That is, the result of any relational operation conservatively includes only tuples that demonstrably satisfy the conditions of the operation. Adopting this convention, all the operations defined previously still hold without any amendment. Some implications for the outcome of each operation are considered here.
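This convention is also why, in SQL, a null can be detected only with the IS NULL predicate and never with an ordinary comparison; a small sketch against the emp table:

-- returns no rows even where dno is null: the comparison is never TRUE
SELECT ename FROM emp WHERE dno = NULL;

-- the correct test
SELECT ename FROM emp WHERE dno IS NULL;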
For the Select operation, an unknown value cannot identify a tuple. This is illustrated in Figure 3.7, which shows two Select operations applied to the relation R1. Between them, the selection criteria of the two operations range over the entire domain of the attribute I2. One would expect, therefore, that any tuple in R1 would be in the result of either the first or the second. This is not the case, however: the second tuple in R1, ‘<b,?>’, is not selected by either operation, as the unknown value in it falsifies the selection criteria of both operations.
[Figure: R1 = {<a,1>, <b,?>, <c,3>}; Select R1 Where I2<=2 returns only <a,1>, and Select R1 Where I2>=2 returns only <c,3>; the tuple <b,?> appears in neither result]
Fig. 3.7 Selecting over Null Values
For Projection, tuples containing null values that are otherwise identical are not
considered to be duplicates. This is because the comparison ‘? = ?’, by the above
convention, evaluates to FALSE. This leads to the situation as illustrated in Figure 3.8.
The reader should note from this example that the symbol ‘?’, while it denotes some
value much like a mathematical variable, is quite unlike the latter in that its occurrences
do not always denote the same value. Thus ‘? = ?’ is not demonstrably true and therefore
considered FALSE.
[Figure: projecting R1 over I2 and I3; result tuples containing null values are all retained, since otherwise identical tuples with nulls are not considered duplicates]
Fig. 3.8 Projecting over Null Values
In a Join operation, tuples having null values under the common attributes are not
concatenated. This is illustrated in Figure 3.9: ‘?=1’, ‘1=?’ and ‘?=?’ are all FALSE.
[Figure: R1 = {<a,1>, <b,?>, <c,?>} joined with R2 = {<1,x>, <?,y>, <?,z>} yields only the tuple <a,1,x>]
Fig. 3.9 Joining over Null Values
In Division, the occurrence of even one null value in the divisor means that the
result will be an empty relation, as any value in the dividend’s common attributes will fail
when matched with it. This is illustrated in Figure 3.10. This is not necessarily the case if only the dividend contains null values under the common attributes; division may still succeed on tuples not containing null values.
[Figure: dividing R1 = {<a,1>, <a,?>, <b,?>} by R2 = {<?>} yields an empty result]
Fig. 3.10 Division with Null Divisors
In set operations, because tuples are treated as a single unit in comparisons, a
single rule applies: otherwise identical tuples containing null values are considered
to be different. Figure 3.11 illustrates this for each set operation. Due to the occurrence
of null values, the tuples in R2 are not considered duplicates of R1’s tuples. Thus, their
union simply collects tuples from both relations; subtracting R2 from R1 simply results in
R1; and their intersection is empty.
[Figure: R1 = {<a,1,x>, <b,1,?>, <c,1,?>} and R2 = {<b,1,?>, <c,1,?>}; R1 Union R2 collects all five tuples, R1 Minus R2 yields R1 unchanged, and R1 Intersect R2 is empty]
Fig. 3.11 Set Operations involving Null Values
3.9 COMPLEX INTEGRITY CONSTRAINTS IN SQL
A complex integrity constraint is a user-defined rule for a column or set of columns that
allows or disallows inserts, updates or deletes of a row based on the value it contains for
the column or set of columns. This goes beyond uniqueness—the key integrity constraint—which must be enforced by the DBMS and which ensures that no component of the primary key is allowed to take the undefined (null) value. For example, consider a table that represents courses taken by students: ENROL(StudentNo, CourseNo, CourseYear, Semester, FullTime, TestMark, ExamMark).
Each row in the table must have values for the StudentNo, CourseNo, CourseYear,
and Semester columns. All these make up the composite key, representing the fact that
a student may take the same course in the same semester in different years, or in
different semesters in the same year (for example, if he or she failed the course the first
time).
1. Domain Complex Integrity Constraints
Domains are used in the relational model to define the characteristics of the columns of
a table. The domain specifies its own name, data type and logical size. The logical size
represents the size as perceived by the user, not how it is implemented internally. For
example, for an integer, the logical size represents the number of digits used to display
the integer, not the number of bytes used to store it. The domain integrity constraints are
used to specify the valid values that a column defined over the domain can take. You can
define the valid values by listing them as a set of values (such as an enumerated data
type in a strongly typed programming language), a range of values or an expression that
accepts the valid values. Strictly speaking, only values from the same domain should
ever be compared or be integrated through a union operator.
A formal treatment of the domain concept would require the following for all of
the domains:
• The ability to specify the complete set of domains that apply to a given database
(the result of any operation on any column defined over any domain must then
yield a result in one of the specified domains)
• The ability to specify - for every domain, pair of domains, triplet of domains,
and so on - which operators can be applied to the values taken from the
domains, as well as what the domain of the result must be
• The ability to specify an ordering of the values in the domain
In the previous example, FullTime would be defined over a domain called Flag, of
data type and size char(1), which may take the values ‘y’ and ‘n’. TestMark and
ExamMark would be defined over a domain called Mark, of data type decimal (5,2),
which may take the real values between 0 and 100 inclusively.
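In SQL, such domain restrictions are commonly expressed with CHECK constraints; a hedged sketch of the ENROL columns above (the table name enrol_demo is invented):

CREATE TABLE enrol_demo (
    fulltime CHAR(1)      CHECK (fulltime IN ('y', 'n')),
    testmark DECIMAL(5,2) CHECK (testmark BETWEEN 0 AND 100),
    exammark DECIMAL(5,2) CHECK (exammark BETWEEN 0 AND 100)
);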
2. Column Complex Integrity Constraints
A column integrity constraint is used to specify additional restrictions on a column defined
over a domain. You use a column integrity constraint to further refine or restrict the
definition of a generic domain for the specific context of the table in which the domain is
used. This can be used to eliminate the proliferation of a number of similar domains.
Consider the complex integrity constraint in this scenario. A company has been sold. Due to this sale, all employee numbers must be updated to the current value plus 5000 to coordinate with the new company’s employee numbers. As manager numbers are really employee numbers, these values must also increase by 5000. (See Table 3.4).
Table 3.4 The EMP Table before Updates
EMPNO MGR
210
210
211 210
212 211
UPDATE employees
SET employee_id = employee_id + 5000,
manager_id = manager_id + 5000;
Even though a constraint is defined to verify that each mgr value matches an
empno value, this statement is legal because Oracle effectively performs its constraint
checking after the statement completes. (See Figure 3.12).
[Figure: the EMP table after each successive update—the first row becomes 5210, then the second becomes 5211/5210, then the third becomes 5212/5211—with the constraint checked only after the whole statement completes]
Fig. 3.12 Complex Constraint Checking
The examples in this section illustrate the constraint checking mechanism during
INSERT and UPDATE statements. The same mechanism is used for all types of DML
statements, including UPDATE, INSERT and DELETE statements.
Complex integrity constraints use the same checking mechanism. It applies to all
types of constraints, including the following:
• NOT NULL
• UNIQUE key
• PRIMARY KEY
• All types of FOREIGN KEY constraints
• CHECK constraints
3.10 TRIGGERS AND VIEWS IN SQL

1. Triggers
A trigger is a special kind of PL/SQL anonymous block. You can define triggers to fire
before or after SQL statements, either on a statement level or for each row that is
affected. You can also define INSTEAD OF triggers or system triggers (triggers on
DATABASE and SCHEMA).
Triggers are PL/SQL functions that are executed based on some specific activity
or condition in the form. These activities, called events, are the foundation of Oracle
Forms’ user extensibility feature. While many trigger events perform a particular default
function, a trigger can be written to disable, modify or enhance the default processing
capabilities associated with the event. Triggers can be defined at the form, block, record
or individual field level. In addition, a trigger may cause another trigger to ‘fire’ as well
as execute another related program unit.
A trigger is a PL/SQL block structure which is fired when DML statements like
Insert, Delete and Update are executed on a database table. A trigger is triggered
automatically when an associated DML statement is executed.
Anonymous Blocks in Oracle Tools

Object Type       Environment                Description
Trigger           Oracle Developer/2000      Place PL/SQL code directly in the Oracle
                                             Forms or Oracle Reports trigger, which is
                                             then packaged as an anonymous block by
                                             the tool and sent to the PL/SQL engine.
Database trigger  Record or column of table  The body of the trigger is coded in
                                             PL/SQL. While the trigger has a name, the
                                             PL/SQL code itself is unnamed, hence
                                             anonymous.
Database triggers
A database trigger is a stored subprogram associated with a table. You can have Oracle
automatically fire the database trigger before or after an INSERT, UPDATE or DELETE
statement affects the table. One of the many uses for database triggers is to audit data
modifications; for example, the following database trigger fires whenever salaries in the
emp table are updated:
CREATE TRIGGER audit_sal
AFTER UPDATE OF sal ON emp
FOR EACH ROW
BEGIN
INSERT INTO emp_audit VALUES ...
END;

You can use all the SQL data manipulation statements and any procedural statement
in the executable part of a database trigger.



Syntax of triggers
The Syntax for creating a trigger is:
CREATE [OR REPLACE] TRIGGER trigger_name
{BEFORE | AFTER | INSTEAD OF}
{INSERT [OR] | UPDATE [OR] | DELETE}
[OF col_name]
ON table_name
[REFERENCING OLD AS o NEW AS n]
[FOR EACH ROW]
WHEN (condition)
BEGIN
   -- sql statements
END;

• CREATE [OR REPLACE] TRIGGER trigger_name - This clause creates a
trigger with the given name or overwrites an existing trigger with the same
name.
• {BEFORE | AFTER | INSTEAD OF} - This clause indicates when the trigger
should fire, for example, before or after updating a table. INSTEAD OF is
used to create a trigger on a view; BEFORE and AFTER cannot be used to
create a trigger on a view.
• {INSERT [OR] | UPDATE [OR] | DELETE} - This clause determines the
triggering event. More than one triggering event can be specified, separated
by the OR keyword. The trigger fires on each of the specified events.
• [OF col_name] - This clause is used with update triggers. It is used when you
want the trigger to fire only when a specific column is updated.
• [ON table_name] - This clause identifies the name of the table or view to
which the trigger is associated.
• [REFERENCING OLD AS o NEW AS n] - This clause is used to reference
the old and new values of the data being changed. By default, you reference
the values as :old.column_name or :new.column_name. The reference names
can also be changed from old (or new) to any other user-defined name. You
cannot reference old values when inserting a record, or new values when deleting
a record, because they do not exist.
• [FOR EACH ROW] - This clause determines whether the trigger fires once
for each row affected (i.e., a row-level trigger) or just once when the entire
SQL statement is executed (i.e., a statement-level trigger).
• WHEN (condition) - This clause is valid only for row level triggers. The trigger
is fired only for rows that satisfy the condition specified.
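Putting these clauses together, the following is a minimal sketch of a row-level trigger that uses OF, REFERENCING, FOR EACH ROW and WHEN; the emp table and the error number are assumptions made for illustration:
CREATE OR REPLACE TRIGGER block_sal_decrease
BEFORE UPDATE OF sal ON emp
REFERENCING OLD AS o NEW AS n
FOR EACH ROW
WHEN (n.sal < o.sal)   -- no leading colon inside the WHEN clause
BEGIN
   -- inside the body, the renamed references take a leading colon
   RAISE_APPLICATION_ERROR(-20100,
      'Salary of ' || :o.ename || ' cannot be decreased');
END;
/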
Data access for triggers
When a trigger is fired, the tables referenced in the trigger action might be currently
undergoing changes by SQL statements in other users’ transactions. In all such cases,
the SQL statements executed within triggers follow the common rules used for standalone
SQL statements. In particular, if an uncommitted transaction has modified values that a
trigger being fired either needs to read (query) or write (update), the SQL statements in
the body of the trigger being fired use the following guidelines:
• Queries see the current read-consistent snapshot of referenced tables and any
data changed within the same transaction.
• Updates wait for existing data locks to be released before proceeding.
The following examples illustrate these points:
Example:
Assume that the SALARY_CHECK trigger (body) includes the following SELECT
statement:
SELECT minsal, maxsal INTO minsal, maxsal
FROM salgrade
WHERE job_classification = :new.job_classification;

For this example, assume that transaction T1 includes an update to the


MAXSAL column of the SALGRADE table. At this point, the SALARY_CHECK
trigger is fired by a statement in transaction T2. The SELECT statement within the
fired trigger (originating from T2) does not see the update by the uncommitted
transaction T1, and the query in the trigger returns the old MAXSAL value as of the
read-consistent point for transaction T2.
Example:
Assume the following definition of the TOTAL_SALARY trigger, a trigger to
maintain a derived column that stores the total salary of all members in a department:
CREATE TRIGGER total_salary
AFTER DELETE OR INSERT OR UPDATE OF deptno, sal ON emp
FOR EACH ROW BEGIN
/* assume that DEPTNO and SAL are non-null fields */
IF DELETING OR (UPDATING AND :old.deptno != :new.deptno)
THEN UPDATE dept
SET total_sal = total_sal - :old.sal
WHERE deptno = :old.deptno;
END IF;
IF INSERTING OR (UPDATING AND :old.deptno != :new.deptno)
THEN UPDATE dept
SET total_sal = total_sal + :new.sal
WHERE deptno = :new.deptno;
END IF;
IF (UPDATING AND :old.deptno = :new.deptno AND
:old.sal != :new.sal )
THEN UPDATE dept
SET total_sal = total_sal - :old.sal + :new.sal
WHERE deptno = :new.deptno;
END IF;
END;
For this example, suppose that one user’s uncommitted transaction includes an
update to the TOTAL_SAL column of a row in the DEPT table. At this point, the
TOTAL_SALARY trigger is fired by a second user’s SQL statement. Because the
uncommitted transaction of the first user contains an update to a pertinent value in
the TOTAL_SAL column (in other words, a row lock is being held), the updates
performed by the TOTAL_SALARY trigger are not executed until the transaction
holding the row lock is committed or rolled back. Therefore, the second user waits
until the commit or rollback point of the first user’s transaction.
Storage of triggers
Oracle stores triggers in their compiled form, just like stored procedures. When a CREATE
TRIGGER statement commits, the compiled PL/SQL code, called P code (for
pseudocode), is stored in the database and the source code of the trigger is flushed from
the shared pool.
Execution of triggers
Oracle executes a trigger internally using the same steps used for procedure execution.
The only subtle difference is that a user has the right to fire a trigger if he or she has the
privilege to execute the triggering statement. Other than this, triggers are validated and
executed the same way as stored procedures.
Dependency maintenance for triggers
Like procedures, triggers are dependent on referenced objects. Oracle automatically
manages the dependencies of a trigger on the schema objects referenced in its trigger
action. The dependency issues for triggers are the same as those for stored procedures.
Triggers are treated like stored procedures; they are inserted into the data dictionary.
2. Implementing Triggers and Program Units
Oracle forms applications can be further customized by developing PL/SQL procedures
and functions called triggers. These triggers are attached to specific activities in the
form called events. Events typically are defined as before (PRE-FORM, PRE-QUERY,
PRE-INSERT), after (POST-FORM, POST-QUERY, POST-INSERT), or during
(WHEN-NEW-FORM-INSTANCE, WHEN-BUTTON-PRESSED) common database
activities. Additionally, triggers can be associated with certain keyboard activities (for
backward compatibility with character-based applications), although the trend is to
minimize key triggers.
The customer maintenance form utilizes a pre-insert trigger to determine a unique
customer number based on a sequence generator. Additionally, triggers are defined for
the buttons at the bottom of the form.
To construct this form, create a block for the CUSTOMERS table using a form
style in the block layout definition. Arrange the items and boilerplate objects as shown in
the figure and, using the button tool, create three buttons as shown on the screen (these
buttons will be labeled PUSH_BUTTONx). Using the Size Objects and Align Objects
tools in the Arrange menu, position and size the form objects as they should appear.
Now, define the object properties as needed by defining the window and canvas
sizes and titles. Next, because the customer number should be protected from update,
change the Update Allowed and Navigable properties for the CUST_NO item to False.
Modify the label properties for the buttons to reflect the text.
At this point, the additional processing logic may be added to the form. First,
select the triggers group directly below the CUSTOMERS block in the Object Navigator.
Click on the Add Object button and a list will appear with the names of all allowable
triggers that can be built for the form. The trigger to be built will determine the customer
number for a new customer prior to insert based on the database sequence generator.
While the list is displayed, either use the scroll bar to find the PRE-INSERT trigger or
type the trigger name into the input box.

Fig. 3.13 PL/SQL Editor for Trigger Creation

The PL/SQL editor (shown in Figure 3.13) will now appear, indicating the trigger
level and the name of the trigger with an area that may be used to enter the trigger
procedure as shown. Once the text has been entered, click the Compile button to make
sure that there are no code errors and then click the Close button to complete the trigger
definition. Note the colon used to reference form fields. Other buttons can be used to
revert to the version saved as of the last close or compilation, to create a new trigger, or
to delete the existing trigger.
Finally, WHEN-BUTTON-PRESSED triggers should be created for the three
buttons that were created on the form canvas as follows:
Save: begin
commit_form;
end;
Clear: begin
clear_form;
end;
Exit: begin
do_key (‘EXIT_FORM’);
end;
The first two triggers use standard built-in procedures, while the third uses the
DO_KEY built-in procedure to simulate pressing the Exit key.
The Exit key is defined by the keyboard mapping for your display device. On the
IBM PC keyboard, this normally defaults to the Esc key, but may be modified using
Oracle Terminal. To view the actual key mapping during Oracle Forms runtime, type
Ctrl-K or select Show Keys from the Help menu in the form.
This is generally a good practice to use when there is more than one way to
perform the same function. Any special logic that needs to be performed prior to exiting
the form can be coded in a KEY-EXIT trigger to ensure consistency. Additionally, a
trigger can be written so that it calls a user defined program unit. For example, if the
customer maintenance form changes the credit limit for a customer, an acceptance
letter should be sent to the customer. The POST-COMMIT trigger should be written as
follows:
begin
if :customers.cust_credit_limit > 0 and
:customers.old_credit is null then
print_confirmation_letter (:customers.cust_no);
end if;
end;

The print_confirmation_letter procedure would then be created as a program unit
in the form.
Using record groups and lists of values for data validation
A record group is an internal structure that is analogous to a table in memory. Record
groups contain columns and rows with data based on a structure defined by a query or
column definition. Usages of record groups include parameters, structured arrays and
validation entities. This last usage of record groups is demonstrated in the item price
maintenance form shown in Figure 3.14.

Fig. 3.14 Item Price Maintenance Form


To build this form, start with a default block for the items table using the form
style layout. Next, change the Displayed Canvas to <Null> and the Displayed property
to False for the ITEM_PL_ID, ITEM_PC_ID, ITEM_PROD_NO, ITEM_CP_NO,
and ITEM_SIZE_CODE items. These items will be hidden from view and will be updated
by the List of Values validations that will be constructed for this form. Also, make the
ITEM_NO field non-updatable and non-navigable.
To make this form more user-friendly, the translations for the code fields will be
displayed as the input fields in the form. Using the Field tool, create five new fields as
PL_NAME, PC_NAME, PROD_NAME, CP_NAME, and SIZE_DESC. The
properties for these fields are shown in Table 3.5.
Table 3.5 Item Maintenance Description Fields

Property        Value
Canvas          ITEM_CANVAS
Displayed       True
Width           200
Height          200
Database Item   False
These fields will need to be populated with data whenever an ITEMS record is
queried. To do this, create the POST-QUERY trigger on the ITEMS block as follows:
begin
   select pl.pl_name, pc.pc_name, prod.prod_name, cp.cp_name, s.size_desc
   into :items.pl_name, :items.pc_name, :items.prod_name,
        :items.cp_name, :items.size_desc
   from product_lines pl, product_classes pc, products prod,
        color_patterns cp, sizes s
   where pl.pl_id = pc.pc_pl_id
   and pc.pc_pl_id = prod.prod_pl_id
   and pc.pc_id = prod.prod_pc_id
   and prod.prod_pl_id = :items.item_pl_id
   and prod.prod_pc_id = :items.item_pc_id
   and prod.prod_no = :items.item_prod_no
   and cp.cp_no = :items.item_cp_no
   and s.size_code = :items.item_size_code;
exception
   when NO_DATA_FOUND then
      message ('Database Integrity Error. Contact your DBA.');
      bell;
      raise FORM_TRIGGER_FAILURE;
end;

3. Types of Triggers
This section describes the different types of triggers:
• Row and statement triggers
• BEFORE and AFTER triggers
• INSTEAD OF triggers
(a) Row and Statement Triggers
When you define a trigger, you can specify the number of times the trigger action is to be
executed: once for every row affected by the triggering statement (such as might be
fired by an UPDATE statement that updates many rows), or once for the triggering
statement, no matter how many rows it affects.
(i) Row Triggers
A row trigger is fired once for each row affected by the triggering statement. For
example, if an UPDATE statement updates multiple rows of a table, the row trigger is
fired once for each row affected by the UPDATE statement. If a triggering statement
affects no rows, a row trigger is not executed at all.
Row triggers are useful if the code in the trigger action depends on data provided
by the triggering statement or rows that are affected.
Statement/row trigger sequencing
To understand Trigger Timing, assume that you have four Triggers going off at different
times during the DML. You could have two Statement Triggers defined for BEFORE
and AFTER the DML. You could have two Row Triggers defined BEFORE and AFTER
each row.
A picture of how it would execute is shown below:
SQL> UPDATE emp
  2  SET sal = sal * 1.1
  3  WHERE deptno = 30;

BEFORE statement trigger
    BEFORE row trigger   (7839 KING, deptno 30)
    AFTER row trigger
    BEFORE row trigger   (7698 BLAKE, deptno 30)
    AFTER row trigger
    BEFORE row trigger   (7788 SMITH, deptno 30)
    AFTER row trigger
AFTER statement trigger

Restricting the effect of a trigger (WHEN clause)
1. In the header of the trigger, you can restrict the effect of the trigger so that it
applies only to certain rows, using a WHEN clause.
2. Note that in the WHEN clause you reference NEW or OLD with no leading
colon (:).
3. You could achieve the same thing with an IF statement inside the trigger’s
BEGIN block, but it is a little easier to see which rows are affected by the
trigger when the WHEN clause appears in the trigger header.
Restricting a row trigger
CREATE OR REPLACE TRIGGER derive_commission_pct
BEFORE INSERT OR UPDATE OF sal ON emp
FOR EACH ROW
WHEN (NEW.job = 'SALESMAN')
BEGIN
   IF INSERTING THEN :NEW.comm := 0;
   ELSIF :OLD.comm IS NULL
      THEN :NEW.comm := 0;
   ELSE :NEW.comm := :OLD.comm * (:NEW.sal/:OLD.sal);
   END IF;
END;
/

Another WHEN clause that picks on CHASSELB
create or replace trigger ClemSalary
before update of sal on emp
for each row
when (old.ename = 'CHASSELB')
declare
   v_sal emp.sal%TYPE;
begin
   v_sal := :old.sal;
   if :new.sal > v_sal then
      RAISE_APPLICATION_ERROR (-20508, 'No Pay raise for Clem');
   end if;
end;
/

(ii) Statement Triggers


A statement trigger is fired once on behalf of the triggering statement, regardless of the
number of rows in the table that the triggering statement affects (even if no rows are
affected). For example, if a DELETE statement deletes several rows from a table, a
statement-level DELETE trigger is fired only once, regardless of how many rows are
deleted from the table.
Statement triggers are useful if the code in the trigger action does not depend on
the data provided by the triggering statement or the rows affected. For example, if a
trigger makes a complex security check on the current time or user, or if a trigger
generates a single audit record based on the type of triggering statement, a statement
trigger is used.
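For instance, a statement trigger that writes a single audit record might be sketched as follows; the audit_log table and its columns are assumptions made for illustration:
CREATE OR REPLACE TRIGGER emp_delete_audit
AFTER DELETE ON emp   -- no FOR EACH ROW clause: fires once per statement
BEGIN
   INSERT INTO audit_log (username, action, action_time)
   VALUES (USER, 'DELETE on EMP', SYSDATE);
END;
/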
Triggering event or statement
A triggering event or statement is either the SQL statement, database event, or user
event that causes a trigger to be fired. A triggering event can be one or more of the
following:
• An INSERT, UPDATE or DELETE statement on a specific table (or view, in
some cases)
• A CREATE, ALTER or DROP statement on any schema object
• A database startup or instance shutdown
• A specific error message or any error message
• A user logon or logoff
For example, the triggering statement, as shown in Figure 3.15, is
. . . UPDATE OF parts_on_hand ON inventory . . .
[Figure: A triggering statement can cascade triggers. An UPDATE on table t1 fires the
UPDATE_T1 trigger (BEFORE UPDATE ON t1 FOR EACH ROW), whose body issues
INSERT INTO t2 VALUES (...); that INSERT in turn fires the INSERT_T2 trigger
(BEFORE INSERT ON t2 FOR EACH ROW), whose body issues a further INSERT,
and so on.]

Fig. 3.15 Triggering Statement

This means, when the PARTS_ON_HAND column of a row in the INVENTORY


table is updated, fire the trigger. Note that when the triggering event is an UPDATE
statement, you can include a column list to identify which columns must be updated to
fire the trigger. You cannot specify a column list for INSERT and DELETE statements,
because they affect entire rows of information.
A triggering event can specify multiple SQL statements, as in
. . . INSERT OR UPDATE OR DELETE ON inventory . . .

which means: when an INSERT, UPDATE or DELETE statement is issued against


the INVENTORY table, fire the trigger. When multiple types of SQL statements can
fire a trigger, you can use conditional predicates to detect the type of triggering statement.
In this way, you can create a single trigger that executes different codes based on the
type of statement that fires the trigger.
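A hedged sketch of such a single trigger on the INVENTORY table follows; the conditional predicates INSERTING, UPDATING and DELETING select the branch to execute:
CREATE OR REPLACE TRIGGER inventory_change
AFTER INSERT OR UPDATE OR DELETE ON inventory
BEGIN
   IF INSERTING THEN
      DBMS_OUTPUT.PUT_LINE('Fired by an INSERT');
   ELSIF UPDATING THEN
      DBMS_OUTPUT.PUT_LINE('Fired by an UPDATE');
   ELSE   -- DELETING
      DBMS_OUTPUT.PUT_LINE('Fired by a DELETE');
   END IF;
END;
/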
(b) BEFORE and AFTER Triggers
When defining a trigger, you can specify the trigger timing—whether the trigger action
is to be executed before or after the triggering statement. BEFORE and AFTER apply
to both statement and row triggers.
BEFORE triggers
BEFORE triggers execute the trigger action before the triggering statement is executed.
This type of trigger is commonly used in the following situations:
• When the trigger action should determine whether the triggering statement should
be allowed to complete. Using a BEFORE trigger for this purpose, you can eliminate
unnecessary processing of the triggering statement and its eventual rollback in
cases where an exception is raised in the trigger action.
• To derive specific column values before completing a triggering INSERT or
UPDATE statement.
AFTER triggers
AFTER triggers execute the trigger action after the triggering statement is executed.
AFTER triggers are used in the following situations:
• When you want the triggering statement to complete before executing the trigger
action.
• If a BEFORE trigger is already present, an AFTER trigger can perform different
actions on the same triggering statement.
Trigger combinations
Using the options listed above, you can create four types of triggers:
• BEFORE statement trigger
Before executing the triggering statement, the trigger action is executed.
• BEFORE row trigger
Before modifying each row affected by the triggering statement and before checking
appropriate integrity constraints, the trigger action is executed provided that the trigger
restriction was not violated.
• AFTER statement trigger
After executing the triggering statement and applying any deferred integrity constraints,
the trigger action is executed.
• AFTER row trigger
After modifying each row affected by the triggering statement and possibly applying
appropriate integrity constraints, the trigger action is executed for the current row provided
the trigger restriction was not violated. Unlike BEFORE row triggers, AFTER row
triggers lock rows.
You can have multiple triggers of the same type for the same statement for any
given table. For example, you may have two BEFORE statement triggers for UPDATE
statements on the EMP table. Multiple triggers of the same type permit modular installation
of applications that have triggers on the same tables. Also, Oracle snapshot logs use
AFTER row triggers, so you can design your own AFTER row trigger in addition to the
Oracle-defined AFTER row trigger.
You can create as many triggers of the preceding different types as you need for
each type of DML statement (INSERT, UPDATE, or DELETE).
For example, suppose you have a table, SAL, and you want to know when the
table is being accessed and the types of queries being issued. The example that follows
contains a sample package and trigger that tracks this information by the hour and type
of action (for example, UPDATE, DELETE or INSERT) on table SAL. A global session
variable, STAT.ROWCNT, is initialized to zero by a BEFORE statement trigger. Then, it
is increased each time the row trigger is executed. Finally, the statistical information is
saved in the table STAT_TAB by the AFTER statement trigger.

Sample package and trigger for SAL Table
DROP TABLE stat_tab;
CREATE TABLE stat_tab(utype CHAR(8),
   rowcnt INTEGER, uhour INTEGER);
CREATE OR REPLACE PACKAGE stat IS
   rowcnt INTEGER;
END;
/
CREATE TRIGGER bt BEFORE UPDATE OR DELETE OR INSERT ON sal
BEGIN
   stat.rowcnt := 0;
END;
/
CREATE TRIGGER rt BEFORE UPDATE OR DELETE OR INSERT ON sal
FOR EACH ROW BEGIN
   stat.rowcnt := stat.rowcnt + 1;
END;
/
CREATE TRIGGER at AFTER UPDATE OR DELETE OR INSERT ON sal
DECLARE
   typ CHAR(8);
   hour NUMBER;
BEGIN
   IF updating THEN typ := 'update'; END IF;
   IF deleting THEN typ := 'delete'; END IF;
   IF inserting THEN typ := 'insert'; END IF;
   hour := TRUNC((SYSDATE - TRUNC(SYSDATE)) * 24);
   UPDATE stat_tab
   SET rowcnt = rowcnt + stat.rowcnt
   WHERE utype = typ
   AND uhour = hour;
   IF SQL%ROWCOUNT = 0 THEN
      INSERT INTO stat_tab VALUES (typ, stat.rowcnt, hour);
   END IF;
EXCEPTION
   WHEN dup_val_on_index THEN
      UPDATE stat_tab
      SET rowcnt = rowcnt + stat.rowcnt
      WHERE utype = typ
      AND uhour = hour;
END;
/
Dropping triggers
You can drop a trigger just like any other schema object. The general format is:
DROP TRIGGER trigger_name;
BEFORE DELETE trigger
A BEFORE DELETE trigger means that Oracle will fire this trigger before the DELETE
operation is executed.
The syntax for a BEFORE DELETE trigger is:
CREATE or REPLACE TRIGGER trigger_name
BEFORE DELETE
ON table_name
[ FOR EACH ROW ]
DECLARE
   -- variable declarations
BEGIN
   -- trigger code
EXCEPTION
   WHEN ...
   -- exception handling
END;
trigger_name is the name of the trigger to create.
Restrictions:
• You cannot create a BEFORE trigger on a view.
• You can update the :NEW values.
• You cannot update the :OLD values.
For example:
If you had a table created as follows:
CREATE TABLE orders
( order_id number(5),
  quantity number(4),
  cost_per_item number(6, 2),
  total_cost number(8, 2)
);
We could then create a BEFORE DELETE trigger as follows:
CREATE OR REPLACE TRIGGER orders_before_delete
BEFORE DELETE
ON orders
FOR EACH ROW
DECLARE
v_username varchar2(10);
BEGIN
-- Find username of the person performing the DELETE on the table
SELECT user INTO v_username
FROM dual;
-- Insert record into audit table
INSERT INTO orders_audit
( order_id,
quantity,
cost_per_item,
total_cost,
delete_date,
deleted_by )
VALUES
( :old.order_id,
:old.quantity,
:old.cost_per_item,
:old.total_cost,
sysdate,
v_username );
END;
4. View in SQL
A view is a basic schema object in an Oracle database; it is an access path to the data
in one or more tables. A view may be defined as a ‘stored query’ or a ‘virtual table’. It
is called a virtual table because it does not exist as an independent entity in the database as
do ‘real’ tables. A view is a specification consisting of a SELECT statement that tells
Oracle what records and attributes to retrieve. When a view is called, the system associates
the appropriate data with it. It retrieves derived data only when called.
To create a view the user should have the CREATE VIEW privilege.

Why use views?
There are several advantages of using views:


• Simplicity: A view masks the complexity of retrieving the data. Even if a view is
defined as a multiple-table query, the view still looks like a single-table query. The
complexities are placed in the view by the database administrators and are hidden
from the user.
• Security: Views enhance the confidentiality of data. Access permissions can be
granted to views quite separately from tables, and so a user may have permission
to query a view without having any permission to query its underlying tables. A
user has access to all of the columns and rows in a table after the user has been
granted the SELECT privilege. A view can limit the availability of the table’s
columns. A view is defined by a SELECT statement; therefore, the SELECT and
WHERE clauses can filter the columns and rows available to the user. Users are
given access only to the rows or columns of the tables that concern them.
• User reports: End-users may not be able to understand the relationships present
among the tables. Views can be used to perform all of the complex join and
filtering in the background, producing a simple view that the users can query
either directly or through reporting tools.
• Data integrity: The WITH CHECK OPTION clause is used to enforce the query
conditions on any updates to the view, as sketched below.
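A minimal sketch, assuming the usual emp sample table: the view exposes only department 30, and WITH CHECK OPTION prevents an insert or update through the view from producing a row the view itself could not select.
CREATE OR REPLACE VIEW sales_emp AS
   SELECT empno, ename, sal, deptno
   FROM emp
   WHERE deptno = 30
WITH CHECK OPTION;

-- Rejected: the changed row would fall outside the view's WHERE condition
-- UPDATE sales_emp SET deptno = 10 WHERE empno = 7698;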
There are three types of views:
• Inline view: Consists of a SELECT statement that is embedded in the FROM clause
of another SELECT statement.
• Database view: Performs the same basic function; it retrieves data for a calling object.
• Materialized view: Has some physical properties and consists of data that is
physically copied into it from the underlying tables.
An inline view replaces a physical table in the FROM clause. It is executed at run time and
furnishes its result set to the outer statement. The main difference between an inline
view and a database view is that the inline view resides within another SELECT statement
and can be called only by that statement. A database view resides in the database and
can be called by many other objects. Another difference is that a database view can be
used in DML statements.
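As a quick illustration, the following hedged sketch (assuming the usual emp and dept sample tables) uses an inline view in the FROM clause to report each department's highest salary:
SELECT d.dname, e.max_sal
FROM (SELECT deptno, MAX(sal) AS max_sal   -- the inline view
      FROM emp
      GROUP BY deptno) e,
     dept d
WHERE e.deptno = d.deptno;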

3.11 EMBEDDED SQL AND CURSORS


Despite the name, SQL does not provide the procedural constructs of languages like C
or Pascal. It lacks branching (IF) and looping (GOTO, DO and FOR) statements. Some
database vendors like ORACLE give extensions to SQL to achieve branching, looping,
etc. Oracle utilizes a version of SQL known as PL/SQL. However, these extensions do
not form part of the SQL standard. SQL can, however, be incorporated into other
programming languages; this is known as Embedded SQL.
1. Types of Cursors in PL/SQL
A cursor is a handle or a name for an area in memory in which a parsed statement and
other information for processing the statement are kept; such an area is also called a
private SQL area. Although most Oracle users rely on the automatic cursor handling of
the Oracle utilities, the programmatic interfaces offer application designers more control

over cursors. In application development, a cursor is a named resource available to a
program and can be used specifically for the parsing of SQL statements embedded
within the application.
Cursors can be declared explicitly within a procedure, function or package to
facilitate record-oriented processing of Oracle data. Cursors can also be declared implicitly
(to support other data manipulation actions) by the PL/SQL engine.
The PL/SQL engine is a special component of many Oracle products, including
the Oracle Server that processes PL/SQL. Figure 3.16 illustrates the PL/SQL engine
contained in Oracle Server.
[Figure: Oracle Server contains the PL/SQL engine alongside the database and SGA.
An application submits a PL/SQL block; the engine’s procedural statement executor
runs the procedural code and hands each embedded SQL statement to the SQL statement
executor, which operates on the database.]

Fig. 3.16 The PL/SQL Engine

2. SQL Operators
PL/SQL lets you use all the SQL comparison, set and row operators in SQL statements.
This section briefly describes some of these operators.
Comparison operators
Typically, you can use comparison operators in the WHERE clause of a data manipulation
statement to form predicates that compare one expression to another and yield TRUE,
FALSE or NULL. You can use the comparison operators in the following list to form
predicates. You can also combine predicates using the logical operators AND, OR and
NOT.
Operator       Description
ALL            Compares a value to each value in a list or returned by a
               subquery and yields TRUE if all of the individual comparisons
               yield TRUE.
ANY, SOME      Compares a value to each value in a list or returned by a
               subquery and yields TRUE if any of the individual comparisons
               yields TRUE.
BETWEEN        Tests whether a value lies in a specified range.
EXISTS         Returns TRUE if a subquery returns at least one row.
IN             Tests for set membership.
IS NULL        Tests for nulls.
LIKE           Tests whether a character string matches a specified pattern,
               which can include wildcards.
Set operators
Set operators combine the results of two queries into one result. INTERSECT returns
all distinct rows selected by both queries. MINUS returns all distinct rows selected
by the first query but not by the second. UNION returns all distinct rows selected by
either query. UNION ALL returns all rows selected by either query, including all duplicates.
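As a brief sketch (assuming the usual emp sample table), the following compares the job titles found in two departments:
-- Jobs present in both department 10 and department 30
SELECT job FROM emp WHERE deptno = 10
INTERSECT
SELECT job FROM emp WHERE deptno = 30;

-- Jobs in department 10 that do not appear in department 30
SELECT job FROM emp WHERE deptno = 10
MINUS
SELECT job FROM emp WHERE deptno = 30;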
Row operators
Row operators return or reference particular rows. ALL retains duplicate rows in the
result of a query or in an aggregate expression. DISTINCT eliminates duplicate rows
from the result of a query or from an aggregate expression. PRIOR refers to the parent
row of the current row returned by a tree-structured query.
3. Managing Cursors in PL/SQL
PL/SQL uses implicit and explicit cursors. PL/SQL declares a cursor implicitly for all
SQL data manipulation statements, including queries that return only one row. If you
want precise control over query processing, you can declare an explicit cursor in the
declarative part of any PL/SQL block, subprogram or package. You must declare an
explicit cursor for queries that return more than one row.
Explicit cursors
When you need precise control over query processing, you can explicitly declare a
cursor in the declarative part of any PL/SQL block, subprogram or package.
You use three commands to control a cursor: OPEN, FETCH and CLOSE. First,
you initialize the cursor with the OPEN statement, which identifies the result set. Then,
you can execute FETCH repeatedly until all rows have been retrieved, or you can use
the BULK COLLECT clause to fetch all rows at once. When the last row has been
processed, you release the cursor with the CLOSE statement.
This technique requires more code than other techniques such as the implicit
cursor FOR loop. Its advantage is flexibility. You can:
• Process several queries in parallel by declaring and opening multiple cursors.
• Process multiple rows in a single loop iteration, skip rows, or split the processing
into more than one loop.
Declaring a cursor
You must declare a cursor before referencing it in other statements. You give the cursor
a name and associate it with a specific query. You can optionally declare a return type
for the cursor, such as table_name%ROWTYPE. You can optionally specify parameters
that you use in the WHERE clause instead of referring to local variables. These parameters
can have default values. The following example shows how you can declare cursors:

Example:
DECLARE
   my_emp_id NUMBER(6);       -- variable for employee_id
   my_job_id VARCHAR2(10);    -- variable for job_id
   my_sal NUMBER(8,2);        -- variable for salary
   CURSOR c1 IS SELECT employee_id, job_id, salary FROM employees
      WHERE salary > 2000;
   my_dept departments%ROWTYPE;  -- variable for departments row
   CURSOR c2 RETURN departments%ROWTYPE IS
      SELECT * FROM departments WHERE department_id = 110;

The cursor is not a PL/SQL variable: you cannot assign values to a cursor or use
it in an expression. Cursors and variables follow the same scoping rules. Naming cursors
after database tables is possible but not recommended.
A cursor can take parameters that can appear in the associated query wherever
constants can appear. The formal parameters of a cursor must be IN parameters; they
supply values in the query, but do not return any values from the query. You cannot
impose the constraint NOT NULL on a cursor parameter.
As the following example shows, you can initialize cursor parameters to default
values. You can pass different numbers of actual parameters to a cursor, accepting
or overriding the default values as you please. Also, you can add new formal
parameters without having to change existing references to the cursor.
DECLARE
CURSOR c1 (low NUMBER DEFAULT 0, high NUMBER DEFAULT 99)
IS
SELECT * FROM departments WHERE department_id > low
AND department_id < high;

Cursor parameters can be referenced only within the query specified in the cursor
declaration. The parameter values are used by the associated query when the cursor is
opened.
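A short usage sketch for a cursor declared as above: actual parameters may be supplied positionally, or the declared defaults accepted.
DECLARE
   CURSOR c1 (low NUMBER DEFAULT 0, high NUMBER DEFAULT 99) IS
      SELECT * FROM departments
      WHERE department_id > low AND department_id < high;
BEGIN
   OPEN c1(10, 50);   -- low = 10, high = 50
   CLOSE c1;
   OPEN c1(10);       -- low = 10, high defaults to 99
   CLOSE c1;
   OPEN c1;           -- both defaults: low = 0, high = 99
   CLOSE c1;
END;
/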
Opening a cursor
Opening the cursor executes the query and identifies the result set that consists of all
rows that meet the query search criteria. For cursors declared using the FOR UPDATE
clause, the OPEN statement also locks those rows. An example of the OPEN statement
follows:
DECLARE
   CURSOR c1 IS SELECT employee_id, last_name, job_id, salary
      FROM employees
      WHERE salary > 2000;
BEGIN
   OPEN c1;

Rows in the result set are retrieved by the FETCH statement, not when the
OPEN statement is executed.

Fetching with a cursor

Unless you use the BULK COLLECT clause, the FETCH statement retrieves the rows
in the result set one at a time. Each fetch retrieves the current row and advances the
cursor to the next row in the result set. You can store each column in a separate variable
or store the entire row in a record that has the appropriate fields, usually declared using
%ROWTYPE.
For each column value returned by the query associated with the cursor, there
must be a corresponding, type-compatible variable in the INTO list. Typically, you use
the FETCH statement with LOOP and EXIT WHEN ... %NOTFOUND statements.
Note the use of built-in regular expression functions in the queries.
Example
DECLARE
   v_jobid     employees.job_id%TYPE;     -- variable for job_id
   v_lastname  employees.last_name%TYPE;  -- variable for last_name
   CURSOR c1 IS SELECT last_name, job_id FROM employees
      WHERE REGEXP_LIKE (job_id, 'S[HT]_CLERK');
   v_employees employees%ROWTYPE;         -- record variable for row
   CURSOR c2 IS SELECT * FROM employees
      WHERE REGEXP_LIKE (job_id, '[ACADFIMKSA]_M[ANGR]');
BEGIN
   OPEN c1;  -- open the cursor before fetching
   LOOP
      FETCH c1 INTO v_lastname, v_jobid;  -- fetches 2 columns into variables
      EXIT WHEN c1%NOTFOUND;
      DBMS_OUTPUT.PUT_LINE( RPAD(v_lastname, 25, ' ') || v_jobid );
   END LOOP;
   CLOSE c1;
   DBMS_OUTPUT.PUT_LINE( '------------------------------' );
   OPEN c2;
   LOOP
      FETCH c2 INTO v_employees;  -- fetches entire row into the v_employees record
      EXIT WHEN c2%NOTFOUND;
      DBMS_OUTPUT.PUT_LINE( RPAD(v_employees.last_name, 25, ' ') ||
                            v_employees.job_id );
   END LOOP;
   CLOSE c2;
END;
/

The query can reference PL/SQL variables within its scope. Any variables in the
query are evaluated only when the cursor is opened. In the following example, each
retrieved salary is multiplied by 2, even though factor is incremented after every fetch.
Example
Referencing PL/SQL variables within its scope
DECLARE
   my_sal employees.salary%TYPE;
   my_job employees.job_id%TYPE;
   factor INTEGER := 2;
   CURSOR c1 IS
      SELECT factor*salary FROM employees WHERE job_id = my_job;
BEGIN
   OPEN c1;  -- factor initially equals 2
   LOOP
      FETCH c1 INTO my_sal;
      EXIT WHEN c1%NOTFOUND;
      factor := factor + 1;  -- does not affect FETCH
   END LOOP;
   CLOSE c1;
END;
/
To change the result set or the values of variables in the query, you must close
and reopen the cursor with the input variables set to their new values. However, you can
use a different INTO list on separate fetches with the same cursor. Each fetch retrieves
another row and assigns values to the target variables, as shown in the following example.
Example:
DECLARE
   CURSOR c1 IS SELECT last_name FROM employees ORDER BY last_name;
   name1 employees.last_name%TYPE;
   name2 employees.last_name%TYPE;
   name3 employees.last_name%TYPE;
BEGIN
   OPEN c1;
   FETCH c1 INTO name1;  -- this fetches first row
   FETCH c1 INTO name2;  -- this fetches second row
   FETCH c1 INTO name3;  -- this fetches third row
   CLOSE c1;
END;
/
If you fetch past the last row in the result set, the values of the target variables
are undefined. Eventually, the FETCH statement fails to return a row. When that happens,
no exception is raised. To detect the failure, use the cursor attribute %FOUND or
%NOTFOUND.
Fetching bulk data with a cursor
The BULK COLLECT clause lets you fetch all rows from the result set at once. In the
following example, you bulk-fetch from a cursor into two collections.
Example:
Fetching bulk Data with a cursor
DECLARE
   TYPE IdsTab IS TABLE OF employees.employee_id%TYPE;
   TYPE NameTab IS TABLE OF employees.last_name%TYPE;
   ids IdsTab;
   names NameTab;
   CURSOR c1 IS
      SELECT employee_id, last_name FROM employees
      WHERE job_id = 'ST_CLERK';
BEGIN
   OPEN c1;
   FETCH c1 BULK COLLECT INTO ids, names;
   CLOSE c1;
   -- Here is where you process the elements in the collections
   FOR i IN ids.FIRST .. ids.LAST LOOP
      IF ids(i) > 140 THEN
         DBMS_OUTPUT.PUT_LINE( ids(i) );
      END IF;
   END LOOP;
   FOR i IN names.FIRST .. names.LAST LOOP
      IF names(i) LIKE '%Ma%' THEN
         DBMS_OUTPUT.PUT_LINE( names(i) );
      END IF;
   END LOOP;
END;
/
Closing a cursor
The CLOSE statement disables the cursor, and the result set becomes undefined. Once
a cursor is closed, you can reopen it, which runs the query again with the latest values of
any cursor parameters and variables referenced in the WHERE clause. Any other operation
on a closed cursor raises the predefined exception INVALID_CURSOR.
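By contrast, the implicit cursor FOR loop mentioned earlier performs the OPEN, FETCH and CLOSE steps automatically; a minimal sketch against the employees table used above:
BEGIN
   FOR emp_rec IN (SELECT last_name, salary FROM employees
                   WHERE salary > 2000)
   LOOP
      DBMS_OUTPUT.PUT_LINE(emp_rec.last_name || ': ' || emp_rec.salary);
   END LOOP;   -- the cursor is closed automatically here
END;
/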

3.12 SUMMARY
In this unit, you have learned about SQL. It is a language used for interacting with
RDBMSs. IBM, based on E.F. Codd’s paper, ‘A Relational Model of Data for Large
Shared Data Banks’ was the first to develop the prototype for SQL. In 1979, ORACLE,
an SQL product, was released. It is one of the most popular RDBMSs today. SQL is
independent of the underlying database structure and many different versions of SQL
exist. It is the current standard query language for organizing, managing and retrieving
data/information from databases. SQL is more than a query language. It is not only used
for retrieving data but also for managing all DBMS functions. SQL in Oracle supports
certain basic functions to control, define and manipulate data. It also provides for the
definition, storage and management of data in a centralized area that can be shared by
many users.
You have also learned about some basic queries in SQL including SELECT,
DISTINCT, WHERE CLAUSE AND RELATIONAL OPERATOR and WHERE
CLAUSE WITH LOGICAL OPERATOR. You have studied about UNION in this unit.
In relational algebra, UNION is defined as the set union of two relations. INTERSECT
AND EXCEPT have also been discussed. INTERSECT is the set intersection of two
relations. It returns only those rows returned by both component queries.
You have also learned about nested queries. These are queries within a query.
Nested queries allow embedding of one or more SELECT commands inside the body of
other queries. This unit also discussed aggregate operators. Aggregation is the process
of consolidating multiple values into a single value. It allows patterns in the data to
emerge, and these patterns form the basis for analysis and decision-making. Null values
were also explained in this unit. Null values denote the unknown. It has a special value
different from any other value from any attribute domain.
You have also learned about complex integrity constraints in SQL. A complex integrity
constraint is a user-defined rule for a column or set of columns that allows or disallows
inserts, updates or deletes of a row based on the value it contains for the column or set
of columns. Triggers and views in SQL were also explained in the unit. A trigger is a
special kind of PL/SQL anonymous block. A view may be defined as a ‘stored query’ or
a ‘virtual table’. It is called a virtual table because it does not exist as an independent
entity in the database as do ‘real’ tables.
Finally, you have learned about embedded SQL and cursors. In embedded SQL,
SQL commands can be put inside a program written in some other language (called host
language) like C, C++, etc. SQL commands are of varied types to suit different purposes.
A cursor is a handle or name for an area in memory in which a parsed statement
and other information for processing the statement are kept. Such an area is also called
a private SQL area.

3.13 KEY TERMS


• SQL: It is a language used for interacting with relational database management
systems.
• Database: It is a collection of a defined grouping of database objects like tables,
indexes, tablespaces, etc., and storage areas.
• Tablespace: It is a database object that defines the physical structure of the data
sets used to put up the data.
• Table: It is another database object. Each table comprises a set of columns
(vertically) and rows (horizontally), which represent the conceptual view of the
data to be stored.



• Column: It is a single non-decomposable data element in a table.

• View: It is a virtual table. A view normally does not store data and does not
physically exist.
• Join: It is a query that retrieves rows from more than one table or view. Most
join queries contain the WHERE clause conditions that compare two columns
each from a different table.
• Intersect: It is the set intersection of two relations. It returns only those rows
that are returned by both component queries.
• Nested queries: These are queries within a query. It is like nesting of conditional
statements one within the other.

3.14 ANSWERS TO ‘CHECK YOUR PROGRESS’


1. SQL is a ‘declarative language’ (non-procedural). This makes SQL relatively
easy to use as compared to other programming languages. In SQL, the programmer
only specifies what data is needed but it is not required to specify how to retrieve
it. The underlying DBMS analyses the SQL and formulates the way to retrieve
the required information.
2. A tablespace is a database object that defines the physical structure of the data
sets used to put up the data. A table is another database object. Each table
comprises a set of columns (vertically) and rows (horizontally), which represent
the conceptual view of the data to be stored.
3. Data definition language (DDL) is the part of SQL that allows a database user to
create and restructure database objects, such as the creation or the deletion of a
table.
4. Data control commands in SQL allow you to control access to data within the
database. These DCL commands are normally used to create objects related to
user access and also control the distribution of privileges among users.
5. There are four interfaces provided by Oracle:
• SQL *PLUS (available in all versions of Oracle)
• SQL *PLUS command line interface (available in all versions of Oracle)
• SQLPlus Worksheet (introduced in ORACLE 8i)
• iSQL *PLUS(introduced in ORACLE 9i)
6. Aliases are used to provide an alternate name to a column or expression; the alias
appears as the column heading in the result set.
7. The outer join can be used on only one side of the join condition. Using the outer
join on both sides will cause an error to be issued. Since Oracle will issue an error,
then another way to get an outer join is to use UNION.
8. As defined in relational algebra, the INTERSECT is the intersection of two
relations. It returns only those rows returned by both component queries. It is the
logical products of two sets.
9. Nested queries are queries within a query. These are subqueries. It is something
like nesting of condition statements, one within the other.
10. Aggregation is the process of consolidating multiple values into a single value, as
for example, data can be collected on a daily basis and aggregated into a value for
the week, the weekly data can be aggregated into a value for the month, and so
on.
11. Rather than not having any information at all, it is better to record that information
which you know. Leaving a field blank is not good as it can be interpreted as an
empty string that may be a valid value for some domains. A value is needed that
denotes unknown and cannot be confused with valid domain values. It is here that
null value is used. It is a special value that is different from any other value from
any attribute domain.
12. A complex integrity constraint is a user-defined rule for a column or set of columns
that allows or disallows inserts, updates or deletes of a row, based on the value it
contains for the column or set of columns.
13. A trigger is a special kind of PL/SQL anonymous block. You can define triggers
to fire before or after SQL statements, either on a statement level or for each row
that is affected. You can also define INSTEAD OF triggers or system triggers
(triggers on DATABASE and SCHEMA).
Triggers are PL/SQL functions that are executed based on some specific
activity or condition in the form. These activities, called events, are the foundation
of Oracle Forms’ user extensibility feature. While many trigger events perform a
particular default function, a trigger can be written to disable, modify or enhance
the default processing capabilities associated with the event. Triggers can be
defined at the form, block record or individual field level. In addition, a trigger
may cause another trigger to ‘fire’ as well as execute another related program
unit.
A trigger is a PL/SQL block structure which is fired when DML statements
like Insert, Delete and Update are executed on a database table. A trigger is
triggered automatically when an associated DML statement is executed.

3.15 QUESTIONS AND EXERCISES

Short-Answer Questions
1. What is SQL?
2. What are the advantages of SQL?
3. Define the terms, DDL and DML. What are the differences between them?
4. What is the difference between procedural and non-procedural DML?
5. Write a note on SQL in Oracle.
6. What is the use of the SELECT statement?
7. How is a DUAL table created in Oracle?
8. What are nested queries?
9. How is it possible to select null values? Give examples.
10. What is embedded SQL?
Long-Answer Questions
1. What are data types in SQL? Explain with examples.
2. What are the basic queries in SQL?
3. Explain the use of UNION in SQL, giving examples.

4. Explain INTERSECT with examples.


5. What are nested queries? Discuss.
6. Explain aggregate operators with examples.
7. What do you understand by null values?
8. Explain complex integrity constraints in SQL.
9. Explain triggers, giving examples.
10. What do you understand by embedded SQL? Explain.

3.16 LEARNING OUTCOMES


• Understand structured query language
• Learn how basic queries are made in SQL
• Understand union, intersect and except
• Understand nested queries and aggregate operators
• Understand null values
• Understand complex integrity constraints in SQL
• Understand triggers and views in SQL
• Understand embedded SQL and cursors



UNIT 4 RELATIONAL DATABASE DESIGN
Structure
4.0 Introduction
4.1 Unit Objectives
4.2 Introduction to Schema Refinement
4.3 Functional Dependencies (FD)
4.3.1 Dependency Types
4.4 Properties of Decomposition
4.5 Normal Forms (NF)
4.5.1 First Normal Form
4.5.2 Second Normal Form
4.5.3 Third Normal Form
4.5.4 Boyce-Codd Normal Form
4.5.5 Fourth Normal Form
4.5.6 Fifth Normal Form
4.6 Summary
4.7 Key Terms
4.8 Answers to ‘Check Your Progress’
4.9 Questions and Exercises
4.10 Learning Outcomes

4.0 INTRODUCTION
In this unit, you will learn about relational database design. Database design is very
important because database structure is the foundation for the development of any
successful database application. Problems arise with data anomalies when a poor
design leads to uncontrolled data redundancies. In this context, you will be introduced
to schema refinement. The latter is basically the description of the database design.
It is an analytical study of a specific DBMS with reference to Boyce-Codd Normal
Form (BCNF) relation and anomalies. This unit will discuss functional dependencies.
A functional dependency (FD) is derived from a mathematical theory. It implies the
dependency of values of a single attribute or collection of attributes on another
attribute or collection of attributes.
In this unit, you will also study the properties of decomposition. Finally, you
will learn about normal forms (NFs). These are a set of rules for organizing data to
ensure that it meets specific requirements of data normalization. Normalization is a
database design technique that leads to improved database quality, and is based on
the mathematical theory of relations. A set of relations that is normalized facilitates
easy management and retrieval of data in a database system. Furthermore, if database
relations are not in the normalized form, then an RDBMS cannot be utilized properly.

4.1 UNIT OBJECTIVES


After going through this unit, you will be able to:
• Understand schema refinement
• Understand the meaning of functional dependencies
• Understand the properties of decomposition
• Understand different normal forms
4.2 INTRODUCTION TO SCHEMA REFINEMENT
Schema is basically the description of the database that is specified during database
design. Figure 4.1 shows the schema diagram of the employee relation.

Emp_Code Emp_Name Designation Department


Fig. 4.1 Schema Diagram

Schema refinement is an analytical study of a specific DBMS with
reference to the Boyce-Codd Normal Form (BCNF) relation and anomalies. If a
relation is in BCNF, it is free of redundancies that can be detected using functional
dependencies (FDs). Therefore, trying to ensure that all relations are in BCNF is a
good heuristic. But if a relation is not in BCNF, it is decomposed into a collection of
BCNF relations. The following requirements and criteria are considered about this
approach:
• Check whether all FDs are preserved.
• Decompositions that do not guarantee the lossless join property have to
be avoided.
• Decompositions should be carried out and or re-examined while
maintaining performance requirements.
• Decompositions that do not reduce redundancy should be avoided.
Schema refinement is the process within database reverse engineering that attempts to
recover all or at least most implicit constructs, such as data structures with integrity
constraints. Redundancy is at the root of several problems associated with relational
schemas: redundant storage, and insert, delete and update anomalies. Integrity
constraints, in particular functional dependencies, can be used to identify schemas
with such problems and to suggest refinements.
The main schema refinement technique implemented in DBMS design is known as
decomposition, in which a relation with attributes ABCD is replaced by, say, relations
AB and BCD, or relations ACD and ABD. Schema refinement techniques use
decomposition judiciously. They always consider the following questions:
• Is there any reason to decompose a relation?
• What problems (if any) does the decomposition cause?
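As a concrete sketch (all table and column names are assumed for illustration), storing a department's name in every employee row repeats it once per member of that department; decomposing on the department number removes the redundant storage:
-- Before: dept_name is stored redundantly in every employee row
CREATE TABLE emp_dept (
   ecode     NUMBER(6),
   ename     VARCHAR2(30),
   dept_no   NUMBER(3),
   dept_name VARCHAR2(30)
);

-- After decomposition: dept_name is stored once per department
CREATE TABLE employee (
   ecode   NUMBER(6),
   ename   VARCHAR2(30),
   dept_no NUMBER(3)
);
CREATE TABLE department (
   dept_no   NUMBER(3),
   dept_name VARCHAR2(30)
);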

4.3 FUNCTIONAL DEPENDENCIES (FD)


A database is a set of information that is related and it is, therefore, inevitable that
some fields or attributes in the database would depend on some other information;
for example, at any instance of time in a database, an employee code would always
determine an employee of an organization. A person may change his/her name or
his/her residence address but at any moment in time, an employee can be identified
with his/her employee code. The information is either single-valued or multi-valued.
Information, such as employee code, his/her name, or his/her date of birth are single-
valued facts; qualifications of an employee or telephone numbers of an employee,
etc., are multi-valued facts. The single-valued facts formalize the concept of
functional dependency and the multi-valued facts define the concept of multi-valued
dependency. The E–R model provides a starting point for identifying schemas and
integrity constraints. Functional dependencies offer approaches for refining the
modelling by analysing integrity constraints. Actually, not all schemas are
equal. Some are better than others. The
functional dependency theory provides the technique of recognizing defective
schemas (those that possess particular anomalies), and changing them to high-quality
schemas. This is the fundamental theory of relational database design. Other database
models have no counterpart to it, which makes the relational model powerful: it gives a
firm mechanism for database design.
A functional dependency (FD) is derived from mathematical theory, and it
implies the dependency of values of a single attribute or a collection of attributes on
another attribute or a collection of attributes.
Functional dependencies depend on the information of what can be stored in
the relation and serve as integrity constraints that should be validated when adding
or updating tuples in a relation. A relation state r of R that satisfies the functional
dependency constraint is called a legal relation state (or legal extension) of R.
Definition: Let A1, A2... Ak, and B be the attributes of a relation R. One can
say that B is functionally dependent on {A1, A2,..., Ak} if and only if, whenever two
tuples t and u agree on A1, A2,..., Ak, they must also agree on B.
That is, if t.A1 = u.A1
t.A2 = u.A2
.
.
.
t.Ak = u.Ak
then, t.B = u.B
More formally, you can define it as follows:
Consider a relation scheme R (A1, A2,..., An), and X and Y subsets of {A1,
A2,..., An}. Y is functionally dependent on X, denoted by X → Y, if and only if for
every relation instance r of R and for every pair of tuples t and u in r, t[X] = u[X]
implies t[Y] = u[Y], where X ⊆ R and Y ⊆ R.
This is represented as follows: in relation R, if two tuples t and u agree on the
attributes of X, then they must also agree on the attributes of Y (X → Y).
An FD tells about any two tuples t and u in relation R.
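As a minimal illustration, this definition can be checked mechanically against a given relation instance. The following Python sketch (the function name fd_holds is our own) returns False as soon as two tuples agree on X but disagree on Y; note that, as discussed later in this section, an instance can only refute an FD, never prove that it holds in general.

def fd_holds(rows, X, Y):
    # rows: list of dicts mapping attribute name to value (the tuples)
    # X, Y: lists of attribute names
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False  # two tuples agree on X but differ on Y
        seen[x_val] = y_val
    return True

r = [{"ECODE": "E01", "ENAME": "Koushik Ghosh"},
     {"ECODE": "E02", "ENAME": "Goutam Dey"},
     {"ECODE": "E01", "ENAME": "Koushik Ghosh"}]
print(fd_holds(r, ["ECODE"], ["ENAME"]))  # True for this instance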


It can be concluded that if X and Y are attributes or sets of attributes of
relation r(R), then Y is functionally dependent on X, if each value of X determines
exactly one value of Y, i.e., the values in tuples corresponding to the attributes in X
uniquely determine the values corresponding to the attributes in Y. Diagrammatically,
an FD can be represented as follows:
• The single arrow denotes ‘functional dependency’.
• X → Y can also be read as ‘X determines Y’.
• The |= denotes ‘logical implication’.
Determinant of an FD: An attribute is a determinant if it occurs on the left-
hand side of a functional dependency statement (Determinant → Dependent
attribute). That is, a determinant is an attribute on which another attribute is
functionally dependent.
It can be understood easily with the help of the EMPLOYEE table that follows.
An employee code uniquely determines an employee, denoted by
ECODE → ENAME; salary is also functionally dependent on employee code,
denoted by ECODE → SALARY.
ECODE ENAME SALARY
E01 Koushik Ghosh 12000
E02 Goutam Dey 8000
E03 Jayanta Das 9000
E04 Pinaki Bose 15000
E05 Rajib Halder 10000

Two employees can have the same name or salary but they might not be the
same person. To differentiate them, they should have different employee codes.
This reflects in the FDs ECODE → ENAME and ECODE → SALARY. Here, ECODE is
defined as the candidate key.
4.3.1 Dependency Types
• Full functional dependency: X → Y is a full functional dependency if Y is
not dependent on any proper subset of X, and Y is said to be fully functionally
dependent upon X. That is, in X → Y, X is the minimal set of attributes
that uniquely determines Y. If we remove an attribute from X, it no longer
determines Y.
• Partial dependency: X → Y is a partial dependency if Y is dependent on a
proper subset of X, and Y is said to be partially dependent upon X. In X → Y,
X is not the minimal set of attributes that uniquely determines Y: some
attribute could be removed from X and the dependency would still hold.
Partial dependencies must be decomposed into separate relations.
• Trivial functional dependency: Trivial functional dependencies are those
FDs that are impossible not to satisfy. A dependency is trivial, if and only if
the right-hand side (the consequent) is a subset (not necessarily a proper
subset) of the left-hand side (the determinant). That is, an FD A1, A2, ..., An →
B1, B2, ..., Bm is said to be a trivial FD if and only if {B1, B2, ..., Bm} − {A1, A2,
..., An} = ∅ or, in other words, {B1, B2, ..., Bm} ⊆ {A1, A2, ..., An}.

Some examples of trivial FDs include the following:
ABC → A, ABC → BC, BC → BC
ECODE, ENAME, DT_JN → ENAME
The FD A1, A2, ..., An → B1, B2, ..., Bm is equivalent to A1, A2, ..., An → C1, C2, ..., Ck
if the Cs are all those Bs that are not among the As.

In other words, if two tuples t and u agree on the A’s, they must agree on the B’s,
and so they surely agree on the C’s.
Therefore, a functional dependency A1, A2, ..., An → B1, B2, ..., Bm is
• TRIVIAL if the B’s are a subset of the A’s.
• NON-TRIVIAL if at least one of the B’s is not among the A’s.
• COMPLETELY NON-TRIVIAL if none of the B’s is one of the A’s.
Those attributes that appear on the left can always be removed from the right-
hand side of a functional dependency.
Some observations/comments on FD
• When two rows in a relation have the same values for X, these rows will have
the same values for Y.
• When two rows in a table have the same values for Y, these rows may have
different values for X.
• A functional dependency is a property of the relation schema (intension).
It must hold in all relation states (extensions) of R; it is not a property of a
particular relation state/instance. A particular instance of a relation
schema R cannot be used to infer which FDs hold for R. The only way
to find the FDs that hold for R is to consider carefully what the attributes of R
mean. FDs cannot be proved from the data; therefore, they should be enforced by a
DBMS as constraints.
• Let X and Y be subsets of the attributes {A1, A2, …, An}.
• X → Y indicates that there is a 1:M relationship between the set of
attributes Y and the set X.
• X → Y and Y → X indicate that there is a 1:1 relationship between the
set Y and the set X.
• If Y = {C, D}, then X → Y is equivalent to the two functional
dependencies X → C and X → D. Thus, X → CD is equivalent to X → C and
X → D. Concatenation of sets of attributes denotes the union of the sets,
i.e., XY stands for X ∪ Y.
• An FD changes with its real-world meaning.
• X and Y may be composite.

• X and Y could sometimes be mutually dependent on each other; e.g., Husband
→ Wife, Wife → Husband.
• The same Y value may occur in multiple tuples.
Let F be a collection of functional dependencies that are stated on a relation
schema R, and X and Y subsets of R. F logically implies an FD X → Y, indicated by
F |= X → Y, if every relation instance of R that satisfies all FDs in F also satisfies
X → Y. The closure of F, denoted by F+, is the set of all FDs that are logically
implied by F. In other words, it is the collection of all functional dependencies that
may be logically derived from F.
To determine F+, Armstrong proposed a complete set of rules in 1974 for
deriving all possible functional dependencies that are implied by F. The rules are as
follows:
Reflexivity rule
If Y ⊆ X, then X → Y.
It can also be stated as X → X, i.e., each subset of X is functionally dependent
on X. It is the formal statement of trivial dependencies.
Augmentation rule
X → Y |= XZ → YZ.
One can conclude that if a dependency holds, then both of its sides can be
augmented with the same set of attributes.
It should be noted here that the notation XZ is used to denote the collection of
all attributes in X and Z and write XZ rather than the more conventional (X, Z) for
convenience.
Transitivity rule
{X → Y, Y → Z} |= X → Z
It is the ‘most powerful’ inference rule that is useful in multi-step derivations.
These rules are called Armstrong’s Axioms. Each of these rules can be proved
from the definition of the functional dependency.
These rules are said to be complete in the sense that, given a set of FDs F, all
FDs implied by F can be deduced from the given set using these rules.
This set of inference rules is said to be sound in the sense that no FDs
other than those implied by F can be derived.
It should be possible to simplify an existing set of dependencies to derive a
result. Let us illustrate with the aid of an example:
Consider a relation R(X,Y, Z,W,V,U)
Given FDs are as follows:
1. XY → V
2. YV → U
3. V → Z
4. ZU → W
The FD XY → ZW can then be derived in the following way:
a. XY → V (Given: 1)
b. XY → XY (By reflexivity)
c. XY → Y (From decomposition of b)
d. XY → YV (By union of a and c)
e. YV → U (Given: 2)
f. XY → U (By transitivity from d and e)
g. V → Z (Given: 3)
h. XY → Z (By transitivity from a and g)
i. XY → ZU (By union of f and h)
j. ZU → W (Given: 4)
k. XY → W (By transitivity from i and j)
l. XY → ZW (By union of h and k)
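The same conclusion can be reached mechanically with the standard attribute-closure algorithm, which repeatedly applies the given FDs until no new attributes can be added; this is equivalent to exhaustively applying Armstrong's axioms. A minimal Python sketch (closure is our own name, not from the text):

def closure(attrs, fds):
    # attrs: iterable of attribute names (the set X)
    # fds: list of (lhs, rhs) pairs, each side a set of attribute names
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already inside the closure, pull in the right side
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# The four given FDs: XY -> V, YV -> U, V -> Z, ZU -> W
fds = [({"X", "Y"}, {"V"}), ({"Y", "V"}, {"U"}),
       ({"V"}, {"Z"}), ({"Z", "U"}, {"W"})]
print(closure({"X", "Y"}, fds))
# Prints a set containing X, Y, V, U, Z and W; since Z and W are in the
# closure of XY, the FD XY -> ZW holds.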
Key revisited
The actual data stored in relations or collected for entry to the database may violate
a stated data constraint—an FD.
If this happens, you have to decide whether the FD is incorrect and you need
to change the relational model (and the table structures) or the data constraint is to
be enforced by the database processing system.
Functional dependencies (FDs) are types of constraints that are based on keys.
Let R be a relation schema represented by its set of attributes. A subset K of R,
K ⊆ R, is a key if, for all pairs of tuples t1 and t2 in any legal instance r such that
t1 ≠ t2, t1[K] ≠ t2[K]; that is, no two rows (tuples) have the same value in the
attribute(s) K. In other words, if X and Y are two attribute sets that are legal in
relation schema R, X → Y implies that, for all pairs of tuples ta and tb, if
ta[X] = tb[X] then ta[Y] = tb[Y].
This allows us to state that K is a super key of R if K determines R, i.e., K → R.
Therefore, a set of one or more attributes {A1, A2, ……An} is a key for a
relation R if:
• Those attributes functionally determine all other attributes of the relation.
That is, it is impossible for two distinct tuples of R to agree on all of {A1,
A2, ……An}.
• No proper subset of {A1, A2, ……An} functionally determines all other
attributes of R, i.e., a key must be minimal.
A super key is a ‘superset of a key’. It should be noted that every super
key must satisfy the first condition, but it need not satisfy the second condition,
i.e., minimality.
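Both conditions translate directly into code using the closure function sketched earlier; is_superkey and is_candidate_key below are our own illustrative names:

def is_superkey(K, R, fds):
    # K is a super key of R iff the closure of K covers every attribute of R
    return set(R) <= closure(K, fds)

def is_candidate_key(K, R, fds):
    # A key must also be minimal: no proper subset of K may remain a super key
    K = set(K)
    if not is_superkey(K, R, fds):
        return False
    return all(not is_superkey(K - {a}, R, fds) for a in K)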

4.4 PROPERTIES OF DECOMPOSITION


Let the relation R contains attributes of A1 to An. A decomposition of R consists of
replacing R by two or more relations, such as:
• Every new relation scheme contains a subset of the attributes of R and no
attributes that do not appear in R.
• Every attribute of R appears as an attribute of one of the new relations.
Basically, decomposing R means that you can store the relation schemas produced
by decomposition instead of instances of R.

Check Your Progress
1. What is schema refinement?
2. What is functional dependency?
3. What is a determinant?

Self-Instructional Material 137


Take an example of a set of attributes, i.e., column names, such as S, N, L,
R, W, H (SNLRWH) of a given relation, which can be decomposed into SNLRH and RW.
Table 4.1 shows the relation containing attributes SNLRWH.
Table 4.1 Containing Attributes SNLRWH
S N L R W H
123-13-222 XXX 23 8 10 40
132-66-909 YYY 48 7 13 25
123-88-987 CCC 76 8 10 25
541-88-934 TTT 29 8 10 40
A tuple of the database is prepared according to the information about the
hourly wages as given in Table 4.2:
Table 4.2 Tuple (Hourly Wages )

R W
8 10
7 13
The properties of decomposition can be explained as follows:
• It is used only when needed. For example, SNLRWH has functional
dependencies, such as S → SNLRWH and R → W.
• It takes into consideration that the stored information consists of
SNLRWH tuples if the projections of these tuples are also stored.
• It considers dependencies that require decomposed relations.
• It considers a relation R with functional dependencies; for example, if
X → Y violates BCNF, decompose R into (R − Y) and XY.
• It is an easier way to check for dependency-preserving decomposition.
• It ensures that updates to the database do not result in illegal relations.
• It also checks whether joins must be computed to determine functional
dependencies. The main property of decomposition is to check updates
without having to compute natural joins; a closure-based test for the
lossless-join property of a binary decomposition is sketched after this list.
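For a binary decomposition of R into R1 and R2, a standard test (not spelled out in the text) states that the decomposition is lossless if the common attributes of R1 and R2 functionally determine all of R1 or all of R2. A sketch reusing the closure function from the previous section:

def lossless_binary(R1, R2, fds):
    # Lossless iff (R1 intersect R2) -> R1 or (R1 intersect R2) -> R2
    common = set(R1) & set(R2)
    cplus = closure(common, fds)
    return set(R1) <= cplus or set(R2) <= cplus

# SNLRWH example: S -> SNLRWH and R -> W; decompose into SNLRH and RW
fds = [({"S"}, set("SNLRWH")), ({"R"}, {"W"})]
print(lossless_binary(set("SNLRH"), set("RW"), fds))  # True, since R -> W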

4.5 NORMAL FORMS (NF)


Normal forms are a set of rules that organizes data to ensure that it meets specific
requirements of data normalization. The normal forms (NFs) of relational database
theory provide the criteria for determining a table’s degree of vulnerability to logical
inconsistencies and anomalies. The higher the normal form applicable to a table,
the less vulnerable it is to inconsistencies and anomalies. Each table has a ‘Highest
Normal Form’ (HNF). The HNF is defined as a table that always meets the
requirements of its HNF and of all normal forms lower than its HNF; by definition,
a table fails to meet the requirements of any normal form higher than its HNF. The
normal forms are applicable to individual tables; to say that an entire database is in
normal form n is to say that all of its tables are in normal form n. Newcomers to
database design sometimes suppose that normalization proceeds in an iterative
fashion, i.e., a 1NF design is first normalized to 2NF, then to 3NF, and so on. This is
not an accurate description of how normalization typically works. A sensibly designed
table is likely to be in 3NF on the first attempt. Furthermore, if it is in 3NF, it is likely
to have an HNF of 5NF. Achieving the ‘higher’ normal forms above 3NF does not
usually require an extra effort on the part of the designer. This is because 3NF tables
usually need no modification to meet the requirements of these higher normal forms.
The main normal forms are summarized as follows:
Table 4.3 Summary of the Main Normal Form

Normal Form | Defined by | Brief definition
First normal form (1NF) | Two versions: E.F. Codd (1970), C.J. Date (2003) | Table faithfully represents a relation and has no ‘repeating groups’
Second normal form (2NF) | E.F. Codd (1971) | No non-prime attribute in the table is functionally dependent on a part (proper subset) of a candidate key
Third normal form (3NF) | E.F. Codd (1971); see also Carlo Zaniolo’s equivalent but differently-expressed definition (1982) | Every non-prime attribute is non-transitively dependent on every key of the table
Boyce-Codd normal form (BCNF) | Raymond F. Boyce and E.F. Codd (1974) | Every non-trivial functional dependency in the table is a dependency on a super key
Fourth normal form (4NF) | Ronald Fagin (1977) | Every non-trivial multivalued dependency in the table is a dependency on a super key
Fifth normal form (5NF) | Ronald Fagin (1979) | Every non-trivial join dependency in the table is implied by the super keys of the table
Domain/key normal form (DKNF) | Ronald Fagin (1981) | Every constraint on the table is a logical consequence of the table’s domain constraints and key constraints
Sixth normal form (6NF) | Chris Date, Hugh Darwen, and Nikos Lorentzos (2002) | Table features no non-trivial join dependencies at all (with reference to the generalized join operator)

4.5.1 First Normal Form


The aim of the first normal form (1NF) is to make the structure of a relation simple
by making sure that it does not have data aggregates or repeating groups. This
means that none of the attribute values can contain a set of values.
It specifies that the domains of attributes are inclusive of only atomic (simple,
indivisible) values and that any attribute value in a tuple must be an individual
value from the domain of that attribute.
Definition: A relation is in the first normal form (1NF) only if all the basic
domains simply hold atomic values.
Normalization procedure:
The following table is not in 1NF.
A B C
x 1 {a, b}
z 2 c
Transformation to 1NF:
Method 1: By splitting the relation into new relations

A B
x 1
z 2

B C
1 a
1 b
2 c

Method 2: By adding new tuples in the same relation


A B C
x 1 a
x 1 b
z 2 c

Method 3: By adding new attributes with NULL values


A B C C’
x 1 a b
z 2 c NULL

Illustration 1: In Table 4.4 each employee has more than one telephone number.
Table 4.4 Employee with Multiple Telephone Numbers

EMP
ECODE ENAME TEL_NO
E01 JAYANTA DUTTA {24185462, 24181450}
E02 KOUSHIK GHOSH {24148618, 24731961}
E03 SOURAV BASAK {25551189, 24725782}

This is thus not in 1NF. This can be achieved by ensuring that every tuple
defines a single entity by containing only atomic values. There are two ways of
doing it.
(a) One can reorganize into one relation as shown in Table 4.5.
Table 4.5

EMP
ECODE ENAME TEL_NO
E01 JAYANTA DUTTA 24185462
E01 JAYANTA DUTTA 24181450
E02 KOUSHIK GHOSH 24148618
E02 KOUSHIK GHOSH 24731961
E03 SOURAV BASAK 25551189
E03 SOURAV BASAK 24725782



(b) One can also split into multiple relations as shown in Table 4.6.

Table 4.6

EMP
ECODE ENAME
E01 JAYANTA DUTTA
E02 KOUSHIK GHOSH
E03 SOURAV BASAK

EMP_TEL
ECODE TEL_NO
E01 24185462
E01 24181450
E02 24148618
E02 24731961
E03 25551189
E03 24725782
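The second way of reaching 1NF, splitting into EMP and EMP_TEL, is mechanical enough to express in a few lines of Python; a sketch assuming the unnormalized data arrives as a mapping from employee code to a name and a list of telephone numbers:

emp = {
    "E01": ("JAYANTA DUTTA", ["24185462", "24181450"]),
    "E02": ("KOUSHIK GHOSH", ["24148618", "24731961"]),
    "E03": ("SOURAV BASAK",  ["25551189", "24725782"]),
}

# EMP(ECODE, ENAME): one atomic row per employee
emp_1nf = [(code, name) for code, (name, _) in emp.items()]

# EMP_TEL(ECODE, TEL_NO): one atomic row per telephone number
emp_tel = [(code, tel) for code, (name, tels) in emp.items() for tel in tels]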

4.5.2 Second Normal Form


The second normal form is based on the concept of full functional dependency. A
relation schema is said to be in 2NF if each non-prime attribute in the relation is
fully functionally dependent on every candidate key of the relation. Equivalently,
a relation schema is in 2NF if no non-prime attribute in the relation is partially
dependent on any key of the relation.
Definition: A relation is in 2NF if and only if it is in 1NF and every non-key
attribute is fully functionally dependent on the whole of each candidate key.
Normalization procedure: Test whether the left-hand side attributes form only a
part of the primary key. If an attribute is dependent on part of a composite key,
move it to a different table.
Suppose, there is a relation R

A B C D

1 1 c1 d1

1 2 c2 d2

2 1 c3 d1

2 2 c4 d2

where the composite attribute (A, B) is the primary key. Assume that the following
functional dependencies exist:
(A, B) → C
i.e., full functional dependency on the composite key (A, B).
B → D
i.e., partial functional dependency on the composite key (A, B).



The steps are:
1. A new relation R2 is created from R due to the functional dependency
B → D; R2 contains B and D as attributes. Thus B, the determinant,
becomes the key of R2.
2. Minimize the original relation R by eliminating from it the attribute on the
right-hand side of B → D. The minimized relation R1 thus has all the
original attributes except D.
3. Steps 1 and 2 are repeated if multiple functional dependencies stop the
relation from converting to 2NF.
If one relation has the same determinant as another relation, its dependent
attributes are placed as non-key attributes in the relation for which the
determinant is a key.

R2:
B D
1 d1
2 d2

R1:
A B C
1 1 c1
1 2 c2
2 1 c3
2 2 c4

Thus, ‘A relation R is in 2NF if it is in 1NF and every non-key attribute is


fully functionally dependent on the primary key.’
Reduction of 1NF to 2NF is represented pictorially in Figure 4.2: a relation
(A, B, C) that is in 1NF but not in 2NF is decomposed into (A, B) and (B, C).

Fig. 4.2 1NF to 2NF Decomposition
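The partial-dependency test behind this decomposition can also be sketched in Python; a simplified check assuming each FD is stated with a minimal left-hand side (violates_2nf is our own name):

def violates_2nf(candidate_keys, fds):
    # Report FDs whose determinant is a proper subset of a candidate key
    # and whose right-hand side contains a non-prime attribute
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        if any(lhs < set(key) for key in candidate_keys) and rhs - prime:
            violations.append((lhs, rhs - prime))
    return violations

# The example above: key (A, B); FDs (A, B) -> C and B -> D
print(violates_2nf([{"A", "B"}],
                   [({"A", "B"}, {"C"}), ({"B"}, {"D"})]))
# [({'B'}, {'D'})]: D is partially dependent on the key, so split out R2(B, D)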

4.5.3 Third Normal Form


Consider a relation R (A, B, C) where the given FDs are A → B and B → C. Then an
instance of this relation can be as follows:
A B C
a1 b1 c1
a2 b1 c1
a3 b2 c2



Check for the redundancies in the relation. As the dependency B → C holds,
you know from the first tuple that when B is b1, C is c1. Therefore, you can say that
this information (b1, c1) is repeated in the second tuple. These repetitions had to be
made to provide the information (a1, b1) and (a2, b1).
Note that this relation is in 2NF, as both B and C are fully functionally
dependent on the candidate key A.
Normalization procedure: If you decompose the table in such a way that no
non-candidate keys become functionally dependent on another non-candidate key,
then the redundancy is removed. So, here you decompose the relation as R1 (A, B)
and R2 (B, C). Now the given instance will become
A B
a1 b1
a2 b1
a3 b2

with A as the candidate key and


B C
b1 c1
b2 c2

with B as the candidate key


Therefore, the redundancy is no longer present and we say that the relation is
in 3NF.
Illustration 2: Consider the following EMP relation:
Table 4.7 EMP Relation

EMPNO NAME DESG SALARY DEPTNO DEPTNAME
E01 KOUSHIK GHOSH MANAGER 25000 D01 ADMINISTRATION
E05 PINAKI BOSE PROGRAMMER 15000 D03 PROJECT
E06 GOUTAM DEY FINANCE OFFICER 10000 D02 PERSONNEL
E07 RAJIB HALDER SYSTEM ANALYST 20000 D03 PROJECT

Consider again the operations that you may want to do on the data.
Update
Can you change the name of department ‘D03’, i.e., the PROJECT
department? Many employees belong to department ‘D03’
(e.g., RAJIB HALDER and PINAKI BOSE). Thus, you must
make certain that all of their tuples are updated, or there will be problems with
database inconsistency.
Insert
Suppose, you want to add a new employee in the EMP relation with EMPNO as
its primary key. The new tuple must also include correct information about the
department where he will be working. If not, the data will become inconsistent.
Table 4.8 EMP Relation with EMPNO

EMPNO NAME DESG SALARY DEPTNO DEPTNAME
E01 KOUSHIK GHOSH MANAGER 25000 D01 ADMINISTRATION
E05 PINAKI BOSE PROGRAMMER 15000 D03 PROJECT
E06 GOUTAM DEY FINANCE OFFICER 10000 D02 PERSONNEL
E07 RAJIB HALDER SYSTEM ANALYST 20000 D03 PROJECT
E08 JAYANTA DUTTA PROGRAMMER 12000 NULL NULL

If you want to add a new department and no employees are assigned, then as
you do not have any information about the employees, you will have to put NULLs
as their attributes including the primary key, which is absolutely not permitted.
Table 4.9 NULLs as Attributes

EMPNO NAME DESG SALARY DEPTNO DEPTNAME
E01 KOUSHIK GHOSH MANAGER 25000 D01 ADMINISTRATION
E05 PINAKI BOSE PROGRAMMER 15000 D03 PROJECT
E06 GOUTAM DEY FINANCE OFFICER 10000 D02 PERSONNEL
E07 RAJIB HALDER SYSTEM ANALYST 20000 D03 PROJECT
NULL NULL NULL NULL D01 SALES

Delete
What happens if you delete the employee data from the relation? In this case,
information about the department is also lost, which must be avoided. For example,
if you want to delete information of an employee whose employee code is E06 in
the EMP instance as shown in Table 4.9, then you would lose information concerning
the department D02.
Table 4.10 EMP after deleting the Tuple of EMP E06

EMPNO NAME DESG SALARY DEPTNO DEPTNAME

E01 KOUSHIK GHOSH MANAGER 25000 D01 ADMINISTRATION

E05 PINAKI BOSE PROGRAMMER 15000 D03 PROJECT

E07 RAJIB HALDER SYSTEM ANALYST 20000 D03 PROJECT



Table 4.10 shows the instance of EMP after deleting the tuple of employee E06.

Table 4.9 makes it clear that in spite of a relation being in 2NF, problems can still
occur and they should be removed. They need to be normalized further, i.e., there is
a need for a third normal form to remove these anomalies. If you examine the
functional dependencies of Table 4.10, the presence of ‘transitive’ dependencies is
evident.
Here EMPNO → DEPTNAME is only a transitive or an indirect dependency.
It is treated as indirect because EMPNO → DEPTNO and DEPTNO → DEPTNAME,
and thus EMPNO → DEPTNAME.
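Although the decomposition is not shown explicitly here, the standard refinement is to split EMP into EMP1(EMPNO, NAME, DESG, SALARY, DEPTNO) and DEPT(DEPTNO, DEPTNAME). Each resulting relation is then in 3NF: DEPTNAME depends directly on the key DEPTNO of its own relation, a new department can be inserted before any employee is assigned to it, and deleting the last employee of a department no longer deletes the department's name.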
4.5.4 Boyce-Codd Normal Form
Consider a relation R (A, B, C) where the given FDs are AB → C and C → B. Then an
instance of this relation can be shown in Table 4.11.
Table 4.11

A B C
a1 b2 c1
a4 b2 c1
a1 b3 c2
a3 b3 c3

Check for redundancies in the relation. As the dependency C → B holds,
you know from the first tuple that when C is c1, B is b2. Therefore, it can be said that
this information (c1, b2) is repeated in the second tuple. These repetitions had to be
made to provide the information (a1, b2, c1) and (a4, b2, c1).
Note that this relation is in 3NF: AB and AC are the candidate keys, so every
attribute is prime, and there is therefore no possibility of a non-prime attribute
depending on another non-prime attribute.
Normalization procedure
If you decompose the table such that no part of the candidate key becomes
functionally dependent on a non-candidate key, then the redundancy is removed.
Therefore, you decompose the relation as R1 (A, C) and R2 (B, C). Now, the given
instance will become as shown in Table 4.12.
Table 4.12

A C
a1 c1
a4 c1
a1 c2
a3 c3

with AC as the candidate key and


B C
b2 c1
b3 c2
b3 c3

with C as the candidate key.


Therefore, the redundancy is no longer present and it is said that the relation
is in BCNF.
A question may arise in your mind—What is the problem with the redundancy?
The total size of the database does not decrease after the removal of the redundancy.
On the contrary, it increases. Actually, this question is answered right at the beginning
of the discussion on DBMS. The problem with the redundancy is that it may lead to
inconsistency. You can very easily change one occurrence of an instance and forget
changing the other occurrence.
Checking whether a relation is in BCNF
For a relation R to be in BCNF, one of the following conditions should hold for
every dependency A → B in the relation, where A and B are sets of attributes within
the relation:
• The dependency A → B is trivial.
• A is a super key of the relation R.
If neither of these conditions holds for at least one of the dependencies
A → B in R, we say that R is not in BCNF.
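These two conditions translate into a mechanical check; a minimal sketch reusing the closure function introduced with Armstrong's axioms (is_bcnf is our own name):

def is_bcnf(R, fds):
    # R is in BCNF iff every FD A -> B is trivial or A is a super key of R
    for lhs, rhs in fds:
        trivial = rhs <= lhs                    # B is a subset of A
        superkey = set(R) <= closure(lhs, fds)  # A determines all of R
        if not (trivial or superkey):
            return False
    return True

# The example above: R(A, B, C) with AB -> C and C -> B
fds = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
print(is_bcnf({"A", "B", "C"}, fds))  # False: C -> B, but C is not a super key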
Illustration 3:
When a relation has more than one candidate key, anomalies may result even though
the relation is in 3NF. Anomalies occur when a table fails to possess the
property that every determinant is a candidate key. This leads to the Boyce-Codd
Normal Form (BCNF). Table 4.13 cites the failure to possess this property.
Table 4.13

ECODE PID P_LEADER


E01 P01 MG
E02 P02 JDR
E03 P03 SS
E04 P02 JDR
E05 P01 MG

Where
ECODE = Employee Code
PID = Project Identification Number
P_LEADER = Project Leader
The only assumption is that each project is managed by a single project leader. This
relation is in 3NF, because it has no partial functional dependencies and no transitive
dependencies. However, it does not have the much-needed property that each determinant be
a candidate key. Consider the determinants in R. One determinant is the pair of
attributes ECODE and PID: every distinct pair of values of ECODE and PID establishes a
unique value of the attribute P_LEADER. Another determinant is the
pair ECODE and P_LEADER, which establishes a unique value of the attribute
PID. Still another determinant is the attribute P_LEADER, for each different
value of P_LEADER determines a unique value of the attribute PID. These
observations about the relation R correspond to the real-world facts that each
employee has a single project leader (P_LEADER) for each of his or her projects,
and just one project leader (P_LEADER) manages each project.
These three determinants need to be examined to see whether they are candidate
keys or not. The pair ECODE and PID is a candidate key, since
each pair identifies a row in R uniquely. Similarly, the pair ECODE and P_LEADER
is a candidate key. However, because the value MG appears in two rows of the P_LEADER
column, the determinant P_LEADER is not a candidate key. Therefore, the relation
R fails to fulfil the condition that every determinant in it is a candidate
key.
Even though R is in 3NF, there are still anomalies in it.
INSERTION: If you want to add a new project with a project leader, you
cannot add until you have an employee assigned to that project.
DELETION: If you delete employee E03, You lose all information that SS
manages the project P03.
UPDATE: If AC replaces MG as project leader of project P01, you have to
update multiple rows.
The problem occurs because there is a determinant that is not a candidate
key. Now, formulate the whole thing discussed.
R(ECODE, PID, P_LEADER)
FDs: ECODE, PID → P_LEADER
P_LEADER → PID
By decomposition we get R1(ECODE, PID) and R2(P_LEADER, PID):

R1:
ECODE PID
E01 P01
E02 P02
E03 P03
E04 P02
E05 P01

R2:
P_LEADER PID
MG P01
JDR P02
SS P03

4.5.5 Fourth Normal Form


A relation in the fourth normal form possesses information about an individual
entity only. If a relation has more than one multi-valued attribute, it should be decomposed
so that the problems with the multi-valued facts can be eliminated.
Intuitively R is in 4NF if all dependencies are a result of keys. When multi-
valued dependencies are present, a relation should not have multiple independent
multi-valued attributes. The decomposition of a relation to attain 4NF would lead
to minimized redundancies and anomalies can be avoided.
Definition: A relation R is in the fourth normal form (4NF) if and only if, whenever
a multi-valued dependency exists in R, say A →→ B, all attributes of R are also
functionally dependent on A.
A relation schema R is in 4NF with respect to a set F of functional and multi-
valued dependencies if, for every non-trivial multi-valued dependency X →→ Y in
F+, X is a key for R.



Alternative definition:
A relation schema R is in 4NF with respect to a set of functional and multi-valued
dependencies if, for every multi-valued dependency X →→ Y in F+, at least one of
the following holds:
• Y is a subset of X (trivial MVD)
• R = X ∪ Y (trivial MVD)
• X is a super key for R
Practical rule: ‘Isolate Independent Multiple Relationships’, i.e., no table should
contain two or more 1:N or N:M relationships that are not directly related.
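A classic example (not from the text) illustrates the rule: suppose EMP(ENAME, SKILL, LANGUAGE) records that an employee's skills and spoken languages are independent of each other, giving the MVDs ENAME →→ SKILL and ENAME →→ LANGUAGE. An employee with two skills and two languages then needs four tuples, and adding one new language forces the insertion of one tuple per skill. Decomposing into EMP_SKILL(ENAME, SKILL) and EMP_LANG(ENAME, LANGUAGE) removes this redundancy and puts both relations in 4NF.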
4.5.6 Fifth Normal Form
The discussion on the normal forms in the earlier pages required the relation R, if not
in the given normal form, to be decomposed into two relations to achieve the
prerequisites of the normal form. In some situations, however, a relation can suffer
from redundant information and update anomalies and yet admit no decomposition
into two relations that eliminates the issues.
In such situations, the relation can be decomposed into more than two relations
using 5NF. It must be noted that these situations occur very infrequently and
are not easy to detect in practice.
The first three normal forms are based on the concept of functional
dependency; the fourth one is based on multi-valued dependency. The fifth normal
form deals with join dependencies, which is a generalization of the MVD. The
purpose of fifth normal form is possessing relations that cannot be decomposed any
longer. A relation in 5NF cannot be built from smaller relations.
Adding a tuple to a table that is not in 5NF can create spurious results when
the table is decomposed and then rejoined. For example, suppose that you have
an employee who uses DBMS skills on one project and C/C++ programming skills
on another. This information is shown in Table 4.14.

Table 4.14

Ecode Prj_id Skill

1211 11 DBMS
1211 28 C/C++

Next, you add an employee (1544) who uses C/C++ programming skills on
Project 11 as shown in Table 4.15.
Table 4.15

Ecode Prj_id Skill

1211 11 DBMS
1211 28 C/C++
1544 11 C/C++

Next, you project this information into three tables R1 (Ecode, Skill), R2 (Ecode,
Prj_id) and R3 (Prj_id,Skill). However, when you rejoin the tables, the recombined
table contains spurious results as shown in Table 4.16.
Table 4.16
Ecode Prj_id Skill Results

1211 11 DBMS
1211 11 C/C++ <<—Spurious data
1211 28 C/C++
1544 11 DBMS <<—Spurious data
1544 11 C/C++
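This project-and-rejoin experiment is easy to reproduce; a minimal Python sketch of the tables above, joining R2 and R3 on Prj_id first and then filtering with R1:

rows = {("1211", "11", "DBMS"),
        ("1211", "28", "C/C++"),
        ("1544", "11", "C/C++")}

r1 = {(e, s) for e, p, s in rows}   # R1(Ecode, Skill)
r2 = {(e, p) for e, p, s in rows}   # R2(Ecode, Prj_id)
r3 = {(p, s) for e, p, s in rows}   # R3(Prj_id, Skill)

# Join R2 and R3 on Prj_id, then keep rows whose (Ecode, Skill) appears in R1
tmp = {(e, p, s) for e, p in r2 for p2, s in r3 if p == p2}
rejoined = {t for t in tmp if (t[0], t[2]) in r1}

print(sorted(tmp - rows))       # both spurious rows shown in Table 4.16
print(sorted(rejoined - rows))  # ('1211', '11', 'C/C++') survives the full join

The intermediate join already contains the two spurious rows of Table 4.16, and one of them survives even the full three-way join, so the three projections do not losslessly reconstruct this instance.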

If a relation schema R has a join dependency (JD) specified on it, this imposes a
constraint on instances of R. This constraint specifies that all legal instances of R
must have a lossless join decomposition into sub-relations of R that reunite to make
the complete relation R.
Now you can define join dependency which is as follows:
A relation R satisfies join dependency (R1, R2, R3……Rn) if and only if R is
equivalent to the join of (R1, R2, R3……Rn) where Ri are subsets of the collection
of attributes of R.
A relation schema R is in the fifth normal form (5NF) (or project-join normal
form (PJNF)) with respect to a set F of functional, multi-valued and join dependencies
if for all non-trivial join dependencies JD (R1, R2, …, Rn) in F (implied by F), every
Ri is a super key of R.
Formally, a relation R is in 5NF—also called project-join normal form
(PJNF)—if and only if every join dependency in R is a consequence of the candidate
key of R.
Therefore, a relation R is in 5NF (or project-join normal form, PJNF) if for
all join dependencies at least one of the following holds:
• (R1, R2, R3, ……, Rn) is a trivial join dependency (that is, one of the Ri is R itself)
• Every Ri is a super key for R.
An example of 5NF is provided by Table 4.17 that deals with departments,
subjects and students.
Table 4.17

Dept Subject_Code Student_Name

CSE CS101 AMAL DAS

MCA MCA101 BIJON BOSE

CSE CS201 JAYANTA DATTA

CSE CS301 HARINANDAN TUNGA

ITE ITE101 RAJIB HALDER

ECE EC201 KOUSHIK GHOSH



Table 4.17 states that computer science offers subjects like CS101, CS201 and
CS301, which are taken by a variety of students. No student takes all the subjects,
no subject has all students enrolled in it and, therefore, all three fields are needed
to represent the information.
This relation does not show MVDs since the attributes subject and student
are not independent; they are related to each other and the pairings carry significant
information. The relation cannot, therefore, be decomposed into the two relations
(Dept, Subject_Code), and
(Dept, Student_Name)
without losing some important information. The relation can, however, be
decomposed into the following three relations:
(Dept, Subject_Code)
(Dept, Student_Name)
(Subject_Code, Student_Name)
and now, it can also be shown that this decomposition is lossless.

4.6 SUMMARY
In this unit, you have learned about relational database design. Database design is
essentially a business problem, not a data problem. The aim of a relational database
design is to produce a set of relation schemas that represent the real-world situation
that is being modelled. The design must also allow storage of information without
unnecessary redundancy as well as retrieval of information efficiently. You have
thus learned about schema refinement. This unit has also familiarized you with
functional dependencies. Functional dependencies depend on knowledge of what
can be stored in the relation and serve as integrity constraints that should be validated
when adding or updating tuples in a relation. A relation state r of R that satisfies the
functional dependency constraints is called a legal relation state (or legal extension)
of R. You have further studied about the properties of decomposition—the process
of splitting a relation into two or more relations.
Finally, you have learned about normal forms. A technique called
normalization is used to analyse the given relational schema to achieve the desirable
properties of minimizing redundancy as well as inconsistency, besides minimizing
the insertion, deletion and update anomalies. Normalization is a bottom-up approach
to database design that begins by examining the relationships between attributes.
The process of normalization is a formal method that identifies relational schemas
based upon their primary or candidate keys and the functional dependencies that
exist among their attributes. Normalization is often performed as a series of tests on
a relational schema to determine whether it satisfies or violates the requirements of
a given normal form. To sum up, there are seven normal forms and each one was
created to deal with a specific type of problem.

4.7 KEY TERMS


• Schema refinement: An analytical study of a specific DBMS with reference
to BCNF relation and anomalies.
• Functional dependency: Dependency of values of a single attribute or a
collection of attributes on another attribute or collection of attributes.
• Decomposition: The process of splitting a relation into two or more relations.
• Database: A collection of related information.
• Normalization: The process of decomposing large, inefficiently structured
relation schemas (tables) into smaller, more efficiently structured relation
schemas (tables) that possess desirable properties without losing any data in
the process.

4.8 ANSWERS TO ‘CHECK YOUR PROGRESS’


1. Schema refinement is the process within database reverse engineering that
attempts to recover all or at least most implicit constructs, such as data
structures with integrity constraints.
2. A functional dependency is derived from a mathematical theory, and it implies
the dependency of values of one attribute or a set of attributes on those of
another attribute or a set of attributes.
Functional dependencies depend on knowledge of what can be stored
in a relation and serve as integrity constraints that should be validated when
adding or updating tuples in a relation. A relation state r of R that satisfies the
functional dependency constraints is called a legal relation state (or legal
extension) of R.
3. An attribute is a determinant if it occurs on the left-hand side of a functional
dependency statement (Determinant → Dependent attribute), i.e., a
determinant is an attribute on which another attribute is functionally
dependent.
4. Decomposition implies splitting a relation into two or more relations. It should
be noted that decomposition of relations must ensure that the original relation
may be reconstructed from the decomposed relations. Careless decomposition
of a relation can result in loss of information.
5. The process of normalization is a formal method that identifies relational
schemas based upon their primary or candidate keys and the functional
dependencies that exist among their attributes.
6. Normalization technique is used to analyse the given relational scheme to
achieve the desirable properties of minimizing redundancy as well as
inconsistency besides minimizing the insertion, deletion and update anomalies.
This is done by decomposing the original relation into a certain smaller relation
schemes, each of which is in an appropriate normal form, which is a structural
form that satisfies certain rules or restrictions.
7. The accepted way to eliminate anomalies is to decompose relations.
8. The purpose of the first normal form (1NF) is to simplify the structure of a
relation by ensuring that it does not contain data aggregates or repeating
groups. By this, no attribute value can have a set of values.
9. The second normal form is based on the concept of fully functional
dependency. A relation schema is in 2NF if every non-prime attribute in relation
is fully functionally dependent on the candidate key of the relation. It can
also be said that a relation schema is in 2NF if every non-prime attribute in
relation is not partially dependent on any key of the relation.
10. The most common type of multi-valued dependency is the trivial type. It
does not have anything to do with the fourth normal form except that one or
more of the resulting tables from the process contains trivial multi-valued
dependencies.
11. A relation R is in the fourth normal form (4NF) if and only if, whenever there
exists a multi-valued dependency in R, say A →→ B, then all attributes of R
are also functionally dependent on A.
12. The aim of the fifth normal form is to have relations that cannot be decomposed
further. A relation in 5NF cannot be constructed from several smaller relations.

4.9 QUESTIONS AND EXERCISES

Short-Answer Questions
1. What do you understand by schema refinement?
2. What is full functional dependency?
3. What is partial dependency? Explain with an example.
4. What is transitive dependency? Explain with an example.
5. What is trivial dependency? Explain with an example.
6. What are the main rules for normalization?
7. What is an insertion anomaly?
8. What is a deletion anomaly?
9. What is a modification anomaly?
10. Write a note on normal forms.
Long-Answer Questions
1. Discuss the role of functional dependency in DBMS. Give examples.
2. Discuss the different types of functional dependencies. Give examples.
3. What is a key? Explain the different types of keys with examples.
4. Given a relation with three attributes, how many candidate keys can it have
at any given time? At least how many candidate keys does it have? State
reasons in support of your answer.
5. Explain the difference between a primary key and a candidate key.
6. What is meant by attribute preservation condition on decomposition?
7. What are Armstrong’s axioms? Explain each with example.
8. Give the basic rules for 1NF and 2NF considering only primary keys. Define
any term that you feel necessary for these definitions.
9. Define 2NF. Give an example of a relation in 1NF but not in 2NF. Transform
the relation into relations in 2NF.
10. Define 3NF. Give an example of a relation in 2NF but not in 3NF. Transform
the relation into relations in 3NF.
11. How does the definition for BCNF differ from the general definition for 3NF?

12. Why is BCNF a stricter normal form than 3NF? Explain.
13. What is a multi-valued dependency? What types of constraint does it specify?
When does it arise?
14. Define BCNF. Give an example of a relation in 3NF but not in BCNF.
Transform the relation into relations in BCNF.
15. A university wishes to keep track of student records. They construct a universal
relation describing students and the courses students are enrolled in. Distinct
teachers teach distinct sections of courses even in the same term, and students
may take courses multiple times. They may be enrolled in at most one section
of any one course in any one term. Building and room BUILDING and ROOM
describe where a section of a course is taught.
STUDENT_ENROLLMENT_HISTORY
STUDENT_ID, STUDENT_LASTNAME, STUDENT_FIRSTNAME,
COURSE_NUMBER, SECTION_NUMBER, TERM_COURSE_TAKEN,
COURSE_MARK, TEACHER_ID, TEACHER_NAME, COURSE_DESCRIPTION,
COURSE_ENROLLMENT, SECTION_ENROLLMENT,
BUILDING_NAME, BUILDING_ADDRESS, ROOM_NUMBER, ROOM_CAPACITY,
STUDENT_MAJOR_1, STUDENT_MAJOR_2, STUDENT_MAJOR_3
(a) Normalize the STUDENT_ENROLLMENT_HISTORY relation and
present your normalized solution as a schematic diagram.
(b) Show primary keys in your normalized design.
(c) Show relationships between foreign keys and primary keys appropriately.

4.10 LEARNING OUTCOMES


• Understand schema refinement
• Understand the meaning of functional dependencies
• Understand the properties of decomposition
• Understand different normal forms



UNIT 5 TRANSACTION MANAGEMENT, CONCURRENCY CONTROL AND DISTRIBUTED SYSTEM
Structure
5.0 Introduction
5.1 Unit Objectives
5.2 The ACID Properties
5.3 Transactions and Schedules
5.3.1 Transaction
5.3.2 Transaction Processing Steps
5.3.3 Interleaved vs Simultaneous vs Serial Transaction
5.3.4 Transaction Action
5.3.5 Transaction States and Additional Operations
5.4 Concurrent Execution of Transactions
5.4.1 Motivation for Concurrent Execution of Transactions
5.4.2 Serializability
5.4.3 Anomalies due to Interleaved Execution
5.5 Schedules Involving Aborted Transactions
5.6 Lock-Based Concurrency Control
5.6.1 Binary Locks
5.6.2 Shared or Exclusive Locks
5.6.3 Two-Phase Locking: Basic and Strict
5.7 Introduction to Crash Recovery
5.8 Concurrency Control
5.8.1 The Lost Update Problem
5.8.2 Uncommitted Dependency: The Dirty Read Problem
5.8.3 Unrepeatable Read or Inconsistent Retrieval Problem
5.8.4 Phantom Reads
5.8.5 Recoverability
5.9 Introduction to Locking Management
5.9.1 Types of Locks
5.10 Dealing with Deadlock
5.10.1 Deadlock
5.10.2 Deadlock Prevention Protocol
5.10.3 Deadlock Detection
5.10.4 Starvation
5.11 Introduction to ARIES
5.12 Introduction to Distributed Databases
5.12.1 Distributed Database
5.12.2 Distributed Data Transparency Features
5.12.3 Advantages and Disadvantages of Distributed Databases
5.12.4 Catalogue Management
5.12.5 Update Propagation
5.12.6 Recovery Control
5.12.7 Concurrency Control
5.13 Transaction Types
5.13.1 Implicit Transactions

5.13.2 Explicit Transactions
5.13.3 Local and Global Transactions
5.14 Distributed DBMS Architectures
5.14.1 Homogeneous Distributed Database Systems
5.14.2 Heterogeneous Distributed Database Systems
5.15 Design of Distributed Databases
5.15.1 Levels of Data and Processing
5.15.2 Design
5.15.3 Replication
5.15.4 Characteristics of DBMS Replication
5.15.5 Fragmentation
5.16 Summary
5.17 Key Terms
5.18 Answers to ‘Check Your Progress’
5.19 Questions and Exercises
5.20 Learning Outcomes

5.0 INTRODUCTION
In this unit, you will learn about transaction management, concurrency control and
distributed system. These are three closely-related functions that are designed to
ensure that a database is reliable and remains in a consistent state. In this regard,
you will learn about the four fundamental properties that every transaction should
possess, abbreviated as the ACID properties. You will then study about
transactions and schedules. A transaction is a logical unit of work in a DBMS that
includes one or more database access operations. This unit will examine the
concurrent execution of transactions. Most DBMSs allow concurrent access to the
database. If these concurrent operations are not controlled, the various accesses
may interfere with one another and the database could arrive at an inconsistent
state. To prevent this from happening, the DBMS implements a concurrency control
protocol that prevents concurrent accesses from interfering with one another.
You will also learn about schedules involving aborted transactions. A schedule
is a sequence of operations performed by a set of concurrent transactions that prevents
the order of the operations in each of the individual transactions. In lock-based
concurrency control, you will know that in a concurrency approach, a lock is a
variable that is lined with a data item. This defines the status of the database operation
that can be performed on it.
You will also learn about crash recovery and concurrency control. The
reliability and consistency of a database must be maintained even in the event of
failures in both hardware and software components, and when multiple users are
accessing the database. Both concurrency control and recovery are required to protect
the database from data inconsistencies and loss of data. Recovery is the process of
restoring the database to a correct state following any failure that may have occurred
in either the hardware or software or both. Malicious corruption or destruction is
another problem associated with the reliability and consistency of a database. While
recovery from such an event certainly must be handled by the DBMS, its prevention
is a DBMS security issue.
You will also learn about locking management and the methods of dealing
with deadlock. While locking is the most popular concurrency control mechanism
implemented by many important techniques, there are various deadlock prevention
protocols that provide rules that you can use to avoid the problem of deadlock. You
will be introduced to ARIES and distributed databases. ARIES (Algorithms for
Recovery and Isolation Exploiting Semantics) is a database recovery method
based on write-ahead logging. A distributed database is a collection of multiple
interrelated databases spread across a computer network.
You will also study about transaction types and distributed DBMS
architectures. In DBMS, there are many types of transactions with reference to
concurrency control—known as implicit, explicit, local and global transactions.
The data and software in a DDB system is spread over multiple sites. In the end,
you will learn about the design of distributed databases. DBMS allows the data
used in different applications and software to be processed at different levels.

5.1 UNIT OBJECTIVES


After going through this unit, you will be able to:
• Understand the ACID properties
• Describe transactions and schedules
• Explain the concurrent execution of transactions
• Discuss schedules involving aborted transactions
• Understand lock-based concurrency control
• Describe concurrency control
• Understand locking management
• Know how to deal with deadlocks
• Understand the concept of ARIES
• Explain distributed databases
• Analyse transaction types
• Understand distributed DBMS architectures
• Describe the design of distributed databases

5.2 THE ACID PROPERTIES


Every transaction possesses four fundamental properties abbreviated as ACID
properties.
1. Atomicity: Being an atomic unit of work, a transaction can be either completed
successfully or rolled back as a whole.
2. Consistency: The transformation of a database from one consistent state to
another by the transaction is guaranteed.
3. Isolation: The transaction performs as though it is the only transaction in
operation.
4. Durability: After the successful completion of a transaction, it is guaranteed
that the transaction will leave permanent effects even if there is a system
failure.
1. Atomicity implies that each transaction is considered as an indivisible
logical unit of work. An ideal example is a bank transaction. Here, the
account of the debtor is debited and the account of the creditor is credited.
Both these updates should take place to complete the whole transaction
properly. The two operations are wrapped up into one single unit of work.
Hence, either both credit and debit occur or none of them occur at all.
2. It is the database programmer’s duty to preserve the consistency of the
database before and after a transaction. The consistent state of a database
implies a state that satisfies the restrictions/limitations that the scheme
specifies. It is the responsibility of the database programmer to write the
database programs in such a way that a transaction is only allowed to
enter a database when it is in a consistent state; and to leave only when it
has reached another consistent state. This, of course means that there is
no interference from any other transaction in the action of the concerned
transaction.
3. Now come to the next concept, i.e., isolation. Isolation means that every
transaction should carry on its job without being dominated by other
transactions that may be operating on the same database. To implement
isolation we should take care of the fact that no transaction makes its
partial updates available to other transactions. Though this eliminates the
temporary update problem it is not sufficient to solve several other
problems.
The isolation property can be ensured by guaranteeing that although
many transactions take place in an interleaved manner, the overall effect
is the same as execution of all transactions one after the other in a sequence.
If two transactions, for instance, T1 and T2 are simultaneously executed,
the net effect will be equivalent to the execution of T1 followed by T2 or
execution of T2 followed by T1.
The DBMS provides no guarantees about which of these orders is
effectively chosen. If each transaction transforms the database from one
consistent state to another consistent state, executing several transactions
one after the other (on a consistent initial database instance) will also
cause a consistent final database instance.
Transactions can be incomplete due to the following three reasons:
(i) A transaction can be aborted or terminated unsuccessfully by the
DBMS because of some anomaly arising during execution. If a
transaction is aborted by the DBMS for some internal reason, it is
automatically restarted and executed anew.
(ii) The system may crash (e.g., due to interrupted power supply) while
one or more transactions are in progress.
(iii) A transaction may encounter an unexpected situation (for example,
read an unexpected data value or be unable to access some disk)
and decide to abort (i.e., terminate itself).
Of course, since users think of transactions as being atomic, a transaction
that is interrupted in the middle may leave the database in an inconsistent
state. Thus, a DBMS must find a way to remove the effects of partial
transactions from the database, that is, it must ensure transaction atomicity.
Either all the actions of a transaction are carried out, or none of them are
carried out. A DBMS ensures transaction atomicity by undoing the actions
of incomplete transactions. This means that users can ignore incomplete
transactions keeping in mind how the database is modified by transactions
over time. To be able to do this, the DBMS maintains a record, called the
log, of all writes to the database. The log is also used to ensure durability.
In case a system happens to crash before the modifications made by a
completed transaction are written to the disk, the log is employed, to not
just remember but also restore the changes, on restarting.
4. Durability guarantees that once a transaction is successfully committed,
the modifications made by the transaction are made permanent. Even in
case of system or hardware failure, the system ensures that the data in the
database is correct and the changes made by a committed transaction are
still available after recovery from failure.
Consistency and isolation are guaranteed by concurrency control
whereas atomicity and durability are guaranteed by recovery.
Now, how can these desirable properties of transactions be enforced?
During the implementation and designing of the transaction, the concept
of atomicity is taken care of. On the other hand, if a transaction fails
before completion of the task assigned, the recovery software is able to
undo the partial effects of the transactions onto the database.
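As an illustration only (using Python's built-in sqlite3 module rather than any system discussed in this unit), the classic debit-credit transfer shows how commit and rollback enforce atomicity: either both updates become permanent together, or neither does.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("debtor", 500), ("creditor", 100)])
conn.commit()

def transfer(conn, src, dst, amount):
    # Debit and credit as one atomic unit of work
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        conn.commit()      # both updates become durable together
    except sqlite3.Error:
        conn.rollback()    # undo any partial effect, preserving consistency
        raise

transfer(conn, "debtor", "creditor", 200)
print(conn.execute("SELECT * FROM account ORDER BY id").fetchall())
# [('creditor', 300), ('debtor', 300)]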

5.3 TRANSACTIONS AND SCHEDULES


Transaction management is the process of controlling the execution of transactions.
5.3.1 Transaction
A transaction is a logical unit of work in a DBMS that includes one or more database
access operations. A transaction may involve an arbitrary number of operations on
the database that brings about changes in the state of the database while preserving
system consistency. It may be specified using SQL in an interactive way or it may
be embedded or contained in an application program. In a database environment,
an application program is generally considered as a series of transactions with non-
database processing taking place in between the operations on the database.
Credit card approvals, hotel check-ins, registrations (for courses), billing,
phone calls and ATM transactions are some examples.
Definition: It is a unit of work involving read, write or control operations on
database elements that must be completed as a whole or not at all.
Let Read(X) and Write(X) denote read and write operations on a set of database
items X = {A, B, C,…..}.
Then a sequence of actions on X is given by
S = { [Read(X)]*; [Write (X)]*}
A transaction is a sequence followed by COMMIT or a ROLLBACK
T = {S*, {ROLLBACK | COMMIT}}
COMMIT operation signals the successful end of a transaction. It informs
that the transaction has been successfully completed and the database is in a consistent
state and all the updates are permanent.
ROLLBACK operation, in contrast, signals the unsuccessful end-operation
of a transaction.
5.3.2 Transaction Processing Steps
It is necessary to understand the steps followed during a transaction execution for
planning and implementing the custom applications. Also, the database administrator
should have a clear idea of the steps as it is helpful in understanding and tuning the
database parameters and processes. It should be very clear to the readers that the
discussion is only valid for normal transactions. Other transactions, such as
discrete and distributed transactions, are treated in a different manner. These
transactions are not part of the discussion. The steps in processing are as follows:
1. Entering the DDL/DML statements
2. Assigning the ROLLBACK segment
3. Optimizing the statement
4. Generating the execution plan by the optimizer
5. Manipulating/returning data following the execution plan
6. Looping the above steps from 1 to 5
5.3.3 Interleaved vs Simultaneous vs Serial Transaction
Interleaved transactions take place when two or more transactions are processed
concurrently with only one transaction at a time making progress toward completion.
This typically occurs in a single CPU environment using multi-programming
techniques. Simultaneous transactions take place when two or more transactions
are processed in parallel. The number of CPUs online determines the number of
transactions progressing. Serial transactions may occur in either a single CPU or
multiple CPU environments. They occur when a transaction is executed from start
to finish before any other transaction may begin execution. The following illustrations
should clarify the differences in the three basic types of transactions:

T1 | T2 | T3 | T2 | T1
t0   t1   t2   t3   t4   t5                time
Interleaved transactions (one CPU)

T1 --------------------------
T2 --------------------------
T3 --------------------------
t0                          t1             time
Simultaneous transactions (three CPUs shown)

T3 | T1 | T2
t0   t1   t2   t3                          time
Serial transactions (unknown number of CPUs)

5.3.4 Transaction Action


Any transaction can access the database either by reading or writing a database
item.
The main area of concern with write operations is when to reflect the changes
made by a transaction into the database. In many cases, the write-back protocol is
handled by the recovery portion of the DBMS possibly working in conjunction with
the OS (operating system). In general, there are two basic approaches to handling
the write-back operation—the pessimistic approach and the optimistic approach.
The pessimistic approach basically forces the write-back to occur immediately after
the transaction performs the write. The problems with this approach are as follows:
(i) If the transaction uses the same database item again, the buffer will
need to be reloaded from the disk memory system.
(ii) If the transaction ultimately aborts, the write operation must be undone.
Alternately, the optimistic approach basically waits for a certain amount of
time (which depends on many factors, including system loading, frequency of access
to database items, etc.) before actually performing the write-back operation. This
potentially eliminates both the problems that the pessimistic approach suffers.
However, it is not without problems of its own. For example, if too much time is
allowed to lapse before the write-back operation is performed, it is very likely that
transactions waiting to access the item will be unnecessarily blocked if the writing
transaction never uses the same item again.
The current trend in DBMS favours the optimistic approach coupled with
interaction with the OS in order to optimize the time delay before write-back occurs.
This approach tends to delay waiting transactions for a minimum amount of time
and at the same time eliminates the problems of the pessimistic approach.
A transaction may have only one of the two outcomes. It can either successfully
complete execution and be committed enabling the database to assume a new
consistent state, or it may fail to successfully complete execution (for any number
of reasons) and get aborted. If the transaction is aborted, the database must be
returned to the last consistent state which existed before the aborted transaction
began execution. The aborted transaction is said to be rolled-back or undone.
A transaction which has been committed cannot be aborted. If a committed
transaction is subsequently found to be a mistake, then another transaction called a
compensating transaction must be executed to reverse the effects of the transaction
that was committed in error.
An aborted transaction can be restarted later, and depending upon the cause
of the failure, may successfully execute to completion and commit at that time.



5.3.5 Transaction States and Additional Operations
If a transaction does not fail, it must complete successfully. If a transaction
fails to complete successfully, it is said to be aborted. Changes or modifications
performed by an aborted transaction must be cancelled. Changes made by successful
transactions must be made permanent.
In a simple transaction model, the transaction takes place in any one of the
following ways:
(i) Active: This is the initial state of the transaction where it stays during
execution.
(ii) Partially committed: A transaction stays in this state after it has executed
the final statement.
(iii) Failed: This is a state reached by the transaction when it cannot continue
normal execution anymore.
(iv) Terminated: The terminal state is reached once the transaction has
been aborted or rolled back and the database has been restored to the
state it was in before the beginning of the transaction.
(v) Committed: A transaction reaches this state after the completion of
successful execution.
The most fundamental of all database operations are the read and write
operations. But these two operations are not sufficient. A system needs some
additional operations too. Suppose, for instance, there is a system crash. To recover
from that situation, the system needs to maintain a full record about the beginning,
termination and abortion of the transactions. For example:
• BEGIN TRANSACTION: This denotes the beginning of an execution
process.
• END TRANSACTION: This denotes completion of execution of a
transaction.
• COMMIT: This indicates that the modifications or changes made by the
successful transactions are to be incorporated permanently onto the
database, and it will never be undone in future.
• ROLLBACK: This indicates that the modifications (if any) made by
any unsuccessful transaction needs to be undone.
In a real multiprogramming environment, some systems even keep track of
the present status of transactions running in the system. Sometimes, the system
even tries to predict the next possibilities for the transaction to proceed and also
says how to roll it back in case of failure. This is diagrammatically represented by a
state transition diagram in Figure 5.1.

[Figure: a transaction moves from Begin transaction into the Active state, where
it performs Read/Write operations; End transaction takes it to the Partially
committed state; Commit takes it to the Committed state and then, via Terminate,
to Terminated; a failure from the Active or Partially committed state leads to
the Failed state, from which Rollback leads to Terminated.]

Fig. 5.1 State Transition Diagram of a Transaction


The arrows indicate the change of a transaction from one state to another.
Immediately following the start of execution, a transaction is said to be in an
active state. In this state, the transaction performs read and write operations.
The protocols in this state also ensure that no inconsistent entries into the
database occur as a result of system failure. After this, the system enters the
committed state, wherein a
transaction automatically moves on to the TERMINATED state. However, a
transaction may fail due to many reasons before entering the committed state.
Once a transaction fails, it tries to undo the modification performed by the
write operations. After this is done, the transaction will enter the terminated state
to move out of the system. It is possible to restart a failed transaction later.
Figure 5.1 represents a state transition diagram for a transaction. In addition
to the obvious states—active, committed and failed—there are two additional states:
(i) Partially committed: This state occurs after the final statement has
been executed. At this point, it may be found that the transaction has
violated serializability (discussed later in this unit) or has violated an
integrity constraint and the transaction needs to be aborted.
Alternatively, the system may fail and any data updated by the transaction
may not have been safely written back to the disk at this point in time.
In such cases, the transaction would go into the Failed state and would
have to be aborted. If the transaction has been successful, any updates
can be safely written-back and the transaction can go to the Committed
state.
(ii) Failed: This state occurs when the transaction cannot be committed or
the transaction is aborted while in the active state. This could be because
the user aborted the transaction or possibly as a result of a concurrency
control protocol which selects the transaction to be aborted to ensure
serializability.

5.4 CONCURRENT EXECUTION OF TRANSACTIONS


Transaction-processing systems usually let many transactions run simultaneously.
Achieving this requires extra work, and we need a convenient way to describe
interleaved executions of transactions. The DBMS interleaves the actions of
different transactions to improve performance.
5.4.1 Motivation for Concurrent Execution of Transactions
Ensuring transaction isolation, while permitting such concurrent execution, is
difficult, but is essential for reasons of performance.
(i) It is possible for the CPU to process another transaction while one
particular transaction waits for a page to be read from the disk. This is
because I/O activity can be done in parallel with CPU activity in a
computer. Overlapping I/O and CPU activity reduces the amount of
time disks and processors are idle, and increases system throughput
(the average number of transactions completed in a given time).
(ii) When a short transaction is executed with a long transaction in an
interleaved manner, the short transaction is completed fast. In serial
execution, a short transaction could get stuck behind a long transaction

which results in unnecessary delays in response time or an increase in the
average time required to finish a transaction.
5.4.2 Serializability
Suppose two transactions, T1 and T2, are to be scheduled. They can be scheduled
in a number of ways. The simplest way is to schedule them serially, without any
interleaving. In a serial schedule, all the operations of transaction T2 follow
all the operations of T1, or vice versa.

T1                  T2
Read(X)
X = X + N
Write(X)
Read(Y)
Y = Y + N
Write(Y)
                    Read(X)
                    X = X + P
                    Write(X)

Non-interleaved serial schedule A (time runs downwards)

T1                  T2
                    Read(X)
                    X = X + P
                    Write(X)
Read(X)
X = X + N
Write(X)
Read(Y)
Y = Y + N
Write(Y)

Non-interleaved serial schedule B

Now, these can be termed as serial schedules, since all the operations of one
transaction are being followed by the entire sequence of operations of the other
transaction.
In the interleaved mode, the operations of T1 are mixed with the operations of
T2. This can be done in a number of ways. Two such sequences are given here:

T1                  T2
Read(X)
X = X + N
                    Read(X)
                    X = X + P
Write(X)
Read(Y)
                    Write(X)
Y = Y + N
Write(Y)

Interleaved non-serial schedule C

T1                  T2
Read(X)
X = X + N
Write(X)
                    Read(X)
                    X = X + P
                    Write(X)
Read(Y)
Y = Y + N
Write(Y)

Interleaved non-serial schedule D

If for every transaction T in a schedule S all the operations are executed
consecutively, we call it a serial schedule; otherwise we call it a non-serial schedule.
A non-interleaved schedule of independent transactions always results in a consistent
database state as the transactions commit or abort before the beginning of the next
transaction. A non-interleaved schedule is guaranteed to produce a correct result as
long as the individual transactions are free of errors. However, non-interleaved
schedules suffer from low utilization and wastage of resources. Often, a transaction
waiting for an I/O makes the subsequent transactions wait causing wastage of
resources or reduction in resource utilization. If the former transaction takes too
long to execute, the latter one keeps waiting till its completion.

The problem with serial schedules is resource wastage. In a serial schedule, if
a transaction is waiting for an I/O, the subsequent transactions must also wait,
causing wastage of resources. If a transaction T is extremely long, the other
transactions have to keep waiting till T is completed. In such a schedule, no
concurrency is supported.
Therefore, in general, serial schedules are unacceptable in practice. To solve
this problem, the operations need to be interleaved. But the interleaving sequence
should be well planned. A wrong interleaving sequence may leave the schedule
incorrect and the database inconsistent. Therefore, a methodology should be to
determine which schedules produce correct results and which ones do not.
A schedule S of N transactions is said to be a serializable schedule if it is
equivalent to a serial schedule which comprises the same N transactions. N
transactions can produce N! serial schedules. If you carry on interleaving them, the
number of possible combinations becomes very large. To solve this problem, all the
non-serial schedules are divided into two disjoint groups. One comprises those
schedules that are equivalent to one or more serial schedules and the other consists
of those schedules that are not. The first category of schedules is called ‘serializable
schedules’ and the latter is called ‘non-serializable schedules’.
Conflicting actions
Conflict between two or more actions takes place if:
1. The read or write action is performed on the same data object.
2. The actions are performed by different transactions.
3. At least one of the actions performed is a write operation.
The following set of actions is conflicting:
• T1:Read(X), T2:Write(X), T3:Write(X)
While the following sets of actions are not:
• T1:Read(X), T2:Read(X), T3:Read(X)
• T1:Read(X), T2:Write(Y), T3:Read(X)
Conflict equivalence
Two schedules can be said to be conflict equivalent, if any two conflicting operations
in both the schedules are executed in the same order. If, somehow, the order of
conflicting operations in both the schedules is not the same, then the schedules
produce different database states at the end, and hence they cannot be equivalent to
each other.
Conflict-serializable schedule
A conflict-serializable schedule is one which is conflict-equivalent to other serial
schedule(s).
You can say that a schedule is conflict-serializable only if an acyclic precedence
graph or a serializability graph exists for the schedule.

G =
T1                  T2
Read(A)
                    Read(A)
Write(B)
COMMIT
                    Write(A)
                    COMMIT

This is conflict-equivalent to the serial schedule <T1, T2>.


Algorithm to check for conflict serializability of a schedule:
To test a schedule for conflict serializability, the following algorithm may be used:
1. For each transaction Ti participating in the schedule S, create a node labelled
Ti in the precedence graph.
2. For each case where Tj executes a Read(X) after Ti executes a Write(X), create
an edge from Ti to Tj in the precedence graph.
3. For each case where Tj executes a Write(X) after Ti executes a Read(X), create
an edge from Ti to Tj in the graph.
4. For each case where Tj executes a Write(X) after Ti executes a Write(X),
create an edge from Ti to Tj in the graph.
5. The schedule S is serializable if and only if there are no cycles in the graph.
If we apply these methods to write the precedence graphs for the earlier
four cases, we get the following precedence graphs.
Schedule A:  T1 --X--> T2                  (no cycle)
Schedule B:  T2 --X--> T1                  (no cycle)
Schedule C:  T1 --> T2 and T2 --> T1       (a cycle: not serializable)
Schedule D:  T1 --X--> T2                  (no cycle)
We may conclude that schedule D is equivalent to schedule A.
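The precedence-graph test is easy to mechanize. The following Python sketch
builds the edges of steps 2 to 4 and reports serializability by checking for
cycles, as in step 5. The encoding of a schedule as (transaction, operation,
item) triples is our own convention for this example, not a standard API.

# A minimal sketch of the precedence-graph test.
def conflict_serializable(schedule):
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            # Conflicting pair: same item, different transactions, one write
            if x == y and ti != tj and "W" in (op_i, op_j):
                edges.add((ti, tj))          # Ti precedes Tj

    def reachable(a, b, seen=frozenset()):   # is there a path a -> ... -> b?
        return any(v == b or (v not in seen and
                              reachable(v, b, seen | {v}))
                   for u, v in edges if u == a)

    # Serializable iff the graph has no cycle (no node reaches itself)
    return not any(reachable(t, t) for t, _, _ in schedule)

sched_d = [("T1", "R", "X"), ("T1", "W", "X"),
           ("T2", "R", "X"), ("T2", "W", "X"),
           ("T1", "R", "Y"), ("T1", "W", "Y")]
print(conflict_serializable(sched_d))        # True: only edge T1 -> T2

sched_c = [("T1", "R", "X"), ("T2", "R", "X"), ("T1", "W", "X"),
           ("T1", "R", "Y"), ("T2", "W", "X"), ("T1", "W", "Y")]
print(conflict_serializable(sched_c))        # False: T1 -> T2 and T2 -> T1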

View equivalence
Two schedules S1 and S2 are said to be view-equivalent when the following
conditions are satisfied:
1. For every data item A, if transaction Ti reads the initial value of A in S1,
then in S2 also transaction Ti must read the initial value of A.
2. For each data item A, if transaction Ti executes a Read(A) in S1 and the value
read was produced by a Write(A) operation performed by another transaction Tj,
then in S2 also Ti must read the value of A produced by that same Write(A)
operation of Tj.
3. For each data item A, if transaction Ti performs the final Write(A) operation
in S1, then in S2 also the final Write(A) operation must be performed by Ti.
View-serializable schedule
A view-serializable schedule is view-equivalent to some serial schedule. All conflict-
serializable schedules are view-serializable.
G =
T1                  T2
Read(A)
                    Read(A)
Write(B)
COMMIT
                    Write(A)
                    COMMIT

This is an example of a schedule which is both view-serializable and
conflict-serializable. There are view-serializable schedules which are not
conflict-serializable; such schedules contain ‘blind writes’.
H =
T1                  T2                  T3
Read(A)
                    Write(A)
                    COMMIT
Write(A)
COMMIT
                                        Write(A)
                                        COMMIT

This example is not conflict-serializable, but it is view-serializable, since it
has a view-equivalent serial schedule <T1, T2, T3>.
Another example:
S1 = {[T3:read(A)], [T4:write(A)], [T3:write(A)], [T6:write(A)]}
S2 = {[T3:read(A)], [T3:write(A)], [T4:write(A)], [T6:write(A)]}
Note that S1 is a concurrent schedule and S2 is a serial schedule. S1 is view
equivalent to S2 since the one Read(A) instruction reads the initial value of A in
both schedules and T6 performs the last Write(A) operation in both schedules.
Therefore, S1 is a view serializable schedule.
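These conditions can also be checked mechanically. The following Python sketch,
again using an assumed (transaction, operation, item) encoding of a schedule,
compares the reads-from relation (initial reads included) and the final writers
of the two schedules; it confirms the S1/S2 example above.

# A minimal sketch of the three view-equivalence conditions.
def reads_from(schedule):
    """Map each read to the transaction whose write it reads (None = initial)."""
    last_writer, result = {}, []
    for t, op, x in schedule:
        if op == "R":
            result.append((t, x, last_writer.get(x)))
        else:
            last_writer[x] = t
    return result, last_writer          # last_writer now holds the final writes

def view_equivalent(s1, s2):
    rf1, fw1 = reads_from(s1)
    rf2, fw2 = reads_from(s2)
    # Conditions 1 and 2: same reads-from relation; condition 3: same final writers
    return sorted(rf1, key=str) == sorted(rf2, key=str) and fw1 == fw2

s1 = [("T3", "R", "A"), ("T4", "W", "A"), ("T3", "W", "A"), ("T6", "W", "A")]
s2 = [("T3", "R", "A"), ("T3", "W", "A"), ("T4", "W", "A"), ("T6", "W", "A")]
print(view_equivalent(s1, s2))          # True, as argued above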
The only difference between conflict serializability and view serializability is
that the former satisfies the ‘constrained write’ assumption while the latter
does not. This condition states that any write operation Wi(A) in Ti is preceded
by a Ri(A) in Ti, and that the value written by Wi(A) in Ti depends only on the
value of A read by Ri(A). The assumption behind the concept of constrained write
is that the new value of a data item A is a function based on the old value of
the same data item in the database. So view serializability is less stringent
than conflict serializability. View serializability applies the concept of
‘unconstrained write assumption’, where the value of a data item produced by a
write operation performed by any transaction does not depend upon the old value
of the data item.
The only problem with view serializability is its computational complexity: no
efficient algorithm exists for testing it (the problem is, in fact, NP-complete).
Uses of serializability
To say that a schedule S is correct amounts to (or is equivalent to) proving the
serializability of S. Therefore, it is guaranteed that the schedule will provide
correct results.
However, being serial and being serializable are different things. A serial scheduling
may be inefficient and may cause underutilization of the CPU, I/O devices and in
some cases such as mass reservation system, becomes untenable. A serializable
schedule, on the other hand, not only has the advantages of concurrent execution
(ability to cater to numerous simultaneous users, efficient system utilization) but
also guarantees correctness.
The scheduling process is done by the operating system routines after
considering various factors, such as the priority of the process in comparison to
other processes, time of submission of transaction, system load and many other
factors. Also, since it is possible to have numerous interleaving combinations, it is
very difficult to determine, in advance, the way in which the transactions are
interleaved. That is, getting the various schedules itself is tough, leave alone testing
them for serializability.
Therefore, most DBMS protocols employ a more practical technique. Instead
of generating the schedules and checking for serializability before using them, they
apply restrictions or controls on the transactions themselves. These restrictions are
followed by each participating transaction, automatically ensuring serializability in
all the schedules that the participating transactions create.
In addition, it is not easy to determine the start of a schedule and its finish,
since transactions get submitted at different times.
Therefore, the theory of serializability can be used to tackle this problem by
considering only the committed projection C(S) of the schedule. As an
approximation, a schedule S can be defined as serializable if its committed
projection C(S) is equivalent to some serial schedule.
5.4.3 Anomalies due to Interleaved Execution
The three anomalous situations can be described in terms of two transactions T1
and T2. The anomalies due to interleaved execution are as follows:
(i) Unrepeatable reads (RW conflicts)
(ii) Reading uncommitted data (WR conflicts): ‘dirty reads’
(iii) Overwriting uncommitted data (WW conflicts)



Anomalies with interleaved execution arise as follows:
• Not all interleavings of operations are acceptable.
• Anomaly: two consistency-preserving, committed transactions that together lead
to an inconsistent state.
(i) Unrepeatable reads (RW conflicts)
An unrepeatable read occurs when a transaction reads the same data item twice
and obtains different values because another transaction modified the item
between the two reads. A closely related situation arises quite often in database
processing: one transaction changes a value, and a second transaction reads this
value before the original change has been committed or rolled back. This is known
as a dirty read scenario, because there is always the possibility that the first
transaction may roll back the change, resulting in the second transaction having
read an invalid value.
While you can easily command a database to disallow dirty reads, this usually
degrades the performance of your application due to the increased locking overhead.
Disallowing dirty reads also leads to decreased system concurrency. Incidentally,
the characteristics of the dirty page table (maintained by the recovery subsystem)
are as follows:
• One entry per dirty page in buffer pool.
• It contains recLSN, i.e., the LSN of the log record which first caused the page
to be dirty.
(ii) Reading uncommitted data (WR conflicts)
Reading uncommitted data can be useful, and not only because it lets you read
data that is not being modified at all but is inaccessible merely because of a
lock. Consider, for example, an eventlog table, or a table that contains
statistics about page requests. Note that a SELECT normally also creates a lock
on a table for itself, because a single query adheres to the ACID rules; this
means that a SELECT will lock data while it is running. Suppose you have a log
table in your database to which you only do inserts. Each insert will do locking.
Suppose you also have some SELECT queries that result in a table scan. You really
do not want a table lock while such a query is running, because otherwise the
application will not be able to add new rows to the table; and you know in
advance that you will never read dirty data, because you only do inserts. Such
queries are mostly used for reporting functionality or even normal read
operations. You will need optimistic concurrency control for that, and it
requires a timestamp column to validate, or a total record compare. You must be
sure that the record data read in the first place is not dirty, so you cannot use
the read-uncommitted level there. The good thing about the total record compare,
however, is that you can read uncommitted data, because the complete solution
relies on the data and not on a timestamp. So it is possible, but in most
environments you see timestamps, because comparing one timestamp column is cheap.
(iii) Overwriting uncommitted data (WW conflicts)
In the field of databases, a write-write conflict, also known as overwriting
uncommitted data, is a computational anomaly associated with the interleaved
execution of transactions. Consider a schedule in which there are no reads; such
writes are called blind writes. Here we have a lost update. Any attempt to
serialize this schedule would give two different results (either T1's version of
A and B is kept, or T2's version of A and B is kept), neither of which matches
the interleaved schedule, so the schedule is not serializable. Strict 2PL
overcomes this inconsistency by locking T1 out from B. Unfortunately, deadlocks
are something strict 2PL does not always overcome.
5.5 SCHEDULES INVOLVING ABORTED TRANSACTIONS
A schedule is a sequence of operations performed by a set of concurrent transactions
that preserves the order of the operations in each of the individual transactions. As
before, we are only interested in the set of operations performed by the transactions
which affect the database, so the schedule consists only of a sequence of read and/
or write operations followed by either a commit or abort operation.
Transactions may be aborted when they are unable to obtain a lock on a required
data item. Estimating the proportion of transactions that abort is one of the key
issues in modelling a system, since it affects the performance measures of
interest, such as average response time and the throughput capacity of the
system. If, for some reason, you do not want to commit a transaction, then call
Transaction.abort(). The definition of serializability has to take the actions of
aborted transactions into account. Basically, an unrecoverable schedule is one
that involves an aborted transaction. Table 5.1 shows such a schedule:
Table 5.1 An Unrecoverable Schedule involving an Aborted Transaction

T1                  T2
R(A)
W(A)
                    R(A)
                    W(A)
                    R(B)
                    W(B)
                    Commit
Abort

If a transaction Ti is aborted, all its actions have to be undone. Not only that:
if Tj reads an object last written by Ti, then Tj must be aborted as well. Most
systems avoid such cascading aborts by releasing a transaction's locks only at
commit time; if Ti writes an object, Tj can read it only after Ti commits. In
order to undo the actions of an aborted transaction, the DBMS maintains a log in
which every write is recorded. This mechanism is also used to recover from system
crashes: all transactions active at the time of the crash are aborted when the
system comes back up. Write-ahead logging (WAL) is used to undo the actions of
aborted transactions and to restore the system to a consistent state after a crash.
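The rule that a transaction must not commit while a transaction it has read from
is still uncommitted can be checked directly. The following Python sketch, whose
schedule encoding is assumed purely for illustration, flags the schedule of
Table 5.1 as unrecoverable.

# A minimal sketch that flags unrecoverable schedules: a transaction must
# not commit while a transaction it read from is still uncommitted.
def is_recoverable(schedule):
    last_writer, read_from, committed = {}, {}, set()
    for t, op, x in schedule:           # op in {"R", "W", "Commit", "Abort"}
        if op == "W":
            last_writer[x] = t
        elif op == "R":
            w = last_writer.get(x)
            if w is not None and w != t:
                read_from.setdefault(t, set()).add(w)
        elif op == "Commit":
            if any(w not in committed for w in read_from.get(t, ())):
                return False            # commits before its writer: unrecoverable
            committed.add(t)
    return True

# The schedule of Table 5.1: T2 reads A from T1, commits, then T1 aborts.
table_5_1 = [("T1", "R", "A"), ("T1", "W", "A"),
             ("T2", "R", "A"), ("T2", "W", "A"),
             ("T2", "R", "B"), ("T2", "W", "B"),
             ("T2", "Commit", None), ("T1", "Abort", None)]
print(is_recoverable(table_5_1))        # False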

5.6 LOCK-BASED CONCURRENCY CONTROL


In the locking approach to concurrency, a lock is a variable associated with a
data item. It describes the status of the data item with respect to the database
operations that can be performed on it. Locks help synchronize access to data
items when concurrent transactions are executed on those items. There are various
types of locks, for example, binary locks and shared/exclusive locks.

5.6.1 Binary Locks
Binary locks are locks that have two states: locked and unlocked. If the value of
the lock on a data item is set to 1, the item is in the locked state and cannot
be accessed by any database access operation. A lock value of 0 specifies that
the data item is in the unlocked state and can be accessed by a database access
operation. When a transaction wants to access a data item X, it issues a
lock_item(X) operation. If LOCK(X) is 1, the transaction has to wait; if LOCK(X)
is 0, the transaction can lock and access the data item. When the transaction
stops accessing the data item, it issues an unlock_item(X) operation so that
other transactions can use it.
The database can easily implement binary lock by maintaining a record with
the following three fields:
(i) Data item name
(ii) The LOCK variable
(iii) Locking transaction
The system maintains the records of locks in the lock table. The items that
are not included in the lock table are called unlocked items. A transaction follows
the following rules if binary lock is implemented:
• A transaction should issue the lock_item(X) operation before performing
read and write operations on a data item.
• A transaction should issue the unlock_item(X) operation after performing
read and write operations on a data item.
• A transaction cannot issue the lock_item(X) operation if it already holds
a lock on a data item.
• A transaction cannot issue the unlock_item(X) operation if it does not
hold a lock on a data item.
5.6.2 Shared or Exclusive Locks
Shared or exclusive locks are the locks that allow multiple transactions to read the
same data item concurrently but only one transaction to write on a data item at a
time. Shared/exclusive locks are also called read/write locks. In shared/exclusive
locks, a transaction can issue three operations: read_lock(X), write_lock(X) and
unlock(X). A read-locked item is called shared locked and a write-locked item is
called exclusive locked. When the database system repeatedly denies the request of
a transaction for the exclusive lock of a data item, it is called livelock. Livelock
occurs when various transactions request shared locks on the same data item on
which a transaction is trying to have an exclusive lock.
To implement the shared/exclusive lock, lock table includes the following
fields:
• Data item name
• The LOCK variable
• Locking transaction
• No_of_reads
A transaction follows the following rules if the shared/exclusive lock is
implemented:

• A transaction should issue the read_lock(X) or write_lock(X) operation
before performing a read_item(X) operation on a data item.
• A transaction should issue the write_lock(X) operation before any
write_item(X) operation is performed on a data item.
• A transaction cannot issue the read_lock(X) operation if it already holds a
shared or exclusive lock on a data item.
• A transaction should issue the unlock(X) operation after performing all
read_item(X) and write_item(X) operations on a data item.
• A transaction cannot issue the write_lock(X) operation if it already holds
a shared or exclusive lock on a data item.
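The following Python sketch illustrates the grant/deny decisions implied by these
rules, using a lock-table entry with the fields listed above. It is deliberately
non-blocking (a denied request would normally be queued) and the structure names
are invented for the example.

# A minimal, non-blocking sketch of the shared/exclusive lock table.
lock_table = {}   # item -> {"lock": "read"/"write", "txns": set, "no_of_reads": int}

def read_lock(item, txn):
    e = lock_table.get(item)
    if e is None:
        lock_table[item] = {"lock": "read", "txns": {txn}, "no_of_reads": 1}
        return True
    if e["lock"] == "read":             # shared: more readers may join
        e["txns"].add(txn)
        e["no_of_reads"] += 1
        return True
    return False                        # write-locked by someone: deny (wait)

def write_lock(item, txn):
    e = lock_table.get(item)
    if e is None:
        lock_table[item] = {"lock": "write", "txns": {txn}, "no_of_reads": 0}
        return True
    if e["lock"] == "read" and e["txns"] == {txn}:
        e["lock"], e["no_of_reads"] = "write", 0    # upgrade a sole read lock
        return True
    return False

print(read_lock("X", "T1"), read_lock("X", "T2"))   # True True (shared)
print(write_lock("X", "T1"))                        # False (T2 also reads X)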
The optimistic concurrency control technique, also known as the validation or
certification technique, is used to control concurrency without locking data
during the processing of a transaction. Optimistic techniques differ from
concurrency control techniques such as locking and timestamping, which need a
certain type of checking before a database operation can be executed.
executed. The locking technique helps to determine whether the item being accessed
is locked or not. The timestamp concurrency control technique checks the read and
write timestamps of the item. These concurrency control techniques are time
consuming and provide low performance during transaction execution. These
techniques slow down the transactions and hence decrease the performance of the
database system.
In the optimistic concurrency control technique, no checking is done during the
execution of the transaction. This technique allows transactions to execute
without synchronization. It is based on the assumption that most database
transactions do not conflict with other transactions. This assumption makes the
optimistic concurrency control technique flexible for executing concurrent
transactions.
This technique does not apply the updates of a transaction directly to the
database tables. Instead, it waits for the transaction to complete; only then
does it actually update the database. While the transaction is executing, all
updates are applied to local copies of the database tables, which are kept for
the transaction. The optimistic concurrency control technique consists of three
phases, which are:
1. Read phase: The transactions applied in the read phase can only read
values of committed data items from the database. The updates are
performed by the transaction on the local copies of the database tables.
The updated copies of database tables are kept in the transaction
workspace.
2. Validation phase: On completion of the transaction, the validation phase
checks whether the updates performed by the transaction violate
serializability, the property of a schedule being serializable.
Serializability is related to the isolation property of transactions and
plays an important role in concurrency control. The information needed in
the validation phase is preserved by the database management system. If the
outcome of the validation phase is positive, serializability has not been
violated and the transaction is committed; the database is then updated from
the local copies maintained by the system during transaction processing. But
if serializability is violated, the transaction is aborted and restarted later.
3. Write phase: The write phase occurs after the validation phase. If the
validation phase is passed successfully, the updates made by the transaction
during processing are applied to the database. If not, the updates are
discarded and the transaction is restarted.
The optimistic concurrency control technique makes the checks at one time
only in the validation phase. Therefore, the transaction execution in this technique
goes on with least overhead and this goes on until it reaches the validation phase.
The concurrent transactions that have only little interference will be validated
successfully. The concurrent transactions that have a large amount of interference
will be discarded and restarted later. This technique is called optimistic because it
assumes that there will be little interference between transactions during execution
and hence there is no need to check the transaction execution.
For example, consider transaction Ti in its validation phase. The optimistic
concurrency control protocol checks whether Ti interferes with any committed
transactions or with transactions that are currently in their validation phase.
To do so, it checks that each other transaction Tj, either committed or in its
validation phase, satisfies one of the following conditions (a sketch of this
test follows the list):
• Tj completes its write phase before Ti starts its read phase.
• Ti starts its write phase after Tj completes its write phase, and the
read_set of Ti has no items in common with the write_set of Tj.
• Tj completes its read phase before Ti completes its read phase, and neither
the read_set nor the write_set of Ti has any items in common with the
write_set of Tj.
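The following Python sketch of the validation test assumes each transaction
records timestamps for the start and end of its phases along with its read_set
and write_set; the field names are invented for the example.

# A minimal sketch of validating Ti against one overlapping transaction Tj.
def validate_against(ti, tj):
    if tj["write_end"] < ti["read_start"]:
        return True                                   # condition 1
    if (tj["write_end"] < ti["write_start"] and
            not ti["read_set"] & tj["write_set"]):
        return True                                   # condition 2
    if (tj["read_end"] < ti["read_end"] and
            not (ti["read_set"] | ti["write_set"]) & tj["write_set"]):
        return True                                   # condition 3
    return False                                      # abort and restart Ti

ti = {"read_start": 5, "read_end": 9, "write_start": 9,
      "read_set": {"A"}, "write_set": {"A"}}
tj = {"write_end": 4, "read_end": 3, "write_set": {"A"}}
print(validate_against(ti, tj))                       # True, via condition 1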
Concurrency control and locking are mechanisms used in a DBMS when data is
shared. The concurrency control mechanism helps control concurrent transactions.
The locking mechanism, on the other hand, allows only one user to change data at
a time when multiple users are accessing the same data. In the locking mechanism,
granularity is the amount of data that can be locked with a single instance or
group of instances. The four types of granularity available in DBMS are as follows:
1. Page locking: In this type of granularity, all data that appears on a specific
page is locked.
2. Cluster locking: In this type of granularity, all data that is clustered
together is locked simultaneously.
3. Table locking: In this type of granularity, all instances of a table are
locked.
4. Object locking: In this type of granularity, an object that represents a
real-world entity such as student and employee is locked.
The concurrency control and locking mechanisms help achieve the four ACID
properties (atomicity, consistency, isolation and durability) of DBMS.
5.6.3 Two-Phase Locking: Basic and Strict
In two-phase locking (2PL), a transaction requests an appropriate lock just before
performing an operation. If a transaction requests a conflicting lock, it is blocked,
awaiting the release of the conflicting lock. Otherwise, the appropriate lock is granted
to the transaction. (See Figure 5.2)
Each transaction is executed in two phases:
Phase 1: The growing phase, during which the transaction acquires all the
required locks but cannot release any lock.
Phase 2: The shrinking phase, during which the transaction releases its locks
but cannot request additional locks on any data item.

[Figure: locks are acquired during the growth phase, the transaction then does
its processing, and locks are released during the shrinkage phase.]

Fig. 5.2 Two-Phase Locking Strategy

readlock(Y)
read(Y)              Phase I: growth
writelock(X)
-------------------------------------
unlock(Y)
read(X)
X = X + Y            Phase II: shrinkage
write(X)
unlock(X)
No downgrading of locks: According to these rules, a transaction cannot downgrade
a write lock into a read lock, because downgrading is equivalent to releasing the
write lock and subsequently requesting and acquiring a new read lock.
Upgrading a lock is accepted: It is, however, possible for a transaction to
upgrade a read lock into the stronger write lock during the growing phase.
The order in which transactions are granted locks, forces an execution ordering
on the transactions in connection with their conflicting operations. This is because
transactions are forced to wait on conflicting locks.
By preventing transactions from acquiring any lock after the release of a
lock, the 2PL protocol ensures serializability.
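The two-phase rule itself is simple to verify for a single transaction's sequence
of lock actions: once the first unlock has been issued, no further lock request
may appear. A minimal Python sketch follows; the action strings are an assumed
encoding for this example.

# A transaction's lock actions obey 2PL iff no lock is acquired after the
# first unlock (growing phase strictly before shrinking phase).
def obeys_2pl(actions):
    shrinking = False
    for a in actions:
        if a.startswith("unlock"):
            shrinking = True              # the shrinking phase has begun
        elif a.startswith(("readlock", "writelock")):
            if shrinking:
                return False              # lock request after an unlock: violation
    return True

print(obeys_2pl(["readlock(Y)", "writelock(X)", "unlock(Y)", "unlock(X)"]))  # True
print(obeys_2pl(["readlock(Y)", "unlock(Y)", "writelock(X)"]))               # False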
This can be illustrated by showing how the non-repeatable read problem is
solved using 2PL.
Examine the next example of inconsistent retrieval involving two transactions T
and S and two columns A and B. Their non-2PL and 2PL executions are shown below
as time diagrams. Local variables are denoted with lower-case letters, and values
produced by committed transactions are marked with *. In the 2PL execution, the
required lock requests are specified.
A non-2PL execution

Time   T               S              A    B    Sum
t1     v=read(A)                      3*   5*   8*
t2                     x=read(A)      3*   5*   8*
t3     v=v+3                          3*   5*   8*
t4     write(A,v)                     6    5*   11
t5     v=read(B)                      6    5*   11
t6     v=v-3                          6    5*   11
t7     write(B,v)                     6    2    8
t8     commit                         6*   2*   8*
t9                     y=read(B)      6*   2*   8*
t10                    output(x+y)    --> 5 <> 8*
t11                    commit
In the non-2PL execution, there is a cyclic ordering captured by the conflicting
of read(A) by S with the write(A) of T, and the conflicting of write(B) by T with the
read(B) of S.
In the 2PL execution, by requesting and acquiring a write lock on A at the
beginning, T blocks S’s read lock request, forcing S to wait until T commits and
reads the consistent values of A and B that are produced by T. In the non-2PL, a
non-serializable execution, S reads inconsistent values for A and B. It reads B from
T, and A from another previously committed transaction.
A 2PL execution

Time   T               S              A    B    Sum
t1     writelock(A)
t2     v=read(A)                      3*   5*   8*
t3                     readlock(A)    3*   5*   8*
t4     v=v+3           WAIT           3*   5*   8*
t5     write(A,v)      WAIT           6    5*   11
t6     writelock(B)    WAIT           6    5*   11
t7     v=read(B)       WAIT           6    5*   11
t8     v=v-3           WAIT           6    5*   11
t9     write(B,v)      WAIT           6    2    8
t10    release-locks   WAIT           6    2    8
t11    commit          WAIT           6*   2*   8*
t12                    x=read(A)      6*   2*   8*
t13                    readlock(B)    6*   2*   8*
t14                    y=read(B)      6*   2*   8*
t15                    output(x+y)    --> 8 == 8*
t16                    release-locks
t17                    commit
Advantages of 2PL
• It is easy to enforce. The DBMS has to keep track only of what phase a
transaction is in.
• 2PL has been a major factor in the success of databases, especially while
handling concurrent transactions.
• It does not involve the real-time testing for serializability and the generation
of precedence graphs.

Variations of 2PL
• Basic 2PL: This involves the growth and the release phase.
• Conservative 2PL: This involves acquiring all locks at the beginning and
releasing them when done. It is inefficient because locks are held on more
items, and for longer, than is really needed.
• Strict 2PL: There is a distinct growth phase, but the shrinkage phase is abrupt.
It is used by Oracle.
There are a number of variations of 2PL. The technique just described is
known as basic 2PL. A variation known as conservative 2PL (or static 2PL) requires
a transaction to lock all the items it accesses before the transaction begins execution,
by predeclaring its read-set and write-set. As mentioned earlier, the write-set is the
set of all items that a transaction writes and the read-set of a transaction is the set of
all items that are read by the transaction. If any of the predeclared items required
cannot be locked, the transaction does not lock any item. Instead, it waits until all
the items are available for locking. Conservative 2PL is a deadlock-free protocol.
However, it is difficult to use in practice because of the need to predeclare the read-
set and write-set, which is not possible in most situations.
In practice, the most popular variation of 2PL is the strict 2PL, which
guarantees strict schedules. Here, none of the exclusive (write) locks are released
by a transaction T till it has committed or aborted. Therefore, no item written by T
can be read or written in by any other item until T commits. This leads to a strict
recoverability schedule. Strict 2PL is not deadlock-free. A more restrictive variation
of strict 2PL is the rigorous 2PL, which also guarantees strict schedules. In this
variation, a transaction T does not release any of its locks (exclusive or shared) until
after it commits or aborts, and so it is easier to implement than strict 2PL. Notice
the difference between conservative and rigorous 2PL; the former must lock all its
items before it starts so once the transaction starts, it is in the shrinking phase,
whereas the latter does not unlock any of its items until after it terminates (by
committing or aborting) so the transaction is in its expanding phase until it ends.
In many cases, the concurrency control subsystem itself is responsible for
generating the readlock and writelock requests. For example, if the system is to
enforce the strict 2PL protocol, then, whenever transaction T issues a read(X), the
system calls the readlock(X) operation on behalf of T. If the state of LOCK(X) is
write-locked by some other transaction T’, the system places T on the waiting queue
for item X; otherwise, it grants the readlock(X) request and permits the read(X)
operation of T to execute. On the other hand, if transaction T issues a write_item(X),
the system calls the writelock(X) operation on behalf of T. If the state of LOCK(X)
is writelocked or readlocked by some transaction T', the system places T on the
waiting queue for item X; if the state of LOCK(X) is readlocked and T itself is the
only transaction holding the read lock on X, the system upgrades the lock to write
locked and permits the write_item(X) operation by T. Finally, if the state of LOCK
(X) is unlocked, the system grants the writelock(X) request and permits the write(X)
operation to execute. After each action, the system must update its lock table
appropriately.
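The grant/upgrade/wait decision for a write request described above can be
sketched as follows in Python; the lock-table representation is invented for the
example.

# A minimal sketch of the decision logic for a write request on item X by
# transaction T, mirroring the three cases in the paragraph above.
def handle_write_request(lock, t):
    state, holders = lock["state"], lock["holders"]
    if state == "unlocked":
        lock.update(state="write", holders={t})
        return "granted"
    if state == "read" and holders == {t}:
        lock["state"] = "write"               # upgrade T's sole read lock
        return "upgraded"
    return "wait"                             # held by some other T': queue T

lock_x = {"state": "read", "holders": {"T"}}
print(handle_write_request(lock_x, "T"))                                 # 'upgraded'
print(handle_write_request({"state": "read", "holders": {"T", "T2"}}, "T"))  # 'wait'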

5.7 INTRODUCTION TO CRASH RECOVERY

Crash recovery mechanisms guard against the following types of crashes:
• System crashes
• Media failures
The crash recovery in DBMS follows the following mechanism:
• A log or journal records all changes before they modify the database.
• The log is assumed to survive system crashes and media failures.
• After a crash, the recovery manager follows the ARIES strategy.
The analysis pass of crash recovery identifies the changes that had not been
written to disk and the transactions that were active at the time of the crash.
The redo pass repeats actions whose effects had not reached the database. The
undo pass removes the effects of transactions that did not complete successfully.
Crash recovery of a database is done in the following way (also see Figure 5.3):
• The log is also updated during recovery to guard against repeated crashes.
• A checkpoint is periodically recorded to the log to reduce the recovery
overhead.
• A checkpoint strategy is also used to take backup copies of live databases.

[Figure: application areas such as a real-time DBMS, the HLR in mobile
communications, mobile computing and factory automation (FA/CIM) sit above a user
interface; beneath it, the transaction manager, crash recovery manager,
main-memory manager and backup manager operate over main memory and a backup
database.]

Fig. 5.3 Crash Recovery Mechanisms
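The redo-then-undo idea can be shown with a toy Python sketch; a real ARIES
implementation works with log sequence numbers, a dirty page table and
compensation log records, so the log format below is purely illustrative.

# A toy sketch: redo repeats history, undo rolls back uncommitted transactions.
def recover(db, log, committed):
    for txn, item, old, new in log:            # redo pass: repeat history
        db[item] = new
    for txn, item, old, new in reversed(log):  # undo pass: roll back losers
        if txn not in committed:
            db[item] = old
    return db

db = {"A": 0, "B": 0}
log = [("T1", "A", 0, 5), ("T2", "B", 0, 7)]
print(recover(db, log, committed={"T1"}))      # {'A': 5, 'B': 0}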

Application areas
The following are the application areas of the crash recovery:
• Real-time database management systems
• Home location register (HLR) in mobile communications
• Mobile computing
• Factory automation (FA)
• Factory control DBMS

5.8 CONCURRENCY CONTROL


Concurrency control is the process of managing the simultaneous execution of
transactions without letting them interfere with one another. Concurrency control
operates under one main assumption: given a consistent state of the database as
input, an individually correct transaction will produce a correct state of the
database as output if the transaction is executed in isolation. The goal of
concurrency control is to allow multiple transactions to be executed
concurrently, that is, simultaneously within a certain time period, with all the
transactions producing a correct state of the database at the end of their
concurrent execution.
Although two transactions may be individually correct and may produce a
correct database state when executed in isolation, their concurrent (interleaved)
execution may result in an inconsistent state. Let us look at an example of a concurrent
execution that would result in an incorrect state:

T1                  T2
Read(X)
X = X - 50
                    Read(X)
                    temp = X * 0.2
                    X = X - temp
                    Write(X)
                    Read(Y)
Write(X)
Read(Y)
Y = Y + 50
Write(Y)
                    Y = Y + temp
                    Write(Y)
The first transaction transfers Rs 50 from the account of person X to the account
of person Y. The second one transfers twenty per cent of X's balance to the
account of person Y. In this interleaved execution, the twenty per cent transfer
from X is lost and the deposit of Rs 50 to Y is lost. You can prove it to
yourself: the sum of X + Y should be the same before and after both transactions,
but here it is not. Concurrency control regulates the interaction among
concurrent transactions to keep them from disturbing the consistency of the database.
There are several classical examples of the problems concurrent execution
can cause in the correctness of a database.
5.8.1 The Lost Update Problem
When two transactions access the same database items and also perform their own
operations in a way that makes the value of some database item wrong, a lost update
problem takes place.
Suppose that there are two distinct transactions called T1 and T2 whose function
is to sell concert tickets. Considering only the fundamental database operations
that would be required by this type of transaction, namely their read and write
operations, examine the following concurrent execution of T1 and T2. (Assume that
the number of tickets, i.e., seats available, is represented by the variable n.)

Time   Action
t0     transaction T1 performs Read(n)
t1     transaction T2 performs Read(n)
t2     transaction T1 checks n, issues a ticket, performs Write(n-1)
t3     transaction T2 checks n, issues a ticket, performs Write(n-1)
The lost update problem is as follows:


The update performed by T1 at time t2 is ‘lost’ due to the update performed by
T2 at time t3. The correct value of n in the database should be n–2, but it clearly is
not, so the database is incorrect.
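A deterministic replay of this interleaving in Python makes the loss visible
(n = 10 is an arbitrary starting value chosen for the example):

n = 10
t1_n = n            # t0: T1 performs Read(n)
t2_n = n            # t1: T2 performs Read(n)
n = t1_n - 1        # t2: T1 issues a ticket, Write(n-1)
n = t2_n - 1        # t3: T2 issues a ticket, Write(n-1): T1's update is lost
print(n)            # 9, although two tickets were sold: the correct value is 8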
The lost update problem can be handled in several different ways as follows:
(i) Prevent T2 from reading the value of n at time t1 on the basis that T1 has
already read the value of n and may therefore update this value. If this
is the case, then T2 would be operating with an obsolete value of n and
this will not occur if T2 is blocked.
(ii) Prevent T1 from writing the value of n–1 at time t2 on the basis that T2
has also read the value of n and would then be operating with an invalid
value of n, since clearly T2 cannot be forced to re-read the value of n.
(iii) Prevent T2 from writing the value of n–1 at time t3 on the basis that T1
has already updated the value of n and since T1 preceded T2 in terms of
execution start time (thus T1 is the ‘older’ transaction), and thus T2’s
write is based upon an obsolete value of n.
The first two solutions to the lost update problem can be implemented using
locking protocols, while the third solution can be implemented using a time-stamping
protocol.
5.8.2 Uncommitted Dependency: The Dirty Read Problem
A dirty read problem occurs when a transaction updates an item in the database and
then there is a failure in the transaction for some reason. As the transaction fails, the
updated item in the database is accessed by another transaction before it can go
back to the original value.
In other words, a transaction T1 updates a record which is read by T2. Then
T1 aborts so that T2 then has values which do not remain after T1 has rolled back.
Consider the following concurrent schedule of transactions T1 and T2.
Time Action
t0 transaction T1 performs Read(n)
t1 transaction T1 performs Write(n)
t2 transaction T2 performs Read(n)
t3 transaction T1 aborts
The dirty read problem is as follows:
When T1 aborts, it must undo any changes it has made to the database.
However, before T1 aborted, T2 read a value (at time t2) that T1 wrote (at time t1).
Thus, T2 has read a ‘dirty’ value – one that will not remain in the database after T1
is rolled-back, thus T2 must undo its read, i.e., T2 must abort.
Typically, time-stamping protocols are employed to implement solutions to the
dirty read problem, as most locking protocols that eliminate this problem are too
restrictive in the level of concurrency they allow.
5.8.3 Unrepeatable Read or Inconsistent Retrieval Problem
Suppose two users X and Y access a department’s database concurrently. User X
updates the database to provide all employees with a salary raise of 5 per cent. At the
same time, user Y wants to know the total salary that has been paid by the department.
As the two transactions are being executed simultaneously on the same database,
there is a high probability that they will interfere with each other. As a
result, the sum includes certain salaries before the raise and certain salaries
after the raise. Such a sum cannot be considered an acceptable value of the total
salary (the acceptable values would be the total taken entirely before the raise
or entirely after it).
Table 5.2 Example of Inconsistent Retrieval

X                                    Time   Y
Read(salary) of Employee 100          1     -
-                                     2     Sum = 0.0
Update Salary                         3     -
-                                     4     Read(salary) of Employee 100
Write(salary) of Employee 100         5     -
-                                     6     Sum = Sum + Salary of Employee 100
Read(salary) of Employee 101          7     -
-                                     8     Read(salary) of Employee 101
Update Salary                         9     -
-                                     10    Sum = Sum + Salary of Employee 101
Write(salary) of Employee 101         11    -
etc.                                        etc.

The problem illustrated in Table 5.2 is called the inconsistent retrieval anomaly.
During the execution of a transaction therefore, changes made by another transaction
that has not yet committed should not be visible since that data may not be consistent.
5.8.4 Phantom Reads
Phantom reads take place when insertion or deletion is performed against a row
belonging to a range of rows read by a transaction. The first read of the range of
rows reflects a row that does not exist in the second or successive read because of a
deletion by another transaction. In the same way, due to an insertion by another
transaction, the second or successive read of the transaction shows a row that was
non-existent in the original read.
An editor, for instance, modifies a document submitted by an author. However,
when the changes are incorporated into the main/master copy of the document by
the production department, it is discovered that the new content which is unedited
has been added by the author. Such a problem can be kept at bay if there is a
provision to check anybody from adding new material. In other words, it would
help to have a system wherein no one can add/modify a document till the editors
and the production department have completed their work on the original document.



5.8.5 Recoverability
If a transaction Ti fails to commit, its effects should be undone to ensure the
atomicity property of transactions. In an environment which supports concurrency,
it is essential to ensure that any transaction Tj dependent on Ti (that is, Tj has read data written
by Ti ) is also aborted. This phenomenon is referred to as cascaded rollback.
Cascading rollback is undesirable as it results in cancellation of a significant amount
of work.
Transactions commit only after all transactions whose changes they read
commit.
F =
T1                  T2
Read(A)
Write(A)
                    Read(A)
                    Write(A)
COMMIT
                    COMMIT

F2 =
T1                  T2
Read(A)
Write(A)
                    Read(A)
                    Write(A)
Abort
                    Abort
These schedules are recoverable. F is recoverable because T1 commits before T2,
which makes the value read by T2 valid; T2 can then commit itself. In F2, if T1
aborts, T2 has to abort as well, because the value of A it read is incorrect. In
both cases, the database is left in a consistent state.

5.9 INTRODUCTION TO LOCKING MANAGEMENT


Locking is the most popular concurrency control mechanism and is implemented
through several important techniques. A lock is a variable associated with every
data item to indicate whether the data item is available for an operation to be
performed on it. The lock variable synchronizes the operations performed on these
data items by the concurrent transactions. A planned and proper implementation of
locking solves many of the concurrency problems listed earlier. However, there
are some problems created by the lock itself.
Lock granularity
Locking ensures conflict prevention by holding a lock on various parts of a
database. Sometimes these locks are held by the DBMS; sometimes the DBA or a user
holds the lock explicitly. Applying too coarse a level of locking in databases
with a high level of concurrency degrades performance. Typical levels of locking
are given as follows:
(i) Database locking: The entire database is locked in this method during
updating by any user. This is the easiest method to implement but not a very
popular one. It is applied only when the DBA is performing some maintenance
task on the database, like restoring or backing up.
(ii) Table locking: Some operating systems consider tables to be equivalent to
files. If it is so, then those systems can apply file locking systems to lock the
tables of the database. This method is efficient but can lead to congestion in
busy databases.
(iii) Page locking: It is better to lock a separate page instead of locking the
entire table. A page is, in fact, a disk concept: a file stored on the disk
is divided and written into physically separate sections called sectors. So
it is efficient to lock access to a disk page or sector, permitting the
users to access the rest of the table.
(iv) Row locking: This locks a database row denying any request of modification
by any transaction on the database. This technique involves high
implementation overhead.
(v) Column locking: Column locking is supported by very few databases. This
method is generally considered unacceptable because of the high overhead
involved in it.
5.9.1 Types of Locks
• Binary locks: In this method, the lock variable can have only two states.
This method is simple but too stringent, and is therefore rarely used in
practice.
• Shared/exclusive locks: This method is more practical and accepted in general
lock-based systems. This scheme locks the data items in two modes. Shared
or read mode and exclusive or write mode.
• Certify lock: This is used to improve the performance of locking protocols.
(i) Binary locks
As explained earlier, a binary locking scheme has two states. These states are
designated by ‘0’ and ‘1’. These two digits are used to indicate whether a data item
is locked or unlocked. For example if ‘1’ is presumed to indicate that a lock is held
then ‘0’ indicates that the lock is open. This means that if the value of the lock
associated with the data item X is ‘1’ then no operation can be performed on the
data item. It indicates that the item is locked.
The concept is that a data item cannot be modified by transactions if it is
locked. If a transaction is modifying the value of a data item it cannot be accessed
until the modification is completed. So, if a transaction wants to modify a data item
it requests a lock on it. Now, the lock request is granted only if the data item is not
already locked by some other transaction. Otherwise, it has to wait for the lock.
After the lock is acquired, the transaction performs the modification and finally
releases the lock.
The two operations lockitem(X) and unlockitem(X) are required to be
performed to implement this method. Lockitem(X) locks a data item and prevents
it from being accessed by other transactions and unlockitem(X) releases the lock on
a data item and makes it available to the other transactions. Therefore, from the
point of view of the lock variable the procedure can be described as follows:



Any transaction that wants to hold a lock on a data item first checks the value
of the lock variable associated with it. If it finds the lock status of the variable to be
‘1’, then the item is already locked and the transaction waits. Once the lock status of the
data item becomes ‘0’, the waiting transaction can hold a lock on it to perform the
modifications. It then issues a lockitem(X) command. Once it completes its operation
with the data item, it issues unlockitem(X) and resets the value of the lock variable so
that some other transaction waiting to acquire a lock can proceed.
Notice that the binary lock essentially produces a ‘mutually exclusive’ type
of situation for the data item, so that only one transaction can access it. These
operations can be easily written as an algorithm as follows:
The locking algorithm
    Lockitem(X):
    start: if Lock(X) = 0              /* item is unlocked */
               Lock(X) ← 1             /* lock it */
           else
           {
               wait (until Lock(X) = 0 and the lock manager wakes up the transaction);
               go to start;
           }
The unlocking algorithm
    Unlockitem(X):
        Lock(X) ← 0;
        if any transactions are waiting,
            wake up one of the waiting transactions;
The only restriction on the use of the binary locks is that they should be
implemented as indivisible units (also called ‘critical sections’ in operating systems
terminology). That means, no interleaving operations should be allowed, once a
lock or unlock operation is started, until the operation is completed. Otherwise, if a
transaction locks a unit and gets interleaved with many other transactions, the locked
unit may remain unavailable for a long time, with catastrophic results.
To make use of the binary lock schemes, every transaction should follow
certain rules:
1. The lockitem(X) command is to be issued before issuing a read(X) or
write(X).
2. The unlockitem(X) command is to be issued after completion of all
read(X) and write(X) operations on X.
3. If a transaction holds a lock on a data item X, it should not issue another
lockitem(X) command.
4. If a transaction is not currently holding a lock on a data item X, it should
not issue an unlockitem(X) command.



No other transaction Tj would be permitted to operate on a data item X between
a lockitem(X) and an unlockitem(X) issued by Ti. Therefore, in this interval
only Ti holds the value of the data item X. Thus, many of the problems mentioned
above are solved by this scheme.
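This scheme can be sketched directly in code. The following is a minimal illustration in Python, assuming a single-process setting in which threading.Condition stands in for the DBMS lock manager’s wait queue; the class and method names are illustrative, not part of any real DBMS API:

    import threading

    class BinaryLockManager:
        """Minimal binary lock table: each item is either unlocked (0) or locked (1)."""

        def __init__(self):
            self._cond = threading.Condition()
            self._lock = {}                       # data item -> 0 or 1

        def lock_item(self, x):
            with self._cond:
                while self._lock.get(x, 0) == 1:  # item already locked: wait
                    self._cond.wait()             # lock manager wakes the transaction
                self._lock[x] = 1                 # lock it

        def unlock_item(self, x):
            with self._cond:
                self._lock[x] = 0                 # reset the lock variable
                self._cond.notify()               # wake up one waiting transaction

A transaction would call lock_item('X'), perform its read(X) and write(X) operations, and finally call unlock_item('X'), exactly in the order required by the four rules above.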
(ii) Shared/exclusive locks
A binary lock is easy to implement and looks satisfactory enough, but it suffers
from serious difficulties. It strictly prohibits more than one transaction from accessing a
data item simultaneously. This scheme does not even permit more than one
transaction to perform a read operation on the same data item at the same time. This
is where the problem lies. While one transaction is performing a write operation,
the other transactions should not be permitted to perform a write or read operation
on the same data item. However, where is the harm in allowing two read operations
to take place simultaneously? In fact, allowing simultaneous read operations on the
same data item would increase the system’s performance without causing any harm
to the database.
This concept gave rise to the idea of shared/exclusive locks. The notion of
exclusive lock is too rigid and has performance side effects. There is a need to make
the locking mechanism less stringent by introducing shared or read locks.
There are two types of locks in this scheme:
• Write locks/Exclusive locks (X locks)
• Shared locks/Read locks (S locks)
(a) Write lock
A write lock set up on a data item allows a transaction to read and/or modify its
value exclusively. No other transaction can read or write that data item while the
write lock is in effect.
If transaction A holds an X lock on record p, then transaction B requesting a
lock on the same record will be denied.
(b) Read lock
A read lock is set up by a transaction. It is non-exclusive, i.e., it can be shared
among many readers, allowing parallel reads to happen. Nobody can change the
data item while the read lock is on.
Figure 5.4 shows the lock compatibility matrix.
                          Current state of lock on data item
    Requested lock        Exclusive       Shared       Unlocked
    Exclusive                 N              N             Y
    Shared                    N              Y             Y
    Unlock                    Y              Y             -

Fig. 5.4 Lock Compatibility Matrix

Normally, locks are implicit. A FETCH request is an implicit request for a
shared lock, whereas an UPDATE request is an implicit request for an exclusive
lock. Explicit lock requests need to be issued if a different kind of lock is required
during an operation. For example, if an X lock is to be acquired before a FETCH, it
has to be explicitly requested.
We need to think of three operations: a read lock, a write lock and an unlock.
The algorithms can be as follows:
    Readlock(X):
    start: if Lock(X) = “unlocked”
           {
               Lock(X) ← “read-locked”;
               no_of_reads(X) ← 1;
           }
           else if Lock(X) = “read-locked”
               no_of_reads(X) ← no_of_reads(X) + 1;
           else
           {
               wait (until Lock(X) = “unlocked” and the lock manager wakes up the transaction);
               go to start;
           }

    Writelock(X):
    start: if Lock(X) = “unlocked”
               Lock(X) ← “write-locked”;
           else
           {
               wait (until Lock(X) = “unlocked” and the lock manager wakes up the transaction);
               go to start;
           }

    Unlock(X):
        if Lock(X) = “write-locked”
        {
            Lock(X) ← “unlocked”;
            wake up one of the waiting transactions, if any;
        }
        else if Lock(X) = “read-locked”
        {
            no_of_reads(X) ← no_of_reads(X) – 1;
            if no_of_reads(X) = 0
            {
                Lock(X) ← “unlocked”;
                wake up one of the waiting transactions, if any;
            }
        }
The algorithms are fairly straightforward, except that during the unlocking
operation, if a number of read locks are held, then all of them must be released
before the item itself becomes unlocked.
To ensure smooth operation of the shared/exclusive locking system, the system
must enforce the following rules:
(1) A readlock(X) or writelock(X) command must be issued before any read(X)
or write(X) operation is performed on X.
(2) A writelock(X) command must be issued before any write(X) operation is
performed on X.
(3) An unlock(X) command must be issued after all read(X) and write(X)
operations on X are completed.
(4) A readlock(X) command must not be issued while already holding a read
lock or write lock on X.
(5) A writelock(X) command must not be issued while already holding a read
lock or write lock on X.
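The algorithms and rules above can be made concrete with a small sketch. The following Python fragment is a simplified illustration rather than a real lock manager: it keeps a mode and a reader count per item, with threading.Condition again standing in for the wait queue; all names are illustrative.

    import threading

    class SharedExclusiveLockManager:
        """Sketch of a shared/exclusive lock table with a read count per item."""

        def __init__(self):
            self._cond = threading.Condition()
            self._state = {}                  # item -> (mode, no_of_reads)

        def read_lock(self, x):
            with self._cond:
                while self._state.get(x, ("unlocked", 0))[0] == "write":
                    self._cond.wait()         # a writer holds x: wait
                mode, n = self._state.get(x, ("unlocked", 0))
                self._state[x] = ("read", n + 1)

        def write_lock(self, x):
            with self._cond:
                while self._state.get(x, ("unlocked", 0))[0] != "unlocked":
                    self._cond.wait()         # any reader or writer blocks a writer
                self._state[x] = ("write", 0)

        def unlock(self, x):
            with self._cond:
                mode, n = self._state[x]
                if mode == "write" or n <= 1:
                    self._state[x] = ("unlocked", 0)   # writer or last reader leaves
                    self._cond.notify_all()            # wake the waiting transactions
                else:
                    self._state[x] = ("read", n - 1)   # other readers remain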
Conversion of locks
In some cases, it is desirable to allow lock conversion by relaxing conditions (4)
and (5) of the shared/exclusive lock mechanism; that is, there should be provision
for a lock of one type held by a transaction on a data item to be converted to some
other type. For example, a transaction holding a read lock on a data
item X may be permitted to upgrade it to a write lock: if no other transaction is
holding a lock on X, the transaction can upgrade its read lock to a write lock by issuing
a writelock(X) command; otherwise, it waits for the other transactions to release
their read locks on X. Similarly, any transaction holding a write lock on X is allowed
to downgrade its write lock to a read lock. The above algorithm can be amended to
accommodate lock conversion.
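Continuing the sketch above, an upgrade operation added to the SharedExclusiveLockManager might look as follows. It assumes the caller already holds one of the read locks on x; note that this naive version can itself deadlock if two readers of the same item both try to upgrade, which is why real systems treat upgrades with care.

        def upgrade_lock(self, x):
            """Upgrade a read lock to a write lock once no other readers remain on x."""
            with self._cond:
                while self._state.get(x, ("unlocked", 0)) != ("read", 1):
                    self._cond.wait()         # wait for the other readers to unlock x
                self._state[x] = ("write", 0)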
It is important to note that the use of locks by themselves does not guarantee
serializability. The reason is that in some situations a lock-holding transaction may
unlock a unit prematurely: for example, a transaction may no longer need a certain
data unit and therefore unlock it, but later write into it indirectly via some other unit.
This would give rise to ineffective locking and loss of serializability. Serializability
can be guaranteed by implementing two-phase locking.
(iii) Certify locks
A certify lock is acquired when a transaction’s updates are to be committed to the
database; it relates to the commit stage. A certify lock is not compatible with read
locks. Multiversion 2PL uses certify locks: a transaction must acquire a certify lock
on all items on which it currently holds write locks before it can commit. This scheme
avoids cascading aborts. The idea behind multiversion 2PL is to allow other
transactions T' to read X while T holds a write lock on X. The two versions are the
committed version X and the newly created version X'. Once T is ready to commit, it
must obtain a certify lock on all items on which it currently holds write locks, as
mentioned earlier. Since the certify lock is not compatible with read locks, the
transaction may have to delay its commit until all its write-locked items are released
by any reading transactions. At this point, the committed version X of the data item
is set to the value of version X' and the certify locks are then released.
Table 5.3 Requested Lock

                                 Lock Requested
    Lock Already Set        Read         Write        Certify
    Read                    Granted      Granted      Blocked
    Write                   Granted      Granted      Granted
    Certify                 Blocked      Granted      Blocked

Tables 5.3 and 5.4 show the requested locks and the locking rules with reference
to read, write and certify locks, respectively, based on priority.
Table 5.4 Locking Rules as per Priority Inversion

    Lock already set by a        Lock requested by a high priority transaction
    low priority transaction     Read                Write       Certify
    Read                         Granted             L-Aborted   Cannot occur*
    Write                        Granted/Blocked#    Granted     Granted
    Certify                      Conversion          Granted     Conversion

In Table 5.4:
• L-Aborted means the low priority transaction is aborted, whereas Conversion
means the low priority transaction’s lock is converted to a write lock.
• * means that if the transaction is already aborted, there is no read lock set.
• # means that the outcome depends on the implementation.
Optimistic Locking and Pessimistic Locking
Optimistic locking: The data is locked only when data is being saved. The lock is
released after data is saved.
Pessimistic locking: The data is locked when editing begins. The lock is released
after the data is saved or discarded.
Optimistic locking only resolves physical I/O conflicts. It cannot be used
to handle the concurrency problems mentioned above; in other words, it cannot provide
a solution to the lost update problem and the uncommitted dependency problem.
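The difference between the two approaches can be illustrated with a small sketch of optimistic locking using a version counter. Everything here (the store dictionary, the field names) is illustrative only:

    store = {"balance": 100, "version": 1}   # one 'row' with a version column

    def read_row():
        return dict(store)                   # read a copy; no lock while editing

    def save_row(edited):
        # The check happens only at save time: the write succeeds only if the
        # version has not changed since the row was read.
        if store["version"] != edited["version"]:
            return False                     # conflict: someone saved first
        store["balance"] = edited["balance"]
        store["version"] += 1
        return True

    row = read_row()
    row["balance"] -= 30
    if not save_row(row):
        print("Conflict detected at save time; re-read the row and retry")

A pessimistic scheme would instead take a lock in read_row() and hold it until the save or discard, blocking all other editors in between.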

5.10 DEALING WITH DEADLOCK


The type of locking implemented in a system can cause different problems in the
execution of transactions. The various problems associated with locking are:
• Deadlock
• Starvation
5.10.1 Deadlock
The problem of deadlock is one of circular waiting: it occurs when each
transaction in a set of two or more transactions is waiting for the release of a
data item that is locked by another transaction in the same set.
Consider two transactions, T1 and T2. Figure 5.5 shows the deadlock between the
two transactions.


Fig. 5.5 Deadlock between T1 and T2

5.10.2 Deadlock Prevention Protocol


There are various deadlock prevention protocols that provide some rules that you
can use to avoid the problem of deadlock. One of the deadlock prevention protocols
states that a transaction should check the availability of all the required data items
before starting the execution. If the data items are not available, the transaction
should wait for a specific period of time and then check the availability of data
items again.
The other deadlock prevention protocols decide what to do when a deadlock
occurs in transactions. According to these protocols, the transactions are either
aborted or made to wait for the data items. These protocols use the timestamp
concurrency control technique and provide the following schemas:
• Wait-die: It specifies that when the timestamp of a transaction T is lower
(i.e., T is older) than the timestamp of the transaction T' holding the data
item, then T is allowed to wait; otherwise, T is aborted and restarted.
• Wound-wait: It specifies that when the timestamp of a transaction T is
lower (i.e., T is older) than the timestamp of T', then T' is aborted and
restarted; if it is greater, then T is allowed to wait.
The other deadlock prevention protocols include the two important algorithms,
namely No Waiting (NW) and Cautious Waiting (CW). In the first algorithm, when
a transaction is unable to access a data item, the transaction gets aborted immediately
and restarts after some specific period of time, without checking whether a deadlock
will occur or not. In the CW algorithm, if there is a transaction Ti waiting for a data
item, which is locked by the Tj transaction, the deadlock prevention protocol checks
whether or not Tj is waiting for any other data item. If Tj is not waiting for any other
data item, Ti is allowed to wait; if Tj is itself waiting for some other data item,
then Ti is aborted.
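As a rough sketch, the wait-die and wound-wait decisions can be written as a single function, assuming each transaction carries an integer timestamp where a smaller value means an older transaction; the function name and return values are illustrative:

    def resolve_conflict(scheme, requester_ts, holder_ts):
        # What happens when `requester` asks for a lock held by `holder`:
        # "wait", "abort-requester" or "abort-holder".
        if scheme == "wait-die":
            # An older requester waits; a younger requester dies (is aborted).
            return "wait" if requester_ts < holder_ts else "abort-requester"
        if scheme == "wound-wait":
            # An older requester wounds (aborts) the younger holder;
            # a younger requester waits.
            return "abort-holder" if requester_ts < holder_ts else "wait"
        raise ValueError("unknown scheme")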
5.10.3 Deadlock Detection
In this case, the system checks whether or not a deadlock exists. To detect a
deadlock, the system constructs a wait-for graph. In this graph, one node is created
for every transaction that is currently under execution. When a transaction Ti issues
a command to lock a data item already locked by transaction Tj, a directed edge
from Ti to Tj is created. When the data item requested by Ti is released, the edge is
deleted from the graph. Figure 5.6 shows the wait-for graph for transactions
Ti and Tj.

Ti -------> Tj

Fig. 5.6 Wait-for-Graph
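A deadlock exists exactly when the wait-for graph contains a cycle. The following sketch, assuming the graph is given as a dictionary mapping each transaction to the set of transactions it waits for, detects a cycle by depth-first search:

    def has_deadlock(wait_for):
        WHITE, GREY, BLACK = 0, 1, 2            # unvisited / on current path / done
        colour = {t: WHITE for t in wait_for}

        def visit(t):
            colour[t] = GREY
            for u in wait_for.get(t, ()):
                if colour.get(u, WHITE) == GREY:        # back edge: a cycle exists
                    return True
                if colour.get(u, WHITE) == WHITE and visit(u):
                    return True
            colour[t] = BLACK
            return False

        return any(colour[t] == WHITE and visit(t) for t in list(colour))

    # Ti waits for Tj and Tj waits for Ti: a deadlock.
    print(has_deadlock({"Ti": {"Tj"}, "Tj": {"Ti"}}))   # prints True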

5.10.4 Starvation
Starvation occurs when a transaction remains unexecuted for a long time while
other transactions are under execution. This occurs if some transactions are assigned
higher priority than others. To avoid starvation, a system can use the first-come-first-
served waiting method, according to which transactions lock a data item in
the same order in which they requested it. The wait-die and
wound-wait schemes also help prevent starvation among transactions. Another way
to avoid starvation is to define priorities for transactions and to increase the priority
of a transaction while it waits for a data item.

5.11 INTRODUCTION TO ARIES


ARIES is a web-based technology created through an international partnership by a
group of NGOs, policy-makers and scientists from all over the world. It establishes
communication between users and experts, bringing the users’ knowledge of a case
study together with that of knowledge experts to develop expertise.
ARIES was developed with a unique objective to work for the cause of ‘rapid
ecosystem service assessment and evaluation’ (ESAV) and to support decision makers
in taking decisions fast, in a reliable and effective manner. It helps users analyse
their goals, assets and other resources effectively, and study factors that affect their
values.
Introduced in 2007, it is user friendly and provides service in any geographical
area, according to the priorities and needs of users. ARIES is well equipped with
hardware as well as software tools. It uses a unique artificial intelligence approach
in studying every piece of data relevant to ESAV questions. Based on these studies, it
constructs a cause-and-effect picture which reveals the interaction of ecological
and economic factors in the selected area. It takes care to ensure that every bit of
relevant data is accounted for and not a single bit is overlooked.
It is federally funded and mainly aimed at providing intelligent support to
ESAV. It plans to bring a fully functional portal to the public by the end of 2010. Its
demonstration and beta version will become available much earlier than that.
It will be free and open to all non-profit users. These include members of
NGOs and academic or governmental institutions. Non-profit users contribute to
the growth of the system by committing to share data in modalities reflecting their
mission and sources. Through an independent organization, services for business
groups and professional support will be provided by ARIES. This will support
effective maintenance and better growth of the system.
ARIES supports sharing of files between computers having different operating
systems. Thus, it is useful for clients with different machines and operating systems.
For example, Windows-based and Apple Macintosh clients can share files,
printers and all the services provided on ARIES, either via Ethernet cable or AirPort.
One client creates a document in Microsoft Word for Windows. Another client, perhaps
a coworker, can access that document, modify it using Word for the Macintosh
and then place it back on the server. The first client can see those revisions using
Word for Windows. ARIES also functions as an AppleTalk router.
ARIES works on the client/server model, and client–server operating
systems solve many issues of peer networks and offer features like shared Internet
connectivity and intranet, local and Internet e-mail. It also has features like web hosting,
shared customer lists and calendars. These operating systems also provide
centralized management of user accounts, security and backups. The ARIES Server
Appliance is equipped with all these features.
Migration from a peer-to-peer (PP) network to an ARIES-based client–server
network is cost effective and can be done with ease. After connecting to ARIES,
one has to just run the web browser on a PC that is connected to the network. After
this, one can add new users, set file folders, invoke security, etc.
It is compatible with wireless LANs (IEEE 802.11b and Apple AirPort) and, thus,
provides maximum user mobility. It is simple to install, flexible in operation,
reduces cost of maintenance and has excellent scalability. Additional PCs can be
connected to it quite easily.
Network interface cards (NICs) are needed, as in any other computer, for connecting
to the network. Dual-speed NICs that operate at both 10 Mbps and 100 Mbps are
available in the market; their advantage is that there is no need to replace them
when upgrading the network to Fast Ethernet. ARIES can support up to 25 users. It
has a maximum range of 550 feet. Users can access the LAN using wireless
PC cards or AirPort adapters, which can be inserted in desktop PCs, notebooks or laptops.
Three types of network cables are supported by the network interface. These
are:
1. Unshielded Twisted Pair (UTP), known as 10BaseT; 4-pair UTP (10BaseT
and 100BaseT) uses a rectangular 8-pin modular connector,
2. Coax, referred to as 10Base2,
3. Fiber-Optic, known as 10BaseFL.
UTP connectors are known, officially, as RJ-45 connectors, with a maximum
allowable length of 100 metres between the hub and the computer.
ARIES can be used to share an Internet connection. For this, the TCP/IP,
which is a standard protocol, has to be configured for an ARIES-based Network.
This protocol is suited for mixed networks, Macs, Windows PCs and other computers.
Computers using TCP/IP have a unique IP address which is a numerical identifier
of a computer or device connected to a TCP/IP network.
ARIES automatically assigns IP addresses to computers on the network
through the DHCP server service that it runs. It uses a range of IP addresses from
192.168.1.1 to 192.168.1.254, and the IP address 192.168.1.1 is reserved for the
ARIES server.
ARIES acts as an Internet gateway. There is an advantage in connecting
computers through a local network connection, as one need not provide a modem
and a telephone line for each computer for accessing the Internet. The local network
can share one Internet connection and all the computers can have access to the
Internet. Thus, ARIES acts as an Internet gateway to the user’s network and controls
access to the Internet, receiving all outgoing communications and sending them to
sites on the Internet with a single connection. It also receives all incoming
communications from the Internet, distributing these to the correct computer on the
local network. This rerouting is fully transparent, that is, completely hidden from users.
When using the Internet, the question of security arises as there are many
types of users on the Net, including hackers. ARIES, sitting at the gateway of the
network, guards the network, acting like a firewall. It provides security by blocking
access for unsolicited users on the Internet. It prevents a computer on the
Internet from directly contacting the network’s computers, since their IP addresses
are not known to it. Only the public IP address on the gateway can be accessed by
a computer on the Internet, and ARIES allows incoming Internet traffic only when it
is sent to the public IP address of the user’s network. It can be configured to selectively
filter out and reject TCP/IP traffic from a few specific applications.
ARIES has one feature that users may like very much: its unlimited user
licensing. It provides the ability to add an unlimited number of users to the network
without additional charges being booked on users for this.

5.12 INTRODUCTION TO DISTRIBUTED DATABASES

5.12.1 Distributed Database


A distributed database or DDB is a collection of multiple interrelated databases that
are spread over a computer network. Each computer contains its own database,
which is managed by an individual database management system. A distributed
database management system (DDBMS) manages DDBs and makes the distribution
transparent to the user so that the user is not aware of the distribution and accesses
the data as if it were stored at one place.
DDB technology has evolved as a combination of two technologies, one is
the database technology and the other is the network and data communication
technology. DDBs provide the advantages of distributed computing to the field of
database management. The components are interconnected by a computer network
and work together to perform the assigned tasks.
Advantages of distributed database
DDBs are used for several reasons such as organizational decentralization and cost-
effective processing. Some of the advantages of DDBs are as follows:
• Increased reliability and availability
• Improved performance
Increased reliability and availability
Reliability is a measure of the probability that a system is up and running at
a given point of time. On the other hand, availability is a measure of the probability
that the system is continuously able to serve the queries made to it during a time interval.
When you use DDBs, which are spread over several sites, even if one site
fails, you can connect to the other sites, which continue to function normally.
However, the data and software residing on the failed site cannot be
accessed; the failure does not affect the performance of the other sites in the distributed
database. This quality improves the reliability and availability.
Improved performance
A distributed DBMS fragments the existing database and keeps the data closer to
the sites where it is most required. A large database is fragmented and distributed
over many sites, resulting in smaller databases at each site. The queries and
transactions accessing data at these smaller databases perform better. In addition,
when the database is fragmented into smaller databases, each site has less
transaction overhead in the execution of a query. Figure 5.7 shows the distributed
database architecture.

[Figure: Sites 1 to 5 interconnected through a communications network]
Fig. 5.7 The Distributed Database Architecture

5.12.2 Distributed Data Transparency Features


In a DDBMS, distribution is transparent, i.e., the system hides the details of where
each file is physically stored. Any user on the network can access a file from his own
terminal regardless of where the file is actually stored. The various
features of data transparency are:
• Distribution or network transparency: In this type of transparency, the
user is not concerned with the network details. It may be location transparency
or naming transparency. Location transparency has the property that the
command used to perform a task is not dependent on the location of the data
and the location of the system from where the command is issued. Naming
transparency means that if you specify a name for an object, the named object
can be accessed unambiguously without any further specification.
• Hardware transparency: This enables the running of the same DBMS on
different hardware platforms. The hardware transparency is needed as the
real-world computer installations involve a wide range of machines that
include IBM machines, ICL machines, HP machines, personal computers
and workstations. Hardware transparency helps the DBMS to integrate the
data in those machines so that it can be presented to the users as if from a
single-system machine.
• Replication transparency: This enables the storage of copies of data at
multiple distributed sites. In other words, replication transparency can allow
the creation and destruction of replicas of data in response to user requests.
Replication transparency has two advantages. First, it enhances the
performance of the database, for example, applications accessing the database
can use local copies of data instead of trying to access the data in the remote
locations. Secondly, it increases the availability of data of the DBMS, as due
to replication transparency, data becomes available on multiple client
machines. However, apart from these advantages, replication also has
disadvantages. The major disadvantage of replication is encountered whenever
a replicated data is updated. It is because, when you update a replicated data,
you also need to update every single copy of that replicated data.
• Fragmentation transparency: This type of transparency enables a user to
work as if he/she were unaware of the existing fragments of
the data in the database. Fragmentation transparency also implies that the
users can view the data as logical combinations by using suitable joins and
unions. Here, the system optimizer is responsible for determining the fragments
that are physically accessed. A DBMS is said to support data
fragmentation whenever a relation variable in the DBMS can be divided
into fragments or pieces for the purpose of physical storage. Fragmentation
helps enhance the performance of the DBMS by enabling storage of data at
the location from which the data is most frequently used. This allows
operations on the data to be performed locally and reduces the traffic
in the network. Basically, there are two types of fragmentation:
horizontal and vertical. Horizontal fragmentation divides a relation or a
table into sets of rows. Vertical fragmentation divides a relation or a table
into subrelations, where each subrelation is a subset of the columns of
the original relation. Fragmentation transparency thus keeps the user
unaware of the existing fragments.
• Transaction transparency: It helps in maintaining consistency of data by
coordinating the different functions of an object. These functions are used to
define various transactions and their dependencies in the database. You need
to add different check points at the different states of an object to define these
functions of the object.
• Failure transparency: It refers to the extent to which the errors and related
recoveries of a distributed database are invisible to the users and applications.
For example, if a server fails, then the failure transparency helps in
automatically redirecting all the users connected to the failed server to a
different server in such a manner that the users never notice the server failure.
In other words, failure transparency is used to tolerate fault failures and
problems of the distributed database. It does so by defining different conditions
of the database that can cause problems. For tolerating problems of databases,
failure transparency includes various steps, such as locating an object
with its related possible problems, using check points and recovery functions
to detect and recover from the problems, and providing stability of an object
using the replication function. Failure transparency
is one of the most hard-to-achieve transparencies, as it is very difficult to
determine whether a server has actually failed or whether it is just responding
to requests very slowly.
• Performance transparency: It allows reconfiguration of the distributed
database system to match with varying loads on the system for improving the
performance of the system. It also helps in executing distributed queries in
the distributed system using a distributed query optimizer.
• Heterogeneity transparency: It enables the databases at the different sites
of the DDB system, which may run under different DBMSs, to be accessed
as if they formed a single database.
• Migration transparency: It allows movement of data in a distributed database
system without affecting the operations performed by the users as well as
application programs. Examples of migration transparency include Network
File System (NFS) and Web pages available on the World Wide Web (WWW).
• Access transparency: It enables users to access local and remote data from
a distributed database system using the same operations. Examples of access
transparency include:
o Various file system operations in NFS
o Queries performed on SQL
o Navigating through the various Web pages on the Web
• Location transparency: It enables the users to behave in such a manner that
they do not actually know where the data of the DBMS is physically located.
However, they can work on that data as if it were present at their local site. In
other words, location transparency helps in accessing data without the
knowledge of their actual location. Location transparency is useful in the
distributed databases as it simplifies the terminal activities and user programs.
Here, data can move from one location to another so that it can respond to the
changing performance requirements. Examples of location transparency
include:
o Operations of the file systems in NFS
o Web pages available on the Web
o Tables contained in a distributed database
5.12.3 Advantages and Disadvantages of Distributed Databases
Advantages
The following are the advantages of distributed databases:
• Distributed database systems employ a distributed processing architecture.
An Oracle database server acts as a client when it requests data that another
Oracle database server manages.
• The distributed database system and database replication are related, yet
distinct. In a pure, that is, not replicated distributed database, the system
manages a single copy of all data and supporting database objects.
• Typically, distributed database applications use distributed transactions to
access both local and remote data and modify the global database in real
time. The term replication refers to the operation of copying and maintaining
database objects in multiple databases belonging to a distributed system. While
replication relies on distributed database technology, database replication
offers applications benefits that are not possible within a pure distributed
database environment. Most commonly, replication is used to improve local
database performance and protect the availability of applications because
alternate data access options exist. For example, an application may normally
access a local database rather than a remote server to minimize network traffic
and to achieve maximum performance. Furthermore, the application can
continue to function if the local server experiences a failure, but other servers
with replicated data remain accessible.
• A distributed database system is regarded as a kind of partnership among
individual local DBMSs at individual sites.
• This system maintains a database link. A database link is a connection between
two physical database servers that allows a client to access them as one logical
database. To access the link, you must be connected to the local database that
contains the data dictionary entry. A database link connection is one-way in
the sense that a client connected to local database A can use a link stored in
database A to access information in remote database B, but users connected
to database B cannot use the same link to access data in database A. If local
users on database B want to access data on database A, then they must define
a link that is stored in the data dictionary of database B. A database link
connection allows local users to access data on a remote database. For this
connection to occur, each database in the distributed system must have a
unique global database name in the network domain.
• The global database name uniquely identifies a database server in a distributed
system. The user accounts that are necessary to establish server-
to-server connections must be available in all databases of the distributed
database system.
Disadvantages
In distributed databases, multiple users work on the same database but each user
assumes that he is the only user using the database. In such an environment, data
consistency and integrity must be maintained properly. To implement the distributed
databases successfully, DBA handles the following operations:
• Query processing
• Catalog management
• Update propagation
• Recovery control
• Concurrency control
Each of these operations has various problems associated with it as discussed
in the following pages:
Query processing
Minimization of network utilization involves all processes that are related to query
processing, such as Query optimization process and Query execution process.



The problems related to query processing are given as follows:
• There are large differences between the communication times of the
possible ways of processing a query.
• The rate of data transfer and the delay time determine which way of
executing a query is selected.
• The computation and I/O times of a poor strategy are negligible compared
to its communication time.
Figure 5.8 shows different query processing ways with their communication
time.

Fig. 5.8 Query Processing Ways and their Communication Time

5.12.4 Catalogue Management


Distributed system catalogue includes views, authorization and control information.
Control information provides information related to the desired location, where the
database is required, fragmentation and replication independence. Distributed system
catalogue can be stored as centralized, fully replicated, partitioned and as a
combination of centralized and partitioned. Centralized means the whole catalogue
is stored at a central site of the network. Fully replicated means the complete
catalogue is stored at every site of the network. Partitioned catalogue means every
site of the network maintains its local catalogue and all local catalogues are combined
as the main catalogue. A combination of centralized and partitioned catalogue means
that each site maintains a local catalogue and all local catalogues are copied at the
central site of the network. There are problems with each storage type of catalogue,
which are given as follows:
• In the centralized system, and in the combination of centralized and partitioned
systems, the whole catalogue is stored at a central location on the network.
Reliance on the central site is a weakness, because if the central system
fails to work, then the whole system will not function properly.
• In the fully replicated system, every site maintains a local catalogue which
causes lack of control over catalogues of the whole network.
• In the partitioned system, the central site contains the union of all local
catalogues which requires expensive maintenance for the whole local
catalogue in the central catalogue.



5.12.5 Update Propagation
The main problem of data replication is that it may not be possible to make the
updation in all copies of the database, because some copies are unavailable. Primary
copy is a scheme which is used for proper updation of the database. These are the
steps in the primary copy scheme:
• One copy of each data item is designated as the primary copy and the remaining
copies are designated as secondary copies.
• Primary copies of the database are distributed across the sites of the network.
• Updation of the database is deemed to be complete when the primary copy is
updated.
The primary copy scheme also has some problems: if the primary copy of any
data item is not available, then it is impossible to update the database. To avoid this
problem, a specific time, decided by the user, is used for update propagation. However,
this solution is not always right for a database, because the information in the database
may change after the update transaction is created but before that specific time.
5.12.6 Recovery Control
Recovery control in distributed database is based on the two-phase commit protocol.
The two-phase commit protocol is the transaction protocol due to which all nodes
and databases agree with each other to commit a transaction. This protocol is required
in an environment where a single transaction can interact with multiple independent
resource managers as in the case of distributed databases. It supports data integrity
by ensuring that modifications made to transactions are either committed by all the
databases involved in a distributed system or rolled back by all the databases.
The two-phase commit protocol works in two phases. The first phase is called
the prepare phase during which the updates are recorded in a transaction log file,
and the resource through a resource manager indicates that it is ready to make the
changes. Resources can vote either to commit the updates or to roll back to their
previous state. The activities performed in the second phase depend on the vote of
resources. If all resources vote to commit then, all the resources participating in the
transaction are updated, whereas if one or more of the resources vote to roll back,
then, all the resources are rolled back to their previous state.
Consider an example in which an interaction between a coordinator at a local
site and a participant at a remote site takes place, and a transaction has requested
the commit operation. In the first phase, the coordinator instructs the participants to
get ready and sends the get ready message at time T1. Participants make an entry in
the log and send the okay message as acknowledgement to the coordinator. Figure
5.9 shows that the acknowledgement has been sent at time T3 and received at time
T4. After receiving messages from all the participants, the coordinator takes a decision
to either commit or roll back the transaction. The transaction is committed if the
messages received from all the participants are okay; it is rolled back if any message
received from a participant is not okay. With this, the first phase of the two-phase
commit protocol is complete.
The coordinator then writes an entry in the log, takes a final decision and
sends it to the participants (do it or do not do it) at time T6. This is the beginning of
the second phase. The participants receive the decision at time T7 and send an
acknowledgement to the coordinator at time T8, which is received by the coordinator
at time T9. Figure 5.9 shows the working of the two-phase commit protocol.


Fig. 5.9 Two-Phase Commit Protocol
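The coordinator’s side of this protocol can be sketched as follows. Participants are assumed to expose prepare(), commit() and rollback() operations; these names are chosen for illustration rather than taken from any real driver.

    def two_phase_commit(participants):
        # Phase 1 (prepare): ask every participant to get ready and collect votes.
        votes = []
        for p in participants:
            try:
                votes.append(p.prepare())   # True = "okay", False = "not okay"
            except Exception:
                votes.append(False)         # an unreachable participant votes "not okay"

        # Phase 2 (decision): commit only if every vote was "okay".
        if all(votes):
            for p in participants:
                p.commit()
            return "committed"
        for p in participants:
            p.rollback()
        return "rolled back"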

5.12.7 Concurrency Control


In distributed database system, the problem of concurrency control is more complex
than in a non-distributed database system, because in a distributed database system
different users access the data stored in different computer systems. The part of a
distributed database that is present on one computer system cannot predict the nature
and type of transactions executing on other computer systems at that time. It is also
difficult to predict the concurrency control mechanism executing on a different
computer system. The concurrency control mechanism in a distributed database
system can be performed by implementing locking process for which different
requests, such as test lock, set lock and release lock are generated. These different
requests to lock the part of the database involved in transaction processing are known
as messages in the context of distributed database.
Consider a transaction, T1 that will update an object. The details of the object
are present at ten different sites. Each site has to generate the following five messages
to lock its copy of the object:
(i) Lock request
(ii) Lock grants
(iii) Update object
(iv) Acknowledgment
(v) Unlock request
Therefore, for ten sites, fifty such messages are required to ensure the success
of transaction T1; and the amount of time involved for the completion of T1 is larger
than the time taken by a similar transaction in a non-distributed database system.
The time taken to complete the same transaction T1 can be minimized by using the
primary copy scheme, in which the site possessing the primary copy of the object
that is to be updated will handle all locking operations.
The locking process to implement concurrency mechanism in a distributed
database system can lead to a global deadlock, which is a type of deadlock that

involves two or more sites. For example, consider the two transactions T1 and T2 in
Figure 5.10.
[Figure: T1 holds a lock at Site A and waits for T1 at Site B to complete; T2 at Site A
waits for T1 to release its lock. T2 holds a lock at Site B and waits for T2 at Site A
to complete; T1 at Site B waits for T2 to release its lock.]
Fig. 5.10 Global Deadlock

From Figure 5.10, the following conclusions are made:


• The user of T2 at Site A is waiting for the user of T1 at Site A to release the
lock.
• The user of T1 at Site A is waiting for the user of T1 at Site B to complete
its transaction.
• The user of T1 at Site B is waiting for the user of T2 at Site B to release the
lock.
• The user of T2 at Site B is waiting for the user of T2 at Site A to complete
the transaction.
The direction of the arrows in the figure shows the existence of a deadlock between
the concurrent transactions executing on the two different Sites A and B. Since
the deadlock is spread over two different sites, it is known as a global deadlock. In
such a case, an individual site cannot even detect the presence of the deadlock. In
order to detect a global deadlock, the sites have to exchange information about the
transactions running on each site, which involves some overhead.

5.13 TRANSACTION TYPES


In DBMS, there are many types of transactions with reference to concurrency control.
They are known as implicit transactions, explicit transactions, local transactions
and global transactions (See Figure 5.11).



Fig. 5.11 Explicit and Implicit Recipients of Transactions

Figure 5.11 shows that implicit recipients work with implicit transactions and
are able to pass all their messages to a specific queue. The
word recipient is used here for maintaining the queue for implicit and explicit types
of transactions. The category of transactions can be viewed via ‘explicit recipients’
in that they are designated targets of particular messages. Figure 5.11 shows a
workflow consisting of four steps performed by Application A, Application B,
Application C and Application D. The queues are used to buffer the flow of
information between different processing stages of the business process. By
specifying the delay interval and the expiration time for a message, a window of
execution can be provided for each of the applications.
5.13.1 Implicit Transactions
By default, all INSERT, DELETE and UPDATE statements in a DBMS run as implicit
transactions. For example, if an UPDATE query is run and a primary key constraint
is violated, a rollback will occur and the changes will not take place. One of the server-
level options available in SQL Server causes all operations to be encapsulated in
the context of a transaction. To use this mode, you must issue the
SET IMPLICIT_TRANSACTIONS ON statement for the database connection. Any of
the following statements will serve as an implicit start to a transaction:
ALTER TABLE, CREATE, DELETE, DROP, FETCH, GRANT, INSERT, OPEN,
REVOKE, SELECT, TRUNCATE TABLE and UPDATE.
This adds another layer of integrity protection and applies to the following
three things:
(i) The size of the transactions themselves
(ii) The size of the transaction log
(iii) Data recovery
You can only recover committed data, so big transactions that do not
commit, if there is a problem, can be lost. Performance is also a consideration when
using transactions.

Check Your Progress
16. What is ARIES?
17. What do you understand by distributed databases?
18. How many types of transaction are there in a DBMS?



5.13.2 Explicit Transactions
These are written by the programmer. In order to execute a group of statements as a
transaction, we write the statements within a BEGIN TRANSACTION and COMMIT
TRANSACTION block. If an error occurs, we can use ROLLBACK so that the
previous data state is restored. The most popular and obvious way to use transactions
is to give explicit commands to start or finish the transaction. Transactions started
in this way are called explicit transactions. The statements involved in an explicit
transaction are as follows:
• BEGIN TRANSACTION
• ROLLBACK TRANSACTION
• COMMIT TRANSACTION
If anything goes wrong with any of the grouped statements, all changes need
to be aborted. The process of reversing changes is called ‘rollback’ in DBMS. If
everything is in order with all the statements within a single transaction, all changes
are recorded together in the database, as shown in Table 5.5.
Table 5.5 Types of Transaction and Syntax

    Type of Transaction          Syntax
    Rollback Transaction         Return 1
    Commit Transaction           Return 0
Table 5.6 shows the explicit transaction commands along with their
descriptions.

Table 5.6 Explicit Transactions and their Descriptions

BEGIN TRANSACTION: Sets the starting point of an explicit transaction.

ROLLBACK TRANSACTION: Restores the original data modified by a transaction
and brings the data back to the state it was in at the start of the transaction.
Resources held by the transaction are freed.

COMMIT TRANSACTION: Ends the transaction if no errors were encountered and
makes the changes permanent. Resources held by the transaction are freed.

BEGIN DISTRIBUTED TRANSACTION: Allows you to define the beginning of a
distributed transaction to be managed by Microsoft Distributed Transaction
Coordinator (MS DTC).

SAVE TRANSACTION: Issues a savepoint within a transaction, that is, a location to
which a transaction can return if part of the transaction is cancelled. A transaction
must be rolled back or committed immediately after rolling back to a savepoint.

@@TRANCOUNT: Returns the number of active transactions for the connection.
Each BEGIN TRANSACTION increments @@TRANCOUNT by 1, while rolling
back to a savepoint leaves it unchanged.
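As an illustration of the commands above, the following sketch uses Python’s DB-API, with sqlite3 standing in for any SQL DBMS; the account table and the amounts are made up for the example:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.isolation_level = None          # autocommit, so BEGIN/COMMIT are explicit
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")

    try:
        conn.execute("BEGIN TRANSACTION")   # explicit start of the transaction
        conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
        conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
        conn.execute("COMMIT")              # both updates become permanent together
    except sqlite3.Error:
        conn.execute("ROLLBACK")            # on any error, restore the previous state

Either both updates take effect or neither does, which is exactly the integrity guarantee the explicit transaction block provides.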

5.13.3 Local and Global Transactions


The term global transaction is defined as a unit of work that takes place in an
environment in which transactions are distributed. A global
transaction is a transaction that requires the efforts of more than one resource
manager. It also involves coordinating several types of services, which may be
located in different programs on the server. It is also known as a distributed
transaction. In a distributed database, a global transaction requires access to data
at one or more non-local sites.
On the other hand, the term local transaction is defined as a type of transaction
in which each of the resource managers involved separately coordinates
only its own changes; they do not have a sync point coordinator, such as an RRS, to
coordinate them. A local transaction involves one service within a single server
program and accesses only a single database. In a distributed database, a local
transaction requires access only to data stored at the originating site.

5.14 DISTRIBUTED DBMS ARCHITECTURES


The data and software in a DDB system are spread over multiple sites, connected
by some form of communication network. You can classify distributed
databases on the basis of various factors, such as the degree of homogeneity. If all the
servers or individual local DBMSs on the different sites in a DDB system use identical
software and all the users also use identical software, then the DDBMS is called
homogeneous; otherwise it is called heterogeneous.
5.14.1 Homogeneous Distributed Database Systems
A homogeneous DDB system is basically a network consisting of two or more
databases. These databases use the same software and reside on multiple machines.
Figure 5.12 illustrates a homogeneous distributed database system that connects
three databases: hq, mfg and sales.



[Figure: three databases, Marketing (mkd.com), Head Office (ho.com) and
Purchase (purchase.com), connected in a distributed database]
Fig. 5.12 Homogeneous Distributed Database

In a distributed environment, an application can simultaneously access or modify
data in the database tables at several sites. For example, a query received from a
manufacturing client on the local database
mfg can retrieve the joined data from the products table existing on the local database
and the dept table on the remote hq database. The location and platform of the
databases are transparent for a client application. In the distributed system, synonyms
can be created for remote objects so that the users can access them using the same
syntax as for local objects.
5.14.2 Heterogeneous Distributed Database Systems
In a heterogeneous DDB system, at least one of the databases uses different software
than the others. For example, if all the databases in a DDB system are using SQL
server but one of them is using some other non-SQL Server Database system, then
it constitutes a heterogeneous DDB system. To the application, the heterogeneous
DDB system appears as a single or local SQL Server database. The local SQL server
hides the distribution and heterogeneity of the data from the user.
The SQL server accesses the non-SQL database system using the SQL server
heterogeneous services in conjunction with an agent. A heterogeneous service agent
is the process through which an SQL server connects to a non-SQL server system.
An agent can reside in the following places:
• On the same machine as the non-SQL server system
• On the same machine as the SQL server system
• On a machine different from either the non-SQL server system or the
SQL server system



5.15 DESIGN OF DISTRIBUTED DATABASES

5.15.1 Levels of Data and Processing


A DBMS allows the data used in different applications and software to be processed
at different levels. These levels are:
• Single-Site Processing, Single-Site Data (SPSD): At the SPSD level, processing
is performed on a single CPU or host computer. Here, the DBMS is located
on the local disk of the host computer and users access the database through
dumb terminals, which are terminals without a processing unit. The
processing unit of the host computer executes all the processes of these
terminals using the database of the host computer. The DBMSs of mainframes,
minicomputers and single-user microcomputers use the SPSD level of data
processing. Figure 5.13 shows the SPSD scenario.

Fig. 5.13 SPSD Scenario

• Multiple-Site Processing, Single Site Data (MPSD): In MPSD level of


process and data distribution, the processing is done on different computers
on the network but the data is stored at a single place. Thus, in MPSD, multiple
processes that run on different computers can share a single repository of
data. This scenario is also known as client-server architecture. All the
applications access and retrieve the data from this single database. A network
file server is used in this level, and this is suited for a scenario where many
multi-user accounting applications are running under a personal computer
network. Figure 5.14 shows the scenario of MPSD.

Fig. 5.14 MPSD Scenario

• Multiple-Site Processing, Multiple-Site Data (MPMD): At the MPMD


level, multiple data processors and transaction processors are connected with
a DDBMS. MPMD is classified on the basis of how the DBMS is integrated
in the network. There are two types of MPMD: homogeneous and
heterogeneous. Homogeneous MPMD supports the integration of a single
type of centralized DBMS over a network, while heterogeneous MPMD
supports the integration of different types of centralized DBMS over the network.
Heterogeneous MPMD can even support DBMSs with varying data models,
such as relational, hierarchical or network, running on different computers
that may be mainframes or microcomputers. Such heterogeneous MPMD
systems are also known as fully heterogeneous MPMD. Figure 5.15 shows
the MPMD scenario.

Fig. 5.15 MPMD Scenario

5.15.2 Design
The design of DDBs is carried out using the following techniques:
• Data fragmentation
• Data replication and allocation
1. Data fragmentation
Data fragmentation can be defined as a process that is used to break up a distributed
database into logical units. These logical units are known as fragments that can be
assigned to various sites of a Distributed database for storage. The fragments can be
defined as relations that are stored a particular site. Distributed database can be is a
collection of number of interrelated databases that are spread through a computer
network. Distributed database system is used to improve the reliability and the
performance of the database. The following are the types of data fragmentation:
• Horizontal fragmentation
• Vertical fragmentation
Horizontal fragmentation: Horizontal fragmentation can be defined as a
process that horizontally divides a relation by grouping rows to create subsets of
tuples. Each subset has a logical meaning. These subsets of tuples, also called
segments, are then assigned to different sites which are part of a distributed database
system. A horizontal fragmentation of a relation is a subset of the tuples existing in
that relation. These tuples in horizontal fragmentation are specified by a condition
on one or more attributes of the relation. Consider, for example, that you have a
relation called Employee, as shown in Table 5.7. Tables 5.8 and 5.9 show horizontal
fragments of the Employee relation. The fragment Employee2 shown in Table 5.8
is obtained by applying the condition (status = 20), and the fragment Employee3
shown in Table 5.9 is obtained by applying the condition (status = 40).
Table 5.7 Employee

Serial No.   Employee Name   Status   City
S5           Reeve           20       Paris
S4           John            40       London
S3           Blake           20       Athens
S2           Adams           10       London
S1           Clark           40       Paris

Table 5.8 Employee2

Serial No.   Employee Name   Status   City
S5           Reeve           20       Paris
S3           Blake           20       Athens

Table 5.9 Employee3

Serial No.   Employee Name   Status   City
S4           John            40       London
S1           Clark           40       Paris
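To make the operation concrete, here is a minimal Python sketch of horizontal fragmentation. It is an illustration only: the list of dictionaries stands in for the Employee relation of Table 5.7, and a real DDBMS would express the same fragments with relational selection conditions rather than application code.

# The Employee relation of Table 5.7, modelled as a list of rows.
employee = [
    {"serial": "S5", "name": "Reeve", "status": 20, "city": "Paris"},
    {"serial": "S4", "name": "John",  "status": 40, "city": "London"},
    {"serial": "S3", "name": "Blake", "status": 20, "city": "Athens"},
    {"serial": "S2", "name": "Adams", "status": 10, "city": "London"},
    {"serial": "S1", "name": "Clark", "status": 40, "city": "Paris"},
]

def horizontal_fragment(relation, predicate):
    # A horizontal fragment is the subset of tuples satisfying a condition.
    return [row for row in relation if predicate(row)]

employee2 = horizontal_fragment(employee, lambda r: r["status"] == 20)  # Table 5.8
employee3 = horizontal_fragment(employee, lambda r: r["status"] == 40)  # Table 5.9

Note that Employee2 and Employee3 together do not cover the tuple for Adams (Status = 10); a complete horizontal fragmentation would need a third fragment for that condition, since the union of the fragments must reproduce the whole relation.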

Vertical fragmentation: Vertical fragmentation can be defined as a process
that divides a relation vertically, by columns. A vertical fragment of a relation
includes only some of the attributes of the relation. Consider again the Employee
relation shown in Table 5.7. If you fragment this relation vertically, you get the
two relations shown in Tables 5.10 and 5.11: the fragment Employee4 shown in
Table 5.10 includes the attributes Serial No. and Employee Name, and the
fragment Employee5 shown in Table 5.11 includes the attributes Status and City.

Table 5.10 Employee4

Serial No.   Employee Name
S5           Reeve
S4           John
S3           Blake
S2           Adams
S1           Clark

Table 5.11 Employee5

Status   City
20       Paris
40       London
20       Athens
10       London
40       Paris
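Continuing the same illustrative Python sketch (it reuses the employee list defined above; a real DDBMS would express this as a relational projection):

def vertical_fragment(relation, attributes):
    # A vertical fragment keeps only the named attributes of each tuple.
    return [{a: row[a] for a in attributes} for row in relation]

employee4 = vertical_fragment(employee, ["serial", "name"])  # Table 5.10
employee5 = vertical_fragment(employee, ["status", "city"])  # Table 5.11

As printed in Table 5.11, Employee5 carries no key attribute, so the original relation cannot be reconstructed losslessly from these two fragments alone; this is why, as Section 5.15.5 notes, systems usually attach a unique tuple id to every vertical fragment.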

2. Data replication and allocation

Data replication is a technique of storing certain data at more than one site in a
DDB. This improves the availability of the system, because the system can continue
to operate as long as at least one site holding the data is working properly. It can
also improve the performance of retrieval for global queries, because the result of
such a query can be obtained locally from any one of the sites. The most extreme
case of replication is to store the whole database at every site in the DDB system,
which creates a fully replicated distributed database.

Each fragment, or each copy of a fragment, must be assigned to a particular
site in the DDB system. This process is called data distribution or data allocation.
The arrangement of sites and the degree of replication depend on the performance
and availability needs of the system, and on the types and frequencies of the
transactions submitted at each site. For example, if the system requires high
availability, transactions can be submitted at any site, and most transactions are
retrieval only, then a fully replicated database is a good choice.
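An allocation can be pictured as a simple mapping from sites to the fragments they store. The sketch below is purely illustrative: the site names are invented, and the fragments are those from the earlier Employee example. Replication shows up as a fragment appearing under more than one site.

# Invented site names; Employee2/Employee3 are the fragments of Tables 5.8/5.9.
allocation = {
    "site_paris":  ["Employee2"],
    "site_london": ["Employee3"],
    "site_backup": ["Employee2", "Employee3"],  # replicated copies of both
}

def sites_holding(fragment):
    # A retrieval query on a fragment can be answered by any site holding a copy.
    return [site for site, frags in allocation.items() if fragment in frags]

print(sites_holding("Employee2"))  # ['site_paris', 'site_backup']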
5.15.3 Replication

Database replication is a highly flexible technology for copying updates automatically
between databases. The idea is that if you make a change to one database, the other
database copies are updated automatically. Replication occurs at the database level
and does not require any special actions from client applications. Propagating updates
automatically is a simple idea, but it helps solve a surprisingly large number of
problems, as shown in Figure 5.16.

Fig. 5.16 Replication Benefits

Database replication is the creation and maintenance of multiple copies of
the same database. In most implementations, one database server maintains the
master copy of the database and additional database servers maintain slave copies.
Database writes are sent to the master database server and are then replicated by
the slave database servers. Database reads are divided among all of the database
servers, which results in a great performance advantage due to load sharing. In
addition, database replication can also improve availability, because the slave
database servers can be configured to take over the master role if the master
database server becomes unavailable. Data replication is very attractive as a way
to increase system throughput and provide fault tolerance. However, it is a challenge
to keep data copies consistent, and in order to take full advantage of the processing
power of all replicas, adaptive load-balancing schemes are needed. One well-known
approach is middleware-based replication.
Middle-R is a middleware-based replication tool that works with existing
DBMSs. It provides efficient, fast and consistent database replication both for cluster
configurations, where all replicas are within a LAN, and for WAN environments.
Each database replica is an instance of a non-replicated standard database system,
which provides a basic approach to fault tolerance. Replication middleware of this
kind works with the following mechanisms:
• Isolation levels: It allows for different levels of isolation of concurrent
transactions, focusing in particular on snapshot isolation.
• Wide-area systems: It provides transparent, efficient and consistent data
replication in wide area networks. The communication technology usually
used in clusters, e.g., group communication systems, does not work well
in WAN settings.
• Partial replication: While full replication places copies of data items at
all replicas, partial replication assigns copies of an individual data item
only to some replicas. When there is a high update workload, full
replication has too much overhead to keep all copies consistent, and the
individual replicas have few resources left to execute read operations. In
contrast, with partial replication a replica only has to execute the updates
for the data items of which it has local copies, and thus has more capacity
to execute read operations. Partial replication also raises many challenges,
such as more complex concurrency control, the problem of finding a replica
that holds the data copies needed for a request, and the necessity of
distributed query execution.
• Relationship between middleware and database system: When a
replication solution is implemented outside the database system, the
replication tool does not have access to important components within the
database system, such as concurrency control. Such functionality therefore
has to be re-implemented in the middleware.

Check Your Progress
19. What is meant by a homogeneous distributed database system?
20. Define data fragmentation.
5.15.4 Characteristics of DBMS Replication
The following are the characteristics of database replication with reference to
concurrency control:
• Availability. Keeping multiple copies of data is one of the most effective
ways to avoid database availability problems. If one database fails, you can
switch to another local copy or even to a copy located on another site.
• Cross-site database operation. Applications like credit card processing use
multiple open databases on different sites, so that there is always a database
available for transactions. Replication can help transfer copies between
distributed databases or send updates to a centrally located copy.
• Scaling. Replicated copies are live databases, so you can use them to distribute
read traffic. For example, you can run backups or reports on a replica without
affecting other copies.
• Upgrades. Replication allows users to upgrade a replica which can then be
switched over to the master copy. This is a classic technique to minimize
downtime as well as provide a convenient back-out in the event of problems.
• Heterogeneous database integration. It is quite common for data to be
entered in one database type, such as Oracle, and used in another, such as
MySQL. Replication can copy data between databases and perform
transformations necessary to ensure proper conversion.
• Data warehouse loading. Replication applies updates in real time. This is
very useful as databases become too large to move using batch processes.
Data warehouse loading is much easier with capabilities such as transforming
data or copying updates to a central location.
• Geographic distribution. Replication allows users to place two or more
clusters in geographically separated locations to protect against site failure
or site unreachability.
Database replication is considered an essential technology for building and
operating a wide variety of business-critical applications. Tungsten Replicator is
designed to solve these problems, as well as many others.
In master/slave replication, updates are handled by one database server,
known as the master, and propagated automatically to replicas, known as slaves.
This is a very efficient way to make database copies and keep them up to date as
they change.

Master/slave replication is based on a simple idea. Assume that two databases
start with the same initial data, which may be called a snapshot. Changes made on
one database are recorded in the order in which they occur, so that they can be
replayed with exactly the same effect as the original changes; this ordering is called
a serialized order. If you replay the serialized order on the second database, it ends
up in exactly the same state as the first. That, in essence, is how master/slave
replication works.


Fig. 5.17 Master/Slave Replication

Figure 5.17 shows that master/slave replication is popular for a number of
reasons. First, databases can generate and reload snapshots relatively efficiently
using backup tools. Second, databases not only serialize data very quickly but also
write it into a file-based log that can be read by external processes. Master/slave
replication is, therefore, reasonably tractable to implement, even though the effort
to do it well is not small.
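The snapshot-plus-serialized-replay principle can be sketched in a few lines of Python. This is a toy model only: the class and method names are invented for the illustration, and a real replicator reads the DBMS's own change log rather than intercepting writes in application code.

class Database:
    def __init__(self, snapshot):
        self.data = dict(snapshot)      # both copies start from one snapshot

    def apply(self, change):
        key, value = change
        self.data[key] = value

class Master(Database):
    def __init__(self, snapshot):
        super().__init__(snapshot)
        self.log = []                   # changes kept in serialized order

    def write(self, key, value):
        change = (key, value)
        self.apply(change)
        self.log.append(change)         # record each change for later replay

def replay(master, slave):
    # Replaying the serialized log reproduces the master's state exactly.
    for change in master.log:
        slave.apply(change)

snapshot = {"S5": "Paris"}
master, slave = Master(snapshot), Database(snapshot)
master.write("S4", "London")
replay(master, slave)
assert slave.data == master.data        # the slave now matches the master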

5.15.5 Fragmentation

Fragmentation consists of breaking a relation into smaller relations, or fragments,
possibly stored at different sites. Database applications often work with views rather
than entire relations, so data can be stored close to where it is most frequently
used. Fragments are the unit of distribution, which supports parallelism: a
transaction can be divided into several subqueries that operate on different
fragments. Fragmentation has two notable implications:
• Security: Data not required by local applications is not stored locally, and is
consequently not available to unauthorized users at that site.
• Performance: The performance of global applications that require data from
several fragments located at different sites may be slower.
Types of fragmentation

The types of fragmentation are as follows (a short sketch of the third type follows
the list):
• Horizontal fragmentation: A subset of the tuples of a relation, defined as
σp(R), where p is a predicate based on one or more attributes of the relation.
The union of the horizontal fragments must equal the original relation, and
the fragments are usually required to be disjoint.
• Vertical fragmentation: A subset of the attributes of a relation, denoted as
Πa1, a2, ..., an(R), where a1, a2, ..., an are attributes of the relation R. Vertical
fragmentation must permit a lossless join back to the original relation; for
this reason, systems often assign a unique tuple id to each tuple in the original
relation and attach it to every vertical fragment.
• Mixed fragmentation: A horizontal fragment that is subsequently vertically
fragmented, or a vertical fragment that is then horizontally fragmented.
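Reusing the illustrative helpers defined earlier for the Employee relation (horizontal_fragment and vertical_fragment), a mixed fragment is simply one operation composed with the other:

# A mixed fragment: select tuples first, then project attributes.
# Here: the serial numbers, names and cities of employees with Status = 20.
mixed_fragment = vertical_fragment(
    horizontal_fragment(employee, lambda r: r["status"] == 20),
    ["serial", "name", "city"],
)
# The key attribute "serial" is retained so the fragment can be joined back.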
Let us consider an example of database fragmentation within the skeletal
structure of a DBMS such as Turbo Image. While the internals of DBMS-specific
fragmentation are outside the scope of this discussion, a DBMS exists on top of
the file system, so defragmenting Turbo Image data sets at the file level still has a
significant impact. In other words, given the database TRXDB1, you should still
defragment the individual files TRXDB101, TRXDB102, TRXDB103, and so on.
The internal fragmentation of data within a DBMS remains a critical performance
issue; DBMS-level fragmentation on Turbo Image databases is actually fixed via
the DETPACK command. Table 5.12 shows fragmentation in a DBMS.
Table 5.12 Fragmentation in DBMS

Table_Name   Rows    Table_Pages   Text_Pages
Child        25000   2689          10013
Normal       1       1             4917
Parent       1       3             56
tblob        500     46            2

5.16 SUMMARY
In this unit, you have learned about ACID properties. Every transaction possesses
four fundamental properties—atomicity, consistency, isolation and durability. You
have also learned about transactions and schedules. A transaction is a logical unit of
work in a DBMS that includes one or more database access operations. Another
topic you have read about is concurrent execution of transactions. Transaction
processing systems usually let many transactions run simultaneously. Schedules
involving aborted transactions and lock-based concurrency control have also been
explained in this unit. One simple method to prevent transactions from interfering
with one another is to execute them serially. In the concurrency approach, a lock is
a variable that is linked with a data item.
You have also been introduced to the concept of crash recovery. Crash recovery
mechanisms guard against system crashes and media failures. This unit also discussed
concurrency control as the process of managing simultaneous execution of
transactions without allowing them to interfere with one another. Concurrency control
can regulate the interaction among concurrent transactions to keep them from
disturbing the database consistency.
In addition, this unit discussed concurrency control mechanisms, including
locking protocols and timestamp ordering. It brought out the importance of protecting
a database against unauthorized access, malicious destruction and accidental
introduction of inconsistency. Further, you have also learned about distributed
database (DDB), which is a collection of multiple interrelated databases that are
spread over a computer network. Each computer contains its own database, which
is managed by an individual database management system. Further, you have learned
about transaction types. In DBMS, there are many types of transactions with reference
to concurrency control which are called implicit, explicit, local and global
transactions.
This unit also discussed distributed DBMS architectures. The data and software
in a DDB system are spread over many sites that are connected by some form of
communication network. Finally, you have learned about the design of distributed
databases; database management systems allow the data used in different
applications and software to be processed at different levels.

5.17 KEY TERMS


• Transaction management: It is the process of controlling the execution of
transactions.
• Transaction: It is a logical unit of work in a DBMS that includes one or
more database access operations.
• Parallel transactions: These are transactions that are processed
simultaneously, with two or more transactions in progress at the same time.
• Aborted transaction: It is a transaction that fails to complete successfully.
• Concurrency control: It is the process of managing simultaneous execution
of transactions without letting them interfere with one another.
• Dirty read problem: It is a problem that occurs when a transaction updates
an item in the database, and then there is a failure in the transaction for some
reason.
• Schedule: It is a sequence of operations performed by a set of concurrent
transactions that preserves the order of the operations in each of the individual
transaction.
• Locking: It is the most popular concurrency control mechanism implemented
through many important techniques.
• Lock: It is a variable associated with every data item to indicate whether the
data item is available for an operation to be performed on it.
• Database security: Policies and measures adopted to protect data from being
accessed by unauthorized users.
• Explicit lock requests: These are requests issued if a different kind of lock is
required during an operation.
• Optimistic locking: A type of locking wherein data is locked only when it is
being saved.
• Strict timestamp ordering: A variation of basic timestamp ordering which
ensures that the schedules are both strict and serializable.

5.18 ANSWERS TO ‘CHECK YOUR PROGRESS’


1. The four ACID properties are: atomicity, consistency, isolation and durability.
2. A transaction is a unit of work involving read, write or control operations on
database elements that must be completed as a whole or not at all.
3. Simultaneous transactions take place when two or more transactions are
processed in parallel.
4. A transaction can either complete its execution successfully and be
committed, enabling the database to assume a new consistent state, or it can
fail to complete its execution (for any number of reasons) and be aborted.
5. If a transaction fails to complete successfully, it is aborted.
6. Serializability can be achieved by several different means. Locking and
timestamping protocols are the two most common techniques.
7. Two schedules are said to be conflict equivalent if any two conflicting
operations in both schedules are executed in the same order.
8. A dirty read problem arises when a transaction updates an item in the database
and then there is a failure in the transaction for some reason. As the transaction
fails, the updated item in the database is accessed by another transaction
before it can go back to the original value.
9. It is a sequence of operations performed by a set of concurrent transactions
that preserves the order of the operations in each of the individual transactions.
10. They guard against the following types of crashes—system crashes and media
failures.
11. It is the process of managing simultaneous execution of transactions without
letting them interfere with one another.
12. The typical levels of locking are:
(i) Database locking
(ii) Table locking
(iii) Page locking
(iv) Row locking
(v) Column locking
13. Locks can be binary, shared/exclusive locks and certified locks.
14. In optimistic locking, the data is locked only when it is being saved. The lock
is released after the data is saved. In pessimistic locking, the data is locked
when editing begins. The lock is released after the data is saved or discarded.
15. The strict two-phase locking protocol is adopted by commercial DBMS
because of its simplicity and ease of implementation compared to other
alternatives.
16. ARIES (Algorithms for Recovery and Isolation Exploiting Semantics) is a
recovery algorithm based on write-ahead logging. After a crash it restores
the database in three phases, namely analysis, redo and undo, so that the
effects of all committed transactions are preserved and incomplete
transactions are rolled back.
17. Distributed database (DDB) is a collection of multiple interrelated databases
that are spread over a computer network. Each computer contains its own
database that is managed by an individual database management system.
18. In DBMS, there are many types of transactions with reference to concurrency
control. These are implicit transactions, explicit transactions, local transactions
and global transactions.
19. A homogeneous DDB system is basically a network consisting of two or
more databases. These databases use the same software and reside on multiple
machines.
20. Data fragmentation can be defined as a process that is used to break up a
DDB into logical units.



5.19 QUESTIONS AND EXERCISES

Short-Answer Questions
1. Write a note on durability.
2. Draw a state transition diagram and name the typical states that a transaction
goes through during execution.
3. Name the three basic concurrency problems encountered in a DBMS.
4. What is a serial schedule? What is a serializable schedule? Why is a serial
schedule considered correct?
5. When are two schedules said to be equivalent? How can one detect equivalence
of schedules?
6. Define conflict and view serializability.
7. Why is concurrency control needed?
8. Compare binary locks to shared/exclusive locks. Why is the latter type
preferred?
9. What are checkpoints? Why are they important?
10. What is locking management?
11. What is the deadlock prevention protocol?
12. Write a note on ARIES.
13. What are the features of distributed data transparency?
14. What are explicit transactions?
15. What do you understand by a homogeneous distributed database system?
Long-Answer Questions
1. What is a transaction? Explain the ACID property of database transactions.
2. Discuss serial schedule, non-serial schedule and serializable schedule with
examples.
3. Explain the method of verifying view serializability using the labelled
precedence graph when more than one transaction is executed in a concurrent
execution.
4. Prove that the basic two-phase locking protocol guarantees conflict
serializability. What benefit does the strict two-phase locking provide? What
are the resultant disadvantages?
5. Discuss the different types of locks—binary, shared, exclusive.
6. Explain the timestamp-based concurrency control protocol.
7. Write detailed notes on:
(a) Concurrency control
(b) 2PL protocol
(c) Recovery in DBMS
8. Discuss the problem of deadlock and starvation. Illustrate with example that
the two-phase locking protocol is not deadlock free.
9. Explain the two-phase locking protocol. How is it different from the strict
and conservative two-phase locking protocols?
10. When a schedule satisfies the two-phase locking protocol, is there any
possibility of deadlock? If not, why? If yes, give an example.

5.20 LEARNING OUTCOMES


• Understand the ACID properties
• Describe transactions and schedules
• Explain the concurrent execution of transactions
• Analyse schedules involving aborted transactions
• Understand lock-based concurrency control
• Explain concurrency control mechanisms
• Understand locking management
• Know how to deal with deadlocks
• Understand the concept of ARIES

