Database System
MCA
Second Semester
Bharathidasan University
Centre for Distance and Online Education
Chairman:
Dr. M. Selvam
Vice-Chancellor
Bharathidasan University
Tiruchirappalli-620 024
Tamil Nadu
Co-Chairman:
Dr. G. Gopinath
Registrar
Bharathidasan University
Tiruchirappalli-620 024
Tamil Nadu
Course Co-Ordinator:
Dr. A. Edward William Benjamin
Director-Centre for Distance and Online Education
Bharathidasan University
Tiruchirappalli-620 024
Tamil Nadu
The Syllabus is Revised from 2021-22 onwards
Authors:
Manas Ghosh and Sudipta Pathak (Units: 1.4-1.7.3, 1.10-1.12, 1.13.4-1.14, 2.0-2.2.3, 2.4, 2.7, 3.2-3.5, 3.11-3.15, 4.3-4.3.1, 4.5-4.9, 5.0-5.4.2,
5.5, 5.6.3, 5.8-5.10.4)
Vikas Publishing House (Units: 1.0-1.3.3, 1.8-1.9, 1.13-1.13.3, 1.15-1.19, 2.3-2.3.3, 2.5-2.6, 2.8-2.17, 3.0-3.1, 3.6-3.10, 4.0-4.2, 4.4, 5.4.3,
5.6-5.6.2, 5.7, 5.11-5.19)
Information contained in this book has been published by VIKAS® Publishing House Pvt. Ltd. and has
been obtained by its Authors from sources believed to be reliable and is correct to the best of their
knowledge. However, the Publisher and its Authors shall in no event be liable for any errors, omissions
or damages arising out of use of this information and specifically disclaim any implied warranties of
merchantability or fitness for any particular use.
NOTES
A database management system or DBMS simply helps to manage databases. It is
a set of software that facilitates the storage and retrieval of data in a structured
manner.
Most modern enterprises find DBMSs indispensable. The smooth functioning of
an enterprise and its very existence may be heavily dependent on the proper selection
of a DBMS. A wrong choice may have an adverse effect which will be reflected
in the company’s performance and productivity for years to come. If sufficient
thought is not given to the selection of an appropriate DBMS, the enterprise using
it may end up wasting precious time and money in using and maintaining an unsuitable
system. Also, once a DBMS has been installed and implemented, replacing or changing
it is not only difficult but will also be an expensive proposition.
This book, Database System, gives the reader an insight into all aspects of a
DBMS. It begins with an introduction to database management systems, their advantages
and their structure in the first unit. Here, it also gives the reader an idea of database
design, relationships and relationship sets and the features of the E–R model.
The second unit deals with integrity constraints over relations where it discusses
the primary and foreign keys. It also discusses the translation of relationship sets with
participation constraints; and translating weak entity sets and class hierarchies.
Primitive, selection and projection operations in relational algebra have also been
discussed in the second unit.
The third unit is devoted to discussing structured query language. Among other
things, it covers the basic queries in SQL, nested queries, aggregate operators,
triggers and views in SQL.
The fourth unit focusses on relational database design with special emphasis on
functional dependencies, properties of decomposition and the normal forms.
The last unit explains transaction management with emphasis on the ACID
properties, transaction processing steps, transaction action, transaction states and
additional operations. This unit also discusses concurrency control, locking management,
deadlock detection and prevention, as well as the design of distributed databases.
In accordance with the self-instructional mode format, each unit begins with an
Introduction to the topic and is followed by an outline of the Unit Objectives. The
detailed content is then presented in a simple and structured form, interspersed with
Check Your Progress questions to test the student’s understanding. A Summary of
the content, along with a list of Key Terms and a set of Questions and Exercises is
provided at the end of each unit for recapitulation. Relevant examples/illustrations
have been included for better understanding of the topics.
Self-Instructional Material 1
UNIT 1 INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS AND E–R MODEL
Structure
1.0 Introduction
1.1 Unit Objectives
1.2 Overview of Database Management System
1.3 A Historical Perspective
1.3.1 Navigational DBMS of the 1960s
1.3.2 Relational DBMS of the 1970s
1.3.3 Later Trends
1.4 File System versus DBMS
1.5 Advantages of DBMS
1.6 Describing and Storing Data in DBMS Transaction Management
1.7 Structure of a DBMS
1.7.1 Logical DBMS Architecture
1.7.2 Mapping between the Levels
1.7.3 Physical DBMS Architecture
1.8 People who work with Databases
1.9 Overview of Database Design
1.10 Entities
1.11 Attributes and Entity Sets
1.12 Relationships and Relationship Sets
1.13 Additional Features of E–R Model
1.13.1 Works-in-Relationship
1.13.2 Maintaining Ternary Relationships
1.13.3 Working with Participation Constraints
1.13.4 Key Constraints
1.14 Conceptual Design with E–R Model
1.15 Overview of Unified Modelling Languages
1.15.1 Software Development Methods
1.15.2 Modelling
1.15.3 Diagrams Overview
1.15.4 Structure Diagrams
1.15.5 Behaviour Diagrams
1.15.6 Meta Modelling
1.15.7 Criticism of UML
1.16 Summary
1.17 Key Terms
1.18 Answers to ‘Check Your Progress’
1.19 Questions and Exercises
1.20 Learning Outcomes
1.0 INTRODUCTION
In this unit, you will learn about database management systems. Data is correctly
considered a resource because for any organization to take proper decisions at the
proper time, it is imperative to have correct and up-to-date data in hand. It helps in
the proper utilization of organizational resources. The data that an organization
must necessarily maintain for its operations is termed operational data. Sometimes
the terms data and information are used interchangeably, but it is necessary to understand
that data is a known fact that can be recorded and information is processed or
organized or summarized data.
You will also get a historical perspective of database management systems
(DBMS). This unit traces the development of DBMS through the 1960s to the present.
The file system versus DBMS and the advantages of DBMS are discussed.
You will also learn how to describe and store data in a DBMS transaction.
Databases store large amounts of information that are available for long periods of
time. During this period, data is continually accessed and processed. Databases are
generally stored permanently on magnetic disk secondary storage devices. This
unit also explains the structure of a DBMS. These are very complex systems; so, to
understand general database concepts and the structure and capabilities of a DBMS,
it is useful to examine the architecture of a typical database management system.
You will also learn about the people who work with databases and get an
overview of database design. The latter refers to a system that stores and retrieves
data systematically from the database.
Other important topics that you will learn are entity attributes and entity sets.
An entity is described by a set of properties called attributes. An employee entity
might have attributes such as employee code, employee name, salary, date of joining,
etc. Further, this unit explains relationships and relationship sets, additional features
of the E–R model and conceptual design with the E–R model. The latter facilitates database design
by enabling the designer to express the logical properties of the database in an
enterprise schema. In the end, you will get an overview of unified modelling
languages. Developed by the end of the 1990s, unified modelling language (UML)
is a standardized general-purpose modelling language. It uses a set of graphical
notations to create abstract models of specific systems.
1.2 OVERVIEW OF DATABASE MANAGEMENT SYSTEM
A database is an organized collection of data kept in the form of linked lists or tables.
A database management system (DBMS) is a system of software programs that controls
this organized data, handling the storing, managing and retrieving of data
in a database. DBMSs are categorized according to data types and their structures.
When a request (through an application program) for data is given to a DBMS, an
instruction is passed on to the operating system to transfer the required data. As the
size and type of data grows, these can be added to the database without any
obstruction in the working of the existing system.
Organizations use different kinds of database for different purposes, for
example, they may use one kind of data for daily transaction processing and send
these details on to another system using another DBMS from which random enquiries
can be made and analysis of data (in the context of business environment) can be
carried out. Database administrators (DBAs) and systems analysts take decisions related
to the overall systems design. The detailing of the database design is done by DBAs.
As the size of data grows, different types of data are kept for different purposes
on a database server or servers, as the case may be. These servers are computers
(keeping the actual databases) running only the DBMS and related software. These
servers are multiprocessor computers with large memory and arrays of disks for
stable storage. Such an array of disks is known as Redundant Array of Inexpensive
Disks (RAID).
The concept of database has also added to the capability of operating systems.
Most database applications have DBMS in their core, and these are connected to
one or more servers using a high-speed channel. In an environment, where large
volumes of transactions are processed, hardware database accelerators are also used.
DBMSs are built around a kernel having multitasking capability with built-in support
for networking. With the growth of technology, such functions are now performed by
modern operating systems.
However, all these developments did not take place in one day, but have
developed from one stage to another in due course of time.
To avoid looping for collecting records, Codd proposed the use of a set-oriented
language. RDBMS uses a branch of mathematics known as tuple calculus,
and it supports all the operations of normal databases, such as inserting, updating,
deleting, etc. It also provides a simple system for retrieving sets of data in a single
operation.
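The contrast between looping over records and set-oriented retrieval can be sketched with Python's built-in sqlite3 module. This is only an illustration: the employee table, its columns and the sample rows are assumptions and do not come from the text.

```python
import sqlite3

# In-memory database; the table name, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_code INTEGER, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Asha", "Sales"), (2, "Ravi", "Accounts"), (3, "Meena", "Sales")],
)

# Record-at-a-time style: the program visits every record and tests each one.
looped = []
for emp_code, name, dept in conn.execute("SELECT emp_code, name, dept FROM employee"):
    if dept == "Sales":
        looped.append(name)

# Set-oriented style: one declarative statement returns the whole set.
set_based = [row[0] for row in conn.execute(
    "SELECT name FROM employee WHERE dept = 'Sales'")]

print(sorted(looped), sorted(set_based))
```

Both approaches return the same employees; the difference is that in the second one the selection logic is stated once and left to the DBMS to carry out.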
In the beginning of 1973, Ingres Corporation developed its first product and
also tested it, and by 1979 it was used widely. Ingres used a language for data access
called QUEL which was based on the relational model. System R was developed at
IBM and this led to IBM’s SQL/DS and DB2, then Oracle, HP’s Allbase, Tandem’s
Non-Stop SQL. This system used SEQUEL as a query language.
In 1976, P. Chen developed the Entity–Relationship (E–R) model for database
design. This brought about important conceptual data models. Such high-level
modelling enabled designers to focus more on data instead of logical table structures.
1.3.3 Later Trends
Early 1980s: RDBMS was used for commercial purposes in business and by the
mid-1980s, Structured Query Language (SQL) got the status of an established query
language. DB2 became the marketable product of IBM, and the earlier network
and hierarchical models started losing market share. Even today, some legacy systems are using
them. The IBM PC was developed and this caused the emergence of many database
companies and many products, such as RIM, RBASE 5000, PARADOX, OS/2
Database Manager, dBase III and IV, entered the market. Further development led
to products like Foxbase, Visual FoxPro, Watcom SQL, etc.
Early 1990s: The popularity of databases spread in the field of business and
commerce and during this period, efforts were focussed on the development of
client tools for more powerful applications, such as PowerBuilder (Sybase), Oracle
Developer, VB (Microsoft), and so on. The client–server model got its place in
business world influencing business decisions. Personal productivity tools, such as
spreadsheets and databases, such as Excel/Access (MS) and ODBC became more
and more popular. This was the beginning of prototypes for Object Database
Management Systems (ODBMS).
Mid-1990s: The World Wide Web/Internet appeared on the horizon, and it came to be
used by the general public rather than only by developers. Remote access to computer systems
with legacy data became a possibility, the client–server model became popular and average
users were able to access the Internet through their desktop computers. Desktop
microcomputers became cheaper and more popular. Web/DB started growing
exponentially.
Late-1990s: With the exponential growth of the Internet and Web technology
and also hardware products, many software tools were also developed. These include
Active Server Pages, FrontPage, Java Servlets, JDBC, Enterprise Java Beans,
ColdFusion, Dreamweaver, Oracle Developer 2000, and so on. Open-source
software such as gcc, Apache and MySQL, together with technologies such as CGI, became
freely available and widely used. During this period, Online Transaction Processing (OLTP) and
Online Analytical Processing (OLAP) became popular.
Early 21st century: The Internet got the status of information super highway.
More and more activities became Net-based and many business processes were
made digital. Growth of database applications was more than that of the Internet.
More and more interactive applications appeared with the availability and use of
personal digital assistants (PDAs), point of sale (POS) transactions and availability
of vendors for these products. More user-friendly gadgets appeared in the market.
In large database markets, three main companies, IBM (which bought Informix), Microsoft
and Oracle, dominated the scene. Online business transactions were carried out. Net
banking, online fund transfer, etc., have become commonplace. Data mining, data
warehousing and data marts are commonly used techniques today.
Future trends: Huge systems with high memory demand have been launched
in the market. Large scientific and technological databases, such as genome project,
geological, national security and space exploration data need to be handled.
More powerful database applications will come in the future as successors to
SQL. XML is gaining popularity and becoming more and more powerful. XML
with Java for databases is going to be the next big thing of the future. Distributed
transaction processing is emerging for business planning applications.
Object-oriented applications will become increasingly popular. Object
Database Management Group (ODMG) standards will bring products in the market
that may perhaps beat all other previous products.
1.5 ADVANTAGES OF DBMS
the disk; copied to main memory for processing, and then rewritten to the disk if
any changes are made.
File organization is a way of arranging the records in a file when the file is
stored on the disk. Data files are organized so as to facilitate access to records and
to ensure their efficient storage. There are a number of different primary file
organizations. Actually, these file organizations determine how the records of a file
are physically written to the disk, and how they can be accessed. A heap file
(unordered file), places records on disk by appending new records to the end of the
file, and a sorted file (sequential file) keeps the records ordered by the value of a
particular field, called a sort key. Selection of file organization depends on the
following two factors:
• Typical DBMS applications need a small subset of the database at any given
time.
• When a portion of the data is needed it must be located on disk, copied to
memory for processing and rewritten to disk if the data was modified.
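The difference between a heap file and a sorted file can be sketched in a few lines of Python. This is a simplified in-memory model, not a description of how any particular DBMS lays out disk blocks; the function names and the sample records are assumptions made for illustration, with the first field of each record acting as the sort key.

```python
import bisect

def heap_insert(records, record):
    """Heap (unordered) file: a new record is appended at the end."""
    records.append(record)

def sorted_insert(records, record):
    """Sorted (sequential) file: records stay ordered by the sort key (first field)."""
    bisect.insort(records, record)

def sorted_lookup(records, key):
    """Binary search on the sort key; returns the record, or None if absent."""
    i = bisect.bisect_left(records, (key,))
    if i < len(records) and records[i][0] == key:
        return records[i]
    return None

heap_file, sorted_file = [], []
for rec in [(30, "Ravi"), (10, "Asha"), (20, "Meena")]:
    heap_insert(heap_file, rec)
    sorted_insert(sorted_file, rec)

print(heap_file)     # records in arrival order
print(sorted_file)   # records in sort-key order
```

Insertion into the heap file is cheap, whereas the sorted file pays more on insertion in exchange for faster lookups on the sort key.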
Figure 1.1 shows the logical architecture of a typical DBMS.
[Figure: DBMS architecture diagram. Labels: Database Utilities, User Applications, Front End, API, DBMS Engine, Back End, Transaction Manager, File Manager, Physical Database, Physical storage.]
The front end is really just any application that runs on top of the DBMS and
acts as a user interface. These may be applications provided by the DBMS vendor,
the user or a third party.
The back end can be further divided into the functional software components,
which are as follows:
Data manipulation language (DML) precompiler: It converts DML
statements embedded in an application program to normal procedure calls in a host
language. Through the DML precompiler, DML commands and application programs
written in host language are separated. DML commands are sent to the DML
interpreter for translating into object code for database access and the rest of the
program is sent to the compiler of the host language. Object codes for the commands
and the rest of the program are combined together through linking and sent to the
DBMS engine (also called database manager) for execution. The precompiler must
interact with the query processor in order to generate the appropriate code.
DML compiler: It translates DML statements into low-level instructions that
a query processor understands.
Data definition language (DDL) interpreter or compiler: It interprets DDL
statements and records definitions into data dictionary.
Query evaluation engine: It executes low-level instructions generated by
the DML compiler.
It mainly deals with solving all problems related to queries and query
processing. It helps the database system simplify and facilitate access to data.
Database manager: A program module providing an interface between low-
level data (stored in the database) and the application programs and queries which
are submitted to the system.
The functions of the database manager include:
• Efficient storage, retrieval and updating of data
• Interaction with the file manager
• Ensuring a state of consistency in the database irrespective of system
failures
• Maintenance of integrity controls and user authority for data access
File manager: It manages allocation of disk space and data structures used
to represent information on disk.
In addition, several data structures are required for physical system
implementation.
• Data files: Data are stored in data files, which store the database itself.
• Data dictionary: Actually, this is a critical element in the DBMS.
The results of compilation of DDL statements are a set of tables that is
stored in a special file called data dictionary that documents data in a
database. A data dictionary contains metadata (data about data). Metadata
is data about the storage details of a database.
• Indices: To improve the performance of a DBMS, a set of access aids in
the form of indexes are usually provided in the database systems. An index
is a data structure that helps access data through a given part of their
value. It provides fast access to data. There are several indexes and for
implementing indexes, many techniques are used; however, each technique
is specific to a particular database application. A DBMS provides
commands to build, maintain and destroy such indexes.
• Statistical data file: The query processor uses statistical data to optimize
queries.
• Log file: Each log record comprises the values for database items before
and after a modification, and it can be utilized for the purpose of recovery.
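As a small illustration of the data dictionary and of index commands, the following sketch uses SQLite through Python's sqlite3 module. SQLite records every table and index definition in its catalog table sqlite_master, which plays the role of the data dictionary here. The employee table and the index name are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL statements: the DBMS records these definitions in its own catalog.
conn.execute("CREATE TABLE employee (emp_code INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("CREATE INDEX idx_employee_name ON employee (name)")

# Query the catalog (metadata, i.e. data about data).
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name"
).fetchall()
for entry in catalog:
    print(entry)

# The DBMS also provides a command to destroy an index.
conn.execute("DROP INDEX idx_employee_name")
remaining = [row[0] for row in conn.execute("SELECT name FROM sqlite_master")]
```

Other database systems expose their dictionaries differently, so the catalog name used above applies to SQLite only.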
1.10 ENTITIES
Any ‘thing’, that is, an object, a being or an event that has an independent existence
in the real world is called an entity. The entity may have a physical existence in the
real world, such as in the case of a house, a bicycle or an employee. It may also be
an object with a conceptual existence, for example, a company, a job or a university
course.
An entity set is a set of entities of the same type that share the same properties
or attributes.
The individual entities that constitute a set are said to be the extension or
instance of the entity set. As all entities in an entity set have the same attributes,
entity sets also share the attributes of the contained entities. Therefore, attributes
are descriptive properties possessed by each member of an entity set.
In other words, an entity is determined by its instantiations. A particular
instance of an attribute is a value. For example, ‘Hari Nandan Tunga’ is one value
of the attribute Name. The value of the attribute can be different for each entity in a
set. A set of permitted values exists for each attribute. This is called domain or a
value set of that attribute. The domain of the attribute employee_name, for instance,
may be the set of all text strings of a particular length.
Types of Attributes
An attribute, as used in the E–R model, can be characterized by the following attribute
types. Figure 1.3 shows the different types of attributes.
• Simple vs composite: A simple attribute has an atomic value, i.e., it is an attribute
that cannot be decomposed into subparts. Examples are the employee code and the salary
of an employee. In contrast, composite attributes can be divided into subparts
(that is, other attributes). For example, employee-name could be structured
as a composite attribute consisting of first-name, middle-initial and last-name.
Composite attributes help us to group together related attributes making
modelling more readable and flexible. The advantage of defining a composite
attribute for an entity is that a user can refer to the entire composite attribute
as well as only a component of the attribute according to the need.
• Single-valued vs multi-valued: An attribute is said to be single-valued if it
has a single value for any instance. There may be instances where an attribute
has a set of values for a specific entity. This type of attribute is said to be
multi-valued. For example, an employee may have several degrees or he or
she may have several different phones, each with its own number. It is to be
noted that a single-valued attribute that changes value over time does not
imply that it is multi-valued. A single-valued attribute is not necessarily a
simple attribute.
• Stored vs derived: Although most of the attributes will be stored, there will
be some attributes that can be derived or computed from other related
attributes. A derived attribute need not be physically stored within the database;
instead, it is derived by using an algorithm. For example, a person’s AGE
attribute can be derived by subtracting the date of birth (DOB) from the current
date. Similarly, the annual salary of an employee can be derived by multiplying
the salary by twelve.
• Null attributes: When a certain entity does not possess a value for an attribute,
a NULL value is used. This will mean ‘not applicable’ or that the ‘value is
unknown’ or ‘non-existent’. A null value for ‘apt_number’ attribute, for
example, could indicate that the address is not inclusive of an apartment
number or that there is an apartment number which is not known to us, or
that it is unknown whether an apartment number is included in the employee’s
address or not.
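The attribute types described above can be sketched as a Python data class. This is a minimal illustration rather than a prescribed design: the class and field names, the phone numbers, the date of birth and the salary figure are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class Name:                      # composite attribute: made up of subparts
    first_name: str
    middle_initial: str
    last_name: str

@dataclass
class Employee:
    emp_code: int                # simple, single-valued attribute
    name: Name                   # composite attribute
    date_of_birth: date          # stored attribute
    monthly_salary: float        # stored attribute
    phones: List[str] = field(default_factory=list)   # multi-valued attribute
    apt_number: Optional[str] = None                   # None stands in for NULL

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, not stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month, self.date_of_birth.day)
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

    def annual_salary(self) -> float:
        """Derived attribute: the monthly salary multiplied by twelve."""
        return self.monthly_salary * 12

emp = Employee(101, Name("Hari", "N", "Tunga"), date(1990, 6, 15), 50000.0,
               phones=["555-0101", "555-0102"])
print(emp.age(date(2024, 6, 15)), emp.annual_salary(), emp.apt_number)
```

Here apt_number is left as None, which by itself does not say whether the value is not applicable or simply unknown; that ambiguity is the point made about null attributes above.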
16 Self-Instructional Material
Multivalued Introduction to Database
attribute Management Systems and
E–R Model
Derived attribute
Age
Door no. NOTES
Ph no. Address
Street
Employee
City
Date of
Emp_id birth
Name
Composite Attribute
Attributes of an entity set may be divided into the following two groups:
• Identifiers: A set of one or more attributes that uniquely identifies an instance
of an entity. These are also termed as key attributes; for example, employee
code attribute of the EMPLOYEE entity set.
• Descriptors: These attributes provide a non-unique characteristic of an entity
instance. These are said to be non-key attributes; for example, employee name,
salary, date of join, etc., attributes of the EMPLOYEE entity set.
The number of relationships wherein an entity can participate in a particular
relationship type is specified by the cardinality ratio. In the most common case of
binary relationships, the cardinality ratios are as follows:
• One-to-one (1:1)
• One-to-many (1:M or 1:*)
• Many-to-one (M:1 or *:1)
• Many-to-many (M:N)
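The four cardinality ratios can be illustrated by classifying a set of (a, b) pairs that represents a binary relationship between entity sets A and B. The function below is a hypothetical helper written for this illustration. It classifies the pairs actually present, whereas in a real design the ratio is a rule stated in the schema.

```python
from collections import Counter

def cardinality_ratio(pairs):
    """Classify a binary relationship set given as (a, b) pairs."""
    pairs = set(pairs)
    a_counts = Counter(a for a, _ in pairs)   # number of B entities per A entity
    b_counts = Counter(b for _, b in pairs)   # number of A entities per B entity
    a_many = any(n > 1 for n in a_counts.values())
    b_many = any(n > 1 for n in b_counts.values())
    if a_many and b_many:
        return "M:N"
    if a_many:
        return "1:M"     # one A is related to many B's
    if b_many:
        return "M:1"     # many A's are related to one B
    return "1:1"

print(cardinality_ratio([("m1", "e1"), ("m1", "e2")]))
```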
[Figure: Cardinality ratios between entity sets A and B: one-to-one, one-to-many, many-to-one and many-to-many.]
[Figure 1.5: Binary 1:1 relationship Head_of between Teacher (partial participation) and School (total participation).]
Figure 1.5 depicts a non-uniform binary 1:1 relationship. As the relationship,
Head_of, is connected to two entities, Teacher and School, it is a binary
relationship. A school has only one headmaster, who belongs to
the Teacher entity set. On the other hand, one person cannot act as the headmaster of
more than one school. So at most one entity from each entity set can be associated
with another through the relationship Head_of. Observe that not every teacher can act as the
headmaster of a school; there are teachers other than the headmaster. So, the
Teacher entity participates partially in the relationship. Every school, however, has
a headmaster, so the entity School
participates totally.
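Total and partial participation in the Head_of example can be checked with a short sketch. The identifiers T1, S1 and so on are made-up sample data, and participation is a hypothetical helper.

```python
def participation(entity_set, related_entities):
    """Return 'total' if every entity takes part in the relationship, else 'partial'."""
    return "total" if set(entity_set) <= set(related_entities) else "partial"

teachers = {"T1", "T2", "T3"}
schools = {"S1", "S2"}
head_of = {("T1", "S1"), ("T3", "S2")}   # (teacher, school) pairs

teacher_side = participation(teachers, {t for t, _ in head_of})
school_side = participation(schools, {s for _, s in head_of})
print(teacher_side, school_side)
```

Teacher T2 heads no school, so the Teacher side is partial, while every school appears in some pair, so the School side is total.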
Figure 1.6 is an example of a uniform binary 1:1 relationship:
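[Figure 1.6: Binary 1:1 relationship Sits_on between Employee and Chair, with total participation on both sides.]
[Figure: Binary 1:N relationship Manages between Manager (1) and Employees (N).]
[Figure: Binary N:1 relationship Work_under between Employees (N) and Manager (1).]
[Figure: Binary M:N relationship Works_for between Employee (M) and Project (N).]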
Unary Relationship
Up to now, we have worked with binary relationships only. Now, we shall observe
unary relationship where only one entity participates in the relationship. Like binary
relationship, unary relationship can also be subdivided into the following categories:
• Unary 1:1
• Unary 1:N
• Unary M:N
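A unary relationship relates an entity set to itself, which can be modelled as a mapping whose keys and values come from the same set. The sketch below assumes a unary 1:N relationship in which each employee has at most one manager; the employee identifiers and the name manager_of are assumptions made for illustration.

```python
# Unary 1:N: both sides of the relationship are employees.
employees = {"E1", "E2", "E3", "E4"}
manager_of = {"E2": "E1", "E3": "E1", "E4": "E2"}   # subordinate -> manager

def subordinates(manager):
    """All employees who report directly to the given manager."""
    return sorted(s for s, m in manager_of.items() if m == manager)

# Every participant on either side belongs to the same entity set.
same_entity_set = (set(manager_of) | set(manager_of.values())) <= employees
print(subordinates("E1"), same_entity_set)
```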
Unary 1:1
[Figure 1.9: Unary 1:1 relationship Married_to on the Employee entity, with roles Husband and Wife.]
In Figure 1.9, an employee is married to another employee. Both employees
here belong to the same Employee entity. So only one entity Employee is associated
with the relationship set Married_to. If we assume that one employee can get married
to only one employee, then it becomes a unary 1:1 relationship.
Unary 1:N
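[Figure: Unary 1:N relationship Manager_of on the Employee entity, with roles Manager (1) and Subordinate.]
[Figure: Unary M:N relationship Guarantor_of on the Employee entity.]
[Figure 1.12: Ternary relationship Prescribes among Doctor, Medicine and Patient.]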
In ternary relationships, (see Figure 1.12) more than two entities are involved.
In this example, Prescribes is a ternary relationship that associates three entity sets,
Doctor, Medicine and Patient.
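A ternary relationship such as Prescribes can be pictured as a set of triples, one component per participating entity set. The doctors, medicines and patients named below are invented sample data, and medicines_for is a hypothetical helper.

```python
from typing import NamedTuple

class Prescribes(NamedTuple):
    doctor: str
    medicine: str
    patient: str

prescribes = {
    Prescribes("Dr. Rao", "Paracetamol", "Anil"),
    Prescribes("Dr. Rao", "Amoxicillin", "Anil"),
    Prescribes("Dr. Sen", "Paracetamol", "Bina"),
}

def medicines_for(doctor, patient):
    """Medicines that a given doctor prescribed to a given patient."""
    return sorted(p.medicine for p in prescribes
                  if p.doctor == doctor and p.patient == patient)

print(medicines_for("Dr. Rao", "Anil"))
```

Each triple records one fact that involves all three entities at once, which is what distinguishes a ternary relationship from a collection of separate binary ones.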
1.13.4 Key Constraints
We can define constraints on the super-type relationships to restrict the participation
of entities in the subtypes.
• Inclusion Constraints
o The disjoint constraint specifies that the subtypes of a super type are disjoint.
This means that an entity can be a member of only one subtype.
o The non-disjoint (overlapping) constraint specifies that the subtypes are
overlapping, and an entity may be a member of more than one subtype.
Based on inclusion constraint, a generalization hierarchy can either be
overlapping or disjoint. In an overlapping hierarchy an entity instance can be part
of multiple subtypes. For example, an employee may be a manager as well as a
secretary in a company. In a disjoint hierarchy, an entity instance can be in only one
subtype. For example, the entity EMPLOYEE, may have two subtypes,
PERMANENT and CONTRACT- BASIS. An employee may be one type or the
other but not both.
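Whether a set of subtypes respects the disjoint constraint can be checked by looking for entities that occur in more than one subtype. The sketch below uses the subtype names from the text; the employee identifiers and the helper name are assumptions.

```python
def violates_disjoint(subtypes):
    """Return the entities that appear in more than one subtype."""
    seen, repeated = set(), set()
    for members in subtypes.values():
        repeated |= seen & set(members)   # entities already seen in another subtype
        seen |= set(members)
    return repeated

disjoint_hierarchy = {"PERMANENT": {"E1", "E2"}, "CONTRACT_BASIS": {"E3"}}
overlapping_hierarchy = {"MANAGER": {"E1", "E2"}, "SECRETARY": {"E2", "E4"}}

print(violates_disjoint(disjoint_hierarchy))      # empty: no entity is in two subtypes
print(violates_disjoint(overlapping_hierarchy))   # E2 is both manager and secretary
```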
• Completeness Constraints
A total specialization constraint specifies that every entity in the super class must
be a member of at least one of its subclasses. For example, a student must belong to one
of the subclasses Post-graduate and Undergraduate (see Figure 1.14).
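The total specialization constraint amounts to a coverage check: the union of the subclasses must contain every member of the super class. The student identifiers and the function name below are assumptions made for illustration.

```python
def is_total_specialization(superclass, subclasses):
    """True if every superclass entity belongs to at least one subclass."""
    covered = set().union(*subclasses.values())
    return set(superclass) <= covered

students = {"S1", "S2", "S3"}
complete = {"POST_GRADUATE": {"S1"}, "UNDERGRADUATE": {"S2", "S3"}}
incomplete = {"POST_GRADUATE": {"S1"}, "UNDERGRADUATE": {"S2"}}

print(is_total_specialization(students, complete),
      is_total_specialization(students, incomplete))
```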
[Figure: Professional and Organization entity sets linked by the Belongs To relationship.]
Cardinality constraint
It specifies lower and upper bounds on the number of relationships each entity can
participate in.
Maximum cardinality: The numbers inside the relationship show the
maximum number of entities that can occur on one side of the relationship.
Minimum cardinality: It is the minimum number of entities that can occur
on one side of the relationship.
The mapping cardinality of a relationship set depends on the real world
relationships it is modelling. Cardinality constraint is denoted by minimum
cardinality and maximum cardinality pair enclosed within parenthesis. Figure 1.15
depicts this.
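A (minimum, maximum) cardinality pair can be checked by counting how many relationship instances each entity takes part in. In the sketch below, the helper name, the employee identifiers and the reading of the two roles of a supervision relationship are assumptions; a maximum of None stands for N, that is, no upper bound.

```python
from collections import Counter

def check_cardinality(entities, participating, minimum, maximum=None):
    """Return the entities whose participation count falls outside (min, max)."""
    counts = Counter(participating)
    return sorted(
        e for e in entities
        if counts[e] < minimum or (maximum is not None and counts[e] > maximum)
    )

employees = ["E1", "E2", "E3"]
supervise = [("E1", "E2"), ("E1", "E3")]   # (supervisor, subordinate) pairs

# Supervisor role with (0,N): any number of subordinates is allowed.
supervisor_violations = check_cardinality(employees, [s for s, _ in supervise], 0)
# Subordinate role with (0,1): at most one supervisor is allowed.
subordinate_violations = check_cardinality(employees, [b for _, b in supervise], 0, 1)

# Giving E3 a second supervisor breaks the (0,1) bound.
broken = supervise + [("E2", "E3")]
broken_violations = check_cardinality(employees, [b for _, b in broken], 0, 1)
print(supervisor_violations, subordinate_violations, broken_violations)
```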
[Figure 1.15: Cardinality constraints shown as (minimum, maximum) pairs. EMPLOYEE and DEPT are linked by Are_in with (1,N) on each side; the SUPERVISE relationship carries (0,N) and (0,1).]
Figure 1.16 shows the E–R model for airport management.
UML also includes certain concrete things such as:
• Programming language statements
• Database schemas
• Reusable software components
The approach of UML is to combine the best practices from several modelling
approaches: entity relationship diagrams (data modelling), business modelling, object
modelling and component modelling. Business modelling shows workflow in a
business environment. Such modelling is applicable to all processes, throughout
the software development life cycle and can be used for implementing different
technologies. UML has taken the widely used concepts that are common to the
Object-modelling technique (OMT) and Object-oriented software engineering (OOSE)
and unified them into a single modelling language. It is a
standard modelling language capable of modelling concurrent as well as distributed
systems. Although it did not begin as an industry standard, it has taken shape under the
Object Management Group (OMG).
Using transformation languages such as Query/View/Transformation (QVT), a UML model
may be transformed into other representations, for example, Java. QVT is a standard
for model transformation defined by the Object Management Group as part of its
Model-Driven Architecture (MDA).
To make it capable of supporting all OO methods, concepts from other OO
methods were integrated with UML. For this reason, this language is useful in many
engineering applications. It is applicable from single process, single user applications
to concurrent as well as distributed systems. This has made UML not only rich but
also vast.
UML is also an international standard, published as ISO/IEC 19501:2005 Information
technology —Open Distributed Processing—Unified Modelling Language (UML)
Version 1.4.2
1.15.1 Software Development Methods
UML is a modelling language and not a development method. However, it was
designed in such a way that it is compatible with object-oriented software
development methods. After the evolution of UML, some of these methods, such as
OMT, were recast to use the new notation, and new
methods based on UML were created. IBM’s Rational Unified Process (RUP) is the best known of
these. Other UML-based methods include the Abstraction Method, the Dynamic Systems
Development Method and others. These are designed with a view to provide more
specific solutions.
1.15.2 Modelling
The UML model consists of many diagrams, each making a partial graphical
representation of a system’s model. The model also has a ‘semantic backplane’—
written use cases, driving the elements in the model and diagrams.
UML diagrams of a system model have two different views:
(i) Static view: This is also known as structural view. This shows the static
structure of the system that uses objects, attributes, operations and
relationships. The structural view is composed of class diagrams and
composite structure diagrams.
(ii) Dynamic view: This view represents the dynamic behaviour of the
system. This shows collaborations among objects, along with changes
in the internal states of objects. This comprises sequence diagrams,
activity diagrams and state machine diagrams.
Exchange of UML models among UML tools requires the use of the XMI
interchange format.
1.15.3 Diagrams Overview
UML has undergone many revisions. Version 2.0 has thirteen types of diagrams,
divided into three categories: six represent the static structure of an
application, and the other seven are behaviour diagrams, four of which
represent different aspects of interaction. The hierarchical view of these
diagrams is shown in Figure 1.17.
[Figure 1.17: Diagram, subdivided into Structure Diagram and Behaviour Diagram]
• Package diagram: It shows the logical groupings into which a system is split
up, showing dependencies among these groupings.
Since these show the structure of a system, they help to document the
architecture of software systems.
1.15.5 Behaviour Diagrams
These diagrams emphasize what must happen in the modelled system.
They consist of:
• Activity diagram: It shows the business and operational processes. It also
shows the step-by-step workflows in a system. It shows the overall flow of
control.
• State machine diagram: It uses a standardized notation for describing many
kinds of systems, from computer programs to business processes.
• Use case diagram: It shows the functionality in terms of actors. The goals of
actors are denoted as use cases and dependencies among these use cases are
also shown.
Behavioural diagrams are used to describe the functionality of software
systems.
Interaction diagrams
In set-theoretic terms, interaction diagrams are a subset of behaviour
diagrams. They emphasize the flow of data and control among the things in the
modelled system.
• Communication diagram: It shows the interactions among objects or parts as
a sequence of messages. It combines information derived from class, sequence
and use case diagrams, and describes both the static structure and the
dynamic behaviour of a system.
• Interaction overview diagram: It is a type of activity diagram with nodes
representing interaction.
• Sequence diagram: It shows the way objects communicate in terms of a
sequence of messages and the life spans of objects in relation to those
messages.
• Timing diagram: It is a specific type of interaction diagram showing timing
constraints.
The Protocol State Machine is a subvariant of the State Machine. It may be
used to model network communication protocols.
1.15.6 Meta Modelling
The Object Management Group (OMG) developed a meta modelling architecture for
defining the UML. It is called the Meta-Object Facility (MOF) and is
a standard for model-driven engineering. It is designed as a ‘four-layered
architecture’. At the top is a meta-meta model, known as the M3 layer. This
model is the language used by MOF to build metamodels, known as M2-models.
M2-models describe elements of the M1-layer, that is, M1-models. The last
layer is the M0-layer, also called the data layer, and it describes
real-world objects.
Beyond the M3-model, MOF describes the means for creating and
manipulating models and meta models by defining CORBA interfaces that
describe those operations. Because of the similarities between MOF’s M3-model
and UML structure models, MOF’s meta models are modelled as UML class diagrams.
A supporting standard for MOF is XMI, which defines an XML-based exchange
format for MOF’s M3-, M2- or M1-Layer.
1.15.7 Criticism of UML
Although UML is widely recognized and used as a modelling standard, it is
criticized for certain deficiencies. Some of them are:
Language bloat: UML is unusually large and complex, containing many
diagrams and constructs that are rarely used and hence redundant.
Weak visualization: It employs many line styles that are graphically very
similar, and their meanings reside in permutations of lines (whether dotted or
solid) and arrow heads (whether open or closed, filled or empty). Moreover, the
same line style may mean different things in different diagram types.
Problems in learning and adopting: The problems stated above can make UML
difficult to learn and adopt.
1.16 SUMMARY
In this unit, you have learned about DBMS. A database management system provides
software that allows application programs to deal with data field names regardless
of the field’s location within a record, the record’s location within a file and
the file’s location within a device. In a DBMS, all the files are integrated into
one system, thus making data management more efficient by providing centralized
control of the operational data. Database management systems are not only used in
commercial applications, but also in many scientific/engineering applications.
The database itself is stored in data files. You have also got a historical
perspective of DBMS, such as the navigational DBMS of the 1960s and the relational
DBMS of the 1970s. The unit also familiarized you with the latest trends in DBMS.
You have learned about the features and advantages of DBMS. The three
main features of DBMS are: centralized data management, data independence and
data integration. This unit also discussed the method of describing and storing data
in a DBMS, and transaction management. Databases store large amounts of information
that must be available for long periods of time for continuous access and processing.
In this unit, you have also learned about the structure of a DBMS. One of the
key features of the database approach is to bring about a clear separation between
the logical and physical structures of data. The term logical structure refers to the
way programmers see the data, and the physical structure refers to the way data is
actually stored on the storage medium. A data dictionary is a critical element in
DBMS. It can be treated as a special file that stores information about the
organization and usage of data contained in the database. This information is called
metadata (which means data about data). It is sometimes termed a system catalog that
documents the data in the database. The result of compilation of DDL statements is a
set of tables that is stored in the data dictionary. The logical architecture of a
DBMS is known as the three-level architecture and was suggested by the American
National Standards Institute/Standards Planning and Requirements Committee
(ANSI/SPARC). The process of transforming requests and results between different
levels of abstraction is called mapping. There are actually two types of mappings:
conceptual/internal mapping and external/conceptual mapping.
The other important topics that were dealt with in this unit are: people who
work with databases, overview of database design, entities and attributes and entity
sets. You have also learned about relationships and relationship sets. While a
relationship represents an association between two or more entities, a relationship
set is a set of ‘similar in kind’ relationships among one or more entities.
This unit also explained the additional features of the E–R model and the
conceptual design with the E–R model. The E–R model is used to facilitate database
design by allowing the designer to express the logical properties of the database in
an enterprise schema. An E–R diagram can graphically represent the overall logical
structure of a database. The relative simplicity and pictorial clarity of this
diagramming technique may well account for the widespread use of the E–R model.
Finally, you have learned about UML, the Unified Modelling Language. It was
developed at the end of the 1990s and is a standardized general-purpose modelling
language. It uses a set of graphical notations to create abstract models of specific
systems. It is an open method used for specification, visualization, construction and
documentation of the system under development. It handles artifacts of an
object-oriented system. This modelling language provides a standard way of writing
the blueprints of a system and includes a few conceptual components such as actors,
business processes, system components and activities.
16. Behavioural diagrams emphasize what must happen in the modelled
system. These consist of:
• Activity diagram: It shows the business and operational processes. It also
shows the step-by-step workflows in a system. It even shows the overall
flow of control.
• State machine diagram: It uses a standardized notation for describing many
kinds of systems, from computer programs to business processes.
• Use case diagram: It shows the functionality in terms of actors. Goals of
actors are denoted as use cases and dependencies among these use cases
are also shown.
Behavioural diagrams are used to describe the functionality of software
systems.
17. Although widely recognized and used as a modelling standard, UML is
criticized for certain deficiencies. Some of them are:
• Language bloat: UML is unusually large and complex, containing many
diagrams and constructs that are rarely used and hence redundant.
• Weak visualization: It employs many line styles that are graphically very
similar, and their meanings reside in permutations of lines (whether dotted
or solid) and arrow heads (whether open or closed, filled or empty).
Moreover, the same line style may mean different things in different
diagram types.
• Problems in learning and adopting: The problems stated above can make
UML difficult to learn and adopt.
Short-Answer Questions
1. What is a DBMS? Why do we need a DBMS?
2. What are the functions of DBMS?
3. Write a note on navigational DBMS of the 1960s.
4. What are the latest trends in RDBMS?
5. Why do we store data in a database instead of in a file?
6. What are the advantages of a database management system?
7. What do you understand by logical DBMS architecture?
8. Briefly explain the physical DBMS architecture.
9. What are attributes and entity sets? What are the differences between simple
and composite attributes?
10. Distinguish between single-valued and multi-valued attributes.
11. How are relationships classified in a DBMS?
12. What do you understand by works-in-relationship?
13. How are ternary relationships made?
14. What is the Unified Modelling Language (UML)?
15. What is meta modelling?
Long-Answer Questions
1. Discuss the development of DBMS from the historical perspective.
2. Describe the approach for handling of data in the early file-based systems.
Discuss the disadvantages of this approach.
3. Explain the general structure of a DBMS.
4. Write a note on the features of database design.
5. Explain attributes and entity sets.
6. Discuss relationships and relationship sets.
7. Draw a diagram of the three-schema architecture of a DBMS. How can it be
used to explain the concept of data independence?
8. What do you understand by database independence?
9. What are logical and physical data independence? What is the difference
between the two?
10. What are the different types of database users? Discuss the main architecture
of each.
11. Describe any five components of a DBMS environment and discuss how
they relate to each other.
12. Discuss the concept of data independence and explain its importance in a
database environment.
13. Explain the utility of E–R modelling.
14. In the context of an E–R relationship diagram, diagrammatically represent at
least three examples for each of the following types of relationships: (a) One-
One (b) One-Many (c) Many-Many.
15. Draw E–R diagrams of a university, a library and a banking system.
16. Design and draw an E–R diagram for a database given the following: Airplanes
have a model design (Phantom, Tomcat, Crusader, etc.), a unique side number,
total flight hours, 14 pilots have certifications (pilot, carrier, check pilot, etc.),
social security numbers, total hours and names. Pilots are scheduled to fly
airplanes on specific dates, only the latest of which is recorded.
17. Draw an ER diagram for a database catalog. A database contains tables and
attributes. Each table has a name and a set of attributes. Two tables in the
database cannot have the same name. Each attribute has a name and a type.
Two attributes in the same table cannot have the same name; however, two
attributes in different tables can have the same name.
18. Construct an E–R diagram for a hospital with a set of medical doctors and a
set of patients. Associate with each patient a log of the various tests and
examinations conducted.
UNIT 2 RELATIONAL MODEL AND QUERY LANGUAGES
Structure
2.0 Introduction
2.1 Unit Objectives
2.2 Integrity Constraints over Relations
2.2.1 Primary Key
2.2.2 Foreign Key
2.2.3 General Constraints
2.3 E–R Model to Relational Model
2.3.1 Entity Sets to Tables
2.3.2 Translating
2.3.3 Relationship Sets with Key Constraints
2.4 Translating Relationship Sets with Participation Constraints
2.5 Translating Weak Entity Sets
2.6 Translating Class Hierarchies
2.7 Translating E–R Diagrams with Aggregation
2.8 Introduction to Query Languages
2.9 Relational Algebra
2.9.1 Primitive Operations
2.9.2 Selection Operation
2.9.3 Projection Operation
2.10 Set Operations
2.10.1 Joins
2.11 Renaming
2.12 Joins
2.13 Division
2.14 Summary
2.15 Key Terms
2.16 Answers to ‘Check Your Progress’
2.17 Questions and Exercises
2.18 Learning Outcomes
2.0 INTRODUCTION
In this unit, you will learn about integrity constraints over relations. Integrity
constraints provide a way of ensuring that changes made to a database by authorized
users do not result in data inconsistency. In this regard, the concept of primary key,
foreign key and general constraints will also be explained. You will also be able to
compare the E–R model with the relational model. The E–R model facilitates
database design by enabling the designer to express the logical properties of the
database in an enterprise schema. Identification of real-world objects, referred to as
entities, forms the basis of this model.
You will learn how to translate relationship sets with participation constraints.
The participation constraint specifies whether the existence of an entity is dependent
on the existence of another entity to which it must be related. This unit will also
examine how to translate weak entity sets and class hierarchies. You will learn that
aggregation is a special kind of association that specifies a whole-part relationship
between the aggregate (whole) and a component (part).
You will be introduced to query languages in this unit. An information retrieval
query language queries the database. It is a language of a special type, in which
the semantics of a query are not defined by a precise formal syntax; rather, they
are based on an interpretation of the most suitable results of the query put to
the information base. This unit will also discuss relational algebra, which is used
widely in computer science. It is based on the algebra of sets and is an extension of
first-order logic. The other topics this unit will deal with are set operators,
renaming, joins and division.
[Figure: relationship set R with entity sets E1, E2 and Em]
[Figure 2.2: E–R diagram with entity sets Student, Faculty, Section and Course,
and relationships ‘Is taking’, ‘Is teaching’ and ‘Is a section of’]
Create a relation for each entity set. It can use the same name and same set of
attributes as the entity set. The key of the entity set is used as the primary key of the
relation. For example,
Student(ssn, last, first, street address, city, state, country, zip)
Course(dept, number, title, description, credits)
Section(crn, days, time, room)
Faculty(ssn, last, first, title, dept, office, phone)
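As a minimal sketch, the four relation schemas above could be declared in SQL as follows. The column types and lengths, the renamed columns (last_name, first_name, street_address, course_no, meet_time) and the choice of ssn, crn and (dept, course_no) as primary keys are assumptions made for illustration; the text fixes only the attribute names.

```sql
-- Types, lengths and key choices are assumptions; only the attribute
-- names come from the relation schemas in the text.
CREATE TABLE Student (
    ssn            CHAR(11) PRIMARY KEY,
    last_name      VARCHAR(40),
    first_name     VARCHAR(40),
    street_address VARCHAR(80),
    city           VARCHAR(40),
    state          VARCHAR(40),
    country        VARCHAR(40),
    zip            VARCHAR(10)
);

CREATE TABLE Course (
    dept        CHAR(4),
    course_no   INTEGER,
    title       VARCHAR(80),
    description VARCHAR(400),
    credits     INTEGER,
    PRIMARY KEY (dept, course_no)   -- composite key, assumed
);

CREATE TABLE Section (
    crn       INTEGER PRIMARY KEY,
    days      VARCHAR(7),
    meet_time VARCHAR(20),
    room      VARCHAR(20)
);

CREATE TABLE Faculty (
    ssn        CHAR(11) PRIMARY KEY,
    last_name  VARCHAR(40),
    first_name VARCHAR(40),
    title      VARCHAR(40),
    dept       CHAR(4),
    office     VARCHAR(20),
    phone      VARCHAR(20)
);
```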
Relationships with Entity Sets
Many-to-many relationship
The attributes of this relation are the attributes of the relationship, together with the
primary keys of the two related entity sets. The primary key of the relation combines
the primary keys of the related entity sets.
In the database in Figure 2.2, the ‘is taking’ relationship would be converted
into
Enrolled(ssn, crn, gradeOption, credits)
In the case of a recursive many-to-many relationship, it is necessary to rename
the primary keys borrowed from the related entity sets. For example, if we want to
implement the relationship between courses ‘is the prerequisite of’, we would create
a relation as follows:
Prerequisite(prereqDept, prereqNumber, successorDept, successorNumber)
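The two many-to-many translations above might look like this in SQL. This is a sketch only: the parent tables are reduced to their keys, and all column types, the name course_no and the key of Course are assumptions.

```sql
-- Minimal parent tables so the example is self-contained (types assumed).
CREATE TABLE Student (ssn CHAR(11) PRIMARY KEY);
CREATE TABLE Section (crn INTEGER PRIMARY KEY);
CREATE TABLE Course (
    dept      CHAR(4),
    course_no INTEGER,
    PRIMARY KEY (dept, course_no)
);

-- 'is taking': the key combines the keys of both related entity sets.
CREATE TABLE Enrolled (
    ssn         CHAR(11),
    crn         INTEGER,
    gradeOption VARCHAR(10),
    credits     INTEGER,
    PRIMARY KEY (ssn, crn),
    FOREIGN KEY (ssn) REFERENCES Student (ssn),
    FOREIGN KEY (crn) REFERENCES Section (crn)
);

-- Recursive relationship: Course's key is borrowed twice, so each copy
-- is renamed to show the role it plays.
CREATE TABLE Prerequisite (
    prereqDept      CHAR(4),
    prereqNumber    INTEGER,
    successorDept   CHAR(4),
    successorNumber INTEGER,
    PRIMARY KEY (prereqDept, prereqNumber, successorDept, successorNumber),
    FOREIGN KEY (prereqDept, prereqNumber)
        REFERENCES Course (dept, course_no),
    FOREIGN KEY (successorDept, successorNumber)
        REFERENCES Course (dept, course_no)
);
```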
One-to-many relationships
There are two ways to handle this type of relationship. One is to follow essentially
the same approach as with the many-to-many relationship, by creating a new relation
to represent the relationship. For example, for the ‘is teaching’ relationship, we
would create a relation
Teaches(crn, ssn)
The primary key of the relation is just the primary key of the section, not the
combination of both primary keys.
An alternative is to simply include the primary key of the ‘one’ entity set in
the relation representing the ‘many’ entity set. In this case, we would simply add
the instructor ssn to the section relation:
Section (crn, days, time, room, instructorSsn)
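Both ways of handling the one-to-many ‘is teaching’ relationship can be sketched in SQL. The table name SectionWithInstructor, the column types and the NOT NULL on Teaches.ssn are assumptions introduced so that the two options can sit side by side in one script.

```sql
CREATE TABLE Faculty (ssn CHAR(11) PRIMARY KEY);
CREATE TABLE Section (crn INTEGER PRIMARY KEY);

-- Option 1: a separate relation. Each section appears at most once,
-- so crn alone is the primary key.
CREATE TABLE Teaches (
    crn INTEGER PRIMARY KEY,
    ssn CHAR(11) NOT NULL,
    FOREIGN KEY (crn) REFERENCES Section (crn),
    FOREIGN KEY (ssn) REFERENCES Faculty (ssn)
);

-- Option 2: fold the relationship into the 'many' side by adding the
-- instructor's key as a foreign key column.
CREATE TABLE SectionWithInstructor (
    crn           INTEGER PRIMARY KEY,
    days          VARCHAR(7),
    meet_time     VARCHAR(20),
    room          VARCHAR(20),
    instructorSsn CHAR(11),
    FOREIGN KEY (instructorSsn) REFERENCES Faculty (ssn)
);
```

Option 2 avoids a join when a section is listed together with its instructor, at the cost of a nullable column for sections that have no instructor yet.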
One-to-one relationships
As with the one-to-many relationships, they can be implemented by adding a new
relation or by including the primary key of one entity set in the relation representing
the other one.
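The text gives no example for this case, so the following sketch uses a hypothetical ‘is chaired by’ relationship between Department and Faculty. The table names, column names and types are all assumptions; the point is only that a UNIQUE constraint on the borrowed key keeps the relationship one-to-one.

```sql
CREATE TABLE Faculty (ssn CHAR(11) PRIMARY KEY);

-- Hypothetical example: each department has at most one chair, and
-- UNIQUE ensures a faculty member chairs at most one department.
CREATE TABLE Department (
    dept     CHAR(4) PRIMARY KEY,
    chairSsn CHAR(11) UNIQUE,
    FOREIGN KEY (chairSsn) REFERENCES Faculty (ssn)
);
```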
n-ary relationships
Create a relation representing the relationship. Include the primary key of each
participating entity set. Any attributes of the relationship become attributes of the
relation. For example, if students, majors and faculty advisors are related by a ternary
‘advises in’ relationship, we would create a relation:
Advises(studentSsn, advisorSsn, major)
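A possible SQL rendering of the ternary relation is shown below. The text does not state its key, so the sketch takes all three columns as the primary key; the types and the reduced Student and Faculty tables are also assumptions.

```sql
CREATE TABLE Student (ssn CHAR(11) PRIMARY KEY);
CREATE TABLE Faculty (ssn CHAR(11) PRIMARY KEY);

-- One row per (student, advisor, major) combination. If a student had
-- only one advisor per major, (studentSsn, major) would suffice as key.
CREATE TABLE Advises (
    studentSsn CHAR(11),
    advisorSsn CHAR(11),
    major      VARCHAR(40),
    PRIMARY KEY (studentSsn, advisorSsn, major),
    FOREIGN KEY (studentSsn) REFERENCES Student (ssn),
    FOREIGN KEY (advisorSsn) REFERENCES Faculty (ssn)
);
```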
Multi-valued attributes
Suppose, you are using a version of the E–R model that permits multi-valued
attributes. Then, you might model ‘major’ as a multi-valued attribute of ‘student’.
The relational model does not permit non-atomic attributes. Therefore, to implement
this attribute you need to create a new relation to represent it. A tuple of the relation
contains a value of the attribute and the primary key of the owning entity.
For example:
MajorsIn (studentSsn, major)
The two attributes together form the primary key of this relation.
In some cases, particularly where it is known that there are at most two values
of the attribute, it may be preferable to define two attributes in the owning entity,
rather than creating a separate relation. For example, a student might have a local
address and a home address.
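Both treatments of a multi-valued attribute can be sketched in SQL. The column names localAddress and homeAddress and all types are assumptions chosen for illustration.

```sql
-- Bounded case: at most two addresses, so two ordinary columns suffice.
CREATE TABLE Student (
    ssn          CHAR(11) PRIMARY KEY,
    localAddress VARCHAR(80),
    homeAddress  VARCHAR(80)
);

-- Unbounded case: one row per (student, major) pair. Both columns
-- together form the primary key, so a major is recorded once per student.
CREATE TABLE MajorsIn (
    studentSsn CHAR(11),
    major      VARCHAR(40),
    PRIMARY KEY (studentSsn, major),
    FOREIGN KEY (studentSsn) REFERENCES Student (ssn)
);
```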
2.3.2 Translating
One of the fastest growing areas in computer science today is the development and
application of DBMSs. These systems collect and maintain data in user-specified
relationships in support of various interrelated applications. They provide a high-
level interface between application programs and the actual data to isolate the user
from such operations as physical I/O, buffering and pointer manipulation. As new
database management systems are developed with greater capabilities and
features, users of existing DBMSs may wish to convert their application programs
and databases to operate on the new system software. If the decision to upgrade
is made, all of the data and application programs have to be migrated to the new
DBMS. Migration of a DBMS involves all aspects connected with the stored data.
The schema of the data, the data definition language, the application procedures as
well as the physical data items must be converted consistently, so that the
new DBMS functions as desired. Migration is an expensive undertaking, especially
if a different manufacturer’s hardware is to be acquired. In the past, when users
converted from one file organization to another (for example, Honeywell’s ISP to
IBM’s ISAM), specialized programs were written to translate the data between
file systems. In addition, all the old application programs were rewritten or
modified so that they would operate on the new file organization. All this
entailed huge expenditure. In future, it is foreseeable that even greater
expenses will be incurred in order to move data from an existing DBMS to a new
DBMS.
For example, Figure 2.3 shows the E–R diagram representing the relationship,
Borrower, between Cust and Loan entity sets.
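[Figure 2.3: E–R diagram of the Borrower relationship between the Cust and Loan
entity sets, with attributes Cust_ssn, Cust_name, Cust_phone no, Cust_city,
Loan_no and Loan_amount]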
In the above ERD, consisting of Cust and Loan entity sets, the primary key of
the Cust entity set is Cust_ssn and the primary key of the Loan entity set is Loan_no.
Also, the relationship does not contain any descriptive attributes. Therefore, you
can represent the relationship Borrower by a borrower table consisting of the
Cust_ssn and Loan_no columns.
Table 2.1 lists the tabular representation of the Borrower relationship set.
cust_ssn    loan_no
S65243      25000
S65244      30000
S65245      35000
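The borrower table and its three rows could be created as sketched below. The column types, the name Cust_phone_no (written with an underscore so that it is a valid identifier) and the composite primary key are assumptions; the row values are those of Table 2.1.

```sql
CREATE TABLE Cust (
    Cust_ssn      CHAR(6) PRIMARY KEY,
    Cust_name     VARCHAR(40),
    Cust_phone_no VARCHAR(15),
    Cust_city     VARCHAR(40)
);

CREATE TABLE Loan (
    Loan_no     INTEGER PRIMARY KEY,
    Loan_amount DECIMAL(12, 2)
);

-- The relationship has no descriptive attributes, so the table holds
-- only the two borrowed keys.
CREATE TABLE Borrower (
    Cust_ssn CHAR(6),
    Loan_no  INTEGER,
    PRIMARY KEY (Cust_ssn, Loan_no),
    FOREIGN KEY (Cust_ssn) REFERENCES Cust (Cust_ssn),
    FOREIGN KEY (Loan_no)  REFERENCES Loan (Loan_no)
);

-- Parent rows must exist before the relationship rows that refer to them.
INSERT INTO Cust (Cust_ssn) VALUES ('S65243'), ('S65244'), ('S65245');
INSERT INTO Loan (Loan_no)  VALUES (25000), (30000), (35000);
INSERT INTO Borrower (Cust_ssn, Loan_no)
VALUES ('S65243', 25000), ('S65244', 30000), ('S65245', 35000);
```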
Candidate keys in a relation schema need to be unique for every tuple in any
relation instance of that schema. If a key value in the new tuple t pre-exists in
another tuple in the relation r(R), the key constraints can be violated. The relationship
constraint, is often superior because it avoids creating a distinct table for the
relationship set. The idea is to include the information about the relationship set in
the table corresponding to the entity set with the key, taking advantage of the key
constraint. This approach eliminates the need for a separate relation and for
queries that combine two relations. If a relationship set involves n entity sets and
some m of them are linked via arrows in the E–R diagram, the relation corresponding
to any one of the m sets can be augmented to capture the relationship.
[Figure 2.4: total and partial participation of School and Teacher in the
relationship ‘Head of’]
Figure 2.4 illustrates the concept of total and partial participation. Each school
must have one head master. So, the entity School participates in the relationship
Head of totally. On the contrary, from among many teachers in a school, only one is
chosen to be the head master. So here, the entity Teacher participates partially in the
relationship.
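One way to capture this participation constraint in SQL is to fold the ‘Head of’ relationship into the School table. The sketch below is an illustration under assumptions: the column names and types are hypothetical, and UNIQUE reflects the further assumption that a teacher heads at most one school.

```sql
CREATE TABLE Teacher (
    teacherId INTEGER PRIMARY KEY,
    name      VARCHAR(40)
);

-- NOT NULL expresses the total participation of School: a school row
-- cannot exist without a head. Nothing forces a teacher to appear as
-- a head, so Teacher participates only partially.
CREATE TABLE School (
    schoolId INTEGER PRIMARY KEY,
    name     VARCHAR(60),
    headId   INTEGER NOT NULL UNIQUE,
    FOREIGN KEY (headId) REFERENCES Teacher (teacherId)
);
```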
Multiplicity of the relationship set determines the key of the table. Each entity
set is translated into a table with the keys on the following basis:
• Basic cases: An entity set E becomes a relation with the attributes of E. A
relationship R becomes a relation whose attributes are the keys of the
related entity sets plus the attributes of R.
• Special cases: Combining two relations, translating weak entity sets, and
translating is-a relationships and subclasses. A weak entity set cannot
exist alone.
An entity set is handled on the following basis:
• It is represented as a table.
• It has a key that becomes a key for the table.
• All single-valued attributes become attributes of the table.
• A new table is created for any multi-valued attribute; if an employee has
many offices, add an extra table.
The following steps build the table for a weak entity set:
• To build a table/schema for the weak entity set, construct a table with one
column for each attribute in the weak entity set.
• Remember to include discriminator.
• Augment one extra column on the right side of the table; put in there the
primary key of the strong entity set (the entity set that the weak entity set
depends on).
• Primary key of the weak entity set = Discriminator + foreign key.
Translating entity sets to logical schemas and SQL DDL:
CREATE TABLE STUDENTS
    (sid INTEGER,
     name VARCHAR(15),
     PRIMARY KEY (sid) );
CREATE TABLE COURSES
    (cid CHAR(15),
     subj VARCHAR(30),
     sem INTEGER,
     PRIMARY KEY (cid) );
A schema is generated with attributes and key(s) of each associated entity
(foreign keys):
CREATE TABLE Takes
    (sid INTEGER,
     cid CHAR(15),
     exp_grade CHAR(1),
     -- the key combines the keys of both related entity sets
     PRIMARY KEY (sid, cid),
     FOREIGN KEY (cid) REFERENCES COURSES,
     FOREIGN KEY (sid) REFERENCES STUDENTS);
Weak entity set and identifying relationship set are translated into a single
table; when the owner entity is deleted, all owned weak entities must also be deleted.
CREATE TABLE Feed_Pets (
    pname VARCHAR(20),
    species INTEGER,
    weeklyCost REAL,
    ssn CHAR(11) NOT NULL,
    PRIMARY KEY (pname, ssn),
    -- deleting the owner row also deletes the pets it owns
    FOREIGN KEY (ssn) REFERENCES Employees
        ON DELETE CASCADE);
The procedure just explained for representing entity sets does not work for
weak entity sets, because they do not contain their own primary keys as attributes.
Some allowance must be made to handle this situation.
The solution is simply to copy the needed key attributes from related entities
into the relation representing the weak entity. For example, if departments are
modelled as entities, and the department code is part of the key of courses, then the
courses become weak entities. The department code must be included as an attribute
of course, so it can be used as part of the primary key.
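The department and course example above can be sketched in SQL as follows. The column names, the types and the ON DELETE CASCADE rule are assumptions; the text states only that the department code becomes part of the key of courses.

```sql
CREATE TABLE Department (
    deptCode CHAR(4) PRIMARY KEY,
    name     VARCHAR(60)
);

-- Course is a weak entity: course_no alone does not identify a course,
-- so the owner's key (deptCode) is copied in and made part of the key.
CREATE TABLE Course (
    deptCode  CHAR(4) NOT NULL,
    course_no INTEGER,
    title     VARCHAR(80),
    PRIMARY KEY (deptCode, course_no),
    FOREIGN KEY (deptCode) REFERENCES Department (deptCode)
        ON DELETE CASCADE
);
```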
• Create a table for each subclass entity set; include all attributes of that
subclass entity set and attributes of the super class entity set. Inheritance
states that one entity is a ‘special kind’ of another entity.
• ‘Subclass’ should be member of ‘base class’.
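The table-per-subclass approach in the bullets above can be illustrated with a hypothetical hierarchy in which hourly and contract employees are special kinds of employee. All table names, column names and types here are assumptions, not taken from the text.

```sql
-- Each subclass table repeats the superclass attributes (ssn, name)
-- and adds its own, so no separate superclass table is needed.
CREATE TABLE Hourly_Emps (
    ssn          CHAR(11) PRIMARY KEY,
    name         VARCHAR(40),
    hourly_wage  DECIMAL(8, 2),
    hours_worked INTEGER
);

CREATE TABLE Contract_Emps (
    ssn         CHAR(11) PRIMARY KEY,
    name        VARCHAR(40),
    contract_id INTEGER
);
```

With this design a query over all employees has to combine both tables, for example with UNION, which is the main cost of leaving out a superclass table.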
[Figure 2.5: E–R diagram with entity sets M, N, O and P, relationship sets R and S,
an isa link, and attributes a to h]
Translate the E–R diagram in Figure 2.5 to relations using the first method
and then create a relation for each class and sub-class in the given hierarchy.
Algorithm for Translating the E–R Diagram with a Class Hierarchy
The following are the steps of the algorithm to create the tables:
• N is an entity, so you would create a table for it: N(c,d)
• P is an entity, so you would create a table for it: P(h,g)
• Since M is a weak entity, you create one table for it and R, which contains the
key of N as part of its key: M_R (a,b,c), where c is a foreign key of N.
Because M is a weak entity, we must delete an M_R tuple if the corresponding
N tuple disappears.
• Now at this stage, you create a relation for O, which must include the key of
M. The key of M includes the key of N since it is a weak entity, resulting in:
O(e,a,c), where a and c are foreign keys of M_R. The c is really a foreign key
of N, but since the requirements are that you must refer to the entire key of a
table, you must have it refer to M_R’s key, rather than N’s.
• S is a many-to-many relationship, so you would create a table for it which
includes the attributes of S and the keys of O and P, which together form the
primary key of S: S(f,a,c,g), where a and c are foreign key references to O,
and g is a foreign key reference to P.
Translating these steps into SQL gives the following table definitions for the
hierarchy:
CREATE TABLE N ( c integer,
                 d integer,
                 PRIMARY KEY (c));
CREATE TABLE P ( h integer,
                 g integer,
                 PRIMARY KEY (g));
-- M is weak: its rows are removed when the owning N row is deleted
CREATE TABLE M_R ( a integer,
                   b integer,
                   c integer,
                   PRIMARY KEY (a,c),
                   FOREIGN KEY (c) REFERENCES N ON DELETE CASCADE);
CREATE TABLE O ( e integer,
                 a integer,
                 c integer,
                 PRIMARY KEY (a,c),
                 FOREIGN KEY (a,c) REFERENCES M_R);
CREATE TABLE S ( f integer,
                 a integer,
                 c integer,
                 g integer,
                 PRIMARY KEY (a,c,g),
                 FOREIGN KEY (a,c) REFERENCES O,
                 FOREIGN KEY (g) REFERENCES P);
[Figure: E–R diagram fragment with the labels ‘Uses’, ‘Machinery’ and ‘Id’]
Common Query Language (CQL): This language, developed and maintained
by the Z39.50 Maintenance Agency, is a formal language for queries into search
engines, bibliographic catalogues, etc., and is based on the semantics of Z39.50.
The design objective of CQL is to make these queries readable and writable by
humans and intuitive, while maintaining the expressiveness of complex queries.
D: Although not a language in itself, D is a set of requirements, given by
Christopher J. Date and Hugh Darwen, for what they consider a proper query
language for truly relational database management systems (TRDBMS).
Structured Query Language (SQL): It is a query language, well known for
relational databases.
MDX: It is a query language for online analytical processing (OLAP)
databases. OLAP is part of business intelligence, encompassing relational reporting
and data mining. OLAP is generally applied in: (i) Business reporting for sales and
marketing, (ii) Management reporting, (iii) Business process management (BPM),
(iv) Budgeting and forecasting, (v) Financial reporting and similar areas.
Data Mining Extensions (DMX): It is a query language used in data mining
models. The syntax of DMX is more similar to SQL than MDX.
Entity Relationship Role-Oriented Language (ERROL): It is a declarative
query language for database query and manipulation. It is a query language working
on the Entity–Relationship model (ERM) using constructs like those of natural
languages, English and others. It is the most suited to relational models and based
on E–R diagrams, described by the use of simple natural language (NL) sentences.
These simple NL sentences can be combined to form compound or complex sentences
to describe complex operations upon ERM database.
Gellish English: It is a formal language used for queries in Gellish English
databases. It is used for dialogues and also for modelling information and
knowledge. Gellish English is a structured subset of natural English, formalized
in such a way that it can be interpreted by a computer. It is a standard for modelling
information and knowledge representation.
Information Systems Base Language (ISBL): It is a query language invented
for PRTV, one of the earliest database management systems that implemented
the relational model.
Lightweight Directory Access Protocol (LDAP): It is used to query and
modify directory services running over TCP/IP. The telephone directory is the
most common example of a directory: a series of names organized alphabetically,
with each name having an address and phone number attached.
Molecular Query Language (MQL): It is a query language for a substructure
search for numerical as well as nominal properties. MQL was designed for complex
and problem-specific search methods.
Object Query Language (OQL): It is a standard for object-oriented databases
modelled after SQL and was developed by the Object Data Management Group
(ODMG). Its overall complexity prevented many vendors from complete
implementation of OQL.
Object Constraint Language (OCL): It is a declarative language for describing
rules that apply to UML models, and it is now part of the UML standard. It is a
key component of the new OMG standard recommendation for transforming models,
Queries/Views/Transformations (QVT).
QUEL: It is an access language for relational databases, and is similar to
SQL in most ways.
SPARQL Protocol and RDF Query Language (SPARQL): It is an RDF
query language for graphs, and it has a name which is a recursive acronym.
Standardized by the RDF Data Access Working Group (DAWG) of the World Wide
Web Consortium, it is considered a component of the semantic web.
XQuery: It is a query language for XML data sources. It has a few features of
a programming language and is designed to query collections of XML
data. It is semantically similar to SQL.
XPath: It is a language designed to navigate XML documents.
XSQL: It is the combination of Extensible Markup Language (XML) and
Structured Query Language (SQL) to provide a language and database independent
means for storing SQL queries, clauses and query results. XSQL development is
still in its infancy and welcomes suggestions for improvement (especially in the
form of patches).
4. Set union
5. Set difference
6. Rename
These six operators are fundamental, and omitting any one of them reduces the
expressive power of relational expressions. Many other operators have been defined
in terms of these six. Among the most important are set intersection, division and the
natural join.
So, the following operations of relational algebra will be dealt with:
1. Selection
2. Projection
3. Set operations
4. Renaming
5. Joins
6. Division
2.9.2 Selection Operation
A selection is a unary operation. In relational algebra, a selection is written as
σaθb(R) or σaθv(R) where:
• σ stands for selection
• a and b denote attribute names
• θ denotes a binary comparison operator, such as <, ≤, =, ≠, ≥ or >
• v denotes a value constant
• R stands for a relation
The selection σaθb(R) selects all tuples in R for which θ holds between attributes
‘a’ and ‘b’.
The selection σaθv(R) selects all tuples in R for which θ holds between the
attribute ‘a’ and the value v.
A generalized selection is also a unary operation. It is written as σφ(R), where φ is
a propositional formula consisting of atoms as allowed in the normal selection,
combined with the logical operators conjunction, disjunction and negation. This
selection selects all those tuples in R for which φ holds.
For example, consider the following tables, in which the first table shows
the relation named Person, the second table shows the result of σAge ≥ 34(Person),
and the third table shows the result of σAge = Weight(Person).

Person
Name    Age   Weight
Manju   25    55
Sohan   29    70
Sonia   50    54
Patel   32    60

σAge ≥ 34(Person)
Name    Age   Weight
Sonia   50    54

σAge = Weight(Person)
Name    Age   Weight
(no tuples, since no person has an Age equal to the Weight)

Semantics of the selection is shown mathematically as:
$$\sigma_{a\,\theta\,b}(R) = \{\, t : t \in R,\ t[a]\ \theta\ t[b] \,\}$$
$$\sigma_{a\,\theta\,v}(R) = \{\, t : t \in R,\ t[a]\ \theta\ v \,\}$$
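The two forms of selection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a relation is represented as a list of dictionaries, the function names select_attr_value and select_attr_attr are hypothetical, and the data are the four Person tuples listed in the example above.

```python
import operator

# Person relation: each tuple is a dict keyed by attribute name (assumed representation).
person = [
    {"Name": "Manju", "Age": 25, "Weight": 55},
    {"Name": "Sohan", "Age": 29, "Weight": 70},
    {"Name": "Sonia", "Age": 50, "Weight": 54},
    {"Name": "Patel", "Age": 32, "Weight": 60},
]


def select_attr_value(relation, a, theta, v):
    """sigma_{a theta v}(R): keep tuples where theta holds between attribute a and constant v."""
    return [t for t in relation if theta(t[a], v)]


def select_attr_attr(relation, a, theta, b):
    """sigma_{a theta b}(R): keep tuples where theta holds between attributes a and b."""
    return [t for t in relation if theta(t[a], t[b])]


# theta is passed as an ordinary comparison function from the operator module.
age_ge_34 = select_attr_value(person, "Age", operator.ge, 34)
age_eq_weight = select_attr_attr(person, "Age", operator.eq, "Weight")
```

Passing θ as a function keeps the sketch close to the notation: the same two functions cover every comparison operator.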
2.9.3 Projection Operation
A projection is mathematically written as πa1,...,an(R), where a1,...,an is a set of
attribute names. It is a unary operation. The result is the set obtained when all
tuples in R are restricted to the set {a1,...,an}. All other attributes are
discarded.
For example, if we use two attributes, name and age, written as (name,
age), then the projection of the relation {(Raman, 5), (Ratan, 8)} onto the attribute list
(age) yields {5, 8}: name is discarded and only the value of the field age is kept.
Likewise, projecting the tuple (5, 8) onto its second component gives 8.
Projection in relational algebra is the counterpart of existential quantification
in predicate logic. The attributes excluded by the projection correspond to
existentially quantified variables in the predicate whose extension is represented
by the operand relation. Projection could therefore equally be defined by naming
the excluded attributes. ISBL provided notations for both forms, and other
languages have followed ISBL.
A closely related concept exists in the category of monoids: the projection of
a string, which removes one or more letters of the alphabet from the string.
Let us take an example. Two relations are presented in the following
tables: the relation Person and its projection on the attributes Age and
Weight:

Person
Name     Age   Weight
Hari     30    70
Shila    26    62
Ganesh   27    75
Sonia    50    55
Patel    30    70

πAge,Weight(Person)
Age   Weight
30    70
26    62
27    75
50    55
If the name is N and age is A and weight is W, then the predicate is, ‘N is A
years old and weighs W.’ Projection of this is the predicate, ‘There exists N such
that N is A years old and weighs W.’
In this example, Hari and Patel are of the same age and have the same
weight, but the (age, weight) combination appears only once in the result. This is because
the result is a relation, and a relation is a set.
Mathematically, semantics of projection are expressed as follows:
$$\pi_{a_1,\ldots,a_n}(R) = \{\, t[a_1,\ldots,a_n] : t \in R \,\}$$
where t[a1,...,an] is the restriction of the tuple t to the set {a1,...,an}.
Such a projection, πa1,...,an(R), is defined only if {a1,...,an} ⊆ Header(R). The
header contains attributes, and Header(R) is the set of all attributes of R. Projection on no
attributes is also possible. This yields a relation of degree zero.
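A minimal Python sketch of projection follows. It assumes a relation stored as a list of dictionaries and uses a hypothetical function name, project. The result is built as a Python set, so duplicate combinations collapse, as they do in a relation.

```python
# Person relation from the projection example (list-of-dicts representation is an assumption).
person = [
    {"Name": "Hari", "Age": 30, "Weight": 70},
    {"Name": "Shila", "Age": 26, "Weight": 62},
    {"Name": "Ganesh", "Age": 27, "Weight": 75},
    {"Name": "Sonia", "Age": 50, "Weight": 55},
    {"Name": "Patel", "Age": 30, "Weight": 70},
]


def project(relation, attrs):
    """pi_{attrs}(R): restrict every tuple to attrs; the set removes duplicates."""
    attrs = list(attrs)
    # The projection is defined only if attrs is a subset of the header of R.
    if relation and not set(attrs) <= set(relation[0]):
        raise ValueError("projection attributes must be a subset of the header")
    return {tuple(t[a] for a in attrs) for t in relation}


age_weight = project(person, ["Age", "Weight"])
# Projection on no attributes gives a relation of degree zero holding one empty tuple.
degree_zero = project(person, [])
```

Hari and Patel both contribute the pair (30, 70), so the five input tuples yield four result tuples.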
Relational algebra has the same expressive power as domain
relational calculus or tuple relational calculus, but it has less expressive power
than first-order predicate calculus without function symbols. In terms of expressiveness, relational
algebra corresponds to a subset of first-order logic, namely Horn clauses with no recursion
and no negation.
Natural join
It is a binary operator, written as (R ⋈ S), where R and S are two relations. The result
of this operation is the set of all combinations of tuples in R and S that are equal on their common
attribute names. For example, the natural join of two tables named Employee and Dept
is (Employee ⋈ Dept), as shown in Table 2.3.
Table 2.3 Natural Join
Columns: Employee(Name, ID, Dept_N), Dept(Dept_N, Manager), Employee ⋈ Dept(Name, ID, Dept_N, Manager)
The natural join is the relational counterpart of logical AND, and has great
importance. The natural join permits the combination of relations associated by a foreign
key. For example, in Table 2.3, a foreign key probably holds from
Employee.Dept_N to Dept.Dept_N, and then the natural join of Employee and Dept
combines every employee with their respective department. Note that this works
because the foreign key holds between attributes with the same name. If this is not
the case, such as in the foreign key from Dept.Manager to Employee.ID, then these columns
have to be renamed before the natural join is taken. Such a join is also known as an
equijoin. Semantics of the natural join are given as follows:
$$R \bowtie S = \{\, t \cup s : t \in R,\ s \in S,\ t[c] = s[c] \text{ for every attribute } c \text{ common to } R \text{ and } S \,\}$$
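The natural join can be sketched in Python as a nested loop over both relations. The function name natural_join and the sample rows are hypothetical, chosen only to match the column names Name, ID, Dept_N and Manager; they are not the data of Table 2.3.

```python
def natural_join(r, s):
    """R natural-join S: merge every pair of tuples that agree on all common attribute names."""
    result = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            # With no common attributes this condition is always true,
            # so the natural join degenerates to the Cartesian product.
            if all(t[c] == u[c] for c in common):
                result.append({**t, **u})
    return result


# Illustrative rows only (assumed data).
employee = [
    {"Name": "Asha", "ID": 1, "Dept_N": "Sales"},
    {"Name": "Ravi", "ID": 2, "Dept_N": "Finance"},
    {"Name": "Meena", "ID": 3, "Dept_N": "Sales"},
]
dept = [
    {"Dept_N": "Sales", "Manager": "Kiran"},
    {"Dept_N": "Finance", "Manager": "Leela"},
    {"Dept_N": "Production", "Manager": "Mohan"},
]

joined = natural_join(employee, dept)
```

The Production department has no employee in this sample, so it does not appear in the result; every employee is combined with the single department that shares its Dept_N value.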
Table 2.4 θ-join and equijoin
If you want to combine tuples from two relations where the combination
condition is not simply the equality of shared attributes, then it is convenient to
have a more general form of join operator, which is the θ-join (or theta-join). The θ-
join is a binary operator that is written as:
$$R \bowtie_{a\,\theta\,b} S \quad \text{or} \quad R \bowtie_{a\,\theta\,v} S$$
where a and b are attribute names, θ is a binary comparison operator, v is a value
constant, and R and S are relations.
Table 2.5 Semijoin
Columns: Employee(Name, EmpId, DeptName), Dept(DeptName, Manager), Employee ⋉ Dept(Name, EmpId, DeptName)
2.11 RENAMING
Renaming of columns in a table is available in Oracle from Oracle 9i Release 2 onwards.
To rename a column in an existing table, the ALTER TABLE syntax is as follows:
ALTER TABLE table_name
RENAME COLUMN <old_column_name> TO <new_column_name>;
For example:
ALTER TABLE emp RENAME COLUMN ename TO empname;
The column itself remains; only its name changes from ename to empname.
To Rename a Column in SQL
SQL> ALTER TABLE NEW_TEST RENAME COLUMN ID TO TEST_ID;
Table altered.
Constraints and indexes can be renamed as well. Let us check the existing
primary key constraint and corresponding index first.
SQL> select constraint_name, constraint_Type from
user_constraints where table_name = 'NEW_TEST';
CONSTRAINT_NAME C
PK_NEW P
SQL> select index_name, index_Type from user_indexes where
table_name = 'NEW_TEST';
INDEX_NAME INDEX_TYPE
PK_NEW NORMAL
The constraint is renamed by using the following command:
SQL> ALTER TABLE new_test RENAME CONSTRAINT PK_NEW to
PK_NEW_TEST;
Table altered.
Now, if you re-run the above two SELECT statements to see the value of the
constraint and index, you will notice that the constraint has been renamed but not
the index. In Oracle, we need to rename the primary key constraint and its associated
index individually. The index can now be renamed:
SQL> ALTER INDEX PK_NEW RENAME TO PK_NEW_TEST;
Index altered.
When a table is renamed, its dependent objects become invalid. The dependent
objects of a table can be found by retrieving information
from the user_dependencies view.
The following is the SQL to get the dependent objects so that we know in
which other places we need to make the corresponding change.
SELECT Type, Name
FROM USER_DEPENDENCIES
WHERE Referenced_Name = 'NEW_TEST'
AND Referenced_Type = 'TABLE'
/
The output will be displayed as follows:
TYPE NAME
PROCEDURE TEST_PROC
TRIGGER TIB_NEW
It indicates that we need to change trigger and procedure.
When a column is renamed, the names of the procedures, views, triggers, etc.,
that depend on it are not available through the user_dependencies view. For this, you have to search in the
source code of each object for a matching column name. The following is the SQL to
get the dependent objects which depend on a column:
SELECT Type, Name, Line
FROM user_source
WHERE text like '%ID%'
/
Output is
TYPE NAME LINE
PROCEDURE TEST_PROC 5
TRIGGER TIB_NEW 6
TRIGGER TIB_NEW 7
The output also indicates the line number you need to look at to make
appropriate changes.
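The same two steps, finding the objects whose definition mentions a column and then renaming the column, can be tried without an Oracle installation. The sketch below is an analogy only: it uses SQLite through Python's sqlite3 module, assumes SQLite 3.25 or later (the first version with RENAME COLUMN), and searches the sqlite_master catalogue where Oracle would use user_source. The table new_test, its columns, the index idx_new_label and the helper column_names are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_test (id INTEGER PRIMARY KEY, label TEXT)")
conn.execute("CREATE INDEX idx_new_label ON new_test (label)")


def column_names(table):
    """Return the column names of a table, read from PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


before = column_names("new_test")

# Search the stored definitions for the column name, much as the
# user_source query above searches the source text in Oracle.
dependents = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE sql LIKE '%label%' ORDER BY name"
).fetchall()

# Requires SQLite 3.25 or later.
conn.execute("ALTER TABLE new_test RENAME COLUMN label TO test_label")
after = column_names("new_test")
```

Unlike Oracle, SQLite rewrites the stored definitions of dependent indexes when a column is renamed, so the text search is shown here only to mirror the approach described above.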
2.12 JOINS
SQL joins are used to query data from two or more tables, based on a relationship
between certain columns in these tables.
Joins between Tables
A query that accesses multiple rows of the same or different tables at one time is
called a join query.
Storing data in several relations, and hence writing queries that involve more
than one table, is fundamental to relational databases.
To combine relations, i.e., to perform a join, we have to select data from the
tables and relate the tables to each other with conditions on some attributes (often
key attributes).
The JOIN keyword is used in an SQL statement for this purpose.
Tables in a database are often related to each other with keys.
JOINS
• Join conditions should be specified in the WHERE clause; a minimum
of (n–1) conditions is required to join n tables.
• You must add table prefixes to column names that appear in more than one table.
• Using table aliases (30 characters maximum) saves keystrokes.
• Table aliases are specified in the FROM clause.
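The points above can be illustrated with a join of three tables, which needs at least two join conditions. The sketch uses SQLite through Python's sqlite3 module. The tables student, enrol and course and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course (cid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE enrol (sid INTEGER, cid INTEGER);
INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO course VALUES (10, 'DBMS'), (20, 'Networks');
INSERT INTO enrol VALUES (1, 10), (1, 20), (2, 10);
""")

# Three tables (n = 3), so two join conditions (n - 1) appear in the WHERE clause.
# The aliases s, e and c are declared in the FROM clause and prefix every column.
rows = conn.execute("""
SELECT s.name, c.title
FROM student s, enrol e, course c
WHERE s.sid = e.sid
  AND e.cid = c.cid
ORDER BY s.name, c.title
""").fetchall()
```

Leaving out either condition would pair rows that are unrelated, which is why n–1 is the minimum.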
Types of Joins
• Equijoin
• Non-Equijoin: uses any comparison operator other than ( = )
• Outer Join: the (+) operator can be on either side of the join condition in the WHERE
clause, but it must be on the side of the table that lacks the matching rows. Only the
operators ( = ) and AND can be used.
• Self Join
A primary key is a column (or a combination of columns) with a unique value
for each row. Each primary key value must be unique within the table. The purpose
is to bind data together, across tables, without repeating all of the data in every
table. (See Table 2.7).
Table 2.7 ‘Persons’ Table
Note that the ‘P_Id’ column is the primary key in the ‘Persons’ table. This
means that no two rows can have the same P_Id. The P_Id distinguishes two persons
even if they have the same name. (See Table 2.8).
Table 2.8 ‘Orders’ Table
Note that the ‘O_Id’ column is the primary key in the ‘Orders’ table and that
the ‘P_Id’ column refers to the persons in the ‘Persons’ table without using their
names.
Notice that the relationship between the two tables above is the ‘P_Id’ column.
Different SQL JOINs
Before we continue with examples, we will list the types of JOINs you can use, and
the differences between them.
• JOINs: Return rows when there is at least one match in both tables
• LEFT JOINs: Return all rows from the left table, even if there are no matches
in the right table
• RIGHT JOINs: Return all rows from the right table, even if there are no
matches in the left table
• FULL JOINs: Return rows when there is a match in one of the tables
An equijoin is a join based on equality or matching column values. This
equality is indicated with an equal sign (=) as the comparison operator in the WHERE
clause, as Figure 2.8 shows.
SELECT * FROM manufact, stock
WHERE manufact.manu_code = stock.manu_code
Figure 2.8 joins the manufact and stock tables on the manu_code column. It
retrieves only those rows for which the values of the two columns are equal, some
of which are shown in Figure 2.9.
manu_code    SMT
manu_name    Smith
lead_time    3
stock_num    1
manu_code    SMT
description  baseball gloves
unit_price   $450.00
unit         case
unit_descr   10 gloves/case

manu_code    SMT
manu_name    Smith
lead_time    3
stock_num    5
manu_code    SMT
description  tennis racquet
unit_price   $25.00
unit         each
unit_descr   each

manu_code    SMT
manu_name    Smith
lead_time    3
stock_num    6
manu_code    SMT
description  tennis ball
unit_price   $36.00
unit         case
unit_descr   24 cans/case

manu_code    ANZ
manu_name    Anza
lead_time    5
stock_num    5
manu_code    ANZ
description  tennis racquet
unit_price   $19.80
unit         each
unit_descr   each
...
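The equijoin on manu_code can be reproduced on a small scale with SQLite through Python's sqlite3 module. The sketch below copies a few values from the rows shown above. The reduced column list and the manufacturer 'XYZ', which has no stock, are assumptions added to show that unmatched rows are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manufact (manu_code TEXT PRIMARY KEY, manu_name TEXT, lead_time INTEGER);
CREATE TABLE stock (stock_num INTEGER, manu_code TEXT, description TEXT, unit_price REAL);
INSERT INTO manufact VALUES ('SMT', 'Smith', 3), ('ANZ', 'Anza', 5), ('XYZ', 'Unmatched', 9);
INSERT INTO stock VALUES
    (1, 'SMT', 'baseball gloves', 450.00),
    (5, 'SMT', 'tennis racquet', 25.00),
    (5, 'ANZ', 'tennis racquet', 19.80);
""")

# Equijoin: only rows whose manu_code values are equal in both tables are returned.
equijoin_rows = conn.execute("""
SELECT m.manu_name, s.stock_num, s.description
FROM manufact m, stock s
WHERE m.manu_code = s.manu_code
ORDER BY m.manu_code, s.stock_num
""").fetchall()
```

The manufacturer with code 'XYZ' has no stock row with an equal manu_code, so it is absent from the result.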
Figure 2.10 joins the customer_num column and then selects only those
rows where the call_dtime in the cust_calls table is greater than or equal to the
ship_date in the orders table. Figure 2.11 shows the combined rows that it returns.
order_num     1004
order_date    05/22/1998
ship_date     05/30/1998
customer_num  106
call_dtime    1998-06-12 08:20
user_id       maryj
call_code     D
call_descr    Order received okay, but two of the cans of
              ANZ tennis balls within the case were empty
res_dtime     1998-06-12 08:25
res_descr     Authorized credit for two cans to customer,
              issued apology. Called ANZ buyer to report
              the qa problem.

order_num     1008
order_date    06/07/1998
ship_date     07/06/1998
customer_num  110
call_dtime    1998-07-07 10:24
user_id       richc
call_code     L
call_descr    Order placed one month ago (6/7) not received.
res_dtime     1998-07-07 10:30
res_descr     Checked with shipping (Ed Smith). Order out
              yesterday-was waiting for goods from ANZ.
              Next time will call with delay if necessary.

order_num     1023
order_date    07/24/1998
ship_date     07/30/1998
customer_num  127
call_dtime    1998-07-31 14:30
user_id       maryj
call_code     I
call_descr    Received Hero watches (item # 304) instead
              of ANZ watches
res_dtime
res_descr     Sent memo to shipping to send ANZ item 304
              to customer and pickup HRO watches. Should
              be done tomorrow, 8/1
FIRST_NAME DEPARTMENT_NAME
Steven Executive
Neena Executive
Lex Executive
Alexander IT
Bruce IT
The syntax will be the same as that of the inner join, but the operator will be not
equal ‘<>’.
For example:
SELECT projects.name AS "Project Name", teachers.name
FROM teachers, projects
WHERE teachers.id != projects.teacher;
With table aliases, you can join a table to itself. In this case, the same table is given
two different alias names. Each alias then represents a different instance of the
table.
Example ‘projects where students 2 and 4 work together’:
SELECT DISTINCT *
FROM assign a, assign b
WHERE a.stud = 2 AND b.stud = 4 AND a.project = b.project;
The general pattern of such a query is:
SELECT a.field1, b.field2 FROM R1 a, R1 b WHERE a.field
= b.field
A self join is a join of a table to itself. The table appears twice (or more) in
the FROM clause, each time followed by a table alias that qualifies column names in the
join condition and the SELECT clause. Take, for example, the employees table:
the manager of one employee is also an employee, so the rows for both are in the
same employees table.
To get information about an employee and his/her manager, you have to join
the employees table to itself, thereby treating employees as if it were two separate
tables. The following example query uses a self join to return the name of each
employee along with the name of the employee’s manager:
SELECT e.last_name employee, m.last_name manager
FROM employees e INNER JOIN employees m
ON e.manager_id = m.employee_id;
EMPLOYEE MANAGER
Kochhar King
De Haan King
Hunold De Haan
Ernst Hunold
Austin Hunold
Pataballa Hunold
...
But counting the rows of the employees table gives:
SELECT COUNT(*) FROM employees;
COUNT(*)
107
Even though the employees table has 107 rows, the previous query returned
only 106 rows. This is because there is an employee without a manager_id. Oracle
excludes that employee’s row from the result set while performing the self inner
join. To include that employee without a manager, you need an outer join as follows:
SELECT e.last_name employee, m.last_name manager
FROM employees e LEFT OUTER JOIN employees m
ON e.manager_id = m.employee_id;
EMPLOYEE MANAGER
King
Kochhar King
De Haan King
Hunold De Haan
Ernst Hunold
Austin Hunold
Pataballa Hunold
...
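The difference between the inner and the outer self join can be checked on a small table. The sketch below uses SQLite through Python's sqlite3 module with five of the employees named above. The employee_id values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, last_name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
    (100, 'King', NULL),
    (101, 'Kochhar', 100),
    (102, 'De Haan', 100),
    (103, 'Hunold', 102),
    (104, 'Ernst', 103);
""")

# Inner self join: the employee without a manager_id finds no match and is dropped.
inner_rows = conn.execute("""
SELECT e.last_name, m.last_name
FROM employees e INNER JOIN employees m ON e.manager_id = m.employee_id
ORDER BY e.employee_id
""").fetchall()

# Left outer self join: every employee is kept; a missing manager appears as NULL.
outer_rows = conn.execute("""
SELECT e.last_name, m.last_name
FROM employees e LEFT OUTER JOIN employees m ON e.manager_id = m.employee_id
ORDER BY e.employee_id
""").fetchall()
```

With five rows in the table, the inner join returns four rows and the outer join returns five, mirroring the 106 versus 107 rows discussed above.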
The previous examples showed self equijoins. However, there are situations
when you need to perform self non-equijoins. For example, let’s assume that you
are in charge of organizing an interdepartmental soccer competition for the following
departments:
SELECT department_name
FROM departments
WHERE department_id IN (10, 20, 30, 40);
DEPARTMENT_NAME
Administration
Marketing
Purchasing
Human Resources
4 row(s) retrieved
You decide that each department plays against the other three departments
only once, which means that the output of the query you are going to write cannot
contain one combination Administration, Marketing and another Marketing,
Administration because that means that each department plays against the other
twice. The solution is to use a self non-equijoin query as follows:
SELECT d1.department_name dept1, d2.department_name dept2
FROM departments d1, departments d2
WHERE d1.department_id < d2.department_id
AND d1.department_id IN (10, 20, 30, 40)
AND d2.department_id IN (10, 20, 30, 40);
DEPT1 DEPT2
Administration Marketing
Administration Purchasing
Administration Human Resources
Marketing Purchasing
Marketing Human Resources
Purchasing Human Resources
6 row(s) retrieved
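The pairing of departments can be verified with SQLite through Python's sqlite3 module. The sketch assumes that the four departments carry the ids 10, 20, 30 and 40 in the order in which they are listed above. The comparison d1.department_id < d2.department_id is what keeps each pair only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
INSERT INTO departments VALUES
    (10, 'Administration'),
    (20, 'Marketing'),
    (30, 'Purchasing'),
    (40, 'Human Resources');
""")

# Self non-equijoin: "<" keeps (A, B) but rejects (B, A) and (A, A).
fixtures = conn.execute("""
SELECT d1.department_name, d2.department_name
FROM departments d1, departments d2
WHERE d1.department_id < d2.department_id
ORDER BY d1.department_id, d2.department_id
""").fetchall()
```

Four departments give 4 × 3 / 2 = 6 matches, which agrees with the six rows listed above.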
OUTER JOIN is used to join two tables even if there is not a match. An
OUTER JOIN can be used to return a list of all the customers and their orders even if
no orders have been placed for some of the customers. A keyword, RIGHT or LEFT,
is used to specify which side of the join returns all possible rows. LEFT is generally
preferred, because it is natural to list the most important table first. Except for one
example demonstrating the RIGHT OUTER JOIN, we will use left joins. Note that
the keywords INNER and OUTER are optional.
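The next example returns a list of all the customers and the SalesOrderID for
the orders that have been placed, if any.
SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID =
s.CustomerID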
It uses the LEFT keyword because the Sales.Customer table is located on the
left side and we want all rows returned from that table even if there is no match in
the Sales.SalesOrderHeader table. This is an important point. Notice also that the
CustomerID column is the primary key of the Sales.Customer table and a foreign
key in the Sales.SalesOrderHeader table. This means that there must be a valid
customer for every order placed. A query that returns all orders together with their
customers, if they match, would add nothing here, because every order already has a
customer. In such a case, the table holding the primary key should be the LEFT table of
the LEFT OUTER JOIN.
If the location of the tables in the query are switched, the RIGHT keyword is
used and the same results are returned:
SELECT c.CustomerID, s.SalesOrderID
FROM Sales.SalesOrderHeader s
RIGHT OUTER JOIN Sales.Customer c ON c.CustomerID =
s.CustomerID
FULL OUTER JOIN returns all the joined rows, plus one row for each unmatched
left-hand row (extended with nulls on the right), plus one row for each unmatched
right-hand row (extended with nulls on the left).
(LEFT) OUTER JOIN example ‘project-percentage assignment’:
SELECT assign.project, projects.name, assign.percentage
FROM projects LEFT OUTER JOIN assign
ON projects.id = assign.project ;
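The three parts of a FULL OUTER JOIN result can be made visible by building it from a LEFT OUTER JOIN plus the unmatched right-hand rows. The sketch below uses SQLite through Python's sqlite3 module. The emulation with UNION ALL is a deliberate substitute, because older SQLite versions do not accept the FULL OUTER JOIN keywords. The rows, including an assignment to a non-existent project 9, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE assign (stud INTEGER, project INTEGER, percentage INTEGER);
INSERT INTO projects VALUES (1, 'Compiler'), (2, 'Database');
INSERT INTO assign VALUES (2, 1, 50), (4, 1, 50), (5, 9, 100);
""")

# LEFT OUTER JOIN: joined rows plus unmatched projects (NULLs on the right).
left_rows = conn.execute("""
SELECT p.name, a.stud, a.percentage
FROM projects p LEFT OUTER JOIN assign a ON p.id = a.project
""").fetchall()

# Emulated FULL OUTER JOIN: the rows above plus unmatched assignments (NULLs on the left).
full_rows = conn.execute("""
SELECT p.name, a.stud, a.percentage
FROM projects p LEFT OUTER JOIN assign a ON p.id = a.project
UNION ALL
SELECT NULL, a.stud, a.percentage
FROM assign a
WHERE NOT EXISTS (SELECT 1 FROM projects p WHERE p.id = a.project)
""").fetchall()
```

The Database project has no assignment and student 5 is assigned to a project that does not exist, so each contributes one row padded with NULLs.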
2.13 DIVISION
The division is written as R ÷ S. It is a binary operation. The result consists of the
restrictions of tuples in R to the attribute names unique to R, i.e., the attributes in the header
of R but not in the header of S, for which all combinations with tuples in S are present
in R. For example, the following tables named Completed and DBProject are
shown along with their division:
Completed
Student   Task
Farhan    Database1
Farhan    Database2
Farhan    Compiler1
Heena     Database1
Heena     Compiler1
Sona      Database1
Sona      Database2

DBProject
Task
Database1
Database2

Completed ÷ DBProject
Student
Farhan
Sona
If DBProject has all the tasks of the Database project, then the division of
table named Completed by table DBProject contains all students who have completed
the Database project.
Semantics of the division is expressed mathematically as:
$$R \div S = \{\, t[a_1,\ldots,a_n] : t \in R \wedge \forall s \in S\ \big( (t[a_1,\ldots,a_n] \cup s) \in R \big) \,\}$$
where {a1,...,an} is the set of attribute names unique to R and t[a1,...,an] is the
restriction of t to this set. The requirement is that the attribute names in the header
of S be a subset of those of R; otherwise the result of the division operation will be empty.
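Division can be sketched in Python by testing, for every candidate restriction of a tuple of R, whether its combination with each tuple of S is present in R. The function name divide and the list-of-dicts representation are assumptions. The data are the Completed and DBProject tables shown above.

```python
def divide(r, r_header, s, s_header):
    """R / S: restrictions of R-tuples to the attributes unique to R whose
    combination with every tuple of S is present in R."""
    unique = [a for a in r_header if a not in s_header]
    r_set = {tuple(t[a] for a in r_header) for t in r}
    candidates = {tuple(t[a] for a in unique) for t in r}
    result = set()
    for cand in candidates:
        cand_map = dict(zip(unique, cand))
        # Every combination (candidate, s-tuple) must be a tuple of R.
        if all(tuple({**cand_map, **u}[a] for a in r_header) in r_set for u in s):
            result.add(cand)
    return result


completed = [
    {"Student": "Farhan", "Task": "Database1"},
    {"Student": "Farhan", "Task": "Database2"},
    {"Student": "Farhan", "Task": "Compiler1"},
    {"Student": "Heena", "Task": "Database1"},
    {"Student": "Heena", "Task": "Compiler1"},
    {"Student": "Sona", "Task": "Database1"},
    {"Student": "Sona", "Task": "Database2"},
]
db_project = [{"Task": "Database1"}, {"Task": "Database2"}]

finished = divide(completed, ["Student", "Task"], db_project, ["Task"])
```

Heena has completed Database1 but not Database2, so the combination (Heena, Database2) is missing from Completed and Heena is excluded from the result.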
2.14 SUMMARY
In this unit, you have learned about integrity constraints over relations. These provide
a way of ensuring that changes made to a database by authorized users do not result
in a loss of data consistency. An integrity constraint can be any arbitrary predicate
applied to the database. Data integrity allows the definition of certain data quality
requirements that the data in the database needs to meet. In this context, primary
key, foreign key, general constraints and additional integrity constraints were also
explained.
You have also learned about the E–R model, which is a data modelling
technique that creates a graphical representation of entities and the relationships
between entities within an information system. It views the real world as a set of
basic objects (entities) and relationships among these objects. It is intended primarily
for the database design process by allowing the specification of an enterprise scheme.
It represents the overall logical structure of the database.
Translating relationship sets with participation constraints, translating weak
entity sets and translating class hierarchies are the other important topics that you
have learned in this unit. The participation constraint specifies whether the existence
of an entity is dependent on the existence of another entity to which it must be
related. There are two types of participation constraints—total and partial.
You have also learned about translating E–R diagrams with aggregation.
Aggregation is used to deal with a relationship set as an entity set for participation
with other relationships. It is a special kind of association that specifies a whole or
part relationship between the aggregate (whole) and a component part.
In this unit, you were also introduced to query languages and relational algebra.
An information retrieval query language makes queries into a database, but the
semantics of a query are not defined by a precise formal syntax; they are based on
an interpretation of the most suitable results of the
query put to the information base. Relational algebra is widely used in computer
science. It is based on the algebra of sets and is an offshoot of first-order logic. It is
concerned with a set of relations together with operators which operate on one or
more relations, yielding another relation. The other important topics that you have
learned in this unit are set operations, renaming, joins and division.
2.15 KEY TERMS
2.17 QUESTIONS AND EXERCISES
Short-Answer Questions
1. Write a short note on the concept of key.
2. What are general constraints?
3. What do you understand by translating?
4. Write short notes on any two: QL, MDX, Entity Relationship Role-Oriented
Language.
5. Briefly explain primitive operations.
6. What are set operations?
7. Write a note on binary joins.
8. What is renaming?
Long-Answer Questions
1. What are the integrity constraints over relations?
2. Explain the INSERT, DELETE and UPDATE operations.
3. Discuss the E–R model in comparison with the relational model.
4. How is the translating of relationship sets with participation constraints done?
5. Explain SQL joins.
Structured Query Language
3.0 INTRODUCTION
In this unit, you will learn about structured query language (SQL) and get an overview of
it. It is a language for interacting with relational database management systems (RDBMSs),
and it has many advantages. This unit will also help you understand
basic queries in SQL and UNION. A UNION, as defined in relational algebra, is the
set union of two relations.
You will also study INTERSECT and EXCEPT. INTERSECT is the
intersection of two relations; it returns only those rows that are returned by both component
queries. It is the logical product of two sets. Another important topic dealt with in this
unit is nested queries. These are queries within a query, similar to the nesting of
conditional statements, one within the other.
You will learn about aggregate operators and null values. This unit will also examine
complex integrity constraints in SQL. An integrity constraint is a user-defined rule for a column or
set of columns that allows or disallows inserts, updates or deletes of a row, based on the
value it contains for the column or set of columns.
In this unit, you will also learn about triggers and views in SQL. A trigger is a
special kind of PL/SQL anonymous block. The discussion covers database triggers, dropping triggers,
etc. Finally, you will learn about embedded SQL and cursors.
3.1 UNIT OBJECTIVES
After going through this unit, you will be able to:
• Understand structured query language
• Explain how basic queries are made in SQL
• Understand union, intersect and except
• Describe nested queries and aggregate operators
• Explain null values
• Describe complex integrity constraints in SQL
• Understand triggers and views in SQL
• Explain embedded SQL and cursors
but there is no need to indicate how to get it. The DBMS examines the SQL statement
and plans the way to get the information needed.
• Language for all users: SQL can be used by all types of users, such as DBAs,
application programmers, naïve users and others.
• Standardized: The standard is established by the American National Standards Institute
(ANSI), an organization that approves standards in many different
industries. In 1987, the
International Organization for Standardization (ISO) recognized the ANSI SQL standard
as the international standard. The standard was revised in 1992 and was
called SQL 92. A later revision is called SQL 99; it is also referred to
as SQL 3, and many vendors support it.
• Dynamic data definition: For embedded and multi-level database queries, SQL
gives advanced data processing commands.
Due to these factors, SQL has become universally accepted by many commercial relational DBMS
products, such as ORACLE, DB2, INGRES, SYBASE and others.
3.2.1 SQL in Oracle
Like other RDBMS packages, Oracle uses SQL to provide the elementary functions for
defining, manipulating and controlling data. It also helps in defining, storing and managing data in a
centralized area so that other users can share them. Some terminologies need to be
defined when working with Oracle. A database is a collection of a defined group of
database objects like tables, indexes, tablespaces, storage areas, etc. A tablespace is a
database object that defines the physical structure of the data sets used to hold the data.
A table is another database object. Each table comprises a set of rows (horizontally)
and columns (vertically) that represent the conceptual view of the data to be stored. A
column is a single non-decomposable data element in a table. A view is a virtual table.
Generally, it neither stores data nor exists physically. A tablespace owns all the tables
contained in it, along with the indexes, views and columns defined for
them.
When a view is accessed, the SQL statement that defines it is executed to get
back the desired data. Indexes are used to access the data stored in a table efficiently.
An index presents table data in its logical sequence rather than its physical sequence.
When a query is issued with one or more tables as inputs, SQL outputs a single
table known as a result set. The result table may be composed of many columns and rows, or it
may contain just a single value taken from one column and one row. The most common
SQL operation is retrieving data from a database. A database retrieval
is called a query, and the SELECT command is used to issue it. The selection operation
retrieves a specified subset of rows from a table, and some primary keywords are used
when selecting information from a table. These are SELECT, FROM, WHERE and ORDER
BY. SELECT and FROM are employed in every query to choose data from tables.
Forms of SQL
There are two forms of SQL:
• Interactive
• Embedded
Interactive SQL operates on a database to produce output on user demand.
In embedded SQL, SQL commands can be put inside a program written in some
other language (called the host language) like C, C++, etc. Data is passed to the program
environment through SQL. The combined source code is accepted by a special SQL
precompiler and, along with other tools, is converted into an executable program.
Types of SQL commands
SQL commands are of various types to suit different purposes.
The primary types are as follows:
• Data Definition Language (DDL)
• Data Manipulation Language (DML)
• Data Query Language (DQL)
• Data Control Language (DCL)
• Transactional Control Language (TCL)
Data Definition Language (DDL)
Data Definition Language (DDL) is a part of SQL that allows a database user to create
and restructure database objects, such as the creation or deletion of a table. Some of the
most fundamental DDL commands include the following:
CREATE
ALTER
DROP
also control the distribution of privileges among users. Some data control commands are
as follows:
GRANT
REVOKE
Transactional Control Language (TCL)
In addition to the previously introduced categories of commands, the following commands
allow the user to manage database transactions:
COMMIT
ROLLBACK
SAVEPOINT
SET TRANSACTION
• Be 1 to 30 characters long (database names must not exceed 8 characters)
• Not contain a quotation mark
Notations used in the syntax of SQL commands
In this unit, lowercase is used to identify names or conditions entered by the user, and
uppercase is used to identify SQL keywords.
• A user-given data or object name, expression or condition is enclosed within
‘<’ and ‘>’.
• Square brackets (‘[]’) identify optional items. Do not include the brackets
when entering a query.
• A vertical bar (‘|’) indicates a choice and underlining indicates a default.
• Ellipses (‘...’) are used to specify items that may repeat.
Output:
ENAME SALARY SALARY*12
KOUSHIK GHOSH 5000 60000
JAYANTA DUTTA 3500 42000
HARI NANDAN TUNGA 4000 48000
JAYANTA GANGULY 6000 72000
RAJIB HALDER 4000 48000
JISHNU BANERJEE 6500 78000
RANI BOSE 3000 36000
GOUTAM DEY 5000 60000
PINAKI BOSE 5500 66000
9 rows selected.
Operator precedence
• Multiplication and division take precedence over addition and subtraction.
• Operators of the same priority are evaluated from left to right.
• Parentheses are used to clarify statements and to force prioritized evaluation.
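These three rules can be checked with a single query. The sketch below runs it in SQLite through Python's sqlite3 module. The numbers are arbitrary values chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1st column: multiplication binds tighter than addition.
# 2nd column: parentheses force the addition to be evaluated first.
# 3rd column: "/" and "*" have equal priority, so evaluation runs left to right.
# 4th column: parentheses override the left-to-right order.
row = conn.execute(
    "SELECT 5000 + 500 * 12, (5000 + 500) * 12, 100 / 10 * 5, 100 / (10 * 5)"
).fetchone()
```

The first two columns differ only by the parentheses, as do the last two.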
Column Aliases in SELECT
To provide an alternate name to a column or expression, aliases are used that appear as
column heading in the result set. For the above query, the following command can be
used:
SELECT ename "Employee Name", salary "Salary", salary*12
"Annual salary" FROM emp;
Output:
9 rows selected.
A column alias can also be given using ‘AS’ followed by the alias name after the column
name. The only restriction is that an unquoted alias must not contain spaces.
SELECT ename AS EmployeeName, salary AS Salary, salary*12
AS Annual_salary FROM emp;
Output:
EmployeeName Salary Annual_salary
KOUSHIK GHOSH 5000 60000
JAYANTA DUTTA 3500 42000
HARI NANDAN TUNGA 4000 48000
JAYANTA GANGULY 6000 72000
RAJIB HALDER 4000 48000
JISHNU BANERJEE 6500 78000
RANI BOSE 3000 36000
GOUTAM DEY 5000 60000
PINAKI BOSE 5500 66000
9 rows selected.
The || operator is used to concatenate character expressions, e.g.,
SELECT 'Designation of ' || ename || ' is ' || desg
FROM emp;
Output:
'DESIGNATION OF' || ENAME || 'IS' || DESG
Designation of KOUSHIK GHOSH is SYSTEM ANALYST
Designation of JAYANTA DUTTA is PROGRAMMER
Designation of HARI NANDAN TUNGA is PROGRAMMER
Designation of JAYANTA GANGULY is ACCOUNTANT
Designation of RAJIB HALDER is CLERK
Designation of JISHNU BANERJEE is SYSTEM MANAGER
Designation of RANI BOSE is PROJECT ASSISTANT
Designation of GOUTAM DEY is PROGRAMMER
Designation of PINAKI BOSE is PROGRAMMER
9 rows selected.
Removing Duplicates: DISTINCT
Strictly speaking, it is incorrect to refer to SQL tables as relations, because SQL queries may
return duplicate tuples. SQL provides a mechanism to eliminate duplicates: specify the
keyword DISTINCT after SELECT.
The default display of queries is all rows, including duplicate rows.
SELECT dno FROM emp;
Output:
DNO
D01
D01
D02
D03
D03
D02
D01
D01
D02
9 rows selected.
The above command displays all the department numbers, including duplicates.
If we instead use DISTINCT, each department number appears only once:
SELECT DISTINCT dno FROM emp;
Output:
DNO
D01
D02
D03
WHERE Clause and Relational Operator
A relational condition is fulfilled when the expressions on either side of
the relational operator satisfy the relation set up by the operator. There are two other
differences between the SQL operators and the operators used in C/C++. First, the equality
comparison and assignment operators, both represented by a single equal sign (=), are
the same in SQL. Ambiguity is resolved by context. Second, the standard SQL inequality
operator is represented by angle brackets (<>), though Oracle also supports the C/C++-
style (!=).
Comparisons with numeric column
Query: Display all employees getting salary over Rs 5000
SELECT * FROM emp WHERE salary > 5000;
Output:
• Date values are format-sensitive and character values are case-sensitive.
• The default date format is ‘DD-MON-YY’.
Query: List the details of the employee whose name is ‘JAYANTA DUTTA’
SELECT * FROM emp WHERE ename = 'JAYANTA DUTTA';
Output:
ECODE ENAME SALARY DNO DESG DT_JN
E02 JAYANTA DUTTA 3500 D01 PROGRAMMER 15-JAN-94
Query: Find the code, name and salary of employees of the department ‘D01’
SELECT ecode, ename, salary FROM emp WHERE dno='D01';
Output:
ECODE ENAME SALARY
E01 KOUSHIK GHOSH 5000
E02 JAYANTA DUTTA 3500
E07 RANI BOSE 3000
E08 GOUTAM DEY 5000
Query: List the information of the employees who are not working in the
department ‘D01’
SELECT * FROM emp WHERE dno<>'D01';
Output:
ECODE ENAME SALARY DNO DESG DT_JN
E03 HARI NANDAN TUNGA 4000 D02 PROGRAMMER 01-JUL-95
E04 JAYANTA GANGULY 6000 D03 ACCOUNTANT 12-SEP-96
E05 RAJIB HALDER 4000 D03 CLERK 07-OCT-95
E06 JISHNU BANERJEE 6500 D02 SYSTEM MANAGER 19-SEP-96
E09 PINAKI BOSE 5500 D02 PROGRAMMER 26-AUG-94
N.B: The comparison operators, when used with strings, must match case
correctly. SQL provides functions that convert strings in the database
to all uppercase or all lowercase so that a case-insensitive match can be found. The functions are
as follows:
• LOWER (CHAR) returns CHAR, with each letter in lowercase
• UPPER (CHAR) returns CHAR, with each letter in uppercase
Query: Find the code, name and salary of employees of the department ‘D01’
SELECT ecode, ename, salary FROM emp WHERE
UPPER(dno)='D01';
Date and time comparisons are similar to comparisons with numbers because
underlying every date and time is a number. In Oracle, there is a single data type,
DATE, that represents both date and time with a single number. If we want to compare
a date column with another date, we can use a string literal in the default date format
and Oracle performs the conversion for us:
Query: List the employees who joined after 20th September, 1996
SELECT * FROM emp WHERE dt_jn>'20-SEP-96';
Output:
ECODE ENAME SALARY DNO DESG DT_JN
E07 RANI BOSE 3000 D01 PROJECT ASSISTANT 17-JUN-97
E08 GOUTAM DEY 5000 D01 PROGRAMMER 23-OCT-97
The current date and time can be obtained by using the pseudo column
SYSDATE. Oracle automatically maintains certain information, such as row numbers and
row identifiers, that is directly accessible rather than stored in tables. This
information is exposed through pseudo columns. Pseudo columns can be
retrieved on their own or included in queries that select data from tables.
Available pseudo columns in Oracle include the following:
• ROWNUM: Row number. Order number in which a row value is retrieved.
• ROWID: Physical row (memory or disk address) location, i.e., unique row
identification.
• SYSDATE: System or today’s date.
• UID: User identification number indicating the current user.
• USER: Name of the user currently logged in.
Oracle table ‘DUAL’
Oracle automatically creates a DUAL table along with the data dictionary. DUAL is in the
schema of the user SYS; however, it is accessible by the name DUAL to all users. It has
one column, DUMMY, defined to be VARCHAR2(1), and contains one row with a value
‘X’. Selecting from the DUAL table is useful for computing a constant expression with
the SELECT statement. Since DUAL has only one row, the constant is returned only
once. We can query against this table to get the current account or the system date/time,
and to evaluate mathematical functions.
The following example shows how to obtain the username used when the user
logged into Oracle:
SELECT USER FROM DUAL;
Output:
USER
——————
MANAS
The current system date can be obtained in the same way:
SELECT SYSDATE FROM DUAL;
Output:
SYSDATE
——————————
15-APR-05
SELECT POWER(4,3) FROM DUAL;
Output:
POWER(4,3)
—————
64
Query: List the names of all employees in the EMP table, the date they
joined and the current system date
SELECT ename, dt_jn,SYSDATE FROM emp;
Output:
ENAME DT_JN SYSDATE
KOUSHIK GHOSH 10-MAR-93 15-APR-05
JAYANTA DUTTA 15-JAN-94 15-APR-05
HARI NANDAN TUNGA 01-JUL-95 15-APR-05
JAYANTA GANGULY 12-SEP-96 15-APR-05
RAJIB HALDER 07-OCT-95 15-APR-05
JISHNU BANERJEE 19-SEP-96 15-APR-05
RANI BOSE 17-JUN-97 15-APR-05
GOUTAM DEY 23-OCT-97 15-APR-05
PINAKI BOSE 26-AUG-94 15-APR-05
9 rows selected.
Suppose that one of the employees, ASOK BASU, has joined today.
Then SYSDATE and dt_jn fall on the same date. But if we query for the employees who joined
today using SYSDATE, we will find no results. When dt_jn is entered using
the default date format, which does not have a time part, the time defaults to 12:00:00
a.m. SYSDATE, which is the current date and time, does have a time part even
though only the date part is displayed by default.
SELECT * FROM emp WHERE dt_jn=SYSDATE;
The above command will display the message ‘no rows selected’.
The TRUNC function is used to remove the time part of an Oracle DATE.
This provides us a way to compare the dates in the table with today’s date,
disregarding any hours, minutes or seconds.
SELECT * FROM emp WHERE dt_jn=TRUNC(SYSDATE);
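The effect of the hidden time part can be reproduced outside Oracle. The sketch below is an illustration under assumptions, not Oracle code: it uses SQLite through Python's sqlite3 module, stores dates as text, fixes a hypothetical ‘current’ timestamp in the variable now instead of reading SYSDATE, and uses SQLite's datetime(..., 'start of day') in the role that TRUNC plays in Oracle. The employee code E12 is also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ecode TEXT, ename TEXT, dt_jn TEXT)")

# dt_jn was entered without a time part, so it is stored as midnight.
conn.execute("INSERT INTO emp VALUES ('E12', 'ASOK BASU', '2005-07-20 00:00:00')")

# Hypothetical current date and time, standing in for SYSDATE.
now = "2005-07-20 14:30:00"

# Direct comparison finds nothing because the time parts differ.
exact = conn.execute("SELECT ename FROM emp WHERE dt_jn = ?", (now,)).fetchall()

# Truncating the timestamp to the start of its day makes the dates comparable.
truncated = conn.execute(
    "SELECT ename FROM emp WHERE dt_jn = datetime(?, 'start of day')", (now,)
).fetchall()

print(exact)      # []
print(truncated)  # [('ASOK BASU',)]
```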
Similarly, suppose the date of joining column is filled with SYSDATE as follows:
INSERT INTO emp
VALUES ('E11', 'KUNTAL GHOSH', 5000, 'D03', 'JR ASSISTANT',
SYSDATE);
Say SYSDATE is 20-JUL-05.
When the condition is satisfied, the output is displayed in tabular form, showing
the joining date equal to the system date.
WHERE Clause with Logical Operator
AND, OR, NOT are known as logical operators.
Operator Meaning
AND Returns TRUE if both component conditions are TRUE
OR Returns TRUE if either component condition is TRUE
NOT Returns TRUE if the following condition is FALSE
SQL has three logical values: TRUE, FALSE and NULL. Every condition, simple
or compound, evaluates to one of these three values. In the WHERE clause of a SELECT
statement, a row is returned if the condition evaluates to TRUE. If the condition is
FALSE or NULL, the row is not returned.
NOT
FALSE and NULL are not the same. When FALSE is negated, TRUE is obtained. But
when NULL is negated, we still get NULL. Table 3.2 is the truth table for
NOT:
Table 3.2 Truth Table for NOT
Condition    NOT Condition
TRUE         FALSE
FALSE        TRUE
NULL         NULL
Query: List the employees who do not work in the department ‘D01’
SELECT * FROM emp WHERE dno <> 'D01';
Output:
ECODE ENAME SALARY DNO DESG DT_JN
E03 HARI NANDAN TUNGA 4000 D02 PROGRAMMER 01-JUL-95
E04 JAYANTA GANGULY 6000 D03 ACCOUNTANT 12-SEP-96
E05 RAJIB HALDER 4000 D03 CLERK 07-OCT-95
E06 JISHNU BANERJEE 6500 D02 SYSTEM MANAGER 19-SEP-96
E09 PINAKI BOSE 5500 D02 PROGRAMMER 26-AUG-94
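The behaviour of NOT with NULL, and its consequence for the WHERE clause, can be observed directly. The following is a small sketch under assumptions: it uses SQLite through Python's sqlite3 module instead of Oracle, SQLite reports TRUE as 1, FALSE as 0 and NULL as None, and the three-row emp table is a cut-down, hypothetical version of the one used in this unit, with E10 having no department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Truth table for NOT: 1 is TRUE, 0 is FALSE, None is NULL.
not_true, not_false, not_null = conn.execute(
    "SELECT NOT 1, NOT 0, NOT NULL"
).fetchone()
print(not_true, not_false, not_null)  # 0 1 None

conn.execute("CREATE TABLE emp (ecode TEXT, dno TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("E01", "D01"), ("E03", "D02"), ("E10", None)],  # E10 has no department
)

in_d01 = conn.execute("SELECT ecode FROM emp WHERE dno = 'D01'").fetchall()
not_in_d01 = conn.execute("SELECT ecode FROM emp WHERE dno <> 'D01'").fetchall()
negated = conn.execute("SELECT ecode FROM emp WHERE NOT (dno = 'D01')").fetchall()

print(in_d01)      # [('E01',)]
print(not_in_d01)  # [('E03',)]
print(negated)     # [('E03',)]
```

E10 appears in none of the three results: for that row every condition evaluates to NULL, and negating NULL still gives NULL.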
AND
AND is used to combine two conditions. Table 3.3 is the truth table for AND:
Table 3.3 Truth Table for AND
Output:
ECODE ENAME SALARY DNO DESG DT_JN
E01 KOUSHIK GHOSH 5000 D01 SYSTEM ANALYST 10-MAR-93
3.4 UNION
The outer join operator (+) can be used on only one side of the join condition. Using it on
both sides causes Oracle to issue an error, so another
way to get an outer join on both sides is to use UNION.
SELECT ecode, ename, D.dno, dname
FROM emp E, dept D
WHERE E.dno(+)=D.dno
UNION
SELECT ecode, ename, E.dno, dname
FROM emp E, dept D
WHERE E.dno=D.dno(+);
Output:
ECODE ENAME DNO DNAME
E01 KOUSHIK GHOSH D01 PROJECT
E02 JAYANTA DUTTA D01 PROJECT
E03 HARI NANDAN TUNGA D02 RESEARCH
E04 JAYANTA GANGULY D03 PERSONNEL
E05 RAJIB HALDER D03 PERSONNEL
E06 JISHNU BANERJEE D02 RESEARCH
E07 RANI BOSE D01 PROJECT
E08 GOUTAM DEY D01 PROJECT
E09 PINAKI BOSE D02 RESEARCH
E10 RAJA SEN ... ...
... ... D04 EDUCATION
11 rows selected.
Notice the last two rows. The first query gets the
employees of all departments, including department ‘D04’, to which no employee
has been assigned. The second query gets all the employees, including employee ‘E10’,
who has not been assigned to any department. In the case of the first query, any department
without employees returns NULL values for the two columns ecode and ename. In
the case of the second query, any employee without a dno returns NULL values for
the two columns dno and dname. The UNION puts both together and eliminates any
repetitions.
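The same idea can be tried on a small data set. The sketch below rests on several assumptions: it uses SQLite through Python's sqlite3 module, which has no (+) operator, so each one-sided outer join is written as a LEFT JOIN; and the two tables hold only a hypothetical handful of rows rather than the full emp and dept tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dno TEXT, dname TEXT);
    CREATE TABLE emp  (ecode TEXT, ename TEXT, dno TEXT);
    INSERT INTO dept VALUES ('D01', 'PROJECT'), ('D04', 'EDUCATION');
    INSERT INTO emp  VALUES ('E01', 'KOUSHIK GHOSH', 'D01'),
                            ('E10', 'RAJA SEN', NULL);
""")

rows = conn.execute("""
    -- every department, with or without employees
    SELECT E.ecode, E.ename, D.dno, D.dname
    FROM dept D LEFT JOIN emp E ON E.dno = D.dno
    UNION
    -- every employee, with or without a department
    SELECT E.ecode, E.ename, E.dno, D.dname
    FROM emp E LEFT JOIN dept D ON E.dno = D.dno
""").fetchall()

for row in rows:
    print(row)
```

The matched row for E01 is produced by both component queries but appears once, because UNION removes the repetition; the unmatched department and the unmatched employee each contribute one row padded with NULLs.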
UNION, as defined in relational algebra, is the set union of two relations. The
UNION returns all the distinct rows returned by either of the queries it is applied to.
Since a mathematical set contains no duplicates, the set union also contains no duplicates,
and any row appearing in both results must appear only once in the result.
As the UNION operation eliminates duplicates, DISTINCT, while syntactically
correct, is not necessary.
Query: List all employees who are working in department ‘D01’ or ‘D03’
SELECT * FROM emp WHERE dno='D01'
UNION
SELECT * FROM emp WHERE dno='D03';
Output:
6 rows selected.
This UNION could have been written more concisely by using an IN clause.
SELECT * FROM emp WHERE dno IN ('D01','D03');
However, this shortcut is not available when the two component queries are from different tables.
UNION ALL
The UNION ALL operator merges the result sets of two component queries. This
operation returns rows retrieved by either of the component queries without eliminating
duplicates.
Query: List code, name and designation of all employees who are working in
the project ‘P01’ or ‘P03’
SELECT ecode, ename, desg FROM emp WHERE ecode IN (SELECT
ecode FROM assign WHERE pid='P01')
UNION ALL
SELECT ecode, ename, desg FROM emp WHERE ecode IN (SELECT
ecode FROM assign WHERE pid='P03');
Output:
ECODE ENAME DESG
E01 KOUSHIK GHOSH SYSTEM ANALYST
E02 JAYANTA DUTTA PROGRAMMER
E08 GOUTAM DEY PROGRAMMER
E01 KOUSHIK GHOSH SYSTEM ANALYST
E03 HARI NANDAN TUNGA PROGRAMMER
E06 JISHNU BANERJEE SYSTEM MANAGER
E09 PINAKI BOSE PROGRAMMER
7 rows selected.
Notice that because the ALL clause was used, the duplicates were retained in the
result. The UNION eliminates the duplicates.
SELECT ecode, ename, desg FROM emp WHERE ecode IN (SELECT
ecode FROM assign WHERE pid='P01')
UNION
SELECT ecode, ename, desg FROM emp WHERE ecode IN (SELECT
ecode FROM assign WHERE pid='P03');
Output:
ECODE ENAME DESG
E01 KOUSHIK GHOSH SYSTEM ANALYST
E02 JAYANTA DUTTA PROGRAMMER
E03 HARI NANDAN TUNGA PROGRAMMER
E06 JISHNU BANERJEE SYSTEM MANAGER
E08 GOUTAM DEY PROGRAMMER
E09 PINAKI BOSE PROGRAMMER
6 rows selected.
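The difference between the two operators can be seen on a reduced data set. This is a sketch under assumptions: it uses SQLite through Python's sqlite3 module in place of Oracle, and a hypothetical four-row assign table in which E01 works on both projects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assign (ecode TEXT, pid TEXT)")
conn.executemany(
    "INSERT INTO assign VALUES (?, ?)",
    [("E01", "P01"), ("E02", "P01"), ("E01", "P03"), ("E03", "P03")],
)

# The same pair of component queries, combined with either operator.
query = """
    SELECT ecode FROM assign WHERE pid = 'P01'
    {op}
    SELECT ecode FROM assign WHERE pid = 'P03'
"""
union_all = [r[0] for r in conn.execute(query.format(op="UNION ALL"))]
union = [r[0] for r in conn.execute(query.format(op="UNION"))]

print(sorted(union_all))  # ['E01', 'E01', 'E02', 'E03']
print(sorted(union))      # ['E01', 'E02', 'E03']
```

UNION ALL is usually the cheaper of the two, because the database does not have to detect and remove duplicates; it is a reasonable choice when duplicates are impossible or acceptable.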
Now let us observe the result of running the query with the EXCEPT command.
SELECT COL1 FROM TEST1 EXCEPT SELECT COL1 FROM TEST2
GO
This query returns the following values, i.e., the data that exists in the left query but not
in the right query.
ALABAMA
FLORIDA
SELECT COL1 FROM TEST2 EXCEPT SELECT COL1 FROM TEST1
GO
Output:
ECODE
——————
E01
E02
E03
E06
E07
E08
E09
————
7 rows selected.
The INTERSECT operation can be implemented using the EXISTS clause as
follows:
SELECT ecode FROM emp
WHERE EXISTS (SELECT * FROM assign WHERE
emp.ecode=assign.ecode);
MINUS
MINUS, as defined in relational algebra, is the set difference of two relations. It
returns only rows returned by the first component query but not by the second. It is the
difference between two sets. Since a mathematical set contains no duplicates, the set
difference contains no duplicates. Since the MINUS operation eliminates duplicates,
DISTINCT, while syntactically correct, is not necessary.
Query: Retrieve the codes of the employees who are not assigned to any project
SELECT ecode FROM emp
MINUS
SELECT DISTINCT ecode FROM assign;
Output:
ECODE
—————
E04
E05
The above query can be used as a subquery to list the employee details.
Query: Retrieve the details of employees who are not assigned to any project
SELECT * FROM emp
WHERE ecode IN (SELECT ecode FROM emp
MINUS
SELECT DISTINCT ecode FROM assign);
Output:
ECODE ENAME SALARY DNO DESG DT_JN
E04 JAYANTA GANGULY 6000 D03 ACCOUNTANT 12-SEP-96
E05 RAJIB HALDER 4000 D03 CLERK 07-OCT-95
The MINUS operation can be implemented using the NOT EXISTS clause as
follows:
SELECT ecode FROM emp
WHERE NOT EXISTS ( SELECT * FROM assign WHERE
emp.ecode=assign.ecode);
Output:
ECODE
—————
E04
E05
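The two formulations can be compared side by side. The sketch below is written under assumptions: it runs on SQLite through Python's sqlite3 module, where the set difference operator is spelled EXCEPT rather than Oracle's MINUS, and it uses small hypothetical emp and assign tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (ecode TEXT);
    CREATE TABLE assign (ecode TEXT, pid TEXT);
    INSERT INTO emp VALUES ('E01'), ('E02'), ('E04'), ('E05');
    INSERT INTO assign VALUES ('E01', 'P01'), ('E01', 'P03'), ('E02', 'P01');
""")

# Set difference: SQLite writes Oracle's MINUS as EXCEPT.
by_except = conn.execute(
    "SELECT ecode FROM emp EXCEPT SELECT ecode FROM assign"
).fetchall()

# The same result with a correlated NOT EXISTS subquery.
by_not_exists = conn.execute("""
    SELECT ecode FROM emp
    WHERE NOT EXISTS (SELECT * FROM assign WHERE emp.ecode = assign.ecode)
""").fetchall()

print(sorted(by_except))      # [('E04',), ('E05',)]
print(sorted(by_not_exists))  # [('E04',), ('E05',)]
```

The two forms agree here because ecode is unique and never NULL in emp. The set operator removes duplicates from its result, whereas NOT EXISTS returns one row per qualifying row of emp.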
DIVIDE
In SQL, the division operation is not directly supported. It is, however, defined as a
component of Relational Algebra. According to Date, ‘DIVIDE takes two relations, one
binary and one unary, and builds a relation consisting of all values of one attribute of the
binary relation that match (in the other attribute) all values in the unary relation’. Using
subquery with the existence test, the DIVISION operation can be implemented. Consider
the following query:
Query: List the employees who are working on all the projects
SELECT * FROM emp E WHERE NOT EXISTS(SELECT * FROM project P
WHERE NOT EXISTS(SELECT * FROM assign A
WHERE A.pid=P.pid AND E.ecode=A.ecode));
Alternatively, it can be written as follows:
SELECT * FROM emp WHERE ecode IN( SELECT ecode FROM
assign GROUP BY ecode HAVING COUNT(ecode)=(SELECT
COUNT(pid) FROM project));
Output:
ECODE ENAME SALARY DNO DESG DT_JN
E01 KOUSHIK GHOSH 5000 D01 SYSTEM ANALYST 10-MAR-93
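Both formulations of division can be tried on a small example. This is a sketch under assumptions: it uses SQLite through Python's sqlite3 module instead of Oracle, and hypothetical reduced tables in which only E01 is assigned to every project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (ecode TEXT, ename TEXT);
    CREATE TABLE project (pid TEXT);
    CREATE TABLE assign (ecode TEXT, pid TEXT);
    INSERT INTO emp VALUES ('E01', 'KOUSHIK GHOSH'), ('E02', 'JAYANTA DUTTA');
    INSERT INTO project VALUES ('P01'), ('P02'), ('P03');
    INSERT INTO assign VALUES ('E01', 'P01'), ('E01', 'P02'), ('E01', 'P03'),
                              ('E02', 'P01');
""")

# Keep an employee only if no project exists that the employee is NOT assigned to.
by_not_exists = conn.execute("""
    SELECT ecode FROM emp E
    WHERE NOT EXISTS (SELECT * FROM project P
                      WHERE NOT EXISTS (SELECT * FROM assign A
                                        WHERE A.pid = P.pid
                                          AND A.ecode = E.ecode))
""").fetchall()

# Counting form: assumes each (ecode, pid) pair occurs at most once in assign.
by_count = conn.execute("""
    SELECT ecode FROM assign
    GROUP BY ecode
    HAVING COUNT(pid) = (SELECT COUNT(pid) FROM project)
""").fetchall()

print(by_not_exists)  # [('E01',)]
print(by_count)       # [('E01',)]
```

The counting form is shorter, but it gives wrong answers if assign can contain the same (ecode, pid) pair twice or pids that are missing from project; the double NOT EXISTS form does not depend on those assumptions.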
ORDER BY clause in set operation
Output:
ECODE ENAME
D01 PROJECT
D03 PERSONNEL
E01 KOUSHIK GHOSH
E02 JAYANTA DUTTA
E07 RANI BOSE
E08 GOUTAM DEY
6 rows selected.
Notice that the column name used in the ORDER BY clause of this query is
taken from the first component query. Specifying columns from the second
component query will generate an error as in the following example:
SELECT dno, dname FROM dept WHERE city='KOLKATA'
UNION
SELECT ecode, ename FROM emp WHERE dno='D01'
ORDER BY ecode;
Output:
ORDER BY ecode
*
ERROR at line 4:
ORA-00904: invalid column name
The solution is to specify the column in the ORDER BY clause
by its position, as follows:
SELECT dno, dname FROM dept WHERE city='KOLKATA'
UNION
SELECT ecode, ename FROM emp WHERE dno='D01'
ORDER BY 1;
Output:
ECODE ENAME
D01 PROJECT
D03 PERSONNEL
E01 KOUSHIK GHOSH
E02 JAYANTA DUTTA
E07 RANI BOSE
E08 GOUTAM DEY
6 rows selected.
Precedence of Set Operators
If more than two component queries are combined using the set operators, Oracle gives
all set operators equal precedence and evaluates the expression from left to right, unless
parentheses specify another order.
SELECT ecode, ename FROM emp WHERE dno='D01'
UNION
SELECT ecode, ename FROM emp WHERE ecode IN (SELECT
ecode FROM assign
WHERE pid='P02')
INTERSECT
SELECT ecode, ename FROM emp WHERE desg='PROGRAMMER';
Output:
ECODE ENAME
E02 JAYANTA DUTTA
E08 GOUTAM DEY
3.6 NESTED QUERIES
Nested queries are queries within a query; they are also called subqueries. They resemble the
nesting of conditional statements, one within the other.
Subqueries are similar to the chaining of SELECT commands. Chaining
combines two or more SELECT commands on the same level in a query,
whereas nested queries or subqueries embed one or more SELECT commands
inside the body of another query. Subqueries can be used in three ways:
• They can replace a constant (scalar subqueries).
• They can return a list of values for a comparison.
• They can replace a constant whose value varies with the row being
processed (correlated subqueries).
The following subquery expressions return Boolean values, true or false.
EXISTS (subquery)
The argument of EXISTS is an arbitrary SELECT statement, or subquery. The subquery
is evaluated to determine whether it returns any rows. If at least one row is returned,
the result of EXISTS is TRUE; otherwise it is FALSE. The subquery may refer to variables
from the surrounding query, which act as constants during any one evaluation of the subquery.
An example is given below:
SELECT col1 FROM tab1
WHERE EXISTS(SELECT 1 FROM tab2 WHERE col2 = tab1.col2);
The above example is like an inner join on col2, but it produces at most one
output row for each row of tab1, even if there are multiple matching rows
of tab2.
We take an example in which students have been assigned to projects, named
‘Students in Projects’. The following nested query is given:
SELECT name FROM stud WHERE EXISTS (SELECT 1 FROM assign
WHERE stud = stud.id ) ;
The result of the command depends on which rows of the table ‘assign’
have a value in the field stud that matches a stud.id of the outer query. A typical
result may be:
Name
———
Fazal
James
Lily
(3 rows)
[ NOT ] IN
IN has two forms, a scalar form and a subquery form.
[ NOT ] IN (scalar form)
This is used as:
expression IN (value[, ...])
The right-hand side is a parenthesized list of scalar expressions. The result is TRUE
if the expression on the left equals any of the right-hand
expressions.
[ NOT ] IN (subquery form)
expression IN (subquery)
The right-hand side is a parenthesized subquery, which must return
exactly one column. The expression on the left of IN is evaluated and compared to
every row of the subquery result. If any subquery row is found equal, the result of
IN is TRUE.
For example, the query ‘Is Fazal in Project 1?’ (answer: yes):
SELECT id, name FROM stud WHERE id IN ( SELECT stud FROM assign
WHERE id = 1 ) ;
id | name
——+——
1 | Fazal
(1 row)
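The EXISTS and IN forms can be exercised together on a toy version of the ‘Students in Projects’ example. The sketch below makes several assumptions: it uses SQLite through Python's sqlite3 module, the assign table is given a column named project for the project number (the column naming of the original table is not shown in full), and the student Maya, who has no project, is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stud (id INTEGER, name TEXT);
    CREATE TABLE assign (project INTEGER, stud INTEGER);
    INSERT INTO stud VALUES (1, 'Fazal'), (2, 'James'), (3, 'Lily'), (4, 'Maya');
    INSERT INTO assign VALUES (1, 1), (2, 1), (2, 2), (3, 3);
""")

# EXISTS: each student appears at most once, even with several projects.
with_project = conn.execute("""
    SELECT name FROM stud
    WHERE EXISTS (SELECT 1 FROM assign WHERE assign.stud = stud.id)
    ORDER BY name
""").fetchall()

# IN (subquery form): the subquery returns exactly one column.
in_project_1 = conn.execute("""
    SELECT id, name FROM stud
    WHERE id IN (SELECT stud FROM assign WHERE project = 1)
""").fetchall()

# NOT IN: students who appear in no assignment at all.
without_project = conn.execute("""
    SELECT name FROM stud
    WHERE id NOT IN (SELECT stud FROM assign)
""").fetchall()

print(with_project)     # [('Fazal',), ('James',), ('Lily',)]
print(in_project_1)     # [(1, 'Fazal')]
print(without_project)  # [('Maya',)]
```

Note that the inner column is qualified as assign.stud so that it cannot be confused with the outer table of the same name.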
ANY and SOME
These two keywords are used in the following way:
expression operator ANY (subquery)
expression operator SOME (subquery)
The right-hand side is a parenthesized subquery, which must
return exactly one column. The left-hand expression is evaluated and compared to each
row of the subquery result using the given operator, which must yield a Boolean
result. The result of ANY is TRUE if any true result is obtained.
SOME and ANY are synonyms and IN is equivalent to = ANY.
ALL
It is written as:
expression operator ALL (subquery)
The right-hand side is a parenthesized subquery, which must return exactly
one column. The left-hand expression is evaluated and compared to each
row of the subquery result using the given operator, which must yield a Boolean
result. The result of ALL is TRUE if all rows yield TRUE. This includes the special case in which
the subquery returns no rows.
NOT IN is equivalent to <> ALL.
Row-wise comparison
test (> )
test -> ORDER BY firstname, lastname;
12
2 4 6 NA
The average operator calculates the average of all real data, producing an aggregate
value of ((2 + 4 + 6)/3) = 4, as shown in Figure 3.2.
2 4 6 NA
The hierarchical operators include null values in the count of cells. In the following
figure, the hierarchical average operator produces an aggregate value of ((2 + 4 + 6
+NA)/4) = 3.
2 4 6 NA
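The two ways of averaging can be reproduced with ordinary SQL aggregates. This is a sketch under assumptions: it uses SQLite through Python's sqlite3 module, represents NA as NULL, and stores the four cells in a hypothetical one-column table named cells. AVG skips NULLs, while dividing the sum by COUNT(*) counts every cell, as the hierarchical operator does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cells (v INTEGER)")
# Three real values and one NA, stored as NULL.
conn.executemany("INSERT INTO cells VALUES (?)", [(2,), (4,), (6,), (None,)])

plain_avg, hierarchical_avg = conn.execute(
    # AVG(v) ignores the NULL; COUNT(*) counts all four cells.
    "SELECT AVG(v), SUM(v) * 1.0 / COUNT(*) FROM cells"
).fetchone()

print(plain_avg, hierarchical_avg)  # 4.0 3.0
```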
20
2 4 6 NA
Weight
3 2 NA 4
In Figure 3.6 the Maximum is calculated first down one dimension of the cube,
and then the Sum is calculated across the other dimension. The maximum value of the
sums in Figure 3.5 is 15, while the sum of the maximum values is 19 in Figure 3.6.
other value from any attribute domain. At the same time, it belongs to every attribute
domain in the database, i.e., it may appear as a value for any attribute and not violate any
type constraints. Syntactically, different DBMSs use different symbols to denote null
values. Null values affect relational operations. All relational operations involve
comparing values in tuples, including projection, which involves comparing result tuples
for duplicates, and they all rely on comparisons evaluating categorically to one of two
values: TRUE or FALSE. The key issue is how we evaluate boolean operations
involving null values. For example, what does ‘? > 5’ evaluate to? The unknown value
could be greater than 5. But then again, it may not be. That is, the value of the boolean
expression cannot be determined on the basis of available information. For example,
if the above comparison ‘? > 5’ was generated in the process of selection, you would not
know whether to include or exclude the associated tuple in the result if you were to
admit a third value UNKNOWN. If you wanted to do that, you would have to go back and
redefine all these operations based on some form of three-valued logic. To avoid this
problem, most systems that allow null values simply interpret any comparison involving
them as FALSE. The rationale is that even though they could be true, they are not
demonstrably true on the basis of what is known. That is, the result of any relational
operation conservatively includes only tuples that demonstrably satisfy the conditions
of the operation. Adopting this convention, all the operations defined previously
still hold without any amendment. Some implications for the outcome of each
operation are considered here.
For the Select operation, an unknown value cannot identify a tuple. This is illustrated
in Figure 3.7, which shows two Select operations applied to the relation R1. Between them,
the selection criteria of the two operations range over the entire domain of the attribute I2.
One would expect, therefore, that any tuple in R1 would be in the
result of either the first or the second. This is not the case, however: the second tuple in R1,
‘<b,?>’, is not selected in either operation, because the unknown value in it falsifies the selection
criteria of both operations.
R1
I1 I2
a 1
b ?
c 3
Select R1 Where I2<=2
I1 I2
a 1
Select R1 Where I2>=2
I1 I2
c 3
For Projection, tuples containing null values that are otherwise identical are not
considered to be duplicates. This is because the comparison ‘? = ?’, by the above
convention, evaluates to FALSE. This leads to the situation as illustrated in Figure 3.8.
The reader should note from this example that the symbol ‘?’, while it denotes some
value much like a mathematical variable, is quite unlike the latter in that its occurrences
do not always denote the same value. Thus ‘? = ?’ is not demonstrably true and therefore
considered FALSE.
R1 R ...
I1 I2 I3
Project Over I2, I3
I2 I3
a 1 ? 1 ?
b 1 ? 2 ?
c 1 ? 3 ?
In a Join operation, tuples having null values under the common attributes are not
concatenated. This is illustrated in Figure 3.9: ‘?=1’, ‘1=?’ and ‘?=?’ are all FALSE.
R1
I1 I2
a 1
b ?
c ?
R2
I2 I3
1 x
? y
? z
Join R1 and R2
I1 I2 I3
a 1 x
In Division, the occurrence of even one null value in the divisor means that the
result will be an empty relation, as any value in the dividend’s common attributes will fail
when matched with it. This is illustrated in Figure 3.10. This is not necessarily the case if
only the dividend contains null values under the common attributes: division may still be
successful on tuples not containing null values.
R1
I1 I2
a 1
a ?
b ?
R2
I2
?
Divide R1 by R2
I1
Empty!
EMPNO MGR
210
211 210
212 211
UPDATE employees
SET employee_id = employee_id + 5000,
manager_id = manager_id + 5000;
Even though a constraint is defined to verify that each mgr value matches an
empno value, this statement is legal because Oracle effectively performs its constraint
checking after the statement completes. (See Figure 3.12).
The examples in this section illustrate the constraint checking mechanism during
INSERT and UPDATE statements. The same mechanism is used for all types of DML
statements, including UPDATE, INSERT and DELETE statements.
Complex integrity constraints use this checking mechanism as well. The same
mechanism is used for all types of constraints, including the following:
• NOT NULL
• UNIQUE key
• PRIMARY KEY
• All types of FOREIGN KEY constraints
• CHECK constraints
3.10 TRIGGERS AND VIEWS IN SQL
1. Triggers
A trigger is a special kind of PL/SQL anonymous block. You can define triggers to fire
before or after SQL statements, either on a statement level or for each row that is
affected. You can also define INSTEAD OF triggers or system triggers (triggers on
DATABASE and SCHEMA).
In Oracle Forms, triggers are blocks of PL/SQL code that are executed based on some specific activity
or condition in the form. These activities, called events, are the foundation of Oracle
Forms’ user extensibility feature. While many trigger events perform a particular default
function, a trigger can be written to disable, modify or enhance the default processing
capabilities associated with the event. Triggers can be defined at the form, block, record
or individual field level. In addition, a trigger may cause another trigger to ‘fire’ as well
as execute another related program unit.
A trigger is a PL/SQL block structure which is fired when DML statements like
Insert, Delete and Update are executed on a database table. A trigger is triggered
automatically when an associated DML statement is executed.
Anonymous Blocks in Oracle Tools
Database triggers
A database trigger is a stored subprogram associated with a table. You can have Oracle
automatically fire the database trigger before or after an INSERT, UPDATE or DELETE
statement affects the table. One of the many uses for database triggers is to audit data
modifications; for example, the following database trigger fires whenever salaries in the
emp table are updated:
CREATE TRIGGER audit_sal
AFTER UPDATE OF sal ON emp
FOR EACH ROW
BEGIN
INSERT INTO emp_audit VALUES ...
END;
You can use all the SQL data manipulation statements and any procedural statement
in the executable part of a database trigger.
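A complete, runnable version of such an audit trigger can be sketched outside Oracle. The example below is an illustration under assumptions, not the trigger from the text: it uses SQLite's trigger dialect through Python's sqlite3 module (OLD and NEW are written without the colon that PL/SQL uses), and the columns of emp and emp_audit are hypothetical because the original VALUES list is elided.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, sal INTEGER);
    -- Hypothetical audit layout: who changed, old salary, new salary.
    CREATE TABLE emp_audit (empno INTEGER, old_sal INTEGER, new_sal INTEGER);

    CREATE TRIGGER audit_sal
    AFTER UPDATE OF sal ON emp
    FOR EACH ROW
    BEGIN
        INSERT INTO emp_audit VALUES (OLD.empno, OLD.sal, NEW.sal);
    END;

    INSERT INTO emp VALUES (1, 5000), (2, 3500);
    UPDATE emp SET sal = sal + 500 WHERE empno = 1;
""")

# The UPDATE touched one row, so the row-level trigger fired once.
audit_rows = conn.execute("SELECT * FROM emp_audit").fetchall()
print(audit_rows)  # [(1, 5000, 5500)]
```

Because the trigger is declared FOR EACH ROW, an UPDATE that changes several salaries would write one audit row per changed employee.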
The PL/SQL editor (shown in Figure 3.13) will now appear, indicating the trigger
level and the name of the trigger, with an area that may be used to enter the trigger
procedure as shown. Once the text has been entered, click the Compile button to make
sure that there are no code errors and then click the Close button to complete the trigger
definition. Note the colon used to reference form fields. Other buttons can be used to
revert to the version as of the last close or compilation, to create a new trigger, or
to delete the existing trigger.
Finally, WHEN-BUTTON-PRESSED triggers should be created for the three
buttons that were created on the form canvas as follows:
Save: begin
commit_form;
end;
Clear: begin
clear_form;
end;
Exit: begin
do_key ('EXIT_FORM');
end;
The first two triggers use standard built-in procedures, while the third uses the
DO_KEY built-in procedure to simulate pressing the Exit key.
The Exit key is defined by the keyboard mapping for your display device. On the
IBM PC keyboard, this normally defaults to the Esc key, but may be modified using
Oracle Terminal. To view the actual key mapping during Oracle Forms runtime, press
Ctrl-K or select Show Keys from the Help menu in the form.
Simulating the key press in this way is generally good practice when there is more than one way to
perform the same function. Any special logic that needs to be performed prior to exiting
the form can be coded in a KEY-EXIT trigger to ensure consistency. Additionally, a
trigger can be written so that it calls a user-defined program unit. For example, if the
customer maintenance form changes the credit limit for a customer, an acceptance
letter should be sent to the customer. The POST-COMMIT trigger should be written as
follows:
begin
if :customers.cust_credit_limit > 0 and
:customers.old_credit is null then
print_confirmation_letter (:customers.cust_no);
end if;
end;
Property Value
Canvas ITEM_CANVAS
Displayed True
Width 200
Height 200
Database Table Item False
These fields will need to be populated with data whenever an ITEMS record is
queried. To do this, create the POST-QUERY trigger on the ITEMS block as follows:
begin
select pl.pl_name, pc.pc_name, prod.prod_name, cp.cp_name,
s.size_desc
into :items.pl_name, :items.pc_name, :items.prod_name,
:items.cp_name,
:items.size_desc
from product_lines pl, product_classes pc, products prod,
color_patterns cp, sizes s
where pl.pl_id = pc.pc_pl_id
and pc.pc_pl_id = prod.prod_pl_id
and pc.pc_id = prod.prod_pc_id
and prod.prod_pl_id = :items.item_pl_id
and prod.prod_pc_id = :items.item_pc_id
and prod.prod_no = :items.item_prod_no
and cp.cp_no = :items.item_cp_no
and s.size_code = :items.item_size_code;
exception
when NO_DATA_FOUND then
message ('Database Integrity Error. Contact your DBA.');
bell;
raise FORM_TRIGGER_FAILURE;
end;
SQL Statement
UPDATE t1 SET . . .;
Fires the UPDATE_T1 Trigger
UPDATE_T1 Trigger
BEFORE UPDATE ON t1
FOR EACH ROW
BEGIN
.
.
INSERT INTO t2 VALUES (...);
END;
Fires the INSERT_T2 Trigger
INSERT_T2 Trigger
BEFORE INSERT ON t2
FOR EACH ROW
BEGIN
.
.
INSERT INTO ... VALUES (...);
END;
etc.
SQL Statement
Executor
Database
2. SQL Operators
PL/SQL lets you use all the SQL comparison, set and row operators in SQL statements.
This section briefly describes some of these operators.
Comparison operators
Typically, you can use comparison operators in the WHERE clause of a data manipulation
statement to form predicates that compare one expression to another and yield TRUE,
FALSE or NULL. You can use the comparison operators in the following list to form
predicates. You can also combine predicates using the logical operators AND, OR and
NOT.
Operator     Description
ALL          Compares a value to each value in a list or returned by a
             subquery and yields TRUE if all of the individual comparisons
             yield TRUE.
ANY, SOME    Compares a value to each value in a list or returned by a
             subquery and yields TRUE if any of the individual comparisons
             yields TRUE.
BETWEEN      Tests whether a value lies in a specified range.
EXISTS       Returns TRUE if a subquery returns at least one row.
The cursor is not a PL/SQL variable: you cannot assign values to a cursor or use
it in an expression. Cursors and variables follow the same scoping rules. Naming cursors
after database tables is possible but not recommended.
A cursor can take parameters that can appear in the associated query wherever
constants can appear. The formal parameters of a cursor must be IN parameters; they
supply values in the query, but do not return any values from the query. You cannot
impose the constraint NOT NULL on a cursor parameter.
As the following example shows, you can initialize cursor parameters to default
values. You can pass different numbers of actual parameters to a cursor, accepting
or overriding the default values as you please. Also, you can add new formal
parameters without having to change existing references to the cursor.
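DECLARE
   CURSOR c1 (low NUMBER DEFAULT 0, high NUMBER DEFAULT 99) IS
      SELECT * FROM departments
      WHERE department_id > low AND department_id < high;
BEGIN
   OPEN c1;          -- accepts both defaults (0 and 99)
   CLOSE c1;
   OPEN c1(10, 50);  -- overrides both defaults
   CLOSE c1;
END;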
Cursor parameters can be referenced only within the query specified in the cursor
declaration. The parameter values are used by the associated query when the cursor is
opened.
Opening a cursor
Opening the cursor executes the query and identifies the result set that consists of all
rows that meet the query search criteria. For cursors declared using the FOR UPDATE
clause, the OPEN statement also locks those rows. An example of the OPEN statement
follows:
DECLARE
   CURSOR c1 IS
      SELECT employee_id, last_name, job_id, salary
      FROM employees
      WHERE salary > 2000;
BEGIN
   OPEN c1;   -- executes the query and identifies the result set
   CLOSE c1;
END;
/
Rows in the result set are not retrieved when the OPEN statement is executed; they
are retrieved later by the FETCH statement.
Unless you use the BULK COLLECT clause, the FETCH statement retrieves the rows
in the result set one at a time. Each fetch retrieves the current row and advances the
cursor to the next row in the result set. You can store each column in a separate variable
or store the entire row in a record that has the appropriate fields, usually declared using
%ROWTYPE.
For each column value returned by the query associated with the cursor, there
must be a corresponding, type-compatible variable in the INTO list. Typically, you use
the FETCH statement with a LOOP and EXIT WHEN .. NOTFOUND statements.
Note the use of built-in regular expression functions in the queries.
Example
DECLARE
   v_jobid     employees.job_id%TYPE;     -- variable for job_id
   v_lastname  employees.last_name%TYPE;  -- variable for last_name
   CURSOR c1 IS SELECT last_name, job_id FROM employees
      WHERE REGEXP_LIKE (job_id, 'S[HT]_CLERK');
   v_employees employees%ROWTYPE;         -- record variable for row
   CURSOR c2 IS SELECT * FROM employees
      WHERE REGEXP_LIKE (job_id, '[ACADFIMKSA]_M[ANGR]');
BEGIN
   OPEN c1;  -- open the cursor before fetching
   LOOP
      FETCH c1 INTO v_lastname, v_jobid;  -- fetches 2 columns into variables
      EXIT WHEN c1%NOTFOUND;
      DBMS_OUTPUT.PUT_LINE( RPAD(v_lastname, 25, ' ') || v_jobid );
   END LOOP;
   CLOSE c1;
   DBMS_OUTPUT.PUT_LINE( '-------------------------------------' );
   OPEN c2;
   LOOP
      FETCH c2 INTO v_employees;  -- fetches entire row into the v_employees record
      EXIT WHEN c2%NOTFOUND;
      DBMS_OUTPUT.PUT_LINE( RPAD(v_employees.last_name, 25, ' ') ||
                            v_employees.job_id );
   END LOOP;
   CLOSE c2;
END;
/
The query can reference PL/SQL variables within its scope. Any variables in the
query are evaluated only when the cursor is opened. In the following example, each
retrieved salary is multiplied by 2, even though factor is incremented after every fetch.
Example
Referencing PL/SQL variables within its scope
DECLARE
   my_sal employees.salary%TYPE;
   my_job employees.job_id%TYPE;
   factor INTEGER := 2;
   CURSOR c1 IS
      SELECT factor*salary FROM employees WHERE job_id = my_job;
BEGIN
   OPEN c1;  -- factor initially equals 2
   LOOP
      FETCH c1 INTO my_sal;
      EXIT WHEN c1%NOTFOUND;
      factor := factor + 1;  -- does not affect FETCH
   END LOOP;
   CLOSE c1;
END;
/
To change the result set or the values of variables in the query, you must close
and reopen the cursor with the input variables set to their new values. However, you can
use a different INTO list on separate fetches with the same cursor. Each fetch retrieves
another row and assigns values to the target variables, as shown in the following example.
Example:
DECLARE
   CURSOR c1 IS SELECT last_name FROM employees ORDER BY last_name;
   name1 employees.last_name%TYPE;
   name2 employees.last_name%TYPE;
   name3 employees.last_name%TYPE;
BEGIN
   OPEN c1;
   FETCH c1 INTO name1;  -- this fetches first row
   FETCH c1 INTO name2;  -- this fetches second row
   FETCH c1 INTO name3;  -- this fetches third row
   CLOSE c1;
END;
/
If you fetch past the last row in the result set, the values of the target variables
are undefined. Eventually, the FETCH statement fails to return a row. When that happens,
no exception is raised. To detect the failure, use the cursor attribute %FOUND or
%NOTFOUND.
Fetching bulk data with a cursor
The BULK COLLECT clause lets you fetch all rows from the result set at once. In the
following example, you bulk-fetch from a cursor into two collections.
Example:
Fetching bulk Data with a cursor
DECLARE
   TYPE IdsTab IS TABLE OF employees.employee_id%TYPE;
   TYPE NameTab IS TABLE OF employees.last_name%TYPE;
   ids IdsTab;
   names NameTab;
   CURSOR c1 IS
      SELECT employee_id, last_name FROM employees WHERE job_id = 'ST_CLERK';
BEGIN
   OPEN c1;
   FETCH c1 BULK COLLECT INTO ids, names;
   CLOSE c1;
   -- Here is where you process the elements in the collections.
   -- Looping over 1 .. COUNT is safe even when no rows were fetched.
   FOR i IN 1 .. ids.COUNT
   LOOP
      IF ids(i) > 140 THEN
         DBMS_OUTPUT.PUT_LINE( ids(i) );
      END IF;
   END LOOP;
   FOR i IN 1 .. names.COUNT
   LOOP
      IF names(i) LIKE '%Ma%' THEN
         DBMS_OUTPUT.PUT_LINE( names(i) );
      END IF;
   END LOOP;
END;
/
Closing a cursor
The CLOSE statement disables the cursor, and the result set becomes undefined. Once
a cursor is closed, you can reopen it, which runs the query again with the latest values of
any cursor parameters and variables referenced in the WHERE clause. Any other operation
on a closed cursor raises the predefined exception INVALID_CURSOR.
3.12 SUMMARY
In this unit, you have learned about SQL. It is a language used for interacting with
RDBMSs. IBM was the first to develop a prototype for SQL, based on E.F. Codd’s paper,
‘A Relational Model of Data for Large Shared Data Banks’. In 1979, ORACLE,
an SQL product, was released. It is one of the most popular RDBMSs today. SQL is
independent of the underlying database structure and many different versions of SQL
exist. It is the current standard query language for organizing, managing and retrieving
data/information from databases. SQL is more than a query language. It is not only used
for retrieving data but also for managing all DBMS functions. SQL in Oracle supports
certain basic functions to control, define and manipulate data. It also provides for the
definition, storage and management of data in a centralized area that can be shared by
many users.
You have also learned about some basic queries in SQL, including SELECT,
DISTINCT, the WHERE clause with relational operators and the WHERE
clause with logical operators. You have studied about UNION in this unit.
In relational algebra, UNION is defined as the set union of two relations. INTERSECT
and EXCEPT have also been discussed. INTERSECT is the set intersection of two
relations. It returns only those rows returned by both component queries.
You have also learned about nested queries. These are queries within a query.
Nested queries allow embedding of one or more SELECT commands inside the body of
other queries. This unit also discussed aggregate operators. Aggregation is the process
of consolidating multiple values into a single value. It allows patterns in the data to
emerge, and these patterns form the basis for analysis and decision-making. Null values
were also explained in this unit. Null values denote the unknown. A null is a special value
different from any other value from any attribute domain.
You have also learned complex integrity constraints in SQL. A complex integrity
constraint is a user-defined rule for a column or set of columns that allows or disallows
inserts, updates or deletes of a row based on the value it contains for the column or set
of columns. Triggers and views in SQL were also explained in the unit. A trigger is a
special kind of PL/SQL anonymous block. A view may be defined as a ‘stored query’ or
a ‘virtual table’. It is called a virtual table because it does not exist as an independent
entity in the database as do ‘real’ tables.
Finally, you have learned about embedded SQL and cursors. In embedded SQL,
SQL commands can be put inside a program written in some other language (called host
language) like C, C++, etc. SQL commands are of varied types to suit different purposes.
A cursor is a handle or name for an area in memory in which a parsed statement
and other information for processing the statement are kept. Such an area is also called
a private SQL area.
• View: It is a virtual table. A view normally does not store data and does not
physically exist.
• Join: It is a query that retrieves rows from more than one table or view. Most
join queries contain WHERE clause conditions that compare two columns,
each from a different table.
• Intersect: It is the set intersection of two relations. It returns only those rows
that are returned by both component queries.
• Nested queries: These are queries within a query. It is like nesting of conditional
statements one within the other.
Short-Answer Questions
1. What is SQL?
2. What are the advantages of SQL?
3. Define the terms, DDL and DML. What are the differences between them?
4. What is the difference between procedural and non-procedural DML?
5. Write a note on SQL in Oracle.
6. What is the use of the SELECT statement?
7. How is a DUAL table created in Oracle?
8. What are nested queries?
9. How is it possible to select null values? Give examples.
10. What is embedded SQL?
Long-Answer Questions
1. What are data types in SQL? Explain with examples.
2. What are the basic queries in SQL?
3. Explain the use of UNION in SQL, giving examples.
4.0 INTRODUCTION
In this unit, you will learn about relational database design. Database design is very
important because database structure is the foundation for the development of any
successful database application. Problems arise with data anomalies when a poor
design leads to uncontrolled data redundancies. In this context, you will be introduced
to schema refinement. The latter is basically the description of the database design.
It is an analytical study of a specific DBMS with reference to Boyce-Codd Normal
Form (BCNF) relation and anomalies. This unit will discuss functional dependencies.
A functional dependency (FD) is derived from a mathematical theory. It implies the
dependency of values of a single attribute or collection of attributes on another
attribute or collection of attributes.
In this unit, you will also study the properties of decomposition. Finally, you
will learn about normal forms (NFs). These are a set of rules for organizing data to
ensure that it meets specific requirements of data normalization. Normalization is a
database design technique that leads to improved database quality, and is based on
the mathematical theory of relations. A set of relations that is normalized facilitates
easy management and retrieval of data in a database system. Furthermore, if database
relations are not in the normalized form, then an RDBMS cannot be utilized properly.
[Figure: Functional dependency X → Y on a relation R. If tuples t and u agree on X, then they must also agree on Y.]
Two employees can have the same name or salary but they might not be the
same person. To differentiate them, they should have different employee codes.
This is reflected in the FDs ECODE → ENAME and ECODE → SALARY. Here, ECODE is
defined as the candidate key.
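The meaning of an FD such as ECODE → ENAME can be checked mechanically on a relation instance. The following is a minimal Python sketch; the function name fd_holds and the sample rows are hypothetical and are not taken from the text. Note that an instance can only refute an FD: a dependency that happens to hold in the sample data is not thereby proved to be a constraint of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical EMP instance: two different employees share a name and salary.
emp = [
    {"ECODE": "E01", "ENAME": "A. SEN", "SALARY": 20000},
    {"ECODE": "E02", "ENAME": "A. SEN", "SALARY": 20000},
    {"ECODE": "E03", "ENAME": "B. ROY", "SALARY": 25000},
]

print(fd_holds(emp, ["ECODE"], ["ENAME"]))   # True
print(fd_holds(emp, ["ECODE"], ["SALARY"]))  # True
print(fd_holds(emp, ["ENAME"], ["ECODE"]))   # False
```

The last call fails because the name A. SEN appears with two different employee codes, which is exactly why ENAME cannot serve as the key.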
4.3.1 Dependency Types
• Full functional dependency: X → Y is a full functional dependency if Y is
not dependent on any proper subset of X, and Y is said to be fully functionally
dependent upon X. That is, in X → Y, X is the minimal set of attributes
that uniquely determines Y. If we remove an attribute from X, it no longer
determines Y.
• Partial dependency: X → Y is a partial dependency if Y is dependent on a
proper subset of X, and Y is said to be partially dependent upon X. That is, in X → Y,
X is not the minimal set of attributes that uniquely determines Y. Some
attributes could be removed from X, and the dependency would still hold.
Partial dependencies must be decomposed into separate relations.
• Trivial functional dependency: Trivial functional dependencies are those
FDs that are impossible not to satisfy. A dependency is trivial, if and only if
the right-hand side (the consequent) is a subset (not necessarily a proper
subset) of the left-hand side (the determinant). That is, an FD
A1, A2, ..., An → B1, B2, ..., Bm is said to be a trivial FD if and only if
{B1, B2, ..., Bm} − {A1, A2, ..., An} = ∅ or, in other words,
{B1, B2, ..., Bm} ⊆ {A1, A2, ..., An}.
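The three dependency types above can be told apart with the attribute closure of a set of attributes, that is, everything the set determines under a given list of FDs. The following is a minimal Python sketch; the helper names closure, is_trivial and is_full, and the sample FDs over attributes A, B, C and D, are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_trivial(lhs, rhs):
    """X -> Y is trivial when Y is a subset of X."""
    return set(rhs) <= set(lhs)

def is_full(lhs, rhs, fds):
    """X -> Y is full when dropping any single attribute of X loses Y.

    Checking single-attribute removals is enough: every proper subset of X
    is contained in some X minus one attribute, and closure is monotone.
    """
    return all(not set(rhs) <= closure(set(lhs) - {a}, fds) for a in lhs)

# Hypothetical FDs: (A, B) -> C and B -> D.
fds = [({"A", "B"}, {"C"}), ({"B"}, {"D"})]

print(is_full({"A", "B"}, {"C"}, fds))   # True: neither A nor B alone gives C
print(is_full({"A", "B"}, {"D"}, fds))   # False: B alone gives D (partial)
print(is_trivial({"A", "B"}, {"A"}))     # True: {A} is a subset of {A, B}
```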
[Figure: Transitivity of functional dependencies. If tuples t and u agree on A, they must agree on B, so they must also agree on C.]
R W
8 10
7 13
The properties of decomposition can be explained as follows:
• It is used only when needed. For example, SNLRWH has functional
dependencies such as S → SNLRWH and R → W.
• It takes into consideration that the information stored consists of
SNLRWH tuples if the projections of these tuples are also stored.
• It considers dependencies that require decomposed relations.
• It considers a relation R with functional dependencies; for example, if
X → Y violates BCNF, decompose R into R − Y and XY.
• It is an easier way to check for dependency preserving decomposition.
• It ensures that updates to the database do not result in illegal relations.
• It also checks whether joins are computed to determine functional
dependencies. The main property of decomposition is to check updates
without having to compute natural joins.
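The decomposition step named in the list above (split R into R − Y and XY when X → Y violates BCNF) can be sketched on the SNLRWH schema with the dependency R → W. This is a minimal Python illustration; the function name decompose is hypothetical, and attributes are represented as single letters only for brevity.

```python
def decompose(schema, x, y):
    """Split schema on a violating FD x -> y into (schema - y, x ∪ y)."""
    schema, x, y = set(schema), set(x), set(y)
    y = y - x                     # the determinant stays in both parts
    return schema - y, x | y

r1, r2 = decompose("SNLRWH", "R", "W")
print(sorted(r1))   # ['H', 'L', 'N', 'R', 'S']
print(sorted(r2))   # ['R', 'W']
print(r1 & r2)      # {'R'}
```

The two parts share exactly the determinant R, which is a key of the second part, so joining them on R recovers the original tuples.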
Illustration 1: In Table 4.4, each employee has more than one telephone number.
Table 4.4 Employee with Multiple Telephone Numbers
EMP
ECODE   ENAME            TEL_NO
E01     JAYANTA DUTTA    24185462
                         24181450
E02     KOUSHIK GHOSH    24148618
                         24731961
E03     SOURAV BASAK     25551189
                         24725782
This relation is thus not in 1NF. 1NF can be achieved by ensuring that every tuple
defines a single entity and contains only atomic values. There are two ways of
doing it.
(a) One can reorganize into one relation as shown in Table 4.5.
Table 4.5
EMP
ECODE ENAME TEL_NO
E01 JAYANTA DUTTA 24185462
E01 JAYANTA DUTTA 24181450
E02 KOUSHIK GHOSH 24148618
E02 KOUSHIK GHOSH 24731961
E03 SOURAV BASAK 25551189
E03 SOURAV BASAK 24725782
(b) One can split the relation into two relations as shown in Table 4.6.
Table 4.6
EMP
ECODE   ENAME
E01     JAYANTA DUTTA
E02     KOUSHIK GHOSH
E03     SOURAV BASAK
EMP_TEL
ECODE TEL_NO
E01 24185462
E01 24181450
E02 24148618
E02 24731961
E03 25551189
E03 24725782
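Both ways of reaching 1NF can be reproduced from the data of Table 4.4. The following is a minimal Python sketch; the variable names emp_nested, emp_flat, emp and emp_tel are chosen for illustration only.

```python
# Table 4.4: each employee carries a list of telephone numbers (not atomic).
emp_nested = {
    "E01": ("JAYANTA DUTTA", ["24185462", "24181450"]),
    "E02": ("KOUSHIK GHOSH", ["24148618", "24731961"]),
    "E03": ("SOURAV BASAK", ["25551189", "24725782"]),
}

# (a) One relation with one atomic telephone number per tuple (Table 4.5).
emp_flat = [
    (ecode, name, tel)
    for ecode, (name, tels) in emp_nested.items()
    for tel in tels
]

# (b) Two relations, EMP and EMP_TEL (Table 4.6).
emp = sorted({(ecode, name) for ecode, name, _ in emp_flat})
emp_tel = [(ecode, tel) for ecode, _, tel in emp_flat]

for row in emp_flat:
    print(row)
print(emp)
print(emp_tel)
```

Option (a) repeats each employee name once per telephone number, whereas option (b) stores each name only once.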
A B C D
1 1 c1 d1
1 2 c2 d2
2 1 c3 d1
2 2 c4 d2
where the composite attribute (A, B) is the primary key. Assume that the following
functional dependencies exist:
(A, B) → C
i.e., full functional dependency on the composite key (A, B).
B → D
i.e., partial functional dependency on the composite key (A, B).
R1            R2
B   D         A   B   C
1   d1        1   1   c1
2   d2        1   2   c2
1   d1        2   1   c3
2   d2        2   2   c4
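The decomposition into R1(B, D) and R2(A, B, C) can be verified to lose no information by projecting the original relation and joining the parts again on B. The following is a minimal Python sketch using the four tuples given above; the variable names are illustrative. Because relations are sets, the projection onto (B, D) keeps each pair only once.

```python
# Original relation R(A, B, C, D) with key (A, B) and the partial FD B -> D.
R = [(1, 1, "c1", "d1"), (1, 2, "c2", "d2"), (2, 1, "c3", "d1"), (2, 2, "c4", "d2")]

r1 = {(b, d) for _, b, _, d in R}        # projection onto (B, D)
r2 = {(a, b, c) for a, b, c, _ in R}     # projection onto (A, B, C)

# Natural join of R2 and R1 on the common attribute B.
rejoined = {(a, b, c, d) for a, b, c in r2 for b2, d in r1 if b == b2}

print(sorted(r1))
print(rejoined == set(R))
```

The value of D for each B is now stored once in R1 instead of once per matching tuple of R.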
[Figure: Dependencies among the attributes A, B and C: A → B and B → C.]
E01 KOUSHIK GHOSH MANAGER 25000 D01 ADMINISTRATION
Consider again the operations that you may want to do on the data.
Update
Can you change the details of department ‘D03’, i.e., the PROJECT
department? Many employees belong to the department ‘D03’
(e.g., RAJIB HALDER and PINAKI BOSE). Thus, you must
make certain that all their tuples are updated, or there will be problems with database
inconsistency.
Insert
Suppose, you want to add a new employee in the EMP relation with EMPNO as
its primary key. The new tuple must also include correct information about the
department where he will be working. If not, the data will become inconsistent.
Table 4.8 EMP Relation with EMPNO
E07 RAJIB HALDER SYSTEM ANALYST 20000 D03 PROJECT
If you want to add a new department and no employees are assigned, then as
you do not have any information about the employees, you will have to put NULLs
as their attributes including the primary key, which is absolutely not permitted.
Table 4.9 NULLs as Attributes
E01 KOUSHIK GHOSH MANAGER 25000 D01 ADMINISTRATION
E06 GOUTAM DEY FINANCE OFFICER 10000 D02 PERSONNEL
E07 RAJIB HALDER SYSTEM ANALYST 20000 D03 PROJECT
Delete
What happens if you delete the employee data from the relation? In this case,
information about the department is also lost, which must be avoided. For example,
if you want to delete information of an employee whose employee code is E06 in
the EMP instance as shown in Table 4.9, then you would lose information concerning
the department D02.
Table 4.10 EMP after deleting the Tuple of EMP E06
Table 4.9 makes it clear that even when a relation is in 2NF, problems can still
occur and they should be removed. The relation needs to be normalized further, i.e., there is
a need for a third normal form to remove these anomalies. If you examine the
functional dependencies of Table 4.10, the presence of ‘transitive’ dependencies is
evident.
Here EMPNO → DEPTNAME is only a transitive or an indirect dependency.
It is treated as indirect because EMPNO → DEPTNO and DEPTNO → DEPTNAME,
and thus EMPNO → DEPTNAME.
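Removing the transitive dependency means storing DEPTNO → DEPTNAME in a relation of its own. The following minimal Python sketch uses the three rows of Table 4.9 and shows that the deletion anomaly disappears. The attribute names EMPNO, DEPTNO and DEPTNAME come from the text; the names of the remaining columns (employee name, designation, salary) are assumptions.

```python
# Rows of Table 4.9: (EMPNO, name, designation, salary, DEPTNO, DEPTNAME).
emp_rows = [
    ("E01", "KOUSHIK GHOSH", "MANAGER", 25000, "D01", "ADMINISTRATION"),
    ("E06", "GOUTAM DEY", "FINANCE OFFICER", 10000, "D02", "PERSONNEL"),
    ("E07", "RAJIB HALDER", "SYSTEM ANALYST", 20000, "D03", "PROJECT"),
]

# 3NF decomposition: EMP keeps DEPTNO as a foreign key, DEPT holds the name.
emp = {row[0]: row[1:5] for row in emp_rows}    # EMPNO -> (..., DEPTNO)
dept = {row[4]: row[5] for row in emp_rows}     # DEPTNO -> DEPTNAME

# Deleting employee E06 no longer deletes the facts about department D02.
del emp["E06"]
print(sorted(emp))
print(dept["D02"])
```

In the single-relation design, the same deletion would have removed the only tuple that mentions D02.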
4.5.4 Boyce-Codd Normal Form
Consider a relation R (A, B, C) where the given FDs are AB → C, C → B. Then an
instance of this relation can be shown in Table 4.11.
Table 4.11
A B C
a1 b2 c1
a4 b2 c1
a1 b3 c2
a3 b3 c3
A C
a1 c1
a4 c1
a1 c2
a3 c3
Where
ECODE = Employee Code
PID = Project Identification Number
P_LEADER = Project Leader
The only assumption is that a project leader manages every project. This relation is in
3NF, because it has no partial functional dependencies and no transitive
dependencies. However, it does not have the required property that every determinant be
a candidate key. Consider the determinants in R. One determinant is the pair
ECODE and PID: every distinct pair of values of ECODE and PID determines a
unique value of the attribute P_LEADER. Another determinant is the
pair ECODE and P_LEADER, which determines a unique value of the attribute
PID. Still another determinant is the attribute P_LEADER, since each different
value of P_LEADER determines a unique value of the attribute PID. These
observations about the relation R correspond to the real-world facts that each
employee has a single project leader (P_LEADER) for each of his or her projects,
and just one project leader (P_LEADER) manages each project.
These three determinants need to be examined to see whether they are candidate
keys. The pair ECODE and PID is a candidate key, since
each pair identifies a row in R uniquely. Similarly, the pair ECODE and P_LEADER
is a candidate key. However, because the value MG appears in two rows of the P_LEADER
column, the determinant P_LEADER is not a candidate key. Therefore, the relation
R fails the condition that every determinant in it is a candidate
key.
Even though R is in 3NF, there are still anomalies in it.
INSERTION: If you want to add a new project with a project leader, you
cannot add it until you have an employee assigned to that project.
DELETION: If you delete employee E03, you lose all information that SS
manages the project P03.
UPDATE: If AC replaces MG as project leader of project P01, you have to
update multiple rows.
The problem occurs because there is a determinant that is not a candidate
key. The discussion can now be stated formally.
R(ECODE, PID, P_LEADER)
FDs: ECODE, PID → P_LEADER
P_LEADER → PID
By decomposition we get
R1(ECODE,PID)
R2(P_LEADER,PID)
R1                R2
ECODE   PID       P_LEADER   PID
E01     P01       MG         P01
E02     P02       JDR        P02
E03     P03       SS         P03
E04     P02
E05     P01
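The BCNF test used in this discussion (every determinant of a non-trivial FD must be a candidate key or, more generally, a superkey) can be sketched with attribute closures. This is a minimal Python illustration for R(ECODE, PID, P_LEADER) with the two FDs stated above; the function names closure and bcnf_violations are hypothetical.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """List the non-trivial FDs whose determinant is not a superkey."""
    schema = set(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != schema
    ]

schema = {"ECODE", "PID", "P_LEADER"}
fds = [
    (("ECODE", "PID"), ("P_LEADER",)),   # ECODE, PID -> P_LEADER
    (("P_LEADER",), ("PID",)),           # P_LEADER -> PID
]

violations = bcnf_violations(schema, fds)
print(violations)   # [(('P_LEADER',), ('PID',))]
```

Only P_LEADER → PID is reported, because P_LEADER alone does not determine ECODE and is therefore not a key of R.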
Table 4.14
1211 11 DBMS
1211 28 C/C++
Next, you add an employee (1544) who uses C/C++ programming skills on
Project 11 as shown in Table 4.15.
Table 4.15
1211 11 DBMS
1211 28 C/C++
1544 11 C/C++
1211 11 DBMS
1211 11 C/C++   <-- Spurious data
1211 28 C/C++
1544 11 DBMS    <-- Spurious data
1544 11 C/C++
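The spurious rows above can be reproduced by projecting the three rows of Table 4.15 onto two smaller relations and joining them again. The following is a minimal Python sketch; the column headings were lost from the tables, so the column meanings used here (employee, project, skill) are an assumption based on the surrounding sentence.

```python
# Table 4.15: (employee, project, skill); column names are assumed.
rows = {
    (1211, 11, "DBMS"),
    (1211, 28, "C/C++"),
    (1544, 11, "C/C++"),
}

emp_proj = {(e, p) for e, p, _ in rows}      # projection onto (employee, project)
proj_skill = {(p, s) for _, p, s in rows}    # projection onto (project, skill)

# Natural join of the two projections on the project column.
joined = {(e, p, s) for e, p in emp_proj for p2, s in proj_skill if p == p2}
spurious = joined - rows

for row in sorted(joined):
    print(row, "<-- spurious" if row in spurious else "")
```

The join returns five rows, two of which were never in the original relation, so this decomposition is lossy.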
4.6 SUMMARY
In this unit, you have learned about relational database design. Database design is
essentially a business problem, not a data problem. The aim of a relational database
design is to produce a set of relation schemas that represent the real-world situation
that is being modelled. The design must also allow storage of information without
unnecessary redundancy as well as retrieval of information efficiently. You have
thus learned about schema refinement. This unit has also familiarized you with
functional dependencies. Functional dependencies depend on knowledge of what
can be stored in the relation and serve as integrity constraints that should be validated
when adding or updating tuples in a relation. A relation state r of R that satisfies the
functional dependency constraints is called a legal relation state (or legal extension)
of R. You have further studied about the properties of decomposition—the process
of splitting a relation into two or more relations.
Finally, you have learned about normal forms. A technique called
normalization is used to analyse the given relational schema to achieve the desirable
properties of minimizing redundancy as well as inconsistency, besides minimizing
the insertion, deletion and update anomalies. Normalization is a bottom-up approach
to database design that begins by examining the relationships between attributes.
The process of normalization is a formal method that identifies relational schemas
based upon their primary or candidate keys and the functional dependencies that
exist among their attributes. Normalization is often performed as a series of tests on
a relational schema to determine whether it satisfies or violates the requirements of
a given normal form. To sum up, there are seven normal forms and each one was
created to deal with a specific type of problem.
Short-Answer Questions
1. What do you understand by schema refinement?
2. What is full functional dependency?
3. What is partial dependency? Explain with an example.
4. What is transitive dependency? Explain with an example.
5. What is trivial dependency? Explain with an example.
6. What are the main rules for normalization?
7. What is an insertion anomaly?
8. What is a deletion anomaly?
9. What is a modification anomaly?
10. Write a note on normal forms.
Long-Answer Questions
1. Discuss the role of functional dependency in DBMS. Give examples.
2. Discuss the different types of functional dependencies. Give examples.
3. What is a key? Explain the different types of keys with examples.
4. Given a relation with three attributes, how many candidate keys can it have
at any given time? At least how many candidate keys does it have? State
reasons in support of your answer.
5. Explain the difference between a primary key and a candidate key.
6. What is meant by attribute preservation condition on decomposition?
7. What are Armstrong’s axioms? Explain each with example.
8. Give the basic rules for 1NF and 2NF considering only primary keys. Define
any term that you feel necessary for these definitions.
9. Define 2NF. Give an example of a relation in 1NF but not in 2NF. Transform
the relation into relations in 2NF.
10. Define 3NF. Give an example of a relation in 2NF but not in 3NF. Transform
the relation into relations in 3NF.
11. How does the definition for BCNF differ from the general definition for 3NF?
12. Why is BCNF a stricter normal form than 3NF? Explain.
13. What is a multi-valued dependency? What types of constraint does it specify?
When does it arise?
14. Define BCNF. Give an example of a relation in 3NF but not in BCNF.
Transform the relation into relations in BCNF.
15. A university wishes to keep track of student records. They construct a universal
relation describing students and the courses students are enrolled in. Distinct
teachers teach distinct sections of courses even in the same term, and students
may take courses multiple times. They may be enrolled in at most one section
of any one course in any one term. BUILDING and ROOM
describe where a section of a course is taught.
STUDENT_ENROLLMENT_HISTORY
STUDENT_ID           TEACHER_ID           ROOM_CAPACITY
STUDENT_LASTNAME     TEACHER_NAME         COURSE_ENROLLMENT
STUDENT_FIRSTNAME    COURSE_DESCRIPTION   SECTION_ENROLLMENT
COURSE_NUMBER        BUILDING_NAME        STUDENT_MAJOR_1
SECTION_NUMBER       BUILDING_ADDRESS     STUDENT_MAJOR_2
TERM_COURSE_TAKEN    ROOM_NUMBER          STUDENT_MAJOR_3
COURSE_MARK
(a) Normalize the STUDENT_ENROLLMENT_HISTORY relation and
present your normalized solution as a schematic diagram.
(b) Show primary keys in your normalized design.
(c) Show relationships between foreign keys and primary keys appropriately.
MANAGEMENT,
CONCURRENCY CONTROL
AND DISTRIBUTED SYSTEM
Structure
5.0 Introduction
5.1 Unit Objectives
5.2 The ACID Properties
5.3 Transactions and Schedules
5.3.1 Transaction
5.3.2 Transaction Processing Steps
5.3.3 Interleaved vs Simultaneous vs Serial Transaction
5.3.4 Transaction Action
5.3.5 Transaction States and Additional Operations
5.4 Concurrent Execution of Transactions
5.4.1 Motivation for Concurrent Execution of Transactions
5.4.2 Serializability
5.4.3 Anomalies due to Interleaved Execution
5.5 Schedules Involving Aborted Transactions
5.6 Lock-Based Concurrency Control
5.6.1 Binary Locks
5.6.2 Shared or Exclusive Locks
5.6.3 Two-Phase Locking: Basic and Strict
5.7 Introduction to Crash Recovery
5.8 Concurrency Control
5.8.1 The Lost Update Problem
5.8.2 Uncommitted Dependency: The Dirty Read Problem
5.8.3 Unrepeatable Read or Inconsistent Retrieval Problem
5.8.4 Phantom Reads
5.8.5 Recoverability
5.9 Introduction to Locking Management
5.9.1 Types of Locks
5.10 Dealing with Deadlock
5.10.1 Deadlock
5.10.2 Deadlock Prevention Protocol
5.10.3 Deadlock Detection
5.10.4 Starvation
5.11 Introduction to ARIES
5.12 Introduction to Distributed Databases
5.12.1 Distributed Database
5.12.2 Distributed Data Transparency Features
5.12.3 Advantages and Disadvantages of Distributed Databases
5.12.4 Catalogue Management
5.12.5 Update Propagation
5.12.6 Recovery Control
5.12.7 Concurrency Control
5.13 Transaction Types
5.13.1 Implicit Transactions
5.0 INTRODUCTION
In this unit, you will learn about transaction management, concurrency control and
distributed system. These are three closely-related functions that are designed to
ensure that a database is reliable and remains in a consistent state. In this regard,
you will learn about the four fundamental properties that every transaction should
possess. These are abbreviated as the ACID properties. You will then study about
transactions and schedules. A transaction is a logical unit of work in a DBMS that
includes one or more database access operations. This unit will examine the
concurrent execution of transactions. Most DBMSs allow concurrent access to the
database. If these concurrent operations are not controlled, the various accesses
may interfere with one another and the database could arrive at an inconsistent
state. To prevent this from happening, the DBMS implements a concurrency control
protocol that prevents concurrent accesses from interfering with one another.
You will also learn about schedules involving aborted transactions. A schedule
is a sequence of operations performed by a set of concurrent transactions that preserves
the order of the operations in each of the individual transactions. In lock-based
concurrency control, a lock is a variable that is linked with a data item. It defines
the status of the data item with respect to the database operations
that can be performed on it.
You will also learn about crash recovery and concurrency control. The
reliability and consistency of a database must be maintained even in the event of
failures in both hardware and software components, and when multiple users are
accessing the database. Both concurrency control and recovery are required to protect
the database from data inconsistencies and loss of data. Recovery is the process of
restoring the database to a correct state following any failure that may have occurred
in either the hardware or software or both. Malicious corruption or destruction is
another problem associated with the reliability and consistency of a database. While
recovery from such an event certainly must be handled by the DBMS, its prevention
is a DBMS security issue.
You will also learn about locking management and the methods of dealing
with deadlock. Locking is the most popular concurrency control mechanism
and underlies many important techniques. There are also various deadlock prevention
protocols that provide rules that you can use to avoid the problem of deadlock. You
will be introduced to ARIES and distributed databases. ARIES (Algorithms for
Recovery and Isolation Exploiting Semantics) is a recovery algorithm based on
write-ahead logging that restores the database to a consistent state after a
crash. A distributed database is a collection of multiple
interrelated databases spread across a computer network.
You will also study about transaction types and distributed DBMS
architectures. In DBMS, there are many types of transactions with reference to
concurrency control, known as implicit, explicit, local and global transactions.
The data and software in a DDB system is spread over multiple sites. In the end,
you will learn about the design of distributed databases. DBMS allows the data
used in different applications and software to be processed at different levels.
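[Figure: Interleaved transactions (one CPU): T1, T2, T3, T2 and T1 run in turn between the time points t0 and t5. In the second panel, T1, T2 and T3 run at the same time between t0 and t1.]
[Figure: Transaction state diagram showing the states Failed, Committed and Terminated and the transitions Read/Write, Abort, Rollback and Terminate.]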
T1              T2
Read(X)
X=X+N
Write(X)
Read(Y)
Y=Y+N
Write(Y)
                Read(X)
                X=X+P
                Write(X)
(Time increases downward.)
Now, these can be termed as serial schedules, since all the operations of one
transaction are being followed by the entire sequence of operations of the other
transaction.
In the interleaved mode, the operations of T1 are mixed with the operations of
T2. This can be done in a number of ways. Two such sequences are given here:
T1              T2
Read(X)
X=X+N
                Read(X)
                X=X+P
Write(X)
Read(Y)
                Write(X)
Y=Y+N
Write(Y)

T1              T2
Read(X)
X=X+N
Write(X)
                Read(X)
                X=X+P
                Write(X)
Read(Y)
Y=Y+N
Write(Y)
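The effect of the two interleaved sequences can be compared with a serial execution by simulating them. The following is a minimal Python sketch; the function names run and steps, the starting values X = 10 and Y = 0, and the increments N = 5 and P = 3 are assumptions made for illustration. Each transaction works on a private copy of an item between its Read and its Write.

```python
def run(schedule, db, n=5, p=3):
    """Execute a schedule given as (transaction, operation, item) steps."""
    db = dict(db)
    local = {"T1": {}, "T2": {}}          # private workspace per transaction
    delta = {"T1": n, "T2": p}            # T1 adds N, T2 adds P
    for txn, op, item in schedule:
        if op == "read":
            local[txn][item] = db[item]
        elif op == "add":
            local[txn][item] += delta[txn]
        elif op == "write":
            db[item] = local[txn][item]
    return db

def steps(txn, *ops):
    """Build steps such as ('T1', 'read', 'X') from strings like 'read X'."""
    return [(txn, *op.split()) for op in ops]

serial = (steps("T1", "read X", "add X", "write X", "read Y", "add Y", "write Y")
          + steps("T2", "read X", "add X", "write X"))
first = (steps("T1", "read X", "add X") + steps("T2", "read X", "add X")
         + steps("T1", "write X", "read Y") + steps("T2", "write X")
         + steps("T1", "add Y", "write Y"))
second = (steps("T1", "read X", "add X", "write X")
          + steps("T2", "read X", "add X", "write X")
          + steps("T1", "read Y", "add Y", "write Y"))

start = {"X": 10, "Y": 0}
print(run(serial, start))   # {'X': 18, 'Y': 5}
print(run(first, start))    # {'X': 13, 'Y': 5}
print(run(second, start))   # {'X': 18, 'Y': 5}
```

Under these assumed values, the first interleaving loses the update made by T1 to X, because T2 read X before T1 wrote it. The second interleaving gives the same result as the serial schedule.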
[Figure: Graphs over the transactions T1 and T2 for Schedule A, Schedule B, Schedule C and Schedule D, with edges labelled by the data item X.]
We may conclude that schedule D is equivalent to schedule A.
T1              T2
R(A)
W(A)
                R(A)
                W(A)
                R(B)
                W(B)
                Commit
Abort
If a transaction Ti is aborted, all its actions have to be undone. Not only that,
if Tj reads an object last written by Ti, then Tj must be aborted as well. Most systems
avoid such cascading aborts by releasing a transaction’s locks only at commit time:
if Ti writes an object, Tj can read it only after Ti commits. In order to undo the
actions of an aborted transaction, the DBMS maintains a log in which every write is
recorded. This mechanism is also used to recover from system crashes: all transactions
(Xacts) active at the time of the crash are aborted when the system comes back up.
Write-ahead logging (WAL) is used to undo the actions of aborted transactions and to
restore the system to a consistent state after a crash.
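The role of the log in undoing an aborted transaction can be sketched as follows. This is a simplified, in-memory Python illustration and not how any particular DBMS implements write-ahead logging: the class name Database and its methods are hypothetical, the log is not written to stable storage, and it is assumed that no other transaction has overwritten the same items in the meantime.

```python
class Database:
    def __init__(self, data):
        self.data = dict(data)
        self.log = []   # records of the form (transaction, item, old value)

    def write(self, txn, key, value):
        # Write-ahead rule: record the old value before changing the item.
        self.log.append((txn, key, self.data.get(key)))
        self.data[key] = value

    def abort(self, txn):
        # Undo the writes of txn in reverse order, restoring old values.
        for t, key, old in reversed(self.log):
            if t == txn:
                self.data[key] = old
        self.log = [rec for rec in self.log if rec[0] != txn]

db = Database({"A": 100, "B": 50})
db.write("T1", "A", 90)
db.write("T1", "A", 80)
db.write("T1", "B", 70)
print(db.data)    # {'A': 80, 'B': 70}
db.abort("T1")
print(db.data)    # {'A': 100, 'B': 50}
```

Undoing in reverse order matters here: item A was written twice, and only the oldest logged value (100) is the one that existed before T1 started.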
readlock(Y)
read(Y) Growth Phase I
writelock(X)
—————————————————
unlock(Y)
read(X) Shrinkage Phase II
X=X+Y
write(X)
unlock(X)
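The two phases in the example above can be checked mechanically: once a transaction has issued its first unlock, it must not request another lock. The function below is a small sketch of such a check; its name and the representation of operations as strings are assumptions made for illustration.

```python
def obeys_two_phase_locking(operations):
    """Return True if no lock is requested after the first unlock."""
    shrinking = False
    for op in operations:
        name = op.split("(")[0]
        if name == "unlock":
            shrinking = True             # the shrinkage phase has begun
        elif name in ("readlock", "writelock") and shrinking:
            return False                 # lock requested after an unlock
    return True


two_phase = ["readlock(Y)", "read(Y)", "writelock(X)",
             "unlock(Y)", "read(X)", "X=X+Y", "write(X)", "unlock(X)"]
not_two_phase = ["readlock(Y)", "read(Y)", "unlock(Y)",
                 "writelock(X)", "read(X)", "X=X+Y", "write(X)", "unlock(X)"]

first_ok = obeys_two_phase_locking(two_phase)
second_ok = obeys_two_phase_locking(not_two_phase)
```

The first list is the transaction shown above; the second releases Y before locking X and therefore breaks the protocol.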
No downgrading lock:
According to these rules, a transaction cannot downgrade a write lock into a
read lock, because downgrading is equivalent to releasing the write lock and
subsequently requesting and acquiring a new read lock.
Upgrading a lock is accepted.
However, it is possible for a transaction to upgrade a read lock into the stronger
write lock during the growing phase.
The order in which transactions are granted locks forces an execution ordering
on the transactions with respect to their conflicting operations. This is because
transactions are forced to wait on conflicting locks.
By preventing transactions from acquiring any lock after the release of a
lock, the 2PL protocol ensures serializability.
This can be illustrated by showing how the non-repeatable read problem is
solved using 2PL.
Examine the next example of inconsistent retrieval involving two transactions
T and S and two columns A and B. Their non-2PL and 2PL executions with time
diagrams are shown in the example. Local variables with lower case letters and
• Conservative 2PL: This involves acquiring all locks at the beginning and
releasing them when done. It is inefficient because locks are requested on more
items, and held for longer, than is really needed.
• Strict 2PL: There is a distinct growth phase, but the shrinkage phase is abrupt.
It is used by Oracle.
There are a number of variations of 2PL. The technique just described is
known as basic 2PL. A variation known as conservative 2PL (or static 2PL) requires
a transaction to lock all the items it accesses before the transaction begins execution,
by predeclaring its read-set and write-set. As mentioned earlier, the write-set is the
set of all items that a transaction writes and the read-set of a transaction is the set of
all items that are read by the transaction. If any of the predeclared items required
cannot be locked, the transaction does not lock any item. Instead, it waits until all
the items are available for locking. Conservative 2PL is a deadlock-free protocol.
However, it is difficult to use in practice because of the need to predeclare the read-
set and write-set, which is not possible in most situations.
In practice, the most popular variation of 2PL is the strict 2PL, which
guarantees strict schedules. Here, none of the exclusive (write) locks are released
by a transaction T till it has committed or aborted. Therefore, no item written by T
can be read or written by any other transaction until T commits. This leads to a
strict schedule, which simplifies recovery. Strict 2PL is not deadlock-free. A more
restrictive variation of strict 2PL is the rigorous 2PL, which also guarantees strict
schedules. In this variation, a transaction T does not release any of its locks
(exclusive or shared) until after it commits or aborts, and so it is easier to
implement than strict 2PL. Notice the difference between conservative and rigorous
2PL: the former must lock all its items before it starts, so once the transaction
starts, it is in the shrinking phase, whereas the latter does not unlock any of its
items until after it terminates (by committing or aborting), so the transaction is
in its expanding phase until it ends.
In many cases, the concurrency control subsystem itself is responsible for
generating the readlock and writelock requests. For example, if the system is to
enforce the strict 2PL protocol, then, whenever transaction T issues a read(X), the
system calls the readlock(X) operation on behalf of T. If the state of LOCK(X) is
write-locked by some other transaction T’, the system places T on the waiting queue
for item X; otherwise, it grants the readlock(X) request and permits the read(X)
operation of T to execute. On the other hand, if transaction T issues a write(X),
the system calls the writelock(X) operation on behalf of T. If the state of LOCK(X)
is write-locked or read-locked by some other transaction T’, the system places T on
the waiting queue for item X. If the state of LOCK(X) is read-locked and T itself is
the only transaction holding the read lock on X, the system upgrades the lock to
write-locked and permits the write(X) operation by T. Finally, if the state of
LOCK(X) is unlocked, the system grants the writelock(X) request and permits the
write(X) operation to execute. After each action, the system must update its lock
table appropriately.
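The decisions described in the paragraph above can be sketched as a small lock table. This is an illustrative sketch rather than the implementation used by any particular DBMS: the class LockManager, its method names and the return values "granted" and "wait" are assumptions, and waking up queued transactions after an unlock is left out for brevity.

```python
from collections import defaultdict, deque


class LockManager:
    """Toy lock table with read (shared) and write (exclusive) locks."""

    def __init__(self):
        self.mode = {}                       # item -> "read" or "write"; absent means unlocked
        self.holders = defaultdict(set)      # item -> transactions holding the lock
        self.waiting = defaultdict(deque)    # item -> queue of (transaction, requested mode)

    def read_lock(self, txn, item):
        if self.mode.get(item) == "write" and txn not in self.holders[item]:
            self.waiting[item].append((txn, "read"))   # write-locked by another transaction
            return "wait"
        self.mode.setdefault(item, "read")
        self.holders[item].add(txn)
        return "granted"

    def write_lock(self, txn, item):
        holders = self.holders[item]
        if not holders or holders == {txn}:
            # Unlocked, or T is the only holder: grant the lock or upgrade it.
            self.mode[item] = "write"
            holders.add(txn)
            return "granted"
        self.waiting[item].append((txn, "write"))       # held by some other transaction
        return "wait"

    def unlock(self, txn, item):
        self.holders[item].discard(txn)
        if not self.holders[item]:
            self.mode.pop(item, None)                   # nobody holds the item any more


lm = LockManager()
r1 = lm.read_lock("T1", "X")     # X is unlocked: granted
w1 = lm.write_lock("T1", "X")    # T1 is the only reader: upgraded to a write lock
r2 = lm.read_lock("T2", "X")     # write-locked by T1: T2 must wait
lm.unlock("T1", "X")
r3 = lm.read_lock("T2", "X")     # X is unlocked again: granted
```

A fuller version would remove T2 from the waiting queue and grant its request automatically when T1 unlocks X.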
[Figure: Application areas; surviving labels: Real-Time DBMS, HLR, Mobile Computing, FA, CIM]
[Figure: Crash recovery components; surviving labels: User Interface, Transaction Manager, Main Memory Manager, Backup Manager, Main Memory, Backup Database]
Application areas
The following are the application areas of crash recovery:
• Real-time database management systems
• Home location register (HLR) in mobile communications
• Mobile computing
• Factory automation (FA)
• Factory control DBMS
T1              T2
Read(X)
X = X - 50
                Read(X)
                temp = X * 0.2
                X = X - temp
                Write(X)
                Read(Y)
Write(X)
Read(Y)
Y = Y + 50
Write(Y)
                Y = Y + temp
                Write(Y)
X                                 Time   Y
Read(salary) of Employee 100      1      -
-                                 2      Sum = 0.0
Update Salary                     3      -
-                                 4      Read(salary) of Employee 100
Write(salary)                     5      -
-                                 6      Sum = Sum + Salary of Employee 100
Read(salary) of Employee 101      7      -
-                                 8      Read(salary) of Employee 101
Update Salary                     9      -
-                                 10     Sum = Sum + Salary of Employee 101
Write(salary) of Employee 101     11     -
-                                 12     -
-                                 13     -
etc                               -      etc
The problem illustrated in Table 5.2 is called the inconsistent retrieval anomaly.
During the execution of a transaction therefore, changes made by another transaction
that has not yet committed should not be visible since that data may not be consistent.
5.8.4 Phantom Reads
Phantom reads take place when insertion or deletion is performed against a row
belonging to a range of rows read by a transaction. The first read of the range of
rows reflects a row that does not exist in the second or successive read because of a
deletion by another transaction. In the same way, due to an insertion by another
transaction, the second or successive read of the transaction shows a row that was
non-existent in the original read.
An editor, for instance, modifies a document submitted by an author. However,
when the production department incorporates the changes into the master copy of
the document, it is discovered that the author has added new content which is
unedited. Such a problem can be kept at bay if there is a provision to prevent
anybody from adding new material. In other words, it would help to have a system
wherein no one can add to or modify a document till the editors and the production
department have completed their work on the original document.
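A phantom read can be reproduced with a few lines of code. The sketch below uses a plain Python list in place of a table; the account rows, the balance range and the function read_range are hypothetical and serve only to show how a repeated range read can return a row that the first read did not see.

```python
# Hypothetical table of accounts, held in memory for illustration.
accounts = [{"id": 1, "balance": 500}, {"id": 2, "balance": 1500}]


def read_range(rows, low, high):
    """Return the ids of the rows whose balance lies between low and high."""
    return sorted(r["id"] for r in rows if low <= r["balance"] <= high)


first_read = read_range(accounts, 0, 1000)      # transaction T1 reads the range
accounts.append({"id": 3, "balance": 700})      # transaction T2 inserts a row in that range
second_read = read_range(accounts, 0, 1000)     # T1 repeats the same read

phantoms = set(second_read) - set(first_read)   # rows that appeared in between
```

Locking only the rows that T1 has already read would not help here, because the new row did not exist at the time of the first read; the range itself has to be protected.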
T1              T2
Read(A)
Write(A)
                Read(A)
                Write(A)
Abort
                Abort
Else
{
wait (until Lock(X) = 0 and the lock manager wakes up the
transaction);
go to start;
}
The unlocking algorithm
Unlockitem(X):
Lock(X) ← 0;
{ If any transactions are waiting,
wake up one of the waiting transactions }
The only restriction on the use of binary locks is that the lock and unlock
operations should be implemented as indivisible units (also called ‘critical
sections’ in operating systems terminology). That means that once a lock or unlock
operation is started, no interleaving should be allowed until the operation is
completed. Otherwise, if a transaction locks a unit and gets interleaved with many
other transactions, the locked unit may become unavailable for a long time, with
catastrophic results.
To make use of the binary lock schemes, every transaction should follow
certain rules:
1. The lockitem(X) command is to be issued before issuing a read(X) or
write(X).
2. The unlockitem(X) command is to be issued after completion of all
read(X) and write(X) operations on X.
3. If a transaction holds a lock on a data item X, it should not issue another
lockitem(X) command.
4. If a transaction is not currently holding a lock on a data item X, it should
not issue an unlockitem(X) command.
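The four rules above can be enforced by the lock table itself. The following sketch is one possible way of doing so and is not taken from the text: the class BinaryLockTable and its method names are hypothetical, and a request on a locked item simply returns False instead of putting the transaction to sleep.

```python
class BinaryLockTable:
    """Toy binary lock table that rejects requests breaking the rules above."""

    def __init__(self):
        self.holder = {}    # item -> transaction holding it; absent means Lock(X) = 0

    def lock_item(self, txn, item):
        if self.holder.get(item) == txn:
            raise RuntimeError("rule 3: the transaction already holds this lock")
        if item in self.holder:
            return False                # Lock(X) = 1: the caller must wait and retry
        self.holder[item] = txn         # Lock(X) <- 1
        return True

    def unlock_item(self, txn, item):
        if self.holder.get(item) != txn:
            raise RuntimeError("rule 4: the transaction does not hold this lock")
        del self.holder[item]           # Lock(X) <- 0

    def check_access(self, txn, item):
        # Rules 1 and 2: read(X) and write(X) are legal only while the lock is held.
        if self.holder.get(item) != txn:
            raise RuntimeError("rule 1: lock the item before reading or writing it")


table = BinaryLockTable()
got_t1 = table.lock_item("T1", "X")      # granted
got_t2 = table.lock_item("T2", "X")      # refused while T1 holds X
table.check_access("T1", "X")            # T1 may now read or write X
table.unlock_item("T1", "X")
got_t2_later = table.lock_item("T2", "X")
```

Because a binary lock has only two states, T2 is refused even if both transactions only want to read X.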
Requested lock
Exclusive         N    N    Y
Shared            N    Y    Y
Unlock            Y    Y    -
Tables 5.3 and 5.4 show, respectively, the requested locks and the priority-based
locking rules with reference to read, write and certify locks.
Table 5.4 Locking Rules as per Priority Inversion
Lock requested by a          Lock already set by a low priority transaction
high priority transaction    Read                Write        Certify
Read                         Granted             L-Aborted    Cannot Occur*
Write                        Granted/Blocked#    Granted      Granted
Certify                      Conversion          Granted      Conversion
In Table 5.4:
• L-Aborted means Low Priority Transaction Aborted, whereas conversion
means Low Priority Transaction is converted to write lock.
• * - It means if the transaction is already aborted, there is no reading lock set.
• # - This symbol depends on the implementation.
Optimistic Locking and Pessimistic Locking
Optimistic locking: The data is locked only when data is being saved. The lock is
released after data is saved.
Pessimistic locking: The data is locked when editing begins. The lock is released
after the data is saved or discarded.
Optimistic locking is only for solving physical I/O conflicts. It cannot be used
to handle the concurrency problems mentioned above. In other words, it cannot
provide a solution to the lost update problem and the uncommitted dependency problem.
5.10.4 Starvation
Starvation occurs when a transaction remains unexecuted for a long time while
other transactions are under execution. This occurs if some transactions are assigned
a higher priority than others. To avoid starvation, a system can use the
first-come-first-serve waiting method, according to which the transactions can lock
the data item in the same order in which the requests are made for the data item. The
wait-die and wound-wait schemes also help prevent starvation among transactions.
Another way to avoid starvation is to define the priority of transactions and to
increase the priority of a transaction which is waiting for a data item.
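The last remedy, raising the priority of a waiting transaction, is often called aging. The sketch below is a simplified illustration under assumed conditions: the function names, the priority values and the arrival pattern (one new high-priority transaction per round) are all hypothetical.

```python
def pick_next(waiting, aging_step=1):
    """Serve the waiting transaction with the highest priority and age the rest.

    `waiting` maps a transaction name to its priority (larger means more urgent).
    Ties are broken in favour of the alphabetically first name.
    """
    chosen = max(sorted(waiting), key=waiting.get)
    del waiting[chosen]
    for txn in waiting:
        waiting[txn] += aging_step      # every transaction left behind gains priority
    return chosen


def simulate(aging_step, rounds=7):
    """One low-priority transaction competes with a new high-priority one each round."""
    waiting = {"T_low": 1}
    order = []
    for i in range(1, rounds + 1):
        waiting[f"H{i}"] = 5            # a high-priority transaction arrives
        order.append(pick_next(waiting, aging_step))
    return order


with_aging = simulate(aging_step=1)
without_aging = simulate(aging_step=0)
```

Under these assumptions, T_low is eventually served when its priority is raised in every round, and is never served within the simulated rounds when priorities stay fixed.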
[Figure: Sites 1 to 5 connected through a communications network]
Figure 5.11 shows that implicit recipients work with implicit transactions and
are able to pass all the messages to a specific queue. The word recipient is used
here for maintaining the queue for implicit and explicit types of transactions. The
category of transactions can be viewed via ‘explicit recipients’ in that they are
designated targets of particular messages. Figure 5.11 also shows a workflow
consisting of four steps performed by Application A, Application B, Application C
and Application D. The queues are used to buffer the flow of information between
different processing stages of the business process. By specifying the delay
interval and the expiration time for a message, a window of execution can be
provided for each of the applications.
5.13.1 Implicit Transactions
By default, all INSERT, DELETE and UPDATE statements in a DBMS run as implicit
transactions. For example, if an UPDATE query is run and a primary key constraint
is violated, a rollback will occur and the changes will not take place. One of the
server-level options available in SQL Server causes all operations to be
encapsulated in the context of a transaction. To use this mode, you must issue the
SET IMPLICIT_TRANSACTIONS ON statement for the database connection. Any of
the following statements will serve as an implicit start to a transaction:
Alter Table    Create    Delete            Drop
Fetch          Grant     Insert            Open
Revoke         Select    Truncate Table    Update
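The behaviour of implicit transactions can be tried out without a database server. The sketch below uses SQLite through Python's standard sqlite3 module, not SQL Server, so it only illustrates the idea: in its default mode the module starts a transaction implicitly before an INSERT, UPDATE or DELETE, and an UPDATE that violates the primary key is undone as a whole. The table emp and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(1, "Clark"), (2, "Adams")])
conn.commit()

# The INSERT below starts a transaction implicitly; nothing is permanent
# until commit() is called.
conn.execute("INSERT INTO emp VALUES (3, 'Blake')")
started_implicitly = conn.in_transaction
conn.rollback()                                   # undo the uncommitted INSERT
count_after_rollback = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]

# This UPDATE tries to give every row the same primary key. It fails, and
# none of its changes (not even the changed names) remain in the table.
try:
    conn.execute("UPDATE emp SET id = 1, name = 'changed'")
    update_failed = False
except sqlite3.IntegrityError:
    update_failed = True

rows = conn.execute("SELECT id, name FROM emp ORDER BY id").fetchall()
conn.close()
```

After the failed UPDATE the table still holds the two committed rows, which mirrors the rollback described in the paragraph above.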
Running statements inside transactions adds another layer of integrity protection
and applies to the following three things:
(i) The size of the transactions themselves
(ii) The size of the transaction log
(iii) Data recovery
You can only recover committed data, so the work of big transactions that do not
commit can be lost if there is a problem. Performance should also be considered
when using transactions.
Check Your Progress
16. What is ARIES?
17. What do you understand by distributed databases?
18. How many types of transaction are there in a DBMS?
5.15.2 Design
The design of DDBs is carried out using the following techniques:
• Data fragmentation
• Data replication and allocation
1. Data fragmentation
Data fragmentation can be defined as a process that is used to break up a distributed
database into logical units. These logical units are known as fragments, which can be
assigned to various sites of a distributed database for storage. The fragments can be
defined as relations that are stored at a particular site. A distributed database is a
collection of a number of interrelated databases that are spread through a computer
network. A distributed database system is used to improve the reliability and the
performance of the database. The following are the types of data fragmentation:
• Horizontal fragmentation
• Vertical fragmentation
Horizontal fragmentation: Horizontal fragmentation can be defined as a
process that horizontally divides a relation by grouping rows to create subsets of
tuples. Each subset has a logical meaning. These subsets of tuples, also called
segments, are then assigned to different sites which are part of a distributed database
system. A horizontal fragment of a relation is a subset of the tuples existing in
that relation. The tuples in a horizontal fragment are specified by a condition
on one or more attributes of the relation. Consider, for example, that you have a
relation called Employee as shown in Table 5.7. Tables 5.8 and 5.9 show horizontal
fragments of the Employee relation. The fragment Employee2 shown in Table 5.8
Serial No.   Employee Name   Status   City
S5           Reeve           20       Paris
S4           John            40       London
S3           Blake           20       Athens
S2           Adams           10       London
S1           Clark           40       Paris

S3           Blake           20       Athens
S1           Clark           40       Paris

S4           John
S3           Blake
S2           Adams
S1           Clark

Status   City
20       Paris
40       London
20       Athens
10       London
30       Paris
5.15.5 Fragmentation
Fragmentation consists of breaking a relation into smaller relations or fragments,
possibly stored at different sites. Database applications work with views rather than
entire relations; therefore, data is stored close to where it is most frequently used.
With fragments as the unit of distribution, parallelism is also supported: a
transaction can be divided into several subqueries that operate on fragments.
• Security: Data not required by local applications is not stored, and
consequently not available to unauthorized users.
• Performance: The performance of global applications that require data from
several fragments located at different sites may be slower.
Types of fragmentation
The types of fragmentation are as follows:
• Horizontal fragmentation: A subset of the tuples of a relation, defined as
σ_p(R), where p is a predicate based on one or more attributes of the relation.
The union of the horizontal fragments must be equal to the original relation,
and the fragments are required to be disjoint.
• Vertical fragmentation: A subset of the attributes of a relation, denoted as
Π_{a1, a2, ..., an}(R), where a1, a2, ..., an are attributes of the relation R.
The vertical fragments must recombine into the original relation through a
lossless join. To make this possible, systems often assign a unique tuple id
to each tuple in the original relation and attach it to every vertical fragment.
• Mixed fragmentation: A horizontal fragment that is subsequently vertically
fragmented, or a vertical fragment that is then horizontally fragmented.
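The two basic kinds of fragmentation, and the way the original relation is rebuilt from the fragments, can be sketched as follows. The rows are those of the Employee relation shown earlier; the helper functions select and project, the attribute names and the choice of the predicate (City = London) are assumptions made for illustration.

```python
employee = [
    {"serial": "S1", "name": "Clark", "status": 40, "city": "Paris"},
    {"serial": "S2", "name": "Adams", "status": 10, "city": "London"},
    {"serial": "S3", "name": "Blake", "status": 20, "city": "Athens"},
    {"serial": "S4", "name": "John", "status": 40, "city": "London"},
    {"serial": "S5", "name": "Reeve", "status": 20, "city": "Paris"},
]


def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes of every tuple."""
    return [{a: row[a] for a in attributes} for row in relation]


# Horizontal fragmentation: two disjoint predicates that together cover R.
london = select(employee, lambda r: r["city"] == "London")
elsewhere = select(employee, lambda r: r["city"] != "London")
rebuilt_from_rows = sorted(london + elsewhere, key=lambda r: r["serial"])

# Vertical fragmentation: every fragment keeps the key (here the serial number).
names = project(employee, ["serial", "name"])
places = project(employee, ["serial", "status", "city"])
places_by_key = {row["serial"]: row for row in places}
rebuilt_from_columns = [{**n, **places_by_key[n["serial"]]} for n in names]
```

The horizontal fragments are recombined with a union, while the vertical fragments are joined on the key they share, which is why each vertical fragment must carry the key or a tuple id.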
Let us consider a very good example of database fragmentation within the
skeletal structure of a DBMS such as Turbo Image. While specific internal DBMS
fragmentation is outside the scope of all, a DBMS exists on top of the file system.
This means that there is still significant impact of defragmenting Turbo Image data
5.16 SUMMARY
In this unit, you have learned about ACID properties. Every transaction possesses
four fundamental properties—atomicity, consistency, isolation and durability. You
have also learned about transactions and schedules. A transaction is a logical unit of
work in a DBMS that includes one or more database access operations. Another
topic you have read about is concurrent execution of transactions. Transaction
processing systems usually let many transactions run simultaneously. Schedules
involving aborted transactions and lock-based concurrency control have also been
explained in this unit. One simple method to prevent transactions from interfering
with one another is to execute them serially. In the concurrency approach, a lock is
a variable that is linked with a data item.
You have also been introduced to the concept of crash recovery. Crash recovery
mechanisms guard against system crashes and media failures. This unit also discussed
concurrency control as the process of managing simultaneous execution of
transactions without allowing them to interfere with one another. Concurrency control
can regulate the interaction among concurrent transactions to keep them from
disturbing the database consistency.
In addition, this unit discussed concurrency control mechanisms, including
locking protocols and timestamp ordering. It brought out the importance of protecting
a database against unauthorized access, malicious destruction and accidental
introduction of inconsistency. Further, you have also learned about distributed
database (DDB), which is a collection of multiple interrelated databases that are
spread over a computer network. Each computer contains its own database, which
is managed by an individual database management system. Further, you have learned
about transaction types. In DBMS, there are many types of transactions with reference
to concurrency control which are called implicit, explicit, local and global
transactions.
This unit also discussed distributed DBMS architectures. The data and software
in a DDB system are spread over many sites that are connected by some form of
communication network. Finally, you have learned about the design of distributed
Short-Answer Questions
1. Write a note on durability.
2. Draw a state transition diagram and name the typical states that a transaction
goes through during execution.
3. Name the three basic concurrency problems encountered in a DBMS.
4. What is a serial schedule? What is a serializable schedule? Why is a serial
schedule considered correct?
5. When are two schedules said to be equivalent? How can one detect equivalence
of schedules?
6. Define conflict and view serializability.
7. Why is concurrency control needed?
8. Compare binary locks to shared/exclusive locks. Why is the latter type
preferred?
9. What are checkpoints? Why are they important?
10. What is locking management?
11. What is the deadlock prevention protocol?
12. Write a note on ARIES.
13. What are the features of distributed data transparency?
14. What are explicit transactions?
15. What do you understand by a homogeneous distributed database system?
Long-Answer Questions
1. What is a transaction? Explain the ACID property of database transactions.
2. Discuss serial schedule, non-serial schedule and serializable schedule with
examples.
3. Explain the method of verifying view serializability using the labelled
precedence graph when more than one transaction is executed in a concurrent
execution.
4. Prove that the basic two-phase locking protocol guarantees conflict
serializability. What benefit does the strict two-phase locking provide? What
are the resultant disadvantages?
5. Discuss the different types of locks—binary, shared, exclusive.
6. Explain the timestamp-based concurrency control protocol.
7. Write detailed notes on:
(a) Concurrency control
(b) 2PL protocol
(c) Recovery in DBMS
8. Discuss the problem of deadlock and starvation. Illustrate with an example that
the two-phase locking protocol is not deadlock free.
9. Explain the two-phase locking protocol. How is it different from the strict
and conservative two-phase locking protocols?
10. When a schedule satisfies the two-phase locking protocol, is there any
possibility of deadlock? If not, why? If yes, give an example.