DBMS Lecture Notes
ON
DATABASE MANAGEMENT
SYSTEMS
2018 – 2019
DATABASE: A database is a collection of information that is organized so that it can be easily accessed, managed, and updated. Data is organized into rows, columns, and tables, and it is indexed to make it easier to find relevant information. Data gets updated, expanded, and deleted as new information is added. Database workloads create and update the database, query the data it contains, and run applications against it.
A Database application is a computer program whose primary purpose is entering and retrieving information from a
computerized database.
What Is a DBMS?
A Database Management System (DBMS) is a software package designed to interact with end-users and other applications, and to store and manage databases. A general-purpose DBMS allows the definition, creation, querying, update, and administration of databases.
• A very large, integrated collection of data.
• Models real-world enterprise. Entities (e.g., students, courses) Relationships (e.g., Madonna is
taking CS564).
A database management system stores, organizes and manages a large amount of information within a single
software application. It manages data efficiently and allows users to perform multiple tasks with ease.
A database system is a collection of interrelated data and a set of programs that allow users to access and modify these data. The main task of a database system is to provide an abstract view of the data, i.e., to hide certain details of storage from the users.
Data Abstraction:
The major purpose of a DBMS is to provide users with an abstract view of data, i.e., the system hides certain details of how the data are stored and maintained. Since database system users are not computer trained, developers hide this complexity from users through three levels of abstraction, to simplify the user's interaction with the system.
Levels of Abstraction
1) Physical level of data abstraction: Describes how a record (e.g., customer) is stored. This is the lowest level
of abstraction which describes how data are actually stored.
2) Logical level of data abstraction: The next-higher level of abstraction, which describes what data are stored in the database and what relationships exist among them, while hiding how the data are physically stored.
type customer = record
customer_id : string;
customer_name : string;
customer_street : string;
customer_city : string;
end;
3) View level of data abstraction: The highest level of abstraction; it provides a security mechanism to prevent users from accessing certain parts of the database. Application programs hide details of data types. Views can also hide information (such as an employee’s salary) for security purposes and to simplify the interaction with the system.
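The salary-hiding view mentioned above can be sketched in SQL. Below is a minimal runnable example using Python's sqlite3 module; the employee table, its columns, and the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical employee table; names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, dept TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 'Sales', 52000), (2, 'Ravi', 'HR', 48000)")

# The view exposes only non-sensitive columns: salary is hidden from view users.
conn.execute("CREATE VIEW emp_public AS SELECT emp_id, name, dept FROM employee")

rows = conn.execute("SELECT * FROM emp_public ORDER BY emp_id").fetchall()
print(rows)  # salary column is absent
```

A user granted access only to emp_public can query employees without ever seeing the salary attribute.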
Instances and Schemas:
Similar to types and variables in programming languages. A database changes over time as information is inserted or deleted.
Instance – the actual content of the database at a particular point in time; analogous to the value of a variable.
Schema – the logical structure of the database, called the database schema. A schema is of three types: physical schema, logical schema, and view schema.
• Example: the database consists of information about a set of customers and accounts and the relationships between them. A schema is analogous to the type information of a variable in a program.
Physical schema: Database design at the physical level is called physical schema. How the data stored in blocks of
storage is described at this level.
Logical schema: Database design at the logical level is called the logical schema. Programmers and database administrators work at this level. Here, data are described as certain types of records stored in data structures; internal details, such as the implementation of those data structures, are hidden.
View schema: Design of database at view level is called view schema. This generally describes end user interaction
with database systems.
Physical Data Independence – The ability to modify the physical schema without changing the logical schema.
Course_info(cid:string,enrollment:integer)
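Physical data independence can be made concrete: adding an index is a physical-level change, and a query written against the logical schema returns the same answer before and after. A minimal sqlite3 sketch; the Course_info data below is invented.

```python
import sqlite3

# The same logical query works unchanged before and after a physical change
# (adding an index). Sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course_info (cid TEXT, enrollment INTEGER)")
conn.executemany("INSERT INTO course_info VALUES (?, ?)",
                 [("CS564", 120), ("CS537", 90), ("CS640", 60)])

query = "SELECT cid FROM course_info WHERE enrollment > 80 ORDER BY cid"
before = conn.execute(query).fetchall()

# Physical-level change: add an index. The logical schema is untouched.
conn.execute("CREATE INDEX idx_enroll ON course_info(enrollment)")
after = conn.execute(query).fetchall()

assert before == after  # physical data independence: identical answers
print(after)  # [('CS537',), ('CS564',)]
```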
Data Models:
A data model is the logical structure of a database: a collection of concepts for describing data, reflecting entities, attributes, relationships among data, constraints, etc. A schema is a description of a particular collection of data, using the given data model. The relational model of data is the most widely used model today. A data model is a collection of tools for describing
– Data
– Data relationships
– Data semantics
– Data constraints
– Relational model
– Entity-Relationship data model (mainly for database design)
– Object-based data models (Object-oriented and Object-relational)
– Semi structured data model (XML)
– Other older models:
o Network model
o Hierarchical model
Every relation has a schema, which describes the columns, or fields.
Different types of data models are:
1) Relational model: The relational model uses a collection of tables to represent both data and relationships
among those data. Each table has multiple columns with unique name.
– It is an example of a record-based model.
– The database is structured in fixed-format records of several types.
– Each table contains records of a particular type.
– Each record type defines a fixed number of fields, or attributes.
– The columns of the table correspond to attributes of the record type.
The relational data model is the most widely used data model and majority of current database systems are based on
relational model.
2) Entity-relationship model: The E-R model is based on a perception of real world that consists of basic objects
called entities and relationships among these objects. An entity is a ‘thing’ or ‘object’ in the real world, E-R
model is widely used in database design.
Database Architecture:
The architecture of a database system is greatly influenced by the underlying computer system on which the database is running:
• Centralized
• Client-server
• Parallel (multiple processors and disks)
• Distributed
Storage Management:
The storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. Its components include:
– Storage access
– File organization
– Indexing and hashing
Query Processing
Users are differentiated by the way they expect to interact with the system
• Application programmers – interact with system through DML calls
• Specialized users – write specialized database applications that do not fit into the traditional data processing
framework
• Naïve users – invoke one of the permanent application programs that have been written previously
– Examples, people accessing database over the web, bank tellers, clerical staff
Database Administrator
– Backing up data
– Database tuning.
– What information about these entities and relationships should we store in the
database?
(ER diagrams).
ER Model:
• Entity: Real-world object distinguishable from other objects. An entity is described (in DB) using a set of
attributes.
• Entity Set: A collection of similar entities. E.g., all employees.
– All entities in an entity set have the same set of attributes. (Until we consider ISA hierarchies, anyway!)
– Each entity set has a key.
– Each attribute has a domain.
• Relationship: Association among two or more entities. E.g., Attishoo works in Pharmacy department.
• Relationship Set: Collection of similar relationships.
– An n-ary relationship set R relates n entity sets E1 ... En; each relationship in R involves entities e1 ∈ E1, ..., en ∈ En
• Same entity set could participate in different relationship sets, or in different “roles” in same set.
Modeling:
• A database can be modeled as:
– a collection of entities,
– relationship among entities.
• An entity set is a set of entities of the same type that share the same properties.
Attributes:
• Express the number of entities to which another entity can be associated via a
relationship set.
• For a binary relationship set the mapping cardinality must be one of the following types:
– One to one
– One to many
– Many to one
– Many to many
Mapping Cardinalities:
Note: Some elements in A and B may not be mapped to any elements in the other set
Relationships and Relationship Sets
• A relationship set is a mathematical relation among n ≥ 2 entity sets:
• {(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}, where (e1, e2, …, en) is a relationship
• Relationship sets that involve two entity sets are binary (or degree two).
Generally, most relationship sets in a database system are binary.
• Relationships between more than two entity sets are rare. Most relationships are binary.
Weak Entities
• An entity set that does not have a primary key is referred to as a weak entity set.
• A weak entity can be identified uniquely only by considering the primary key of another (owner/identifying) entity set.
• The owner entity set and the weak entity set must participate in a one-to-many identifying relationship set, and the weak entity set must have total participation in this identifying relationship set.
• The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all the entities of the weak entity set.
• The primary key of a weak entity set is formed by the primary key of the strong entity
set on which the weak entity set is existence dependent, plus the weak entity set’s
discriminator.
• We depict a weak entity set by double rectangles.
• Note: the primary key of the strong entity set is not explicitly stored with the weak entity set, since it is implicit in the identifying relationship.
• If loan_number were explicitly stored, payment could be made a strong entity, but then
the relationship between payment and loan would be duplicated by an implicit
relationship defined by the attribute loan_number common to payment and loan
Design choices:
• Depends upon the use we want to make of address information, and the semantics of
the data:
If we have several addresses per employee, address must be an entity (since attributes cannot
be set-valued).
If the structure (city, street, etc.) is important, e.g., we want to retrieve employees in a given
city, address must be modeled as an entity (since attribute values are atomic).
Binary vs. Ternary Relationships
• Previous example illustrated a case when two binary relationships were better than
one ternary relationship.
An example in the other direction: a ternary relation Contracts relates entity sets Parts,
Departments and Suppliers, and has descriptive attribute qty. No combination of binary
relationships is an adequate substitute:
– S “can-supply” P, D “needs” P, and D “deals-with” S does not imply that D has
agreed to buy P from S.
– Schema : specifies name of relation, plus name and type of each column.
E.G. Students (sid: string, name: string, login: string, age: integer, gpa: real).
• Can think of a relation as a set of rows or tuples (i.e., all rows are distinct).
• A major strength of the relational model: supports simple, powerful querying of data.
• Queries can be written intuitively, and the DBMS is responsible for efficient evaluation.
– Allows the optimizer to extensively re-order operations, and still ensure that the answer
does not change.
The SQL Query Language
Creating Relations in SQL
• Creates the Students relation. Observe that the type of each field is specified, and
enforced by the DBMS whenever tuples are added or modified.
• As another example, the Enrolled table holds information about courses that students take.
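As a sketch of the Students and Enrolled relations described above, run through Python's sqlite3 (column names follow the notes; the sample tuples are invented). SQLite itself is loose about column types, so the demonstration focuses on the key constraint, which it does enforce on insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Students (
    sid   TEXT PRIMARY KEY,
    name  TEXT,
    login TEXT,
    age   INTEGER,
    gpa   REAL)""")
conn.execute("CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT)")

conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")
conn.execute("INSERT INTO Enrolled VALUES ('53666', 'CS564', 'A')")

# The key constraint is enforced on insert: a second tuple with sid 53666 fails.
try:
    conn.execute("INSERT INTO Students VALUES ('53666', 'Smith', 'smith@ee', 19, 3.2)")
    ok = False
except sqlite3.IntegrityError:
    ok = True
print(ok)  # True: the duplicate key was rejected
```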
• IC: condition that must be true for any instance of the database; e.g., domain constraints.
• If the DBMS checks ICs, stored data is more faithful to real-world meaning.
• A set of fields is a key for a relation if:
1. No two distinct tuples can have the same values in all key fields, and
2. This is not true for any subset of the key (otherwise it is only a superkey).
– If there’s >1 key for a relation, one of the keys is chosen (by DBA) to be the
primary key.
• E.g., sid is a key for Students. (What about name?) The set {sid, gpa} is a superkey.
• Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key.
• Foreign key : Set of fields in one relation that is used to `refer’ to a tuple in another
relation. (Must correspond to primary key of the second relation.) Like a `logical
pointer’.
• E.g. sid is a foreign key referring to Students:
– If all foreign key constraints are enforced, referential integrity is achieved, i.e.,
no dangling references.
• Only students listed in the Students relation should be allowed to enroll for courses.
• Consider Students and Enrolled; sid in Enrolled is a foreign key that references
Students.
– (In SQL, also: set sid in Enrolled tuples that refer to the deleted student to a special value null.)
– SET NULL / SET DEFAULT (sets the foreign key value of the referencing tuple); CASCADE deletes the referencing tuples; NO ACTION rejects the deletion.
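The foreign-key behavior above can be tried directly. A minimal sqlite3 sketch with invented data (SQLite requires PRAGMA foreign_keys = ON): inserting a dangling sid is rejected, and deleting the referenced student triggers ON DELETE SET NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE Enrolled (
    sid TEXT REFERENCES Students(sid) ON DELETE SET NULL,
    cid TEXT)""")
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones')")
conn.execute("INSERT INTO Enrolled VALUES ('53666', 'CS564')")

# A dangling reference is rejected: no student '99999' exists.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('99999', 'CS564')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting the referenced student sets the foreign key to NULL, not dangling.
conn.execute("DELETE FROM Students WHERE sid = '53666'")
row = conn.execute("SELECT sid, cid FROM Enrolled").fetchone()
print(rejected, row)  # True (None, 'CS564')
```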
Where do ICs Come From?
• ICs are based upon the semantics of the real-world enterprise that is being described
• We can check a database instance to see if an IC is violated, but we can NEVER infer that an IC is true just by looking at an instance.
– For example, we know name is not a key, but the assertion that sid is a key is given to us.
• Key and foreign key ICs are the most common; more general ICs supported too.
Introduction To Views:
• Example: a view can show the names of students who are enrolled, but not the cid’s of the courses they are enrolled in.
• View Definition
• A relation that is not of the conceptual model but is made visible to a user as a “virtual
relation” is called a view.
• A view is defined using the create view statement, which has the form
create view v as <query expression>
where <query expression> is any legal SQL expression and v is the view name.
• Once a view is defined, the view name can be used to refer to the virtual relation
• Example Queries
• Uses of Views
– Consider a user who needs to know a customer’s name and loan number, but not the loan amount.
– Define a view:
create view cust_loan_data as
select customer_name, borrower.loan_number
from borrower, loan
where borrower.loan_number = loan.loan_number
– Grant the user permission to read cust_loan_data, but not borrower or loan.
– Processing of Views: when a view is created, the query expression is stored in the database along with the view name; the expression is substituted into any query that uses the view.
• View Expansion
• Let view v1 be defined by an expression e1 that may itself contain uses of view
relations.
repeat
Find any view relation vi in e1
Replace the view relation vi by the expression defining vi
until no more view relations are present in e1
• As long as the view definitions are not recursive, this loop will terminate
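The view-expansion loop above can be sketched in a few lines of Python. This is a toy string-substitution model, not a real SQL parser; the view names and definitions are invented.

```python
def expand_views(expr, view_defs):
    """Replace each view name in expr by its defining expression until none
    remain. Assumes non-recursive definitions, so the loop terminates."""
    changed = True
    while changed:
        changed = False
        for name, definition in view_defs.items():
            if name in expr:
                expr = expr.replace(name, "(" + definition + ")")
                changed = True
    return expr

views = {"v1": "SELECT * FROM v2 WHERE x > 3",
         "v2": "SELECT x FROM r"}
expanded = expand_views("SELECT * FROM v1", views)
print(expanded)  # only base relation r remains after expansion
```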
• With Clause
• The with clause provides a way of defining a temporary view whose definition is available only to the query in which the with clause occurs.
• Find all branches where the total account deposit is greater than the average of the
total account deposits at all branches.
• Update of a View
• Create a view of all loan data in the loan relation, hiding the amount attribute
• Destroys the relation Students. The schema information and the tuples are deleted.
• Views
• A view is just a relation, but we store a definition, rather than a set of tuples.
Unit-II
• Basic operations:
• Additional operations:
• Schema of result contains exactly the fields in the projection list, with the same names
that they had in the (only) input relation.
– Note: real systems typically don’t do duplicate elimination unless the user
explicitly asks for it. (Why not?)
• Selects rows that satisfy selection condition.
• Result relation can be the input for another relational algebra operation! (Operator
composition.)
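Selection, projection, and operator composition can be modeled directly. A minimal sketch with relations as Python lists of row-dicts; the Sailors data is invented.

```python
def select(relation, predicate):
    """sigma: keep the rows satisfying the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """pi: keep only the listed attributes; using a set eliminates duplicates."""
    return {tuple(t[a] for a in attrs) for t in relation}

sailors = [{"sid": 22, "sname": "Dustin", "rating": 7},
           {"sid": 31, "sname": "Lubber", "rating": 8},
           {"sid": 58, "sname": "Rusty", "rating": 10}]

# Operator composition: project the result of a selection.
result = project(select(sailors, lambda t: t["rating"] > 7), ["sname"])
print(result)  # a set containing ('Lubber',) and ('Rusty',)
```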
Set Operations:
• All of these operations take two input relations, which must be union-compatible:
– they have the same number of fields, and
– corresponding fields, taken in order from left to right, have the same domains.
• Result schema has one field per field of S1 and R1, with field names `inherited’ if
possible.
• Equi-Join: A special case of condition join where the condition c contains only
equalities.
• Result schema similar to cross-product, but only one copy of fields for which equality is
specified.
• Solution 1:
• Information about boat color only available in Boats; so need an extra join:
• Can identify all red or green boats, then find sailors who’ve reserved one of these boats:
• Previous approach won’t work! Must identify sailors who’ve reserved red boats, sailors
who’ve reserved green boats, then find the intersection (note that sid is a key for Sailors):
Relational Calculus:
• Comes in two flavors: Tuple relational calculus (TRC) and Domain relational calculus
(DRC).
• Calculus has variables, constants, comparison ops, logical connectives and quantifiers.
TRC Formulas
Composite expressions:
Free Variables
Obtain the rollNo, name of all girl students in the Maths Dept
{s.rollNo,s.name | student(s) ^ s.sex=‘F’ ^ (∃ d)(department(d) ^ d.name=‘Maths’ ^ d.deptId =
s.deptNo)}
student (rollNo, name, degree, year, sex, deptNo, advisor) department (deptId, name, hod,
phone)
Get the names of students who have scored ‘S’ in all subjects they have enrolled. Assume
that every student is enrolled in at least one course.
Get the names of students who have taken at least one course taught by their advisor
DRC Formulas
• Atomic formula:
– ⟨x1, x2, …, xn⟩ ∈ Rname, or X op Y, or X op constant
– op is one of <, >, =, ≤, ≥, ≠
• Formula:
– an atomic formula, or
– ¬p, p ∧ q, p ∨ q, where p and q are formulas, or
– ∃X (p(X)) or ∀X (p(X)), where X is a domain variable
• The condition ensures that the domain variables I, N, T and A are bound to fields of the
same Sailors tuple.
• The term to the left of `|’ (which should be read as such that) says that every tuple
that satisfies T>7 is in the answer.
– Find sailors who are older than 18 or have a rating under 9, and are called ‘Joe’.
• Note the use of ∃ to find a tuple in Reserves that `joins with’ the Sailors tuple under
consideration.
• Observe how the parentheses control the scope of each quantifier’s binding.
• This may look cumbersome, but with a good user interface, it is very intuitive. (MS
Access, QBE)
• Find all sailors I such that for each 3-tuple either it is not a tuple in Boats or there is a
tuple in Reserves showing that sailor I has reserved it.
• It is possible to write syntactically correct calculus queries that have an infinite number
of answers! Such queries are called unsafe.
– e.g., { S | ¬(S ∈ Sailors) }
• It is known that every query that can be expressed in relational algebra can be expressed
as a safe query in DRC / TRC; the converse is also true.
• Relational Completeness: Query language (e.g., SQL) can express every query that is
expressible in relational algebra/calculus.
Basic SQL Query
History
• IBM Sequel language developed as part of System R project at the IBM San Jose
Research Laboratory
– SQL-86
– SQL-89
– SQL-92
– SQL:2003
• Commercial systems offer most, if not all, SQL-92 features, plus varying feature sets
from later standards and special proprietary features.
• Integrity constraints
Example:
• real, double precision. Floating point and double-precision floating point numbers,
with machine-dependent precision.
• not null
• primary key (A1, ..., An )
• The drop table command deletes all information about the dropped relation from the
database.
where A is the name of the attribute to be added to relation r and D is the domain of A.
– All tuples in the relation are assigned null as the value for the new attribute.
• The alter table command can also be used to drop attributes of a relation:
– Ai represents an attribute
– Ri represents a relation
– P is a predicate.
• This query is equivalent to the relational algebra expression. The result of an SQL
query is a relation.
• The select clause list the attributes desired in the result of a query
Πbranch_name (loan)
• NOTE: SQL names are case insensitive (i.e., you may use upper- or lower-case letters.)
• To force the elimination of duplicates, insert the keyword distinct after select.
• Find the names of all branches in the loan relation, and remove duplicates.
• The select clause can contain arithmetic expressions involving the operators +, –, *, and /, operating on constants or attributes of tuples.
E.g.:
• The where clause specifies conditions that the result must satisfy
• To find all loan number for loans made at the Perryridge branch with loan amounts
greater than $1200.
• Comparison results can be combined using the logical connectives and, or, and not.
old-name as new-name
• E.g. Find the name, loan number and loan amount of all customers; rename the column
name loan_number as loan_id.
• Tuple Variables
• Tuple variables are defined in the from clause via the use of the as clause.
• Find the customer names and their loan numbers and amount for all customers having a
loan at some branch.
• We will use these instances of the Sailors and Reserves relations in our examples.
• If the key for the Reserves relation contained only the attributes sid and bid, how would
the semantics differ?
• relation-list A list of relation names (possibly with a range-variable after each name).
• DISTINCT is an optional keyword indicating that the answer should not contain
duplicates. Default is that duplicates are not eliminated!
• This strategy is probably the least efficient way to compute a query! An optimizer will find
more efficient strategies to compute the same answers.
A Note on Range Variables
• Really needed only if the same relation appears twice in the FROM clause. The
previous query can also be written as:
• What is the effect of replacing S.sid by S.sname in the SELECT clause? Would adding
DISTINCT to this variant of the query make a difference?
• Illustrates use of arithmetic expressions and string pattern matching: Find triples (of
ages of sailors and two fields defined by expressions) for sailors whose names begin and end
with B and contain at least three characters.
• LIKE is used for string matching. `_’ stands for any one character and `%’ stands for 0
or more arbitrary characters.
String Operations
• Find the names of all customers whose street includes the substring “Main”.
select customer_name
from customer
where customer_street like '%Main%'
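The wildcard rules can be checked with a small sqlite3 run (sample names invented). Note that SQLite's LIKE is case-insensitive for ASCII by default, so 'Bob' matches the pattern B_%B as well.

```python
import sqlite3

# '_' matches exactly one character, '%' matches zero or more characters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?)",
                 [(1, "Bob"), (2, "Barb"), (3, "Ann"), (4, "BB")])

# Names that begin and end with B and contain at least three characters.
# "BB" fails: 'B_%B' needs at least one character between the two B's.
rows = conn.execute(
    "SELECT sname FROM Sailors WHERE sname LIKE 'B_%B' ORDER BY sname").fetchall()
print(rows)  # [('Barb',), ('Bob',)]
```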
• List in alphabetic order the names of all customers having a loan in Perryridge branch
• We may specify desc for descending order or asc for ascending order, for each
attribute; ascending order is the default.
Duplicates
• In relations with duplicates, SQL can define how many copies of tuples appear in the
result.
• Multiset versions of some of the relational algebra operators – given multiset relations
r1 and r2:
1. σθ(r1): If there are c1 copies of tuple t1 in r1, and t1 satisfies the selection σθ, then there are c1 copies of t1 in σθ(r1).
2. ΠA(r1): For each copy of tuple t1 in r1, there is a copy of tuple ΠA(t1) in ΠA(r1), where ΠA(t1) denotes the projection of the single tuple t1.
3. r1 x r2: If there are c1 copies of tuple t1 in r1 and c2 copies of tuple t2 in r2, there are c1 x c2 copies of the tuple t1.t2 in r1 x r2.
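These multiset rules can be sketched with Python's collections.Counter, where a relation maps each tuple to its number of copies. The relations r1 and r2 below are invented.

```python
from collections import Counter

def mselect(r, pred):
    """Multiset selection: each surviving tuple keeps its copy count."""
    return Counter({t: c for t, c in r.items() if pred(t)})

def mproject(r, idxs):
    """Multiset projection: copies are kept, not eliminated."""
    out = Counter()
    for t, c in r.items():
        out[tuple(t[i] for i in idxs)] += c
    return out

def mproduct(r1, r2):
    """Multiset product: c1 copies times c2 copies gives c1*c2 copies."""
    return Counter({t1 + t2: c1 * c2 for t1, c1 in r1.items() for t2, c2 in r2.items()})

r1 = Counter({(1, 'a'): 2, (2, 'b'): 1})
r2 = Counter({('x',): 3})
print(mproject(r1, [1]))                 # ('a',) occurs 2 times, ('b',) once
print(mproduct(r1, r2)[(1, 'a', 'x')])   # 2 * 3 = 6 copies
```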
Nested Queries:
• A very powerful feature of SQL: a WHERE clause can itself contain an SQL query!
(Actually, so can FROM and HAVING clauses.)
• If UNIQUE is used, and * is replaced by R.bid, finds sailors with at most one
reservation for boat #103. (UNIQUE checks for duplicate tuples; * denotes all attributes. Why
do we have to replace * by R.bid?)
• Illustrates why, in general, subquery must be re-computed for each Sailors tuple.
• A common use of subqueries is to perform tests for set membership, set comparisons,
and set cardinality.
• The set operations union, intersect, and except operate on relations and correspond to
the relational algebra operations
• We’ve already seen IN, EXISTS and UNIQUE. Can also use NOT IN, NOT EXISTS
and NOT UNIQUE.
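A minimal sqlite3 sketch of set-membership tests with IN and NOT IN in a nested query (Sailors/Reserves data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT)")
conn.execute("CREATE TABLE Reserves (sid INTEGER, bid INTEGER)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?)",
                 [(22, "Dustin"), (31, "Lubber"), (58, "Rusty")])
conn.executemany("INSERT INTO Reserves VALUES (?, ?)",
                 [(22, 103), (58, 104)])

# Sailors who have reserved boat 103 (IN) ...
reserved = conn.execute("""
    SELECT S.sname FROM Sailors S
    WHERE S.sid IN (SELECT R.sid FROM Reserves R WHERE R.bid = 103)""").fetchall()

# ... and those who have not (NOT IN).
not_reserved = conn.execute("""
    SELECT S.sname FROM Sailors S
    WHERE S.sid NOT IN (SELECT R.sid FROM Reserves R WHERE R.bid = 103)
    ORDER BY S.sname""").fetchall()
print(reserved, not_reserved)  # [('Dustin',)] [('Lubber',), ('Rusty',)]
```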
• Find sailors whose rating is greater than that of some sailor called Horatio:
Division in SQL
Aggregate Operators:
• These functions operate on the multiset of values of a column of a relation, and return a
value
• Consider: Find the age of the youngest sailor for each rating level.
– In general, we don’t know how many rating levels exist, and what the rating
values for these levels are!
– Suppose we know that rating values go from 1 to 10; we can write 10 queries
that look like this (!):
• The target-list contains (i) attribute names (ii) terms with aggregate operations (e.g.,
MIN (S.age)).
– The attribute list (i) must be a subset of grouping-list. Intuitively, each answer
tuple corresponds to a group, and these attributes must have a single value per group. (A group
is a set of tuples that have the same value for all attributes in grouping-list.)
Conceptual Evaluation
Find the age of the youngest sailor with age ≥ 18, for each rating with at least 2 such sailors.
• Find the age of the youngest sailor with age ≥ 18, for each rating with at least 2 such sailors and with every sailor under 60.
• Find the age of the youngest sailor with age ≥ 18, for each rating with at least 2 sailors between 18 and 60.
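The first of these queries can be run as-is. A sqlite3 sketch with invented Sailors data; the WHERE clause filters out sailors under 18 before grouping, and HAVING keeps only ratings with at least two qualifying sailors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)",
                 [(22, "Dustin", 7, 45.0), (29, "Brutus", 7, 33.0),
                  (31, "Lubber", 8, 55.5), (64, "Horatio", 7, 16.0)])

rows = conn.execute("""
    SELECT S.rating, MIN(S.age)
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT(*) >= 2""").fetchall()

# Rating 7 qualifies (Dustin 45, Brutus 33; Horatio is filtered out by WHERE);
# rating 8 has only one sailor and is dropped by HAVING.
print(rows)  # [(7, 33.0)]
```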
For each red boat, find the number of reservations for this boat Grouping over a join of
three relations.
• What do we get if we remove B.color=‘red’ from the WHERE clause and add a
HAVING clause with this condition?
• Find age of the youngest sailor with age > 18, for each rating with at least 2 sailors (of
any age)
• Compare this with the query where we considered only ratings with 2 sailors over 18!
• Find those ratings for which the average age is the minimum over all ratings
• Find the names of all branches where the average account balance is more than $1,200.
Null Values:
• Field values in a tuple are sometimes unknown (e.g., a rating has not been assigned) or
inapplicable (e.g., no spouse’s name).
– Is rating>8 true or false when rating is equal to null? What about AND, OR, and NOT?
• It is possible for tuples to have a null value, denoted by null, for some of their attributes
– Example: Find all loan number which appear in the loan relation with null
values for amount.
select loan_number
from loan
where amount is null
Logical Connectives: AND, OR, NOT
• All aggregate operations except count(*) ignore tuples with null values on the
aggregated attributes.
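A small sqlite3 sketch of null handling (loan data invented): IS NULL finds the null rows, and COUNT(amount) and AVG(amount) ignore them while COUNT(*) does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, amount REAL)")
conn.executemany("INSERT INTO loan VALUES (?, ?)",
                 [("L-11", 900.0), ("L-14", 1500.0), ("L-17", None)])

# IS NULL is the only way to test for null; amount = NULL is never true.
null_loans = conn.execute(
    "SELECT loan_number FROM loan WHERE amount IS NULL").fetchall()

total_rows = conn.execute("SELECT COUNT(*) FROM loan").fetchone()[0]
counted = conn.execute("SELECT COUNT(amount) FROM loan").fetchone()[0]
avg_amt = conn.execute("SELECT AVG(amount) FROM loan").fetchone()[0]

# COUNT(*) counts all 3 rows; COUNT(amount) and AVG(amount) skip the null.
print(null_loans, total_rows, counted, avg_amt)  # [('L-17',)] 3 2 1200.0
```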
Outer Joins:
Joined Relations**
• Join operations take two relations and return as a result another relation.
• These additional operations are typically used as subquery expressions in the from
clause
• Join condition – defines which tuples in the two relations match, and what attributes
are present in the result of the join.
• Join type – defines how tuples in each relation that do not match any tuple in the other relation are treated (inner join, left outer, right outer, full outer join).
• Natural join can get into trouble if two relations have an attribute with the same name but unrelated meaning; such attributes get equated incorrectly.
• Solution: rename one of the attributes, or state the join condition explicitly.
• Derived Relations
• Find the average account balance of those branches where the average account balance is more than $1,200.
Note that we do not need to use the having clause, since we compute the temporary
(view) relation branch_avg in the from clause, and the attributes of branch_avg can be used
directly in the where clause.
• Types of IC’s: Domain constraints, primary key constraints, foreign key constraints,
general constraints.
General Constraints
• Trigger: a procedure that starts automatically if specified changes occur to the database.
• Three parts:
– Event (a change to the database that activates the trigger)
– Condition (a query or test that is run when the trigger is activated)
– Action (a procedure that is executed when the trigger is activated and its condition is true)
Normal Forms
• Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).
• Storing the same information redundantly, that is, in more than one place within a database, can cause redundant storage and update, insertion, and deletion anomalies.
• Consider a relation obtained by translating a variant of the Hourly_Emps entity set:
Ex: Hourly_Emps(ssn, name, lot, rating, hourly_wages, hours_worked)
• The key for Hourly_Emps is ssn. In addition, suppose that the hourly_wages attribute is determined by the rating attribute. That is, for a given rating value, there is only one hourly_wages value.
• Functional dependencies (ICs) can be used to identify such situations and to suggest refinements.
• The essential idea is that many problems arising from redundancy can be addressed by replacing a relation with a collection of smaller relations. For example, Hourly_Emps can be decomposed into:

Hourly_Emps2 (ssn, name, lot, rating, hours_worked):
123-22-3666 Attishoo 48 8 40
231-31-5368 Smiley 22 8 30
131-24-3650 Smethurst 35 5 30
434-26-3751 Guldu 35 5 32
612-67-4134 Madayan 35 8 40

Wages (rating, hourly_wages):
8 10
5 7
• Each of the smaller relations contains a subset of the attributes of the original relation.
• We refer to this process as decomposition of the larger relation into the smaller relations
• We can deal with the redundancy in Hourly_Emps by decomposing it into two relations: Hourly_Emps2(ssn, name, lot, rating, hours_worked) and Wages(rating, hourly_wages).
• Unless we are careful, decomposing a relation schema can create more problems than it
solves.
• To help with the first question, several normal forms have been proposed for relations.
• If a relation schema is in one of these normal forms, we know that certain kinds of problems cannot arise.
• A functional dependency X → Y holds over relation R if, for every allowable instance r of R:
– i.e., given two tuples in r, if the X values agree, then the Y values must also
agree. (X and Y are sets of attributes.)
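The definition above translates directly into a checker: an FD X → Y holds in an instance when no two tuples agree on X but disagree on Y. A minimal Python sketch with invented Hourly_Emps-style data:

```python
def fd_holds(r, X, Y):
    """Return True if the FD X -> Y holds in instance r (a list of row-dicts)."""
    seen = {}
    for t in r:
        xv = tuple(t[a] for a in X)
        yv = tuple(t[a] for a in Y)
        if xv in seen and seen[xv] != yv:
            return False  # two tuples agree on X but disagree on Y
        seen[xv] = yv
    return True

emps = [{"ssn": 1, "rating": 8, "wage": 10},
        {"ssn": 2, "rating": 8, "wage": 10},
        {"ssn": 3, "rating": 5, "wage": 7}]

holds = fd_holds(emps, ["rating"], ["wage"])  # R -> W holds in this instance
fails = fd_holds(emps, ["wage"], ["ssn"])     # two ssns share wage 10
print(holds, fails)  # True False
```

Remember: such a check can only show an FD is *violated* by an instance; it can never prove the FD is a constraint of the schema.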
• Notation: We will denote this relation schema by listing the attributes: SNLRWH
• Suppose that we have entity sets Parts, Suppliers, and Departments, as well as a
relationship set Contracts that involves all of them. We refer to the schema for
Contracts as CQPSD. A contract with contract id C specifies that a supplier S will supply some quantity Q of a part P to a department D.
• We might have a policy that a department purchases at most one part from any given
supplier.
• Thus, if there are several contracts between the same supplier and department, we know that the same part must be involved in all of them. This constraint is an FD, DS → P.
– Reflexivity: If Y ⊆ X, then X → Y
– JP → C
– D → P
• SD → P implies SDJ → JP
• Computing the closure of a set of FDs can be expensive. (Size of closure is exponential
in # attrs!)
An efficient check:
– Compute the attribute closure X+ of X with respect to F.
– Check if Y is contained in X+.
• Does F = {A → B, B → C, CD → E} imply A → E?
• The set of all FDs implied by a given set F of FDs is called the closure of F, denoted as F+.
• An important question is how we can infer, or compute, the closure of a given set F of
FDs.
• The following three rules, called Armstrong's Axioms, can be applied repeatedly to infer all FDs implied by a set F of FDs.
• Armstrong's Axioms are sound in that they generate only FDs in F+ when applied to a
set F of FDs.
• They are complete in that repeated application of these rules will generate all FDs in the closure F+.
• These additional rules are not essential; their soundness can be proved using
Armstrong's Axioms.
Attribute Closure
• To check whether a given dependency, say X → Y, is in the closure of a set F of FDs, we can do so efficiently without computing F+. We first compute the attribute closure X+ with respect to F, which is the set of attributes A such that X → A can be inferred using the Armstrong Axioms.
• closure = X;
repeat until there is no change: if there is an FD U → V in F such that U ⊆ closure, then set closure = closure ∪ V
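The closure = X step above can be completed into a full attribute-closure routine. A minimal Python sketch, with FDs represented as (lhs, rhs) pairs of attribute sets; it also answers the earlier question of whether F = {A → B, B → C, CD → E} implies A → E.

```python
def attribute_closure(X, fds):
    """Grow the closure of X until no FD in fds adds a new attribute."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]
cl_a = attribute_closure({"A"}, F)        # equals {'A','B','C'}: E not implied
cl_ad = attribute_closure({"A", "D"}, F)  # equals {'A','B','C','D','E'}
print(cl_a, cl_ad)
```

So A → E does not follow from F (E is missing from A+), while AD → E does.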
Normal Forms:
• The normal forms based on FDs are first normal form (1NF), second normal form (2NF), third normal form (3NF), and Boyce-Codd normal form (BCNF).
• These forms have increasingly restrictive requirements: every relation in BCNF is also in 3NF, every relation in 3NF is also in 2NF, and every relation in 2NF is in 1NF.
• A relation is in first normal form if every field contains only atomic values, that is, not lists or sets.
• Although some of the newer database systems are relaxing this requirement
• Returning to the issue of schema refinement, the first question to ask is whether any
refinement is needed!
• If a relation is in a certain normal form (BCNF, 3NF, etc.), it is known that certain kinds of problems are avoided or minimized.
• Given A,B: Several tuples could have the same A value, and if so,
they’ll all have the same B value!
• A relation R is in 2NF if and only if it is in 1NF and every nonkey column depends on a whole key, not on just a proper subset of a key.
• A relation R is in 3NF if and only if it is in 2NF and no nonkey column depends on another nonkey column.
• In other words, R is in BCNF if the only non-trivial FDs that hold over R are key
constraints.
– If we are shown two tuples that agree upon the X value, we cannot infer the A value in
one tuple from the A value in the other.
Properties of Decompositions :
Example Decomposition
• Decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r that satisfies F:
– πX(r) ⋈ πY(r) = r
– It is always true that r ⊆ πX(r) ⋈ πY(r); in general, the other direction does not hold! If it does, the decomposition is lossless-join.
• It is essential that all decompositions used to deal with redundancy be lossless! (Avoids
Problem (2).)
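For a binary decomposition there is a standard test (stated here as a supplement to the notes): R split into X and Y is lossless-join w.r.t. F iff (X ∩ Y) → X or (X ∩ Y) → Y is in F+. A Python sketch using an attribute-closure helper, applied to the Hourly_Emps example (S → SNLRWH, R → W):

```python
def attribute_closure(X, fds):
    """Closure of attribute set X under FDs given as (lhs, rhs) set pairs."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

def lossless_join(X, Y, fds):
    """Binary decomposition test: the common attributes must determine X or Y."""
    cl = attribute_closure(X & Y, fds)
    return X <= cl or Y <= cl

F = [({"S"}, {"S", "N", "L", "R", "W", "H"}), ({"R"}, {"W"})]
good = lossless_join({"S", "N", "L", "R", "H"}, {"R", "W"}, F)  # R -> W: lossless
bad = lossless_join({"S", "N", "L", "W", "H"}, {"R", "W"}, F)   # W determines nothing
print(good, bad)  # True False
```

This is exactly why decomposing SNLRWH into SNLRH and RW is safe, while splitting on W instead of R would lose information.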
• Dependency Preserving Decomposition: decomposition of R into X and Y is dependency preserving if (FX ∪ FY)+ = F+, where FX is the set of FDs in F+ that involve only attributes in X.
i.e., if we consider only dependencies in the closure F + that can be checked in X
without considering Y, and in Y without considering X, these imply all dependencies in F +.
• Important to consider F +, not F, in this definition:
and XY.
– Repeated application of this idea (decomposing on an FD that violates BCNF) will give us a collection of relations that are in BCNF; the decomposition is lossless-join and guaranteed to terminate.
• In general, several dependencies may cause violation of BCNF. The order in which we deal with them could lead to very different sets of relations!
– e.g., CSZ with CS → Z and Z → C.
• Obviously, the algorithm for lossless-join decomposition into BCNF can be used to obtain a lossless-join decomposition into 3NF as well, by adding a relation for each FD that would otherwise not be preserved.
– The problem is that an added relation XY may itself violate 3NF! e.g., consider the addition of CJP to `preserve' JP → C. What if we also have J → C?
• Refinement: Instead of the given set of FDs F, use a minimal cover for F.
• Consider the Hourly Emps relation again. The constraint that attribute ssn is a key can
be expressed as an FD:
• { ssn } → { ssn, name, lot, rating, hourly_wages, hours_worked }
• For brevity, we will write this FD as S → SNLRWH, using a single letter to denote each attribute.
• In addition, the constraint that the hourly_wages attribute is determined by the rating attribute is an FD: R → W.
• The previous example illustrated how FDs can help to refine the subjective decisions made during ER design,
• but one could argue that the best possible ER diagram would have led to the same final set of relations.
• Our next example shows how FD information can lead to a set of relations that ER design alone would be unlikely to arrive at;
• in particular, it shows that attributes can easily be associated with the `wrong' entity set during ER design.
• The ER diagram shows a relationship set called Works_In that is similar to the Works_In relationship set seen earlier.
• Using the key constraint, we can translate this ER diagram into two relations.
• In addition, let there be an attribute C denoting the credit card to which the reservation is charged.
• Suppose that every sailor uses a unique credit card for reservations. This constraint is expressed by the FD S → C. It indicates that in relation Reserves, we store the credit card number of a sailor as often as we have reservations for that sailor; this is redundant.
Multivalued Dependencies:
• Suppose that we have a relation with attributes course, teacher, and book, which we denote as CTB.
• The meaning of a tuple is that teacher T can teach course C, and book B is a recommended text for the course.
• There are no FDs; the key is CTB. However, the recommended texts for a course are independent of the instructor.
• There is redundancy. The fact that Green can teach Physics101 is recorded once per recommended text for the course. Similarly, the fact that Optics is a text for Physics101 is recorded once per potential teacher.
• Let R be a relation schema and let X and Y be subsets of the attributes of R. Intuitively, the multivalued dependency X →→ Y is said to hold over R if, in every legal instance of R, each X value is associated with a set of Y values and this set is independent of the values in the other attributes.
• The redundancy in this example is due to the constraint that the texts for a course are independent of the instructors, which cannot be expressed in terms of FDs. We should model this situation using two binary relationship sets, Instructors with attributes CT and Text with attributes CB.
• Because these are two essentially independent relationships, modeling them with a single ternary relationship leads to the redundancy above. Three additional inference rules deal with MVDs:
– MVD Complementation: If X →→ Y, then X →→ (R − XY).
– MVD Augmentation: If X →→ Y and W ⊇ Z, then WX →→ YZ.
– MVD Transitivity: If X →→ Y and Y →→ Z, then X →→ (Z − Y).
• R is said to be in fourth normal form (4NF) if for every MVD X →→ Y that holds over R, one of the following is true:
– Y ⊆ X or XY = R, or
– X is a superkey.
Join Dependencies:
• A join dependency (JD) ⋈{R1, …, Rn} is said to hold over a relation R if R1, …, Rn is a lossless-join decomposition of R. An MVD X →→ Y over R can be expressed as the join dependency ⋈{XY, X(R − Y)}.
• As an example, in the CTB relation, the MVD C →→ T can be expressed as the join dependency ⋈{CT, CB}.
• Unlike FDs and MVDs, there is no set of sound and complete inference rules for JDs.
• A relation schema R is said to be in fifth normal form (5NF) if for every JD ⋈{R1, …, Rn} that holds over R, one of the following is true:
– Ri = R for some i, or
– the JD is implied by the set of those FDs over R in which the left side is a key for R.
• The following result, also due to Date and Fagin, identifies conditions (again, detected using only FD information) under which we can safely ignore JD information:
• If a relation schema is in 3NF and each of its keys consists of a single attribute, it is also in 5NF.
Inclusion Dependencies:
• MVDs and JDs can be used to guide database design, as we have seen, although they
are less common than FDs and harder to recognize and reason about.
• In contrast, inclusion dependencies are very intuitive and quite common. However, they typically have little influence on database design beyond the ER design stage.
• Most inclusion dependencies in practice are key-based, that is, involve only keys.
UNIT-IV
Transaction Management
ACID Properties
Consistency:
Execution of a transaction in isolation (that is, with no other transaction executing concurrently)
preserves the consistency of the database. This is typically the responsibility of the application
programmer who codes the transactions.
Atomicity:
Either all operations of the transaction are reflected properly in the database, or none are. Clearly
lack of atomicity will lead to inconsistency in the database.
Isolation:
When multiple transactions execute concurrently, it should be the case that, for every pair of
transactions Ti and Tj , it appears to Ti that either Tj finished execution before Ti started, or Tj
started execution after Ti finished. Thus, each transaction is unaware of other transactions
executing concurrently with it. The user view of a transaction system requires the isolation
property, and the property that concurrent schedules take the system from one consistent state to
another. These requirements are satisfied by ensuring that only serializable schedules of
individually consistency preserving transactions are allowed.
Durability:
After a transaction completes successfully, the changes it has made to the database persist, even if
there are system failures.
Atomicity: All operations of the transaction should be executed, or none at all. Before execution of transaction Ti, accounts A and B hold 10000 and 20000, so A + B = 30000. Suppose that during the transfer of 5000 from A to B a failure occurs (a power failure, or a hardware or software error) after write(A) but before write(B). Then A = 5000 and B = 20000, so A + B = 5000 + 20000 = 25000. The system has destroyed 5000 as a result of this partial execution: the sums before and after the transaction (30000 vs. 25000) differ, and the database is left inconsistent.
Durability:
The durability property guarantees that, once the transaction completes successfully, all the updates it made on the database persist, even if there is a system failure after the transaction completes. Ensuring durability is the responsibility of the recovery-management component. Once the user has been notified about successful completion of the transaction, it must be the case that no system failure will result in loss of data corresponding to the transfer of funds.
Isolation:
Isolation can be ensured trivially by running transactions serially, that is, one after the other. However, executing multiple transactions concurrently has significant benefits, as we will see later. Concurrent operation of multiple transactions can lead to an inconsistent state; ensuring isolation is the responsibility of the concurrency-control component. Let Ti and Tj be two transactions executed concurrently: if their operations are interleaved in an undesirable way, the result can be an inconsistent state.
Transaction State:
A transaction must be in one of the following states: active, partially committed, failed, aborted, or committed.
Active State:
The initial state of the transaction while it is executing.
Partially Committed:
After the final statement of the transaction has been executed.
Failed:
When the transaction can no longer proceed with normal execution, it is in the failed state.
Aborted:
After the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction. There are two options after a transaction has been aborted:
Restart the transaction (possible only if there was no internal logical error)
Kill the transaction
Committed: After successful completion of the transaction.
Concurrent executions:
A transaction-processing system will allow multiple transactions to run concurrently. This can lead to several problems, such as inconsistency of the data, and ensuring consistency under concurrent operations requires additional work to make schedules serializable. Concurrent execution is nevertheless allowed, for two major reasons:
a) Improved throughput and resource utilization.
b) Reduced waiting time.
Schedule 1 and Schedule 2 are serial schedules. Each schedule consists of various transactions, where the series of instructions belonging to a single transaction appear together in the schedule. Schedule 3 is an example of a concurrent schedule, in which two transactions T1 and T2 run concurrently. Here the OS may execute a part of T1, switch to the second transaction T2, and then switch back to the first transaction for some time, and so on with multiple transactions; that is, CPU time is shared among all the transactions.
Schedule 3
• Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule, but it is equivalent to Schedule 1.
Schedule 4
T1                          T2
read(A)
A := A − 50
                            read(A)
                            temp := A * 0.1
                            A := A − temp
                            write(A)
                            read(B)
write(A)
read(B)
B := B + 50
write(B)
                            B := B + temp
                            write(B)
In Schedule 4, the CPU slices the two transactions in a different way. With initial values A = 1000 and B = 2000, execution leads to final values A = 950 and B = 2100, so the sum of A and B after the transactions (3050) differs from the sum before (3000). This leaves the database in an inconsistent state.
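The inconsistency in Schedule 4 can be replayed directly. A minimal sketch in Python; the initial balances A = 1000 and B = 2000 are assumed, as in the recovery example later in these notes:

```python
# Replay Schedule 4 step by step on a shared "database" dict.
db = {"A": 1000, "B": 2000}

# T1 transfers 50 from A to B; T2 moves 10% of A from A to B.
a1 = db["A"]          # T1: read(A)
a1 = a1 - 50          # T1: A := A - 50
a2 = db["A"]          # T2: read(A)  -- still sees the old value of A!
temp = a2 * 0.1       # T2: temp := A * 0.1
a2 = a2 - temp        # T2: A := A - temp
db["A"] = a2          # T2: write(A)
b2 = db["B"]          # T2: read(B)
db["A"] = a1          # T1: write(A) -- overwrites T2's update (lost update)
b1 = db["B"]          # T1: read(B)
b1 = b1 + 50          # T1: B := B + 50
db["B"] = b1          # T1: write(B)
b2 = b2 + temp        # T2: B := B + temp
db["B"] = b2          # T2: write(B)

print(db["A"], db["B"], db["A"] + db["B"])   # 950 2100.0 3050.0
```

The interleaving makes T2's write(A) disappear, so money is created out of nothing; under a serial execution the final sum would still be 3000.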
Schedule 5 – Schedule 3 after swapping a pair of instructions

T1                T2
read(A)
write(A)
read(B)
write(B)
                  read(A)
                  write(A)
                  read(B)
                  write(B)
Schedule 6 – a serial schedule equivalent to Schedule 3
Conflicting Instructions
• Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there
exists some item Q accessed by both li and lj, and at least one of these instructions wrote
Q.
– li = read(Q), lj = read(Q): li and lj do not conflict.
– li = read(Q), lj = write(Q): they conflict.
– li = write(Q), lj = read(Q): they conflict.
– li = write(Q), lj = write(Q): they conflict.
Intuitively, a conflict between li and lj forces a (logical) temporal order between them. If li and lj are
consecutive in a schedule and they do not conflict, their results would remain the same even if they
had been interchanged in the schedule.
Conflict Serializability
• If a schedule S can be transformed into a schedule S´ by a series of swaps of non-
conflicting instructions, we say that S and S´ are conflict equivalent.
• We say that a schedule S is conflict serializable if it is conflict equivalent to a serial
schedule
Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by series of
swaps of non-conflicting instructions. Therefore Schedule 3 is conflict serializable.
Example of a schedule that is not conflict serializable:
We are unable to swap instructions in the above schedule to obtain either the serial
schedule < T3, T4 >, or the serial schedule < T4, T3 >.
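Conflict serializability can be tested mechanically: build the precedence graph (an edge Ti → Tj for each pair of conflicting operations where Ti's comes first) and check it for cycles. A minimal sketch; the tuple encoding of schedules and the simplified T3/T4 example are assumptions for illustration:

```python
from itertools import combinations

def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) with op in {'r', 'w'}.
    Build the precedence graph and test it for cycles with DFS."""
    edges = set()
    for (i, (t1, op1, q1)), (j, (t2, op2, q2)) in combinations(enumerate(schedule), 2):
        # Earlier op conflicts with later op on the same item if one is a write.
        if t1 != t2 and q1 == q2 and "w" in (op1, op2):
            edges.add((t1, t2))
    txns = {t for t, _, _ in schedule}
    graph = {t: [v for u, v in edges if u == t] for t in txns}

    seen, stack = set(), set()
    def has_cycle(u):
        seen.add(u); stack.add(u)
        for v in graph[u]:
            if v in stack or (v not in seen and has_cycle(v)):
                return True
        stack.discard(u)
        return False
    return not any(has_cycle(t) for t in txns if t not in seen)

# A Schedule 3 style interleaving: conflict serializable (T1 before T2).
s3 = [("T1","r","A"),("T1","w","A"),("T2","r","A"),("T2","w","A"),
      ("T1","r","B"),("T1","w","B"),("T2","r","B"),("T2","w","B")]
# A T3/T4-style schedule: edges in both directions, hence a cycle.
s_bad = [("T3","r","Q"),("T4","w","Q"),("T3","w","Q")]
print(conflict_serializable(s3), conflict_serializable(s_bad))
```

An acyclic precedence graph also yields a valid serial order directly: any topological sort of the graph.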
2. View Serializability:
Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the
following three conditions are met, for each data item Q,
If in schedule S, transaction Ti reads the initial value of Q, then in schedule S’ also transaction Ti
must read the initial value of Q.
If in schedule S transaction Ti executes read(Q), and that value was produced by transaction T j (if
any), then in schedule S’ also transaction Ti must read the value of Q that was produced by the same
write(Q) operation of transaction Tj .
The transaction (if any) that performs the final write(Q) operation in schedule S must also perform
the final write(Q) operation in schedule S’.
As can be seen, view equivalence is also based purely on reads and writes.
A schedule S is view serializable if it is view equivalent to a serial schedule.
Every conflict serializable schedule is also view serializable.
Below is a schedule which is view-serializable but not conflict serializable.
The schedule below produces the same outcome as the serial schedule < T1, T5 >, yet is not conflict equivalent or view equivalent to it. Determining such equivalence requires analysis of operations other than read and write.
Recoverability:
• Recoverable schedule — if a transaction Tj reads a data item previously written by a
transaction Ti , then the commit operation of Ti appears before the commit operation of
Tj.
The following schedule (Schedule 11) is not recoverable if T9 commits immediately
after the read
If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database
state. Hence, database must ensure that schedules are recoverable.
Cascading Rollbacks:
Cascading rollback – a single transaction failure leads to a series of transaction
rollbacks. Consider the following schedule where none of the transactions has yet
committed (so the schedule is recoverable)
Implementation of Isolation:
• Schedules must be conflict or view serializable, and recoverable, for the sake of
database consistency, and preferably cascadeless.
• A policy in which only one transaction can execute at a time generates serial schedules,
but provides a poor degree of concurrency.
• Concurrency-control schemes tradeoff between the amount of concurrency they allow
and the amount of overhead that they incur.
• Some schemes allow only conflict-serializable schedules to be generated, while others
allow view-serializable schedules that are not conflict-serializable.
The SQL standard specifies the following isolation levels:
Serializable — default
Repeatable read — only committed records to be read, repeated reads of same record must return
same value. However, a transaction may not be serializable – it may find some records inserted by
a transaction but not find others.
Read committed — only committed records can be read, but successive reads of record may
return different (but committed) values.
Read uncommitted — even uncommitted records may be read.
Transaction Definition in SQL: the data-manipulation language must include a construct for specifying the set of actions that comprise a transaction.
• In SQL, a transaction begins implicitly.
• A transaction in SQL ends by:
o Commit work commits current transaction and begins a new one.
o Rollback work causes current transaction to abort.
• In almost all database systems, by default, every SQL statement also commits implicitly if it executes successfully. Implicit commit can be turned off by a database directive, e.g. in JDBC, connection.setAutoCommit(false);
Types of Locks
There are various modes to lock data items. They are
Shared (S): If a transaction Ti has a shared-mode lock on data item Q, then Ti can read but not write Q. The lock-S(Q) instruction is used to request a lock in shared mode.
Exclusive (X): If a transaction Ti has obtained an exclusive-mode lock on data item Q, then Ti can both read and write Q. The lock-X(Q) instruction is used to request a lock in exclusive mode.
A lock is a mechanism to control concurrent access to a data item. Lock requests are made to
concurrency-control manager. Transaction can proceed only after request is granted.
Lock-compatibility matrix
A transaction may be granted a lock on an item if the requested lock is compatible with the locks already held on the item by other transactions. Any number of transactions can hold shared locks on an item, but if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on it. If a lock cannot be granted, the requesting transaction is made to wait until all incompatible locks held by other transactions have been released. The lock is then granted.
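The compatibility matrix and the grant rule can be stated in a few lines. A sketch, assuming only the two lock modes S and X described above:

```python
# Lock-compatibility matrix: (requested mode, held mode) -> compatible?
# Shared (S) is compatible only with S; exclusive (X) with nothing.
COMPATIBLE = {
    ("S", "S"): True,  ("S", "X"): False,
    ("X", "S"): False, ("X", "X"): False,
}

def can_grant(requested, held_modes):
    """Grant iff the requested mode is compatible with every lock
    currently held on the item by other transactions."""
    return all(COMPATIBLE[(requested, h)] for h in held_modes)

print(can_grant("S", ["S", "S"]))  # many readers may share an item
print(can_grant("X", ["S"]))       # a writer must wait for readers
print(can_grant("S", []))          # no locks held: always grantable
```

A real lock manager would additionally queue waiting requests and wake them when incompatible locks are released; only the grant test is shown here.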
Deadlock: A deadlock is a condition wherein two or more tasks are waiting for each other in order to finish, but none of the tasks is willing to give up the resources that the other tasks need. In this situation no task ever finishes, and all of them remain in a waiting state forever.
Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has time-
stamp TS(Ti), a new transaction Tj is assigned time-stamp TS(Tj) such that TS(Ti) <TS(Tj). The
protocol manages concurrent execution such that the time-stamps determine the serializability
order. In order to assure such behavior, the protocol maintains for each data Q two timestamp
values:
• W-timestamp(Q) is the largest time-stamp of any transaction that executed write(Q)
successfully.
• R-timestamp(Q) is the largest time-stamp of any transaction that executed read(Q)
successfully.
The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in timestamp order. Suppose a transaction Ti issues read(Q):
o If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.
o If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to max(R-timestamp(Q), TS(Ti)).
Suppose that transaction Ti issues write(Q):
o If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the write operation is rejected, and Ti is rolled back.
o If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write operation is rejected, and Ti is rolled back.
o Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
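The two timestamp tests can be sketched as follows. A toy sketch: single-threaded, no actual data values stored, and rollback modeled as an exception; item names and the timestamp values are illustrative:

```python
class Rejected(Exception):
    """Raised when an operation violates timestamp order; Ti is rolled back."""

# Per-item timestamps, as in the protocol; both start at 0 for new items.
W = {}   # W-timestamp(Q): largest ts of any successful write(Q)
R = {}   # R-timestamp(Q): largest ts of any successful read(Q)

def read(ts, q):
    if ts < W.get(q, 0):             # Q was already overwritten by a younger txn
        raise Rejected(f"read({q}) by ts={ts} rejected")
    R[q] = max(R.get(q, 0), ts)      # record the latest reader

def write(ts, q):
    if ts < R.get(q, 0):             # a younger txn already read the old value
        raise Rejected(f"write({q}) by ts={ts} rejected")
    if ts < W.get(q, 0):             # obsolete write: the basic protocol rejects
        raise Rejected(f"write({q}) by ts={ts} rejected")
    W[q] = ts

read(1, "Q")
write(2, "Q")                        # fine: timestamp order respected
try:
    write(1, "Q")                    # ts 1 is behind both timestamps of Q
except Rejected as e:
    print(e)
```

Thomas' write rule would silently ignore the obsolete write instead of rejecting it; the basic protocol shown here rolls the transaction back.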
Recovery Techniques
To see where the problem has occurred we generalize the failure into various categories, as
follows:
Below we show the log as it appears at three instances of time. Recovery actions in each case
above are:
• undo (T0): B is restored to 2000 and A to 1000.
• undo (T1) and redo (T0): C is restored to 700, and then A and B are set to 950 and 2050
respectively.
• redo (T0) and redo (T1): A and B are set to 950 and 2050 respectively. Then C is set to 600
Deferred update
Deferred Database Modification
The deferred database modification scheme records all modifications to the log, but defers all
the writes to after partial commit.
Assume that transactions execute serially
• <Ti start>transaction Ti started.
A write(X) operation results in a log record :
• <Ti, X, V> being written, where V is the new value for X
Note: old value is not needed for this scheme
The write is not performed on X at this time, but is deferred.
When Ti partially commits,
• <Ti commit> is written to the log
Finally, the log records are read and used to actually execute the previously deferred writes. During
recovery after a crash, a transaction needs to be redone if and only if both
• <Ti start> and<Ti commit> are there in the log.
Redoing a transaction Ti
• redo(Ti) sets the value of all data items updated by the transaction to the new values.
Crashes can occur while the transaction is executing the original updates, or while recovery action is being taken. Example: transactions T0 and T1 (T0 executes before T1):

T0: read(A)
    A := A − 50
    write(A)
    read(B)
    B := B + 50
    write(B)

T1: read(C)
    C := C − 100
    write(C)
Let accounts A, B and C initially have 1000, 2000 and 700 respectively. The log entries of both transactions are:
<T0 start>
<T0, A, 950>
<T0, B, 2050>
<T0, commit>
<T1 start>
<T1, C, 600>
<T1, commit>
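Recovery for the deferred scheme scans this log and redoes exactly the transactions that have both a start and a commit record. A minimal sketch, with log records encoded as tuples (an assumption for illustration, not the notes' on-disk format); note T1 is not redone because the crash happened before <T1 commit> was written:

```python
def recover(log):
    """Deferred database modification: redo Ti iff both <Ti start> and
    <Ti commit> appear in the log; old values are never needed."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    db = {}
    for rec in log:
        if rec[0] == "update":
            _, t, item, new_value = rec
            if t in committed:          # redo(Ti): apply the new value
                db[item] = new_value
    return db

# Log for T0 and T1 above; the crash occurred before <T1 commit>.
log = [("start", "T0"), ("update", "T0", "A", 950), ("update", "T0", "B", 2050),
       ("commit", "T0"), ("start", "T1"), ("update", "T1", "C", 600)]
print(recover(log))
```

Because writes are deferred until after the commit record is logged, redo is idempotent: repeating it after a crash during recovery produces the same state.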
Shadow paging
Shadow paging is an alternative to log-based recovery; this scheme is useful if transactions execute
serially
Idea: maintain two page tables during the lifetime of a transaction –the current page table, and the
shadow page table
Store the shadow page table in nonvolatile storage, such that state of the database prior to
transaction execution may be recovered.
Shadow page table is never modified during execution
To start with, both the page tables are identical. Only current page table is used for data item
accesses during execution of the transaction.
Whenever any page is about to be written for the first time, a copy of this page is made onto an unused page, the current page table is made to point to the copy, and the update is performed on the copy.
To commit a transaction :
1. Flush all modified pages in main memory to disk
2. Output current page table to disk
3. Make the current page table the new shadow page table, as follows:
• keep a pointer to the shadow page table at a fixed (known) location on disk.
• to make the current page table the new shadow page table, simply update the
pointer to point to current page table on disk
• Once pointer to shadow page table has been written, transaction is committed.
• No recovery is needed after a crash — new transactions can start right away,
using the shadow page table.
• Pages not pointed to from current/shadow page table should be freed (garbage
collected).
• Advantages of shadow-paging over log-based schemes
o no overhead of writing log records
o recovery is trivial
• Disadvantages :
o Copying the entire page table is very expensive
o Can be reduced by using a page table structured like a B+-tree
o No need to copy entire tree, only need to copy paths in the tree
that lead to updated leaf nodes
o Commit overhead is high even with above extension
o Need to flush every updated page, and page table
o Data gets fragmented (related pages get separated on disk)
o After every transaction completion, the database pages
containing old versions of modified data need to be garbage
collected
o Hard to extend the algorithm to allow transactions to run concurrently; log-based schemes are easier to extend.
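The copy-on-write and pointer-swap ideas behind shadow paging can be imitated with two in-memory page tables. A toy sketch under simplifying assumptions: whole tables are copied on commit, and "disk" is just process memory, so only the control flow of commit and abort is illustrated:

```python
class ShadowDB:
    def __init__(self, pages):
        self.shadow = dict(pages)      # shadow page table: never modified
        self.current = dict(pages)     # current page table: starts identical

    def write(self, page, value):
        self.current[page] = value     # copy-on-write: only the copy is updated

    def commit(self):
        # The atomic step: make the current table the new shadow table
        # (on disk this is a single pointer update at a fixed location).
        self.shadow = dict(self.current)

    def abort(self):
        # Recovery is trivial: fall back to the untouched shadow table.
        self.current = dict(self.shadow)

db = ShadowDB({"P1": "old"})
db.write("P1", "new")
db.abort()                  # crash before commit: shadow state survives
print(db.current["P1"])     # old
db.write("P1", "new")
db.commit()
print(db.shadow["P1"])      # new
```

The copying of the whole table on commit is exactly the overhead the notes criticize; a B+-tree-structured page table would copy only the paths to updated leaves.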