
RDBMS Lesson Notes

1. The document discusses relational database management systems (RDBMS) and provides an overview of database concepts and components.
2. It describes the advantages of using a DBMS over a file system approach for data management, including data independence, concurrency control, and recovery services.
3. The history of database systems is reviewed, from early hierarchical and network models to the current dominance of the relational model standardized with SQL.

RELATIONAL DATABASE MANAGEMENT SYSTEMS

UNIT: 1 DATABASE SYSTEM

1. Overview of database systems: Managing data

2. A historical perspective

3. File systems versus a DBMS

4. Advantages of a DBMS

5. Describing and Storing Data in a DBMS

6. Queries in a DBMS

7. Transaction management

8. Structure of a DBMS.

9. Database design & ER diagrams

10. Entities, Attributes, and Entity Sets

11. Relationships and Relationship Sets

12. Additional features of the ER model

13. Conceptual Database design with the ER model.


1.1 Overview of database systems: Managing data

A database is a collection of related data, and data is a collection of facts and figures that can be processed to produce information.

A database is a collection of related data organised so that the data can be easily accessed, managed and updated. Any piece of information can be data, for example the name of your school. A database is a place where related pieces of information are stored and on which various operations can be performed.

A DBMS is software that allows the creation, definition and manipulation of a database. A DBMS is the tool used to perform any kind of operation on the data in a database. It also provides protection and security to the database and maintains data consistency in the case of multiple users. Some examples of popular DBMSs are MySQL, Oracle, Sybase, Microsoft Access and IBM DB2.
The database system can be divided into four components.

 Users: Users may be of various types, such as database administrators, system developers and end users.

 Database application: A database application may be personal, departmental, enterprise or internal.

 DBMS: The software that allows users to define and create databases and manage access to them, e.g., MySQL, Oracle.

 Database: A collection of logically related data.

Functions of DBMS

 Provides data Independence

 Concurrency Control

 Provides Recovery services

 Provides Utility services

 Provides a clear and logical view of the process that manipulates data.

Advantages of DBMS

 Segregation of application programs.

 Minimal data duplication.

 Easy retrieval of data.

 Reduced development time and maintenance needs.


A DBMS raises four broad issues:

1. Database Design: How can a user describe a real-world enterprise (e.g., a university) in terms of the data stored in a DBMS? What factors must be considered in deciding how to organize the stored data?

2. Data Analysis: How can a user answer questions about the enterprise by posing queries over the data in the DBMS?

3. Concurrency and Robustness: How does a DBMS allow many users to access data concurrently, and how does it protect the data in the event of system failures?

4. Efficiency and Scalability: How does a DBMS store large datasets and answer queries over this data efficiently?

1.2 A HISTORICAL PERSPECTIVE

The first general-purpose DBMS was designed by Charles Bachman at General Electric in the early 1960s and was called

the Integrated Data Store. It formed the basis for the network data model, which was standardized by the Conference on Data

Systems Languages (CODASYL) and strongly influenced database systems through the 1960s.

In the late 1960s, IBM developed the Information Management System (IMS) DBMS, used even today in many major

installations. IMS formed the basis for an alternative data representation framework called the hierarchical data model.



In 1970, Edgar Codd, at IBM's San Jose Research Laboratory, proposed a new data representation framework called the

relational data model.

In the 1980s, the relational model consolidated its position as the dominant DBMS paradigm, and database systems

continued to gain widespread use. The SQL query language for relational databases, developed as part of IBM’s System R project,

is now the standard query language.

SQL was standardized in the late 1980s, and the current standard, SQL-92, was adopted by the American National

Standards Institute (ANSI) and International Standards Organization (ISO).

In the late 1980s and the 1990s, advances were made in many areas of database systems. Considerable research was carried out into more powerful query languages and richer data models.

DBMSs have entered the Internet age. While the first generation of Web sites stored their data exclusively in operating system files, the use of a DBMS to store data that is accessed through a Web browser has become widespread. Queries are generated through Web-accessible forms, and answers are formatted using a markup language such as HTML so that they can be easily displayed in a browser.

1.3 FILE SYSTEMS VERSUS A DBMS

Suppose we must manage a very large body of data, say 500 GB. We can try to deal with this data management problem by storing the data in a collection of operating system files. This approach has many drawbacks, including the following:

 We probably do not have 500 GB of main memory to hold all the data. We must therefore store data in a storage device such

as a disk or tape and bring relevant parts into main memory for processing as needed.
 Even if we have 500 GB of main memory, on computer systems with 32-bit addressing, we cannot refer directly to more than

about 4 GB of data! We have to program some method of identifying all data items.

 We have to write special programs to answer each question that users may want to ask about the data. These programs are

likely to be complex because of the large volume of data to be searched.

 We must protect the data from inconsistent changes made by different users accessing the data concurrently. If programs

that access the data are written with such concurrent access in mind, this adds greatly to their complexity.

 We must ensure that data is restored to a consistent state if the system crashes while changes are being made.

The DBMS and file system approaches can be compared on the following bases:

Meaning: In a DBMS, the user is not required to write procedures for managing the data; in a file system, the user has to write the procedures for managing the data.

Sharing of data: Due to the centralized approach of a DBMS, data sharing is easy; in a file system, data is distributed in many files, possibly of different formats, so it isn't easy to share data.

Data abstraction: A DBMS gives an abstract view of data that hides the details; the file system exposes the details of data representation and storage.

Security and protection: A DBMS provides a good protection mechanism; it isn't easy to protect a file under the file system.

Recovery mechanism: A DBMS provides a crash recovery mechanism that protects the user from system failure; the file system doesn't have one, so if the system crashes while data is being entered, the content of the file may be lost.

Manipulation techniques: A DBMS contains a wide variety of sophisticated techniques to store and retrieve data; the file system can't store and retrieve data efficiently.

Concurrency problems: A DBMS takes care of concurrent access to data using some form of locking; in a file system, concurrent access raises many problems, such as one user updating information in a file while another is deleting it.

Where to use: The database approach is used in large systems that interrelate many files; the file system approach suits small systems with few files that need not be interrelated.

Cost: A database system is expensive to design; the file system approach is cheaper to design.

Data redundancy and inconsistency: Due to the centralization of the database, the problems of data redundancy and inconsistency are controlled; in a file system, files and application programs are created by different programmers, so there is a lot of duplication of data, which may lead to inconsistency.

Structure: The database structure is complex to design; the file system approach has a simple structure.

Data independence: In a DBMS, data independence exists, and it can be of two types: logical data independence and physical data independence. In the file system approach, there is no data independence.

Data models: In the database approach, three types of data models exist: hierarchical, network and relational; in the file system approach, there is no concept of a data model.

Flexibility: Changes to the content of the stored data are often necessary, and such changes are made more easily with a database approach; the file system is less flexible than the DBMS approach.

Examples: DBMS - Oracle, SQL Server, Sybase, etc.; file system - COBOL and C++ file handling, etc.

ADVANTAGES OF A DBMS

Using a DBMS to manage data has many advantages:

Data independence: Application programs should be as independent as possible from details of data representation and

storage. The DBMS can provide an abstract view of the data to insulate application code from such details.

Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently. This feature is

especially important if the data is stored on external storage devices.

Data integrity and security: If data is always accessed through the DBMS, the DBMS can enforce integrity constraints on the

data. For example, before inserting salary information for an employee, the DBMS can check that the department budget is not

exceeded. Also, the DBMS can enforce access controls that govern what data is visible to different classes of users.
Data administration: When several users share the data, centralizing the administration of data can offer significant

improvements. Experienced professionals, who understand the nature of the data being managed, and how different groups of

users use it, can be responsible for organizing the data representation to minimize redundancy and for fine-tuning the storage of

the data to make retrieval efficient.

Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the data in such a manner that users can think of the data as being accessed by only one user at a time. Further, the DBMS protects users from the effects of system failures.

Reduced application development time: The DBMS supports many important functions that are common to many applications accessing data stored in the DBMS.

1.4 DESCRIBING AND STORING DATA IN A DBMS

A data model is a collection of high-level data description constructs that hide many low-level storage details.

A semantic data model is a more abstract, high-level data model that makes it easier for a user to come up with a good initial

description of the data in an enterprise.

A database design in terms of a semantic model serves as a useful starting point and is subsequently translated into a

database design in terms of the data model the DBMS actually supports.

A widely used semantic data model called the entity-relationship (ER) model allows us to pictorially denote entities and the

relationships among them

1.4.1 The Relational Model

The central data description construct in this model is relation, which can be thought of as a set of records.
A description of data in terms of a data model is called a schema.

The schema for a relation specifies its name, the name of each field or attribute or column.

Example: student information in a university database may be stored in a relation with the following schema (with 5 fields):

Students(sid: string, name: string, login: string, age: integer, gpa: real)

An example instance of the Students relation:

sid    name   login     age  gpa
53666  Jones  jones@cs  18   3.4
53688  Smith  smith@ee  18   3.2

Each row in the Students relation is a record that describes a student. Every row follows the schema of the Students relation, and the schema can therefore be regarded as a template for describing a student.

We can make the description of a collection of students more precise by specifying integrity constraints, which are

conditions that the records in a relation must satisfy.
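Looking ahead to SQL, the Students schema could be declared as follows. This is only a sketch: the field widths are assumptions, and the PRIMARY KEY clause is one example of an integrity constraint.

CREATE TABLE Students ( sid   CHAR(20),
                        name  CHAR(30),
                        login CHAR(20),
                        age   INTEGER,
                        gpa   REAL,
                        PRIMARY KEY (sid) );   -- no two students may share a sid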

Other notable models: the hierarchical model, the network model, the object-oriented model, and the object-relational model.
1.4.2 Levels of Abstraction in a DBMS

A data definition language (DDL) is used to define the external and conceptual schemas.

Information about the conceptual, external, and physical schemas is stored in the system catalogs. Any given database has exactly one conceptual schema and one physical schema, because it has just one set of stored relations, but it may have several external schemas, each tailored to a particular group of users.

Conceptual Schema

The conceptual schema (sometimes called the logical schema) describes the stored data in terms of the data model of the

DBMS. Relations contain information about entities and relationships

Physical Schema

The physical schema specifies additional storage details: it summarizes how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and tapes. The designer decides what file organizations to use to store the relations, then creates indexes to speed up data retrieval operations.

External Schema

External schemas allow data access to be customized and authorized at the level of individual users or groups of users.

Each external schema consists of a collection of views and relations from the conceptual schema.

A view is conceptually a relation, but the records in a view are not stored in the DBMS. The records are computed using a

definition for the view, in terms of relations stored in the DBMS.

The external schema design is guided by the end user requirements.

1.4.3 Data Independence

Data independence is achieved through the use of the three levels of data abstraction; in particular, the conceptual schema and

the external schema provide distinct benefits in this area.

Logical Data Independence:

Users can be shielded from changes in the logical structure of the data, or in the choice of relations to be stored. For example, public and private levels of access to student data (say, Student_public and Student_private) can be created as views in the external schema, as the sketch below illustrates.
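A minimal sketch of this idea in SQL, assuming the Students schema above (the view names and column choices are illustrative):

CREATE VIEW Student_public  AS SELECT sid, name FROM Students;
CREATE VIEW Student_private AS SELECT sid, name, login, age, gpa FROM Students;
-- Applications written against Student_public continue to work even if
-- the set of stored relations behind it is later reorganized.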

Physical data independence:

The conceptual schema insulates users from changes in the physical storage of the data.
The conceptual schema hides details such as how the data is actually laid out on disk, the file structure, and the choice

of indexes.
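For example (an illustrative sketch), an index can be created or dropped without any change to application queries; only their execution speed is affected:

CREATE INDEX Students_age_idx ON Students (age);
-- The query SELECT * FROM Students S WHERE S.age < 18 runs unchanged,
-- with or without this index.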

1.5 QUERIES IN A DBMS

A DBMS provides a specialized language, called the query language, in which queries can be posed.

Relational calculus is a formal query language based on mathematical logic, and queries in this language have an intuitive, precise

meaning.

Relational algebra is another formal query language, based on a collection of operators for manipulating relations, which is equivalent in power to the calculus.

SQL

o SQL stands for Structured Query Language. It is used for storing and managing data in a relational database management system (RDBMS).

o It is a standard language for Relational Database System. It enables a user to create, read, update and delete relational

databases and tables.

o All RDBMSs, such as MySQL, Informix, Oracle, MS Access and SQL Server, use SQL as their standard database language.

o SQL allows users to query the database in a number of ways, using English-like statements.

Rules:

SQL follows the following rules:


o Structured Query Language is not case sensitive. Generally, keywords of SQL are written in uppercase.

o SQL statements are independent of text lines: a single SQL statement can be placed on one or several text lines.

o Using SQL statements, you can perform most of the actions in a database.

o SQL is based on tuple relational calculus and relational algebra.

SQL process:

o When an SQL command is executed on any RDBMS, the system figures out the best way to carry out the request, and the SQL engine determines how to interpret the task.

o Various components take part in this process, such as the optimization engine, the query engine and the query dispatcher.

o All non-SQL queries are handled by the classic query engine, but the SQL query engine won't handle logical files.

SQL Commands

o SQL commands are instructions used to communicate with the database and to perform specific tasks, functions, and queries on data.

o SQL can perform various tasks such as creating a table, adding data to tables, dropping a table, modifying a table, and setting permissions for users.
Types of SQL Commands

There are five types of SQL commands: DDL, DML, DCL, TCL, and DQL.

1. Data Definition Language (DDL)

o DDL changes the structure of the table: creating a table, deleting a table, altering a table, etc.

o All DDL commands are auto-committed, meaning they permanently save all changes in the database.
Here are some commands that come under DDL:

o CREATE

o ALTER

o DROP

o TRUNCATE

a. CREATE: It is used to create a new table in the database.

Syntax:

CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPES[,....]);

2. Data Manipulation Language

o DML commands are used to modify the database. They are responsible for all forms of changes in the database.

o DML commands are not auto-committed, meaning they do not permanently save changes to the database; they can be rolled back.

Here are some commands that come under DML:

o INSERT

o UPDATE

o DELETE
3. Data Control Language

DCL commands are used to grant and take back authority from any database user.

Here are some commands that come under DCL:

o Grant

o Revoke

4. Transaction Control Language

TCL commands can be used only with DML commands such as INSERT, DELETE and UPDATE. Operations such as creating or dropping tables are automatically committed in the database, which is why TCL commands cannot be used with them.

Here are some commands that come under TCL:

o COMMIT

o ROLLBACK

o SAVEPOINT
5. Data Query Language

DQL is used to fetch the data from the database.

It uses only one command:

o SELECT

a. SELECT: This is the same as the projection operation of relational algebra. It is used to select attributes based on the condition described by the WHERE clause.

A DBMS enables users to create, modify, and query data through a data manipulation language (DML). Thus, the query language

is only one part of the DML, which also provides constructs to insert, delete, and modify data.

1.6 TRANSACTION MANAGEMENT

A transaction is any one execution of a user program in a DBMS. (Executing the same program several times will generate several

transactions.)

Concurrent Execution of Transaction

An important task of a DBMS is to schedule concurrent accesses to data so that each user can safely ignore the fact that others are accessing the data concurrently.

A locking protocol is a set of rules to be followed by each transaction (and enforced by the DBMS), in order to ensure that even

though actions of several transactions might be interleaved, the net effect is identical to executing all transactions in some serial

order.
A lock is a mechanism used to control access to database objects. Two kinds of locks are commonly supported by a DBMS:

shared locks on an object can be held by two different transactions at the same time, but an exclusive lock on an object ensures

that no other transaction holds any lock on this object.
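As a rough sketch of how locking surfaces in SQL (exact syntax and behaviour vary by DBMS; the Accounts table and its columns are hypothetical):

-- Transaction 1: reads a row with the intent to update it,
-- acquiring an exclusive lock on that row.
SELECT balance FROM Accounts WHERE acc_no = 'A-101' FOR UPDATE;

-- Transaction 2: a plain read needs only a shared lock, so under strict
-- two-phase locking it waits until Transaction 1 commits or aborts.
SELECT balance FROM Accounts WHERE acc_no = 'A-101';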

Incomplete Transactions and System Crashes

The DBMS maintains a log of writes and enforces the Write-Ahead Log (WAL) property: a log record describing a change is forced to disk before the change itself is applied to the database. Otherwise, if the system crashed just after making a change in the database but before the change was recorded in the log, the DBMS would be unable to detect and undo this change. To ensure the WAL property, the DBMS must be able to selectively force a page in memory to disk.

The time required to recover from a crash can be reduced by periodically forcing some information to disk; this periodic operation

is called a checkpoint.
1.7 STRUCTURE OF A DBMS

When a user issues a query, the parsed query is presented to a query optimizer, which uses information about how the data is

stored to produce an efficient execution plan for evaluating the query.


An execution plan is a blueprint for evaluating a query, and is usually represented as a tree of relational operators (with

annotations that contain additional detailed information about which access methods to use, etc.).

The files and access methods layer includes a variety of software for supporting the concept of a file, which, in a DBMS, is a collection of pages or a collection of records. This layer typically supports a heap file, or file of unordered pages, as well as indexes.

The files and access methods layer code sits on top of the buffer manager, which brings pages in from disk to main memory as

needed in response to read requests.

The lowest layer of the DBMS software deals with management of space on disk, where the data is stored. Higher layers allocate,

deallocate, read, and write pages through (routines provided by) this layer, called the disk space manager.

DBMS components associated with concurrency control and recovery include the transaction manager, which ensures that transactions request and release locks according to a suitable locking protocol and schedules the execution of transactions; the lock manager, which keeps track of requests for locks and grants locks on database objects when they become available; and the recovery manager, which is responsible for maintaining a log and restoring the system to a consistent state after a crash.

1.8 ENTITY-RELATIONSHIP MODEL:

The entity-relationship (ER) data model allows us to describe the data involved in a real-world enterprise in terms of objects and their relationships and is widely used to develop an initial database design.

ER model

o ER model stands for Entity-Relationship model. It is a high-level data model used to define the data elements and relationships for a specified system.

o It develops a conceptual design for the database and gives a very simple, easy-to-understand view of the data.

o In ER modelling, the database structure is portrayed as a diagram called an entity-relationship diagram.

For example, suppose we design a school database. In this database, the student will be an entity with attributes like address, name, id, age, etc. The address can be another entity with attributes like city, street name, pin code, etc., and there will be a relationship between them.


1.9 E-R Diagram

ER-Diagram is a visual representation of data that describes how data is related to each other.

The components of an ER diagram, and the symbols and notations used for them, are described below.


1.10 ENTITIES, ATTRIBUTES, AND ENTITY SETS

It is often useful to identify a collection of similar entities. Such a collection is called an entity set.

An entity is described using a set of attributes. All entities in a given entity set have the same attributes.
A key is a minimal set of attributes whose values uniquely identify an entity in the set. There could be more than one candidate

key; if so, we designate one of them as the primary key.

Entity:

An entity may be any object, class, person or place. In an ER diagram, an entity is represented as a rectangle. Consider an organization as an example: manager, product, employee, department, etc. can each be taken as an entity.

Weak Entity

An entity that depends on another entity is called a weak entity. A weak entity does not have any key attribute of its own and is represented by a double rectangle.

Attribute

An attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.

Key Attribute

The key attribute is used to represent the main characteristics of an entity. It represents a primary key. The key attribute is

represented by an ellipse with the text underlined.


Multivalued Attribute

An attribute that can have more than one value is known as a multivalued attribute. A double oval is used to represent a multivalued attribute.

For example, a student can have more than one phone number.
Derived Attribute

An attribute that can be derived from other attributes is known as a derived attribute. It is represented by a dashed ellipse. For example, a person's age changes over time and can be derived from another attribute, such as date of birth.

1.11 RELATIONSHIPS AND RELATIONSHIP SETS

A relationship is an association among two or more entities; for example, an employee works in a department. As with entities, we may wish to collect a set of similar relationships into a relationship set.


A relationship can also have descriptive attributes. Descriptive attributes are used to record information about the relationship,

rather than about any one of the participating entities.

An instance of a relationship set is a set of relationships

One-to-One Relationship

When one instance of an entity is associated with at most one instance of the other entity, the relationship is one-to-one. For example, a female can marry only one male, and a male can marry only one female.

One-to-many relationship

When one instance of the entity on the left is associated with more than one instance of the entity on the right, the relationship is one-to-many. For example, a scientist can make many inventions, but each invention is made by exactly one scientist.
Many-to-one relationship

When more than one instance of the entity on the left is associated with only one instance of the entity on the right, the relationship is many-to-one. For example, a student enrolls in only one course, but a course can have many students.

Many-to-many relationship

When more than one instance of the entity on the left is associated with more than one instance of the entity on the right, the relationship is many-to-many. For example, an employee can be assigned to many projects, and a project can have many employees.
1.13 CONCEPTUAL DATABASE DESIGN WITH THE ER MODEL

Developing an ER diagram presents several choices, including the following:

Should a concept be modeled as an entity or an attribute?

Should a concept be modeled as an entity or a relationship?


UNIT – II

 Relational Model:

 Integrity constraints over relations

 Enforcing integrity constraints

 Querying relational data

 Logical database design : ER to Relational

 Introduction to Views

 Destroying / Altering Tables & Views.

 Relational Algebra and Calculus: Relational Algebra – Relational Calculus


UNIT – II

THE RELATIONAL MODEL

The relational model is very simple and elegant; a database is a collection of one or more relations, where each relation is a table

with rows and columns.

The main construct for representing data in the relational model is a relation. A relation consists of a relation schema and a relation instance. The relation instance is a table, and the relation schema describes the column heads for the table.

We first describe the relation schema and then the relation instance. The schema specifies the relation’s name, the name of each

field (or column, or attribute), and the domain of each field. A domain is referred to in a relation schema by the domain name and

has a set of associated values

An instance of a relation is a set of tuples, also called records, in which each tuple has the same number of fields as the relation

schema. A relation instance can be thought of as a table in which each tuple is a row, and all rows have the same number of

fields.
A relational database is a collection of relations with distinct relation names. The relational database schema is the collection of

schemas for the relations in the database.

INTEGRITY CONSTRAINTS OVER RELATIONS

Integrity Constraints

o Integrity constraints are a set of rules. It is used to maintain the quality of information.

o Integrity constraints ensure that the data insertion, updating, and other processes have to be performed in such a way that

data integrity is not affected.

o Thus, integrity constraint is used to guard against accidental damage to the database.

Types of Integrity Constraint


There are four types: domain constraints, entity integrity constraints, referential integrity constraints, and key constraints.

Domain constraints

o Domain constraints can be defined as the definition of a valid set of values for an attribute.

o The data types of a domain include string, character, integer, time, date, currency, etc. The value of the attribute must come from the corresponding domain.

Example:
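A minimal sketch, assuming a Students table: the declared data types define each attribute's domain, and a CHECK clause (the age condition here is an illustrative assumption) narrows it further.

CREATE TABLE Students ( sid  CHAR(20),
                        name VARCHAR(30),
                        age  INTEGER CHECK (age > 0) );
-- An INSERT supplying age = -5 violates the domain constraint and is rejected.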
Entity integrity constraints

o The entity integrity constraint states that a primary key value can't be null.

o This is because the primary key value is used to identify individual rows in a relation, and if the primary key had a null value, we could not identify those rows.

o Fields other than the primary key field may contain null values.
Example:
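A sketch using the Students relation from Unit 1 (field widths assumed):

CREATE TABLE Students ( sid  CHAR(20) PRIMARY KEY,
                        name VARCHAR(30) );

-- Rejected: the primary key sid may not be null.
INSERT INTO Students (sid, name) VALUES (NULL, 'Mike');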

Referential Integrity Constraints

o A referential integrity constraint is specified between two tables.

o In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary Key of Table 2, then every value of the

Foreign Key in Table 1 must be null or be available in Table 2.


Example:
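A sketch, assuming the Students and Enrolled relations used elsewhere in these notes:

CREATE TABLE Enrolled ( sid   CHAR(20),
                        cid   CHAR(20),
                        grade CHAR(2),
                        FOREIGN KEY (sid) REFERENCES Students (sid) );
-- Every sid in Enrolled must be null or match the sid of some Students row.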

Key constraints

o A key is a set of attributes used to uniquely identify an entity within its entity set.

o An entity set can have multiple keys, out of which one is chosen as the primary key. A primary key must contain unique values and cannot contain null values.


Example:
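A sketch: sid and login are both candidate keys for Students, sid is designated the primary key, and login is declared UNIQUE (the names are illustrative assumptions):

CREATE TABLE Students ( sid   CHAR(20) PRIMARY KEY,
                        login CHAR(20) UNIQUE,
                        name  VARCHAR(30) );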

ENFORCING INTEGRITY CONSTRAINTS

ICs are specified when a relation is created and enforced when a relation is modified. The impact of domain, PRIMARY KEY, and

UNIQUE constraints is straightforward: if an insert, delete, or update command causes a violation, it is rejected.

The following insertion violates the primary key constraint because there is already a tuple with the sid 53688, and it will be

rejected by the DBMS:

INSERT INTO Students (sid, name, login, age, gpa) VALUES (53688, ‘Mike’, ‘mike@ee’, 17, 3.4)

The following insertion violates the constraint that the primary key cannot contain null:

INSERT INTO Students (sid, name, login, age, gpa) VALUES (null, ‘Mike’, ‘mike@ee’, 17, 3.4)

QUERYING RELATIONAL DATA

A relational database query (query, for short) is a question about the data, and the answer consists of a new relation containing

the result.

A query language is a specialized language for writing queries


SELECT * FROM Students S WHERE S.age < 18

The symbol * means that we retain all fields of selected tuples in the result.

To understand this query, think of S as a variable that takes on the value of each tuple in Students, one tuple after the other.

The condition S.age < 18 in the WHERE clause specifies that we want to select only tuples in which the age field has a value less

than 18.

LOGICAL DATABASE DESIGN: ER TO RELATIONAL

The ER model is convenient for representing an initial, high-level database design.

Entity Sets to Tables

An entity set is mapped to a relation in a straightforward way: each attribute of the entity set becomes an attribute of the table.

Note that we know both the domain of each attribute and the (primary) key of an entity set.

CREATE TABLE Employees ( ssn    CHAR(11),
                         name   CHAR(30),
                         number INTEGER,
                         PRIMARY KEY (ssn) );

A relationship set, like an entity set, is mapped to a relation in the relational model.

To represent a relationship, we must be able to identify each participating entity and give values to the descriptive attributes of

the relationship.

Thus, the attributes of the relation include:

o The primary key attributes of each participating entity set, as foreign key fields.

o The descriptive attributes of the relationship set.
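As a sketch, a Works_In relationship set between the Employees entity set above and an assumed Departments entity set, with a descriptive attribute since, would map to the following relation:

CREATE TABLE Works_In ( ssn   CHAR(11),
                        did   INTEGER,
                        since DATE,
                        PRIMARY KEY (ssn, did),
                        FOREIGN KEY (ssn) REFERENCES Employees (ssn),
                        FOREIGN KEY (did) REFERENCES Departments (did) );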


Translating Relationship Sets with Key Constraints

If a relationship set involves n entity sets and some m of them are linked via arrows in the ER diagram, the key for any one of these m entity sets constitutes a key for the relation to which the relationship set is mapped.
INTRODUCTION TO VIEWS

A view is a table whose rows are not explicitly stored in the database but are computed as needed from a view definition.

Consider the Students and Enrolled relations. Suppose that we are often interested in finding the names and student identifiers of students who got a grade of B in some course, together with the cid for the course:

CREATE VIEW B-Students (name, sid, course)
AS SELECT S.sname, S.sid, E.cid
FROM Students S, Enrolled E
WHERE S.sid = E.sid AND E.grade = 'B';

The view B-Students has three fields called name, sid, and course, with the same domains as the fields sname and sid in Students and cid in Enrolled. (If the optional arguments name, sid, and course are omitted from the CREATE VIEW statement, the column names sname, sid, and cid are inherited.)

Views, Data Independence, Security

The physical schema for a relational database describes how the relations in the conceptual schema are stored, in terms of the file

organizations and indexes used.


The conceptual schema is the collection of schemas of the relations stored in the database

While some relations in the conceptual schema can also be exposed to applications, i.e., be part of the external schema of the

database, additional relations in the external schema can be defined using the view mechanism.

The view mechanism thus provides support for logical data independence in the relational model. That is, it can be used to define relations in the external schema that mask changes in the conceptual schema of the database from applications.

Updates on Views

The SQL-92 standard allows updates to be specified only on views that are defined on a single base table using just selection and

projection, with no use of aggregate operations. Such views are called updatable views.

DESTROYING/ALTERING TABLES AND VIEWS

If we decide that we no longer need a base table and want to destroy it (i.e., delete all the rows and remove the table definition information), we can use the DROP TABLE command.

ALTER TABLE modifies the structure of an existing table. To add a column called maiden-name to Students,

for example, we would use the following command:

ALTER TABLE Students ADD COLUMN maiden-name CHAR(10)

The definition of Students is modified to add this column, and all existing rows are padded with null values in this column.

ALTER TABLE can also be used to delete columns and to add or drop integrity constraints on a table;
RELATIONAL ALGEBRA

Relational algebra is a procedural query language. It gives a step by step process to obtain the result of the query. It uses

operators to perform queries.

Types of Relational Operations

Select Operation:

o The select operation selects tuples that satisfy a given predicate.

o It is denoted by sigma (σ).

Notation: σ p(r)

Where:

σ denotes the selection predicate,
r is the relation, and
p is a propositional logic formula which may use connectives like AND, OR and NOT, and relational operators like =, ≠, ≥, <, >, ≤.

For example: LOAN Relation

BRANCH_NAME LOAN_NO AMOUNT

Downtown L-17 1000

Redwood L-23 2000

Perryride L-15 1500

Downtown L-14 1500

Mianus L-13 500

Roundhill L-11 900

Perryride L-16 1300


Input:

σ BRANCH_NAME = "Perryride" (LOAN)

Output:

BRANCH_NAME LOAN_NO AMOUNT

Perryride L-15 1500

Perryride L-16 1300

Project Operation:

o This operation shows the list of those attributes that we wish to appear in the result. The rest of the attributes are eliminated from the table.

o It is denoted by ∏.

Notation: ∏ A1, A2, ..., An (r)

Where

A1, A2, ..., An are attribute names of relation r.


Example: CUSTOMER RELATION

NAME STREET CITY

Jones Main Harrison

Smith North Rye

Hays Main Harrison

Curry North Rye

Johnson Alma Brooklyn

Brooks Senator Brooklyn

Input:

∏ NAME, CITY (CUSTOMER)

Output:

NAME CITY

Jones Harrison
Smith Rye

Hays Harrison

Curry Rye

Johnson Brooklyn

Brooks Brooklyn

Union Operation:

o Suppose there are two relations R and S. The union operation contains all the tuples that are in R or S or in both R and S.

o It eliminates duplicate tuples. It is denoted by ∪.

Notation: R ∪ S

A union operation must satisfy the following conditions:

o R and S must have the same number of attributes (with compatible domains).

o Duplicate tuples are eliminated automatically.

Example:
DEPOSITOR RELATION

CUSTOMER_NAME ACCOUNT_NO

Johnson A-101

Smith A-121

Mayes A-321

Turner A-176

Johnson A-273

Jones A-472

Lindsay A-284

BORROW RELATION

CUSTOMER_NAME LOAN_NO

Jones L-17

Smith L-23
Hayes L-15

Jackson L-14

Curry L-93

Smith L-11

Williams L-17

Input:

∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Johnson
Smith
Mayes
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams

Set Intersection:

o Suppose there are two relations R and S. The set intersection operation contains all tuples that are in both R and S.

o It is denoted by ∩.

Notation: R ∩ S

Example: Using the above DEPOSITOR table and BORROW table

Input:

∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:

CUSTOMER_NAME

Smith

Jones

Set Difference:

o Suppose there are two relations R and S. The set difference operation contains all tuples that are in R but not in S.

o It is denoted by the minus sign (−).

Notation: R − S

Example: Using the above DEPOSITOR table and BORROW table

Input:

∏ CUSTOMER_NAME (BORROW) − ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Jackson
Hayes

Williams

Curry

Cartesian Product:

o The Cartesian product is used to combine each row in one table with each row in the other table. It is also known as the cross product.

o It is denoted by X.

Notation: E X D

Example:

EMPLOYEE

EMP_ID EMP_NAME EMP_DEPT

1 Smith A

2 Harry C
3 John B

DEPARTMENT

DEPT_NO DEPT_NAME

A Marketing

B Sales

C Legal

Input:

EMPLOYEE X DEPARTMENT

Output:

EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME

1 Smith A A Marketing

1 Smith A B Sales
1 Smith A C Legal

2 Harry C A Marketing

2 Harry C B Sales

2 Harry C C Legal

3 John B A Marketing

3 John B B Sales

3 John B C Legal

Rename Operation:

The rename operation is used to rename the output relation. It is denoted by rho (ρ).

Example: We can use the rename operator to rename the STUDENT relation to STUDENT1.

ρ(STUDENT1, STUDENT)

RELATIONAL CALCULUS
o Relational calculus is a non-procedural query language. In a non-procedural query language, the user is not concerned with the details of how to obtain the end results.

o Relational calculus tells what to do but never explains how to do it.

Types of Relational calculus:

1. Tuple Relational Calculus (TRC)

o Tuple relational calculus is used to select tuples from a relation. In TRC, the filtering variable ranges over the tuples of a relation.

o The result can have one or more tuples.

Notation:

{T | P(T)} or {T | Condition(T)}

Where

T is the set of resulting tuples, and
P(T) is the condition used to fetch T.

For example:

{ T.name | Author(T) AND T.article = 'database' }

OUTPUT: This query selects the tuples from the AUTHOR relation. It returns a tuple with 'name' from Author who has written an

article on 'database'.

TRC (tuple relation calculus) can be quantified. In TRC, we can use Existential (∃) and Universal Quantifiers (∀).

For example:

{ R | ∃T ∈ Authors (T.article = 'database' AND R.name = T.name) }

Output: This query will yield the same result as the previous one.
2. Domain Relational Calculus (DRC)

o The second form of relational calculus is known as domain relational calculus. In DRC, the filtering variable ranges over the domains of attributes.

o Domain relational calculus uses the same operators as tuple calculus. It uses logical connectives ∧ (and), ∨ (or) and ¬ (not).

o It uses existential (∃) and universal (∀) quantifiers to bind the variables.

Notation:

{ a1, a2, a3, ..., an | P(a1, a2, a3, ..., an) }

Where

a1, a2, ..., an are attributes, and
P stands for a formula built over those attributes.

For example:

{ < article, page, subject > | < article, page, subject > ∈ javatpoint ∧ subject = 'database' }

Output: This query will yield the article, page, and subject from the relation javatpoint, where the subject is 'database'.
UNIT- III

 SQL: Queries, Programming, Triggers: The form of a basic SQL Query

 UNION, INTERSECT and EXCEPT

 Nested Queries – Aggregate operators

 Null values –Complex integrity constraints in SQL

 Triggers & Active databases.

 Transaction Management-Overview: The ACID Properties

 Transactions & Schedules

 Concurrent execution of Transactions

 Lock-based concurrency control

 Performance of Locking

 Transaction support in SQL.


Unit-3

SQL Commands

o SQL commands are instructions used to communicate with the database and to perform specific tasks, functions, and queries on data.

o SQL can perform various tasks such as creating a table, adding data to tables, dropping a table, modifying a table, and setting permissions for users.

Types of SQL Commands

There are five types of SQL commands: DDL, DML, DCL, TCL, and DQL.
1. Data Definition Language (DDL)

o DDL changes the structure of the table: creating a table, deleting a table, altering a table, etc.

o All DDL commands are auto-committed, meaning they permanently save all changes in the database.
Here are some commands that come under DDL:

o CREATE

o ALTER

o DROP

o TRUNCATE

a. CREATE: It is used to create a new table in the database.

Syntax:

CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPES[,....]);

Example:

CREATE TABLE EMPLOYEE(Name VARCHAR2(20), Email VARCHAR2(100), DOB DATE);

b. DROP: It is used to delete both the structure and the records stored in the table.

Syntax:

DROP TABLE table_name;

Example:

DROP TABLE EMPLOYEE;

c. ALTER: It is used to alter the structure of the database. This change could be either to modify the characteristics of an existing attribute or to add a new attribute.

Syntax:

To add a new column to the table:

ALTER TABLE table_name ADD column_name COLUMN-definition;

To modify an existing column in the table:

ALTER TABLE table_name MODIFY(column_definitions....);

Examples:

ALTER TABLE STU_DETAILS ADD(ADDRESS VARCHAR2(20));
ALTER TABLE STU_DETAILS MODIFY (NAME VARCHAR2(20));

d. TRUNCATE: It is used to delete all the rows from the table and free the space containing the table.

Syntax:

TRUNCATE TABLE table_name;

Example:

TRUNCATE TABLE EMPLOYEE;

2. Data Manipulation Language

o DML commands are used to modify the database. They are responsible for all forms of changes in the database.

o DML commands are not auto-committed, meaning they do not permanently save changes to the database; they can be rolled back.

Here are some commands that come under DML:

o INSERT

o UPDATE

o DELETE

a. INSERT: The INSERT statement is an SQL query used to insert data into a row of a table.

Syntax:

INSERT INTO TABLE_NAME (col1, col2, col3, .... colN)
VALUES (value1, value2, value3, .... valueN);

Or

INSERT INTO TABLE_NAME
VALUES (value1, value2, value3, .... valueN);

For example:

INSERT INTO javatpoint (Author, Subject) VALUES ('Sonoo', 'DBMS');

b. UPDATE: This command is used to update or modify the value of a column in the table.

Syntax:

UPDATE table_name SET [column_name1 = value1, ... column_nameN = valueN] [WHERE CONDITION];

For example:

UPDATE students
SET User_Name = 'Sonoo'
WHERE Student_Id = '3';

c. DELETE: It is used to remove one or more rows from a table.

Syntax:

DELETE FROM table_name [WHERE condition];

For example:

DELETE FROM javatpoint
WHERE Author = 'Sonoo';

3. Data Control Language

DCL commands are used to grant and take back authority from any database user.

Here are some commands that come under DCL:

o Grant

o Revoke

a. Grant: It is used to give a user access privileges to a database.

Example:

GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER, ANOTHER_USER;

b. Revoke: It is used to take back permissions from a user.

Example:

REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1, USER2;

4. Transaction Control Language

TCL commands can be used only with DML commands such as INSERT, DELETE and UPDATE. Operations such as creating or dropping tables are automatically committed in the database, which is why TCL commands cannot be used with them.

Here are some commands that come under TCL:

o COMMIT

o ROLLBACK

o SAVEPOINT

a. COMMIT: The COMMIT command is used to save all the transactions to the database.

Syntax:

COMMIT;

Example:

DELETE FROM CUSTOMERS
WHERE AGE = 25;
COMMIT;

b. ROLLBACK: The ROLLBACK command is used to undo transactions that have not already been saved to the database.

Syntax:

ROLLBACK;

Example:

DELETE FROM CUSTOMERS
WHERE AGE = 25;
ROLLBACK;

c. SAVEPOINT: It is used to roll the transaction back to a certain point without rolling back the entire transaction.

Syntax:

SAVEPOINT SAVEPOINT_NAME;
5. Data Query Language

DQL is used to fetch the data from the database.

It uses only one command:

o SELECT

a. SELECT: This is the same as the projection operation of relational algebra. It is used to select attributes based on the condition described by the WHERE clause.

Syntax:

SELECT expressions
FROM tables
WHERE conditions;

For example:

SELECT emp_name
FROM employee
WHERE age > 20;
Differences Between UNION EXCEPT and INTERSECT Operators in SQL Server

The set operators work on complete rows of the queries, so the results of the queries must have the same column name, same

column order and the types of columns must be compatible. There are the following 4 set operators in SQL Server:

1. UNION: Combine two or more result sets into a single set, without duplicates.

2. UNION ALL: Combine two or more result sets into a single set, including all duplicates.

3. INTERSECT: Takes the data from both result sets which are in common.

4. EXCEPT: Takes the data that is in the first result set but not in the second result set.

Rules on Set Operations:

1. The result sets of all queries must have the same number of columns.

2. In every result set the data type of each column must be compatible (well matched) to the data type of its corresponding

column in other result sets.

3. In order to sort the result, an ORDER BY clause should be part of the last select statement. The column names or aliases must come from the first select statement.

Let us understand the differences between these operators with examples. Use the SQL script below to create and populate the two tables that we are going to use in our examples.

CREATE TABLE TableA
(
  ID INT,
  Name VARCHAR(50),
  Gender VARCHAR(10),
  Department VARCHAR(50)
)
GO

INSERT INTO TableA VALUES(1, 'Pranaya', 'Male','IT')
INSERT INTO TableA VALUES(2, 'Priyanka', 'Female','IT')
INSERT INTO TableA VALUES(3, 'Preety', 'Female','HR')
INSERT INTO TableA VALUES(3, 'Preety', 'Female','HR')
GO

Fetch the records:

SELECT * FROM TableA

CREATE TABLE TableB
(
  ID INT,
  Name VARCHAR(50),
  Gender VARCHAR(10),
  Department VARCHAR(50)
)
GO

INSERT INTO TableB VALUES(2, 'Priyanka', 'Female','IT')
INSERT INTO TableB VALUES(3, 'Preety', 'Female','HR')
INSERT INTO TableB VALUES(4, 'Anurag', 'Male','IT')
GO

Fetch the records:

SELECT * FROM TableB

UNION Operator:

The Union operator will return all the unique rows from both the queries. Notice that the duplicates are removed from the result

set. 

SELECT ID, Name, Gender, Department FROM TableA

UNION

SELECT ID, Name, Gender, Department FROM TableB


Result:

ID  Name      Gender  Department
1   Pranaya   Male    IT
2   Priyanka  Female  IT
3   Preety    Female  HR
4   Anurag    Male    IT

UNION ALL Operator:

The UNION ALL operator returns all the rows from both the queries, including the duplicates. 

SELECT ID, Name, Gender, Department FROM TableA

UNION ALL

SELECT ID, Name, Gender, Department FROM TableB

Result:

ID  Name      Gender  Department
1   Pranaya   Male    IT
2   Priyanka  Female  IT
3   Preety    Female  HR
3   Preety    Female  HR
2   Priyanka  Female  IT
3   Preety    Female  HR
4   Anurag    Male    IT

INTERSECT Operator:

The INTERSECT operator retrieves the common unique rows from both the left and the right query. Notice the duplicates are

removed. 

SELECT ID, Name, Gender, Department FROM TableA

INTERSECT

SELECT ID, Name, Gender, Department FROM TableB

Result:

ID  Name      Gender  Department
2   Priyanka  Female  IT
3   Preety    Female  HR

EXCEPT Operator:

The EXCEPT operator will return unique rows from the left query that aren’t present in the right query’s results. 
SELECT ID, Name, Gender, Department FROM TableA

EXCEPT

SELECT ID, Name, Gender, Department FROM TableB

Result:

ID  Name     Gender  Department
1   Pranaya  Male    IT

If you want the rows that are present in Table B but not in Table A, reverse the queries. 

SELECT ID, Name, Gender, Department FROM TableB

EXCEPT

SELECT ID, Name, Gender, Department FROM TableA

Result:

ID  Name    Gender  Department
4   Anurag  Male    IT

Nested Query

A nested query is a query that has another query embedded within it. The embedded query is called a subquery.

A subquery typically appears within the WHERE clause of a query. It can sometimes appear in the FROM clause or HAVING

clause.
Example

Let’s learn about nested queries with the help of an example.

Find the names of employees who have regno = 103.

The query is as follows −

select E.ename from employee E where E.eid IN (select S.eid from salary S where S.regno=103);

SQL Aggregate Functions

o An SQL aggregate function is used to perform calculations on multiple rows of a single column of a table. It returns a single value.

o It is also used to summarize the data.

Types of SQL Aggregation Function


1. COUNT FUNCTION

o The COUNT function is used to count the number of rows in a database table. It can work on both numeric and non-numeric data types.

o COUNT(*) returns the count of all the rows in a specified table, counting duplicates and nulls.

Syntax:

COUNT(*)
or
COUNT( [ALL|DISTINCT] expression )
Sample table:

PRODUCT_MAST

PRODUCT COMPANY QTY RATE COST

Item1 Com1 2 10 20

Item2 Com2 3 25 75

Item3 Com1 2 30 60

Item4 Com3 5 10 50

Item5 Com2 2 20 40

Item6 Com1 3 25 75

Item7 Com1 5 30 150

Item8 Com1 3 10 30

Item9 Com2 2 25 50
Item10 Com3 4 30 120

Example: COUNT()

SELECT COUNT(*)
FROM PRODUCT_MAST;

Output:

10

Example: COUNT() with WHERE

SELECT COUNT(*)
FROM PRODUCT_MAST
WHERE RATE >= 20;

Output:

7

Example: COUNT() with DISTINCT

SELECT COUNT(DISTINCT COMPANY)
FROM PRODUCT_MAST;

Output:

3

Example: COUNT() with GROUP BY

SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY;

Output:

Com1 5

Com2 3

Com3 2

Example: COUNT() with HAVING

SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY
HAVING COUNT(*) > 2;

Output:
Com1 5

Com2 3

2. SUM Function

The SUM function is used to calculate the sum of the selected column. It works on numeric fields only.

Syntax:

SUM()
or
SUM( [ALL|DISTINCT] expression )

Example: SUM()

SELECT SUM(COST)
FROM PRODUCT_MAST;

Output:

670

Example: SUM() with WHERE

SELECT SUM(COST)
FROM PRODUCT_MAST
WHERE QTY > 3;

Output:

320

Example: SUM() with GROUP BY

SELECT COMPANY, SUM(COST)
FROM PRODUCT_MAST
WHERE QTY > 3
GROUP BY COMPANY;

Output:

Com1 150
Com3 170

Example: SUM() with HAVING

SELECT COMPANY, SUM(COST)
FROM PRODUCT_MAST
GROUP BY COMPANY
HAVING SUM(COST) >= 170;

Output:

Com1 335

Com3 170

3. AVG function

The AVG function is used to calculate the average value of a numeric column. It returns the average of all non-null values.

Syntax:

AVG()
or
AVG( [ALL|DISTINCT] expression )

Example:

SELECT AVG(COST)
FROM PRODUCT_MAST;

Output:
67.00

4. MAX Function

The MAX function is used to find the maximum value of a certain column. This function determines the largest value of all selected values of a column.

Syntax:

MAX()
or
MAX( [ALL|DISTINCT] expression )

Example:

SELECT MAX(RATE)
FROM PRODUCT_MAST;

Output:

30

5. MIN Function

The MIN function is used to find the minimum value of a certain column. This function determines the smallest value of all selected values of a column.

Syntax:

MIN()
or
MIN( [ALL|DISTINCT] expression )

Example:

SELECT MIN(RATE)
FROM PRODUCT_MAST;

Output:

10

Trigger

A trigger is a procedure which is automatically invoked by the DBMS in response to changes to the database, and is specified by

the database administrator (DBA). A database with a set of associated triggers is generally called an active database.

Parts of trigger
A trigger's description contains three parts, which are as follows −

 Event − An event is a change to the database which activates the trigger.

 Condition − A query that is run when the trigger is activated is called a condition.

 Action − A procedure which is executed when the trigger is activated and its condition is true.

Use of trigger

Triggers may be used for any of the following reasons −

 To implement any complex business rule that cannot be implemented using integrity constraints.

 To audit a process, for example, to keep track of changes made to a table.

 To perform an automatic action when another concerned action takes place.

Types of triggers

The different types of triggers are explained below −

 Statement-level triggers − Fired only once per DML statement, irrespective of the number of rows affected by the statement. Statement-level triggers are the default type of trigger.

 Before-triggers − At the time of defining a trigger, we can specify whether it is to be fired before a command like INSERT, DELETE, or UPDATE is executed, or after. Before-triggers are typically used to check the validity of data before the action is performed. For instance, we can use a before-trigger to prevent the deletion of rows if deletion should not be allowed in a given case.

 After-triggers − Fired after the triggering action is completed. For example, if the trigger is associated with the INSERT command, it is fired after the row is inserted into the table.

 Row-level triggers − Fired once for each row affected by a DML command. For example, if an UPDATE command updates 150 rows, a row-level trigger is fired 150 times, whereas a statement-level trigger is fired only once.

Create database trigger

To create a database trigger, we use the CREATE TRIGGER command. The details to be given at the time of creating a trigger are

as follows −

 Name of the trigger.

 Table to be associated with.

 When trigger is to be fired: before or after.

 Command that invokes the trigger- UPDATE, DELETE, or INSERT.

 Whether it is a row-level trigger or not.

 Condition to filter rows.

 PL/SQL block is to be executed when trigger is fired.

The syntax to create database trigger is as follows −


CREATE [OR REPLACE] TRIGGER trigger_name
{BEFORE | AFTER}
{DELETE | INSERT | UPDATE [OF columns]} ON table
[REFERENCING [OLD AS old] [NEW AS new]]
[FOR EACH ROW [WHEN condition]]
BEGIN
  PL/SQL block
END;
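A minimal sketch of a row-level after-trigger in Oracle-style PL/SQL. The salary column on Employees and the Salary_Audit table are assumptions made for illustration:

CREATE OR REPLACE TRIGGER log_salary_change
AFTER UPDATE OF salary ON Employees
FOR EACH ROW
BEGIN
  -- record the old and new salary values for auditing
  INSERT INTO Salary_Audit (ssn, old_salary, new_salary, changed_on)
  VALUES (:OLD.ssn, :OLD.salary, :NEW.salary, SYSDATE);
END;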
ACID Properties in DBMS

A DBMS must keep data integrated when changes are made to it, because if the integrity of the data is affected, the whole of the data becomes disturbed and corrupted. To maintain the integrity of the data, four properties are defined in the database management system, known as the ACID properties. The ACID properties apply to transactions, which pass through different groups of tasks. In this section, we will learn what these properties stand for and what each property is used for, and we will understand the ACID properties with the help of some examples.

ACID Properties

The term ACID stands for Atomicity, Consistency, Isolation, and Durability:


1) Atomicity: Atomicity means that a transaction is treated as a single, indivisible unit: if any operation is performed on the data, either it is executed completely or it is not executed at all. The operation must not break off in between or execute partially. When operations are executed within a transaction, the whole set must be completely executed, never partially.

Example: Remo has account A holding $30, from which he wishes to send $10 to Sheero's account B, which already holds $100. After the transfer, account B should hold $110. Two operations take place: the $10 that Remo wants to transfer is debited from his account A, and the same amount is credited to account B, i.e., into Sheero's account. Now suppose the debit operation executes successfully but the credit operation fails. Remo's account A is left with $20, while Sheero's account B still holds $100 as before: money has disappeared, so the transaction is not atomic. The transaction is atomic only when both the debit and the credit operations complete successfully, or neither does.
Consistency: The word consistency means that the value should always remain preserved. In DBMS, the integrity of the data must be maintained: whenever the database changes, its integrity constraints must still hold, so that the database remains consistent before and after the transaction. The data should always be correct.

Example: In the transfer above, accounts A and B together hold $130 before the transaction. Consistency requires that they still hold $130 in total after it ($20 + $110); if the total changed, the database would have become inconsistent.
 Isolation: The term 'isolation' means separation. In DBMS, isolation is the property that concurrently executing transactions do not affect one another: when two or more transactions occur simultaneously, the result must be the same as if they had executed one after the other, and consistency must be maintained. Any change made by a particular transaction is not visible to other transactions until that change is committed.


Durability: Durability ensures permanency. In DBMS, durability means that once an operation has executed successfully, its data becomes permanent in the database; even if the system fails or crashes, the database survives. If committed data is nevertheless lost, it becomes the responsibility of the recovery manager to restore it. To make changes permanent, the COMMIT command must be issued.

Therefore, the ACID property of DBMS plays a vital role in maintaining the consistency and availability of data in the database.
Transaction Management in DBMS

A transaction is a set of logically related operations. For example, you are transferring money from your bank account to your

friend’s account, the set of operations would be like this:

Simple Transaction Example

1. Read your account balance

2. Deduct the amount from your balance

3. Write the remaining balance to your account

4. Read your friend’s account balance

5. Add the amount to his account balance

6. Write the new updated balance to his account

This whole set of operations can be called a transaction. Although the example above shows only read, write, and update operations, a transaction can contain operations such as read, write, insert, update, and delete.

In DBMS, we write the above 6 steps transaction like this:

Let's say your account is A and your friend's account is B, and you are transferring 10000 from A to B. The steps of the transaction are:

1. R(A);

2. A = A - 10000;

3. W(A);

4. R(B);
5. B = B + 10000;

6. W(B);

In the above transaction R refers to the Read operation and W refers to the write operation.

Transaction failure in between the operations

Now that we understand what a transaction is, we should understand the problems associated with it.

The main problem is that a transaction can fail before finishing all the operations in the set. This can happen due to a power failure, system crash, etc. This is a serious problem that can leave the database in an inconsistent state. Assume that the transaction fails after the third operation (see the example above): the amount would be deducted from your account, but your friend would not receive it.

To solve this problem, we have the following two operations

Commit: If all the operations in a transaction are completed successfully then commit those changes to the database

permanently.

Rollback: If any of the operations fails, then roll back all the changes done by the previous operations.
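In SQL, the same transfer is wrapped in a transaction so that either both updates become permanent or neither does. The following is a minimal sketch in generic SQL; the ACCOUNT table and its columns are hypothetical, and the exact transaction syntax varies across systems:

-- Hypothetical schema: ACCOUNT(acc_no, balance)
BEGIN TRANSACTION;

UPDATE account SET balance = balance - 10000 WHERE acc_no = 'A';
UPDATE account SET balance = balance + 10000 WHERE acc_no = 'B';

-- If both updates succeed, make the changes permanent:
COMMIT;

-- If any step fails, undo all changes made so far instead:
-- ROLLBACK;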

We know that transactions are sets of instructions that perform operations on the database. When multiple transactions run concurrently, there must be a sequence in which their operations are performed, because at any instant only one operation can be performed on the database. This sequence of operations is known as a Schedule.
Types of Schedules in DBMS

We have various types of schedules in DBMS. Let's discuss them one by one.

Serial Schedule

In a serial schedule, a transaction is executed completely before the execution of another transaction starts. In other words, in a serial schedule a transaction does not start execution until the currently running transaction has finished execution. This type of transaction execution is also known as non-interleaved execution.


Serial Schedule example

Here R refers to the read operation and W refers to the write operation. In this example, the transaction T2 does not start

execution until the transaction T1 is finished.

T1              T2
----            ----
R(A)
R(B)
W(A)
commit
                R(B)
                R(A)
                W(B)
                commit

Strict Schedule

In a strict schedule, if the write operation of a transaction precedes a conflicting operation (a read or write operation) of another transaction, then the commit or abort operation of the first transaction must also precede that conflicting operation of the other transaction.

Let's take an example.


Strict Schedule example

Let's say we have two transactions Ta and Tb. The write operation of transaction Ta precedes a conflicting read or write operation of transaction Tb, so the commit or abort of transaction Ta should also precede that read or write of Tb.

Ta              Tb
----            ----
R(X)
                R(X)
W(X)
commit
                W(X)
                R(X)
                commit

Concurrency Control in DBMS

When more than one transaction runs simultaneously, there are chances of a conflict occurring, which can leave the database in an inconsistent state.

Solution of Conflicts: Locks

Lock-Based Protocol
In this type of protocol, no transaction can read or write data until it acquires an appropriate lock on it. There are two types of lock:

1. Shared lock:

o It is also known as a Read-only lock. In a shared lock, the data item can only be read by the transaction.

o It can be shared between transactions, because a transaction holding a shared lock cannot update the data item.

2. Exclusive lock:

o In an exclusive lock, the data item can be both read and written by the transaction.

o This lock is exclusive: multiple transactions cannot hold it on the same data item at once, so no two transactions can modify the same data simultaneously. A short SQL sketch of both lock modes follows.
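Most SQL systems expose these two modes through explicit statements. The following is a minimal sketch in Oracle-style SQL, over a hypothetical ACCOUNT table:

-- Shared (read) lock on the whole table: other transactions may still read it
LOCK TABLE account IN SHARE MODE;

-- Exclusive (write) lock on selected rows: other writers must wait until commit
SELECT balance FROM account WHERE acc_no = 'A' FOR UPDATE;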

There are four types of lock protocols available:

1. Simplistic lock protocol

It is the simplest way of locking data during a transaction. Simplistic lock-based protocols require every transaction to obtain a lock on the data before performing an insert, delete, or update on it; the data item is unlocked after the transaction completes.

2. Pre-claiming Lock Protocol

o Pre-claiming lock protocols evaluate the transaction to list all the data items on which it needs locks.

o Before initiating execution, the transaction requests the DBMS for locks on all those data items.

o If all the locks are granted, this protocol allows the transaction to begin; when the transaction completes, it releases all the locks.

o If any of the locks is not granted, the transaction rolls back and waits until all the locks are granted.

3. Two-phase locking (2PL)

o The two-phase locking protocol divides the execution phase of the transaction into three parts.

o In the first part, when the execution of the transaction starts, it seeks permission for the lock it requires.
o In the second part, the transaction acquires all the locks. The third phase is started as soon as the transaction releases its

first lock.

o In the third phase, the transaction cannot demand any new locks. It only releases the acquired locks.

There are two phases of 2PL:

Growing phase: In the growing phase, a new lock on the data item may be acquired by the transaction, but none can be released.

Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released, but no new locks can be acquired.
If lock conversion is allowed, the following can happen:

1. Upgrading of lock (from S(a) to X (a)) is allowed in growing phase.

2. Downgrading of lock (from X(a) to S(a)) must be done in shrinking phase.

Example:
The following shows how unlocking and locking work with 2PL for two transactions whose lock requests proceed in numbered steps (the full schedule is not reproduced here).

Transaction T1:

o Growing phase: from step 1-3

o Shrinking phase: from step 5-7

o Lock point: at 3

Transaction T2:

o Growing phase: from step 2-6

o Shrinking phase: from step 8-9

o Lock point: at 6

4. Strict Two-phase locking (Strict-2PL)

o The first phase of Strict-2PL is similar to 2PL. In the first phase, after acquiring all the locks, the transaction continues to

execute normally.

o The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock after using it.

o Strict-2PL holds all its locks until the whole transaction commits, and then releases them all at once.

o The Strict-2PL protocol therefore does not have a gradual shrinking phase of lock release.


Unit:4 NORMAL FORMS AND SECURITY

Schema Refinement and Normal forms: Introduction to Schema refinement – Functional dependencies – Reasoning about

functional dependencies – Normal forms –Properties of Decompositions – Normalization – Schema Refinement in data base

design – other kinds of dependencies. Security : Introduction to Database security -Access control – Discretionary Access control

– Mandatory Access control – Additional issues to security. Concurrency control : 2PL, serializability and Recoverability –

Introduction to Lock Management - Lock Conversions –Specialized Locking techniques - Concurrency control without locking.

Unit – 4 :NORMAL FORMS AND SECURITY

Schema Refinement: Schema refinement means refining the schema using some technique. The best technique for schema refinement is decomposition.

Normalisation or Schema Refinement is a technique of organizing the data in the database. It is a systematic approach of

decomposing tables to eliminate data redundancy and undesirable characteristics like Insertion, Update and Deletion Anomalies.

Redundancy refers to repetition of same data or duplicate copies of same data stored in different locations.

Anomalies (problems due to redundancy): Anomalies are the problems that occur in poorly planned, unnormalised databases where all the data is stored in one table, which is sometimes called a flat-file database.

Consider such a schema, in which all the data is stored in a single table: this causes redundancy of data, or anomalies, as SID and Sname are repeated once for the same CID.

Let us discuss anomalies one by one. Due to redundancy of data we may get the following problems, those are-

1.insertion anomalies : It may not be possible to store some information unless some other information is stored as well.

2.redundant storage: some information is stored repeatedly

3.update anomalies: If one copy of redundant data is updated, then inconsistency is created unless all redundant copies of data

are updated.
4.deletion anomalies: It may not be possible to delete some information without losing some other information as well.

To avoid redundancy and the problems due to it, we use a refinement technique called DECOMPOSITION.

Decomposition: the process of breaking a larger relation into smaller relations, each of which contains a subset of the attributes of the original relation.

Functional dependencies: A functional dependency is a relationship that exists when one attribute uniquely determines another attribute.

A functional dependency is a form of integrity constraint that can identify schemas with redundant storage problems and suggest refinements.

A functional dependency X → Y holds in a relation if any two tuples that agree on the attributes X also agree on the attributes Y: if t1.X = t2.X then t1.Y = t2.Y, where t1 and t2 are tuples and X and Y are sets of attributes.

What is Functional Dependency

Functional dependency in DBMS, as the name suggests, is a relationship between attributes of a table that depend on each other. Introduced by E. F. Codd, it helps in preventing data redundancy and in detecting bad designs.

To understand the concept thoroughly, let us consider a relation P with attributes A and B. Functional dependency is represented by -> (an arrow sign).

Then the following will represent the functional dependency between attributes with an arrow sign −

A -> B
Example

The following is an example that would make it easier to understand functional dependency −

We have a <Department> table with two attributes − DeptId and DeptName.

DeptId = Department ID

DeptName = Department Name

The DeptId is our primary key. Here, DeptId uniquely identifies the DeptName attribute. This is because if you want to know the

department name, then at first you need to have the DeptId.

DeptId DeptName
001 Finance

002 Marketing

003 HR

Therefore, the functional dependency between DeptId and DeptName holds in the direction that DeptName is functionally dependent on DeptId −

DeptId -> DeptName
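Whether a proposed functional dependency holds in the current data can be checked with a simple query: if DeptId -> DeptName holds, no DeptId value can be associated with more than one DeptName. A minimal sketch over the <Department> table above:

-- Returns the DeptId values (if any) that violate DeptId -> DeptName
SELECT DeptId
FROM Department
GROUP BY DeptId
HAVING COUNT(DISTINCT DeptName) > 1;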


Types of Functional dependency

1. Trivial functional dependency

o A → B has trivial functional dependency if B is a subset of A.

o The following dependencies are also trivial like: A → A, B → B

Example:

Consider a table with two columns, Employee_Id and Employee_Name.

{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as Employee_Id is a subset of {Employee_Id, Employee_Name}. Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies.

2. Non-trivial functional dependency

o A → B has a non-trivial functional dependency if B is not a subset of A.

o When A intersection B is NULL, then A → B is called a complete non-trivial dependency.

Example:

ID → Name
Name → DOB

Normalization

A large database defined as a single relation may result in data duplication. This repetition of data may result in:

o Making relations very large.

o It isn't easy to maintain and update data as it would involve searching many records in relation.

o Wastage and poor utilization of disk space and resources.

o The likelihood of errors and inconsistencies increases.

What is Normalization?

o Normalization is the process of organizing the data in the database.


o Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate undesirable

characteristics like Insertion, Update, and Deletion Anomalies.

o Normalization divides the larger table into smaller and links them using relationships.

o The normal form is used to reduce redundancy from the database table.

Data modification anomalies can be categorized into three types:

o Insertion Anomaly: An insertion anomaly occurs when one cannot insert a new tuple into a relation due to a lack of other data.

o Deletion Anomaly: A deletion anomaly refers to the situation where the deletion of some data results in the unintended loss of other important data.

o Update Anomaly: An update anomaly occurs when updating a single data value requires multiple rows of data to be updated.

Types of Normal Forms:

Normalization works through a series of stages called normal forms. The normal forms apply to individual relations; a relation is said to be in a particular normal form if it satisfies the corresponding constraints.

Normal Form    Description
1NF            A relation is in 1NF if every attribute contains only atomic values.
2NF            A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF            A relation is in 3NF if it is in 2NF and no transitive dependency exists.
BCNF           Boyce-Codd normal form; a stronger definition of 3NF.
4NF            A relation is in 4NF if it is in BCNF and has no multi-valued dependency.

Advantages of Normalization

o Normalization helps to minimize data redundancy.

o Greater overall database organization.

o Data consistency within the database.

o Much more flexible database design.

o Enforces the concept of relational integrity.

First Normal Form (1NF)

o A relation is in 1NF if every attribute contains only atomic values.

o It states that an attribute of a table cannot hold multiple values; it must hold only a single value.

o First normal form disallows multi-valued attributes, composite attributes, and their combinations.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID   EMP_NAME   EMP_PHONE                  EMP_STATE
14       John       7272826385, 9064738238     UP
20       Harry      8574783832                 Bihar
12       Sam        7390372389, 8589830302     Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab

Second Normal Form (2NF)


o For 2NF, the relation must be in 1NF.

o In the second normal form, all non-key attributes must be fully functionally dependent on the primary key.

Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a teacher can teach more

than one subject.

TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35

83 Math 38

83 Computer 38

In the given table, the candidate key is {TEACHER_ID, SUBJECT}. The non-prime attribute TEACHER_AGE depends on TEACHER_ID alone, which is a proper subset of the candidate key. That is why the table violates the rule for 2NF.


To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE

25 30

47 35

83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English

83 Math
83 Computer
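In SQL, the decomposition can be written down directly, with TEACHER_ID repeated in both tables as the linking attribute. A minimal sketch (data types are assumed):

CREATE TABLE teacher_detail (
    teacher_id   INT PRIMARY KEY,
    teacher_age  INT
);

CREATE TABLE teacher_subject (
    teacher_id   INT REFERENCES teacher_detail(teacher_id),
    subject      VARCHAR(30),
    PRIMARY KEY (teacher_id, subject)
);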

Third Normal Form (3NF)

o A relation is in 3NF if it is in 2NF and does not contain any transitive dependency.

o 3NF is used to reduce data duplication. It is also used to achieve data integrity.

o If there is no transitive dependency for non-prime attributes, then the relation is in third normal form.

A relation is in third normal form if it satisfies at least one of the following conditions for every non-trivial functional dependency X → Y:

1. X is a super key.

2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida

333 Stephan 02228 US Boston


444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal

Super key in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on  

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.

That's why we need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary

key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP


222 Harry 201010

333 Stephan 02228

444 Lan 60007

555 Katharine 06389

666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

462007 MP Bhopal
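The decomposition is lossless: joining the two tables on EMP_ZIP reconstructs the original EMPLOYEE_DETAIL relation. A minimal sketch, assuming the two tables above exist with these column names:

-- Reconstruct the original relation from the 3NF tables
SELECT e.emp_id, e.emp_name, e.emp_zip, z.emp_state, z.emp_city
FROM employee e
JOIN employee_zip z ON e.emp_zip = z.emp_zip;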
Boyce Codd normal form (BCNF)

o BCNF is the advanced version of 3NF; it is stricter than 3NF.

o A table is in BCNF if, for every functional dependency X → Y, X is a super key of the table.

o For BCNF, the table should be in 3NF, and for every FD, the left-hand side must be a super key.

Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549

In the above table Functional dependencies are as follows:

1. EMP_ID  →  EMP_COUNTRY  
2. EMP_DEPT  →   {DEPT_TYPE, EMP_DEPT_NO}  

Candidate key: {EMP-ID, EMP-DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India

364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283


Testing D394 300

Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID   EMP_DEPT
264      Designing
264      Testing
364      Stores
364      Developing

Functional dependencies:

1. EMP_ID   →    EMP_COUNTRY  

2. EMP_DEPT   →   {DEPT_TYPE, EMP_DEPT_NO}  

Candidate keys:
For the first table: EMP_ID

For the second table: EMP_DEPT

For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because the left-hand side of each functional dependency is a key for its table.

Security : Introduction to Database security

Database security:

Database security is the technique that protects and secures the database against intentional or accidental threats.

Security concerns are relevant not only to the data residing in an organization's database: a breach of security may harm other parts of the system, which may ultimately affect the database as well.

Consequently, database security covers hardware, software, human resources, and data. Using security efficiently requires appropriate controls, which are defined by the specific mission and purpose of the system. The requirement for proper security, while often neglected or overlooked in the past, is now checked more and more thoroughly by different organizations.

We consider database security in relation to the following situations:

 Theft and fraud.

 Loss of confidentiality or secrecy.

 Loss of data privacy.

 Loss of data integrity.

 Loss of availability of data.

These circumstances mostly signify the areas in which the organization should focus on reducing risk, that is, the chance of incurring loss or damage to data within the database.

These areas are often directly related, such that an activity that leads to a loss in one area may also lead to a loss in another, since all of the data within an organization is interconnected.

What is a Threat?

A threat is any situation or event, whether intentional or accidental, that can cause damage and have an adverse effect on the database structure and, consequently, on the organization.

A threat may arise from a situation or event involving a person, or from actions or circumstances that are likely to bring harm to an organization and its database.

The degree of harm an organization suffers as a result of a threat depends on several aspects, such as the existence of countermeasures and contingency plans. For example, if a hardware failure corrupts secondary storage, all processing activity must cease until the problem is resolved.

Computer-Based Controls

The different forms of countermeasures to threats on computer systems range from physical controls to managerial procedures. Despite the range of computer-based controls available, it is worth noting that the security of a DBMS is usually only as good as that of the operating system, due to the close association between them.

Most of the computer-based database security are listed below:

 Access authorization.

 Access controls.

 Views.

 Backup and recovery of data.

 Data integrity.

 Encryption of data.

 RAID technology.
What is Access Controls?

The usual way of supplying access controls to a database system is through the granting and revoking of privileges within the database.

A privilege allows a user to create or access some database object or to run some specific DBMS utility.

Privileges are granted to users to accomplish the tasks required by their jobs.

The database provides various types of access controls:

 Discretionary Access Control (DAC)

 Mandatory Access Control (MAC)

Discretionary access control 

(DAC) is an identity-based access control model that provides users a certain amount of control over their data. Data owners (or

any users authorized to control data) can define access permissions for specific users or groups of users. 
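In SQL, discretionary access control is realized through the GRANT and REVOKE statements. A minimal sketch (the user names and the EMPLOYEE table are hypothetical):

-- The owner of EMPLOYEE grants read access, and allows re-granting
GRANT SELECT ON employee TO alice WITH GRANT OPTION;

-- Write access without the right to pass it on
GRANT INSERT, UPDATE ON employee TO bob;

-- Privileges can later be withdrawn
REVOKE UPDATE ON employee FROM bob;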

What is mandatory access control?

 Mandatory access control (MAC) is a model of access control where the operating system provides users with access based on data confidentiality and user clearance levels. In this model, access is granted on a need-to-know basis: users have to prove a need for information before gaining access.


Backup and Recovery

Every database management system should offer backup facilities to help with the recovery of a database after a failure. Backup copies of the database and log files should be made at regular intervals, and the copies should be kept in a secure location. In the event of a failure that renders the database unusable, the backup copy and the details captured in the log file are used to restore the database to the latest possible consistent state.

Why is access control important?

 Access control regulates which users, applications, and devices can view, edit, add, and delete resources in an organization’s

environment. Controlling access is one of the key practices to protect sensitive data from theft, misuse, abuse, and any other

threats. There are two levels of access control: physical and logical.

Concurrency Control in Database Management System is a procedure of managing simultaneous operations without conflicting

with each other. It ensures that Database transactions are performed concurrently and accurately to produce correct results

without violating data integrity of the respective Database.

Concurrent access is quite easy if all users are just reading data. There is no way they can interfere with one another. Though for

any practical Database, it would have a mix of READ and WRITE operations and hence the concurrency is a challenge.

DBMS Concurrency Control is used to address such conflicts, which mostly occur with a multi-user system. Therefore,

Concurrency Control is the most important element for proper functioning of a Database Management System where two

or more database transactions are executed simultaneously, which require access to the same data.
Two Phase Locking Protocol

Two Phase Locking Protocol, also known as the 2PL protocol, is a method of concurrency control in DBMS that ensures serializability by applying locks to the transaction's data, which block other transactions from accessing the same data simultaneously. The Two Phase Locking protocol helps to eliminate concurrency problems in DBMS.

This locking protocol divides the execution phase of a transaction into three different parts.

 In the first phase, when the transaction begins to execute, it requires permission for the locks it needs.

 The second part is where the transaction obtains all the locks. When a transaction releases its first lock, the third phase

starts.

 In this third phase, the transaction cannot demand any new locks. Instead, it only releases the acquired locks.
Serializability

Serializability is the classical concurrency scheme. It ensures that a schedule for executing concurrent transactions is equivalent

to one that executes the transactions serially in some order. It assumes that all accesses to the database are done using read and

write operations. A schedule is called "correct" if we can find a serial schedule that is "equivalent" to it. Given a set of

transactions T1...Tn, two schedules S1 and S2 of these transactions are equivalent if the following conditions are satisfied:

Read-Write Synchronization: If a transaction reads a value written by another transaction in one schedule, then it also does so in

the other schedule.

Write-Write Synchronization: If a transaction overwrites the value of another transaction in one schedule, it also does so in the

other schedule.

These two properties ensure that there can be no difference in the effects of the two schedules. As an example, consider an interleaved schedule of two transactions T1 and T2 that is equivalent to a serial schedule in which T2 is executed after T1.
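As a reconstructed illustration (the original figure is not reproduced in these notes), the following interleaved schedule is equivalent to the serial schedule T1 followed by T2, because every conflicting pair of operations occurs in the order T1 before T2:

T1              T2
----            ----
R(A)
W(A)
                R(A)
R(B)
                W(A)
W(B)
commit
                R(B)
                W(B)
                commit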

Recoverability.

Recoverable Schedules-
 If in a schedule,

 A transaction performs a dirty read operation from an uncommitted transaction

 And its commit operation is delayed till the uncommitted transaction either commits or rolls back,

then such a schedule is called a Recoverable Schedule.

 Types of Recoverable Schedules-

 A recoverable schedule may be any one of these kinds-

1. Cascading Schedule

2. Cascadeless Schedule

3. Strict Schedule

Cascading Schedule-
 If in a schedule, the failure of one transaction causes several other dependent transactions to roll back or abort, then such a schedule is called a Cascading Schedule (also Cascading Rollback or Cascading Abort).

 It simply leads to the wastage of CPU time.

Here,

 Transaction T2 depends on transaction T1.

 Transaction T3 depends on transaction T2.

 Transaction T4 depends on transaction T3.

In this schedule,

 The failure of transaction T1 causes the transaction T2 to rollback.

 The rollback of transaction T2 causes the transaction T3 to rollback.

 The rollback of transaction T3 causes the transaction T4 to rollback.

Such a rollback is called as a Cascading Rollback.

 NOTE- 

If the transactions T2, T3 and T4 had committed before the failure of transaction T1, then the schedule would have been irrecoverable.

Cascadeless Schedule-
 If in a schedule, a transaction is not allowed to read a data item until the last transaction that has written it is committed or

aborted, then such a schedule is called as a Cascadeless Schedule.

In other words,

 Cascadeless schedule allows only committed read operations.

 Therefore, it avoids cascading roll back and thus saves CPU time.


 
NOTE-

 Cascadeless schedule allows only committed read operations.

 However, it allows uncommitted write operations.

 

 
Concurrency Control

Concurrency Control is the management procedure that is required for controlling concurrent execution of the operations that

take place on a database.


Concurrent Execution in DBMS

o In a multi-user system, multiple users can access and use the same database at one time, which is known as the

concurrent execution of the database. It means that the same database is executed simultaneously on a multi-user system by

different users.

Problems with Concurrent Execution

In a database transaction, the two main operations are READ and WRITE. These two operations need to be managed during the concurrent execution of transactions, because if they are interleaved in an uncontrolled manner, the data may become inconsistent.

Introduction to Lock Management

Lock-Based Protocol

In this type of protocol, no transaction can read or write data until it acquires an appropriate lock on it. There are two types of lock:

1. Shared lock:

o It is also known as a Read-only lock. In a shared lock, the data item can only be read by the transaction.

o It can be shared between transactions, because a transaction holding a shared lock cannot update the data item.
2. Exclusive lock:

o In an exclusive lock, the data item can be both read and written by the transaction.

o This lock is exclusive: multiple transactions cannot hold it on the same data item at once, so no two transactions can modify the same data simultaneously.

There are four types of lock protocols available:

1. Simplistic lock protocol

It is the simplest way of locking data during a transaction. Simplistic lock-based protocols require every transaction to obtain a lock on the data before performing an insert, delete, or update on it; the data item is unlocked after the transaction completes.

2. Pre-claiming Lock Protocol

o Pre-claiming lock protocols evaluate the transaction to list all the data items on which it needs locks.

o Before initiating execution, the transaction requests the DBMS for locks on all those data items.

o If all the locks are granted, this protocol allows the transaction to begin; when the transaction completes, it releases all the locks.

o If any of the locks is not granted, the transaction rolls back and waits until all the locks are granted.

3.Two-phase locking (2PL)

o The two-phase locking protocol divides the execution phase of the transaction into three parts.
o In the first part, when the execution of the transaction starts, it seeks permission for the lock it requires.

o In the second part, the transaction acquires all the locks. The third phase is started as soon as the transaction releases its

first lock.

o In the third phase, the transaction cannot demand any new locks; it only releases the acquired locks.

There are two phases of 2PL:

Growing phase: In the growing phase, a new lock on the data item may be acquired by the transaction, but none can be released.

Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released, but no new locks can be acquired.
4.Strict Two-phase locking (Strict-2PL)

o The first phase of Strict-2PL is similar to 2PL. In the first phase, after acquiring all the locks, the transaction continues to

execute normally.

o The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock after using it.

o Strict-2PL holds all its locks until the whole transaction commits, and then releases them all at once.

o The Strict-2PL protocol therefore does not have a gradual shrinking phase of lock release.

It does not suffer from cascading aborts as 2PL does.

Lock conversion

Changing the mode of a lock that is already held is called lock conversion.


Lock conversion occurs when a process accesses a data object on which it already holds a lock, and the access mode requires a

more restrictive lock than the one already held. A process can hold only one lock on a data object at any given time, although it

can request a lock on the same data object many times indirectly through a query.

Some lock modes apply only to tables, others only to rows, blocks, or data partitions. For rows or blocks, conversion usually

occurs if an X lock is needed and an S or U lock is held.

IX and S locks are special cases with regard to lock conversion. Neither is considered to be more restrictive than the other, so if

one of these locks is held and the other is required, the conversion results in a SIX (Share with Intent Exclusive) lock. All other

conversions result in the requested lock mode becoming the held lock mode if the requested mode is more restrictive.

A dual conversion might also occur when a query updates a row. If the row is read through index access and locked as S, the table

that contains the row has a covering intention lock. But if the lock type is IS instead of IX, and the row is subsequently changed,

the table lock is converted to an IX and the row lock is converted to an X.
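A common conversion case in SQL, under a locking (non-MVCC) scheduler: a transaction first reads a row, acquiring a shared (S) lock, and then updates the same row, which forces an upgrade to an exclusive (X) lock. A sketch over a hypothetical ACCOUNT table:

-- The read acquires a shared (S) lock on the row
SELECT balance FROM account WHERE acc_no = 'A';

-- Updating the same row converts the S lock to an exclusive (X) lock
UPDATE account SET balance = balance - 500 WHERE acc_no = 'A';

COMMIT;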

Specialized Locking techniques


The concurrency control technique in which the value of the lock variable is manipulated is called locking. The technique of

locking is one way to ensure Serializability in DBMS.

In DBMS, locking is the responsibility of a subsystem called lock manager.

Types of Locking Techniques

Binary Locks

A binary lock has two states or values associated with each data item. These values are:

1. Locked – 1

2. Unlocked – 0
If a data item is locked, then it cannot be accessed by other transactions i.e., other transactions are forced to wait until the lock is

released by the previous transaction.

But, if a data item is in the unlocked state, then, it can be accessed by any transaction and on access the lock value is set to

locked state.

These locks are applied and removed using Lock () and Unlock () operation respectively.

In binary locks, at a particular point in time, only one transaction can hold a lock on the data item; no other transaction can access the same data concurrently. Hence, binary locks are very simple to apply but are not used practically.

Shared / Exclusive Locks

In shared locks, multiple users are allowed to access the same data item with a read lock which is shared by them. But, in case

when a transaction needs to write a data item, then an exclusive lock is applied on that data item. So here, we classify the

locks as:

 Shared Locks

 Exclusive Locks
Shared Locks

Shared locks are applied to a data item when the transaction requests a read operation on the data item. A shared lock

will allow multiple transactions to only read the data item concurrently.

As these locks are applied on read operation, they will not compromise on the consistency of the database.

Exclusive Locks

Exclusive locks on the other hand are applied on the transactions which request a write operation on the data item.

The transaction which is modifying the data item requests an exclusive lock on the data item and hence any other transaction

which needs access to the data item has to wait until the lock applied by the previous transaction has been released by it.

But when exclusive locks are applied there are situations when a transaction enters into a wait state indefinitely. Such a state

where a transaction cannot come out of the wait state is known as a deadlock.

Two Phase Locking

The Two Phase Locking Techniques guarantee Serializability in DBMS. A transaction is said to follow Two Phase Locking Protocol

if all locking operations in the transaction precede the first unlock operation.

In this, locks are applied in two phases:


 Growing Phase

 Shrinking Phase

Growing Phase

This phase is also known as the first phase or the expanding phase. It is in this phase that the transaction acquires all

the locks needed by it but it cannot release any locks here.

Shrinking Phase

This phase is also known as the second phase or the contracting phase. Here a transaction is not allowed to acquire any

new locks but it can release the existing locks it holds. The Two Phase Locking Protocol helps solve problems of lost update,

inconsistent analysis or dirty read too.


UNIT - V

Unit: 5 DISTRIBUTED DATABASE (10 Hours)

Parallel & Distributed databases: Introduction – Architecture for parallel databases – Parallel Query evaluation – Parallelizing individual operations – Parallel Query Optimization – Introduction to distributed Databases – Distributed DBMS architecture – Storing data in a distributed DBMS. Object Database Systems: Motivation Example – Structured data types – Operations on structured data types – Encapsulation & ADTs – Inheritance – Objects, OIDs and Reference Types – Database design for an ORDBMS – OODBMS – Comparing RDBMS, OODBMS and ORDBMS.
Unit – 5: DISTRIBUTED DATABASE

Parallel & Distributed databases:

Introduction of Parallel Databases:

Nowadays organizations need to handle huge amounts of data at high transfer rates. For such requirements, a client-server or centralized system is not efficient. The need to improve system efficiency brought the concept of the parallel database into the picture. A parallel database system seeks to improve the performance of the system through parallelization.

Need:

Multiple resources like CPUs and Disks are used in parallel. The operations are performed simultaneously, as opposed to serial

processing. A parallel server can allow access to a single database by users on multiple machines. It also performs many

parallelization operations like data loading, query processing, building indexes, and evaluating queries.

Advantages:

Here, we will discuss the advantages of parallel databases. Let’s have a look.

1. Performance Improvement – 

By connecting multiple resources like CPU and disks in parallel we can significantly increase the performance of the

system.

 
2. High availability – 

In a parallel database, nodes have little contact with each other, so the failure of one node does not cause the failure of the entire system. This amounts to significantly higher database availability.

3. Proper resource utilization – 

Due to parallel execution, the CPUs will never be idle, so resources are properly utilized.

4. Increase Reliability – 

When one site fails, execution can continue at another available site that holds a copy of the data, making the system more reliable.

Performance Measurement of Databases:

Here, we will emphasize the performance measurement factors Speedup and Scale-up. Let's understand them one by one with the help of examples.

Speedup –

The ability to execute the tasks in less time by increasing the number of resources is called Speedup.
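For example, if a query takes 100 seconds on one processor and 25 seconds when executed across four processors, the speedup is 100/25 = 4, the ideal (linear) case. The companion measure, Scale-up, is the ability to handle a proportionally larger task in the same time when resources are increased: ideally, doubling both the workload and the resources leaves the response time unchanged.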

Types of Parallel Database Architecture

A parallel DBMS is a DBMS that runs across multiple processors or CPUs and is mainly designed to execute query operations

in parallel, wherever possible. The parallel DBMS link a number of smaller machines to achieve the same throughput as

expected from a single large machine.


In Parallel Databases, mainly there are three architectural designs for parallel DBMS. They are as follows:

1. Shared Memory Architecture

2. Shared Disk Architecture

3. Shared Nothing Architecture

Let’s discuss them one by one:

1. Shared Memory Architecture - In shared memory architecture, multiple CPUs are attached to an interconnection network and share a single global main memory and common disk arrays. Note that in this architecture, a single copy of a multi-threaded operating system and a multi-threaded DBMS can support these multiple CPUs. Shared memory is a tightly coupled architecture in which multiple CPUs share their memory; it is also known as Symmetric Multiprocessing (SMP). Systems of this kind range from personal workstations that support a few microprocessors in parallel up to large RISC-based machines.


Shared Memory Architecture

Advantages :

1. It has high-speed data access for a limited number of processors.

2. The communication is efficient.

Disadvantages :

1. It cannot scale beyond 80 or 100 CPUs in parallel.

2. The bus or the interconnection network becomes a bottleneck as the number of CPUs grows large.

2. Shared Disk Architectures :

In Shared Disk Architecture, various CPUs are attached to an interconnection network. In this, each CPU has its own memory
and all of them have access to the same disk. Also, note that here the memory is not shared among CPUs therefore each node

has its own copy of the operating system and DBMS. Shared disk architecture is a loosely coupled architecture optimized for

applications that are inherently centralized. They are also known as clusters.

Shared Disk Architecture

Advantages :

1. The interconnection network is no longer a bottleneck, since each CPU has its own memory.
2. Load-balancing is easier in shared disk architecture.

3. There is better fault tolerance.

Disadvantages :

1. If the number of CPUs increases, the problems of interference and memory contention also increase.

2. There also exists a scalability problem.

3. Shared Nothing Architecture:

Shared Nothing Architecture is multiple processor architecture in which each processor has its own memory and disk storage.

In this, multiple CPUs are attached to an interconnection network through a node. Also, note that no two CPUs can access the

same disk area. In this architecture, no sharing of memory or disk resources is done. It is also known as  Massively parallel

processing (MPP).
Shared Nothing Architecture

Advantages :

1. It has better scalability as no sharing of resources is done

2. Multiple CPUs can be added

Disadvantages:

1. The cost of communication is higher, as it involves sending data and software interaction at both ends.

2. The cost of non-local disk access is higher than in shared disk architectures.

Note that this technology is typically used for very large databases, on the order of 10^12 bytes (terabytes), or for systems that process thousands of transactions per second.

Parallel Query evaluation: Parallelism in a query allows us to execute multiple queries in parallel by decomposing them into parts that work in parallel. This can be achieved with a shared-nothing architecture. Parallelism also speeds up query execution, as more resources such as processors and disks are provided. We can achieve parallelism in a query by the following methods:

1. I/O parallelism

2. Intra-query parallelism

3. Inter-query parallelism

4. Intra-operation parallelism

5. Inter-operation parallelism

1. I/O parallelism:

It is a form of parallelism in which the relations are partitioned across multiple disks, with the aim of reducing the retrieval time of relations from disk. The input data is partitioned, each partition is processed in parallel, and the results are merged after all the partitioned data has been processed. It is also known as data partitioning.

Hash partitioning has the advantage that it provides an even distribution of data across the disks, and it is best suited for point queries based on the partitioning attribute. Note that partitioning is useful for sequential scans of an entire table placed on 'n' disks: the time taken to scan the relation is approximately 1/n of the time required to scan it on a single-disk system. We have four types of partitioning in I/O parallelism (a DDL sketch follows the list):

 Hash partitioning – 

As we already know, a hash function is a fast mathematical function. Each row of the original relation is hashed on the partitioning attributes. For example, assume the data is to be partitioned across 4 disks disk1, disk2, disk3, and disk4; if the hash function returns 3, then the row is placed on disk3.

 Range partitioning – 

In range partitioning, contiguous attribute-value ranges are assigned to each disk. For example, with 3 disks numbered 0, 1, and 2, we may assign tuples with a value less than 5 to disk0, values between 5 and 40 to disk1, and values greater than 40 to disk2. Its advantage is that tuples whose attribute values fall within a certain range are placed on the same disk.
 Round-robin partitioning – 

In round-robin partitioning, tuples are distributed in rotation regardless of their values: the ith tuple is sent to disk number (i % n), so the disks take turns receiving new rows of data. This technique ensures an even distribution of tuples across disks and is ideally suited to applications that read the entire relation sequentially for each query.

 Schema partitioning – 

In schema partitioning, different tables within a database are placed on different disks.
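Several SQL systems expose hash and range partitioning directly in the table definition. A sketch in MySQL-style syntax (the tables and columns are hypothetical):

-- Hash partitioning: rows are spread across 4 partitions by a hash of order_id
CREATE TABLE orders (
    order_id  INT,
    amount    INT
)
PARTITION BY HASH(order_id) PARTITIONS 4;

-- Range partitioning: each partition holds a contiguous range of salary values
CREATE TABLE salaries (
    emp_id  INT,
    salary  INT
)
PARTITION BY RANGE (salary) (
    PARTITION p0 VALUES LESS THAN (50000),
    PARTITION p1 VALUES LESS THAN (100000),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);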

2. Intra-query parallelism : 

 Intra-query parallelism refers to the execution of a single query as parallel processes on different CPUs, using a shared-nothing parallel architecture. This uses two types of approaches:

 First approach – 

In this approach, each CPU can execute the duplicate task against some data portion.

 Second approach –

In this approach, the task can be divided into different sectors with each CPU executing a distinct subtask.

3. Inter-query parallelism :

In inter-query parallelism, each CPU executes multiple transactions; this is called parallel transaction processing. A DBMS uses transaction dispatching to carry out inter-query parallelism. We can also use other methods, such as efficient lock management. Otherwise, each query runs sequentially, which slows down long-running queries; the DBMS must then track the locks held by different transactions running on different processes. Inter-query parallelism on shared-disk architecture performs best when the transactions executing in parallel do not access the same data. It is the easiest form of parallelism in DBMS, and it increases transaction throughput.

4. Intra-operation parallelism :

Intra-operation parallelism is a sort of parallelism in which we parallelize the execution of each individual operation of a task

like sorting, joins, projections, and so on. The level of parallelism is very high in intra-operation parallelism. This type of

parallelism is natural in database systems. Let’s take an SQL query example:  


SELECT * FROM Vehicles ORDER BY Model_Number; 

In the above query, the relational operation is sorting and since a relation can have a large number of records in it, the

operation can be performed on different subsets of the relation in multiple processors, which reduces the time required to sort

the data.

5. Inter-operation parallelism :

When different operations in a query expression are executed in parallel, then it is called inter-operation parallelism. They are

of two types –

 Pipelined parallelism –

In pipelined parallelism, the output rows of one operation are consumed by a second operation even before the first operation has produced its entire output. The two operations can run simultaneously on different CPUs, so that one operation consumes tuples in parallel as the other produces them. Pipelined parallelism is useful with a small number of CPUs and avoids writing intermediate results to disk.

 Independent parallelism –

In independent parallelism, the operations in a query expression that do not depend on each other can be executed in parallel. This parallelism is very useful when only a lower degree of parallelism is available.
Parallelizing individual operations

Bulk Loading and Scanning

The two simple operations: scanning a relation and loading a relation. Pages can be read in parallel while scanning a relation,

and the retrieved tuples can then be merged, if the relation is partitioned across several disks. More generally, the idea also

applies when retrieving all tuples that meet a selection condition. If hashing or range partitioning is used, selection queries can be

answered by going to just those processors that contain relevant tuples.

Sorting

A simple idea is to let each CPU sort the part of the relation that is on its local disk and to then merge these sorted sets of tuples.

The degree of parallelism is likely to be limited by the merging phase.

A better idea is to redistribute all tuples in the relation using range partitioning. For example, if we want to sort a collection of

employee tuples by salary, salary values range from 10 to 210, and we have 20 processors, we could send all tuples with salary

values in the range 10 to 20 to the processor, all in the range 21 to 30 to the second processor, and so on. (Prior to the

redistribution, while tuples are distributed across the processors, we cannot assume that they are distributed according to salary

ranges.)

Each processor then sorts the tuples assigned to it, using some sequential sorting algorithm. For example, a processor can
collect tuples until its memory is full, then sort these tuples and write out a run, until all incoming tuples have been written to

such sorted runs on the local disk. These runs can then be merged to create the sorted version of the set of tuples assigned to this

processor. The entire sorted relation can be retrieved by visiting the processors in an order corresponding to the ranges assigned to

them and simply scanning the tuples.

The basic challenge in parallel sorting is to do the range partitioning so that each processor receives roughly the same number

of tuples; otherwise, a processor that receives a disproportionately large number of tuples to sort becomes a bottleneck and limits

the scalability of the parallel sort. One good approach to range partitioning is to obtain a sample of the entire relation by taking

samples at each processor that initially contains part of the relation. The (relatively small) sample is sorted and used to identify

ranges with equal numbers of tuples. This set of range values, called a splitting vector, is then distributed to all processors and

used to range partition the entire relation.

A particularly important application of parallel sorting is sorting the data entries in tree-structured indexes. Sorting data entries

can significantly speed up the process of bulk-loading an index.

Joins

Suppose that we want to join two relations, say, A and B,on the age attribute. We assume that they are initially distributed across

several disks in some way that is not useful for the join operation, that is, the initial partitioning is not based on the join attribute.

The basic idea for joining A and B in parallel is to decompose the join into a collection of k smaller joins. We can decompose the

join by partitioning both A and B into a collection of k logical buckets or partitions. By using the same partitioning function for
both A and B, we ensure that the union of the k smaller joins computes the join of A and B; this idea is similar to intuition behind

the partitioning phase of a sequential hash join. Because A and B are initially distributed across several processors, the

partitioning step can itself be done in parallel at these processors. At each processor, all local tuples are retrieved and hashed into

one of k partitions, with the same hash function used at all sites, of course.

Alternatively, we can partition A and B by dividing the range of the join attribute age into k disjoint subranges and placing A and

B tuples into partitions according to the subrange to which their age values belong. For example, suppose that we have 10

processors, the join attribute is age, with values from 0 to 100. Assuming uniform distribution, A and B tuples with 0 ≤ age < 10 go to processor 1, tuples with 10 ≤ age < 20 go to processor 2, and so on. This approach is likely to be more susceptible than hash partitioning to

data skew (i.e., the number of tuples to be joined can vary widely across partitions), unless the subranges are carefully

determined.

If range partitioning is used, the algorithm outlined above leads to a parallel version of a sort-merge join, with the advantage that

the output is available in sorted order. If hash partitioning is used, we obtain a parallel version of a hash join.

Parallel query optimization

What is parallel query optimization?

Parallel query optimization is the process of analyzing a query and choosing the best combination of parallel and serial access

methods to yield the fastest response time for the query. Parallel query optimization is an extension of the serial optimization
strategies discussed in earlier chapters. In addition to the costing performed for serial query optimization, parallel optimization

analyzes the cost of parallel access methods for each combination of join orders, join types, and indexes. The optimizer can choose

any combination of serial and parallel access methods to create the fastest query plan.

When is optimization performed?

The optimizer considers parallel query plans only when Adaptive Server and the current session are properly configured for

parallelism, as described in “Controlling the degree of parallelism”.

If both the Adaptive Server and the current session are configured for parallel queries, then all queries within the session are

eligible for parallel query optimization. Individual queries can also request parallel or serial optimization explicitly by using the

optimizer hint parallel N (for parallel execution) or parallel 1 (for serial execution).
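For example, in Adaptive Server the hint is written after the table name in the FROM clause; the table name and degree below are illustrative only.

-- Request a parallel scan with a maximum degree of 4
-- (effective only if the server and session allow parallelism):
SELECT COUNT(*) FROM orders (parallel 4)

-- Force serial access to the same table:
SELECT COUNT(*) FROM orders (parallel 1)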

If the Adaptive Server or the current session is not configured for parallel queries, or if a given query uses optimizer hints to

enforce serial execution, then the optimizer considers serial access methods; the parallel access methods described in this chapter

are not considered.

Adaptive Server does not execute parallel queries against system tables.
Introduction to Distributed Databases: Types

Overview

A distributed database is a database that is not limited to one computer system. It is like a database that consists of two or more

files located in different computers or sites either on the same network or on an entirely different network. Instead of storing all of

the data in one database, data is divided and stored at different locations or sites which do not share any physical component.

Need of Distributed Database

Let's start with the databases and their types,

 A database is a structured collection of information. The data can be easily accessed, managed, modified, updated,

controlled, and organized in a database.

 Databases can be broadly classified into two types, namely Distributed and Centralized databases. The question here is:

why do we even need a distributed database? Let's assume for a moment that we have only centralized databases.

o All the data would be inserted into one single database, making it so large that querying even a single

record would take a lot of time.

o Once a fault occurs, we would no longer be able to serve user requests, as we have only one database.

o Scaling would not be possible even if we wanted it, and availability would also be lower, which in turn affects throughput.
Distributed databases resolve various issues, such as availability, fault tolerance, throughput, latency, scalability, and many

other problems that can arise from using a single machine and a single database. That's why we need distributed

databases. Let's discuss them in detail.

Distributed Databases

 A distributed database is a database that is not limited to one computer system. It is like a database that consists of two or

more files located in different computers or sites either on the same network or on an entirely different network.

 These sites do not share any physical component. Distributed databases are needed when particular data in the database

needs to be accessed by various users globally. The data needs to be handled in such a way that, to a user, it always looks like one

single database.

 By contrast, a Centralized database consists of a single database file located at one site using a single network.

 [Figure omitted: a reference diagram showing multiple sites, each with its own database, linked by a network.]
 Though there are many distributed databases to choose from, some examples include Apache

Ignite, Apache Cassandra, Apache HBase, Amazon SimpleDB, Clusterpoint, and FoundationDB.

Features of Distributed Databases

In general, distributed databases include the following features:

1. Location independence: Data is independently stored at multiple sites and managed by independent Distributed database

management systems (DDBMS).

2. Network linking: All distributed databases in a collection are linked by a network and communicate with each other.
3. Distributed query processing: Distributed query processing is the procedure of answering queries (which means mainly

read operations on large data sets) in a distributed environment.

o Query processing involves the transformation of a high-level query (e.g., formulated in SQL) into a query execution

plan (consisting of lower-level query operators in some variation of relational algebra) as well as the execution of this

plan.

4. Hardware independence: The different sites where data is stored are hardware-independent. The sites share no physical

components, which is often accomplished through virtualization.

5. Distributed transaction management: A distributed database ensures consistency through commit

protocols, distributed recovery methods, and distributed concurrency control techniques, even in the presence of transaction failures.

Distributed Database Architecture

Types of Distributed Database

There are two types of distributed databases:

 Homogenous distributed database.

 Heterogeneous distributed database.
Homogenous Distributed Database

 A Homogenous distributed database is a network of identical databases stored on multiple sites. All databases store data

identically; the operating system, DDBMS, and the data structures used are all the same at all sites, making them easy to

manage.

Heterogeneous Distributed Database

 It is the opposite of a Homogenous distributed database. It uses different schemas, operating systems, DDBMSs, and different

data models, making it difficult to manage.

 In the case of a Heterogeneous distributed database, a particular site can be completely unaware of other sites. This

limits cooperation in processing user requests, which is why translations are required to establish communication

between sites.

Client/Server Database Architecture

A database server is the Oracle software managing a database, and a client is an application that requests information from a

server. Each computer in a network is a node that can host one or more databases. Each node in a distributed database system

can act as a client, a server, or both, depending on the situation.

In Figure 31-2, the host for the hq database is acting as a database server when a statement is issued against its local data (for

example, the second statement in each transaction issues a statement against the local dept table), but is acting as a client when

it issues a statement against remote data (for example, the first statement in each transaction is issued against the remote

table emp in the sales database).
A client can connect directly or indirectly to a database server. A direct connection occurs when a client connects to a server and

accesses information from a database contained on that server. For example, if you connect to the hq database and access

the dept table on this database as in Figure 31-2, you can issue the following:

SELECT * FROM dept;

This query is direct because you are not accessing an object on a remote database.
In contrast, an indirect connection occurs when a client connects to a server and then accesses information contained in a

database on a different server. For example, if you connect to the hq database but access the emp table on the

remote sales database as in Figure 31-2, you can issue the following:

SELECT * FROM emp@sales;

This query is indirect because the object you are accessing is not on the database to which you are directly connected.
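In Oracle, the emp@sales notation relies on a database link that the local database uses to reach the remote sales database. A minimal sketch of creating such a link follows; the credentials and connect string are placeholders, not values from this document.

-- Create a link named sales to the remote database; thereafter,
-- emp@sales resolves to the emp table on that remote server.
CREATE DATABASE LINK sales
  CONNECT TO scott IDENTIFIED BY tiger
  USING 'sales_db';

SELECT * FROM emp@sales;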

Structured data types

Structured data is data which conforms to a data model, has a well-defined structure, follows a consistent order, and can be

easily accessed and used by a person or a computer program.

Structured data is usually stored in well-defined schemas such as databases. It is generally tabular, with columns and rows that

clearly define its attributes.

SQL (Structured Query language) is often used to manage structured data stored in databases.
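For instance, a relational table is the canonical form of structured data: the schema fixes the fields and their types in advance. The table and column names below are illustrative.

-- Structured data: every record has the same fixed, typed fields.
CREATE TABLE Student (
  roll_no INTEGER PRIMARY KEY,
  name    VARCHAR(40),
  dept    VARCHAR(20)
);

INSERT INTO Student VALUES (1, 'Radha', 'CSE');
SELECT name FROM Student WHERE dept = 'CSE';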

Characteristics of Structured Data:

 Data conforms to a data model and has easily identifiable structure

 Data is stored in the form of rows and columns

Example: Database

 Data is well organised, so the definition, format, and meaning of the data are explicitly known

 Data resides in fixed fields within a record or file


 Similar entities are grouped together to form relations or classes

 Entities in the same group have the same attributes

 Easy to access and query, so data can easily be used by other programs

 Data elements are addressable, so they are efficient to analyse and process

Sources of Structured Data:

 SQL Databases

 Spreadsheets such as Excel

 OLTP Systems

 Online forms

 Sensors such as GPS or RFID tags

 Network and Web server logs

 Medical devices

Advantages of Structured Data:

 Structured data has a well-defined structure that helps in easy storage and access of data

 Data can be indexed based on text strings as well as attributes, which makes search operations hassle-free

 Data mining is easy, i.e., knowledge can be easily extracted from the data

 Operations such as updating and deleting are easy due to the well-structured form of the data

 Business Intelligence operations such as data warehousing can be easily undertaken

 Easily scalable when the volume of data increases

 Ensuring data security is easy

Operations on Structured Data:

Structured data can be manipulated using built-in methods for types defined using type constructors. These methods are similar to

the operations on the atomic data types of a traditional RDBMS.

1. Operations on Arrays

Arrays are used in much the same manner as in a traditional RDBMS. For example, the CARDINALITY method returns the number of

elements in an array. Suppose we want to find those projects whose clips contain more than 10 items or objects; then the following query

can be used:

SELECT P.Pname, P.Projno
FROM Project P, Clip C
WHERE CARDINALITY(C.Objects) > 10 AND C.Projno = P.Projno

The above query selects the project name and project number from Project for those projects whose clips contain more than 10 items, as computed

by the CARDINALITY operation.
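For the query above to make sense, Clip.Objects must be an array-valued column. A minimal SQL:1999-style declaration might look as follows; the element type and maximum size are assumptions.

-- Hypothetical DDL for the Clip table with an array-valued column;
-- CARDINALITY(Objects) counts the elements currently in the array.
CREATE TABLE Clip (
  Clipno  INTEGER,
  Projno  INTEGER,
  Objects VARCHAR(30) ARRAY[100]
);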

2. Operations on Rows

A row type is a collection of field values, where each field can be accessed using the traditional dot notation; for example, address_t.city

specifies the attribute 'city' of the type address_t. When an operation is applied to a collection of rows, the result obtained is also a

collection of values.

If c is a column or field whose type is ROW(f1 t1, f2 t2, ..., fn tn), then c.fk gives us a value of type tk. If c is a set of rows
or a bag of rows, then c.fk gives us a set or bag of values of type tk.

Consider the 'Emp-Dept' schema, in which we have to find the names of those employees who reside in 'Malviya Nagar', 'New Delhi'.

SELECT E.Empno, E.Name
FROM Emp E
WHERE E.Address.area = 'Malviya Nagar' AND E.Address.city = 'New Delhi'
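The dot notation E.Address.area in this query assumes that Address is declared as a row-typed column, for example as in the following SQL:1999-style sketch (field names assumed):

-- Address is a ROW-typed column; its fields are reached with dot
-- notation, as in E.Address.city above.
CREATE TABLE Emp (
  Empno   INTEGER,
  Name    VARCHAR(40),
  Address ROW(area VARCHAR(30), city VARCHAR(30))
);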

Encapsulation & ADTs

Encapsulation

Encapsulation gathers the data and methods of an object and puts them into a package, creating a well-defined boundary around

the object. Encapsulation is often referred to as information hiding; it can be used to restrict which users can access the data

inside the object and which operations they can perform on it.

Classes provide encapsulation, or information hiding, through access control. A class will grant or deny access to its objects using

the public and private access specifiers. Public members define an interface between a class and the users of that class. Public

members can be accessed by any function in a program. Objects can contain both public and private variables;

the public variables are used with the object's methods or interfaces.


Private variables are known only to the object and cannot be accessed through an interface. For example, a private method might be

used to compute an internal value.

Abstract Data Type (ADT):

Data types such as int, float, double, long, etc. are considered to be in-built data types and we can perform basic operations

with them such as addition, subtraction, division, multiplication, etc. Now there might be a situation when we need operations

for our user-defined data type which have to be defined. These operations can be defined only as and when we require them.

So, in order to simplify the process of solving problems, we can create data structures along with their operations, and such

data structures that are not in-built are known as Abstract Data Type (ADT).

Abstract Data type (ADT) is a type (or class) for objects whose behavior is defined by a set of values and a set of operations. The

definition of ADT only mentions what operations are to be performed but not how these operations will be implemented. It does

not specify how data will be organized in memory and what algorithms will be used for implementing the operations. It is called

“abstract” because it gives an implementation-independent view. 

The process of providing only the essentials and hiding the details is known as abstraction.  
The user of a data type does not need to know how that data type is implemented; for example, we have been using primitive

types like int, float, and char with knowledge only of the operations they support, without

any idea of how they are implemented.
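In an object-relational database, an ADT can be realized as a user-defined type whose operations are declared separately from their implementation. Below is a minimal sketch in Oracle-style SQL; the type name Point_t and its method are chosen for illustration only.

-- Interface: declares the values and operations of the ADT.
CREATE TYPE Point_t AS OBJECT (
  x NUMBER,
  y NUMBER,
  MEMBER FUNCTION distance_from(p Point_t) RETURN NUMBER
);
/
-- Implementation: hidden from users of the type.
CREATE TYPE BODY Point_t AS
  MEMBER FUNCTION distance_from(p Point_t) RETURN NUMBER IS
  BEGIN
    RETURN SQRT(POWER(x - p.x, 2) + POWER(y - p.y, 2));
  END distance_from;
END;
/

The separation between CREATE TYPE (which operations exist) and CREATE TYPE BODY (how they work) mirrors the implementation-independent view that defines an ADT.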

Inheritance

OOP is a paradigm that helps programmers structure their code so that it is easy to maintain, understand, and extend. The

four pillars of the OOP paradigm are Abstraction, Encapsulation, Inheritance and Polymorphism.


What is Inheritance?

The term 'Inheritance' is derived from the word 'inherit', which means 'to derive a quality, feature, or characteristic from

one's family'. Inheritance is therefore defined as a mechanism whereby a sub class (child class) inherits the properties and

characteristics of the super class or of other derived classes. A child class can, in turn, pass its properties on to classes

derived from it.

In computer programming, Inheritance is one of the most important concepts, along with Abstraction, Encapsulation and

Polymorphism, in the Object Oriented Programming (OOP) paradigm. This mechanism allows new objects to take on the

properties of existing objects. Inheritance builds relationships within the classes, relating two or more classes to each

other and establishing class hierarchies. Rather than writing the same code again and again, programmers can reuse code

that has already been written by inheriting from its class.

Understanding Inheritance

To dig deeper into inheritance, we first need to understand the two types of programming classes:

1. Sub Class

2. Super Class

An inherited class is known as Sub Class or Derived Class. Sub Class inherits the properties or characteristics from Base Class,

also popularly known as Super class.


In the programming world, Sub Class is also referred to as a child class, whereas the superclass is referred to as parent class.
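Inheritance also appears in object-relational databases. A minimal sketch using PostgreSQL's table inheritance (table and column names assumed):

-- Super class (parent table):
CREATE TABLE person (
  name TEXT,
  dob  DATE
);

-- Sub class (child table): inherits name and dob, and adds salary.
CREATE TABLE employee (
  salary NUMERIC
) INHERITS (person);

-- A query on the parent also returns rows of the child:
SELECT name FROM person;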

Objects, OIDS and Reference Types

What is Object?

An object consists of an entity and attributes that describe the state of a real-world object, along with the actions associated with that object.

Characteristics of Object

Some important characteristics of an object are:

1. Object name

 The name is used to refer to different objects in the program.

2. Object identifier

 This is a system-generated identifier which is assigned when a new object is created.

3. Structure of object

 Structure defines how the object is constructed using a constructor.

 In an object-oriented database, the state of a complex object can be constructed from other objects by using a type constructor.

 Formally, an object is represented as (i, c, v), where 'i' is the object identifier, 'c' is the type constructor, and 'v' is the current value of

the object.

4. Transient object
 In an OOPL, objects which exist only at the time of execution are called transient objects.

For example: Variables in OOPL

5. Persistent objects

 An object which exists even after the program has completely executed (or terminated) is called a persistent object. Object-

oriented databases can store such objects in secondary memory.

Object identity

 Every object has a unique identity. In an object-oriented system, when an object is created, an OID is assigned to it.

 In an RDBMS, identity is value based: a primary key is used to provide uniqueness of each tuple in a relation. A primary key is unique only

for that relation and not for the entire system. The primary key is chosen from the attributes of the relation, which makes object

identity dependent on the object state.

 In an OODBMS, OIDs are implemented as variable names or pointers.

Properties of OID

1. Uniqueness: No two objects in the system can have the same OID; it is generated automatically by the system.

2. Invariant: An OID cannot be changed throughout the object's entire lifetime.

3. Invisible: The OID is not visible to the user.


Attributes

Attributes are nothing but the properties of objects in the system.

Example: An Employee can have the attributes 'name' and 'address' with assigned values as:

Attribute   Value
---------   -----
Name        Radha
Address     Pune
ID          07

Types of Attributes

The three types of attributes are as follows:

1. Simple attributes

Attributes of a primitive data type, such as integer, string, or real, which take literal values.

Example: 'ID' is a simple attribute whose value is 07.

2. Complex attributes

Attributes which consist of collections of, or references to, multiple other objects are called complex attributes.

Example: Collection of Employees consists of many employee names.


3. Reference attributes

Attributes that represent a relationship between objects and consist of a value or a collection of values are called reference

attributes.

Example: Manager is a reference to a staff object.
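The manager example can be expressed with reference types in Oracle-style object-relational SQL; the type and table names below are assumptions for illustration.

-- An object type and an object table holding staff objects:
CREATE TYPE Staff_t AS OBJECT (
  empno NUMBER,
  ename VARCHAR2(30)
);
/
CREATE TABLE Staff OF Staff_t;

-- manager is a reference attribute: it stores a reference (backed by
-- an OID) to a staff object rather than a copy of its value.
CREATE TABLE Dept (
  dname   VARCHAR2(20),
  manager REF Staff_t SCOPE IS Staff
);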

OODBMS:

 Object-oriented database systems are an alternative to relational database systems and other database systems.

 In an object-oriented database, information is represented in the form of objects.

 Object-oriented databases represent data in the same way as object-oriented programming languages. If we combine the features of

relational database systems (transactions, concurrency, recovery) with object-oriented databases, the resultant model is called the

object-oriented database model.
Features of OODBMS

In an OODBMS, every entity is considered an object. Similar objects are grouped into classes and

subclasses, and the relationship between two objects is maintained using the concept of inverse references.

Some of the features of OODBMS are as follows:

1. Complexity

OODBMS has the ability to represent the complex internal structure of an object, with multiple levels of complexity.
2. Inheritance

Creating a new object from an existing object in such a way that the new object inherits all the characteristics of the existing object.

3. Encapsulation

It is a data-hiding concept from OOPLs which binds together the data and the functions that manipulate it, making the internals

invisible to the outside world.

4. Persistency

OODBMS allows the creation of persistent objects (objects that remain stored even after execution ends). This feature helps address

the problems of recovery and concurrency.

 1. RDBMS:

RDBMS stands for Relational Database Management System.

In this kind of database management, the data is organized into related tables. To access the database it uses Structured Query

Language (SQL). The model is based on the mathematical theory of relational algebra and calculus. The original concept for the

model was proposed by Dr. E.F. Codd in 1970. Later, the model was characterized by twelve rules, which are

known as Codd's rules.

2. ORDBMS:

ORDBMS stands for Object-Relational Database Management System.

It provides all the facilities of an RDBMS with additional support for object-oriented concepts: classes, objects,

and inheritance are supported in this database. It occupies the middle ground between the RDBMS and the OODBMS. In it,

data can be manipulated using a query language. It is complex because it has to take care of both relational database

concepts as well as object-oriented concepts.

Differences between RDBMS and ORDBMS:

1. RDBMS is a Relational Database Management System based on the relational model of data, whereas ORDBMS is an
Object-Relational Database Management System based on the relational as well as the object-oriented database model.

2. RDBMS follows a table structure; it is simple to use and easy to understand. ORDBMS is the same as RDBMS but has some
extra, potentially confusing extensions because of the object-oriented concepts.

3. RDBMS has no extensibility, whereas the extensibility of ORDBMS is limited to new data types.

4. Since RDBMS is old, it is very mature; ORDBMS is still developing, so it is immature in nature.

5. For RDBMS there is an extensive supply of tools and trained developers; ORDBMS can take advantage of RDBMS tools
and developers.

6. RDBMS has poor support for object-oriented programming, whereas ORDBMS supports the features of object-oriented
programming.

7. RDBMS supports Structured Query Language (SQL), whereas ORDBMS supports Object Query Language (OQL).

8. RDBMS is used for traditional application tasks such as data administration and data processing, whereas ORDBMS is
used for applications with complex objects.

9. RDBMS is capable of handling only simple data; ORDBMS is also capable of handling complex data.

10. MS SQL Server, MySQL, SQLite, and MariaDB are examples of RDBMS; PostgreSQL is an example of ORDBMS.

Difference between RDBMS and OODBMS


RDBMS and OODBMS are both database management systems. RDBMS uses tables to represent data and their relationships, whereas

OODBMS represents data in the form of objects, as in object-oriented programming.

Following are the important differences between RDBMS and OODBMS.

1. Definition: RDBMS stands for Relational DataBase Management System, whereas OODBMS stands for Object Oriented
DataBase Management System.

2. Data Management: In an RDBMS, data is stored as entities defined in tabular format; in an OODBMS, data is stored
as objects.

3. Data Complexity: RDBMS handles simple data, whereas OODBMS handles large and complex data.

4. Term: An entity refers to a collection of similar items having the same definition; a class refers to a group of objects
having common relationships, behaviors, and properties.

5. Data Handling: RDBMS handles only data, whereas OODBMS handles both data and the functions operating on that data.

6. Objective: The objective of an RDBMS is to keep data independent from the application program; the objective of an
OODBMS is to implement data encapsulation.

7. Key: In an RDBMS, a primary key identifies an object in a table uniquely; in an OODBMS, an Object Id (OID) represents
an object uniquely in a group of objects.
