
DBMS Notes

Uploaded by

yashlanjewar370
Copyright
© All Rights Reserved

UNIT ONE

Introduction to DBMS:
A Database Management System (DBMS) is a software system that is designed to manage and
organize data in a structured manner. It allows users to create, modify, and query a database, as
well as manage the security and access controls for that database.
A DBMS provides an environment to store and retrieve data in a convenient and efficient
manner.
Key Features of DBMS:
● Data modeling: A DBMS provides tools for creating and modifying data models, which
define the structure and relationships of the data in a database.
● Data storage and retrieval: A DBMS is responsible for storing and retrieving data from
the database, and can provide various methods for searching and querying the data.
● Concurrency control: A DBMS provides mechanisms for controlling concurrent access to
the database, to ensure that multiple users can access the data without conflicting with each
other.
● Data integrity and security: A DBMS provides tools for enforcing data integrity and
security constraints, such as constraints on the values of data and access controls that
restrict who can access the data.
● Backup and recovery: A DBMS provides mechanisms for backing up and recovering the
data in the event of a system failure.
DBMS can be classified into two types: Relational Database Management System
(RDBMS) and Non-Relational Database Management System (NoSQL or Non-SQL).
● RDBMS: Data is organized in the form of tables, and each table has a set of rows and
columns. The tables are related to each other through primary and foreign keys.
● NoSQL: Data is organized as key-value pairs, documents, graphs, or column
families. These systems are designed to handle large-scale, high-performance scenarios.
A database is a collection of interrelated data organized in the form of tables, views, schemas,
reports, etc., which helps in the efficient retrieval, insertion, and deletion of data. For example,
a university database organizes the data about students, faculty, admin staff, etc.
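The idea of tables related through primary and foreign keys can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the table names, column names, and sample rows are assumptions made for the example and do not come from these notes.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Hypothetical university schema: each student row points at a department row.
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE student (
    roll_no INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
""")

conn.execute("INSERT INTO department VALUES (1, 'Computer Science')")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Asha", 1), (102, "Ravi", 1)],
)

# The foreign key lets the DBMS combine the two tables in a single query.
rows = conn.execute("""
    SELECT s.name, d.name
    FROM student AS s
    JOIN department AS d ON s.dept_id = d.dept_id
    ORDER BY s.roll_no
""").fetchall()
print(rows)  # [('Asha', 'Computer Science'), ('Ravi', 'Computer Science')]
```

The application only states which data it wants; how the rows are stored and located is left to the DBMS.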

Definition of DBMS:

i. Database Management Systems (DBMS) are software systems used to store, retrieve, and run
queries on data. A DBMS serves as an interface between an end user and a database,
allowing users to create, read, update, and delete data in the database.
ii. A DBMS is a computerized solution that helps store information in a manner that is easy to
read, edit, delete, and scale, with the primary objective of drawing correlations, powering
analysis, and supporting data-driven workflows. It has a back-end storage system as well as a
front-end user interface.

Traditional File System



A file system is a collection of data stored in files. In this system, the user has to write the
procedures for managing the data, and the details of data representation and storage are
exposed to the user. In this approach:
● Data is stored in files.
● Each file has a specific format.
● Programs that use these files depend on knowledge about that format.
● In earlier days, database applications were built on top of file systems.
This approach is mostly obsolete, but:
● Understanding the problems inherent in file-based systems may prevent us from repeating
them in our database system.
● Understanding how a file system works is extremely useful when converting a file-based
system to a database system.
Basically, a file-based system is a collection of application programs that perform services for
end users, such as the production of reports. Each program defines and manages its own data.
It doesn't have a crash recovery mechanism, i.e., if the system crashes while some data is being
entered, the content of the file can be lost. This is a disadvantage of the traditional file-based
system. Also, it is very difficult to protect a file under the file system, and the system can't
store and retrieve data efficiently.
Advantages of Traditional File System :
● File processing costs less and can be faster than a database.
● The file processing design approach was well suited to mainframe hardware and batch input.
● Companies mainly use file processing to handle large volumes of structured data on a
regular basis.
● It can be more efficient and cost less than a DBMS in certain situations.
● Design is simple.
● Customization is easy and efficient.
Disadvantages of Traditional File System :
● Data redundancy and inconsistency.
● Difficulty in accessing data.
● Data isolation – multiple files and formats.
● Integrity problems
● Unauthorized access is not restricted.
● It co-ordinates only physical access.
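The redundancy, inconsistency, and data isolation problems listed above can be illustrated with a small sketch. It assumes two hypothetical application programs (a library program and a hostel program) that each keep their own file, in their own format, about the same student; the file layouts and names are invented for the example.

```python
import csv
import io

# Two programs, two files, two formats (comma-separated vs. semicolon-separated).
library_file = "roll_no,name,address\n101,Asha,Pune\n"
hostel_file = "name;roll_no;address\nAsha;101;Pune\n"

def read_library(text):
    # The library program must know its file's exact layout.
    return {int(r["roll_no"]): r for r in csv.DictReader(io.StringIO(text))}

def read_hostel(text):
    # The hostel program needs different parsing code for the same facts.
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return {int(r["roll_no"]): r for r in reader}

library = read_library(library_file)
hostel = read_hostel(hostel_file)

# The student moves, but only the library program is told about it.
library[101]["address"] = "Nagpur"

# The same fact now has two different values: redundancy led to inconsistency.
inconsistent = library[101]["address"] != hostel[101]["address"]
print(inconsistent)  # True
```

In a DBMS the address would be stored once, so a single update would be seen by every application.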

File System V/S DBMS


Basis | DBMS Approach | File System Approach
Meaning | A DBMS is software that manages a collection of data. In a DBMS, the user is not required to write the procedures. | The file system is a collection of data. In this system, the user has to write the procedures for managing the database.
Sharing of data | Due to the centralized approach, data sharing is easy. | Data is distributed in many files, and it may be of different formats, so it isn't easy to share data.
Data Abstraction | DBMS gives an abstract view of data that hides the details. | The file system exposes the details of the data representation and storage of data.
Security and Protection | DBMS provides a good protection mechanism. | It isn't easy to protect a file under the file system.
Recovery Mechanism | DBMS provides a crash recovery mechanism, i.e., DBMS protects the user from system failure. | The file system doesn't have a crash recovery mechanism, i.e., if the system crashes while some data is being entered, the content of the file can be lost.
Manipulation Techniques | DBMS contains a wide variety of sophisticated techniques to store and retrieve the data. | The file system can't efficiently store and retrieve the data.
Concurrency Problems | DBMS takes care of concurrent access to data using some form of locking. | In the file system, concurrent access has many problems, such as one user reading a file while another is deleting or updating some information in it.
Where to use | The database approach is used in large systems which interrelate many files. | The file system approach is used in small systems with few, largely independent files.
Cost | The database system is expensive to design. | The file system approach is cheaper to design.
Data Redundancy and Inconsistency | Due to the centralization of the database, the problems of data redundancy and inconsistency are controlled. | The files and application programs are created by different programmers, so there is a lot of duplication of data, which may lead to inconsistency.
Structure | The database structure is complex to design. | The file system approach has a simple structure.
Data Independence | Data independence exists, and it can be of two types: logical data independence and physical data independence. | In the file system approach, there is no data independence.
Integrity Constraints | Integrity constraints are easy to apply. | Integrity constraints are difficult to implement in a file system.
Data Models | In the database approach, 3 types of data models exist: hierarchical, network, and relational data models. | In the file system approach, there is no concept of data models.
Flexibility | Changes to the content of the stored data are often necessary, and these changes are made more easily with a database approach. | The flexibility of the system is less as compared to the DBMS approach.
Examples | Oracle, SQL Server, Sybase etc. | Cobol, C++ etc.

Objectives Of DBMS

A DBMS has many objectives aimed at increasing the efficiency and organized handling of data in an
organization. The main objectives of DBMS are:

Data Availability:
A DBMS makes data available in a meaningful format and at a reasonable cost so that
a wide range of users can access it.

Data Organization and Structuring:
A DBMS structures and efficiently organizes the data to reduce data complexity and provide
a framework for relationships between different data entities.

Data Integrity and Consistency:
A DBMS uses integrity constraints, such as primary keys and foreign keys, to maintain the
integrity and accuracy of data.
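How primary-key and foreign-key constraints protect the data can be sketched with Python's built-in sqlite3 module. The course and enrolment tables and the helper try_insert are hypothetical names chosen for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE enrolment (
        roll_no   INTEGER,
        course_id INTEGER REFERENCES course(course_id),
        PRIMARY KEY (roll_no, course_id)
    )
""")
conn.execute("INSERT INTO course VALUES (1, 'DBMS')")
conn.execute("INSERT INTO enrolment VALUES (101, 1)")

def try_insert(sql):
    """Run one INSERT and report whether the DBMS accepted or rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

# Same (roll_no, course_id) pair again: violates the primary key.
duplicate = try_insert("INSERT INTO enrolment VALUES (101, 1)")
# Course 99 does not exist: violates the foreign key.
orphan = try_insert("INSERT INTO enrolment VALUES (102, 99)")
print(duplicate, orphan)  # rejected rejected
```

The checks are declared once in the schema and applied by the DBMS to every program, instead of being re-implemented in each application.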

Data Security and Authorization:
To maintain control over data and protect it from unauthorized access, a DBMS authenticates
the user and gives only authorized users permission to view, modify, and delete data.

Data Independence:
A DBMS allows the database to be modified more easily without affecting the applications that use it.

Three Level Architecture of DBMS:

The 3-level architecture (also called the three-schema architecture) is a commonly used
architectural approach in Database Management Systems (DBMSs). It describes a database at
three levels of abstraction, and each level has its own set of responsibilities.
The DBMS 3-level architecture divides the complete system into three inter-related but
independent levels, as described below:

● Physical Level: At the physical level, information about the location of database
objects in the data store is kept. The various users of the DBMS are unaware of the locations
of these objects. In simple terms, the physical level of a database describes how the data is
stored on secondary storage devices like disks and tapes, along with
additional storage details.
● Conceptual Level: At the conceptual level, data is represented in the form of various
database tables. For example, a STUDENT database may contain STUDENT and
COURSE tables, which are visible to users, but users are unaware of how they are stored.
Also referred to as the logical schema, this level describes what kind of data is to be stored
in the database.
● External Level: The external level specifies a view of the data in terms of conceptual-level
tables. Each external-level view caters to the needs of a particular
category of users. For example, the FACULTY of a university are interested in looking at the
course details of students, while STUDENTS are interested in looking at all details related to
academics, accounts, courses, and hostels. So, different views can be
generated for different users. The main focus of the external level is data abstraction.
Data Independence
Data independence means that a change at one level should not affect another level. Two
types of data independence are present in this architecture:
● Physical Data Independence: Any change in the physical location of tables and indexes
should not affect the conceptual level or the external view of data. This type of data
independence is easy to achieve and is implemented by most DBMSs.
● Conceptual (Logical) Data Independence: The conceptual-level schema and the external-level
schema must be independent. This means a change in the conceptual schema should
not affect the external schema, e.g., adding or deleting attributes of a table should not affect
the user's view of the table. This type of independence is more difficult to achieve
than physical data independence, because changes in the conceptual schema are often
reflected in the user's view.
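Both kinds of data independence can be sketched with Python's built-in sqlite3 module, using a view as the external level. The table student, the view student_public, and the index name are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the base table.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
conn.execute("INSERT INTO student VALUES (101, 'Asha', 82)")

# External level: a view exposing only what one category of users needs.
conn.execute("CREATE VIEW student_public AS SELECT roll_no, name FROM student")
before = conn.execute("SELECT * FROM student_public").fetchall()

# Conceptual change: a new attribute is added to the table.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
# Physical change: a new index alters how rows are located, not what they contain.
conn.execute("CREATE INDEX idx_student_name ON student(name)")

after = conn.execute("SELECT * FROM student_public").fetchall()
print(before == after)  # True
```

Dropping or renaming a column that the view selects would break it, which is one reason conceptual independence is harder to guarantee than physical independence.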

Components of DBMS:

Hardware, Software, Data, Database Access Language, Procedures and Users all together
form the components of a DBMS.
Let us discuss the components one by one clearly.
Hardware
The hardware is the actual computer system used for keeping and accessing the database.
Conventional DBMS hardware includes secondary storage devices such as hard disks.
Databases run on a range of machines, from microcomputers to mainframes.
Software
Software is the actual DBMS between the physical database and the users of the system. All
the requests from the user for accessing the database are handled by DBMS.
Data
Data is an important component of the database management system. The main task of a DBMS
is to process data. Databases are used to store data, which is then retrieved from and
updated in the database.
Users
There are a number of users who can access or retrieve the data on demand using the
application and the interfaces provided by the DBMS.
The users of the database can be classified into different groups −
● Naive Users
● Online Users
● Sophisticated Users
● Specialized Users
● Application Users
● DBA- Database Administrator

The components of DBMS are given below in pictorial form –


What is a database administrator (DBA)?

A database administrator (DBA) is the information technician responsible for directing
and performing all activities related to maintaining and securing a
successful database environment. A DBA ensures an organization's databases and related
applications operate functionally and efficiently.

DBAs frequently design and develop new features and debug issues, while working
alongside a diverse group of individuals such as developers, data architects and business
stakeholders. Therefore, they're expected to be proficient in both technical and business
skills.

What are the roles and responsibilities of a DBA?

A DBA assumes many different roles and responsibilities within the IT department
involving database systems and applications.

The roles and responsibilities of a DBA typically include the following:

● Database deployment and management. The DBA keeps databases and
applications running up to performance, availability and recoverability standards
while handling additional functions. When adopting a new DBMS, the DBA is
responsible for designing, setting up and maintaining the database system. Often that
includes installing the DBMS and setting up the IT infrastructure to enable
applications to access databases. For a cloud database implementation, the DBA isn't
responsible for installation but must orchestrate the proper configuration, access and
deployment options for their organization's use of the cloud database.

● Documentation and training. The DBA must also establish policies and procedures
pertaining to the management, security, maintenance and use of the database
management system. The DBA group creates training materials and instructs
employees and developers on the proper access and usage of the DBMS.
● Troubleshooting. When problems arise, the DBA is the focal point for resolution,
including troubleshooting, root cause analysis, fine tuning and optimizing the
performance of tasks and programs that access the database. The DBA must be
capable of performing root cause analysis -- identifying the cause of the problem so it
can be resolved. This task requires the ability to locate bottlenecks and points of
contention, monitor workload and throughput, review SQL performance and
optimization, monitor storage space and fragmentation and view and manage the
system and DBMS resource use.

● Backup and disaster recovery. The DBA is responsible for ensuring that databases
and data are backed up appropriately and can be recovered correctly and quickly in
the event of failure. The DBA also ensures that databases are protected and secured,
enacts measures to maintain the database integrity in terms of data accuracy and
makes sure unauthorized users can't access the data.

● Collaboration and integration. The DBA frequently gets pulled into other projects
as a subject matter expert on the database. Because databases are at the center of most
modern application development projects, the DBA participates in database
integration and use in a variety of IT projects. This exposure to many different
technologies and experiences can make the DBA a valuable IT technician not just for
database-related issues, but for other technologies as well.

● Efficient database design. Database administrators work with application
developers to ensure accurate and efficient application design for database access.
DBA tasks include interfacing SQL with traditional programming languages,
selecting the type of SQL to use, using middleware and APIs such as REST, Open
Database Connectivity, Java Database Connectivity and SQLJ effectively, defining
transactions and determining the appropriate use of frameworks such as Java
Enterprise Edition and .NET. DBAs might also be asked to modify or write
application code to help development projects.

● Optimizing database performance. Database performance is the optimization of
resource usage to increase throughput and minimize contention, enabling the largest
possible workload to be processed. Ensuring efficient performance of the database
and applications that access it is a core function of database administration.

● Staying current. As the central point of contact for information about the company's
database management systems, DBAs must keep up to date on the latest versions and
capabilities of each DBMS.

● Using the right tools. Administering a DBMS usually requires the use of other tools
than those provided with the DBMS. DBAs must understand the strengths and
weaknesses of native DBA tools, develop a strategy for addressing the weaknesses
and use tools that improve the performance, availability, administration and recovery
of the databases they manage.

● User access and support. A DBA creates new user accounts and provides access
permissions. They also create and manage database reports, visualizations and
dashboards to help users easily understand the status of their database.

● Communication. DBAs must have good communication skills. They must be able to
converse and work with application programmers, business end users, IT and
business managers, data analysts and other DBAs.

What are the different types of DBAs?


Many different types of DBAs exist, the most common type being the general-purpose
DBA, who performs all types of administrative and data-related work. However, it isn't
uncommon for DBAs to focus on specific problem domains. A DBA, for example, might
focus entirely on database design, perhaps broken into logical design and physical design;
specialize in building systems; concentrate on maintaining and tuning existing systems; or
center their attention on narrow areas of database management and administration.
Within larger organizations, DBA responsibilities typically are split into separate types of
roles. Beyond general-purpose, the primary roles include system DBA, database architect,
database analyst, application DBA, task-oriented DBA, performance analyst, data
warehouse administrator and cloud DBA.

System DBA

This role focuses on technical, rather than business, issues. The system DBA is
knowledgeable in the arcane technical details of how the database is installed, configured
and modified. Typical tasks center on the physical installation and performance of the
DBMS software and can include the following:
● Installing new software versions and applying fixes.
● Setting and tuning system parameters.
● Tuning the operating system, network and transaction processors to work with the
DBMS.
● Ensuring appropriate storage and memory are available for the DBMS.
System DBAs are rarely involved with the actual database and application set up. They
might get involved in application tuning when operating system parameters or complex
DBMS parameters need to be altered.

Database architect

The primary responsibility of this role is the design and set up of new databases. The
database architect designs new databases and database structures for new and existing
applications and is rarely involved in the maintenance and tuning of established databases
and applications. Typical tasks include the following:
● Modelling logical data.
● Translating logical data models into a physical database design.
● Analysing data access requirements to ensure optimal database design and efficient
SQL access.
● Creating backup and recovery strategies for new databases.
Database analyst

Sometimes junior DBAs are referred to as database analysts. The database analyst's role
might be like that of the database architect. The database analyst designation can be another
name for a database administrator.

Application DBA

An application DBA focuses on database design and the ongoing database support and
administration for a specific application or subset of applications. The application DBA is
likely an expert in writing and debugging complex SQL. They understand the best ways to
incorporate database requests into application programs. Application DBAs typically are
also responsible for managing and refreshing test data for application development teams.

Not every organization has an application DBA on staff. In that case, the general-purpose
DBA supports specific applications while also maintaining the organization's database
environment. But even with an application DBA on board, general-purpose DBAs are still
required to support the overall database environment and infrastructure.

Task-oriented DBA

This specialized DBA focuses on a specific administrative task and is uncommon outside of
large IT shops. A backup and recovery DBA, for example, would be task-oriented to ensure
the organization's databases are recoverable, including creating backup plans, building and
testing backup scripts, testing recovery scripts, and driving recovery tasks when required.
The backup and recovery DBA also participates in building and testing disaster contingency
plans for the company's databases.

Performance analyst

As the most common task-oriented DBA, the performance analyst focuses entirely on
monitoring databases and improving the performance of applications that access them. A
performance analyst is an expert in SQL coding for performance and is knowledgeable in
designing and building high-performance databases. Performance analysts should have a
deep understanding of the DBMS, collaborate with other DBAs to execute changes when
required and communicate with application developers in their language to facilitate
appropriate program changes for performance.
Data warehouse administrator

This fully capable DBA has the knowledge and skills to monitor and support the data
warehouse environment. Data warehouse administrators understand the differences between
a database that supports online transaction processing and a data warehouse and must have
experience in the following areas:
● Business intelligence and query tools.
● Specialized database design for data warehousing.
● ETL skills.
● Knowledge of data warehousing technologies, such as online analytical
processing and star schema.
Cloud DBA

As companies increasingly migrate workloads to the cloud, the cloud DBA has become
more common and performs many of the same tasks as a general-purpose DBA but for cloud
database implementations on services like Amazon Web Services and Microsoft Azure. The cloud
DBA understands the services the cloud provider offers, including backup and security, to
set up databases in the cloud. Cloud DBAs need to be aware of latency, fault tolerance and
especially cost management because adding data or workloads to a cloud setup can
significantly increase costs.

Data modelers

Although an efficient database design must include data modelling, this isn't always
considered the job of a DBA, since they're more focused on technology than the business.
However, DBAs should know the essentials of data modelling, including how to build and
manage data models to illustrate the relationships between data components.

Data modelers create the data models and computer databases needed to transform intricate
organizational data into functional computer systems. They also oversee the information
flow across different departments within a business using relational, dimensional
and NoSQL databases.

Different Types of Database Users

A Database User is defined as a person who interacts with data daily, updating, reading, and
modifying the given data. Database users can access and retrieve data from the database
through the Database Management System (DBMS) applications and interfaces.

Types of Database Users

Database users are categorized based on their interaction with the database. There are eight
types of database users in DBMS, as listed below:
1. Database Administrator (DBA)
A Database Administrator (DBA) is a person or team who defines the schema and also
controls the 3 levels of the database. The DBA creates a new account ID and
password for a user who needs to access the database. The DBA is also responsible for
providing security to the database and allows only authorized users to access or modify the
database. The DBA is responsible for handling problems such as security breaches and poor
system response time.
● DBA also monitors the recovery and backup and provides technical support.
● The DBA has a DBA account in the DBMS which is called a system or superuser
account.
● DBA repairs damage caused due to hardware and/or software failures.
● DBA is the one having privileges to perform DCL (Data Control
Language) operations such as GRANT and REVOKE, to allow/restrict a particular
user from accessing the database.
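What GRANT and REVOKE do conceptually can be sketched with a toy in-memory privilege table. This is not how any real DBMS implements DCL, and PrivilegeTable with its methods is a hypothetical class written only for illustration; in a real system the DBA would issue SQL statements such as GRANT SELECT ON student TO clerk.

```python
class PrivilegeTable:
    """Toy stand-in for a DBMS privilege catalog (hypothetical, not a real API)."""

    def __init__(self):
        # Maps (user, table) to the set of privileges that user holds on that table.
        self._grants = {}

    def grant(self, privilege, table, user):
        self._grants.setdefault((user, table), set()).add(privilege)

    def revoke(self, privilege, table, user):
        self._grants.get((user, table), set()).discard(privilege)

    def is_allowed(self, privilege, table, user):
        return privilege in self._grants.get((user, table), set())


catalog = PrivilegeTable()

# The DBA allows the clerk to read the student table, but not to change it.
catalog.grant("SELECT", "student", "clerk")
can_read = catalog.is_allowed("SELECT", "student", "clerk")
can_update = catalog.is_allowed("UPDATE", "student", "clerk")

# Later the DBA withdraws the permission.
catalog.revoke("SELECT", "student", "clerk")
can_read_after_revoke = catalog.is_allowed("SELECT", "student", "clerk")

print(can_read, can_update, can_read_after_revoke)  # True False False
```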

2. Naive / Parametric End Users

Parametric end users are unsophisticated users who don't have any DBMS knowledge but
frequently use database applications in their daily life to get the desired results. For
example, people booking railway tickets are naive users. A clerk in a bank is also a naive user:
clerks don't have any DBMS knowledge, but they still use the database to perform
their given tasks.

3. System Analyst
A system analyst is a user who analyses the requirements of parametric end users and
checks whether all the requirements of end users are satisfied.

4. Sophisticated Users
Sophisticated users can be engineers, scientists, or business analysts who are familiar with the
database. They can develop their own database applications according to their requirements.
They don't write program code; instead, they interact with the database by writing SQL queries
directly through the query processor.

5. Database Designers
Database designers are the users who design the structure of the database, which includes
tables, indexes, views, triggers, stored procedures, and constraints; these are usually
defined before the database is created or populated with data. The designer decides what data
must be stored and how the data items are to be related. It is the responsibility of database
designers to understand the requirements of different user groups and then create a design
that satisfies the needs of all the user groups.

6. Application Programmers
Application programmers, also referred to as system analysts or simply software engineers,
are the back-end programmers who write the code for the application programs. They are
computer professionals. These programs could be written in programming languages
such as Visual Basic, Developer, C, FORTRAN, COBOL, etc. Application programmers
design, debug, test, and maintain sets of programs called “canned transactions” that the naive
(parametric) users use to interact with the database.
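A canned transaction can be sketched as a pre-written function that a naive user triggers by supplying only parameters. The example below uses Python's built-in sqlite3 module; the account table, the function names setup and transfer, and the sample balances are assumptions made for illustration.

```python
import sqlite3

def setup():
    """Create a small hypothetical bank database in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE account (
            acc_no  INTEGER PRIMARY KEY,
            balance INTEGER NOT NULL CHECK (balance >= 0)
        )
    """)
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 100)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Canned transaction: move `amount` from src to dst, all or nothing."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, dst),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; nothing was changed.
        return False

conn = setup()
ok = transfer(conn, 1, 2, 200)         # enough money: succeeds
overdraft = transfer(conn, 2, 1, 1000)  # not enough money: rejected
balances = conn.execute("SELECT balance FROM account ORDER BY acc_no").fetchall()
print(ok, overdraft, balances)  # True False [(300,), (300,)]
```

The clerk only enters account numbers and an amount; the SQL and the error handling were written in advance by the application programmer.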

7. Casual Users / Temporary Users

Casual users are users who access the database only occasionally, but each time
they access it they require new information; middle- or higher-level
managers are an example.

8. Specialized users
Specialized users are sophisticated users who write specialized database applications that
do not fit into the traditional data-processing framework. Among these applications are
computer-aided design systems, knowledge-base and expert systems, etc.

Data Models in DBMS

A data model in a Database Management System (DBMS) is a collection of conceptual tools
used to describe the structure of a database. Data models give us a
clear picture of the data, which helps us in creating the actual database. A data model takes us
from the design of the data to its proper implementation.

Types of Data Models

Data models are basically classified into 3 types:

1. Conceptual Data Model

2. Representational Data Model

3. Physical Data Model


1. Conceptual Data Model

The conceptual data model describes the database at a very high level and is useful for
understanding the needs or requirements of the database. This model is used in the
requirement-gathering process, i.e., before the database designers start building a particular
database. One popular model of this kind is the entity-relationship model (ER model). The ER
model focuses on the entities, relationships, and attributes that are used by database
designers. Because it works at this level, requirements can be discussed and understood even
with non-technical users and stakeholders.
Entity-Relationship Model (ER Model): It is a high-level data model which is used to
define the data and the relationships between them. It is basically a conceptual design of
a database, which makes it easy to design the view of data.
Components of ER Model:
1. Entity: An entity is a real-world object. It can be a person, place, object,
class, etc. Entities are represented by rectangles in an ER diagram.
2. Attributes: An attribute is a property that describes an entity. Attributes are
represented by ellipses in an ER diagram. Examples are Age, Roll Number, or Marks for a
Student.
3. Relationship: Relationships define associations among different entities.
Diamonds (rhombuses) are used to show relationships.
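The three components above can be written down as plain data structures. The sketch below is only one possible representation: the classes Entity and Relationship, the function describe, and the Student/Course example are hypothetical and are not part of any ER modelling tool.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A real-world object type (drawn as a rectangle)."""
    name: str
    attributes: list = field(default_factory=list)  # drawn as ellipses

@dataclass
class Relationship:
    """An association between two entities (drawn as a diamond)."""
    name: str
    left: Entity
    right: Entity
    cardinality: str

student = Entity("Student", ["roll_no", "name", "age", "marks"])
course = Entity("Course", ["course_id", "title"])
enrols = Relationship("EnrolsIn", student, course, "many-to-many")

def describe(rel):
    """Render a relationship as one line of text."""
    return f"{rel.left.name} --<{rel.name}>-- {rel.right.name} ({rel.cardinality})"

print(describe(enrols))  # Student --<EnrolsIn>-- Course (many-to-many)
```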
Characteristics of a conceptual data model
● Offers organization-wide coverage of the business concepts.
● This type of data model is designed and developed for a business audience.
● The conceptual model is developed independently of hardware specifications like
data storage capacity and location, and of software specifications like DBMS vendor and
technology. The focus is to represent data as a user will see it in the “real world.”
Conceptual data models, also known as domain models, create a common vocabulary for all
stakeholders by establishing basic concepts and scope.

2. Representational Data Model


This type of data model represents only the logical part of the database and does
not represent the physical structure of the database. The representational data model allows
us to focus primarily on the design of the database. A popular representational model
is the relational model, whose formal query languages are relational algebra and relational
calculus. In the relational model, we basically use tables to represent our data and the
relationships between them. It is a theoretical concept whose practical implementation is
done in the physical data model. The advantage of a representational data model is that it
provides the foundation for the physical model.
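Three basic relational algebra operations (selection, projection, and natural join) can be sketched in a few lines of Python, treating a relation as a list of rows and each row as a dictionary. The function names and the sample relations are assumptions made for the example; a real DBMS evaluates these operations very differently.

```python
def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Projection: keep only the named attributes and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attribute names."""
    result = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in common):
                result.append({**l_row, **r_row})
    return result

student = [
    {"roll_no": 101, "name": "Asha", "dept_id": 1},
    {"roll_no": 102, "name": "Ravi", "dept_id": 2},
]
department = [
    {"dept_id": 1, "dept": "CS"},
    {"dept_id": 2, "dept": "Physics"},
]

# Names of the students in the CS department.
joined = natural_join(student, department)
cs_names = project(select(joined, lambda r: r["dept"] == "CS"), ["name"])
print(cs_names)  # [{'name': 'Asha'}]
```

The same request in SQL would be a SELECT with a JOIN and a WHERE clause, which is how the theoretical operations show up in practice.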

3. Physical Data Model


The physical data model is used to practically implement the relational data model.
Ultimately, all data in a database is stored physically on a secondary storage device such as
disks and tapes, in the form of files, records, and certain other data structures.
The physical model has all the information on the format in which the files are present and the
structure of the databases, the presence of external data structures, and their relation to each
other. Here, we basically store tables so that they can be accessed efficiently. In order to
come up with a good physical model, we first have to get the relational model right.
Structured Query Language (SQL) is used to practically implement relational
algebra.
This data model describes HOW the system will be implemented using a specific DBMS.
It is typically created by DBAs and developers. Its purpose is the actual
implementation of the database.
Characteristics of a physical data model:
● The physical data model describes the data needed for a single project or application,
though it may be integrated with other physical data models based on project scope.
● The data model contains relationships between tables, including the cardinality
and nullability of those relationships.
● It is developed for a specific version of a DBMS, location, data storage or technology to
be used in the project.
● Columns should have exact datatypes, assigned lengths and default values.
● Primary and foreign keys, views, indexes, access profiles, authorizations, etc. are
defined.
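
The characteristics above can be sketched with a small script. This is only an illustration, assuming Python's built-in sqlite3 module as the target DBMS; the customer and invoice tables and every column name are hypothetical and do not come from these notes.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Hypothetical tables: exact datatypes, lengths, defaults, keys, an index and a view.
# Note: SQLite accepts these type names but does not enforce the lengths.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    full_name   VARCHAR(100) NOT NULL,
    country     CHAR(2) NOT NULL DEFAULT 'IN'
);
CREATE TABLE invoice (
    invoice_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      DECIMAL(10, 2) NOT NULL DEFAULT 0
);
CREATE INDEX idx_invoice_customer ON invoice(customer_id);
CREATE VIEW customer_total AS
    SELECT customer_id, SUM(amount) AS total FROM invoice GROUP BY customer_id;
""")

conn.execute("INSERT INTO customer (customer_id, full_name) VALUES (1, 'Asha')")
conn.execute("INSERT INTO invoice (invoice_id, customer_id, amount) VALUES (1, 1, 250)")
conn.execute("INSERT INTO invoice (invoice_id, customer_id, amount) VALUES (2, 1, 100)")

# The default value fills the column that the INSERT left out.
default_country = conn.execute(
    "SELECT country FROM customer WHERE customer_id = 1").fetchone()[0]
# The view is queried like a table.
total = conn.execute(
    "SELECT total FROM customer_total WHERE customer_id = 1").fetchone()[0]
print(default_country, total)
```

A different DBMS would need different type names and storage options, which is why a physical model is tied to one specific product and version.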

Some Other Data Models


1. Hierarchical Model
The hierarchical model is one of the oldest data models; it was developed by IBM in the
1960s. In a hierarchical model, data are viewed as a collection of tables, or segments, that
form a hierarchical relation. The data is organized into a tree-like structure where each
record has one parent record and may have many children. Even if the segments are
connected as a chain-like structure by logical associations, the instance structure can be a
fan structure with multiple branches. We call these logical associations directional
associations.

2. Network Model
The Network Model was formalized by the Database Task Group in the 1960s. This model
is a generalization of the hierarchical model. A segment can have multiple parent
segments; the segments are grouped into levels, but a logical association can exist between
segments belonging to any level. Mostly, there is a many-to-many logical
association between any two segments.

3. Object-Oriented Data Model
In the Object-Oriented Data Model, data and their relationships are contained in a single
structure, which is referred to as an object. Real-world problems are represented as objects
with different attributes, and objects can have multiple relationships between them.
It is essentially a combination of object-oriented programming and the
relational database model.

4. Flat Data Model

The flat data model consists of a two-dimensional array of data elements that does
not contain any duplicate elements. This data model has one drawback: it cannot
store a large amount of data, that is, the tables cannot be large.

5. Context Data Model


The context data model is simply a data model that combines more than one data model.
For example, it can consist of the ER Model, the Object-Oriented Data Model,
etc. It allows users to do everything that each of its individual data models
can do.

6. Semi-Structured Data Model


Semi-Structured data models deal with the data in a flexible way. Some entities may have
extra attributes and some entities may have some missing attributes. Basically, you can
represent data here in a flexible way.
Advantages of Data Models
1. Data models help us represent data accurately.
2. They help us find missing data and minimize data redundancy.
3. A data model helps provide better data security.
4. A data model is detailed enough to be used for building the physical
database.
5. The information in the data model can be used for defining the relationships between
tables, primary and foreign keys, and stored procedures.
Disadvantages of Data Models
1. In the case of a vast database, it sometimes becomes difficult to understand the data
model.
2. You must have proper knowledge of SQL to use physical models.
3. Even a small change in the structure may require modification of the entire application.
4. There is no set data manipulation language in DBMS.
5. To develop a data model, one should know the characteristics of the physically stored data.

1. Hierarchical Data Model:


The hierarchical data model is the oldest type of data model. It was developed by IBM in
1968. It organizes data in a tree-like structure. The hierarchical model consists of the
following:
● It contains nodes which are connected by branches.
● The topmost node is called the root node.
● If multiple nodes appear at the top level, they are called root
segments.
● Each node other than the root has exactly one parent.
● One parent may have many children.
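
The tree rules listed above can be sketched in a few lines of Python. This is a minimal illustration, not how any real hierarchical DBMS stores data; the Segment class and the College, Department and Course names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Segment:
    """One node of the tree: at most one parent, any number of children."""
    name: str
    parent: Optional["Segment"] = None
    children: List["Segment"] = field(default_factory=list)

    def add_child(self, name: str) -> "Segment":
        # A child is always created under an existing parent.
        child = Segment(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        # Walk up the single parent chain, then reverse to get root-to-node order.
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


root = Segment("College")            # the root node has no parent
dept = root.add_child("Department")
course = dept.add_child("Course")
print(course.path())
```

Because every node keeps exactly one parent reference, there is only one path from the root to any record, which is why a child is reached by visiting its parent first.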

2. Network Data Model:


It is the advanced version of the hierarchical data model. To organize data it uses directed
graphs instead of a tree structure. In this model, a child can have more than one parent. It
uses two data structures: records and sets.

In the above figure, Project is the root node, which has two children: Project 1 and
Project 2. Project 1 has 3 children and Project 2 has 2 children, which gives 5 links in
total to only three distinct nodes: Department A, Department B and Department C. They are
network-related children because, as we said, a node in this model can have more than one
parent. Department B and Department C each have two parents: Project 1 and Project 2.
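
The Project and Department example can be written as a small directed graph. This is only a sketch: a plain Python dictionary stands in for the owner and member links of the network model, and the variable names are assumptions made for the illustration.

```python
from collections import defaultdict

# Owner -> members, following the example: Project 1 has three children,
# Project 2 has two, and two of the departments are shared.
owns = {
    "Project": ["Project 1", "Project 2"],
    "Project 1": ["Department A", "Department B", "Department C"],
    "Project 2": ["Department B", "Department C"],
}

# Invert the links to find every parent of each node.
parents = defaultdict(list)
for owner, members in owns.items():
    for member in members:
        parents[member].append(owner)

print(parents["Department B"])
```

Unlike the tree of the hierarchical model, a node here can appear in the member list of several owners, so it can be reached along more than one path.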

Difference between Hierarchical Data Model and Network Data Model :


S. No. | Hierarchical Data Model | Network Data Model
1. | In this model, a hierarchy is used to store data. | In this model, you can create a network that shows how data items are related to each other.
2. | It implements 1:1 and 1:n relations. | It implements 1:1, 1:n and also many-to-many relations.
3. | To organize records, it uses a tree structure. | To organize records, it uses graphs.
4. | Records are linked with the help of pointers. | Records are linked with the help of linked lists.
5. | An insertion anomaly exists in this model, i.e. a child node cannot be inserted without the parent node. | There is no insertion anomaly.
6. | A deletion anomaly exists in this model, i.e. it is difficult to delete the parent node. | There is no deletion anomaly.
7. | It is used to access data which is complex and asymmetric. | It is used to access data which is complex and symmetric.
8. | When an update operation is performed, it suffers from an inconsistency problem because of the existence of multiple instances of child records. | No such problem exists because of the single occurrence of records while updating.
9. | This model lacks data independence. | There is partial data independence in this model.
10. | Less flexible in comparison to the relational model. | It is flexible.
11. | When you are searching for a record, you first need to visit the parent record before retrieving a child record. | Searching for a record is easy because of the availability of multiple access paths to reach a data item.
12. | Example: IBM's IMS (Information Management System) implements this model. | Example: IDMS (Integrated Database Management System) implements this model.
UNIT TWO

Introduction of ER Model
● Peter Chen developed the ER diagram in 1976. The ER model was created to provide a
simple and understandable model for representing the structure and logic of databases. It
has since evolved into variations such as the Enhanced ER Model and the Object
Relationship Model.
● The Entity-Relationship Model is a model for identifying the entities to be represented in the
database and for representing how those entities are related. The ER data model specifies an
enterprise schema that represents the overall logical structure of a database graphically.
● The Entity Relationship Diagram explains the relationship among the entities present in
the database. ER models are used to model real-world objects like a person, a car, or a
company and the relation between these real-world objects. In short, the ER Diagram is the
structural format of the database.

Why Use ER Diagrams In DBMS?


● ER diagrams are used to represent the E-R model in a database, which makes them easy to
convert into relations (tables).
● ER diagrams model real-world objects, which makes them
intuitive and useful.
● ER diagrams require no technical knowledge and no hardware support.
● These diagrams are very easy to understand and easy to create, even for a naive user.
● They give a standard solution for visualizing the data logically.

Symbols Used in ER Model

ER Model is used to model the logical view of the system from a data perspective which consists
of these symbols:
● Rectangles: Rectangles represent Entities in the ER Model.
● Ellipses: Ellipses represent Attributes in the ER Model.
● Diamond: Diamonds represent Relationships among Entities.
● Lines: Lines link attributes to entities and entity sets to relationship types.
● Double Ellipse: Double Ellipses represent Multi-Valued Attributes.
● Double Rectangle: Double Rectangle represents a Weak Entity.

Component of ER Diagram
1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an entity is represented
as a rectangle.
Consider an organization as an example: manager, product, employee, department, etc. can be
taken as entities.

a. Weak Entity
An entity that depends on another entity is called a weak entity. The weak entity doesn't contain any
key attribute of its own. The weak entity is represented by a double rectangle.

2. Attribute
The attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.

a. Key Attribute
The key attribute is used to represent the main characteristics of an entity. It
represents a primary key. The key attribute is represented by an ellipse with
the text underlined.

b. Composite Attribute
An attribute that is composed of many other attributes is known as a composite
attribute. The composite attribute is represented by an ellipse, and the
ellipses of its component attributes are connected to that ellipse.

c. Multi-valued Attribute
An attribute can have more than one value. Such attributes are known as
multi-valued attributes. A double oval is used to represent a multi-valued
attribute.

For example, a student can have more than one phone number.

d. Derived Attribute
An attribute that can be derived from other attributes is known as a derived
attribute. It is represented by a dashed ellipse.

For example, a person's age changes over time and can be derived from
another attribute such as date of birth.

3. Relationship
A relationship is used to describe the relation between entities. Diamond or
rhombus is used to represent the relationship.

Types of relationship are as follows:


a. One-to-One Relationship
When only one instance of each entity is associated with the relationship, it
is known as a one-to-one relationship.
For example, a person has one passport, and a passport belongs to one person.

[Figure: Person (1) Has (1) Passport]

b. One-to-many relationship
When only one instance of the entity on the left and more than one instance of
the entity on the right are associated with the relationship, it is known as a
one-to-many relationship.
For example, a scientist can invent many inventions, but each invention is made
by one specific scientist.

c. Many-to-one relationship
When more than one instance of the entity on the left and only one instance of
the entity on the right are associated with the relationship, it is known as a
many-to-one relationship.
For example, a student enrolls in only one course, but a course can have many
students.

d. Many-to-many relationship
When more than one instance of the entity on the left and more than one
instance of the entity on the right are associated with the relationship, it is
known as a many-to-many relationship.
For example, an employee can be assigned to many projects, and a project can have
many employees.

Cardinality
Cardinality describes how the entities are related to each other, that is, the
relationship structure between entities in a relationship set. In a Database
Management System, cardinality is a number that denotes how many
times an entity participates with another entity in a relationship set.
Cardinality is a very important property in representing the structure
of a database. For a table, cardinality refers to the number of rows or
tuples.

Cardinality Ratio
The cardinality ratio, also called cardinality mapping, represents the
mapping of one entity set to another entity set in a relationship set. We
generally take the example of a binary relationship set, where two entity sets are
mapped to each other. Cardinality is very important in the databases of various
businesses. For example, if we want to track the purchase history of each
customer, we can use one-to-many cardinality to find the data of a
specific customer. Cardinality can be used by database managers
for a variety of purposes, but corporations often use it to
evaluate customer or inventory data.
There are four types of cardinality mapping in Database Management Systems:
1. One to one
2. Many to one
3. One to many
4. Many to many

One to One
One-to-one cardinality is represented by the symbol 1:1. Here, each entity in one set
is related to at most one entity in the other set. There are many examples
of one-to-one cardinality in real-life databases.
For example, one student can have only one student id, and one student id can
belong to only one student. So, the relationship mapping between student and
student id will be one to one cardinality mapping.
Another example is the relationship between the director of the school and the
school because one school can have a maximum of one director, and one
director can belong to only one school.

Many to One Cardinality:


In many-to-one cardinality mapping, multiple entities of set 1 can be related
to a single entity of set 2. Equivalently, one entity of set 2 can be related
to more than one entity of set 1.
One-to-one cardinality is a special case of many-to-one cardinality. It is
represented by M:1.
For example, there are multiple patients in a hospital who are served by a
single doctor, so the relationship between patients and doctors can be
represented by Many to one Cardinality.

One to Many Cardinality:

In one-to-many cardinality mapping, a single entity of set 1 can be related to
one or more entities of set 2. Equivalently, more than one entity of set 2 can be
related to only one entity of set 1.
One-to-one cardinality is a special case of one-to-many cardinality. It is
represented by 1:M.
For Example, in a hospital, there can be various compounders, so the
relationship between the hospital and compounders can be mapped through
One-to-many Cardinality.

Many to Many Cardinality:

In many-to-many cardinality mapping, one or more entities of set 1
can be associated with one or more entities of set 2. In the same way,
one or more entities of set 2 can be related to one
or more entities of set 1.
It is represented by M:N or N:M.
One-to-one, one-to-many, and many-to-one cardinality
are special cases of many-to-many cardinality.
For example, in a college, multiple students can work on a single project, and
a single student can also work on multiple projects. So, the relationship
between the project and the student can be represented by many-to-many
cardinality.
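
One common way to express these mapping cardinalities in tables is sketched here. It assumes Python's built-in sqlite3 module and reuses the person and passport, patient and doctor, and student and project examples from this unit; the column names and the works_on linking table are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- 1:1  each passport row points at one person; UNIQUE blocks a second passport
CREATE TABLE person   (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE passport (passport_no TEXT PRIMARY KEY,
                       person_id INTEGER NOT NULL UNIQUE REFERENCES person(person_id));

-- M:1  many patient rows may point at the same doctor
CREATE TABLE doctor   (doctor_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE patient  (patient_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       doctor_id INTEGER NOT NULL REFERENCES doctor(doctor_id));

-- M:N  a linking table holds one row per (student, project) pair
CREATE TABLE student  (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project  (project_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE works_on (student_id INTEGER REFERENCES student(student_id),
                       project_id INTEGER REFERENCES project(project_id),
                       PRIMARY KEY (student_id, project_id));
""")

conn.execute("INSERT INTO person VALUES (1, 'Ravi')")
conn.execute("INSERT INTO passport VALUES ('P100', 1)")
try:
    # A second passport for the same person violates the 1:1 mapping.
    conn.execute("INSERT INTO passport VALUES ('P200', 1)")
    second_passport_allowed = True
except sqlite3.IntegrityError:
    second_passport_allowed = False
print(second_passport_allowed)
```

The only difference between the 1:1 and M:1 tables is the UNIQUE constraint on the foreign key column, while M:N needs a table of its own.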

Appropriate Mapping Cardinality


The real-world context in which the relationship set is modeled
determines the appropriate mapping cardinality for a specific
relationship set.
o If the cardinality is one-to-many or many-to-one, the relationship table can be
combined with the table on the "many" side.
o One entity can be combined with a relationship table if it has a one-to-one
relationship and total participation, and two entities can
be combined with their relationship to form a single table if both of
them have total participation.
o We cannot combine any two tables if the cardinality is many-to-many.

Keys
o Keys play an important role in the relational database.
o A key is used to uniquely identify any record or row of data in a
table. Keys are also used to establish and identify relationships
between tables.

For example, ID is used as a key in the Student table because it is
unique for each student. In the PERSON table, passport_number,
license_number, and SSN are keys since they are unique for each person.

Types of keys:
1. Primary key
o It is the key chosen to identify one and only one instance of
an entity uniquely. An entity can have multiple keys, as we
saw in the PERSON table. The key that is most suitable from
that list becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is
unique for each employee. We could
even select License_Number or Passport_Number as the primary
key since they are also unique.
o For each entity, the primary key is selected based on
requirements and by the developers.

2. Candidate key
o A candidate key is a minimal attribute or set of attributes that can
uniquely identify a tuple.
o The primary key is chosen from the candidate keys. The candidate keys
that are not chosen are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the
primary key. The other unique attributes, like SSN, Passport_Number, and
License_Number, are also candidate keys.
3. Super Key
A super key is an attribute set that can uniquely identify a tuple. Every super
key is a superset of a candidate key.

For example: In the above EMPLOYEE table, in (EMPLOYEE_ID,
EMPLOYEE_NAME), the names of two employees can be the same,
but their EMPLOYEE_ID can't be the same. Hence, this combination
can also be a key.
The super keys would be EMPLOYEE_ID, (EMPLOYEE_ID,
EMPLOYEE_NAME), etc.
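
The difference between a super key and a candidate key can be checked on sample rows. This is only a sketch: the rows, the attribute names and the two helper functions are made up for the illustration, and a real key is a rule about every possible row of the table, so a check on sample data can only show that an attribute set is unique in that sample.

```python
from itertools import combinations

# Made-up EMPLOYEE rows: two employees share a name.
rows = [
    {"emp_id": 1, "name": "Asha", "passport_no": "P1"},
    {"emp_id": 2, "name": "Asha", "passport_no": "P2"},
    {"emp_id": 3, "name": "Ben", "passport_no": "P3"},
]


def is_super_key(attrs, rows):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows):
    """Minimal super keys: unique sets that contain no smaller unique set."""
    attrs = sorted(rows[0])
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            contains_smaller_key = any(set(k) <= set(combo) for k in keys)
            if is_super_key(combo, rows) and not contains_smaller_key:
                keys.append(combo)
    return keys


print(is_super_key(("emp_id", "name"), rows))  # unique, but not minimal
print(candidate_keys(rows))
```

Here (emp_id, name) is a super key but not a candidate key, because emp_id alone is already unique.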
4. Foreign key
o A foreign key is a column of a table that points to the
primary key of another table.
o Every employee works in a specific department in a company,
and employee and department are two different entities. So we
can't store the department's information in the employee table.
That's why we link these two tables through the primary key of
one table.
o We add the primary key of the DEPARTMENT table,
Department_Id, as a new attribute in the EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and
the two tables are related.
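
The EMPLOYEE and DEPARTMENT link described above can be tried out with a short script. It is a sketch assuming Python's built-in sqlite3 module; the sample rows and the column names other than Department_Id are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    department_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INTEGER NOT NULL REFERENCES department(department_id)
);
""")
conn.execute("INSERT INTO department VALUES (1, 'Accounts')")
conn.execute("INSERT INTO employee VALUES (101, 'Meera', 1)")

try:
    # Department 99 does not exist, so the foreign key rejects this row.
    conn.execute("INSERT INTO employee VALUES (102, 'Karan', 99)")
    orphan_inserted = True
except sqlite3.IntegrityError:
    orphan_inserted = False

# The foreign key is also what lets the two tables be joined.
joined = conn.execute("""
    SELECT e.name, d.name
    FROM employee e JOIN department d ON d.department_id = e.department_id
""").fetchall()
print(orphan_inserted, joined)
```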


5. Alternate key
There may be one or more attributes or a combination of attributes
that uniquely identify each tuple in a relation. These attributes or
combinations of attributes are called the candidate keys. One key
is chosen as the primary key from these candidate keys, and the
remaining candidate keys, if any exist, are termed alternate keys. In
other words, the total number of alternate keys is the total
number of candidate keys minus one (the primary key). Alternate keys
may or may not exist. If there is only one candidate key in a relation,
it does not have an alternate key.
For example, employee relation has two attributes, Employee_Id and
PAN_No, that act as candidate keys. In this relation, Employee_Id is
chosen as the primary key, so the other candidate key, PAN_No, acts
as the Alternate key.

6. Composite key
Whenever a primary key consists of more than one attribute, it is
known as a composite key. This key is also known as Concatenated
Key.
For example, in an employee relation, we assume that an employee
may be assigned multiple roles, and an employee may work on
multiple projects simultaneously. So the primary key will be
composed of all three attributes, namely Emp_ID, Emp_role, and
Proj_ID, in combination. These attributes act as a composite key
since the primary key comprises more than one attribute.

7. Artificial key
Keys created using arbitrarily assigned data are known as artificial
keys. These keys are created when a primary key is large and complex
and has no relationship with many other relations. The data values of
artificial keys are usually numbered in serial order.
For example, the primary key composed of Emp_ID,
Emp_role, and Proj_ID is large in the employee relation. So it would be
better to add a new virtual attribute to identify each tuple in the
relation uniquely.
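
The composite key and the artificial key can be compared side by side. This sketch assumes Python's built-in sqlite3 module; the table names and the assignment_id column are hypothetical, while Emp_ID, Emp_role and Proj_ID come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Composite key: the three attributes together identify a row.
CREATE TABLE assignment_composite (
    emp_id   INTEGER NOT NULL,
    emp_role TEXT    NOT NULL,
    proj_id  INTEGER NOT NULL,
    PRIMARY KEY (emp_id, emp_role, proj_id)
);
-- Artificial key: a serial number identifies a row; the old key stays UNIQUE.
CREATE TABLE assignment_artificial (
    assignment_id INTEGER PRIMARY KEY AUTOINCREMENT,
    emp_id   INTEGER NOT NULL,
    emp_role TEXT    NOT NULL,
    proj_id  INTEGER NOT NULL,
    UNIQUE (emp_id, emp_role, proj_id)
);
""")

rows = [(1, "tester", 7), (1, "developer", 7)]
conn.executemany("INSERT INTO assignment_composite VALUES (?, ?, ?)", rows)
conn.executemany(
    "INSERT INTO assignment_artificial (emp_id, emp_role, proj_id) VALUES (?, ?, ?)",
    rows,
)

try:
    # Same employee, role and project again: the composite key rejects it.
    conn.execute("INSERT INTO assignment_composite VALUES (1, 'tester', 7)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

artificial_ids = [r[0] for r in conn.execute(
    "SELECT assignment_id FROM assignment_artificial ORDER BY assignment_id")]
print(duplicate_allowed, artificial_ids)
```

Other tables can then refer to an assignment through the single assignment_id column instead of carrying all three attributes.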

What is Strong Entity?


A strong entity is not dependent on any other entity in the schema. A strong
entity will always have a primary key. Strong entities are represented by a
single rectangle. The relationship of two strong entities is represented by a
single diamond. Various strong entities, when combined together, create a
strong entity set.

What is Weak Entity?


A weak entity is dependent on a strong entity to ensure its existence. Unlike a
strong entity, a weak entity does not have a primary key of its own. It instead has a
partial key, also called a discriminator. A weak entity is represented by a double
rectangle. The relation between one strong and one weak entity is represented
by a double diamond. This relationship is also known as an identifying
relationship.
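
One common way to store a weak entity is to combine the owner's primary key with the discriminator. The sketch below assumes Python's built-in sqlite3 module and a hypothetical Customer (strong) and Address (weak) pair, with house_no chosen as the discriminator for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- house_no alone is not unique; it identifies an address only
-- together with the key of the owning customer.
CREATE TABLE address (
    customer_id INTEGER NOT NULL
                REFERENCES customer(customer_id) ON DELETE CASCADE,
    house_no    TEXT NOT NULL,
    city        TEXT NOT NULL,
    PRIMARY KEY (customer_id, house_no)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Ravi')")
conn.execute("INSERT INTO customer VALUES (2, 'Sana')")
conn.execute("INSERT INTO address VALUES (1, '12A', 'Pune')")
conn.execute("INSERT INTO address VALUES (2, '12A', 'Nagpur')")  # same house_no, other owner

# Deleting the strong entity removes the weak entities that depend on it.
conn.execute("DELETE FROM customer WHERE customer_id = 1")
remaining = conn.execute("SELECT customer_id, house_no FROM address").fetchall()
print(remaining)
```

The NOT NULL foreign key reflects the total participation of the weak entity, and ON DELETE CASCADE reflects its existence dependence on the owner.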

Difference between Strong and Weak Entity


Strong Entity | Weak Entity
A strong entity always has a primary key. | A weak entity has a partial discriminator key.
A strong entity is not dependent on any other entity. | A weak entity depends on a strong entity.
A strong entity is represented by a single rectangle. | A weak entity is represented by a double rectangle.
The relationship between two strong entities is represented by a single diamond. | The relation between one strong and one weak entity is represented by a double diamond.
Strong entities have either total participation or partial participation. | A weak entity has a total participation constraint.

Types of Attributes

1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called
the key attribute. For example, Roll_No will be unique for each student. In an ER
diagram, the key attribute is represented by an oval with the attribute name underlined.

Key Attribute

2. Composite Attribute
An attribute composed of many other attributes is called a composite
attribute. For example, the Address attribute of the Student entity type consists
of Street, City, State, and Country. In an ER diagram, the composite attribute is
represented by an oval connected to the ovals of its components.

3. Multivalued Attribute
An attribute that can hold more than one value for a given entity is called a
multivalued attribute. For example, Phone_No (there can be more than one for a
given student). In an ER diagram, a multivalued attribute is represented by a double oval.

4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is known
as a derived attribute, e.g. Age (can be derived from DOB). In an ER diagram,
the derived attribute is represented by a dashed oval.

The Complete Entity Type Student with its Attributes can be represented as:
ER Design Issues
1) Use of Entity Set vs Attributes
The choice between an entity set and an attribute depends on the structure of the real-
world enterprise that is being modeled and the semantics associated with
its attributes. A common mistake is to use the primary key of an
entity set as an attribute of another entity set; a
relationship should be used instead. Another mistake is to designate the primary key
attributes of the related entity sets as attributes of the relationship set,
since they are already implicit in it.

2) Use of Entity Set vs. Relationship Sets


It is difficult to decide whether an object is best expressed by an entity
set or a relationship set. One guideline is to
designate a relationship set for describing an action that
occurs between entities. If an object needs to be represented
as a relationship set, it is better not to mix it with the entity
set.

3) Use of Binary vs n-ary Relationship Sets
Generally, the relationships described in databases are binary
relationships. However, non-binary relationships can be represented by
several binary relationships. For example, we can create a
ternary relationship 'parent' that relates a child to his father and
his mother. Such a relationship can also be represented by two binary
relationships, mother and father, each relating a parent to the child. Thus,
it is possible to represent a non-binary relationship by a set of distinct
binary relationships.

4) Placing Relationship Attributes
The cardinality ratios can be an effective guide for the placement
of relationship attributes. It is better to associate the attributes of
one-to-one or one-to-many relationship sets with one of the participating entity
sets instead of the relationship set. The decision to place an
attribute as a relationship or entity attribute should reflect the
characteristics of the real-world enterprise that is being modeled.

For example, if an attribute is determined by the
combination of participating entity sets rather than by
either entity alone, it must be associated with the
many-to-many relationship set.

Extended E-R Features:
Generalization
Generalization is the process of extracting common properties from a set of
entities and creating a generalized entity from it. It is a bottom-up approach in
which two or more entities can be generalized to a higher-level entity if they
have some attributes in common. For Example, STUDENT and FACULTY
can be generalized to a higher-level entity called PERSON as shown in Figure
1. In this case, common attributes like P_NAME, and P_ADD become part of
a higher entity (PERSON), and specialized attributes like S_FEE become part
of a specialized entity (STUDENT).
Generalization is also called the "bottom-up approach".
Specialization
In specialization, an entity is divided into sub-entities based on its
characteristics. It is a top-down approach where the higher-level
entity is specialized into two or more lower-level entities. For
Example, an EMPLOYEE entity in an Employee management system
can be specialized into DEVELOPER, TESTER, etc. as shown in
Figure 2. In this case, common attributes like E_NAME, E_SAL, etc.
become part of a higher entity (EMPLOYEE), and specialized
attributes like TES_TYPE become part of a specialized entity
(TESTER).
Specialization is also called the "top-down approach".
Inheritance: It is an important feature of generalization and
specialization.
● Attribute inheritance: allows lower-level entities to inherit the
attributes of higher-level entities.
● In the diagram: the Car entity inherits from the Vehicle entity, so Car
can acquire the attributes of Vehicle. For example, Car can
acquire the Model attribute of Vehicle.
● Participation inheritance: relationships involving a higher-level
entity set are also inherited by its lower-level entity sets.
● In the diagram: the Vehicle entity has a relationship with the Cycle entity,
so the lower-level entities Car and Bus also participate in that
relationship, since they inherit from Vehicle.
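
Attribute inheritance can be pictured with ordinary class inheritance. This is only an analogy written in Python, not a database feature; it reuses PERSON, STUDENT and FACULTY from the generalization example, and the f_sal attribute of Faculty is a hypothetical addition.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    """Higher-level entity: holds the common attributes."""
    p_name: str
    p_add: str


@dataclass
class Student(Person):
    """Lower-level entity: inherits p_name and p_add, adds its own fee."""
    s_fee: float


@dataclass
class Faculty(Person):
    """Lower-level entity with a hypothetical salary attribute."""
    f_sal: float


student_attrs = [f.name for f in fields(Student)]
person_attrs = [f.name for f in fields(Person)]
print(student_attrs)
print(person_attrs)
```

Inheritance runs in one direction only: Student gains the attributes of Person, but Person does not gain s_fee.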

Aggregation
An ER diagram is not capable of representing the relationship
between an entity and a relationship which may be required in some
scenarios. In those cases, a relationship with its corresponding entities
is aggregated into a higher-level entity. Aggregation is an abstraction
through which we can represent relationships as higher-level entity
sets.
For example, an employee working on a project may require some
machinery. So, a REQUIRES relationship is needed between the
relationship WORKS_FOR and the entity MACHINERY. Using
aggregation, the WORKS_FOR relationship with its entities
EMPLOYEE and PROJECT is aggregated into a single entity, and the
relationship REQUIRES is created between the aggregated entity and
MACHINERY.
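
One way to carry this aggregation into tables is to let REQUIRES refer to a whole WORKS_FOR row. The sketch below assumes Python's built-in sqlite3 module; the column names and sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee  (emp_id     INTEGER PRIMARY KEY);
CREATE TABLE project   (proj_id    INTEGER PRIMARY KEY);
CREATE TABLE machinery (machine_id INTEGER PRIMARY KEY);

-- The relationship WORKS_FOR between EMPLOYEE and PROJECT.
CREATE TABLE works_for (
    emp_id  INTEGER REFERENCES employee(emp_id),
    proj_id INTEGER REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);

-- REQUIRES links the aggregated (employee, project) pair to MACHINERY.
CREATE TABLE requires (
    emp_id     INTEGER,
    proj_id    INTEGER,
    machine_id INTEGER REFERENCES machinery(machine_id),
    PRIMARY KEY (emp_id, proj_id, machine_id),
    FOREIGN KEY (emp_id, proj_id) REFERENCES works_for(emp_id, proj_id)
);
""")
conn.execute("INSERT INTO employee VALUES (1)")
conn.execute("INSERT INTO project VALUES (10)")
conn.execute("INSERT INTO machinery VALUES (5)")
conn.execute("INSERT INTO works_for VALUES (1, 10)")
conn.execute("INSERT INTO requires VALUES (1, 10, 5)")

try:
    # Employee 1 does not work on project 11, so there is nothing to aggregate.
    conn.execute("INSERT INTO requires VALUES (1, 11, 5)")
    unrelated_pair_allowed = True
except sqlite3.IntegrityError:
    unrelated_pair_allowed = False
print(unrelated_pair_allowed)
```

The composite foreign key is what treats the WORKS_FOR pair as a single higher-level entity.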
How to Convert ER Diagrams to Tables in DBMS
Step 1: Analyze the Entities
● Identify each distinct entity in the diagram represented by a rectangle.
● Examine the attributes associated with each entity which are its
characteristics represented by ellipses within the rectangle.
Step 2: Create Separate Tables for Each Entity
● For each entity, create a corresponding table in the database with
each attribute as a column in the table.
● Assign the primary key as a column in the respective table.
Step 3: Map Relationships
● For one-to-many relationships, add a foreign key in the “many” table
referencing the “one” table’s primary key.
● For many-to-many, create a linking table with foreign keys to both
related tables.
Step 4: Normalize the Tables
● Analyze the tables for redundancies and potential data inconsistencies.
● Apply normalization techniques like decomposition to split tables and
eliminate redundancy, improving data integrity and reducing storage
space.
Step 5: Verify and Refine
● Review the created tables and relationships for accuracy and
efficiency.
● Refine the schema as needed based on further analysis or specific
requirements of your database.
o Entity type becomes a table.
In the given ER diagram, LECTURE, STUDENT, SUBJECT and COURSE
form individual tables.

o All single-valued attributes become columns of the table.

In the STUDENT entity, STUDENT_NAME and STUDENT_ID form the
columns of the STUDENT table. Similarly, COURSE_NAME and COURSE_ID
form the columns of the COURSE table, and so on.

o A key attribute of the entity type is represented by the primary key.

In the given ER diagram, COURSE_ID, STUDENT_ID, SUBJECT_ID, and
LECTURE_ID are the key attributes of the entities.

o The multi-valued attribute is represented by a separate table.

In the student table, hobby is a multi-valued attribute, and it is not possible to
represent multiple values in a single column of the STUDENT table. Hence we
create a table STUD_HOBBY with the columns STUDENT_ID and
HOBBY. Using both columns, we create a composite key.

o A composite attribute is represented by its components.

In the given ER diagram, student address is a composite attribute. It contains
CITY, PIN, DOOR#, STREET, and STATE. In the STUDENT table, each of these
attributes becomes an individual column.

o Derived attributes are not stored in the table.
In the STUDENT table, Age is a derived attribute. It can be calculated at any
point in time as the difference between the current date and the Date of
Birth.
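
The mapping rules above can be sketched for the STUDENT entity. This assumes Python's built-in sqlite3 module; the sample student, the address values and the age_on helper are made up for the illustration, and DOOR# is written as door_no.

```python
import sqlite3
from datetime import date


def age_on(dob: date, today: date) -> int:
    """Derived attribute: whole years between the date of birth and today."""
    before_birthday = (today.month, today.day) < (dob.month, dob.day)
    return today.year - dob.year - before_birthday


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (
    student_id   INTEGER PRIMARY KEY,   -- key attribute
    student_name TEXT NOT NULL,         -- single-valued attribute
    dob          TEXT NOT NULL,         -- ISO date; Age is derived, so not stored
    door_no TEXT, street TEXT, city TEXT, state TEXT, pin TEXT  -- composite address
);
CREATE TABLE stud_hobby (               -- multi-valued attribute gets its own table
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    hobby      TEXT NOT NULL,
    PRIMARY KEY (student_id, hobby)     -- composite key
);
""")
conn.execute(
    "INSERT INTO student VALUES (1, 'Robert', '2004-06-15', "
    "'12', 'MG Road', 'Nagpur', 'MH', '440001')"
)
conn.executemany("INSERT INTO stud_hobby VALUES (?, ?)", [(1, "chess"), (1, "cricket")])

hobbies = [r[0] for r in conn.execute(
    "SELECT hobby FROM stud_hobby WHERE student_id = 1 ORDER BY hobby")]
dob = date.fromisoformat(
    conn.execute("SELECT dob FROM student WHERE student_id = 1").fetchone()[0])
age = age_on(dob, date(2024, 6, 14))  # one day before the 20th birthday
print(hobbies, age)
```

Computing the age at query time keeps it correct as the current date changes, which is the reason derived attributes are usually left out of the table.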

Entity in DBMS
An entity is referred to as an object or thing that exists in the real world. For
example, customer, car, pen, etc.

Entities are stored in the database, and they should be distinguishable, i.e., they
should be easily identifiable within a group. For example, identical pens
from the same company cannot be told apart, so they are only objects, but
pens with different colors become unique and can be called entities, like a red
pen, green pen, blue pen, black pen, etc.

In a group of pens, we can easily identify any pen by its
color, so a pen of a distinct color is an entity.

To extract data from the database, each item of data must be unique in some way
so that items are easy to differentiate. Distinct and unique
data is known as an entity.
An entity has some attributes which depict the entity's characteristics. For
example, an entity "Student" has attributes such as "Student_roll_no",
"Student_name", "Student_subject", and "Student_marks".

Example of Entity in DBMS in tabular form:


Student_rollno | Student_name | Student_subject | Student_marks
1 | Robert | English | 85
2 | Parker | Mathematics | 75
3 | Harry | Science | 80
4 | George | Geography | 70

Some entities are related to other entities in the database. For example, the
"Student" entity is related to the "University" entity. The ERD (Entity
Relationship Diagram) is used to visually show the relationships
between several entities.

Kinds of Entity:

There are two kinds of entities, which are as follows:

1. Tangible Entity:

It is an entity in DBMS that is a physical object we can touch or see. In
simple words, an entity that has a physical existence in the real world is called
a tangible entity.

For example, a table in a database represents a tangible entity when it
describes physical objects that we can see and touch in the real world, such as
colleges, bank lockers, mobiles, cars, watches, pens, paintings, etc.

2. Intangible Entity:

It is an entity in DBMS that is a non-physical object we cannot see or
touch. In simple words, an entity that does not have any physical existence in
the real world is known as an intangible entity.

For example, a bank account logically exists, but we cannot see or touch it.

Entity Type:

A collection of entities with general characteristics is known as an entity type.

For example, a database of a corporate company has entity types such as
employees, departments, etc. In DBMS, every entity type contains a set of
attributes that explain the entity.

The Employee entity type can have attributes such as name, age, address,
phone number, and salary.

The Department entity type can have attributes such as name, number, and
location.

Kinds of Entity Type


There are two kinds of entity type, which are as follows:

1. Strong Entity Type: It is an entity that has its own existence and is
independent.

The entity relationship diagram represents a strong entity type with the help of
a single rectangle. Below is the ERD of the strong entity type:

In the above example, the "Customer" is the entity type with attributes such
as ID, Name, Gender, and Phone Number. Customer is a strong entity type as
it has a unique ID for each customer.

2. Weak Entity Type: It is an entity that does not have its own existence and
relies on a strong entity for its existence.
The Entity Relationship Diagram represents the weak entity type using
double rectangles. Below is the ERD of the weak entity type:

In the above example, "Address" is a weak entity type with attributes such as
House No., City, Location, and State.

The relationship between a strong and a weak entity type is known as an
identifying relationship.

The Entity-Relationship Diagram uses a double diamond to represent a
relationship between the strong and the weak entity type.

Let us see an example of the relationship between the Strong entity type and
weak entity type with the help of ER Diagram:

Entity Set

An entity set is a group of entities of the same entity type.

For example, an entity set of students, an entity set of motorbikes, an entity
set of smartphones, an entity set of customers, etc.

Entity sets can be classified into two types:

1. Strong Entity Set:


In a DBMS, a strong entity set has a primary key.

For example, an entity set of motorbikes with the attributes motorbike's
registration number, motorbike's name, motorbike's model, and motorbike's
colour. The registration number can serve as the primary key.

Below is the representation of a strong entity set in tabular form:

Example of Entity Relationship Diagram representation of the above strong entity set:

2. Weak Entity Set:

In a DBMS, a weak entity set does not contain a primary key.

For example, an entity set of smartphones with the attributes phone's name,
phone's colour, and phone's RAM.

Below is the representation of a weak entity set in tabular form:

Example of Entity Relationship Diagram representation of the above weak
entity set:

How to Draw an Entity Relation Diagram (ERD)

A step-by-step process to draw an entity relation diagram (ERD) is:

Step 1: Identifying Entities

Determine the main objects you want to represent in the database, e.g., “students”,
“courses”, or “products”.

Step 2: Defining Attributes

Identify the properties (attributes) of each entity. These attributes provide
more details about an entity.

Step 3: Specifying Relationships

Create relationships between entities to specify how entities interact with each other.
Relationships are verbs like “teaches”, “studies”, or “sells”.

Step 4: Drawing Entities

Draw each entity as a rectangle and write its name inside.

Step 5: Adding Attributes

To add the attributes of an entity, write them inside the rectangle or connect them to
it with lines.

Step 6: Connecting Entities

Draw lines between the related entities to represent their connection.

Step 7: Specifying Cardinality

Indicate the minimum and maximum number of relationship instances associated with an
entity using notations like crow’s foot.

Step 8: Organizing ER Diagram

Organize all entities and relationships in a clean way for better readability and
understanding.
UNIT THREE

What is the Relational Model?


The relational model represents how data is stored in Relational Databases.
A relational database consists of a collection of tables, each of which is assigned a
unique name. Consider a relation STUDENT with attributes ROLL_NO, NAME,
ADDRESS, PHONE, and AGE shown in the table.
Table Student
ROLL_NO NAME ADDRESS PHONE AGE

1 RAM DELHI 9455123451 18

2 RAMESH GURGAON 9652431543 18

3 SUJIT ROHTAK 9156253131 20

4 SURESH DELHI 18

Important Terminologies
• Attribute: Attributes are the properties that define an entity.
e.g.; ROLL_NO, NAME, ADDRESS
• Relation Schema: A relation schema defines the structure of the relation
and represents the name of the relation with its attributes. e.g.; STUDENT
(ROLL_NO, NAME, ADDRESS, PHONE, and AGE) is the relation schema
for STUDENT. If a schema has more than 1 relation, it is called Relational
Schema.
• Tuple: Each row in the relation is known as a tuple. The above relation
contains 4 tuples, one of which is shown as:

1 RAM DELHI 9455123451 18


• Relation Instance: The set of tuples of a relation at a particular instance
of time is called a relation instance. Table 1 shows the relation instance of
STUDENT at a particular time. It can change whenever there is an insertion,
deletion, or update in the database.
• Degree: The number of attributes in the relation is known as the degree of
the relation. The STUDENT relation defined above has degree 5.
• Cardinality: The number of tuples in a relation is known as cardinality.
The STUDENT relation defined above has cardinality 4.
• Column: The column represents the set of values for a particular attribute.
The column ROLL_NO extracted from the relation STUDENT is:
ROLL_NO

1

2

3

4

• NULL Values: A value that is not known or is unavailable is called a
NULL value. It is represented by a blank space in the table. e.g.; PHONE of
STUDENT having ROLL_NO 4 is NULL.
• Relation Key: These are the keys that are used to identify the rows of a
relation uniquely or to relate one table to another. They are of the following
types.
o Primary Key
o Candidate Key
o Super Key
o Foreign Key
o Alternate Key
o Composite Key

Constraints in Relational Model


While designing the Relational Model, we define some conditions that must hold
for the data present in the database. These conditions are called Constraints.
They are checked before performing any operation (insertion, deletion, or update)
in the database. If any of the constraints is violated, the operation will fail.
Domain Constraints
These are attribute-level constraints. An attribute can only take values that lie
inside the domain range. e.g.; If a constraint AGE>0 is applied to STUDENT
relation, inserting a negative value of AGE will result in failure.
Key Integrity
Every relation in the database should have at least one set of attributes that
identifies a tuple uniquely. Such a set of attributes is called a key. e.g.;
ROLL_NO in STUDENT is a key. No two students can have the same roll number. So a
key has two properties:
• It should be unique for all tuples.
• It can’t have NULL values.
Referential Integrity
When one attribute of a relation can only take values from another attribute of the
same relation or any other relation, it is called referential integrity. Let us suppose
we have 2 relations
Table Student
ROLL_NO NAME ADDRESS PHONE AGE BRANCH_CODE

1 RAM DELHI 9455123451 18 CS

2 RAMESH GURGAON 9652431543 18 CS

3 SUJIT ROHTAK 9156253131 20 ECE

4 SURESH DELHI 18 IT

Table Branch
BRANCH_CODE BRANCH_NAME

CS COMPUTER SCIENCE

IT INFORMATION TECHNOLOGY

ECE ELECTRONICS AND COMMUNICATION ENGINEERING

CV CIVIL ENGINEERING

BRANCH_CODE of STUDENT can only take the values which are present in
BRANCH_CODE of BRANCH which is called referential integrity constraint. The
relation which is referencing another relation is called REFERENCING
RELATION (STUDENT in this case) and the relation to which other relations
refer is called REFERENCED RELATION (BRANCH in this case).
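
The three kinds of constraints described above can be tried out with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient way to run SQL; the column types, the reduced column list, and the helper name try_insert are assumptions made for this illustration, not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE BRANCH (
    BRANCH_CODE TEXT PRIMARY KEY,
    BRANCH_NAME TEXT NOT NULL
);
CREATE TABLE STUDENT (
    ROLL_NO     INTEGER PRIMARY KEY,                   -- key integrity
    NAME        TEXT NOT NULL,
    AGE         INTEGER CHECK (AGE > 0),               -- domain constraint
    BRANCH_CODE TEXT REFERENCES BRANCH(BRANCH_CODE)    -- referential integrity
);
INSERT INTO BRANCH VALUES ('CS', 'COMPUTER SCIENCE'), ('IT', 'INFORMATION TECHNOLOGY');
INSERT INTO STUDENT VALUES (1, 'RAM', 18, 'CS');
""")

def try_insert(row):
    """Return 'ok' if the row is accepted, otherwise the reason it was rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = {
    "domain": try_insert((2, "RAMESH", -5, "CS")),       # AGE must be > 0
    "key": try_insert((1, "SUJIT", 20, "IT")),           # ROLL_NO 1 already exists
    "referential": try_insert((3, "SURESH", 18, "ME")),  # 'ME' is not in BRANCH
    "valid": try_insert((4, "SURESH", 18, "IT")),
}
for name, outcome in results.items():
    print(name, "->", outcome)
```

Only the last row satisfies all three constraints, so it is the only one that is stored.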
Advantages of the Relational Model
• Simple model: The relational model is simple and easy to use in
comparison to other data models.
• Flexible: The relational model is more flexible than most other
data models.
• Secure: The relational model is more secure than most other data
models.
• Data Accuracy: Data is more accurate in the relational data
model.
• Data Integrity: The integrity of the data is maintained in the
relational model.
• Operations can be Applied Easily: Operations are easy to perform
in the relational model.
Disadvantages of the Relational Model
• The relational model can perform poorly for very large databases.
• Sometimes, it becomes difficult to find the relation between tables.
• When the structure is complex (many tables that must be joined), the
response time for queries can be high.
Characteristics of the Relational Model
• Data is represented in tables of rows and columns, called relations.
• Tables can be related to one another; the data stored this way
forms the relational model.
• The relational model supports operations like Data definition,
Data manipulation, and Transaction management.
• Each column has a distinct name and represents an attribute.
• Each row represents a single entity.

Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to
obtain the result of the query. It uses operators to perform queries.
Types of Relational operation

1. Select Operation:

o The select operation selects tuples that satisfy a given predicate.


o It is denoted by sigma (σ).
Notation: σ p(r)
Where:

σ denotes the selection operation
r is the relation
p is the selection predicate, a propositional logic formula which may use
connectors like AND, OR, and NOT, and relational operators
like =, ≠, ≥, <, >, ≤.
For example: LOAN Relation
BRANCH_NAME LOAN_NO AMOUNT

Downtown L-17 1000

Redwood L-23 2000

Perryride L-15 1500

Downtown L-14 1500

Mianus L-13 500

Roundhill L-11 900

Perryride L-16 1300

Input:

1. σ BRANCH_NAME="Perryride" (LOAN)
Output:

BRANCH_NAME LOAN_NO AMOUNT

Perryride L-15 1500

Perryride L-16 1300
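
The select operation can be sketched in a few lines of Python. This is only an illustration: the relation is held as a list of dictionaries, and the function name select is chosen here, not taken from any library.

```python
# The LOAN relation from the example, one dictionary per tuple.
LOAN = [
    {"BRANCH_NAME": "Downtown",  "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "Redwood",   "LOAN_NO": "L-23", "AMOUNT": 2000},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "Downtown",  "LOAN_NO": "L-14", "AMOUNT": 1500},
    {"BRANCH_NAME": "Mianus",    "LOAN_NO": "L-13", "AMOUNT": 500},
    {"BRANCH_NAME": "Roundhill", "LOAN_NO": "L-11", "AMOUNT": 900},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]

def select(relation, predicate):
    """sigma_predicate(relation): keep only the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

perryride = select(LOAN, lambda t: t["BRANCH_NAME"] == "Perryride")
# Predicates can combine conditions with and / or / not.
big_downtown = select(LOAN, lambda t: t["BRANCH_NAME"] == "Downtown" and t["AMOUNT"] > 1000)

for row in perryride:
    print(row["BRANCH_NAME"], row["LOAN_NO"], row["AMOUNT"])
```

Note that the comparison is case-sensitive, so the predicate must spell the branch name exactly as it is stored.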

2. Project Operation:

o This operation shows the list of those attributes that we wish to appear in the
result. Rest of the attributes are eliminated from the table.
o It is denoted by ∏.

Notation: ∏ A1, A2, ..., An (r)

Where

A1, A2, ..., An are attribute names of relation r.

Example: CUSTOMER RELATION

NAME STREET CITY

Jones Main Harrison

Smith North Rye

Hays Main Harrison

Curry North Rye

Johnson Alma Brooklyn

Brooks Senator Brooklyn

Input:

1. ∏ NAME, CITY (CUSTOMER)


Output:
NAME CITY

Jones Harrison

Smith Rye

Hays Harrison

Curry Rye

Johnson Brooklyn

Brooks Brooklyn
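
A minimal sketch of projection, assuming the relation is stored as a list of plain tuples with a separate tuple of attribute names (the function name project is illustrative). Because the result of a relational algebra operation is a set, projecting onto fewer attributes can produce duplicates, which are removed.

```python
ATTRS = ("NAME", "STREET", "CITY")
CUSTOMER = [
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Hays", "Main", "Harrison"),
    ("Curry", "North", "Rye"),
    ("Johnson", "Alma", "Brooklyn"),
    ("Brooks", "Senator", "Brooklyn"),
]

def project(relation, attrs, wanted):
    """pi_wanted(relation): keep the wanted attributes and drop duplicate tuples."""
    positions = [attrs.index(a) for a in wanted]
    result = []
    for row in relation:
        reduced = tuple(row[i] for i in positions)
        if reduced not in result:  # a relation is a set, so duplicates are dropped
            result.append(reduced)
    return result

name_city = project(CUSTOMER, ATTRS, ("NAME", "CITY"))  # six tuples, as in the output above
cities = project(CUSTOMER, ATTRS, ("CITY",))            # duplicates collapse to three tuples
print(cities)
```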

3. Union Operation:

o Suppose there are two relations R and S. The union operation contains all the
tuples that are either in R or S or both in R & S.
o It eliminates the duplicate tuples. It is denoted by ∪.
Notation: R ∪ S
A union operation must hold the following conditions:

o R and S must have the same number of attributes, with compatible domains.
o Duplicate tuples are eliminated automatically.

Example:
DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO

Johnson A-101

Smith A-121

Mayes A-321

Turner A-176

Johnson A-273

Jones A-472

Lindsay A-284

BORROW RELATION
CUSTOMER_NAME LOAN_NO

Jones L-17

Smith L-23

Hayes L-15

Jackson L-14

Curry L-93

Smith L-11

Williams L-17

Input:

1. ∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)


Output:
CUSTOMER_NAME

Johnson

Smith

Hayes

Turner

Jones

Lindsay

Jackson

Curry

Williams

Mayes

4. Set Intersection:

o Suppose there are two relations R and S. The set intersection operation
contains all tuples that are in both R & S.
o It is denoted by ∩.

1. Notation: R ∩ S
Example: Using the above DEPOSITOR table and BORROW table
Input:

1. ∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)


Output:

CUSTOMER_NAME

Smith

Jones

5. Set Difference:

o Suppose there are two relations R and S. The set difference operation
contains all tuples that are in R but not in S.
o It is denoted by minus (-).
1. Notation: R - S
Example: Using the above DEPOSITOR table and BORROW table

Input:

1. ∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)


Output:
CUSTOMER_NAME

Jackson

Hayes

Williams

Curry
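
The union, intersection, and difference examples above map directly onto Python's built-in set type. The sketch below projects CUSTOMER_NAME from both relations first (writing the names into sets already removes the duplicate Johnson and Smith entries) and then applies the three operators.

```python
# CUSTOMER_NAME projected from DEPOSITOR and from BORROW.
depositor = {"Johnson", "Smith", "Mayes", "Turner", "Jones", "Lindsay"}
borrow = {"Jones", "Smith", "Hayes", "Jackson", "Curry", "Williams"}

union = borrow | depositor          # customers with a loan, an account, or both
intersection = borrow & depositor   # customers with both a loan and an account
difference = borrow - depositor     # customers with a loan but no account

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Difference is not symmetric: depositor - borrow would instead give the customers who have an account but no loan.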

6. Cartesian product

o The Cartesian product is used to combine each row in one table with each row
in the other table. It is also known as a cross product.
o It is denoted by X.
1. Notation: E X D
Example:
EMPLOYEE

EMP_ID EMP_NAME EMP_DEPT

1 Smith A

2 Harry C

3 John B

DEPARTMENT
DEPT_NO DEPT_NAME

A Marketing

B Sales

C Legal

Input:

1. EMPLOYEE X DEPARTMENT
Output:
EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME

1 Smith A A Marketing

1 Smith A B Sales

1 Smith A C Legal

2 Harry C A Marketing

2 Harry C B Sales

2 Harry C C Legal

3 John B A Marketing

3 John B B Sales

3 John B C Legal
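
As a rough illustration, the Cartesian product can be built with itertools.product from the Python standard library. The second step, filtering the product on EMP_DEPT = DEPT_NO, is an addition for illustration: it shows how a product followed by a selection keeps only the meaningful pairs, which is the idea behind a join.

```python
from itertools import product

EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]   # EMP_ID, EMP_NAME, EMP_DEPT
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]     # DEPT_NO, DEPT_NAME

# EMPLOYEE X DEPARTMENT: every employee tuple paired with every department tuple.
cross = [emp + dept for emp, dept in product(EMPLOYEE, DEPARTMENT)]

# Selecting the rows where EMP_DEPT (index 2) equals DEPT_NO (index 3).
matched = [row for row in cross if row[2] == row[3]]

print(len(cross))   # 3 employees x 3 departments
for row in matched:
    print(row)
```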

7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).

Example: We can use the rename operator to rename STUDENT relation to
STUDENT1.

1. ρ(STUDENT1, STUDENT)
SQL Set Operation
The SQL set operations are used to combine the results of two or more SQL SELECT
statements.
Types of Set Operation
1. Union
2. Union All
3. Intersect
4. Minus

1. Union

o The SQL Union operation is used to combine the result of two or more SQL
SELECT queries.
o In the union operation, the number of columns and their data types must be
the same in both SELECT queries on which UNION is being applied.
o The union operation eliminates the duplicate rows from its result set.
Syntax

SELECT column_name FROM table1 UNION SELECT column_name FROM table2;

2. Union All
The Union All operation is similar to the Union operation, but it returns the
combined rows without removing duplicates and without sorting the data.

Syntax:

SELECT column_name FROM table1 UNION ALL SELECT column_name FROM table2;

3. Intersect

o It is used to combine two SELECT statements. The Intersect operation returns
the common rows from both the SELECT statements.
o In the Intersect operation, the number of columns and their data types must
be the same.
o It has no duplicates. Some systems return the rows in ascending order, but
the order is only guaranteed when an ORDER BY clause is used.
Syntax

SELECT column_name FROM table1 INTERSECT SELECT column_name FROM table2;

4. Minus

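o It combines the result of two SELECT statements. The Minus operator is used to
display the rows which are present in the first query but absent in the second
query.
o It has no duplicates. Some systems return the rows in ascending order, but
the order is only guaranteed when an ORDER BY clause is used.
o MINUS is the Oracle keyword; standard SQL and systems such as PostgreSQL,
SQL Server, and SQLite use EXCEPT for the same operation.
Syntax:

SELECT column_name FROM table1 MINUS SELECT column_name FROM table2;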
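
The four set operations can be compared side by side with the sketch below. It runs the SQL through Python's built-in sqlite3 module, so the difference is written with EXCEPT rather than MINUS (SQLite does not accept MINUS). The tables hold only a few rows of the DEPOSITOR and BORROW data, and the helper name names is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPOSITOR (CUSTOMER_NAME TEXT, ACCOUNT_NO TEXT);
CREATE TABLE BORROW    (CUSTOMER_NAME TEXT, LOAN_NO TEXT);
INSERT INTO DEPOSITOR VALUES ('Johnson', 'A-101'), ('Smith', 'A-121'), ('Jones', 'A-472');
INSERT INTO BORROW VALUES ('Jones', 'L-17'), ('Smith', 'L-23'), ('Hayes', 'L-15'), ('Smith', 'L-11');
""")

def names(set_operator):
    """Combine the two name lists with the given operator, sorted explicitly."""
    sql = (
        f"SELECT CUSTOMER_NAME FROM BORROW {set_operator} "
        "SELECT CUSTOMER_NAME FROM DEPOSITOR ORDER BY CUSTOMER_NAME"
    )
    return [row[0] for row in conn.execute(sql)]

union = names("UNION")          # duplicates removed
union_all = names("UNION ALL")  # duplicates kept
intersect = names("INTERSECT")  # names present in both tables
except_ = names("EXCEPT")       # in BORROW but not in DEPOSITOR

print(union)
print(union_all)
print(intersect)
print(except_)
```

The ORDER BY at the end applies to the combined result; without it the row order should not be relied on.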

SQL Natural Join



Natural join is an SQL join operation that joins tables on the basis of their
common columns, i.e., columns that have the same name in both tables. To perform
a natural join there should be at least one common attribute (column) between the
two tables. Conceptually, it works in the three steps listed under Features of
Natural Join.
Syntax:
We will perform the natural join query by using the following syntax.
SELECT *
FROM TABLE1
NATURAL JOIN TABLE2;
Features of Natural Join
Conceptually, a natural join works as follows.
1. It performs the Cartesian product of the two tables.
2. It keeps the consistent tuples (those whose values agree on the common
columns) and deletes the inconsistent ones.
3. Then it deletes the duplicate attributes.

Difference between Natural Join and Inner Join


Natural Join joins two tables based on attributes that have the same name and
data type. The resulting table contains all the attributes of both tables but
keeps only one copy of each common column. Inner Join joins two tables on the
basis of the column that is explicitly specified in the ON clause. The resulting
table contains all the attributes from both tables, including both copies of the
common column.
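
The difference in the result columns can be observed with a small sqlite3 sketch. It assumes both tables call the department column DEPT_NO (the EMPLOYEE table earlier in these notes calls it EMP_DEPT, which a natural join would not match), and the helper name run is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE   (EMP_ID INTEGER, EMP_NAME TEXT, DEPT_NO TEXT);
CREATE TABLE DEPARTMENT (DEPT_NO TEXT, DEPT_NAME TEXT);
INSERT INTO EMPLOYEE VALUES (1, 'Smith', 'A'), (2, 'Harry', 'C'), (3, 'John', 'B');
INSERT INTO DEPARTMENT VALUES ('A', 'Marketing'), ('B', 'Sales'), ('C', 'Legal');
""")

def run(sql):
    """Execute a query and return (column names, rows)."""
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()

# NATURAL JOIN matches on DEPT_NO implicitly and returns that column once.
natural_cols, natural_rows = run("SELECT * FROM EMPLOYEE NATURAL JOIN DEPARTMENT")

# INNER JOIN needs the condition spelled out and returns DEPT_NO from both tables.
inner_cols, inner_rows = run(
    "SELECT * FROM EMPLOYEE INNER JOIN DEPARTMENT "
    "ON EMPLOYEE.DEPT_NO = DEPARTMENT.DEPT_NO"
)

print(len(natural_cols), "columns from NATURAL JOIN")
print(len(inner_cols), "columns from INNER JOIN")
```

Both queries return the same three matching rows; only the number of columns differs.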

In a relational DBMS, we follow the principles of normalization, which let us
split large tables into smaller ones. By using joins in a SELECT statement, we
can reconstruct the larger table. Outer joins are of the following three types.
1. Left outer join
2. Right outer join
3. Full outer join

Types of outer join :


1. Left Outer Join : The left join operation returns all records from the left
table and the matching records from the right table. When no matching record is
found in the right table, NULL is returned for the right table's columns.

Syntax :
SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;
2. Right Outer Join : The right join operation returns all records from the right
table and the matching records from the left table. When no matching record is
found in the left table, NULL is returned for the left table's columns.
Syntax :
SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;
3. Full Outer Join : The full outer join returns all records from both the left
and the right table. Records without a match on the other side are returned with
NULL in the other table's columns.

Syntax:
SELECT column_name
FROM table1
FULL OUTER JOIN table2
ON table1.columnName = table2.columnName
WHERE condition;
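
A short sketch of the left outer join, again run through Python's sqlite3 module. The rows are invented for the example: employee John is given a department code 'Z' that does not exist in DEPARTMENT so that the NULL padding is visible. Only LEFT JOIN is shown because older SQLite versions do not support RIGHT and FULL OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE   (EMP_ID INTEGER, EMP_NAME TEXT, DEPT_NO TEXT);
CREATE TABLE DEPARTMENT (DEPT_NO TEXT, DEPT_NAME TEXT);
INSERT INTO EMPLOYEE VALUES (1, 'Smith', 'A'), (2, 'Harry', 'C'), (3, 'John', 'Z');
INSERT INTO DEPARTMENT VALUES ('A', 'Marketing'), ('B', 'Sales'), ('C', 'Legal');
""")

rows = conn.execute("""
    SELECT EMPLOYEE.EMP_NAME, DEPARTMENT.DEPT_NAME
    FROM EMPLOYEE
    LEFT JOIN DEPARTMENT ON EMPLOYEE.DEPT_NO = DEPARTMENT.DEPT_NO
    ORDER BY EMPLOYEE.EMP_ID
""").fetchall()

# Every employee appears; John has no matching department, so DEPT_NAME is NULL (None).
for emp_name, dept_name in rows:
    print(emp_name, dept_name)
```

Swapping the two tables in the query gives the effect of a right outer join: every department would appear, and Sales would be padded with NULL because no employee belongs to it.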

SQL | DIVISION
Division is typically required when you want to find the entities that are
related to all entities of another set. SQL has no built-in division operator, so
division is expressed with other constructs; it is needed when we have to
evaluate queries that contain the keyword ‘all’.
When to Use the Division Operator
You typically require the division operator in scenarios where you need to ascertain
entities that meet all specified conditions across different datasets. Here are a few
common examples:
● Identifying people who have accounts in every bank within a particular city.
● Determining students who have enrolled in all necessary courses to qualify for
graduation.
These use cases involve checking against all elements of a subset, thereby
necessitating a division-like operation.
In all these queries, the description after the keyword ‘all’ defines a set which
contains some elements and the final result contains those units who satisfy these
requirements.

SQL DIVISION Implementation Techniques

Given two relations(tables): R(x,y), S(y).


● R and S: tables
● x and y: columns of R
● y: column of S
R(x,y) div S(y) gives all distinct values of x from R that are associated with
all values of y in S.
Computation of Division: R(x,y) div S(y)
Method 1: Using Cross Join and EXCEPT
Steps:
● Step 1: Find out all possible combinations of S(y) with R(x) by computing R(x)
x(cross join) S(y), say r1
● Step 2: Subtract actual R(x,y) from r1, say r2
● Step 3: x in r2 are those that are not associated with every value in S(y);
therefore R(x)-r2(x) gives us x that are associated with all values in S.
Query:
SELECT DISTINCT x FROM R
WHERE x NOT IN (
    SELECT x FROM (
        (SELECT x, y FROM (SELECT y FROM S) AS p
            CROSS JOIN (SELECT DISTINCT x FROM R) AS sp)
        EXCEPT
        (SELECT x, y FROM R)
    ) AS r
);
Method 2: Using Correlated Subquery and NOT EXISTS
SELECT DISTINCT sx.x FROM R AS sx
WHERE NOT EXISTS (
    (SELECT p.y FROM S AS p)
    EXCEPT
    (SELECT sp.y FROM R AS sp WHERE sp.x = sx.x)
);
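
Both methods can be tried on a small data set. The sketch below is an illustration under stated assumptions: R(x, y) records which student (x) has enrolled in which course (y), S(y) lists the required courses, and all names and rows are invented for the example. The queries run through Python's sqlite3 module and are written without parentheses around the operands of EXCEPT, because SQLite does not accept them there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R (x TEXT, y TEXT);   -- student x has enrolled in course y
CREATE TABLE S (y TEXT);           -- the required courses
INSERT INTO R VALUES ('Amit', 'DBMS'), ('Amit', 'OS'), ('Rohit', 'DBMS'),
                     ('Sana', 'DBMS'), ('Sana', 'OS'), ('Sana', 'CN');
INSERT INTO S VALUES ('DBMS'), ('OS');
""")

# Method 1: all possible (x, y) pairs minus the actual pairs gives the missing
# pairs; any x that has a missing pair is excluded from the answer.
method1 = [row[0] for row in conn.execute("""
    SELECT DISTINCT x FROM R
    WHERE x NOT IN (
        SELECT x FROM (
            SELECT sp.x AS x, p.y AS y
            FROM (SELECT DISTINCT x FROM R) AS sp CROSS JOIN S AS p
            EXCEPT
            SELECT x, y FROM R
        )
    )
    ORDER BY x
""")]

# Method 2: keep x if no required y is left over after removing the y values of x.
method2 = [row[0] for row in conn.execute("""
    SELECT DISTINCT sx.x FROM R AS sx
    WHERE NOT EXISTS (
        SELECT p.y FROM S AS p
        EXCEPT
        SELECT sp.y FROM R AS sp WHERE sp.x = sx.x
    )
    ORDER BY sx.x
""")]

print(method1)
print(method2)
```

Rohit is missing OS and is therefore dropped; Sana qualifies even though she has also taken a course that is not in S.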
Extended Operators in Relational Algebra
Last Updated : 05 Jul, 2024



Extended operators are those operators which can be derived from the basic
operators of relational algebra. There are mainly three types of extended
operators in Relational Algebra:
● Join
● Intersection
● Divide

Intersection (∩): Intersection on two relations R1 and R2 can only be computed if
R1 and R2 are union compatible (the two relations should have the same number of
attributes, and corresponding attributes in the two relations should have the
same domain). The intersection operator applied to two relations as R1∩R2 gives a
relation with the tuples which are in R1 as well as R2. Syntax:
Relation1 ∩ Relation2
Example: Find a person who is a student as well as an employee:
STUDENT ∩ EMPLOYEE

Conditional Join(⋈c): Conditional Join is used when you want to join two or
more relations based on some condition. Example: Select students whose
ROLL_NO is greater than EMP_NO of employees
STUDENT ⋈_{STUDENT.ROLL_NO > EMPLOYEE.EMP_NO} EMPLOYEE
Equijoin(⋈): Equijoin is a special case of conditional join where only an equality
condition holds between a pair of attributes. Because the two attributes hold
equal values in every result tuple, one of them is redundant and is often omitted
from the result. Example:
Select students whose ROLL_NO is equal to EMP_NO of employees.
STUDENT ⋈_{STUDENT.ROLL_NO = EMPLOYEE.EMP_NO} EMPLOYEE

Natural Join(⋈): It is a special case of equijoin in which the equality condition
holds on all attributes which have the same name in relations R and S (the
relations on which the join operation is applied). While applying natural join on
two relations, there is no need to write the equality condition explicitly.
Natural Join also returns the common attributes only once, as their values are
the same in the resulting relation. Example: Select students whose ROLL_NO is
equal to ROLL_NO of STUDENT_SPORTS as:
STUDENT ⋈ STUDENT_SPORTS

Left Outer Join(⟕): When a join is applied to two relations R and S, the tuples
of R or S that do not satisfy the join condition do not appear in the result set.
A left outer join, however, keeps all tuples of R in the result set. The tuples
of R which do not satisfy the join condition have NULL values for the attributes
of S. Example: Select students whose ROLL_NO is greater than EMP_NO of employees,
and the details of other students as well
STUDENT ⟕_{STUDENT.ROLL_NO > EMPLOYEE.EMP_NO} EMPLOYEE
Right Outer Join(⟖): When a join is applied to two relations R and S, the tuples
of R or S that do not satisfy the join condition do not appear in the result set.
A right outer join, however, keeps all tuples of S in the result set. The tuples
of S which do not satisfy the join condition have NULL values for the attributes
of R. Example: Select students whose ROLL_NO is greater than EMP_NO of
employees, and the details of other employees as well
STUDENT ⟖_{STUDENT.ROLL_NO > EMPLOYEE.EMP_NO} EMPLOYEE
Full Outer Join(⟗): When a join is applied to two relations R and S, the tuples
of R or S that do not satisfy the join condition do not appear in the result set.
A full outer join, however, keeps all tuples of S and all tuples of R in the
result set. The tuples of S which do not satisfy the join condition have NULL
values for the attributes of R, and vice versa. Example: Select students whose
ROLL_NO is greater than EMP_NO of employees, and the details of other employees
and other students as well
STUDENT ⟗_{STUDENT.ROLL_NO > EMPLOYEE.EMP_NO} EMPLOYEE
UNIT FOUR

Functional Dependency
If one piece of information stored in a table uniquely determines another piece
of information in the same table, the relationship is called a Functional
Dependency. Consider it as an association between two attributes of the same
relation.
If P functionally determines Q, then
P -> Q
Let us see an example −
<Employee>
EmpID EmpName EmpAge
E01 Amit 28
E02 Rohit 31
In the above table, EmpName is functionally dependent
on EmpID because EmpName can take only one value for the given value
of EmpID:
EmpID -> EmpName
The same is displayed below −

Fully Functional Dependency
An attribute is fully functionally dependent on a set of attributes if it is
functionally dependent on that set and not on any of its proper subsets.
For example, an attribute Q is fully functionally dependent on an attribute set
P if it is functionally dependent on P and not on any proper subset of P.
Let us see an example −
<ProjectCost>
ProjectID ProjectCost
001 1000
002 5000
<EmployeeProject>
EmpID ProjectID Days (spent on the
project)
E099 001 320
E056 002 190
The above relations state:
EmpID, ProjectID, ProjectCost -> Days
However, Days is not fully functionally dependent on this set, because the
proper subset {EmpID, ProjectID} is already enough to determine the {Days} spent
on the project by the employee.
This gives our fully functional dependency −
{EmpID, ProjectID} -> {Days}
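
The two definitions can be checked mechanically on sample data. The following is a minimal sketch: the function names holds and fully_dependent are chosen here, and the four rows are hypothetical (the two-row table above is too small to show the difference). Keep in mind that a check on sample rows can only disprove a dependency; whether it holds in general is a statement about the meaning of the data.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """True if lhs -> rhs holds in rows: equal lhs values always give equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def fully_dependent(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False
    return True

# Hypothetical rows: each employee works on both projects for a different number of days.
rows = [
    {"EmpID": "E099", "ProjectID": "001", "ProjectCost": 1000, "Days": 320},
    {"EmpID": "E099", "ProjectID": "002", "ProjectCost": 5000, "Days": 100},
    {"EmpID": "E056", "ProjectID": "001", "ProjectCost": 1000, "Days": 50},
    {"EmpID": "E056", "ProjectID": "002", "ProjectCost": 5000, "Days": 190},
]

with_cost = fully_dependent(rows, ("EmpID", "ProjectID", "ProjectCost"), ("Days",))
without_cost = fully_dependent(rows, ("EmpID", "ProjectID"), ("Days",))
print(with_cost, without_cost)
```

The first check fails because the subset {EmpID, ProjectID} already determines Days; the second succeeds because neither EmpID nor ProjectID alone does.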

Transitive Dependency
When an indirect relationship causes functional dependency it is
called Transitive Dependency.
If P -> Q and Q -> R is true, then P-> R is a transitive dependency.
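
Transitive dependencies can be derived by computing the closure of a set of attributes, i.e., everything that the set determines directly or indirectly. The sketch below uses the usual fixed-point loop; the function name closure and the representation of dependencies as pairs of sets are choices made for this illustration.

```python
def closure(attributes, dependencies):
    """Return every attribute determined by `attributes` under the given dependencies."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            # If we already know all of lhs, we also know all of rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# P -> Q and Q -> R, as in the definition above.
dependencies = [({"P"}, {"Q"}), ({"Q"}, {"R"})]

p_closure = closure({"P"}, dependencies)
q_closure = closure({"Q"}, dependencies)
print(sorted(p_closure))
print(sorted(q_closure))
```

R appears in the closure of P even though no dependency mentions P and R together, which is exactly the transitive dependency P -> R.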
Multivalued Dependency
When the existence of one or more rows in a table implies one or more other
rows in the same table, a Multi-valued dependency occurs.
If a table has attributes P, Q and R, then Q and R are multi-valued facts of P.
It is represented by a double arrow −
->->
For our example:
P ->-> Q
P ->-> R
In the above case, the Multivalued Dependency exists only if Q and R are
independent attributes.

Partial Dependency
Partial Dependency occurs when a nonprime attribute is functionally
dependent on part of a candidate key. The 2nd Normal Form (2NF) eliminates
the Partial Dependency. Let us see an example −
<StudentProject>
StudentID ProjectNo StudentName ProjectName
S01 199 Katie Geo Location
S02 120 Ollie Cluster Exploration
In the above table, we have partial dependency; let us see how −
The prime attributes (those that form the candidate key) are StudentID and
ProjectNo.
As stated, a non-prime attribute, i.e. StudentName or ProjectName, is partially
dependent if it is functionally dependent on only part of a candidate key.
The StudentName can be determined by StudentID alone, which makes the relation
Partial Dependent.
The ProjectName can be determined by ProjectNo alone, which also makes the
relation Partial Dependent.

Normalization

A large database defined as a single relation may result in data duplication. This
repetition of data may result in:
o Making relations very large.
o It isn't easy to maintain and update data as it would involve searching many
records in relation.
o Wastage and poor utilization of disk space and resources.
o The likelihood of errors and inconsistencies increases.

So to handle these problems, we should analyze and decompose the relations with
redundant data into smaller, simpler, and well-structured relations that satisfy
desirable properties. Normalization is a process of decomposing the relations into
relations with fewer attributes.

What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate undesirable characteristics like
Insertion, Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller and links them using
relationships.
o The normal form is used to reduce redundancy from the database table.

Why do we need Normalization?


The main reason for normalizing the relations is removing these anomalies.
Failure to eliminate anomalies leads to data redundancy and can cause data
integrity and other problems as the database grows. Normalization consists of a
series of guidelines that helps to guide you in creating a good database structure.

Types of Normal Forms:

Normalization works through a series of stages called Normal forms. The normal
forms apply to individual relations. The relation is said to be in particular normal
form if it satisfies constraints.
Following are the various types of Normal forms:

Normal Form Description

1NF A relation is in 1NF if every attribute contains only atomic values.

2NF A relation will be in 2NF if it is in 1NF and all non-key attributes
are fully functionally dependent on the primary key.

3NF A relation will be in 3NF if it is in 2NF and no transitive dependency
exists.

BCNF A stronger definition of 3NF is known as Boyce-Codd normal form.

4NF A relation will be in 4NF if it is in Boyce-Codd normal form and has
no multi-valued dependency.

5NF A relation is in 5NF if it is in 4NF, does not contain any join
dependency, and joining is lossless.

Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher
normal forms, i.e., 4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher
degree.
o Careless decomposition may lead to a bad database design, leading to
serious problems.

First Normal Form (1NF)


o A relation will be 1NF if it contains an atomic value.
o It states that an attribute of a table cannot hold multiple values. It must hold
only single-valued attribute.
o First normal form disallows the multi-valued attribute, composite attribute,
and their combinations.
Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute
EMP_PHONE.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389, 8589830302 Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab


Second Normal Form (2NF)
o To be in 2NF, a relation must be in 1NF.
o In the second normal form, all non-key attributes are fully functionally
dependent on the primary key.
Example: Let's assume, a school can store the data of teachers and the subjects
they teach. In a school, a teacher can teach more than one subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35

83 Math 38

83 Computer 38
In the given table, the candidate key is {TEACHER_ID, SUBJECT}. The non-prime
attribute TEACHER_AGE depends on TEACHER_ID alone, which is a proper subset of
the candidate key. That's why the table violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE

25 30

47 35

83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English

83 Math

83 Computer
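
A decomposition is only useful if the original table can be rebuilt from the parts. The sketch below checks this for the TEACHER example by projecting the table onto the two smaller tables and joining them again on TEACHER_ID; representing each table as a Python set of tuples is an assumption made for the illustration.

```python
# (TEACHER_ID, SUBJECT, TEACHER_AGE)
teacher = {
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
}

# The two projections used in the decomposition.
teacher_detail = {(tid, age) for tid, _, age in teacher}        # TEACHER_ID, TEACHER_AGE
teacher_subject = {(tid, subject) for tid, subject, _ in teacher}  # TEACHER_ID, SUBJECT

# Natural join of the two tables on TEACHER_ID.
rejoined = {
    (tid, subject, age)
    for tid, subject in teacher_subject
    for detail_tid, age in teacher_detail
    if tid == detail_tid
}

print(len(teacher_detail), "rows in TEACHER_DETAIL")
print(rejoined == teacher)
```

Each age is now stored once per teacher instead of once per subject, and the join returns exactly the original five rows, so no information is lost.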
Third Normal Form (3NF)
o A relation will be in 3NF if it is in 2NF and does not contain any transitive
dependency of a non-prime attribute on a key.
o 3NF is used to reduce data duplication. It is also used to achieve
data integrity.
o If there is no transitive dependency for non-prime attributes, then the
relation is in third normal form.
A relation is in third normal form if it satisfies at least one of the following
conditions for every non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida

333 Stephan 02228 US Boston

444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal


Super key in the table above:
1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME,
EMP_ZIP}....so on
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID
are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on
EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore
transitively dependent on the super key (EMP_ID). This violates the rule
of third normal form.
That's why we need to move EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010

333 Stephan 02228

444 Lan 60007

555 Katharine 06389


666 John 462007
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

462007 MP Bhopal

Boyce Codd normal form (BCNF)


o BCNF is the advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every functional dependency X → Y, X is a super
key of the table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS is a super key.
Example: Let's assume there is a company where employees work in more than
one department.
EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549


In the above table Functional dependencies are as follows:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP-ID, EMP-DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are
keys.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID  EMP_COUNTRY
264     India
364     UK
EMP_DEPT table:
EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549


EMP_DEPT_MAPPING table:
EMP_ID  EMP_DEPT
264     Designing
264     Testing
364     Stores
364     Developing

Fourth normal form (4NF)


o A relation will be in 4NF if it is in Boyce Codd normal form and has no
non-trivial multi-valued dependency, other than those whose left-hand side
is a super key.
o A multi-valued dependency A →→ B exists when, for a single value of A,
there is a set of multiple values of B, and that set is independent of the
remaining attributes of the relation.
Example
STUDENT
STU_ID  COURSE     HOBBY
21      Computer   Dancing
21      Math       Singing
34      Chemistry  Dancing
74      Biology    Cricket
59      Physics    Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two
independent attributes. Hence, there is no relationship between COURSE and
HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses,
Computer and Math, and two hobbies, Dancing and Singing. So there are
multi-valued dependencies on STU_ID (STU_ID →→ COURSE and
STU_ID →→ HOBBY), which lead to unnecessary repetition of data.
So, to bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID  COURSE
21      Computer
21      Math
34      Chemistry
74      Biology
59      Physics
STUDENT_HOBBY
STU_ID  HOBBY
21      Dancing
21      Singing
34      Dancing
74      Cricket
59      Hockey
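Formally, STU_ID →→ COURSE holds in a relation exactly when the relation
equals the join of its projections on (STU_ID, COURSE) and (STU_ID, HOBBY).
The Python sketch below applies that test to the sample rows; the variable
names are illustrative assumptions. For the five rows as printed, the join
produces two extra combinations for student 21. Under the stated assumption
that courses and hobbies are independent, those are the rows a single
STUDENT table would also have to store, which is the repetition that the
two-table design avoids.

```python
# STUDENT rows as (STU_ID, COURSE, HOBBY).
STUDENT = {
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
}

# The two projections used in the 4NF decomposition.
student_course = {(sid, course) for sid, course, _ in STUDENT}
student_hobby = {(sid, hobby) for sid, _, hobby in STUDENT}

# Natural join of the projections on STU_ID.
rejoined = {
    (sid, course, hobby)
    for sid, course in student_course
    for sid2, hobby in student_hobby
    if sid == sid2
}

# Combinations implied by independence but absent from the single table.
missing = sorted(rejoined - STUDENT)
print(len(rejoined))
print(missing)
```

With the two smaller tables, each course and each hobby of a student is
recorded once, and no course/hobby pairing needs to be stored at all.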

Fifth normal form (5NF)


o A relation is in 5NF if it is in 4NF and every non-trivial join
dependency in it is implied by its candidate keys; any decomposition must
rejoin without loss (lossless join).
o 5NF is reached when the tables have been broken into as many smaller
tables as possible, in order to avoid redundancy, without losing information.
o 5NF is also known as Project-join normal form (PJ/NF).
Example
SUBJECT    LECTURER  SEMESTER
Computer   Anshika   Semester 1
Computer   John      Semester 1
Math       John      Semester 1
Math       Akash     Semester 2
Chemistry  Praveen   Semester 1


In the above table, John takes both the Computer and Math classes in
Semester 1, but he doesn't take the Math class in Semester 2. In this case,
the combination of all three fields is required to identify a valid row.
Suppose we add a new semester, Semester 3, but do not yet know the subject
or who will be taking it, so we would leave LECTURER and SUBJECT as NULL.
But all three columns together act as the primary key, so we can't leave the
other two columns blank.
So, to bring the above table into 5NF, we can decompose it into three
relations P1, P2 & P3:
P1
SEMESTER    SUBJECT
Semester 1  Computer
Semester 1  Math
Semester 1  Chemistry
Semester 2  Math
P2
SUBJECT    LECTURER
Computer   Anshika
Computer   John
Math       John
Math       Akash
Chemistry  Praveen
P3
SEMESTER    LECTURER
Semester 1  Anshika
Semester 1  John
Semester 2  Akash
Semester 1  Praveen
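Whether this three-way decomposition is lossless can be checked on the
sample data. The Python sketch below is an illustration with assumed
variable names: it builds P1, P2 and P3 as projections of the original rows
(a set keeps each row once), joins P1 with P2, and then filters the result
through P3.

```python
# Original rows as (SUBJECT, LECTURER, SEMESTER).
ORIGINAL = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# The three projections P1, P2 and P3.
p1 = {(sem, subj) for subj, _, sem in ORIGINAL}   # (SEMESTER, SUBJECT)
p2 = {(subj, lect) for subj, lect, _ in ORIGINAL}  # (SUBJECT, LECTURER)
p3 = {(sem, lect) for _, lect, sem in ORIGINAL}    # (SEMESTER, LECTURER)

# Joining only P1 and P2 on SUBJECT invents rows that were never true.
two_way = {
    (subj, lect, sem)
    for sem, subj in p1
    for subj2, lect in p2
    if subj == subj2
}
spurious = sorted(two_way - ORIGINAL)

# Joining with P3 as well keeps only (SEMESTER, LECTURER) pairs that exist.
three_way = {row for row in two_way if (row[2], row[1]) in p3}

print(spurious)
print(three_way == ORIGINAL)
```

On these rows the two-table join adds John teaching Math in Semester 2 and
Akash teaching Math in Semester 1; only the join of all three relations
returns exactly the original table.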
