DBMS Notes
Introduction to DBMS:
A Database Management System (DBMS) is a software system that is designed to manage and
organize data in a structured manner. It allows users to create, modify, and query a database, as
well as manage the security and access controls for that database.
A DBMS provides an environment for storing and retrieving data in a convenient and efficient
manner.
Key Features of DBMS:
● Data modeling: A DBMS provides tools for creating and modifying data models, which
define the structure and relationships of the data in a database.
● Data storage and retrieval: A DBMS is responsible for storing and retrieving data from
the database, and can provide various methods for searching and querying the data.
● Concurrency control: A DBMS provides mechanisms for controlling concurrent access to
the database, to ensure that multiple users can access the data without conflicting with each
other.
● Data integrity and security: A DBMS provides tools for enforcing data integrity and
security constraints, such as constraints on the values of data and access controls that
restrict who can access the data.
● Backup and recovery: A DBMS provides mechanisms for backing up and recovering the
data in the event of a system failure.
● A DBMS can be classified into two types: Relational Database Management System
(RDBMS) and Non-Relational Database Management System (NoSQL or Non-SQL).
● RDBMS: Data is organized into tables, and each table has a set of rows and
columns. Tables are related to each other through primary and foreign keys.
● NoSQL: Data is organized as key-value pairs, documents, graphs, or column-based
stores. These systems are designed to handle large-scale, high-performance scenarios.
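As a minimal sketch of how tables in an RDBMS are related through primary and foreign keys, the example below uses Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: each student row refers to one department row.
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER NOT NULL REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (1, 'Computer Science')")
conn.execute("INSERT INTO student VALUES (101, 'Asha', 1)")

# A join follows the foreign key from student to department.
rows = conn.execute(
    "SELECT s.name, d.name FROM student s JOIN department d ON s.dept_id = d.dept_id"
).fetchall()

# A student that points at a department which does not exist is rejected.
try:
    conn.execute("INSERT INTO student VALUES (102, 'Ravi', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Here the primary key identifies each row, and the foreign key keeps every student row tied to an existing department row.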
A database is a collection of interrelated data organized in the form of tables, views, schemas,
reports, etc., so that data can be retrieved, inserted, and deleted efficiently. For example, a
university database organizes the data about students, faculty, admin staff, etc.
Definition of DBMS:
i. Database Management Systems (DBMS) are software systems used to store, retrieve, and run
queries on data. A DBMS serves as an interface between an end-user and a database,
allowing users to create, read, update, and delete data in the database.
ii. A DBMS is a computerized solution that helps store information in a manner that is easy to
read, edit, delete, and scale, with the primary objective of drawing correlations, powering
analysis, and supporting data-driven workflows. It has a backend storage system as well as a
front-end user interface.
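The create, read, update, and delete operations mentioned in the definition can be sketched with Python's built-in sqlite3 module. The student table and its rows are assumptions chosen only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

# Create: insert two rows.
conn.execute("INSERT INTO student VALUES (?, ?)", (1, "Asha"))
conn.execute("INSERT INTO student VALUES (?, ?)", (2, "Ravi"))

# Read: query the rows back.
after_insert = conn.execute(
    "SELECT roll_no, name FROM student ORDER BY roll_no"
).fetchall()

# Update: change one row.
conn.execute("UPDATE student SET name = ? WHERE roll_no = ?", ("Ravi K.", 2))

# Delete: remove one row.
conn.execute("DELETE FROM student WHERE roll_no = ?", (1,))

final_rows = conn.execute(
    "SELECT roll_no, name FROM student ORDER BY roll_no"
).fetchall()
```

The user states what should happen to the data, and the DBMS decides how the rows are stored and located.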
A file system is a collection of data. In this system, the user has to write procedures for
managing the database. It exposes the details of data representation and storage of data. In this:
● Data is stored in files.
● Each file has specific format.
● Programs that use these files depend on knowledge about that format.
● In earlier days, database applications were built on top of file systems.
This approach is mostly obsolete, but:
● Understanding the problems inherent in file-based systems may prevent us from repeating
these problems in our database system.
● Understanding how file system works is extremely useful when converting a file-based
system to a database system.
Basically, a file-based system is a collection of application programs that perform services for
end users, such as the production of reports. Each program defines and manages its own data.
It doesn't have a crash recovery mechanism, i.e., if the system crashes while data is being
entered, the content of the file can be lost. This is a disadvantage of the traditional file-based
system. Also, it is very difficult to protect a file under the file system, and the system can't
store and retrieve data efficiently.
Advantages of Traditional File System:
● File processing costs less and can be faster than a database.
● File processing design approach was well suited to mainframe hardware and batch input.
● Companies mainly use file processing to handle large volumes of structured data on a
regular basis.
● It can be more efficient and cost less than DBMS in certain situations.
● Design is simple.
● Customization is easy and efficient.
Disadvantages of Traditional File System:
● Data redundancy and inconsistency.
● Difficulty in accessing data.
● Data isolation: multiple files and formats.
● Integrity problems.
● Unauthorized access is not restricted.
● It coordinates only physical access.
The DBMS approach and the file system approach differ on the following points:
● Data abstraction
o DBMS: Gives an abstract view of data that hides the details.
o File system: Exposes the details of the data representation and storage of data.
● Security and protection
o DBMS: Provides a good protection mechanism.
o File system: It isn't easy to protect a file under the file system.
● Recovery mechanism
o DBMS: Provides a crash recovery mechanism, i.e., the DBMS protects the user from system
failure.
o File system: Has no crash recovery mechanism, i.e., if the system crashes while entering
some data, the content of the file can be lost.
● Manipulation techniques
o DBMS: Contains a wide variety of sophisticated techniques to store and retrieve the data.
o File system: Can't efficiently store and retrieve the data.
● Where to use
o DBMS: Used in large systems which interrelate many files.
o File system: Better suited to small systems with few interrelated files.
● Data redundancy and inconsistency
o DBMS: Due to the centralization of the database, the problems of data redundancy and
inconsistency are controlled.
o File system: The files and application programs are created by different programmers, so
there is a lot of duplication of data, which may lead to inconsistency.
● Structure
o DBMS: The database structure is complex to design.
o File system: The file system approach has a simple structure.
● Data independence
o DBMS: Data independence exists, and it can be of two types: logical data independence and
physical data independence.
o File system: There is no data independence.
● Integrity constraints
o DBMS: Integrity constraints are easy to apply.
o File system: Integrity constraints are difficult to implement.
● Data models
o DBMS: Three types of data models exist: hierarchical, network, and relational.
o File system: There is no concept of data models.
● Flexibility
o DBMS: Changes to the content of the stored data are often necessary, and these changes are
made more easily with a database approach.
o File system: The system is less flexible than the DBMS approach.
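The recovery point in the comparison can be illustrated with a transaction that fails halfway. This is only a sketch using Python's built-in sqlite3 module: the account table, the transfer function, and the simulated failure are assumptions, and an exception raised in application code stands in for a real system crash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move money between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


failed = transfer(conn, 1, 2, 30, fail_midway=True)
after_failure = balances(conn)   # the half-finished transfer was rolled back
ok = transfer(conn, 1, 2, 30)
after_success = balances(conn)
```

A program writing directly to plain files would have to implement this all-or-nothing behaviour itself.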
Objectives of DBMS
A DBMS has many objectives aimed at the efficient and organized handling of data in an
organization. The main objectives of DBMS are:
Data Availability:
A DBMS makes data available in a meaningful format and at a reasonable cost so that
a wide range of users can access it.
● Physical Level: At the physical level, the information about the location of database
objects in the data store is kept. Various users of DBMS are unaware of the locations of
these objects. In simple terms, physical level of a database describes how the data is
being stored in secondary storage devices like disks and tapes and also gives insights on
additional storage details.
● Conceptual Level: At the conceptual level, data is represented in the form of various
database tables. For example, a STUDENT database may contain STUDENT and
COURSE tables, which are visible to users, but users are unaware of how they are stored.
Also referred to as the logical schema, it describes what kind of data is to be stored in the
database.
● External Level: An external level specifies a view of the data in terms of conceptual
level tables. Each external level view caters to the needs of a particular
category of users. For example, the FACULTY of a university are interested in the
course details of students, while STUDENTS are interested in all details related to
academics, accounts, courses, and hostels. So, different views can be
generated for different users. The main focus of the external level is data abstraction.
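The external level can be sketched with SQL views over one conceptual-level table. The example uses Python's built-in sqlite3 module; the view names (faculty_view, accounts_view) and the columns are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one table that holds everything about a student.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT, fees_due INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Asha", "DBMS", 0), (2, "Ravi", "Networks", 500)],
)

# External level: each category of user gets its own view of the same data.
conn.execute("CREATE VIEW faculty_view AS SELECT roll_no, name, course FROM student")
conn.execute("CREATE VIEW accounts_view AS SELECT roll_no, fees_due FROM student")

faculty_rows = conn.execute("SELECT * FROM faculty_view ORDER BY roll_no").fetchall()
accounts_rows = conn.execute("SELECT * FROM accounts_view ORDER BY roll_no").fetchall()
```

Neither view says anything about how the rows are laid out on disk, which is the concern of the physical level.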
Data Independence
Data independence means that a change at one level should not affect another level. Two
types of data independence are present in this architecture:
● Physical Data Independence: Any change in the physical location of tables and indexes
should not affect the conceptual level or external view of data. This data independence is
easy to achieve and is implemented by most DBMSs.
● Conceptual Data Independence (also called logical data independence): The conceptual
level schema and the external level schema must be independent. This means a change in
the conceptual schema should not affect the external schema. For example, adding or
deleting attributes of a table should not affect the user's view of the table. This type of
independence is more difficult to achieve than physical data independence because changes
in the conceptual schema are often reflected in the user's view.
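Both kinds of data independence can be sketched with Python's built-in sqlite3 module. The table, view, and index names are assumptions; the point is only that a view which names its columns keeps returning the same rows after a physical change (a new index) and after a conceptual change (a new attribute).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
# The user's view names its columns explicitly.
conn.execute("CREATE VIEW student_names AS SELECT roll_no, name FROM student")

before = conn.execute("SELECT * FROM student_names").fetchall()

# Physical-level change: add an index. Queries are written the same way as before.
conn.execute("CREATE INDEX idx_student_name ON student(name)")
after_index = conn.execute("SELECT * FROM student_names").fetchall()

# Conceptual-level change: add an attribute to the table.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_new_column = conn.execute("SELECT * FROM student_names").fetchall()
```

Deleting an attribute that the view uses would break the view, which is one reason this kind of independence is harder to guarantee.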
Components of DBMS:
Hardware, Software, Data, Database Access Language, Procedures and Users all together
form the components of a DBMS.
Let us discuss the components one by one clearly.
Hardware
The hardware is the actual computer system used for keeping and accessing the database.
The conventional DBMS hardware consists of secondary storage devices such as hard disks.
Databases run on a range of machines, from microcomputers to mainframes.
Software
Software is the actual DBMS between the physical database and the users of the system. All
the requests from the user for accessing the database are handled by DBMS.
Data
Data is an important component of the database management system. The main task of a DBMS
is to process data. Databases store the data, which is retrieved from and updated in
the databases.
Users
There are a number of users who can access or retrieve the data on demand using the
application and the interfaces provided by the DBMS.
The users of the database can be classified into different groups −
● Naive Users
● Online Users
● Sophisticated Users
● Specialized Users
● Application Users
● DBA- Database Administrator
DBAs frequently design and develop new features and debug issues, while working
alongside a diverse group of individuals such as developers, data architects and business
stakeholders. Therefore, they're expected to be proficient in both technical and business
skills.
A DBA assumes many different roles and responsibilities within the IT department
involving database systems and applications.
● Documentation and training. The DBA must also establish policies and procedures
pertaining to the management, security, maintenance and use of the database
management system. The DBA group creates training materials and instructs
employees and developers on the proper access and usage of the DBMS.
● Troubleshooting. When problems arise, the DBA is the focal point for resolution,
including troubleshooting, root cause analysis, fine tuning and optimizing the
performance of tasks and programs that access the database. The DBA must be
capable of performing root cause analysis -- identifying the cause of the problem so it
can be resolved. This task requires the ability to locate bottlenecks and points of
contention, monitor workload and throughput, review SQL performance and
optimization, monitor storage space and fragmentation and view and manage the
system and DBMS resource use.
● Backup and disaster recovery. The DBA is responsible for ensuring that databases
and data are backed up appropriately and can be recovered correctly and quickly in
the event of failure. The DBA also ensures that databases are protected and secured,
enacts measures to maintain the database integrity in terms of data accuracy and
makes sure unauthorized users can't access the data.
● Collaboration and integration. The DBA frequently gets pulled into other projects
as a subject matter expert on the database. Because databases are at the center of most
modern application development projects, the DBA participates in database
integration and use in a variety of IT projects. This exposure to many different
technologies and experiences can make the DBA a valuable IT technician not just for
database-related issues, but for other technologies as well.
● Staying current. As the central point of contact for information about the company's
database management systems, DBAs must keep up to date on the latest versions and
capabilities of each DBMS.
● Using the right tools. Administering a DBMS usually requires the use of other tools
than those provided with the DBMS. DBAs must understand the strengths and
weaknesses of native DBA tools, develop a strategy for addressing the weaknesses
and use tools that improve the performance, availability, administration and recovery
of the databases they manage.
● User access and support. A DBA creates new user accounts and provides access
permissions. They also create and manage database reports, visualizations and
dashboards to help users easily understand the status of their database.
● Communication. DBAs must have good communication skills. They must be able to
converse and work with application programmers, business end users, IT and
business managers, data analysts and other DBAs.
System DBA
This role focuses on technical, rather than business, issues. The system DBA is
knowledgeable in the arcane technical details of how the database is installed, configured
and modified. Typical tasks center on the physical installation and performance of the
DBMS software and can include the following:
● Installing new software versions and applying fixes.
● Setting and tuning system parameters.
● Tuning the operating system, network and transaction processors to work with the
DBMS.
● Ensuring appropriate storage and memory are available for the DBMS.
System DBAs are rarely involved with the actual database and application set up. They
might get involved in application tuning when operating system parameters or complex
DBMS parameters need to be altered.
Database architect
The primary responsibility of this role is the design and set up of new databases. The
database architect designs new databases and database structures for new and existing
applications and is rarely involved in the maintenance and tuning of established databases
and applications. Typical tasks include the following:
● Modelling logical data.
● Translating logical data models into a physical database design.
● Analysing data access requirements to ensure optimal database design and efficient
SQL access.
● Creating backup and recovery strategies for new databases.
Database analyst
Sometimes junior DBAs are referred to as database analysts. The database analyst's role
might be like that of the database architect. The database analyst designation can be another
name for a database administrator.
Application DBA
An application DBA focuses on database design and the ongoing database support and
administration for a specific application or subset of applications. The application DBA is
likely an expert in writing and debugging complex SQL. They understand the best ways to
incorporate database requests into application programs. Application DBAs typically are
also responsible for managing and refreshing test data for application development teams.
Not every organization has an application DBA on staff. In that case, the general-purpose
DBA supports specific applications while also maintaining the organization's database
environment. But even with an application DBA on board, general-purpose DBAs are still
required to support the overall database environment and infrastructure.
Task-oriented DBA
This specialized DBA focuses on a specific administrative task and is uncommon outside of
large IT shops. A backup and recovery DBA, for example, would be task-oriented to ensure
the organization's databases are recoverable, including creating backup plans, building and
testing backup scripts, testing recovery scripts, and driving recovery tasks when required.
The backup and recovery DBA also participates in building and testing disaster contingency
plans for the company's databases.
Performance analyst
As the most common task-oriented DBA, the performance analyst focuses entirely on
monitoring databases and improving the performance of applications that access them. A
performance analyst is an expert in SQL coding for performance and is knowledgeable in
designing and building high-performance databases. Performance analysts should have a
deep understanding of the DBMS, collaborate with other DBAs to execute changes when
required and communicate with application developers in their language to facilitate
appropriate program changes for performance.
Data warehouse administrator
This fully capable DBA has the knowledge and skills to monitor and support the data
warehouse environment. Data warehouse administrators understand the differences between
a database that supports online transaction processing and a data warehouse and must have
experience in the following areas:
● Business intelligence and query tools.
● Specialized database design for data warehousing.
● ETL skills.
● Knowledge of data warehousing technologies, such as online analytical
processing and star schema.
Cloud DBA
As companies increasingly migrate workloads to the cloud, the cloud DBA has become
more popular and performs many of the same tasks as a general-purpose DBA but for cloud
database executions on services like Amazon Web Services and Microsoft Azure. The cloud
DBA understands the services the cloud provider offers, including backup and security, to
set up databases in the cloud. Cloud DBAs need to be aware of latency, fault tolerance and
especially cost management because adding data or workloads to a cloud setup can
significantly increase costs.
Data modelers
Although an efficient database design must include data modelling, this isn't always
considered the job of a DBA, since they're more focused on technology than the business.
However, DBAs should know the essentials of data modelling, including how to build and
manage data models to illustrate the relationships between data components.
Data modelers create the data models and computer databases needed to transform intricate
organizational data into functional computer systems. They also oversee the information
flow across different departments within a business using relational, dimensional
and NoSQL databases.
A Database User is defined as a person who interacts with data daily, updating, reading, and
modifying the given data. Database users can access and retrieve data from the database
through the Database Management System (DBMS) applications and interfaces.
Database users are categorized based on their interaction with the database. There are seven
types of database users in DBMS. Below mentioned are the types of database users:
1. Database Administrator (DBA)
A Database Administrator (DBA) is a person or team who defines the schema and
controls the three levels of the database. The DBA creates a new account ID and
password for a user who needs to access the database. The DBA is also responsible for
providing security to the database and allows only authorized users to access or modify the
database. The DBA is responsible for handling problems such as security breaches and poor
system response time.
● DBA also monitors the recovery and backup and provides technical support.
● The DBA has a DBA account in the DBMS which is called a system or superuser
account.
● DBA repairs damage caused due to hardware and/or software failures.
● DBA is the one having privileges to perform DCL (Data Control
Language) operations such as GRANT and REVOKE, to allow/restrict a particular
user from accessing the database.
3. A System Analyst
A System Analyst is a user who analyses the requirements of parametric end users and
checks whether all the requirements of end users are satisfied.
4. Sophisticated Users
Sophisticated users can be engineers, scientists, or business analysts who are familiar with the
database. They can develop their own database applications according to their requirements.
They don't write program code; instead, they interact with the database by writing SQL queries
directly through the query processor.
5. Database Designers
Database Designers are the users who design the structure of the database, which includes
tables, indexes, views, triggers, stored procedures, and constraints, which are usually
defined before the database is created or populated with data. They control what data
must be stored and how the data items are to be related. It is the responsibility of Database
Designers to understand the requirements of different user groups and then create a design
which satisfies the needs of all the user groups.
6. Application Programmers
Application Programmers, also referred to as System Analysts or simply Software Engineers,
are the back-end programmers who write the code for the application programs. They are
computer professionals. These programs could be written in programming languages
such as Visual Basic, Developer, C, FORTRAN, COBOL, etc. Application programmers
design, debug, test, and maintain a set of programs called “canned transactions” that Naive
(parametric) users use to interact with the database.
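A canned transaction can be sketched as a function written once by an application programmer and then called by naive users with nothing but parameters. The example uses Python's built-in sqlite3 module; the seat-booking scenario, the seat table, and the book_seat function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (seat_no INTEGER PRIMARY KEY, passenger TEXT)")
conn.executemany("INSERT INTO seat VALUES (?, NULL)", [(1,), (2,)])
conn.commit()


def book_seat(conn, seat_no, passenger):
    """Canned transaction: the caller supplies parameters and writes no SQL."""
    with conn:  # one transaction per booking attempt
        cur = conn.execute(
            "UPDATE seat SET passenger = ? WHERE seat_no = ? AND passenger IS NULL",
            (passenger, seat_no),
        )
    # Exactly one row changes when the seat was free; none when it was taken.
    return cur.rowcount == 1


first = book_seat(conn, 1, "Asha")
second = book_seat(conn, 1, "Ravi")  # same seat again
holder = conn.execute("SELECT passenger FROM seat WHERE seat_no = 1").fetchone()[0]
```

A sophisticated user, by contrast, would type the UPDATE statement directly into the query processor.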
8. Specialized users
Specialized users are sophisticated users who write specialized database applications that
do not fit into the traditional data-processing framework. Among these applications are
computer-aided design systems, knowledge-base and expert systems, etc.
A Data Model in a Database Management System (DBMS) is a set of conceptual tools
developed to describe the database. Data models give us a clear picture of the data, which
helps us in creating an actual database. A data model takes us from the design of the data
to its proper implementation.
The conceptual data model describes the database at a very high level and is useful to
understand the needs or requirements of the database. It is this model, that is used in the
requirement-gathering process i.e. before the Database Designers start making a particular
database. One such popular model is the entity/relationship model (ER model). The E/R
model focuses on the entities, relationships, and attributes that database designers work
with. Because it uses these familiar concepts, it can be discussed even with non-technical
users and stakeholders, and their requirements can be understood.
Entity-Relationship Model (ER Model): It is a high-level data model which is used to
define the data and the relationships between them. It is basically a conceptual design of
a database that makes the view of data easy to design.
Components of ER Model:
1. Entity: An entity is a real-world object. It can be a name, place, object,
class, etc. Entities are represented by a rectangle in an ER Diagram.
2. Attributes: An attribute can be defined as the description of the entity. Attributes are
represented by an ellipse in an ER Diagram. Examples are Age, Roll Number, or Marks for a
Student.
3. Relationship: Relationships are used to define relations among different entities.
Diamonds (rhombuses) are used to show relationships.
Characteristics of a conceptual data model
● Offers Organization-wide coverage of the business concepts.
● This type of data model is designed and developed for a business audience.
● The conceptual model is developed independently of hardware specifications like
data storage capacity, location or software specifications like DBMS vendor and
technology. The focus is to represent data as a user will see it in the “real world.”
Conceptual data models, also known as domain models, create a common vocabulary for all
stakeholders by establishing basic concepts and scope.
2. Network Model
The Network Model was formalized by the Data Base Task Group (DBTG) in the 1960s. This
model is a generalization of the hierarchical model. It can consist of multiple parent
segments, and these segments are grouped as levels, but there exists a logical association
between the segments belonging to any level. Mostly, there exists a many-to-many logical
association between any two segments.
3. Object-Oriented Data Model
In the Object-Oriented Data Model, data and their relationships are contained in a single
structure which is referred to as an object in this data model. In this, real-world problems
are represented as objects with different attributes. All objects have multiple relationships
between them. Basically, it is a combination of Object Oriented programming and a
Relational Database Model.
In the above figure, Project is the root node, which has two children, i.e., Project 1 and
Project 2. Project 1 has 3 children and Project 2 has 2 children. In total there are 5 child
links to Department A, Department B, and Department C. They are network-related children
because, as we said, a node in this model can have more than one parent. So Department B and
Department C each have two parents, i.e., Project 1 and Project 2.
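The structure described above can be written down as a small graph in Python to make the "more than one parent" property concrete. The dictionary layout and the parents_of helper are assumptions made for this sketch; only the node names come from the description.

```python
# Parent -> children links, following the description of the figure.
children = {
    "Project": ["Project 1", "Project 2"],
    "Project 1": ["Department A", "Department B", "Department C"],
    "Project 2": ["Department B", "Department C"],
}


def parents_of(node):
    """Return every node that lists `node` among its children."""
    return sorted(parent for parent, kids in children.items() if node in kids)


# Nodes with more than one parent are what a strict hierarchy (tree) cannot express.
multi_parent = sorted(
    {kid for kids in children.values() for kid in kids if len(parents_of(kid)) > 1}
)
```

In a hierarchical model each node would appear under exactly one parent, so Department B and Department C would have to be duplicated.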
4. Hierarchical model: Records are linked with the help of pointers.
Network model: Records are linked with the help of a linked list.
7. Hierarchical model: It is used to access data which is complex and asymmetric.
Network model: It is used to access data which is complex and symmetric.
11. Hierarchical model: When you are searching for a record, you first need to visit the parent
record before retrieving a child record.
Network model: Searching for a record is easy because of the availability of multiple access
paths to reach a data item.
Introduction of ER Model
● Peter Chen developed the ER diagram in 1976. The ER model was created to provide a
simple and understandable model for representing the structure and logic of databases. It
has since evolved into variations such as the Enhanced ER Model and the Object
Relationship Model.
● The Entity-Relationship Model is a model for identifying entities to be represented in the
database and representing how those entities are related. The ER data model specifies an
enterprise schema that represents the overall logical structure of a database graphically.
● The Entity Relationship Diagram explains the relationship among the entities present in
the database. ER models are used to model real-world objects like a person, a car, or a
company and the relation between these real-world objects. In short, the ER Diagram is the
structural format of the database.
ER Model is used to model the logical view of the system from a data perspective which consists
of these symbols:
● Rectangles: Rectangles represent Entities in the ER Model.
● Ellipses: Ellipses represent Attributes in the ER Model.
● Diamond: Diamonds represent Relationships among Entities.
● Lines: Lines link attributes to entities and entity sets to relationship types.
● Double Ellipse: Double Ellipses represent Multi-Valued Attributes.
● Double Rectangle: Double Rectangle represents a Weak Entity.
Components of ER Diagram
1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an entity is represented
as a rectangle.
Consider an organization as an example- manager, product, employee, department etc. can be
taken as an entity.
a. Weak Entity
An entity that depends on another entity is called a weak entity. The weak entity doesn't contain any
key attribute of its own. The weak entity is represented by a double rectangle.
2. Attribute
An attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute
The key attribute is used to represent the main characteristics of an entity. It
represents a primary key. The key attribute is represented by an ellipse with
the text underlined.
b. Composite Attribute
An attribute that is composed of many other attributes is known as a composite
attribute. The composite attribute is represented by an ellipse, and the
ellipses of its component attributes are connected to it.
c. Multi-valued Attribute
An attribute that can have more than one value is known as a
multi-valued attribute. A double oval is used to represent a multi-valued
attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from another attribute is known as a derived
attribute. It can be represented by a dashed ellipse.
For example, a person's age changes over time and can be derived from
another attribute like date of birth.
3. Relationship
A relationship is used to describe the relation between entities. Diamond or
rhombus is used to represent the relationship.
Figure: one-to-one relationship, Person (1) Has (1) Passport.
b. One-to-many relationship
When only one instance of the entity on the left, and more than one instance of
an entity on the right associates with the relationship then this is known as a
one-to-many relationship.
For example, a scientist can invent many inventions, but each invention is made
by one specific scientist.
c. Many-to-one relationship
When more than one instance of the entity on the left, and only one instance of
an entity on the right associates with the relationship then it is known as a
many-to-one relationship.
For example, a student enrolls in only one course, but a course can have many
students.
d. Many-to-many relationship
When more than one instance of the entity on the left, and more than one
instance of an entity on the right associates with the relationship then it is
known as a many-to-many relationship.
For example, an employee can be assigned to many projects, and a project can have
many employees.
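When a many-to-many relationship is mapped to relational tables, a common approach is a third (junction) table holding one row per pairing. The sketch below uses Python's built-in sqlite3 module; the table name works_on and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE project (proj_id INTEGER PRIMARY KEY, title TEXT)")
# Junction table: one row per (employee, project) pairing.
conn.execute(
    "CREATE TABLE works_on ("
    " emp_id INTEGER REFERENCES employee(emp_id),"
    " proj_id INTEGER REFERENCES project(proj_id),"
    " PRIMARY KEY (emp_id, proj_id))"
)
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO project VALUES (?, ?)", [(10, "Payroll"), (20, "Website")])
conn.executemany("INSERT INTO works_on VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])

# One employee on many projects ...
projects_of_asha = [row[0] for row in conn.execute(
    "SELECT p.title FROM works_on w JOIN project p ON p.proj_id = w.proj_id "
    "WHERE w.emp_id = 1 ORDER BY p.title"
)]
# ... and one project with many employees.
staff_on_payroll = [row[0] for row in conn.execute(
    "SELECT e.name FROM works_on w JOIN employee e ON e.emp_id = w.emp_id "
    "WHERE w.proj_id = 10 ORDER BY e.name"
)]
```

One-to-many and many-to-one relationships usually need no extra table: a foreign key column on the "many" side is enough.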
Cardinality
Cardinality describes how the entities in a relationship set are related to each
other. In a Database Management System, cardinality is a number that denotes how
many times an entity participates with another entity in a relationship set.
Cardinality is a very important property in representing the structure of a
database. In the context of a table, cardinality refers to the number of rows or
tuples.
Cardinality Ratio
Cardinality ratio is also called Cardinality Mapping, which represents the
mapping of one entity set to another entity set in a relationship set. We
generally take the example of a binary relationship set where two entities are
mapped to each other. Cardinality is very important in the Database of various
businesses. For example, if we want to track the purchase history of each
customer then we can use the one-to-many cardinality to find the data of a
specific customer. The Cardinality model can be used in Databases by
Database Managers for a variety of purposes, but corporations often use it to
evaluate customer or inventory data.
There are four types of Cardinality Mapping in Database Management Systems:
1. One to one
2. Many to one
3. One to many
4. Many to many
One to One
One to one cardinality is represented by a 1:1 symbol. In this, there is at most
one relationship from one entity to another entity. There are a lot of examples
of one-to-one cardinality in real life databases.
For example, one student can have only one student id, and one student id can
belong to only one student. So, the relationship mapping between student and
student id will be one to one cardinality mapping.
Another example is the relationship between the director of the school and the
school because one school can have a maximum of one director, and one
director can belong to only one school.
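The school and director example can be sketched with Python's built-in sqlite3 module. One way to keep the mapping one to one is a UNIQUE constraint on the foreign key column; the table and column names here are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE school (school_id INTEGER PRIMARY KEY, name TEXT)")
# Each director row holds exactly one school_id, and UNIQUE stops two
# directors from sharing the same school.
conn.execute(
    "CREATE TABLE director ("
    " director_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " school_id INTEGER NOT NULL UNIQUE REFERENCES school(school_id))"
)
conn.execute("INSERT INTO school VALUES (1, 'Central School')")
conn.execute("INSERT INTO director VALUES (1, 'Dr. Rao', 1)")

# A second director for the same school violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO director VALUES (2, 'Dr. Mehta', 1)")
    second_director_rejected = False
except sqlite3.IntegrityError:
    second_director_rejected = True

director_count = conn.execute("SELECT COUNT(*) FROM director").fetchone()[0]
```

Without the UNIQUE constraint the same two tables would describe a one-to-many mapping from school to director.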
Keys
o Keys play an important role in the relational database.
o Keys are used to uniquely identify any record or row of data in a
table. They are also used to establish and identify relationships
between tables.
Types of keys:
1. Primary key
o It is the first key used to identify one and only one instance of
2. Candidate key
o A candidate key is an attribute or set of attributes that can
6. Composite key
Whenever a primary key consists of more than one attribute, it is
known as a composite key. This key is also known as Concatenated
Key.
For example, in an EMPLOYEE relation, we assume that an employee
may be assigned multiple roles, and an employee may work on
multiple projects simultaneously. So the primary key will be
composed of all three attributes, namely Emp_ID, Emp_role, and
Proj_ID in combination. These attributes act as a composite key
since the primary key comprises more than one attribute.
7. Artificial key
A key created using arbitrarily assigned data is known as an artificial
key. Such keys are created when the primary key is large and complex
and is awkward to use for relating the table to other relations. The data
values of artificial keys are usually numbered in serial order.
For example, the primary key composed of Emp_ID, Emp_role, and
Proj_ID is large in the EMPLOYEE relation. So it would be
better to add a new virtual attribute to identify each tuple in the
relation uniquely.
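The difference between a composite key and an artificial key can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names assignment_composite and assignment_artificial, the column assignment_id, and the sample rows are assumptions made for this example and do not come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Composite primary key: the three attributes together identify a row.
conn.execute("""
    CREATE TABLE assignment_composite (
        emp_id   INTEGER NOT NULL,
        emp_role TEXT    NOT NULL,
        proj_id  INTEGER NOT NULL,
        PRIMARY KEY (emp_id, emp_role, proj_id)
    )
""")

# Artificial key: a serially numbered column identifies each row, while the
# original combination is still kept unique with a UNIQUE constraint.
conn.execute("""
    CREATE TABLE assignment_artificial (
        assignment_id INTEGER PRIMARY KEY AUTOINCREMENT,
        emp_id   INTEGER NOT NULL,
        emp_role TEXT    NOT NULL,
        proj_id  INTEGER NOT NULL,
        UNIQUE (emp_id, emp_role, proj_id)
    )
""")

# Hypothetical sample rows: (Emp_ID, Emp_role, Proj_ID).
rows = [(1, "Developer", 10), (1, "Tester", 10), (1, "Developer", 20)]
conn.executemany("INSERT INTO assignment_composite VALUES (?, ?, ?)", rows)
conn.executemany(
    "INSERT INTO assignment_artificial (emp_id, emp_role, proj_id) "
    "VALUES (?, ?, ?)",
    rows,
)

# Repeating an existing combination violates the composite primary key.
try:
    conn.execute("INSERT INTO assignment_composite VALUES (1, 'Developer', 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

artificial_ids = [
    row[0]
    for row in conn.execute(
        "SELECT assignment_id FROM assignment_artificial ORDER BY assignment_id"
    )
]
print(duplicate_rejected, artificial_ids)
```

Other tables can then reference the single assignment_id column instead of repeating all three attributes.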
Types of Attributes
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called
the key attribute. For example, Roll_No will be unique for each student. In an ER
diagram, the key attribute is represented by an oval with the attribute name
underlined.
2. Composite Attribute
An attribute composed of many other attributes is called a composite
attribute. For example, the Address attribute of the student Entity type consists
of Street, City, State, and Country. In an ER diagram, the composite attribute is
represented by an oval connected to further ovals for its component attributes.
3. Multivalued Attribute
An attribute that can hold more than one value for a given entity is called a
multivalued attribute. For example, Phone_No (can be more than one for a given
student). In an ER diagram, a multivalued attribute is represented by a double oval.
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is known
as a derived attribute. For example, Age (can be derived from DOB). In an ER
diagram, the derived attribute is represented by a dashed oval.
The Complete Entity Type Student with its Attributes can be represented as:
ER Design Issues
1) Use of Entity Set vs Attributes
The use of an entity set or attribute depends on the structure of the real-
world enterprise that is being modeled and the semantics associated with
its attributes. A common mistake is to use the primary key of one entity
set as an attribute of another entity set; the connection should be
modeled as a relationship instead. Likewise, the primary key attributes of
the participating entity sets are already implicit in a relationship set,
so they should not be designated again as attributes of the relationship set.
Aggregation
A basic ER diagram cannot represent a relationship between an
entity and another relationship, which may be required in some
scenarios. In those cases, a relationship with its corresponding entities
is aggregated into a higher-level entity. Aggregation is an abstraction
through which we can represent relationships as higher-level entity
sets.
For example, an Employee working on a project may require some
machinery. So, a REQUIRE relationship is needed between the
relationship WORKS_FOR and the entity MACHINERY. Using
aggregation, the WORKS_FOR relationship with its entities
EMPLOYEE and PROJECT is aggregated into a single entity, and the
relationship REQUIRE is created between the aggregated entity and
MACHINERY.
How to Convert ER Diagrams to Tables in DBMS
Step 1: Analyze the Entities
● Identify each distinct entity in the diagram represented by a rectangle.
● Examine the attributes associated with each entity, which are its
characteristics, represented by ellipses connected to the rectangle.
Step 2: Create Separate Tables for Each Entity
● For each entity, create a corresponding table in the database with
each attribute as a column in the table.
● Assign the primary key as a column in the respective table.
Step 3: Map Relationships
● For one-to-many relationships, add a foreign key in the “many” table
referencing the “one” table’s primary key.
● For many-to-many, create a linking table with foreign keys to both
related tables.
Step 4: Normalize the Tables
● Analyze the tables for redundancies and potential data inconsistencies.
● Apply normalization techniques like decomposition to split tables and
eliminate redundancy, improving data integrity and reducing storage
space.
Step 5: Verify and Refine
● Review the created tables and relationships for accuracy and
efficiency.
● Refine the schema as needed based on further analysis or specific
requirements of your database.
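Steps 2 and 3 above can be sketched with Python's built-in sqlite3 module. The entities (department, student, course), the linking table enrollment, and all sample rows are hypothetical and chosen only to show one one-to-many and one many-to-many mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Step 2: one table per entity, each with its primary key.
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE course     (course_id INTEGER PRIMARY KEY, title TEXT);

    -- Step 3, one-to-many: the "many" side (student) holds the foreign key.
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT,
        dept_id    INTEGER REFERENCES department(dept_id)
    );

    -- Step 3, many-to-many: a linking table with foreign keys to both sides.
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO department VALUES (1, 'CS');
    INSERT INTO course VALUES (1, 'DBMS'), (2, 'OS');
    INSERT INTO student VALUES (1, 'Asha', 1), (2, 'Ravi', 1);
    INSERT INTO enrollment VALUES (1, 1), (1, 2), (2, 1);
""")

tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)

# Count the students enrolled in each course through the linking table.
students_per_course = conn.execute("""
    SELECT c.title, COUNT(*)
    FROM course AS c
    JOIN enrollment AS e ON e.course_id = c.course_id
    GROUP BY c.title
    ORDER BY c.title
""").fetchall()
print(tables, students_per_course)
```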
o Entity type becomes a table.
In the given ER diagram, LECTURE, STUDENT, SUBJECT and COURSE
form individual tables.
Entity in DBMS
An entity is an object or thing that exists in the real world, for
example a customer, a car, or a pen.
Entities are stored in the database, and they must be distinguishable, i.e.,
each one must be identifiable within its group. For example, pens of the same
make that look identical cannot be told apart, so they are only objects, but
pens of different colors (a red pen, a green pen, a blue pen, a black pen)
can each be identified, so each of them is an entity.
For extracting data from the database, each item of data must be
distinguishable from the others. Such distinct, identifiable data is known
as an entity.
An entity has some attributes which depict the entity's characteristics. For
example, an entity "Student" has attributes such as "Student_roll_no",
"Student_name", "Student_subject", and "Student_marks".
Student_roll_no  Student_name  Student_subject  Student_marks
1                Robert        English          85
2                Parker        Mathematics      75
3                Harry         Science          80
4                George        Geography        70
Some entities are related to other entities in the database. For example, the
"Student" entity is related to the "University" entity. The ERD (Entity
Relationship Diagram) model is used to show the relationships between several
entities visually.
Kinds of Entity:
1. Tangible Entity:
2. Intangible Entity:
For example, a bank account logically exists, but we cannot see or touch it.
Entity Type:
The Employee entity type can have attributes such as name, age, address,
phone number, and salary.
The Department entity type can have attributes such as name, number, and
location in the department.
1. Strong Entity Type: It is an entity that has its own existence and is
independent.
The entity relationship diagram represents a strong entity type with the help of
a single rectangle. Below is the ERD of the strong entity type:
In the above example, the "Customer" is the entity type with attributes such
as ID, Name, Gender, and Phone Number. Customer is a strong entity type as
it has a unique ID for each customer.
2. Weak Entity Type: It is an entity that does not have its own existence and
relies on a strong entity for its existence.
The Entity Relationship Diagram represents the weak entity type using
double rectangles. Below is the ERD of the weak entity type:
In the above example, "Address" is a weak entity type with attributes such as
House No., City, Location, and State.
Let us see an example of the relationship between the Strong entity type and
weak entity type with the help of ER Diagram:
Entity Set
Determine the main objects you want to represent in the database, e.g., “students”,
“courses”, or “products”.
Create relationships between entities to specify how entities interact with each other.
Relationships are verbs like “teaches”, “studies”, or “sells”.
To add the attributes of an entity, write them inside the rectangle or connect them
with lines.
Indicate the minimum and maximum number of relationship instances associated with an
entity using notations like crow’s foot.
Organize all entities and relationships in a clean way for better readability and
understanding.
UNIT THREE
4 SURESH DELHI 18
Important Terminologies
• Attribute: Attributes are the properties that define an entity,
e.g., ROLL_NO, NAME, ADDRESS.
• Relation Schema: A relation schema defines the structure of the relation
and represents the name of the relation with its attributes, e.g., STUDENT
(ROLL_NO, NAME, ADDRESS, PHONE, AGE) is the relation schema
for STUDENT. If a schema has more than one relation, it is called a Relational
Schema.
• Tuple: Each row in the relation is known as a tuple. The above relation
contains 4 tuples, one of which is shown as:
4 SURESH DELHI 18 IT
Table Branch
BRANCH_CODE  BRANCH_NAME
CS           COMPUTER SCIENCE
IT           INFORMATION TECHNOLOGY
ECE          ELECTRONICS AND COMMUNICATION ENGINEERING
CV           CIVIL ENGINEERING
BRANCH_CODE of STUDENT can only take values that are present in
BRANCH_CODE of BRANCH; this is called a referential integrity constraint. The
relation which references another relation is called the REFERENCING
RELATION (STUDENT in this case), and the relation to which other relations
refer is called the REFERENCED RELATION (BRANCH in this case).
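The referential integrity constraint described above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the STUDENT table is reduced to three columns, the student row with branch code ME is hypothetical, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# BRANCH is the referenced relation, STUDENT the referencing relation.
conn.execute(
    "CREATE TABLE branch (branch_code TEXT PRIMARY KEY, branch_name TEXT)"
)
conn.execute("""
    CREATE TABLE student (
        roll_no     INTEGER PRIMARY KEY,
        name        TEXT,
        branch_code TEXT REFERENCES branch(branch_code)
    )
""")
conn.executemany(
    "INSERT INTO branch VALUES (?, ?)",
    [("CS", "COMPUTER SCIENCE"), ("IT", "INFORMATION TECHNOLOGY")],
)

# 'IT' exists in BRANCH, so this row is accepted.
conn.execute("INSERT INTO student VALUES (4, 'SURESH', 'IT')")

# 'ME' is a hypothetical code that is not present in BRANCH, so it is rejected.
try:
    conn.execute("INSERT INTO student VALUES (5, 'EXAMPLE', 'ME')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(rejected, student_count)
```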
Advantages of the Relational Model
• Simple model: The relational model is simple and easy to use in
comparison to other data models.
• Flexible: The relational model is more flexible than most other
data models.
• Secure: Access can be controlled at the level of individual tables, which
helps keep data secure.
• Data Accuracy: Keys and constraints help keep data accurate in the
relational data model.
• Data Integrity: The integrity of the data is maintained in the
relational model.
• Operations can be Applied Easily: Operations are easy to apply in
the relational model.
Disadvantages of the Relational Model
• The relational model can perform poorly on very large databases.
• Sometimes, it becomes difficult to find the relation between tables.
• Because of the complex structure, the response time for queries can be
high.
Characteristics of the Relational Model
• Data is represented in rows and columns, in tables called relations.
• Data is stored in tables that have relationships between them; this
is what gives the relational model its name.
• The relational model supports operations like data definition,
data manipulation, and transaction management.
• Each column has a distinct name and represents an
attribute.
• Each row represents a single entity.
Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to
obtain the result of the query. It uses operators to perform queries.
Types of Relational operation
1. Select Operation:
Input:
σ BRANCH_NAME="perryride" (LOAN)
Output:
2. Project Operation:
o This operation shows the list of those attributes that we wish to appear in the
result. Rest of the attributes are eliminated from the table.
o It is denoted by ∏.
Input:
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the
tuples that are either in R or S or both in R & S.
o It eliminates the duplicate tuples. It is denoted by ∪.
Notation: R ∪ S
A union operation is valid only if the following condition holds: R and S must
have the same number of attributes, with compatible domains.
Example:
DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284
BORROW RELATION
CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17
Input:
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation
contains all tuples that are in both R & S.
o It is denoted by ∩.
Notation: R ∩ S
Example: Using the above DEPOSITOR table and BORROW table
Input:
CUSTOMER_NAME
Smith
Jones
5. Set Difference:
o Suppose there are two relations R and S. The set difference operation
contains all tuples that are in R but not in S.
o It is denoted by minus (-).
Notation: R - S
Example: Using the above DEPOSITOR table and BORROW table
Input:
Jackson
Hayes
Williams
Curry
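The three set operations can be reproduced with Python sets, using the DEPOSITOR and BORROW rows from the tables above. The helper project is a name chosen for this sketch; it plays the role of the projection on CUSTOMER_NAME, and the difference is taken as BORROW minus DEPOSITOR, which is the direction that matches the names listed in the set difference example.

```python
# Rows copied from the DEPOSITOR and BORROW relations shown earlier.
depositor = [
    ("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"),
    ("Turner", "A-176"), ("Johnson", "A-273"), ("Jones", "A-472"),
    ("Lindsay", "A-284"),
]
borrow = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"),
]


def project(relation, index):
    """Project one column; using a set removes duplicate values."""
    return {row[index] for row in relation}


depositor_names = project(depositor, 0)
borrower_names = project(borrow, 0)

union = borrower_names | depositor_names         # in either relation
intersection = borrower_names & depositor_names  # in both relations
difference = borrower_names - depositor_names    # borrowers without a deposit

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```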
6. Cartesian product
o The Cartesian product is used to combine each row in one table with each row
in the other table. It is also known as a cross product.
o It is denoted by ×.
Notation: E × D
Example:
EMPLOYEE
EMP_ID EMP_NAME EMP_DEPT
1 Smith A
2 Harry C
3 John B
DEPARTMENT
DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal
Input:
EMPLOYEE × DEPARTMENT
Output:
EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME
1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
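The same result can be generated in Python with itertools.product, which pairs every row of one list with every row of the other. The variable names are chosen for this sketch; the rows are those of the EMPLOYEE and DEPARTMENT tables above.

```python
from itertools import product

# (EMP_ID, EMP_NAME, EMP_DEPT)
employee = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
# (DEPT_NO, DEPT_NAME)
department = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# Concatenate each employee row with each department row.
cross = [emp + dept for emp, dept in product(employee, department)]

# Keeping only the rows where EMP_DEPT equals DEPT_NO turns the product
# into a join of the two tables.
matching = [row for row in cross if row[2] == row[3]]

for row in cross:
    print(row)
```

The product has 3 × 3 = 9 rows, of which 3 pair an employee with the employee's own department.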
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
ρ(STUDENT1, STUDENT)
SQL Set Operation
SQL set operations are used to combine the results of two or more SQL SELECT
statements.
Types of Set Operation
1. Union
2. UnionAll
3. Intersect
4. Minus
1. Union
o The SQL Union operation is used to combine the result of two or more SQL
SELECT queries.
o In the union operation, the number of columns and their data types must be
the same in both queries to which the UNION operation is applied.
o The union operation eliminates the duplicate rows from its result set.
Syntax
2. Union All
The Union All operation is similar to the Union operation, except that it returns
the combined rows without removing duplicates and without sorting the data.
Syntax:
3. Intersect
4. Minus
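The four set operations can be compared with Python's built-in sqlite3 module. The tables first_table and second_table and their rows are hypothetical. One caveat: SQLite spells the difference operator EXCEPT, whereas some systems such as Oracle call it MINUS, so the query below uses EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_table  (id INTEGER, name TEXT);
    CREATE TABLE second_table (id INTEGER, name TEXT);
    INSERT INTO first_table  VALUES (1, 'Jack'), (2, 'Harry'), (3, 'Jackson');
    INSERT INTO second_table VALUES (3, 'Jackson'), (4, 'Stephan'), (5, 'David');
""")


def names(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


# UNION removes the duplicate 'Jackson'; UNION ALL keeps both copies.
union = names(
    "SELECT name FROM first_table UNION "
    "SELECT name FROM second_table ORDER BY name"
)
union_all = names(
    "SELECT name FROM first_table UNION ALL "
    "SELECT name FROM second_table ORDER BY name"
)
# INTERSECT keeps rows present in both results.
intersect = names(
    "SELECT name FROM first_table INTERSECT "
    "SELECT name FROM second_table ORDER BY name"
)
# EXCEPT (MINUS in some systems) keeps rows only in the first result.
minus = names(
    "SELECT name FROM first_table EXCEPT "
    "SELECT name FROM second_table ORDER BY name"
)
print(union, union_all, intersect, minus, sep="\n")
```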
Natural join is an SQL join operation that joins tables on the basis of their
common columns. To perform a natural join there must be at least one common
attribute (column) between the two tables. A natural join retrieves data from
multiple relations. It works in three steps.
Syntax:
We will perform the natural join query by using the following syntax.
SELECT *
FROM TABLE1
NATURAL JOIN TABLE2;
Features of Natural Join
Conceptually, a natural join works in the following three steps.
1. It performs the Cartesian product of the two tables.
2. It keeps the consistent tuples (those whose common attributes have equal
values) and deletes the inconsistent tuples.
3. Then it deletes the duplicate attributes.
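A natural join can be run with Python's built-in sqlite3 module, which supports the NATURAL JOIN syntax shown above. The rows follow the EMPLOYEE and DEPARTMENT tables used earlier, with one assumption: the employee's department column is named dept_no (rather than EMP_DEPT) so that the two tables share a column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- dept_no is the common column shared by both tables (an assumption here).
    CREATE TABLE employee   (emp_id INTEGER, emp_name TEXT, dept_no TEXT);
    CREATE TABLE department (dept_no TEXT, dept_name TEXT);
    INSERT INTO employee VALUES (1, 'Smith', 'A'), (2, 'Harry', 'C'), (3, 'John', 'B');
    INSERT INTO department VALUES ('A', 'Marketing'), ('B', 'Sales'), ('C', 'Legal');
""")

cursor = conn.execute(
    "SELECT * FROM employee NATURAL JOIN department ORDER BY emp_id"
)
# The common column dept_no appears only once in the result.
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```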
Syntax :
SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;
2. Right Outer Join : The right join operation returns all records from the right
table and the matching records from the left table. Where no matching record is
found in the left table, NULL is returned for the left table's columns.
Syntax :
SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;
3. Full Outer Join : The full outer join returns all records from both tables.
Matching rows are combined, and rows without a match in the other table are
returned with NULL in that table's columns.
Syntax:
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name
WHERE condition;
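The NULL padding of outer joins can be observed with Python's built-in sqlite3 module. This sketch uses only LEFT JOIN, because RIGHT and FULL OUTER JOIN are available only in newer SQLite versions; swapping the two tables in a LEFT JOIN gives the rows a RIGHT JOIN would return. The tables and rows are hypothetical (John has no department, and Sales has no employee).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee   (emp_id INTEGER, emp_name TEXT, dept_no TEXT);
    CREATE TABLE department (dept_no TEXT, dept_name TEXT);
    INSERT INTO employee VALUES (1, 'Smith', 'A'), (2, 'Harry', 'C'), (3, 'John', NULL);
    INSERT INTO department VALUES ('A', 'Marketing'), ('B', 'Sales'), ('C', 'Legal');
""")

# All employees, with NULL (None in Python) where no department matches.
left_rows = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employee AS e
    LEFT JOIN department AS d ON e.dept_no = d.dept_no
    ORDER BY e.emp_id
""").fetchall()

# Same join with the tables swapped: all departments are kept instead,
# which is what employee RIGHT JOIN department would return.
right_rows = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM department AS d
    LEFT JOIN employee AS e ON e.dept_no = d.dept_no
    ORDER BY d.dept_no
""").fetchall()

print(left_rows)
print(right_rows)
```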
SQL | DIVISION
Division in SQL is typically required when you want to find the entities that
are related to every entity in a given set of entities of another type. The
division operator is used to evaluate queries that contain the keyword ‘all’.
When to Use the Division Operator
You typically require the division operator in scenarios where you need to ascertain
entities that meet all specified conditions across different datasets. Here are a few
common examples:
● Identifying people who have accounts in every bank within a particular city.
● Determining students who have enrolled in all necessary courses to qualify for
graduation.
These use cases involve checking against all elements of a subset, thereby
necessitating a division-like operation.
In all these queries, the description after the keyword ‘all’ defines a set of
elements, and the final result contains those entities that are related to
every element of that set.
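SQL has no dedicated division keyword, so one common way to express it is a double NOT EXISTS: keep a student if there is no required course that the student has not taken. The sketch below runs this with Python's built-in sqlite3 module; the tables takes and required and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE takes    (student TEXT, course TEXT);
    CREATE TABLE required (course TEXT);
    INSERT INTO takes VALUES
        ('Asha', 'DBMS'), ('Asha', 'OS'), ('Asha', 'Networks'),
        ('Ravi', 'DBMS'),
        ('Meena', 'DBMS'), ('Meena', 'OS');
    INSERT INTO required VALUES ('DBMS'), ('OS');
""")

# takes divided by required: students enrolled in ALL required courses.
# A student qualifies when no required course is missing from their rows.
graduates = [
    row[0]
    for row in conn.execute("""
        SELECT DISTINCT t.student
        FROM takes AS t
        WHERE NOT EXISTS (
            SELECT 1 FROM required AS r
            WHERE NOT EXISTS (
                SELECT 1 FROM takes AS t2
                WHERE t2.student = t.student AND t2.course = r.course
            )
        )
        ORDER BY t.student
    """)
]
print(graduates)
```

Ravi is excluded because the required course OS is missing from his rows.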
Conditional Join(⋈c): Conditional Join is used when you want to join two or
more relations based on some condition. Example: Select students whose
ROLL_NO is greater than EMP_NO of employees
STUDENT ⋈c (STUDENT.ROLL_NO > EMPLOYEE.EMP_NO) EMPLOYEE
Equijoin(⋈): Equijoin is a special case of conditional join in which the join
condition uses only equality between a pair of attributes. Because the two
joined attributes carry equal values in every result tuple, one of them is
redundant; a natural join keeps only one of the two. Example:
Select students whose ROLL_NO is equal to EMP_NO of employees.
STUDENT ⋈ (STUDENT.ROLL_NO = EMPLOYEE.EMP_NO) EMPLOYEE
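Both joins can be sketched in plain Python as a filter over all pairs of rows. The function theta_join and the sample STUDENT and EMPLOYEE rows are hypothetical and serve only to show how the join condition decides which pairs survive.

```python
# Hypothetical rows: (ROLL_NO, NAME) and (EMP_NO, NAME).
student = [(1, "RAM"), (2, "RAMESH"), (3, "SUJIT")]
employee = [(1, "NARESH"), (2, "ROHAN")]


def theta_join(left, right, condition):
    """Pair every left row with every right row and keep the pairs
    for which the join condition is true."""
    return [l + r for l in left for r in right if condition(l, r)]


# Conditional join: STUDENT.ROLL_NO > EMPLOYEE.EMP_NO
conditional = theta_join(student, employee, lambda s, e: s[0] > e[0])

# Equijoin: STUDENT.ROLL_NO = EMPLOYEE.EMP_NO
equijoin = theta_join(student, employee, lambda s, e: s[0] == e[0])

print(conditional)
print(equijoin)
```

In the equijoin rows the first and third fields are always equal, which is the redundancy a natural join removes.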
Left Outer Join(⟕): When applying a join on two relations R and S, the tuples of
R or S that do not satisfy the join condition do not appear in the result set. A
Left Outer Join, however, gives all tuples of R in the result set. The tuples of R
which do not satisfy the join condition will have NULL values for the attributes
of S. Example: Select students whose ROLL_NO is greater than EMP_NO of
employees and details of other students as well
STUDENT ⟕ (STUDENT.ROLL_NO > EMPLOYEE.EMP_NO) EMPLOYEE
Right Outer Join(⟖): When applying a join on two relations R and S, the tuples
of R or S that do not satisfy the join condition do not appear in the result set.
A Right Outer Join, however, gives all tuples of S in the result set. The tuples
of S which do not satisfy the join condition will have NULL values for the
attributes of R. Example: Select students whose ROLL_NO is greater than EMP_NO
of employees and details of other Employees as well
STUDENT ⟖ (STUDENT.ROLL_NO > EMPLOYEE.EMP_NO) EMPLOYEE
Full Outer Join(⟗): When applying a join on two relations R and S, the tuples of
R or S that do not satisfy the join condition do not appear in the result set. A
Full Outer Join, however, gives all tuples of S and all tuples of R in the result
set. The tuples of S which do not satisfy the join condition will have NULL
values for the attributes of R and vice versa. Example: Select students whose
ROLL_NO is greater than EMP_NO of employees and details of other Employees as
well and other Students as well
UNIT FOUR
Functional Dependency
If one piece of information stored in a table uniquely determines another
piece of information in the same table, this is called a Functional Dependency.
Consider it as an association between two attributes of the same relation.
If P functionally determines Q, then
P -> Q
Let us see an example −
<Employee>
EmpID EmpName EmpAge
E01 Amit 28
E02 Rohit 31
In the above table, EmpName is functionally dependent
on EmpID because EmpName can take only one value for the given value
of EmpID:
EmpID -> EmpName
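Whether a functional dependency holds in a given set of rows can be checked mechanically: every value of the left-hand side must map to exactly one value of the right-hand side. The function fd_holds below is a sketch written for this purpose, and the third employee row is a hypothetical addition. Note that such a check only tests the rows at hand, whereas a functional dependency is meant to hold for every valid state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, within rows, each combination of lhs values
    is associated with exactly one combination of rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


employee = [
    {"EmpID": "E01", "EmpName": "Amit", "EmpAge": 28},
    {"EmpID": "E02", "EmpName": "Rohit", "EmpAge": 31},
    # Hypothetical extra row: a second employee who is also named Amit.
    {"EmpID": "E03", "EmpName": "Amit", "EmpAge": 40},
]

id_determines_name = fd_holds(employee, ["EmpID"], ["EmpName"])
name_determines_id = fd_holds(employee, ["EmpName"], ["EmpID"])
print(id_determines_name, name_determines_id)
```

With the extra row, EmpID -> EmpName still holds, but EmpName -> EmpID does not, because the name Amit now maps to two different IDs.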
The same is displayed below −
Fully Functional Dependency
An attribute is fully functionally dependent on a set of attributes if it is
functionally dependent on that set and not on any proper subset of it.
For example, an attribute Q is fully functionally dependent on P if it is
functionally dependent on P and not on any proper subset of P.
Let us see an example −
<ProjectCost>
ProjectID ProjectCost
001 1000
002 5000
<EmployeeProject>
EmpID ProjectID Days (spent on the project)
E099 001 320
E056 002 190
The above relations state:
{EmpID, ProjectID, ProjectCost} -> {Days}
However, Days is not fully functionally dependent on this set, because the
proper subset {EmpID, ProjectID} is already enough to determine the {Days} spent
on the project by the employee.
This gives our fully functional dependency −
{EmpID, ProjectID} -> {Days}
Transitive Dependency
When an indirect relationship causes functional dependency it is
called Transitive Dependency.
If P -> Q and Q -> R are true, then P -> R is a transitive dependency.
Multivalued Dependency
A multivalued dependency occurs when the existence of one or more rows in a
table implies the existence of one or more other rows in the same table.
If a table has attributes P, Q and R, then Q and R are multi-valued facts of P.
It is represented by a double arrow −
->->
For our example:
P ->-> Q
P ->-> R
In the above case, the multivalued dependency exists only if Q and R are
independent attributes.
Partial Dependency
Partial Dependency occurs when a nonprime attribute is functionally
dependent on part of a candidate key. The 2nd Normal Form (2NF) eliminates
the Partial Dependency. Let us see an example −
<StudentProject>
StudentID  ProjectNo  StudentName  ProjectName
S01        199        Katie        Geo Location
S02        120        Ollie        Cluster Exploration
In the above table, we have partial dependency; let us see how −
The prime key attributes are StudentID and ProjectNo.
As stated, a non-prime attribute is partially dependent when it is functionally
dependent on only part of a candidate key. Here the non-prime attributes are
StudentName and ProjectName.
StudentName can be determined by StudentID alone, which makes the relation
partially dependent.
ProjectName can be determined by ProjectNo alone, which also makes the relation
partially dependent.
Normalization
A large database defined as a single relation may result in data duplication. This
repetition of data may result in:
o Relations becoming very large.
o Difficulty in maintaining and updating data, since an update involves searching
many records in the relation.
o Wastage and poor utilization of disk space and resources.
o A higher likelihood of errors and inconsistencies.
So to handle these problems, we should analyze and decompose the relations with
redundant data into smaller, simpler, and well-structured relations that satisfy
desirable properties. Normalization is a process of decomposing the relations into
relations with fewer attributes.
What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate undesirable characteristics like
Insertion, Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller and links them using
relationships.
o The normal form is used to reduce redundancy from the database table.
Normalization works through a series of stages called Normal forms. The normal
forms apply to individual relations. A relation is said to be in a particular
normal form if it satisfies the constraints of that form.
Following are the various types of Normal forms:
Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher
normal forms, i.e., 4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher
degree.
o Careless decomposition may lead to a bad database design, leading to
serious problems.
EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
12      Sam       7390372389, 8589830302  Punjab
The decomposition of the EMPLOYEE table into 1NF has been shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
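The conversion to 1NF amounts to emitting one row per phone number, so that every attribute value is atomic. A minimal Python sketch follows; the function name to_first_normal_form is chosen for this example, and the phone numbers are assumed to be stored as one comma-separated string per employee.

```python
# Unnormalized rows: EMP_PHONE holds several comma-separated values.
employee = [
    (14, "John", "7272826385, 9064738238", "UP"),
    (12, "Sam", "7390372389, 8589830302", "Punjab"),
]


def to_first_normal_form(rows):
    """Emit one row per phone number so every attribute value is atomic."""
    result = []
    for emp_id, name, phones, state in rows:
        for phone in phones.split(","):
            result.append((emp_id, name, phone.strip(), state))
    return result


employee_1nf = to_first_normal_form(employee)
for row in employee_1nf:
    print(row)
```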
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, non-prime attribute TEACHER_AGE is dependent on
TEACHER_ID which is a proper subset of a candidate key. That's why it violates
the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
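The decomposition can be reproduced in Python by projecting the original rows onto the two smaller schemas. Joining the projections back on TEACHER_ID recovers the original table, which suggests that no information was lost. The variable names are chosen for this sketch.

```python
# Original rows: (TEACHER_ID, SUBJECT, TEACHER_AGE)
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# TEACHER_DETAIL: the age depends on TEACHER_ID alone, so store it once.
teacher_detail = sorted({(tid, age) for tid, _, age in teacher})

# TEACHER_SUBJECT: which teacher teaches which subject.
teacher_subject = [(tid, subject) for tid, subject, _ in teacher]

# Join the two tables back on TEACHER_ID.
age_of = dict(teacher_detail)
rejoined = [(tid, subject, age_of[tid]) for tid, subject in teacher_subject]

print(teacher_detail)
print(rejoined == teacher)
```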
Third Normal Form (3NF)
o A relation is in 3NF if it is in 2NF and does not contain any transitive
dependency of a non-prime attribute on a key.
o 3NF is used to reduce data duplication. It is also used to achieve
data integrity.
o If a relation in 2NF has no transitive dependency for non-prime attributes,
then the relation is in third normal form.
A relation is in third normal form if it holds at least one of the following
conditions for every non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
264 India
264 India
EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
D394 283
D394 300
D283 232
D283 549
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two
independent attributes: there is no relationship between a student's courses
and hobbies.
In the STUDENT relation, the student with STU_ID 21 has two
courses, Computer and Math, and two hobbies, Dancing and Singing. So there
is a multivalued dependency on STU_ID, which leads to unnecessary repetition
of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
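The two tables above are simply projections of the STUDENT table, which can be checked with a few lines of Python. The variable names are chosen for this sketch. Joining the projections back on STU_ID produces every course and hobby pairing for student 21, which is what treating courses and hobbies as independent facts implies.

```python
# Original rows: (STU_ID, COURSE, HOBBY)
student = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

# Project onto (STU_ID, COURSE) and (STU_ID, HOBBY).
student_course = sorted({(sid, course) for sid, course, _ in student})
student_hobby = sorted({(sid, hobby) for sid, _, hobby in student})

# Join the projections back on STU_ID.
rejoined = sorted(
    (sid, course, hobby)
    for sid, course in student_course
    for other_sid, hobby in student_hobby
    if sid == other_sid
)

print(student_course)
print(student_hobby)
print(len(rejoined))
```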
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen