
DBMS Updated Module 1

B.tech cse Database management system notes Punjab technical University

Uploaded by

course5325
Copyright
© All Rights Reserved

Course Code: BTCS501-18

Course Title: Database Management Systems


Module: 1
1. Abstraction
Abstraction in DBMS means hiding unnecessary or irrelevant details and displaying only the necessary details to the user. Developers use abstraction to make data retrieval efficient and to reduce the complexity that users face when working with the system. This approach simplifies database design.
There are three levels of abstraction in a DBMS:
• Physical or Internal Level
• Logical or Conceptual Level
• View or External Level

Figure: Data Abstraction

1.1 Physical or Internal Level: This is the lowest level of data abstraction. It describes how the data is actually stored in memory. Access methods such as sequential or random access, and file organization methods such as B+ trees and hashing, are used at this level. For example, when we store the details of an employee, the blocks of storage and the amount of memory used for them are kept hidden from the user.
1.2 Logical or Conceptual Level: This level describes the information that is actually stored in the database in the form of tables, together with the relationships among the data entities, using simple structures. At this level, it is not known what information will be made available to users at the view level.
The logical level describes the entire database in terms of a small number of relatively simple structures. Database administrators, who must decide what information to keep in the database, use the logical level of abstraction.
1.3 View or External Level: This is the highest level of abstraction. Users see only a part of the actual database, presented as rows and columns (tables and relations). Multiple views of the same database may exist. Users can view the data and interact with the database, while the storage and implementation details remain hidden from them.
Many users of the database system do not need all of this information; they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system.

For example, when storing customer data:

• Physical level: contains the blocks of storage (measured in bytes, GB, TB, etc.).
• Logical level: contains the fields and attributes of the data.
• View level: provides CLI and GUI access to the database.
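The three levels can be illustrated with a small C++ sketch. It is a simplified model, not how any real DBMS lays out its files: the hypothetical `Customer` struct plays the logical level, a raw byte buffer (`StorageBlock`) plays the physical level, and `customerNameView` plays a view that exposes only one field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Logical level (hypothetical record): the fields and attributes of a customer.
struct Customer {
    int id;
    char name[16];
    double balance;
};

// Physical level: records are packed as raw bytes into a storage block.
// Nothing outside this class depends on the byte layout.
class StorageBlock {
public:
    void write(const Customer& c) {
        const auto* bytes = reinterpret_cast<const unsigned char*>(&c);
        data_.insert(data_.end(), bytes, bytes + sizeof(Customer));
    }
    Customer read(std::size_t index) const {
        Customer c;
        std::memcpy(&c, data_.data() + index * sizeof(Customer), sizeof(Customer));
        return c;
    }
    std::size_t bytesUsed() const { return data_.size(); }

private:
    std::vector<unsigned char> data_;
};

// View level: this user sees customer names only, never ids, balances, or bytes.
std::string customerNameView(const StorageBlock& block, std::size_t index) {
    return std::string(block.read(index).name);
}
```

A user of `customerNameView` never learns how many bytes a record occupies or where it sits in the block, which is exactly what the physical level is meant to hide.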

2. Data Independence
Data independence is a property of a DBMS that allows the database schema to be changed at one level without requiring a change to the schema at the next higher level. It keeps the data separate from the programs that use it. There are mainly two types of data independence, arising from the levels of abstraction:
• Physical level data independence
• Logical level data independence
2.1 Physical Level Data Independence: This is the ability to modify the physical schema without altering the conceptual or logical schema. For example, the conceptual structure of the database is not affected by a change in the storage size of the database server. Such modifications to the physical structure may include:
• Using new storage devices.
• Modifying the data structures used for storage.
• Altering indexes or using alternative file organization techniques.

2.2 Logical Level Data Independence: This is the ability to modify the logical schema without affecting the external schema or the application programs. The user's view of the data is not affected by changes to the conceptual view of the data. These changes may include inserting or deleting attributes, or altering table structures, entities, or relationships in the logical schema.
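Physical data independence can be sketched in C++ as follows, assuming a hypothetical `StudentStore` interface that stands in for the conceptual schema. The application function `greet` is written once and works unchanged whether the rows are kept in a sequentially scanned list or in an ordered index (`std::map`, used here only as a stand-in for a B+ tree).

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface that the application depends on (the "schema" it sees).
class StudentStore {
public:
    virtual ~StudentStore() = default;
    virtual void insert(int rollNo, const std::string& name) = 0;
    virtual std::optional<std::string> findName(int rollNo) const = 0;
};

// Physical organisation 1: an unordered list searched sequentially.
class SequentialStore : public StudentStore {
public:
    void insert(int rollNo, const std::string& name) override { rows_.push_back({rollNo, name}); }
    std::optional<std::string> findName(int rollNo) const override {
        for (const auto& [r, n] : rows_)
            if (r == rollNo) return n;
        return std::nullopt;
    }

private:
    std::vector<std::pair<int, std::string>> rows_;
};

// Physical organisation 2: an ordered index (balanced tree, standing in for a B+ tree).
class IndexedStore : public StudentStore {
public:
    void insert(int rollNo, const std::string& name) override { index_[rollNo] = name; }
    std::optional<std::string> findName(int rollNo) const override {
        auto it = index_.find(rollNo);
        if (it == index_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<int, std::string> index_;
};

// Application code: unchanged whichever physical organisation is plugged in.
std::string greet(const StudentStore& store, int rollNo) {
    auto name = store.findName(rollNo);
    return name ? "Hello, " + *name : "Unknown student";
}
```

Switching from `SequentialStore` to `IndexedStore` changes only how data is stored and searched; `greet` needs no modification.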

3. SQL Commands (SQL Sublanguages)

• DDL: Data Definition Language
• DML: Data Manipulation Language
• TCL: Transaction Control Language
• DQL: Data Query Language
• DCL: Data Control Language
3.1 Data Definition Language (DDL): DDL consists of the SQL commands used to define the database schema. It deals with descriptions of the schema and is used to create, modify, and delete the structure of database objects (but not the data itself), through commands such as CREATE, ALTER, and DROP. These commands are normally not used by a general user, who should access the database through an application.

3.2 Data Manipulation Language (DML): The SQL commands that manipulate the data present in the database, such as INSERT, UPDATE, and DELETE, belong to DML, which covers most SQL statements. DML is the part of SQL that governs how the data stored in the database is added, changed, and removed.

3.3 Transaction Control Language (TCL): A transaction groups a set of tasks into a single unit of execution. Each transaction begins with a specific task and ends only when all the tasks in the group have completed successfully; if any task fails, the whole transaction fails. A transaction therefore has only two outcomes: success or failure. TCL commands such as COMMIT, ROLLBACK, and SAVEPOINT manage transactions.
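The all-or-nothing behaviour of a transaction can be sketched in C++ with a snapshot that is restored on failure. This only illustrates the idea behind COMMIT and ROLLBACK, assuming a hypothetical in-memory `Accounts` table and `Transfer` task; real DBMSs use logs and locking rather than copying the whole table.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Accounts = std::map<std::string, int>;   // hypothetical table: account -> balance

// One task inside a transaction: move `amount` from one account to another.
struct Transfer {
    std::string from, to;
    int amount;
};

// Runs all tasks as a single unit. On success every change is kept (like COMMIT);
// if any task fails, the saved snapshot is restored (like ROLLBACK).
bool runTransaction(Accounts& db, const std::vector<Transfer>& tasks) {
    const Accounts snapshot = db;   // state before the transaction started
    for (const auto& t : tasks) {
        auto from = db.find(t.from);
        auto to = db.find(t.to);
        if (from == db.end() || to == db.end() || from->second < t.amount) {
            db = snapshot;          // undo the tasks that already ran
            return false;
        }
        from->second -= t.amount;
        to->second += t.amount;
    }
    return true;
}
```

If the second of two transfers fails, the first one is undone as well, so the table never shows a half-finished transaction.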

3.4 Data Query Language (DQL): DQL is the part of SQL used to retrieve and organise data from a database. The SELECT command extracts data from the database so that further operations can be performed on it.

3.5 Data Control Language (DCL): DCL includes commands such as GRANT and REVOKE which
mainly deal with the rights, permissions, and other controls of the database system.
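As a quick reference, the five sublanguages can be told apart by the leading keyword of a statement. The C++ sketch below uses the common textbook grouping (the keyword lists are an assumption, and individual DBMSs classify a few commands differently); `sublanguageOf` is a hypothetical helper, not part of any SQL library.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Returns the sublanguage of an SQL statement based on its first keyword.
std::string sublanguageOf(const std::string& statement) {
    static const std::map<std::string, std::string> groups = {
        {"CREATE", "DDL"}, {"ALTER", "DDL"}, {"DROP", "DDL"}, {"RENAME", "DDL"},
        {"INSERT", "DML"}, {"UPDATE", "DML"}, {"DELETE", "DML"},
        {"SELECT", "DQL"},
        {"GRANT", "DCL"}, {"REVOKE", "DCL"},
        {"COMMIT", "TCL"}, {"ROLLBACK", "TCL"}, {"SAVEPOINT", "TCL"},
    };
    std::istringstream in(statement);
    std::string word;
    in >> word;   // first keyword of the statement
    // SQL keywords are case-insensitive, so compare in upper case.
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    auto it = groups.find(word);
    return it == groups.end() ? "UNKNOWN" : it->second;
}
```

For instance, `sublanguageOf("GRANT SELECT ON student TO clerk")` yields `"DCL"`.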

4. Data Model
A data model describes the data, its semantics, and its consistency constraints. It provides the conceptual tools for describing the design of a database at each level of data abstraction.
Data models define how the logical structure of a database is modeled, and they are fundamental to introducing abstraction in a DBMS. They define how data items are connected to each other and how they are processed and stored inside the system.
Data models give a clear picture of the data, which helps in creating the actual database. They are broadly classified into three types:
4.1. Conceptual Data Model: The conceptual data model describes the database at a very high level and is useful for understanding the needs or requirements of the database. It is the model used in the requirement-gathering process, i.e. before database designers start building a particular database. A popular example is the Entity-Relationship Model (ER Model), which focuses on the entities, relationships, and attributes used by database designers.
Characteristics of a conceptual data model:
• It offers organization-wide coverage of the business concepts.
• It is designed and developed for a business audience.
• It is developed independently of hardware specifications (such as data storage capacity or location) and software specifications (such as the DBMS vendor and technology). The focus is to represent data as a user will see it in the “real world.”

Entity-Relationship Model (ER Model): It is a high-level data model used to define the data and the relationships between them. It is basically a conceptual design of a database that makes it easy to design the view of the data.
Components of ER Model:
• Entity: An entity is a real-world object that has some physical or logical existence. It can be a person, place, object, class, etc. Entities are represented by rectangles in an ER diagram.
• Attribute: An attribute describes a property of an entity, such as Age, Roll Number, or Marks for a Student. Attributes are represented by ellipses in an ER diagram.
• Relationship: Relationships define associations among different entities. They are represented by diamonds (rhombuses) in an ER diagram.
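The three ER components can be captured in a few C++ structs. This is a rough sketch assuming hypothetical `Student` and `Course` entity sets linked by an `Enrolls` relationship; the check in `isWellFormed` reflects the rule that a relationship (diamond) must connect entities (rectangles) that exist in the diagram.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Entity set (rectangle) with its attributes (ellipses).
struct EntitySet {
    std::string name;
    std::vector<std::string> attributes;
};

// Relationship set (diamond) linking two entity sets by name.
struct RelationshipSet {
    std::string name;
    std::string left, right;
};

struct ERDiagram {
    std::vector<EntitySet> entities;
    std::vector<RelationshipSet> relationships;

    bool hasEntity(const std::string& n) const {
        for (const auto& e : entities)
            if (e.name == n) return true;
        return false;
    }

    // Every relationship must connect entity sets that are present in the diagram.
    bool isWellFormed() const {
        for (const auto& r : relationships)
            if (!hasEntity(r.left) || !hasEntity(r.right)) return false;
        return true;
    }
};
```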
4.2. Representational Data Model
This type of data model represents only the logical part of the database, not its physical structure, which allows us to focus primarily on the design of the database. A popular representational model is the Relational Model, which comes with Relational Algebra and Relational Calculus as its formal query languages. In the Relational Model, we use tables to represent data and the relationships between them. It is a theoretical concept whose practical implementation is done in the Physical Data Model.
The advantage of using a representational data model is that it provides a foundation for the physical model.
4.3. Physical Data Model
The physical data model is used to practically implement the relational data model. Ultimately, all data in a database is stored physically on a secondary storage device such as disks and tapes, in the form of files, records, and other data structures. At this level, tables are stored so that they can be accessed efficiently. Structured Query Language (SQL) is used to practically implement relational algebra.
This data model describes HOW the system will be implemented using a specific DBMS. It is typically created by the DBA and developers, and its purpose is the actual implementation of the database.
Characteristics of a physical data model:
• It describes the data needed for a single project or application, though it may be integrated with other physical data models depending on the project scope.
• It contains the relationships between tables, which address the cardinality and nullability of the relationships.
• It is developed for a specific version of a DBMS, location, data storage, or technology to be used in the project.
• Columns have exact data types, lengths, and default values assigned.
• Primary and foreign keys, views, indexes, access profiles, authorizations, etc. are defined.

4.4 Some Other Data Models

4.4.1 Hierarchical Model

In a hierarchical model, data is viewed as a collection of tables, or segments, that form a hierarchical relation. The data is organized into a tree-like structure in which each child record has exactly one parent record, while a parent record can have many children.
Consider the following hierarchical model of a Student database system.
Applications of hierarchical model:
• It is used as a semantic model in practice, as many real-world occurrences are hierarchical in nature, such as biological, political, or social structures.
• It is commonly used as a physical model because of the inherent hierarchical structure of disk storage systems (tracks, cylinders, etc.). Examples include Information Management System (IMS) by IBM and NOMAD by NCSS.
Consider the following hierarchical model scheme of a cricket database system.

The "root" in the structure is a single table in the database and other tables act as the branches flowing
from the root. The diagram below also shows a typical hierarchical database structure.

Advantages of the hierarchical model:

• Data sharing: all data is held in a common database, so sharing data is practical.
• Data security: this model was the first database model to offer data security.
• Data integrity: it is based on parent-child relationships, so there is always a link between a parent segment and its child segments.

Disadvantages of the hierarchical model:

• Although the model is conceptually simple and easy to design, it is quite complex to implement.
• It lacks flexibility, as changes to tables or segments often lead to complex system management tasks. For example, deleting one segment can involuntarily delete all segments under it.
• It has no standards, as implementations of this model do not follow any specific standard.
• It is limited, as many common relationships do not conform to the 1:N format required by the hierarchical model.
• When a user needs to store a record in a child table that is not yet related to any record in a parent table, it is difficult to record, and the user must add an additional entry in the parent table.
• It cannot support complex relationships, and it suffers from redundancy, which can produce inaccurate information due to inconsistent recording of data at various sites.
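The parent-child structure, and the cascading deletion mentioned above, can be sketched in C++ as a tree in which each parent owns its children. The `Segment` type and the college data in the usage are hypothetical; the point is that erasing one segment also erases every segment beneath it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A segment (record) in a hierarchical database: each child has exactly one parent,
// and a parent may have any number of children.
struct Segment {
    std::string data;
    std::vector<std::unique_ptr<Segment>> children;   // owned by this parent only

    Segment* addChild(const std::string& d) {
        auto child = std::make_unique<Segment>();
        child->data = d;
        children.push_back(std::move(child));
        return children.back().get();
    }

    // Removing a child destroys its whole subtree: every segment under it goes too.
    void removeChild(const std::string& d) {
        for (auto it = children.begin(); it != children.end(); ++it) {
            if ((*it)->data == d) {
                children.erase(it);
                return;
            }
        }
    }

    // Number of segments in this subtree, including this one.
    std::size_t count() const {
        std::size_t n = 1;
        for (const auto& c : children) n += c->count();
        return n;
    }
};
```

Because each segment is owned by exactly one parent, there is no way to keep a student record once its department segment has been removed.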

4.4.2 Network Model

This model is a generalization of the hierarchical model. A segment can have multiple parent segments; segments are grouped into levels, but a logical association can exist between segments belonging to any level. Mostly, there exists a many-to-many logical association between any two segments.
Figure: Network data model

Figure: Network model for a Finance Department

Figure: Network model for a store

Figure: Network model for a college

Advantages of Network Model

• It is simple and easy to design, like the hierarchical data model.
• It can handle multiple types of relationships, such as 1:1, 1:M, and M:N, which helps in modeling real-life applications.
• Data can be accessed easily, and an application can access both the owner's and the members' records within a set.
• It does not allow a member to exist without an owner, which supports data integrity.
• It can represent relationships in which a record has multiple parents.
• It represents complex data relationships better than the hierarchical model.
• It offers improved database performance.
• It includes Data Definition Language (DDL) and Data Manipulation Language (DML) commands.
Disadvantages of Network Model
• The schema or structure of this database is very complex, as all records are maintained through pointers.
• Operational anomalies exist because pointers are used for navigation, which leads to complex implementation.
• The design or structure of this model is not user-friendly.
• This model has no scope for automated query optimization.
• It fails to achieve structural independence, even though it can achieve data independence.
• The database contains a complex array of pointers.
• Navigational systems lead to complex implementation and management.
• Any change, such as an update, deletion, or insertion, is very complex.
Features of Network Model in DBMS
• Data Relationship Representation: The network model uses a graph structure to represent data relationships. It allows many-to-many relationships, providing greater flexibility in how data is connected.
• Records and Sets: Data in a network model is organized into records and sets. Records are similar to rows in a relational table, and sets are used to define relationships between records, akin to links in a graph.
• Owner-Member Relationships: The network model defines data relationships using owner-member pairs. An owner record can be linked to multiple member records, and a member record can belong to multiple owner records, facilitating complex relationships.
• Navigational Access: The network model supports navigational data access, where records are accessed through predefined paths. This is different from relational models, which use declarative query languages like SQL.
• Hierarchical and Non-Hierarchical Structures: The network model can represent both hierarchical (tree-like) and non-hierarchical (graph-like) structures, providing flexibility in data modeling.
• Ability to Merge Relationships: Because records take part in more relationships, the data is more closely related. The model can manage one-to-one as well as many-to-many relationships.
• Many Paths: There can be more than one path to the same record because of the additional relationships, which makes data access fast and simple.
• Circular Linked List: Operations in this model are performed with the help of circular linked lists. The program maintains the current position and navigates through the records based on their relationships.
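The owner-member sets and navigational access described above can be sketched in C++ as follows. `NetworkDB` is a hypothetical, in-memory illustration (real network DBMSs link records with stored pointers); it shows a member record belonging to the sets of two owners, and navigation in both directions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Records are nodes; a named set links one owner record to many member records.
// Unlike the hierarchical model, a member may appear in the sets of several owners.
class NetworkDB {
public:
    void addRecord(const std::string& id) { records_[id]; }

    // Links `member` to `owner` through the set type `setName` (for example "teaches").
    // Refuses the link if either record does not exist.
    bool connect(const std::string& setName, const std::string& owner, const std::string& member) {
        if (!records_.count(owner) || !records_.count(member)) return false;
        records_[owner][setName].push_back(member);
        return true;
    }

    // Navigational access: follow a predefined set from an owner to its members.
    std::vector<std::string> members(const std::string& setName, const std::string& owner) const {
        auto rec = records_.find(owner);
        if (rec == records_.end()) return {};
        auto found = rec->second.find(setName);
        return found == rec->second.end() ? std::vector<std::string>{} : found->second;
    }

    // Reverse navigation: every owner whose `setName` set contains `member`.
    std::vector<std::string> owners(const std::string& setName, const std::string& member) const {
        std::vector<std::string> result;
        for (const auto& [id, sets] : records_) {
            auto found = sets.find(setName);
            if (found == sets.end()) continue;
            for (const auto& m : found->second)
                if (m == member) { result.push_back(id); break; }
        }
        return result;
    }

private:
    // record id -> (set name -> member ids)
    std::map<std::string, std::map<std::string, std::vector<std::string>>> records_;
};
```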
Difference Between the Network Model and the Hierarchical Model

| Feature | Hierarchical Model | Network Model |
|---|---|---|
| Structure | Tree-like structure | Graph structure |
| Relationships | One-to-many (single parent, multiple children) | Many-to-many (multiple parents and children) |
| Flexibility | Less flexible | More flexible |
| Data Access | Single access path | Multiple access paths |
| Redundancy | Higher redundancy due to rigid hierarchy | Lower redundancy due to shared relationships |
| Complexity | Simpler to design and implement | More complex to design and manage |
| Usage Scenario | Suitable for simple, hierarchical data structures | Suitable for complex, interconnected data structures |
| Efficiency | Efficient for hierarchical traversal | Efficient for complex queries and data retrieval |
| Example | Organizational chart | Telecommunications network |


4.4.3 Relational Model in DBMS
This model represents data as tables with columns and rows. Each row is known as a tuple, and each column of a table has a name, called an attribute. It is well known in database technology because it is commonly used to represent real-world objects and the relationships between them. Popular relational databases in use today include Oracle, Sybase, DB2, and MySQL Server.
Following are the terminologies of the Relational Model:

| Relational term | Meaning |
|---|---|
| Relation | Table |
| Tuple | Row, Record |
| Attribute | Column, Field |
| Domain | Set of legal values |
| Cardinality | Number of rows |
| Degree | Number of columns |

The relational data model is the primary data model used around the world for data storage and processing. It is simple and has all the properties and capabilities required to process data with storage efficiency.
The relational model represents how data is stored in relational databases. A relational database consists of a collection of tables, each of which is assigned a unique name.

• Relation: A relation is usually represented as a table, organized into rows and columns. A relation consists of multiple records. For example, a Student relation contains tuples and attributes.
• Tuple: The rows of a relation, which contain the values corresponding to the attributes, are called tuples. For example, in the Student relation there are 5 tuples.

• Data Item: The smallest unit of data in a relation is the individual data item. It is stored at the intersection of a row and a column and is also known as a cell.
• Domain: A domain is the set of atomic values that an attribute can take. It can be specified explicitly by listing all possible values or by stating conditions that all values in the domain must satisfy. For example, the domain of the Gender attribute is the set of values "M" for male and "F" for female. Most database software does not fully support domains, typically allowing users to define only simple data types such as numbers, dates, and characters.
• Attribute: An attribute is the name of a column in a particular table. Each attribute Ai must have a domain, dom(Ai). For example, Stu_No, S_Name, PHONE_NO, ADDRESS, and Gender are the attributes of a Student relation. In relational databases, a column entry in any row is a single value that contains exactly one item.

• Cardinality: The total number of rows in a relation at a given time is called the cardinality of that relation. For example, a Student relation with 5 tuples has a cardinality of 5. The cardinality of a relation changes over time as tuples are added or deleted.
• Degree: The degree of a relation is the total number of attributes in it. A relation with one attribute is called a unary relation, one with two attributes a binary relation, and one with three attributes a ternary relation. For example, the Student relation has 5 attributes, so its degree is 5. The degree of a relation does not change as tuples are added or deleted.

• Relational instance: In a relational database system, a relation instance is a finite set of tuples. Relation instances do not have duplicate tuples.
• Relational schema: A relational schema contains the name of the relation and the names of all its columns or attributes.

• Relational key: A relational key is a set of one or more attributes that can identify each row in the relation uniquely.
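The terms relation, schema, instance, degree, and cardinality map directly onto a small C++ structure. In this sketch (a hypothetical `Relation` type, with every value stored as a string for simplicity) the instance is a `std::set`, so a duplicate tuple is rejected automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Tuple = std::vector<std::string>;

// A relation: a schema (attribute names) plus an instance (a set of tuples).
struct Relation {
    std::string name;
    std::vector<std::string> attributes;   // relational schema
    std::set<Tuple> tuples;                // relational instance

    bool insert(const Tuple& t) {
        if (t.size() != attributes.size()) return false;   // one value per attribute
        return tuples.insert(t).second;                     // false if the tuple already exists
    }
    std::size_t degree() const { return attributes.size(); }     // number of columns
    std::size_t cardinality() const { return tuples.size(); }    // number of rows
};
```

Inserting a tuple changes the cardinality but never the degree, matching the definitions above.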

Properties of Relations
• The name of a relation is distinct from the names of all other relations.
• Each cell of a relation contains exactly one atomic (single) value.
• Each attribute has a distinct name.
• Attribute domain has no significance.
• No two tuples are duplicates of each other.
• Tuples can appear in any order.
• It also provides information about metadata.

Advantages of Relational Model:

Following are the various merits of the relational model:

• It provides an abstract view of the data, separating the physical structure from the logical structure of the data.

• It is very easy to design. Tables can use different attributes as per requirements.
• It supports data independence. Because the data is stored in tables, we can modify the data without changing the physical structure.

• It lets users query the database with a query language.

• It offers more flexibility than other models.

• Database security control and authorization can be implemented by moving sensitive attributes of a particular table into a separate relation with its own authorization controls.

• It consists of simple relationships. The characteristics that make a database immune to certain maintenance problems have been developed in the context of the relational model.

• It is useful for representing most real-world objects and the relationships between them. A relationship is easy to implement through a composite key, so this persistence model dominates the market.

• Simple model: The relational model is simple and easy to use compared with other data models.
• Flexible: The relational model is more flexible than other data models.
• Secure: The relational model is more secure than other data models.
• Data Accuracy: Data is more accurate in the relational data model.
• Data Integrity: The integrity of the data is maintained in the relational model.
• Operations can be applied easily: Operations are easier to perform in the relational model.

Disadvantages of Relational Model:

Most of the drawbacks of relational databases are not shortcomings of the model itself but of the way it is implemented. These drawbacks can be avoided by using proper design techniques and enforcing proper database standards.
Following are the various demerits of the relational model:
• The main disadvantage is that relational models do not handle binary data well, for example images, documents, and spreadsheets.
• Relational database systems incur a large hardware overhead.
• Relational databases use a simple mapping of logical tables to physical structures. This can limit performance, so non-relational systems such as object-oriented database management systems may perform better in specialised applications such as CAD and CAM.
• Enforcing data integrity in relational models is difficult because no single piece of hardware has control over the data.
• The relational model suits small databases better than complex ones, because the user needs to know complex physical data storage details. Problems arising from these details may not come to light while the database is being designed; as the database grows, it can slow down the system and result in performance degradation and data corruption.
• The relational database model is not very good for large databases.
• Sometimes it becomes difficult to find the relation between tables.
• Because of the complex structure, the response time for queries can be high.

Characteristics of the Relational Model

• Data is represented in rows and columns called relations.
• Data is stored in tables that have relationships between them; this is called the relational model.
• The relational model supports operations such as data definition, data manipulation, and transaction management.
• Each column has a distinct name and represents an attribute.
• Each row represents a single entity.
Operations on Relational Model:
The following basic operations can be performed on a relational model:

• Insertion Operation
• Deletion Operation
• Update Operation
• Retrieval Operation
Each operation is explained below.
Insert operation: It is used to insert a new record into a table. Adding new records is much easier than in other models. An insertion is rejected when any of the following conditions occurs:
• We try to insert a duplicate value into the field selected as the primary key.
• We insert a NULL value into a primary key attribute.
• We try to enter a value in a foreign key attribute that does not exist in the corresponding primary key attribute.
• An attribute is assigned a value that does not exist in its domain.
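The four rejection conditions for an insertion can be written as explicit checks. This C++ sketch assumes a hypothetical STUDENT table whose primary key is a roll number, whose foreign key refers to a set of department ids, and whose Gender attribute has the domain {M, F} used in the example earlier; `insertStudent` and `InsertResult` are illustrative names only.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Hypothetical STUDENT table: only the primary key values are tracked here.
struct StudentTable {
    std::set<int> rollNos;   // primary key values already present
};

enum class InsertResult { Ok, DuplicateKey, NullKey, ForeignKeyViolation, DomainViolation };

// Applies the four checks, in order, before a row is accepted.
InsertResult insertStudent(StudentTable& students, const std::set<std::string>& deptIds,
                           std::optional<int> rollNo, const std::string& deptId, char gender) {
    if (!rollNo) return InsertResult::NullKey;                                // NULL primary key
    if (students.rollNos.count(*rollNo)) return InsertResult::DuplicateKey;   // duplicate primary key
    if (!deptIds.count(deptId)) return InsertResult::ForeignKeyViolation;     // no matching primary key
    if (gender != 'M' && gender != 'F') return InsertResult::DomainViolation; // value outside the domain
    students.rollNos.insert(*rollNo);
    return InsertResult::Ok;
}
```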
Delete operation: This operation is used to delete records from a table, but problems arise when the rows to be deleted are referenced by foreign key attributes in other rows.

Update operation: It is used to modify the data value of a record in a table. Updating an attribute that is neither a primary key nor a foreign key only requires checking that the new value has the correct data type and domain. If we modify a primary key or foreign key attribute, we also need to check that:
• For a foreign key attribute, the new value exists in the corresponding primary key attribute.
• For a primary key attribute, the new value does not already exist in the table.


Retrieval operation: It is used to retrieve a record from a relation. This operation is very simple and homogeneous.

Concepts
• Tables: In the relational data model, relations are saved in the form of tables. This format stores the relation among entities. A table has rows and columns, where rows represent records and columns represent attributes.
• Tuple: A single row of a table, which contains a single record for that relation, is called a tuple.
• Relation instance: A finite set of tuples in the relational database system represents a relation instance. Relation instances do not have duplicate tuples.
• Relation schema: A relation schema describes the relation name (table name), its attributes, and their names.
• Relation key: Each row has one or more attributes, known as the relation key, which can identify the row in the relation (table) uniquely.
• Attribute domain: Every attribute has a predefined scope of values, known as the attribute domain.

4.4.4. Object-Oriented Data Model

In this model, data and their relationships are contained in a single structure, which is referred to as an object. Real-world problems are represented as objects with different attributes, and objects can have multiple relationships between them. Basically, it is a combination of object-oriented programming and the relational database model:
Object Oriented Data Model = Object Oriented Programming + Relational Database Model
The OODM is a better representation of real-world problems. It allows audio, pictures, video, and other types of data to be stored in databases, which was difficult with the relational approach (although audio and video can be stored in a relational database, doing so is generally not advised).
Objects:
An object is an abstraction of a real-world entity; in other words, it is an instance of a class. Objects encapsulate data and code into a single unit, which provides data abstraction by hiding the implementation details from the user. For example, the instances of Student, Doctor, and Engineer in the above figure.
• Attribute: An attribute describes a property of an object. For example, for the object STUDENT, the attributes in the Student class are Roll no and Branch.
• Methods: A method represents the behavior of an object, i.e. a real-world action. For example, finding a STUDENT's marks is represented by the method Setmarks() in the above figure.
• Class: A class is a collection of similar objects with a shared structure (attributes) and behavior (methods).
• Object: An object is an instance of a class. For example, Person, Student, Doctor, and Engineer in the above figure.
class student
{
    char Name[20];
    int roll_no;
    // ... other attributes

public:
    void search();
    void update();
};

Inheritance: Using inheritance, a new class can inherit the attributes and methods of an existing class, i.e. the base class. For example, the classes Student, Doctor, and Engineer inherit from the base class Person.
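Inheritance between Person and Student can be sketched in C++ as follows. The member names (`setMarks`, `rollNo`, `branch`, `role`) are illustrative choices based on the Student example above, not a fixed schema.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Base class: attributes and behaviour shared by every person.
class Person {
public:
    explicit Person(std::string name) : name_(std::move(name)) {}
    virtual ~Person() = default;
    const std::string& name() const { return name_; }
    virtual std::string role() const { return "Person"; }

private:
    std::string name_;
};

// Derived class: inherits name() from Person and adds its own attributes and methods.
class Student : public Person {
public:
    Student(std::string name, int rollNo, std::string branch)
        : Person(std::move(name)), rollNo_(rollNo), branch_(std::move(branch)) {}

    void setMarks(int marks) { marks_ = marks; }   // a method models a real-world action
    int marks() const { return marks_; }
    int rollNo() const { return rollNo_; }
    const std::string& branch() const { return branch_; }
    std::string role() const override { return "Student"; }

private:
    int rollNo_;
    std::string branch_;
    int marks_ = 0;
};
```

A `Student` can be used wherever a `Person` is expected, while still answering `role()` with its own behaviour.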
Advantages of Object Oriented Data Model:
• Code can be reused due to inheritance.
• It is easily understandable.
• The cost of maintenance can be reduced, because inheritance allows attributes and functions to be reused.
Disadvantages of Object Oriented Data Model:
• It is not yet fully developed, so it is not easily accepted by users.
4.4.5. Flat Data Model
The flat data model consists of a two-dimensional array of data elements that does not contain any duplicate elements. This data model has one drawback: it cannot store a large amount of data, that is, the tables cannot be very large.
4.4.6. Context Data Model
The context data model is a data model that consists of more than one data model. For example, a context data model may combine the ER model, the object-oriented data model, etc. This allows users to do more than any individual data model can do on its own.
4.4.7. Semi-Structured Data Model
Semi-structured data models handle data in a flexible way: some entities may have extra attributes, while others may be missing some attributes.
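Semi-structured records can be modelled in C++ as maps from attribute name to value, so that each entity carries only the attributes it actually has. The helpers `attributeOf` and `countWith` are hypothetical names used for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Each entity is a set of attribute/value pairs; entities need not share attributes.
using Entity = std::map<std::string, std::string>;

// Reads an attribute that may be missing for some entities.
std::optional<std::string> attributeOf(const Entity& e, const std::string& attr) {
    auto it = e.find(attr);
    if (it == e.end()) return std::nullopt;
    return it->second;
}

// Counts how many entities carry a given attribute.
std::size_t countWith(const std::vector<Entity>& entities, const std::string& attr) {
    std::size_t n = 0;
    for (const auto& e : entities)
        if (e.count(attr)) ++n;
    return n;
}
```

Code that reads such data has to be prepared for a missing attribute, which is why `attributeOf` returns an optional value instead of assuming every entity has the same fields.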

Advantages of Data Models

• Data models help us represent data accurately.
• They help us find missing data and minimize data redundancy.
• They provide data security in a better way.
• A data model should be detailed enough to be used for building the physical database.
• The information in a data model can be used for defining the relationships between tables, primary and foreign keys, and stored procedures.

Disadvantages of Data Models

• For a vast database, it can be difficult to understand the data model.
• You must have proper knowledge of SQL to use physical models.
• Even a small change in the structure can require modifying the entire application.
• There is no set data manipulation language in DBMS.
• To develop a data model, one must know the characteristics of how the physical data is stored.

5. Key
A key is an attribute or a set of attributes that helps us uniquely identify a row (tuple) in a table (relation). Keys are also used to establish relationships between the columns of different tables in a relational database. The individual values present in a key are commonly referred to as key values.
Keys play a vital role in locating distinct records (rows) within a table and serve as a means of finding a unique record within the database. The importance of keys in a DBMS and their major types are discussed below.
Keys are used in a database to enforce different types of integrity constraints. A table, meanwhile, is a collection of records describing the various events within a given relation. A table may hold many records, some of which could be duplicates.
Hence, it becomes necessary to identify each record uniquely and to eliminate any duplicates. Keys solve this problem: they ensure that every record is unique and can be told apart from the others.
Different Types of Database Keys
• Candidate Key
• Primary Key
• Super Key
• Alternate Key
• Foreign Key
• Composite Key

Candidate Key: A candidate key is a minimal set of attributes that can uniquely identify a tuple, meaning no attribute can be removed from it without losing uniqueness. The primary key is chosen from among the candidate keys; the candidate keys that are not chosen are called alternate keys. Every candidate key is as strong as the primary key.
• It must contain unique values.
• Unlike the primary key, it may contain NULL values.
• Every table must have at least one candidate key.
• A table can have multiple candidate keys but only one primary key.
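The definition can be made concrete with a small sketch that searches a sample relation for minimal attribute sets whose values are unique in every row. The STUDENT rows below are made-up sample data. Keys are properties of the schema, not of one particular set of rows, so a search over sample data like this can only suggest candidates for a designer to confirm.

```python
from itertools import combinations

# Hypothetical STUDENT relation (sample data only).
ATTRS = ("STUD_NO", "STUD_NAME", "STUD_PHONE", "STUD_STATE")
ROWS = [
    (1, "Ram",    "9716271721", "Haryana"),
    (2, "Ram",    "9898291281", "Punjab"),
    (3, "Sujit",  "7898291981", "Rajasthan"),
    (4, "Suresh", "9898291282", "Punjab"),
    (5, "Ram",    "9716271722", "Punjab"),
]

def is_unique(attrs):
    """True if no two rows share the same values on `attrs`."""
    idx = [ATTRS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(projected) == len(set(projected))

def candidate_keys():
    """Return minimal unique attribute sets (no proper subset is unique)."""
    keys = []
    for size in range(1, len(ATTRS) + 1):
        for combo in combinations(ATTRS, size):
            # A superset of an already-found key is not minimal: skip it.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_unique(combo):
                keys.append(combo)
    return keys

print(candidate_keys())
```

On this data both STUD_NO and STUD_PHONE come out as candidate keys, matching the STUDENT example used for the primary key. (STUD_NAME, STUD_STATE) does not qualify, because two rows contain "Ram" from Punjab.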
Primary Key
There can be more than one candidate key in a relation, out of which one is chosen as the primary key. For example, STUD_NO and STUD_PHONE are both candidate keys for the relation STUDENT, but only one of them, STUD_NO, is chosen as the primary key.
• It is a unique key: it has no duplicate values.
• It identifies exactly one tuple (record) at a time.
• It cannot be NULL.
• A primary key need not be a single column; two or more columns together can form the primary key of a table.

Super Key
The set of attributes that can uniquely identify a tuple is known as a super key. For example, STUD_NO, (STUD_NO, STUD_NAME), etc. A super key is a set of one or more attributes that identifies rows in a table.
• Adding zero or more attributes to a candidate key generates a super key.
• Every candidate key is a super key, but not every super key is a candidate key.
• Super key values may be NULL.
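The relationship between super keys and candidate keys can be checked mechanically. The sketch below uses hypothetical STUDENT sample rows. It enumerates every attribute set that is unique (every super key) and then keeps only the minimal ones, which are the candidate keys.

```python
from itertools import combinations

# Hypothetical STUDENT sample rows.
ATTRS = ("STUD_NO", "STUD_NAME", "STUD_PHONE")
ROWS = [
    (1, "Ram",   "9716271721"),
    (2, "Ram",   "9898291281"),
    (3, "Sujit", "7898291981"),
]

def is_superkey(attrs):
    """An attribute set is a super key if no two rows agree on all of it."""
    idx = [ATTRS.index(a) for a in attrs]
    seen = {tuple(row[i] for i in idx) for row in ROWS}
    return len(seen) == len(ROWS)

# Every non-empty attribute subset that is a super key.
super_keys = [
    set(c)
    for n in range(1, len(ATTRS) + 1)
    for c in combinations(ATTRS, n)
    if is_superkey(c)
]

# Candidate keys are the super keys with no smaller super key inside them.
candidate_keys = [k for k in super_keys if not any(o < k for o in super_keys)]

print(len(super_keys), candidate_keys)
```

Here six attribute sets are super keys, including (STUD_NO, STUD_NAME), but only {STUD_NO} and {STUD_PHONE} are minimal. Every super key in the list contains one of them, which illustrates that adding attributes to a candidate key yields a super key.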

Alternate Key
The candidate keys other than the primary key are called alternate keys; that is, every candidate key that is not chosen as the primary key is an alternate key.
• It is also called a secondary key.
• Like any candidate key, it can consist of one or more fields.
• Its values must be unique, since it can identify each record just as the primary key does.
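In SQL, the chosen candidate key is typically declared with PRIMARY KEY, and each alternate key is declared with UNIQUE. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The STUDENT table follows the running example, and the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT (
        STUD_NO    INTEGER PRIMARY KEY,  -- candidate key chosen as primary key
        STUD_NAME  TEXT,
        STUD_PHONE TEXT UNIQUE           -- remaining candidate key: alternate key
    )
""")
conn.execute("INSERT INTO STUDENT VALUES (1, 'Ram', '9716271721')")

def try_insert(row):
    """Insert one STUDENT row; return the error message, or None on success."""
    try:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

dup_primary = try_insert((1, "Sujit", "7898291981"))    # STUD_NO 1 already used
dup_alternate = try_insert((2, "Sujit", "9716271721"))  # phone already used
accepted = try_insert((2, "Sujit", "7898291981"))       # both values are new
print(dup_primary, dup_alternate, accepted, sep="\n")
```

Both keys reject duplicates. The difference is only in the role the designer gives them: the primary key is the main identifier that other tables refer to.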
Foreign Key
If an attribute can only take values that are present as values of some other attribute, it is a foreign key to the attribute it refers to. The relation being referenced is called the referenced relation, and the corresponding attribute is called the referenced attribute. The referenced attribute should be the primary key of the referenced relation.
• The same attribute acts as the primary key in one table (the referenced table) and as a foreign key in another table.
• It links two or more relations (tables).
• It acts as a cross-reference between the tables.
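A foreign key is declared with REFERENCES. In the sketch below, EMPLOYEE.DEPT_ID refers to the primary key of DEPARTMENT, and a join uses it as the cross-reference between the two tables. The sketch uses sqlite3 with an in-memory database, and the DEPARTMENT/EMPLOYEE tables and columns are hypothetical. By default SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE DEPARTMENT (
        DEPT_ID   INTEGER PRIMARY KEY,      -- referenced attribute
        DEPT_NAME TEXT NOT NULL
    );
    CREATE TABLE EMPLOYEE (
        EMP_ID   INTEGER PRIMARY KEY,
        EMP_NAME TEXT NOT NULL,
        DEPT_ID  INTEGER REFERENCES DEPARTMENT(DEPT_ID)  -- foreign key
    );
    INSERT INTO DEPARTMENT VALUES (10, 'Accounts'), (20, 'Research');
    INSERT INTO EMPLOYEE VALUES (1, 'Asha', 10), (2, 'Ravi', 20), (3, 'Meena', 10);
""")

# The foreign key lets us cross-reference rows in the two tables.
rows = conn.execute("""
    SELECT e.EMP_NAME, d.DEPT_NAME
    FROM EMPLOYEE AS e JOIN DEPARTMENT AS d ON e.DEPT_ID = d.DEPT_ID
    ORDER BY e.EMP_ID
""").fetchall()
print(rows)
```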

Composite Key
Sometimes, a table might not have a single column/attribute that uniquely identifies all of its records. To uniquely identify rows of such a table, a combination of two or more columns/attributes can be used. Not every combination is guaranteed to be unique, though, because some may still contain duplicate values. So we need to find a set of attributes that does uniquely identify the rows in the table.
• It can act as the primary key when no single column can serve as one.
• Two or more attributes are used together to make a composite key.
• Different combinations of attributes differ in how reliably they identify the rows uniquely.
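Consider a hypothetical ENROLLMENT table. Neither STUD_NO nor COURSE_ID is unique on its own, because a student takes many courses and a course has many students, but the pair is unique. The sketch declares that pair as a composite primary key using sqlite3. The table, columns, and values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ENROLLMENT (
        STUD_NO   INTEGER NOT NULL,
        COURSE_ID TEXT    NOT NULL,
        GRADE     TEXT,
        PRIMARY KEY (STUD_NO, COURSE_ID)   -- composite key
    )
""")
# Each column repeats on its own, but each (STUD_NO, COURSE_ID) pair is new.
conn.executemany(
    "INSERT INTO ENROLLMENT VALUES (?, ?, ?)",
    [(1, "DBMS", "A"), (1, "OS", "B"), (2, "DBMS", "A")],
)

try:
    # Same student, same course again: violates the composite key.
    conn.execute("INSERT INTO ENROLLMENT VALUES (1, 'DBMS', 'C')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```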
6. Integrity Constraints
• Integrity constraints are a set of rules used to maintain the quality of information.
• They ensure that data insertion, updating, and other processes are performed in such a way that data integrity is not affected.
• Thus, integrity constraints guard against accidental damage to the database.

Types of Integrity Constraints:


A. Domain Constraints
A domain constraint defines the valid set of values for an attribute. Domain data types include string, character, time, integer, date, currency, etc. The value of each attribute must belong to the corresponding domain.
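In SQL, a domain is typically expressed with a column's data type plus a CHECK clause. The sketch below uses sqlite3 to restrict AGE to whole numbers from 18 to 60. The EMPLOYEE table, its columns, and the age range are illustrative assumptions. SQLite's column types are flexible, so the typeof() test inside CHECK is what stops a text value from being stored in AGE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        EMP_ID INTEGER PRIMARY KEY,
        NAME   TEXT NOT NULL,
        -- Domain of AGE: whole numbers from 18 to 60 (assumed business rule)
        AGE    INTEGER CHECK (typeof(AGE) = 'integer' AND AGE BETWEEN 18 AND 60)
    )
""")

def accepts(age):
    """Return True if an EMPLOYEE row with this AGE satisfies the domain."""
    try:
        conn.execute("INSERT INTO EMPLOYEE (NAME, AGE) VALUES ('X', ?)", (age,))
        return True
    except sqlite3.IntegrityError:
        return False

print(accepts(25), accepts(75), accepts("A"))
```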

B. Not-Null Constraints
It specifies that, within a tuple, attributes on which a not-null constraint is specified must not contain any null value.
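In SQL, this is the NOT NULL column constraint. The following minimal sqlite3 sketch uses a hypothetical STUDENT table: STUD_NAME is declared NOT NULL, while STUD_PHONE may be left empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT (
        STUD_NO    INTEGER PRIMARY KEY,
        STUD_NAME  TEXT NOT NULL,   -- must always have a value
        STUD_PHONE TEXT             -- NULL allowed
    )
""")

conn.execute("INSERT INTO STUDENT VALUES (1, 'Ram', NULL)")  # phone may be NULL

try:
    # A NULL name violates the not-null constraint.
    conn.execute("INSERT INTO STUDENT VALUES (2, NULL, '9898291281')")
    name_null_rejected = False
except sqlite3.IntegrityError as exc:
    name_null_rejected = True
    print("Rejected:", exc)
```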
C. Entity Integrity Constraints
The entity integrity constraint states that a primary key can never contain a null value. The primary key is used to identify individual rows in a relation uniquely; if it contained a null value, we could not identify those rows. A table may contain null values in any field except the primary key.
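Most SQL systems enforce this automatically for PRIMARY KEY columns. SQLite is a notable exception. For historical reasons it lets an ordinary (non-INTEGER) primary key column hold NULL unless the column is also declared NOT NULL, so the sketch below adds NOT NULL explicitly. The COURSE table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE COURSE (
        COURSE_ID TEXT NOT NULL PRIMARY KEY,  -- NOT NULL needed in SQLite
        TITLE     TEXT                        -- non-key field: NULL allowed
    )
""")
conn.execute("INSERT INTO COURSE VALUES ('DBMS', NULL)")  # NULL in a non-key field

try:
    # A NULL primary key would leave the row unidentifiable.
    conn.execute("INSERT INTO COURSE VALUES (NULL, 'Operating Systems')")
    null_key_rejected = False
except sqlite3.IntegrityError:
    null_key_rejected = True

print(null_key_rejected)
```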

D. Key Constraints
A key is an attribute (or set of attributes) used to uniquely identify an entity within its entity set. An entity set can have multiple keys, but one of them will be the primary key. A primary key is always unique and never contains a null value in the table.
E. Primary Key Constraints
It states that the primary key attributes are required to be unique and not null: the primary key attributes of a relation must not have null values, and no two tuples may have the same primary key value. This constraint is specified on the primary key attributes in the database schema to ensure that no two tuples are the same.
F. Referential integrity constraints
A referential integrity constraint is specified between two tables. If a foreign key in Table 1 refers to the primary key of Table 2, then every value of the foreign key in Table 1 must either be null or be present in Table 2.
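The rule can be seen in action with sqlite3, where foreign keys must be switched on per connection. Hypothetical DEPARTMENT and EMPLOYEE tables stand in for Table 2 and Table 1. A NULL foreign key and a key that exists in DEPARTMENT are accepted, while a value missing from DEPARTMENT is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE DEPARTMENT (DEPT_ID INTEGER PRIMARY KEY, DEPT_NAME TEXT);  -- Table 2
    CREATE TABLE EMPLOYEE (                                                 -- Table 1
        EMP_ID  INTEGER PRIMARY KEY,
        DEPT_ID INTEGER REFERENCES DEPARTMENT(DEPT_ID)
    );
    INSERT INTO DEPARTMENT VALUES (10, 'Accounts');
""")

def insert_employee(emp_id, dept_id):
    """Return True if the row satisfies referential integrity and is stored."""
    try:
        conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?)", (emp_id, dept_id))
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "existing key": insert_employee(1, 10),   # 10 is present in DEPARTMENT
    "null key": insert_employee(2, None),     # null foreign key is allowed
    "missing key": insert_employee(3, 99),    # 99 is not in DEPARTMENT
}
print(results)
```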

7. Data Manipulation
Data manipulation is the process of organizing data to make it easier to read or better structured. For instance, a collection of data could be organized in alphabetical order so that it can be understood easily. Without such organization, it can be difficult to find information about a particular employee in an organization. If all employees' information is organized in alphabetical order, it becomes much easier to find information about any individual employee. Data manipulation also helps website owners monitor their sources of traffic and their most popular pages; hence, it is frequently used on web server logs.
Data manipulation is also used by accountants and people in similar fields to organize data in order to work out product costs, future tax obligations, pricing patterns, etc.
DML, short for Data Manipulation Language, is the language used to manipulate data. It is used to modify the data in a database by adding, removing, and altering records.
Data manipulation is a fundamental step in data analysis, data mining, and data preparation for machine learning, and it is essential for making informed decisions and drawing conclusions from raw data. To make use of these data points, we perform data manipulation. It involves:
• Creating a database
• Using SQL for structured data manipulation
• Using NoSQL databases such as MongoDB for unstructured data manipulation

Steps Required to Perform Data Manipulation


The steps we perform in data manipulation are:
• Mine the data and create a database: The data is first collected from the internet, either with API requests or by web scraping, and these data points are structured into a database for further processing.
• Perform data preprocessing: The data acquired from mining is still rough and may contain incorrect values, missing values, and outliers. In this step, these problems are handled either by deleting the affected rows or by filling missing values with the mean of the column. (Note: filling with the mean applies only to numerical data.)
• Arrange the data: After preprocessing, the data is arranged suitably to make analysis easier.
• Transform the data: The data is transformed, either by changing data types or, in some cases, by transposing it.
• Perform data analysis: Work with the data to view the results, for example by creating visualizations or an output column.
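The preprocessing, arranging, transforming, and analysis steps can be sketched in plain Python on a tiny in-memory dataset. The records and field names are illustrative assumptions. So is the policy of deleting rows with a missing name but filling a missing salary with the column mean. The mining step (API requests or web scraping) is assumed to have already happened.

```python
from statistics import mean

# Step 1 (assumed done): mined records, still rough. Salaries arrive as text.
raw = [
    {"name": "Ravi",  "salary": "52000"},
    {"name": None,    "salary": "48000"},   # missing name -> delete row
    {"name": "Asha",  "salary": None},      # missing numeric value -> fill mean
    {"name": "Meena", "salary": "61000"},
]

# Step 2: preprocess. Delete unusable rows, fill numeric gaps with the mean.
rows = [r for r in raw if r["name"] is not None]
fill = mean(float(r["salary"]) for r in rows if r["salary"] is not None)
for r in rows:
    if r["salary"] is None:
        r["salary"] = str(fill)

# Step 3: arrange. Sort alphabetically by name.
rows.sort(key=lambda r: r["name"])

# Step 4: transform. Change the salary column's data type from text to float.
for r in rows:
    r["salary"] = float(r["salary"])

# Step 5: analyze. Add an output column comparing each salary to the average.
average = mean(r["salary"] for r in rows)
for r in rows:
    r["above_average"] = r["salary"] > average

print([(r["name"], r["salary"], r["above_average"]) for r in rows])
```

Mean filling is only one option; for skewed numeric data, the median is often preferred. Deleting rows is simpler but discards information.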

Difference between Data manipulation and Data modification


The terms data manipulation and data modification sound similar, but they are not interchangeable. Data manipulation is the act of organizing data to make it easier to read or more refined. Data modification, on the other hand, is the process of changing existing data values or the data itself.

Operations of Data Manipulation


Data manipulation is built on four main operations, known as CRUD (Create, Read, Update, and Delete). It is used in many industries to improve overall output.
Most DMLs provide some version of the CRUD operations:
• Create: Create a new data point or database.
• Read: Read the data to understand where data manipulation is needed.
• Update: Replace missing or wrong data points with correct ones so that the data is streamlined.
• Delete: Delete rows with missing, erroneous, or misclassified data points.
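In SQL, the four CRUD operations map onto INSERT, SELECT, UPDATE, and DELETE. The sketch below runs them with Python's sqlite3 module on a hypothetical employees table. The rows, the missing salary, and the negative "erroneous" salary are made up to mirror the four operations described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# Create: add new data points.
conn.executemany(
    "INSERT INTO employees (id, name, salary) VALUES (?, ?, ?)",
    [(1, "Asha", 50000), (2, "Ravi", None), (3, "Bad row", -1)],
)

# Read: find where manipulation is needed (missing salaries).
missing = conn.execute("SELECT id FROM employees WHERE salary IS NULL").fetchall()

# Update: correct a missing data point.
conn.execute("UPDATE employees SET salary = ? WHERE id = ?", (48000, 2))

# Delete: remove an erroneous row.
conn.execute("DELETE FROM employees WHERE salary < 0")
conn.commit()

print(missing, conn.execute("SELECT * FROM employees ORDER BY id").fetchall())
```

The `?` placeholders pass values separately from the SQL text, which avoids building queries by string concatenation.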

SQL commands for Data Manipulation:


• INSERT: The INSERT command is used to add new records to a table.

• UPDATE: The UPDATE command is used to modify existing records in a table.

• DELETE: The DELETE command is used to remove records from a table.

• CALL: The CALL command is used to execute a stored procedure. For example, if UpdateEmployeeSalary is a stored procedure that takes two parameters, the employee ID and the new salary, CALL executes it with those two values as arguments.

• EXPLAIN CALL: The EXPLAIN CALL command is used to get an execution plan for a stored procedure call. It returns details about how the stored procedure will be executed, such as the steps involved and the resources used.
• LOCK: The LOCK command is used to lock one or more tables to prevent other users from making changes. For example, locking the employees table allows only the session that issued the lock to modify it.
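SQLite, which ships with Python, has no stored procedures and no CALL or LOCK TABLE commands. The sketch below therefore only imitates the idea. A Python function plays the role of the hypothetical UpdateEmployeeSalary procedure: it takes the employee ID and the new salary and runs a parameterized UPDATE inside a transaction. BEGIN IMMEDIATE acquires SQLite's database-wide write lock up front, which is the nearest SQLite analog of locking before modifying. In a server DBMS such as MySQL or Oracle, the procedure body would instead be stored in the database and invoked with CALL.

```python
import sqlite3

# isolation_level=None: autocommit mode, so we issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("INSERT INTO employees VALUES (1, 'Asha', 50000), (2, 'Ravi', 48000)")

def update_employee_salary(emp_id, new_salary):
    """Hypothetical stand-in for a stored procedure UpdateEmployeeSalary.

    BEGIN IMMEDIATE takes SQLite's write lock (on the whole database, since
    SQLite has no per-table LOCK), so no other connection can write until
    COMMIT or ROLLBACK. Returns the number of rows updated.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        cur = conn.execute(
            "UPDATE employees SET salary = ? WHERE id = ?", (new_salary, emp_id)
        )
        conn.execute("COMMIT")
        return cur.rowcount
    except Exception:
        conn.execute("ROLLBACK")
        raise

print(update_employee_salary(2, 55000))   # existing employee: 1 row updated
print(update_employee_salary(99, 70000))  # no such employee: 0 rows updated
```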

Use of Data Manipulation


• Increased Efficiency.
• Less Room for Error.
• Easier to Analyze data.
• Fewer chances for unexpected results.
