
DF_Unit3_DataBaseManagement

Data is a collection of distinct units of information that can be stored in various forms, while a database is an organized collection of data that allows for easy access and management. The document outlines the evolution of databases from flat-file systems to modern relational and NoSQL databases, highlighting their structures, advantages, and disadvantages. It also discusses various types of databases, including centralized, distributed, and cloud databases, as well as the role of Database Management Systems (DBMS) and RDBMS in managing data.

Uploaded by

hvnaveen94

What is Data?

Data is a collection of distinct small units of information. It can take a variety
of forms, such as text, numbers, media, or bytes, and it can be stored on paper or in
electronic memory.

The word 'data' originates from the word 'datum', which means 'a single piece of
information.' 'Data' is the plural of 'datum'.

In computing, Data is information that can be translated into a form for efficient
movement and processing. Data is interchangeable.

What is Database?
A database is an organized collection of data, so that it can be easily accessed and
managed.

You can organize data into tables, rows, columns, and index it to make it easier to
find relevant information.

Database designers build a database so that a single set of software programs
provides access to the data for all users.

The main purpose of a database is to handle a large amount of information by
storing, retrieving, and managing data.

Many dynamic websites on the World Wide Web are backed by databases. For example,
a site that checks the availability of rooms in a hotel is a dynamic website that
uses a database.

There are many databases available like MySQL, Sybase, Oracle, MongoDB,
Informix, PostgreSQL, SQL Server, etc.

Modern databases are managed by the database management system (DBMS).

SQL (Structured Query Language) is used to operate on the data stored in a
relational database. SQL is based on relational algebra and tuple relational calculus.
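
As a minimal sketch of how SQL operates on stored data, the following uses Python's built-in sqlite3 module with an in-memory database. The table name rooms and its columns are hypothetical, chosen only to echo the hotel example above; they do not come from any particular system.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table, for illustration only.
conn.execute("CREATE TABLE rooms (room_no INTEGER PRIMARY KEY, available INTEGER)")
conn.executemany(
    "INSERT INTO rooms (room_no, available) VALUES (?, ?)",
    [(101, 1), (102, 0), (103, 1)],
)

# Retrieve the room numbers that are currently available.
available_rooms = [
    row[0]
    for row in conn.execute(
        "SELECT room_no FROM rooms WHERE available = 1 ORDER BY room_no"
    )
]
print(available_rooms)
conn.close()
```

The same CREATE, INSERT, and SELECT statements would look very similar on other relational systems, although data types and details differ between products.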

In diagrams, a database is conventionally drawn as a cylinder.


Evolution of Databases
Databases have evolved over more than 50 years, from flat-file systems to
relational and object-relational systems, passing through several generations.

The Evolution
File-Based

File-based databases were introduced in 1968. In a file-based database, data is
maintained in flat files. Although files have many advantages, they also have
several limitations.

One of the major advantages is that the file system has various access methods, e.g.,
sequential, indexed, and random.

One major limitation is that the file-based approach requires extensive programming
in a third-generation language such as COBOL or BASIC.

Hierarchical Data Model

1968-1980 was the era of the hierarchical database. The most prominent hierarchical
database was IBM's first DBMS, called IMS (Information Management System).

In this model, files are related in a parent/child manner.

In diagrams of the hierarchical data model, small circles represent objects.
Like the file system, this model also had limitations, such as complex implementation,
lack of structural independence, and difficulty handling many-to-many relationships.

Network data model


Charles Bachman developed the first DBMS, called Integrated Data Store (IDS), at
General Electric. It was developed in the early 1960s, and the network model was
standardized in 1971 by the CODASYL group (Conference on Data Systems Languages).

In this model, files are related as owners and members, similar to the common network
model.

Network data model identified the following components:

o Network schema (Database organization)


o Sub-schema (views of database per user)
o Data management language (procedural)

This model also had limitations, such as system complexity and difficulty of design
and maintenance.

Relational Database

1970 - Present: This is the era of the relational database and database management.
In 1970, the relational model was proposed by E.F. Codd.

The relational database model has two main terms: instance and schema.

An instance is a table with rows and columns.

A schema specifies the structure, such as the name of the relation and the name and
type of each column.

This model uses mathematical concepts such as set theory and predicate logic.

The first internet database application was created in 1995.

During the era of the relational database, many more models were introduced, such as
the object-oriented model and the object-relational model.

Cloud database
A cloud database lets you store, manage, and retrieve structured and
unstructured data via a cloud platform. This data is accessible over the Internet.
Cloud databases are also called database as a service (DBaaS) because they are
offered as a managed service.

Some of the best cloud options are:

o AWS (Amazon Web Services)
o Snowflake Computing
o Oracle Database Cloud Services
o Microsoft SQL Server
o Google Cloud Spanner

Advantages of cloud database

Lower costs

Generally, a company does not have to invest in its own database infrastructure.
Instead, the cloud provider maintains and supports one or more data centers.

Automated

Cloud databases are enriched with a variety of automated processes such as
recovery, failover, and auto-scaling.

Increased accessibility

You can access your cloud-based database from any location, anytime. All you need
is just an internet connection.

NoSQL Database
NoSQL is an approach to database design that can accommodate a wide variety of
data models. NoSQL stands for "not only SQL." It is an alternative to traditional
relational databases, in which data is placed in tables and the data schema is
carefully designed before the database is built.

NoSQL databases are useful for a large set of distributed data.

Some examples of NoSQL database systems, with their categories, are:

o MongoDB, CouchDB, Cloudant (document-based)
o Memcached, Redis, Coherence (key-value store)
o HBase, Bigtable, Accumulo (tabular)

Advantage of NoSQL
High Scalability

NoSQL databases can handle very large amounts of data because they are scalable. As
the data grows, a NoSQL database can scale to handle it efficiently.

High Availability

Many NoSQL databases support automatic replication. Replication makes them highly
available because, in case of a failure, a replica holding a consistent copy of the
data can take over.

Disadvantage of NoSQL
Open source

Many NoSQL databases are open-source projects, and there is no reliable common standard for NoSQL yet.

Management challenge

Data management in NoSQL is much more complicated than in relational databases. It
can be challenging to install and even more demanding to manage day to day.

GUI is not available

GUI tools for NoSQL databases are not as readily available on the market.

Backup

Backup is a notable weak point of NoSQL databases. Some databases, like MongoDB,
lack powerful built-in approaches for data backup.

The Object-Oriented Databases


Object-oriented databases store data in the form of objects and classes. Objects
are real-world entities, and classes are collections of objects. An object-oriented
database combines relational model features with object-oriented
principles. It is an alternative implementation to that of the relational model.

Object-oriented databases follow the rules of object-oriented programming. An
object-oriented database management system is a hybrid application.

The object-oriented database model contains the following properties.

Object-oriented programming properties


o Objects
o Classes
o Inheritance
o Polymorphism
o Encapsulation

Relational database properties

o Atomicity
o Consistency
o Integrity
o Durability
o Concurrency
o Query processing

Graph Databases
A graph database is a NoSQL database that represents data as a graph. It
contains nodes and edges. A node represents an entity, and each edge represents a
relationship between two nodes. Every node in a graph database has a unique
identifier.

Graph databases are beneficial for searching the relationship between data because
they highlight the relationship between relevant data.

Graph databases are very useful when the database contains complex relationships
and a dynamic schema.

They are mostly used in areas such as supply chain management and identifying the
source of IP telephony.
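
To make the node and edge idea concrete, here is a small sketch that stores a graph as a Python dictionary and walks its relationships breadth-first. The supply-chain node names and the reachable helper are hypothetical; a real graph database would offer its own query language for this kind of traversal.

```python
from collections import deque

# Hypothetical supply-chain graph: each key is a node, and each listed
# neighbour is the target of an edge (a "supplies" relationship).
edges = {
    "farm": ["factory"],
    "factory": ["warehouse"],
    "warehouse": ["store_a", "store_b"],
    "store_a": [],
    "store_b": [],
}

def reachable(graph, start):
    """Return every node reachable from start, in breadth-first order."""
    seen = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.append(neighbour)
                queue.append(neighbour)
    return seen

downstream = reachable(edges, "factory")
print(downstream)
```

Following edges from node to node like this is the operation that graph databases are designed to make cheap, whereas a relational database would typically need repeated joins.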

DBMS (Data Base Management System)


A database management system (DBMS) is software used to store and retrieve data in a
database. Oracle and MySQL are examples of popular DBMS tools.

o A DBMS provides an interface for operations such as creation,
deletion, and modification.
o A DBMS allows users to create databases according to their requirements.
o A DBMS accepts requests from the application and provides the requested data
through the operating system.
o A DBMS contains a group of programs that act according to the user's
instructions.
o It provides security to the database.

Advantage of DBMS
Controls redundancy

It stores all the data in a single database file, so it can control data redundancy.

Data sharing

An authorized user can share the data among multiple users.

Backup

It provides a backup and recovery subsystem. This subsystem automatically backs up
data to protect against system failure and restores the data if required.

Multiple user interfaces

It provides different types of user interfaces, such as graphical user interfaces (GUIs) and application interfaces.

Disadvantage of DBMS
Size

It requires a large amount of disk space and memory to run efficiently.

Cost

DBMS requires a high-speed data processor and larger memory to run DBMS
software, so it is costly.

Complexity
DBMS creates additional complexity and requirements.

RDBMS (Relational Database Management System)


RDBMS stands for 'Relational Database Management System.' It represents data as
tables that contain rows and columns.

RDBMS is based on the Relational model; it was introduced by E. F. Codd.

A relational database contains the following components:

o Table
o Record/ Tuple
o Field/Column name /Attribute
o Instance
o Schema
o Keys

An RDBMS is a tabular DBMS that maintains the security, integrity, accuracy, and
consistency of the data.

Types of Databases
There are various types of databases used for storing different varieties of data:

1) Centralized Database
This type of database stores data in a centralized database system. It lets
users access the stored data from different locations through several
applications. These applications include an authentication process so that users
can access data securely. An example of a centralized database is a central library
that holds a central database of each library in a college/university.

Advantages of Centralized Database

o It decreases the risk in data management, i.e., manipulation of data will not affect
the core data.
o Data consistency is maintained as it manages data in a central repository.
o It provides better data quality, which enables organizations to establish data
standards.
o It is less costly because fewer vendors are required to handle the data sets.

Disadvantages of Centralized Database

o The size of the centralized database is large, which increases the response time for
fetching the data.
o It is not easy to update such an extensive database system.
o If a server failure occurs, the entire data set may be lost, which could be a huge loss.

2) Distributed Database
Unlike a centralized database system, in distributed systems, data is distributed
among different database systems of an organization. These database systems are
connected via communication links. Such links help the end-users to access the data
easily. Examples of the Distributed database are Apache Cassandra, HBase, Ignite,
etc.

We can further divide a distributed database system into:

o Homogeneous DDB: Database systems that run on the same operating
system, use the same application process, and run on the same hardware devices.
o Heterogeneous DDB: Database systems that run on different operating
systems, use different application procedures, and run on different hardware
devices.

Advantages of Distributed Database

o Modular development is possible in a distributed database, i.e., the system can be
expanded by including new computers and connecting them to the distributed
system.
o One server failure will not affect the entire data set.

3) Relational Database
This database is based on the relational data model, which stores data in the form of
rows (tuples) and columns (attributes) that together form a table (relation). A relational
database uses SQL for storing, manipulating, and maintaining the data. E.F.
Codd proposed the relational model in 1970. Each table in the database has a key that
uniquely identifies each row. Examples of relational databases are MySQL,
Microsoft SQL Server, Oracle, etc.

Properties of Relational Database


The following four commonly known properties of transactions in a relational
database are known as the ACID properties, where:

A means Atomicity: This ensures the data operation will complete either with
success or with failure. It follows the 'all or nothing' strategy. For example, a
transaction will either be committed or will abort.

C means Consistency: Any operation over the data must take the database from one
valid state to another. For example, when money is transferred between accounts, the
total balance before and after the transaction should be conserved.

I means Isolation: There can be concurrent users accessing data at the same
time from the database. Thus, concurrent transactions should remain isolated from
one another. For example, when multiple transactions occur at the same time, the
effects of one transaction should not be visible to the other transactions until it
commits.

D means Durability: It ensures that once it completes the operation and commits
the data, data changes should remain permanent.
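
The 'all or nothing' behaviour of atomicity can be sketched with Python's built-in sqlite3 module. The accounts table, the account names, and the transfer helper are hypothetical and exist only for this illustration; the CHECK constraint stands in for a business rule that balances may not go negative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # violates the CHECK, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer, the credit to the second account is undone together with the rejected debit, so the total balance across both accounts stays the same.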

4) NoSQL Database
NoSQL (Non-SQL/Not Only SQL) is a type of database used for storing a wide range of
data sets. It is not a relational database, as it stores data not only in tabular form but
in several different ways. It came into existence when the demand for building
modern applications increased. Thus, NoSQL introduced a wide variety of database
technologies in response to these demands. We can further divide NoSQL databases
into the following four types:

a. Key-value storage: The simplest type of database storage, where every
single item is stored as a key (or attribute name) together with its value.
b. Document-oriented Database: A type of database used to store data as JSON-like
documents. It helps developers store data using the same document-model
format as used in the application code.
c. Graph Databases: Used for storing vast amounts of data in a graph-like
structure. Most commonly, social networking websites use graph databases.
d. Wide-column stores: Similar to the data represented in relational databases,
except that data is stored together in large columns instead of in rows.
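
The first two types in the list above can be sketched in plain Python. This is not the API of any real NoSQL product: the dictionary simply stands in for a key-value store, the list of dictionaries stands in for a document collection, and the key names and email address are made up for illustration.

```python
import json

# Key-value store: an opaque value looked up by its key.
kv_store = {}
kv_store["session:42"] = "user=ajeet;theme=dark"

# Document store: each document is a JSON-like object, and documents in
# the same collection need not share the same fields.
students = [
    {"_id": 1, "name": "Ajeet", "course": "B.Tech"},
    {"_id": 2, "name": "Aryan", "course": "C.A", "email": "aryan@example.com"},
]

# Query by field; only some documents have an "email" field.
with_email = [doc["name"] for doc in students if "email" in doc]

# Documents serialise directly to JSON text.
as_json = json.dumps(students[0], sort_keys=True)
print(kv_store["session:42"], with_email, as_json)
```

The second document carries a field that the first lacks, which is the flexibility that a fixed relational schema would not allow without altering the table.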

Advantages of NoSQL Database

o It enables good productivity in application development, as it is not required to
store data in a structured format.
o It is a better option for managing and handling large data sets.
o It provides high scalability.
o Users can quickly access data from the database through key-value.

5) Cloud Database
A cloud database is a type of database where data is stored in a virtual environment
and runs on a cloud computing platform. It provides users with various cloud computing
services (SaaS, PaaS, IaaS, etc.) for accessing the database. There are numerous cloud
platforms, but some of the best options are:

o Amazon Web Services (AWS)
o Microsoft Azure
o Kamatera
o phoenixNAP
o ScienceSoft
o Google Cloud SQL, etc.

6) Object-oriented Databases
This type of database uses the object-based data model approach for storing
data in the database system. The data is represented and stored as objects, which are
similar to the objects used in object-oriented programming languages.

7) Hierarchical Databases
This type of database stores data in the form of parent-child relationship
nodes, organizing data in a tree-like structure.
Data is stored in the form of records that are connected via links. Each child record
in the tree has only one parent. On the other hand, each parent record can
have multiple child records.
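
The one-parent, many-children rule can be sketched with a simple mapping from each record to its parent. The record names and both helper functions are hypothetical and only illustrate the shape of the data, not how IMS or any other hierarchical DBMS stores it.

```python
# Each record stores the name of its single parent (None for the root).
# The record names are hypothetical.
parent_of = {
    "university": None,
    "engineering": "university",
    "science": "university",
    "computer_science": "engineering",
}

def children(record):
    """A parent record may have many child records."""
    return sorted(name for name, parent in parent_of.items() if parent == record)

def path_to_root(record):
    """Follow the unique parent link upward until the root is reached."""
    path = [record]
    while parent_of[path[-1]] is not None:
        path.append(parent_of[path[-1]])
    return path

print(children("university"))
print(path_to_root("computer_science"))
```

Because every record has exactly one parent, there is exactly one path from any record to the root, which is also why many-to-many relationships are awkward in this model.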

8) Network Databases
It is the database that typically follows the network data model. Here, the
representation of data is in the form of nodes connected via links between them.
Unlike the hierarchical database, it allows each record to have multiple children and
parent nodes to form a generalized graph structure.

9) Personal Database
Collecting and storing data on the user's system defines a Personal Database. This
database is basically designed for a single user.

Advantage of Personal Database

o It is simple and easy to handle.


o It occupies less storage space as it is small in size.

10) Operational Database


This type of database creates and updates data in real time. It is
designed for executing and handling the daily data operations of
businesses. For example, an organization uses operational databases for managing
its daily transactions.

11) Enterprise Database


Large organizations or enterprises use this database for managing a massive amount
of data. It helps organizations to increase and improve their efficiency. Such a
database allows simultaneous access to users.

Advantages of Enterprise Database:

o Multiple processes are supported by the enterprise database.
o It allows parallel queries to be executed on the system.

What is RDBMS
RDBMS stands for Relational Database Management System.
Many widely used database management systems, such as MS SQL Server, IBM DB2,
Oracle, MySQL, and Microsoft Access, are based on RDBMS.

It is called a Relational Database Management System (RDBMS) because it is based on
the relational model introduced by E.F. Codd.

How it works
Data is represented in terms of tuples (rows) in RDBMS.

The relational database is the most commonly used type of database. It contains a
number of tables, and each table has its own primary key.

Because the data is organized as a collection of tables, it can be accessed easily in an RDBMS.

Brief History of RDBMS


Between 1970 and 1972, E.F. Codd published work proposing the use of the relational
database model.

RDBMS is based on E.F. Codd's relational model.

What is table
The RDBMS database uses tables to store data. A table is a collection of related data
entries and contains rows and columns to store data.

A table is the simplest example of data storage in RDBMS.

Let's see an example of a student table.

ID   Name     AGE   COURSE
1    Ajeet    24    B.Tech
2    Aryan    20    C.A
3    Mahesh   21    BCA
4    Ratan    22    MCA
5    Vimal    26    BSC

What is field
A field is a smaller entity of the table that contains specific information about every
record in the table. In the above example, the fields in the student table are id,
name, age, and course.

What is row or record


A row of a table is also called a record. It contains the specific information of each
individual entry in the table. It is a horizontal entity in the table. For example, the
above table contains 5 records.

Let's see one record/row in the table.

1 Ajeet 24 B.Tech

What is column
A column is a vertical entity in the table which contains all information associated
with a specific field in a table. For example: "name" is a column in the above table
which contains all information about student's name.

Ajeet

Aryan

Mahesh

Ratan

Vimal
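
The table, field, record, and column described above can be reproduced with Python's built-in sqlite3 module. This is only a sketch: the column types (INTEGER, TEXT) are assumptions, since the notes list the values but not their types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the notes only give the values.
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, course TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Ajeet", 24, "B.Tech"),
        (2, "Aryan", 20, "C.A"),
        (3, "Mahesh", 21, "BCA"),
        (4, "Ratan", 22, "MCA"),
        (5, "Vimal", 26, "BSC"),
    ],
)

cursor = conn.execute("SELECT * FROM student ORDER BY id")
fields = [column[0] for column in cursor.description]  # field (column) names
records = cursor.fetchall()                            # one tuple per row
names = [row[1] for row in records]                    # the "name" column
print(fields, len(records), records[0], names)
```

Each tuple returned by fetchall corresponds to one record, and picking the same position out of every tuple yields one column.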

NULL Values
A NULL value in a table indicates that the field was left blank during record
creation. It is entirely different from a field filled with zero or a field that contains
a space.
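
The difference between NULL, zero, and a space can be checked with a small sqlite3 sketch. The table and its three rows are hypothetical test data: one row has a NULL age, one has an age of zero, and one has a name consisting of a single space.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ajeet", None), (2, "Aryan", 0), (3, " ", 21)],  # None is stored as NULL
)

# NULL is matched with IS NULL, not with "= 0".
null_age = [r[0] for r in conn.execute("SELECT id FROM student WHERE age IS NULL")]
zero_age = [r[0] for r in conn.execute("SELECT id FROM student WHERE age = 0")]
# A name holding a space is a real value, not NULL.
null_name = conn.execute("SELECT id FROM student WHERE name IS NULL").fetchall()
# Comparing with NULL using "=" matches no rows at all.
equals_null = conn.execute("SELECT id FROM student WHERE age = NULL").fetchall()
print(null_age, zero_age, null_name, equals_null)
```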

Data Integrity
The following categories of data integrity exist in each RDBMS:

Entity integrity: It specifies that there should be no duplicate rows in a table.

Domain integrity: It enforces valid entries for a given column by restricting the type,
the format, or the range of values.

Referential integrity: It specifies that rows that are referenced by
other records cannot be deleted.

User-defined integrity: It enforces some specific business rules that are defined by
users. These rules are different from entity, domain or referential integrity.
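
As a rough sketch, the first three categories map onto SQL constraints. The example below uses Python's built-in sqlite3 module; the table names, the age range of 0 to 150, and the rejected helper are assumptions made for illustration. Note that SQLite enforces foreign keys only after the PRAGMA shown here is set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY)")
conn.execute(
    """CREATE TABLE student (
        id INTEGER PRIMARY KEY,                      -- entity integrity
        age INTEGER CHECK (age BETWEEN 0 AND 150),   -- domain integrity
        course TEXT REFERENCES course (code)         -- referential integrity
    )"""
)
conn.execute("INSERT INTO course VALUES ('BCA')")
conn.execute("INSERT INTO student VALUES (1, 21, 'BCA')")

def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "duplicate id": rejected("INSERT INTO student VALUES (1, 30, 'BCA')"),
    "age out of range": rejected("INSERT INTO student VALUES (2, -5, 'BCA')"),
    "unknown course": rejected("INSERT INTO student VALUES (3, 22, 'MBA')"),
    "delete referenced row": rejected("DELETE FROM course WHERE code = 'BCA'"),
}
print(results)
```

User-defined integrity would typically be expressed with further CHECK constraints or triggers that encode the specific business rule.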

Difference between DBMS and RDBMS


Although DBMS and RDBMS are both used to store information in a physical database,
there are some notable differences between them.

The main differences between DBMS and RDBMS are given below:

No.  DBMS compared with RDBMS

1) DBMS: DBMS applications store data as files.
   RDBMS: RDBMS applications store data in a tabular form.

2) DBMS: Data is generally stored in either a hierarchical form or a navigational form.
   RDBMS: The tables have an identifier called the primary key, and the data values are stored in the form of tables.

3) DBMS: Normalization is not present.
   RDBMS: Normalization is present.

4) DBMS: DBMS does not apply any security with regard to data manipulation.
   RDBMS: RDBMS defines integrity constraints for the purpose of the ACID (Atomicity, Consistency, Isolation, and Durability) properties.

5) DBMS: DBMS uses the file system to store data, so there is no relation between the tables.
   RDBMS: Data values are stored in the form of tables, so a relationship between these data values is stored in the form of a table as well.

6) DBMS: DBMS has to provide some uniform methods to access the stored information.
   RDBMS: RDBMS supports a tabular structure of the data and relationships between the tables to access the stored information.

7) DBMS: DBMS does not support distributed databases.
   RDBMS: RDBMS supports distributed databases.

8) DBMS: DBMS is meant for small organizations and deals with small amounts of data. It supports a single user.
   RDBMS: RDBMS is designed to handle large amounts of data. It supports multiple users.

9) DBMS: Examples of DBMS are file systems, XML, etc.
   RDBMS: Examples of RDBMS are MySQL, PostgreSQL, SQL Server, Oracle, etc.

After observing the differences between DBMS and RDBMS, you can say that RDBMS
is an extension of DBMS. Many software products on the market today are
compatible with both DBMS and RDBMS. This means that today an RDBMS application is
also a DBMS application, and vice versa.

DBMS:
A database approach is a well-organized collection of data that are related in a
meaningful way which can be accessed by different users but stored only once in a
system. The various operations performed by the DBMS system are: Insertion,
deletion, selection, sorting etc.
In the above figure, duplication of data is reduced due to centralization of data.

There are the following differences between DBMS and File systems:

Basis: Meaning
DBMS approach: DBMS is a collection of data. In DBMS, the user is not required to write the procedures.
File system approach: The file system is a collection of data. In this system, the user has to write the procedures for managing the database.

Basis: Sharing of data
DBMS approach: Due to the centralized approach, data sharing is easy.
File system approach: Data is distributed in many files, and it may be of different formats, so it isn't easy to share data.

Basis: Data Abstraction
DBMS approach: DBMS gives an abstract view of data that hides the details.
File system approach: The file system exposes the details of data representation and storage.

Basis: Security and Protection
DBMS approach: DBMS provides a good protection mechanism.
File system approach: It isn't easy to protect a file under the file system.

Basis: Recovery Mechanism
DBMS approach: DBMS provides a crash recovery mechanism, i.e., DBMS protects the user from system failure.
File system approach: The file system doesn't have a crash recovery mechanism, i.e., if the system crashes while entering some data, then the content of the file will be lost.

Basis: Manipulation Techniques
DBMS approach: DBMS contains a wide variety of sophisticated techniques to store and retrieve the data.
File system approach: The file system can't efficiently store and retrieve the data.

Basis: Concurrency Problems
DBMS approach: DBMS takes care of concurrent access of data using some form of locking.
File system approach: In the file system, concurrent access has many problems, like redirecting the file while deleting some information or updating some information.

Basis: Where to use
DBMS approach: The database approach is used in large systems which interrelate many files.
File system approach: The file system approach is used in small systems which interrelate few files.

Basis: Cost
DBMS approach: The database system is expensive to design.
File system approach: The file system approach is cheaper to design.

Basis: Data Redundancy and Inconsistency
DBMS approach: Due to the centralization of the database, the problems of data redundancy and inconsistency are controlled.
File system approach: The files and application programs are created by different programmers, so there is a lot of duplication of data, which may lead to inconsistency.

Basis: Structure
DBMS approach: The database structure is complex to design.
File system approach: The file system approach has a simple structure.

Basis: Data Independence
DBMS approach: Data independence exists, and it can be of two types:
o Logical Data Independence
o Physical Data Independence
File system approach: There is no data independence.

Basis: Integrity Constraints
DBMS approach: Integrity constraints are easy to apply.
File system approach: Integrity constraints are difficult to implement in a file system.

Basis: Data Models
DBMS approach: In the database approach, 3 types of data models exist:
o Hierarchical data models
o Network data models
o Relational data models
File system approach: In the file system approach, there is no concept of data models.

Basis: Flexibility
DBMS approach: Changes are often necessary to the content of the data stored in any system, and these changes are made more easily with a database approach.
File system approach: The flexibility of the system is less as compared to the DBMS approach.

Basis: Examples
DBMS approach: Oracle, SQL Server, Sybase, etc.
File system approach: Cobol, C++, etc.

DBMS Architecture
o The DBMS design depends upon its architecture. The basic client/server architecture
is used to deal with a large number of PCs, web servers, database servers and other
components that are connected with networks.
o The client/server architecture consists of many PCs and a workstation which are
connected via the network.
o DBMS architecture depends upon how users are connected to the database to get
their request done.

Types of DBMS Architecture


Database architecture can be seen as single-tier or multi-tier. Logically, however,
database architecture is of two types: 2-tier architecture and 3-tier architecture.

1-Tier Architecture

o In this architecture, the database is directly available to the user. It means the user
works directly on the DBMS.
o Any changes made here are applied directly to the database itself. It doesn't provide
a handy tool for end users.
o The 1-Tier architecture is used for development of local applications, where
programmers can communicate directly with the database for a quick response.

2-Tier Architecture

o The 2-Tier architecture is the same as the basic client-server architecture. In the
two-tier architecture, applications on the client end can directly communicate with
the database at the server side. For this interaction, APIs such as ODBC and JDBC are used.
o The user interfaces and application programs run on the client side.
o The server side is responsible for providing functionality such as query processing and
transaction management.
o To communicate with the DBMS, the client-side application establishes a connection with
the server side.

Fig: 2-tier Architecture

3-Tier Architecture

o The 3-Tier architecture contains another layer between the client and server. In this
architecture, the client can't directly communicate with the server.
o The application on the client end interacts with an application server, which in turn
communicates with the database system.
o The end user has no idea about the existence of the database beyond the application
server. The database also has no idea about any other user beyond the application.
o The 3-Tier architecture is used for large web applications.

Fig: 3-tier Architecture
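
The separation between the three tiers can be sketched in a single Python file. This is a simplification under stated assumptions: in a real deployment each tier would usually run as a separate process or machine, and the class names DatabaseTier and ApplicationServer, the client function, and the greeting rule are all hypothetical.

```python
import sqlite3

class DatabaseTier:
    """Data tier: the only layer that talks to the database."""
    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
        self._conn.execute("INSERT INTO student VALUES (1, 'Ajeet')")

    def fetch_name(self, student_id):
        row = self._conn.execute(
            "SELECT name FROM student WHERE id = ?", (student_id,)
        ).fetchone()
        return row[0] if row else None

class ApplicationServer:
    """Middle tier: applies business rules and hides the database."""
    def __init__(self, database):
        self._database = database

    def greeting(self, student_id):
        name = self._database.fetch_name(student_id)
        return f"Hello, {name}!" if name else "Unknown student"

def client(server, student_id):
    """Client tier: calls the application server, never the database."""
    return server.greeting(student_id)

server = ApplicationServer(DatabaseTier())
print(client(server, 1))
print(client(server, 99))
```

The client function never sees a connection or a SQL statement, which mirrors the point that the end user has no knowledge of the database behind the application server.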

Data Models
A data model describes the data, its semantics, and its consistency
constraints. It provides the conceptual tools for describing the design of a
database at each level of data abstraction. The following four data
models are used for understanding the structure of the database:

1) Relational Data Model: This type of model designs the data in the form of rows
and columns within a table. Thus, a relational model uses tables for representing data
and in-between relationships. Tables are also called relations. This model was initially
described by Edgar F. Codd, in 1969. The relational data model is the widely used
model which is primarily used by commercial data processing applications.

2) Entity-Relationship Data Model: An ER model is the logical representation of
data as objects and relationships among them. These objects are known as entities,
and a relationship is an association among these entities. This model was designed by
Peter Chen and published in a 1976 paper. It has been widely used in database design.
A set of attributes describes an entity. For example, student_name and student_id
describe the 'student' entity. A set of the same type of entities is known as an 'entity
set', and a set of the same type of relationships is known as a 'relationship set'.

3) Object-based Data Model: An extension of the ER model with notions of
functions, encapsulation, and object identity. This model supports a rich type
system that includes structured and collection types. In the 1980s, various database
systems following the object-oriented approach were developed. Here, objects
are the data together with its properties.

4) Semistructured Data Model: This type of data model is different from the other
three data models explained above. The semistructured data model allows data
specifications in which individual data items of the same type may have
different sets of attributes. The Extensible Markup Language, also known as XML, is
widely used for representing semistructured data. Although XML was initially
designed for adding markup information to text documents, it gained
importance because of its application in the exchange of data.
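
A short sketch of semistructured data, using Python's built-in xml.etree.ElementTree module, shows two items of the same type carrying different attribute sets. The element names and values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two items of the same type ("student") with different attribute sets.
document = """
<students>
  <student><name>Ajeet</name><course>B.Tech</course></student>
  <student><name>Aryan</name><age>20</age></student>
</students>
"""

root = ET.fromstring(document)
# Turn each <student> element into a dictionary of its child elements.
records = [{child.tag: child.text for child in student} for student in root]
field_sets = [sorted(record) for record in records]
print(records)
print(field_sets)
```

Both records parse without any schema being declared first, and each one simply carries whichever fields it happens to have.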

Relational Model concept


The relational model represents data as a table with columns and rows. Each row is known
as a tuple. Each column of the table has a name, or attribute.

Domain: It contains a set of atomic values that an attribute can take.

Attribute: It contains the name of a column in a particular table. Each attribute Ai
must have a domain, dom(Ai).

Relational instance: In the relational database system, a relational instance is
represented by a finite set of tuples. Relation instances do not have duplicate tuples.

Relational schema: A relational schema contains the name of the relation and the names
of all columns or attributes.

Relational key: A relational key is a set of one or more attributes that
identifies each row in the relation uniquely.

Example: STUDENT Relation

NAME     ROLL_NO   PHONE_NO      ADDRESS     AGE
Ram      14795     7305758992    Noida       24
Shyam    12839     9026288936    Delhi       35
Laxman   33289     8583287182    Gurugram    20
Mahesh   27857     7086819134    Ghaziabad   27
Ganesh   17282     9028 9i3988   Delhi       40

o In the given table, NAME, ROLL_NO, PHONE_NO, ADDRESS, and AGE are the
attributes.
o The instance of schema STUDENT has 5 tuples.
o t3 = <Laxman, 33289, 8583287182, Gurugram, 20>
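
The concepts above (schema, instance, tuple, and key) can be sketched in Python by treating a relation as a set of named tuples. The is_key helper is hypothetical, and the PHONE_NO attribute is left out purely to keep the sketch short.

```python
from collections import namedtuple

# Relational schema: the relation name plus its attribute names.
# PHONE_NO is omitted here to keep the sketch short.
Student = namedtuple("Student", ["NAME", "ROLL_NO", "ADDRESS", "AGE"])

# Relation instance: a finite set of tuples, so duplicates cannot occur.
student_relation = {
    Student("Ram", 14795, "Noida", 24),
    Student("Shyam", 12839, "Delhi", 35),
    Student("Laxman", 33289, "Gurugram", 20),
    Student("Mahesh", 27857, "Ghaziabad", 27),
    Student("Ganesh", 17282, "Delhi", 40),
}
student_relation.add(Student("Ram", 14795, "Noida", 24))  # duplicate tuple, ignored

def is_key(relation, attributes):
    """True if the given attributes identify every tuple uniquely."""
    projected = {tuple(getattr(t, a) for a in attributes) for t in relation}
    return len(projected) == len(relation)

print(len(student_relation))
print(is_key(student_relation, ["ROLL_NO"]))
print(is_key(student_relation, ["ADDRESS"]))
```

ROLL_NO qualifies as a key for this instance because no two tuples share a value, whereas ADDRESS does not because two students live in Delhi.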

Properties of Relations
o Name of the relation is distinct from all other relations.
o Each relation cell contains exactly one atomic (single) value.
o Each attribute has a distinct name.
o The order of attributes has no significance.
o A relation has no duplicate tuples.
o Tuples can appear in any order.
