Advanced DBMS Notes
Unit 1
What is Database
A database is a collection of inter-related data that can be retrieved, inserted, and deleted efficiently. It also organizes the data in the form of tables, schemas, views, reports, etc.
Using a database, you can easily retrieve, insert, and delete information.
Advantages of DBMS
o Controls data redundancy: It can control data redundancy because it stores all the data in a single database file, and that recorded data is placed in the database.
o Data sharing: In DBMS, the authorized users of an organization can share the data among multiple users.
o Easy maintenance: The database is easy to maintain because of the centralized nature of the database system.
o Reduced time: It reduces development time and maintenance effort.
o Backup: It provides backup and recovery subsystems that create automatic backups of data against hardware and software failures and restore the data if required.
Disadvantages of DBMS
o Cost of hardware and software: It requires a high-speed processor and a large memory size to run DBMS software.
o Size: It occupies a large amount of disk space and memory to run efficiently.
o Complexity: A database system creates additional complexity and requirements.
o Higher impact of failure: A failure has a high impact on the database because, in most organizations, all the data is stored in a single database; if the database is damaged due to an electrical failure or corruption, the data may be lost forever.
DBMS Architecture
DBMS architecture describes the structure of the database and how users are connected to a specific database system.
The architecture affects the performance of the database.
DBMS architecture helps users get their requests served while connecting to the database.
We choose a database architecture depending on several factors, such as the size of the database, the number of users, and the relationships between the users.
1-Tier Architecture
o In 1-tier architecture, the database is directly available to the user: the user sits on the DBMS itself and uses it, for example for quick local development or learning.
2-Tier Architecture
o In 2-tier architecture, the application at the client end communicates directly with the database on the server side, for example through APIs such as ODBC or JDBC.
3-Tier Architecture
o The 3-tier architecture contains another layer between the client and the server. In this architecture, the client cannot directly communicate with the server.
o The application on the client end interacts with an application server, which further communicates with the database system.
o The end user has no idea about the existence of the database beyond the application server, and the database has no idea about any user beyond the application.
o The 3-tier architecture is used for large web applications.
Data Models
A data model describes the metadata of a database: data descriptions, data semantics, and consistency constraints on the data.
Data models describe how a database's logical structure is represented.
They specify how data items are linked to one another, as well as how data is handled and stored within the system.
A data model provides the conceptual tools for describing the design of a database at each level of data abstraction.
1) Relational Data Model: This model arranges the data in the form of rows and columns within a table. Thus, a relational model uses tables for representing data and the relationships between them. Tables are also called relations. The relational data model is the most widely used model and is primarily used by commercial data processing applications.
2) Entity-Relationship Data Model: An ER model is the logical representation of data as objects and relationships among them. These objects are known as entities, and a relationship is an association among these entities. It is widely used in database design. A set of attributes describes each entity; for example, student_name and student_id describe the 'student' entity.
3) Object-based Data Model: An extension of the ER model with notions of functions, encapsulation, and object identity as well. This model supports a rich type system that includes structured and collection types.
4) Semistructured Data Model: This data model differs from the other three models described above. The semistructured data model allows data specifications at places where individual data items of the same type may have different attribute sets. The Extensible Markup Language (XML) is widely used for representing semistructured data.
Relational Algebra
Relational algebra is a procedural query language. It gives a step-by-step process to obtain the result of a query. The main purpose of relational algebra is to define operators that transform one or more input relations into an output relation. It uses operators to perform queries.
1. Select Operation:
o This operation selects the tuples (rows) of a relation that satisfy a given predicate.
o It is denoted by sigma (σ).
2. Project Operation:
o This operation shows the list of those attributes that we wish to appear in the result; the rest of the attributes are eliminated from the table.
o It is denoted by ∏.
3. Union Operation:
o Suppose there are two relations R and S. The union operation contains all the tuples that are in R, in S, or in both R and S.
o It eliminates duplicate tuples. It is denoted by ∪.
4. Set Intersection:
o Suppose there are two relations R and S. The set intersection operation contains all tuples that are in both R and S.
o It is denoted by ∩.
5. Set Difference:
o Suppose there are two relations R and S. The set difference operation contains all tuples that are in R but not in S.
o It is denoted by minus (−).
6. Cartesian product
o The Cartesian product is used to combine each row in one table with
each row in the other table. It is also known as a cross product.
o It is denoted by X.
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted
by rho (ρ).
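These operators map directly onto SQL. Below is a minimal sketch, assuming hypothetical relations R(a, b) and S(a, b), that runs each operator as SQL through Python's built-in sqlite3 module:

```python
# Minimal sketch: each relational algebra operator expressed as SQL,
# run against an in-memory SQLite database. R and S are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE R (a INTEGER, b TEXT);
    CREATE TABLE S (a INTEGER, b TEXT);
    INSERT INTO R VALUES (1, 'x'), (2, 'y');
    INSERT INTO S VALUES (2, 'y'), (3, 'z');
""")

queries = {
    "select (sigma)":    "SELECT * FROM R WHERE a > 1",                 # σ_{a>1}(R)
    "project (pi)":      "SELECT DISTINCT b FROM R",                    # ∏_b(R)
    "union":             "SELECT * FROM R UNION SELECT * FROM S",       # R ∪ S
    "intersection":      "SELECT * FROM R INTERSECT SELECT * FROM S",   # R ∩ S
    "difference":        "SELECT * FROM R EXCEPT SELECT * FROM S",      # R − S
    "cartesian product": "SELECT * FROM R CROSS JOIN S",                # R × S
    "rename (rho)":      "SELECT a AS x, b AS y FROM R",                # ρ
}
for name, sql in queries.items():
    print(name, "->", con.execute(sql).fetchall())
```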
SQL
o SQL stands for Structured Query Language. It is used for storing and managing data in a relational database management system (RDBMS).
o It is the standard language for relational database systems. It enables a user to create, read, update, and delete relational databases and tables.
o All major RDBMSs, such as MySQL, Informix, Oracle, MS Access, and SQL Server, use SQL as their standard database language.
o SQL allows users to query the database in a number of ways, using English-like statements.
Need of SQL:
It is widely used in Business Intelligence tools.
Data Science tools depend highly on SQL, and big data tools such as Spark and Impala are dependent on SQL.
Advantages of SQL:
Faster query processing – Large amounts of data are retrieved quickly and efficiently.
No coding skills – Data retrieval does not require large numbers of lines of code.
Portable – It can be used in programs on PCs, servers, and laptops, independent of any platform (operating system, etc.).
Interactive language – It is easy to learn and understand, and answers to complex queries can be received in seconds.
Disadvantages of SQL:
Complex interface – SQL has a difficult interface that makes some users uncomfortable while dealing with the database.
Cost – Some versions are costly, and hence not all programmers can access them.
Complexity – SQL databases can be complex to set up and manage, especially at large scale.
Rules:
SQL follows a few general rules: keywords are not case-sensitive, statements are usually terminated with a semicolon, and a statement may span multiple text lines.
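A minimal sketch of the create/read/update/delete operations SQL provides, run through Python's built-in sqlite3 module; the student table is a hypothetical example:

```python
# Sketch: basic CRUD in SQL via sqlite3.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# Create (insert) -- '?' placeholders keep values safely separated from SQL.
con.execute("INSERT INTO student (name, age) VALUES (?, ?)", ("Asha", 21))

# Read (query)
print(con.execute("SELECT id, name, age FROM student").fetchall())

# Update
con.execute("UPDATE student SET age = ? WHERE name = ?", (22, "Asha"))

# Delete
con.execute("DELETE FROM student WHERE name = ?", ("Asha",))
con.commit()
```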
Normalization
Normalization is the process of minimizing redundancy in a relation or set of relations. Redundancy in a relation may cause insertion, deletion, and update anomalies, so normalization helps to minimize redundancy in relations. Normal forms are used to eliminate or reduce redundancy in database tables.
Advantages of Normalization
Reduced data redundancy: Normalization helps to eliminate
duplicate data in tables, reducing the amount of storage space needed
and improving database efficiency.
Improved data consistency: Normalization ensures that data is
stored in a consistent and organized manner, reducing the risk of data
inconsistencies and errors.
Simplified database design: Normalization provides guidelines for
organizing tables and data relationships, making it easier to design
and maintain a database.
Improved query performance: Normalized tables are typically easier
to search and retrieve data from, resulting in faster query performance.
Easier database maintenance: Normalization reduces the complexity
of a database by breaking it down into smaller, more manageable
tables, making it easier to add, modify, and delete data.
The examples below show how tables are decomposed while normalizing a relation.

2NF example: a relation (STUD_NO, COURSE_NO, COURSE_FEE) is decomposed so that COURSE_FEE, which depends only on COURSE_NO, is stored once per course:

Table 1                    Table 2
STUD_NO   COURSE_NO        COURSE_NO   COURSE_FEE
1         C1               C1          1000
2         C2               C2          1500
1         C4               C3          1000
4         C3               C4          2000
4         C1               C5          2000

3NF example: EMP_ZIP determines EMP_STATE and EMP_CITY, so these attributes are moved into a separate table:

EMP_ZIP   EMP_STATE   EMP_CITY
201010    UP          Noida
02228     US          Boston

BCNF example: employee country and employee department are stored in separate tables:

EMP_ID   EMP_COUNTRY        EMP_DEPT   EMP_ID
264      India              D394       283
364      UK                 D394       300
                            D283       232

4NF example: a relation with two independent multi-valued attributes

STU_ID   COURSE      HOBBY
21       Computer    Dancing
21       Math        Singing
34       Chemistry   Dancing

is decomposed into

STU_ID   COURSE          STU_ID   HOBBY
21       Computer        21       Dancing
21       Math            21       Singing
34       Chemistry       34       Dancing

5NF example: a relation (SEMESTER, SUBJECT, LECTURER) is decomposed into three tables:

SEMESTER     SUBJECT        SUBJECT    LECTURER        SEMESTER     LECTURER
Semester 1   Computer       Computer   Anshika         Semester 1   Anshika
Semester 1   Math           Computer   John            Semester 1   John
Semester 2   Math           Math       John            Semester 2   Akash
                            Math       Akash
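A minimal sketch of the first decomposition above in sqlite3 (the table and column names are mine): after splitting, COURSE_FEE is stored once per course, and the original rows can be recovered with a join:

```python
# Sketch: the 2NF decomposition built as real tables. COURSE_FEE depends
# only on COURSE_NO, so it moves to its own table instead of being
# repeated for every student enrolment.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student_course (          -- Table 1
        stud_no   INTEGER,
        course_no TEXT,
        PRIMARY KEY (stud_no, course_no)
    );
    CREATE TABLE course (                  -- Table 2
        course_no  TEXT PRIMARY KEY,
        course_fee INTEGER
    );
    INSERT INTO course VALUES ('C1', 1000), ('C2', 1500), ('C3', 1000),
                              ('C4', 2000), ('C5', 2000);
    INSERT INTO student_course VALUES (1, 'C1'), (2, 'C2'), (1, 'C4'),
                                      (4, 'C3'), (4, 'C1');
""")
# The unnormalized view is recovered with a join:
print(con.execute("""
    SELECT sc.stud_no, sc.course_no, c.course_fee
    FROM student_course sc JOIN course c USING (course_no)
""").fetchall())
```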
Query Processing
Query processing is the activity performed in extracting data from the database. It includes the translation of high-level queries into low-level expressions that can be used at the physical level of the file system, query optimization, and the actual execution of the query to get the result. It involves three steps:
1. Parsing and Translation
The query is checked for correct syntax and valid names, and is then translated into an internal form, typically a relational algebra expression.
2. Optimization
Among the many equivalent evaluation plans for the query, the system selects the one with the lowest estimated cost.
3. Evaluation
In addition to the relational algebra translation, the translated relational algebra expression is annotated with the instructions used for specifying and evaluating each operation. After translating the user query in this way, the system executes the resulting query evaluation plan.
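As an illustration, SQLite exposes the evaluation plan its optimizer chooses through the EXPLAIN QUERY PLAN statement. A small sketch (the table and index names are hypothetical):

```python
# Sketch: observing a query evaluation plan. Without an index the plan
# is a full table scan; after adding one, the optimizer can switch to
# an index search.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)")

for row in con.execute("EXPLAIN QUERY PLAN SELECT * FROM emp WHERE dept = 'HR'"):
    print(row)   # full scan of emp

con.execute("CREATE INDEX idx_emp_dept ON emp(dept)")
for row in con.execute("EXPLAIN QUERY PLAN SELECT * FROM emp WHERE dept = 'HR'"):
    print(row)   # search using idx_emp_dept
```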
Unit 2
Data Recovery
Recoverability
Recoverability is a property of database systems that ensures that, in
the event of a failure or error, the system can recover the database to
a consistent state.
Recoverability guarantees that all committed transactions are durable
and that their effects are permanently stored in the database, while
the effects of uncommitted transactions are undone to maintain data
consistency.
Recoverability is a crucial property of database systems, as it ensures that data is consistent and durable even in the event of failures or errors.
It is important for database administrators to understand the level of
recoverability provided by their system and to configure it
appropriately to meet their application’s requirements.
Levels of Recoverability
No-undo logging: This level of recoverability only guarantees that
committed transactions are durable, but does not provide the ability to
undo the effects of uncommitted transactions.
Undo logging: This level of recoverability provides the ability to undo the
effects of uncommitted transactions but may result in the loss of updates
made by committed transactions that occur after the failed transaction.
Redo logging: This level of recoverability provides the ability to redo the
effects of committed transactions, ensuring that all committed updates
are durable and can be recovered in the event of failure.
Undo-redo logging: This level of recoverability provides both undo and
redo capabilities, ensuring that the system can recover to a consistent
state regardless of whether a transaction has been committed or not.
Transaction
Transaction in Database Management Systems (DBMS) can be defined
as a set of logically related operations.
It is the result of a request made by the user to access the contents of
the database and perform operations on it.
When the data of users is stored in a database, that data needs to be
accessed and modified from time to time. This task should be
performed with a specified set of rules and in a systematic way to
maintain the consistency and integrity of the data present in a
database. In DBMS, this task is called a transaction.
Operations of Transaction
i) Read(X) – reads the value of data item X from the database into a local buffer.
ii) Write(X) – writes the (possibly updated) value of X from the local buffer back to the database.
iii) Commit – makes all changes performed by the transaction permanent.
iv) Rollback – undoes all changes performed by the transaction, restoring the previous consistent state.
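A minimal sketch of these operations using sqlite3: reads and writes happen inside a transaction, and either commit() makes them permanent or rollback() undoes them; the account table is a hypothetical example:

```python
# Sketch: a money transfer as a transaction -- Read(X), Write(X),
# then Commit on success or Rollback on failure.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
con.commit()

try:
    # Read(X): fetch the source balance.
    (bal,) = con.execute("SELECT balance FROM account WHERE id = 1").fetchone()
    if bal < 30:
        raise ValueError("insufficient funds")
    # Write(X): both updates belong to the same transaction.
    con.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
    con.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
    con.commit()        # Commit: the changes become permanent
except Exception:
    con.rollback()      # Rollback: every change since the last commit is undone

print(con.execute("SELECT * FROM account").fetchall())
```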
Database Buffer
A database buffer is a temporary storage area in the main memory. It
allows storing the data temporarily when moving from one place to
another. A database buffer stores a copy of disk blocks. But, the version of
block copies on the disk may be older than the version in the buffer.
A buffer is a memory location used by a database management system (DBMS) to temporarily hold data that has recently been accessed or updated in the database. This buffer, often referred to as a database buffer, acts as a link between the programs accessing the data and the physical storage devices.
A DBMS's goal is to minimize the number of transfers between disk storage and main memory (RAM). We can lessen the number of disk accesses by maintaining as many blocks of data as possible (the database buffer) in main memory, so that when a user wishes to access the data, it can be served immediately from main memory. However, it is challenging to retain so many blocks of data in main memory, so the space allocated in main memory for buffer storage must be managed carefully. This is done using buffer management in the DBMS.
The database buffer is essential for enhancing the DBMS's overall performance. By caching frequently requested data in memory, it decreases the frequency of disk I/O operations, accelerating query and transaction processing.
Buffer Manager
A buffer manager in DBMS is in charge of allocating buffer space in the
main memory so that the temporary data can be stored there.
The buffer manager sends the block address if a user requests certain
data and the data block is present in the database buffer in the main
memory.
It is also responsible for allocating the data blocks in the database
buffer if the data blocks are not found in the database buffer.
In the absence of accessible empty space in the buffer, it removes a few
older blocks from the database buffer to make space for the new data
blocks.
If there is no space for a new data block in the database buffer, an existing block must be removed from the buffer to make room for the new data block. Here, the Least Recently Used (LRU) technique, also used by many operating systems, is commonly applied: the least recently used data block is taken out of the buffer and written back to the disk. This kind of replacement technique is called a buffer replacement strategy.
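A minimal sketch of an LRU buffer pool, assuming a fixed number of frames; fetch_from_disk is a hypothetical stand-in for a real disk read:

```python
# Sketch: LRU buffer replacement. An OrderedDict keeps blocks ordered
# from least to most recently used.
from collections import OrderedDict

class BufferPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = OrderedDict()   # block_id -> block data, oldest first

    def get_block(self, block_id, fetch_from_disk):
        if block_id in self.frames:               # buffer hit
            self.frames.move_to_end(block_id)     # mark as most recently used
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:     # buffer full: evict LRU block
            evicted_id, _ = self.frames.popitem(last=False)
            print(f"evicting block {evicted_id}")
        data = fetch_from_disk(block_id)          # buffer miss: read from disk
        self.frames[block_id] = data
        return data

pool = BufferPool(capacity=2)
for bid in [1, 2, 1, 3, 2]:          # block 2 is evicted when 3 arrives
    pool.get_block(bid, fetch_from_disk=lambda b: f"data-{b}")
```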
Pinned Blocks
When a user needs to restore data blocks after a system crash or failure, it is crucial to limit the number of times a block is copied or written to disk storage in order to preserve the data. Most recovery systems forbid writing blocks to the disk while a data block update is taking place. Pinned blocks are the data blocks that are restricted from being written back to the disk. Pinning gives a database the capability to prevent writing data blocks while updates are in progress, so that the correct data is persisted after all operations.
Disaster Recovery
Disaster recovery is an organization's method of regaining access and functionality to its IT infrastructure after events like a natural disaster, cyber attack, or even business disruptions related to the COVID-19 pandemic. It is closely related to recoverability.
Concurrency
Concurrency means multiple tasks or transactions are happening at the same time. Without proper control, concurrent execution can introduce errors or anomalies in the database during query processing.
Concurrency Control
Concurrency Control is the management procedure that is required for
controlling concurrent execution of the operations that take place on a
database.
Concurrency control is a very important concept of DBMS which ensures the simultaneous execution or manipulation of data by several processes or users without resulting in data inconsistency.
Concurrency Control deals with interleaved execution of more than
one transaction.
Concurrency control provides a procedure that is able to control
concurrent execution of the operations in the database.
The fundamental goal of database concurrency control is to ensure
that concurrent execution of transactions does not result in a loss of
database consistency.
Lock-Based Protocol
In this type of protocol, a transaction cannot read or write data until it acquires an appropriate lock on it. Locking is an operation which secures permission to read or permission to write a data item. Two-phase locking divides each transaction into a growing phase, in which locks are acquired, and a shrinking phase, in which locks are released; this guarantees conflict serializability of the resulting schedules.
There are two types of lock:
o Shared (S) lock: allows a transaction to read a data item; several transactions may hold a shared lock on the same item at the same time.
o Exclusive (X) lock: allows a transaction to both read and write a data item; no other transaction may hold any lock on that item at the same time. A sketch of this compatibility rule follows.
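A minimal sketch, assuming a simple in-memory lock table, of the S/X compatibility rule such a protocol enforces (deadlock handling and the release/shrinking phase are omitted):

```python
# Sketch: shared/exclusive lock compatibility. S is compatible only
# with S; X is compatible with nothing.
class LockManager:
    def __init__(self):
        self.locks = {}   # item -> list of (txn_id, mode)

    def can_grant(self, item, txn, mode):
        holders = [h for h in self.locks.get(item, []) if h[0] != txn]
        if not holders:
            return True
        return mode == "S" and all(m == "S" for _, m in holders)

    def lock(self, item, txn, mode):
        if self.can_grant(item, txn, mode):
            self.locks.setdefault(item, []).append((txn, mode))
            return True
        return False      # caller must wait until the conflicting locks go away

lm = LockManager()
print(lm.lock("A", "T1", "S"))   # True: first lock on A
print(lm.lock("A", "T2", "S"))   # True: S locks are compatible
print(lm.lock("A", "T3", "X"))   # False: X conflicts with the existing S locks
```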
Timestamp-Based Protocol
o The priority of the older transaction is higher, which is why it executes first. To determine the timestamp of a transaction, this protocol uses the system time or a logical counter.
Drawbacks of OCC
Optimistic Concurrency Control (OCC) lets transactions execute without locks and validates their read and write sets at commit time. Its drawbacks:
o It requires more memory and storage space, as each data item has to store a version number or a timestamp, and each transaction has to keep track of its read set and write set.
o It increases the complexity and overhead of the commit phase, as transactions have to validate their changes and handle conflicts.
o It may not be suitable for some applications that require strict serializability or real-time guarantees, as OCC does not enforce a global order of transactions.
Serializability
Serializability is a property of the system that describes how different processes operate on shared data. A (possibly interleaved) schedule is serializable if its result is the same as the result of executing the same transactions in some serial order.
It refers to the requirement that sequences of actions such as read, write, abort, and commit behave as if they were performed in a serial manner. For example:
T1              T2
READ1(A)
WRITE1(A)
READ1(B)
C1
                READ2(B)
                WRITE2(B)
                READ2(B)
                C2
Types of Serializability
1. Conflict Serializability
A schedule is conflict serializable if it can be transformed into a serial schedule by swapping adjacent non-conflicting operations; two operations conflict when they belong to different transactions, access the same data item, and at least one of them is a write. (A testable version of this rule is sketched after the view serializability conditions below.)
2. View Serializability
View serializability is a form of serializability in which each transaction must produce results equivalent to those of some serial execution of the same transactions. Unlike conflict serializability, view serializability focuses on which values transactions read and write rather than on the order of conflicting operations, so it admits some schedules that conflict serializability rejects.
Two schedules S1 and S2 are view equivalent when the following conditions hold:
o The first condition is that both schedules must contain the same set of transactions. If one schedule has a committed transaction that does not appear in the other schedule, the schedules are not equivalent to each other.
o The second condition concerns initial reads: if a transaction reads the initial value of a data item in S1, it must also read the initial value of that data item in S2. More generally, if a transaction reads a value written by another transaction in S1, it must read the value written by that same transaction in S2.
o The third and last condition concerns final writes: for each data item, the transaction that performs the final write on that item in S1 must also perform the final write on that item in S2.
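As referenced above, here is a minimal sketch of the standard precedence-graph test for conflict serializability; the schedule encoding is my own:

```python
# Sketch: a schedule is conflict serializable iff its precedence graph
# is acyclic. Two operations conflict if they come from different
# transactions, touch the same item, and at least one is a write.
def conflict_serializable(schedule):
    # schedule: list of (txn, op, item), e.g. ("T1", "R", "A")
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (op1, op2):
                edges.add((t1, t2))               # t1 must precede t2

    def has_cycle(node, visiting, done):          # depth-first cycle check
        visiting.add(node)
        for a, b in edges:
            if a == node:
                if b in visiting or (b not in done and has_cycle(b, visiting, done)):
                    return True
        visiting.discard(node)
        done.add(node)
        return False

    nodes = {t for t, _, _ in schedule}
    return not any(has_cycle(n, set(), set()) for n in nodes)

# T1 and T2 conflict on A in both directions -> edges T1->T2 and T2->T1.
s = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
print(conflict_serializable(s))   # False: the precedence graph has a cycle
```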
Scheduling
o The process of queuing up transactions and executing them one by
one is known as scheduling.
o The term “schedule” refers to a sequence of operations from one
transaction to the next.
o When there are numerous transactions operating at the same time,
and the order of operations needs to be determined so that the
operations do not overlap, scheduling is used, and the transactions are
timed properly.
Serial Schedule
The serial schedule is a type of schedule where one transaction is executed
completely before starting another transaction. In the serial schedule, when
the first transaction completes its cycle, then the next transaction is executed.
Non-Serial Schedule
o This is a type of Scheduling where the operations of multiple
transactions are interleaved
o The transactions are executed in a non-serial manner, keeping the
end result correct and same as the serial schedule.
o a non-serial schedule allows the next transaction to continue without
waiting for the last one to finish.
o The Non-Serial Schedule can be divided further into Serializable
and Non-Serializable.
Serializable:
o Conflict Serializable: the schedule can be converted into a serial schedule by swapping non-conflicting operations.
o View Serializable: the schedule is view equivalent to some serial schedule (see the conditions above).
Non-Serializable:
A non-serializable schedule is further divided into recoverable and non-recoverable schedules.
Recoverable Schedule:
Schedules in which transactions commit only after all transactions whose
changes they read commit are called recoverable schedules
T1              T2
R(A)
W(A)
                W(A)
                R(A)
COMMIT
                COMMIT
Non-Recoverable Schedule:
A schedule is non-recoverable when a transaction reads data written by another transaction and commits before that transaction finishes; if the writer then aborts, the committed read cannot be undone.
T1              T2
R(A)
W(A)
                W(A)
                R(A)
                COMMIT
ABORT
Deadlock
A deadlock is a condition where two or more transactions are waiting indefinitely for one another to give up locks. Four conditions must hold simultaneously for a deadlock to occur (a detection sketch follows this list):
1. Mutual exclusion
At least one resource must be held in a non-shareable mode, so only one process can use it at a time.
2. Hold and wait
A process must be holding at least one resource while waiting to acquire additional resources held by other processes.
3. No preemption
The process which is once scheduled will be executed till completion; no other process can take its resources away in the meantime.
4. Circular wait
All the processes must be waiting for resources in a cyclic manner, so that the last process is waiting for the resource held by the first process.
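A minimal sketch of deadlock detection via a wait-for graph, where an edge means "waits for a lock held by"; the graph encoding is my own:

```python
# Sketch: a cycle in the wait-for graph means all four conditions hold
# and a deadlock exists; one transaction on the cycle must be aborted.
def find_deadlock(wait_for):
    # wait_for: dict mapping a transaction to the transactions it waits on
    def dfs(node, path):
        if node in path:                      # revisited a node on this path
            return path[path.index(node):]    # the cycle of deadlocked txns
        for nxt in wait_for.get(node, []):
            cycle = dfs(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in wait_for:
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None

# T1 waits for T2, T2 for T3, T3 for T1: a circular wait.
print(find_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))
# -> ['T1', 'T2', 'T3']; aborting any one of these (the victim) breaks it.
```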
Unit 3
Parallel and Distributed Databases
Parallel Database :
Advantages:
1. Performance improvement –
By connecting multiple resources like CPUs and disks in parallel, we can significantly increase the performance of the system.
2. High availability –
In a parallel database, nodes have less contact with each other, so the failure of one node doesn't cause the failure of the entire system. This amounts to significantly higher database availability.
3. Proper resource utilization –
Because work is executed in parallel, CPUs are kept busy rather than idle, so resources are utilized properly.
4. Increased reliability –
When one site fails, execution can continue with another available site that has a copy of the data, making the system more reliable.
Distributed Database :
The Distributed DBMS is defined as, the software that allows for the
management of the distributed database and makes the distributed
data available for the users.
It is a collection of multiple interconnected databases that are spread
physically across various locations that communicate via a computer
network.
A distributed database is a collection of multiple interrelated databases distributed over a computer network. The data may be divided into fragments; these fragments are called logical data units and are stored at various sites.
Advantages :
As the data is stored close to the usage site, the efficiency of the
database system will increase
Local query optimization methods are sufficient for some queries as
the data is available locally
In order to maintain the security and privacy of the database system,
fragmentation is advantageous
Disadvantages:
Access times may be very high if data from different fragments is needed.
If we are using recursive fragmentation, then it will be very expensive.
We have three methods for fragmenting a table:
Horizontal fragmentation: Horizontal fragmentation refers to the process of dividing a table horizontally by assigning each row (or a group of rows) of the relation to one or more fragments. These fragments can then be assigned to different sites in the distributed system. For example:
SELECT * FROM student WHERE salary < 30000;
Vertical fragmentation: Vertical fragmentation refers to the process of decomposing a table vertically by attributes or columns. In this fragmentation, some of the attributes are stored in one system and the rest are stored in other systems. For example:
SELECT name FROM student;     -- fragment 1
SELECT id, age FROM student;  -- fragment 2
Mixed or hybrid fragmentation: The combination of vertical fragmentation of a table followed by further horizontal fragmentation of some fragments is called mixed or hybrid fragmentation. For example:
SELECT name FROM student WHERE age = 22;
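A minimal sketch, using sqlite3 and a hypothetical student table, of building horizontal and vertical fragments as separate tables; in a real DDBMS each fragment would be stored at a different site:

```python
# Sketch: materializing fragments and reconstructing the original table.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT,
                          age INTEGER, salary INTEGER);
    INSERT INTO student VALUES (1, 'Asha', 22, 25000), (2, 'Ravi', 24, 40000);

    -- Horizontal fragmentation: split by rows on a predicate.
    CREATE TABLE student_low  AS SELECT * FROM student WHERE salary <  30000;
    CREATE TABLE student_high AS SELECT * FROM student WHERE salary >= 30000;

    -- Vertical fragmentation: split by columns; the key is kept in each
    -- fragment so the original rows can be rebuilt with a join.
    CREATE TABLE student_names AS SELECT id, name        FROM student;
    CREATE TABLE student_stats AS SELECT id, age, salary FROM student;
""")
# Reconstruction: union the horizontal fragments, join the vertical ones.
print(con.execute("SELECT * FROM student_low UNION SELECT * FROM student_high").fetchall())
print(con.execute("SELECT * FROM student_names JOIN student_stats USING (id)").fetchall())
```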
Data Replication
Data replication means that a replica is made, i.e., data is copied at multiple locations to improve the availability of data. It is used to remove inconsistency between copies of the same data in a distributed database, so that users can do their tasks without interrupting the work of other users.
Data replication is the process of storing data in more than one site or node. It is useful in improving the availability of data. It is simply copying data from a database from one server to another server so that all the users can share the same data without any inconsistency.
Transactional Replication
It makes a full copy of the database along with the changed data.
Transactional consistency is guaranteed because the order of data is the
same when copied from publisher to subscriber database. It is used in
server−to−server environments by consistently and accurately replicating
changes in the database.
Snapshot Replication
It distributes data exactly as it appears at a specific moment in time (a snapshot) and does not monitor subsequent updates. It is best suited for data that changes infrequently.
Merge Replication
It combines data from several databases into a single database. It is the
most complex type of replication because both the publisher and
subscriber can do database changes. It is used in a server−to−client
environment and has changes sent from one publisher to multiple
subscribers.
Transparency in DDBMS
Distribution transparency is the property of distributed databases by virtue of which the internal details of the distribution are hidden from the users.
Transparency in a DDBMS refers to the transparent distribution of information to the user from the system.
It hides the implementation details of the distribution from the user.
Distribution transparency takes several forms, such as location transparency, fragmentation transparency, and replication transparency. For example, in a normal DBMS, data independence is a form of transparency that hides changes in the definition and organization of the data from the user. All of these forms share the same overall target.
Query Trading
In the query trading algorithm for distributed database systems, the controlling/client site for a distributed query is called the buyer, and the sites where the local queries execute are called sellers. The buyer formulates a number of alternatives for choosing sellers and for reconstructing the global results. The target of the buyer is to achieve the optimal cost.
Concurrency control
Key protocols and techniques for distributed transactions:
Two-Phase Commit (2PC): A widely used protocol for coordinating distributed transactions. It involves a prepare phase, where participants indicate their readiness to commit, followed by a commit phase, where the transaction is either committed or aborted based on participant responses (see the sketch after this list).
Three-Phase Commit (3PC): An extension of the 2PC protocol that adds an additional phase to handle certain failure scenarios more effectively. It includes a pre-commit phase, commit phase, and abort phase.
Optimistic Concurrency Control (OCC): A concurrency control technique where transactions proceed assuming they will not conflict with other transactions. Validation occurs at the end of the transaction to detect conflicts and ensure consistency.
Multi-Version Concurrency Control (MVCC): A technique that allows multiple versions of data to coexist, enabling transactions to operate on consistent snapshots of the database without blocking each other.
Key challenges in distributed transaction management:
Consistency: Ensuring that distributed transactions maintain consistency across all participating systems, even in the presence of failures or concurrent access.
Concurrency Control: Managing concurrent access to shared resources to prevent conflicts and maintain isolation between transactions.
Fault Tolerance: Designing systems to tolerate failures, such as network partitions or participant crashes, without compromising the integrity of distributed transactions.
Performance: Balancing consistency requirements with performance considerations to ensure efficient transaction processing in distributed environments.
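As referenced in the list above, here is a minimal sketch of the 2PC decision rule in Python. The Participant class and its methods are hypothetical stand-ins for remote sites; real 2PC also logs decisions for recovery and handles timeouts:

```python
# Sketch: the coordinator commits only if every participant votes
# "Ready" in the prepare phase; otherwise it aborts everywhere.
class Participant:
    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit

    def prepare(self):                       # phase 1: cast a vote
        return "Ready" if self.can_commit else "Not Ready"

    def finish(self, decision):              # phase 2: apply the global decision
        print(f"{self.name}: {decision}")

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]
    decision = "Global Commit" if all(v == "Ready" for v in votes) else "Global Abort"
    for p in participants:
        p.finish(decision)
    return decision

two_phase_commit([Participant("site-1", True), Participant("site-2", True)])   # commits
two_phase_commit([Participant("site-1", True), Participant("site-2", False)])  # aborts
```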
Distributed Deadlock
In distributed systems, a deadlock occurs when two or more transactions or
processes are waiting for resources held by each other, preventing any of
them from progressing. Distributed deadlocks are more complex than
deadlocks in a centralized system because resources and transactions are
spread across multiple nodes.
Communication Deadlock:
Description: Communication deadlocks occur when transactions are
waiting for messages or responses from other nodes, halting
communication.
Cause: Network communication failures, message queuing delays, or
synchronization issues can lead to transactions being unable to
proceed due to waiting for communication.
Characteristics: Transactions may be blocked indefinitely due to
communication issues, necessitating timeout mechanisms or message
retransmission strategies for resolution.
Resource Allocation Deadlock:
Description: Resource allocation deadlocks arise when transactions
across different nodes contend for distributed resources, such as locks,
connections, or data partitions.
Cause: Conflicting resource requests and inadequate coordination
mechanisms between distributed nodes lead to resource contention
and circular waits.
Characteristics: Requires careful management of distributed resources
and coordination mechanisms to prevent and resolve resource
allocation deadlocks effectively.
Partitioned Deadlock:
Description: Partitioned deadlocks occur in distributed databases with
partitioned data, where transactions accessing different partitions
contend for resources.
Cause: Concurrent transactions accessing partitioned data may lead to
conflicts and circular waits, particularly if proper partitioning and
coordination mechanisms are lacking.
Characteristics: Specific to distributed databases with partitioned data,
requiring partition-aware deadlock detection and resolution strategies.
Path-Pushing Algorithms:
Description: In path-pushing algorithms, each site maintains a local wait-for graph and periodically sends ("pushes") the paths of waiting transactions to other sites; each site combines the received paths with its local graph to detect global cycles.
Edge-Chasing Algorithms:
Description: Edge-chasing algorithms, also known as probe-based
algorithms, involve periodically sending probes or messages between
nodes to detect potential deadlocks. Each node probes its neighbors to
identify blocked transactions or resources and determine whether a
deadlock exists.
Operation: Nodes exchange probe messages to determine the status of
neighboring transactions and resources. If a node identifies a cycle of
blocked transactions, it signifies the presence of a deadlock.
Example: The Chandy-Misra-Haas distributed deadlock detection
algorithm is a well-known edge-chasing algorithm used to detect
deadlocks in distributed systems.
Commit Protocol
Commit protocols are algorithms used in distributed systems to ensure that a transaction either completes entirely or not at all. They help us maintain data integrity, atomicity, and consistency of the data, and they help us create robust, efficient, and reliable systems.
Two-Phase Commit Protocol
It is the first type of commit protocol in DBMS and consists of two phases:
Prepare Phase
Each slave sends a 'DONE' message to the controlling site after it has completed its transaction.
After getting the 'DONE' message from all the slaves, the controlling site sends a "Prepare" message to all the slaves.
Then the slaves share their vote on whether they want to commit or not. If a slave wants to commit, it sends a "Ready" message; if a slave does not want to commit, it sends a "Not Ready" message.
Commit/Abort Phase
When the controlling site receives a "Ready" message from all the slaves:
The controlling site sends a "Global Commit" message to all the slaves. The message contains the details of the transaction which needs to be stored in the databases.
Then each slave completes the transaction and returns an acknowledgement message back to the controlling site.
When the controlling site has received an acknowledgement from all the slaves, the transaction is completed.
When the controlling site receives a "Not Ready" message from any slave:
The controlling site sends a "Global Abort" message to all the slaves.
After receiving the "Global Abort" message, the transaction is aborted by the slaves. Then the slaves send an acknowledgement message back to the controlling site.
When the controlling site receives an Abort acknowledgement from all the slaves, the transaction is aborted.
Three-Phase Commit Protocol
It is the second type of commit protocol in DBMS. It was introduced to address the issue of blocking. In this commit protocol, there are three phases:
Prepare Phase
Pre-commit Phase
Commit/Abort Phase
The prepare phase consists of the same steps as in the two-phase commit. The pre-commit phase is the added one: once the controlling site has received "Ready" votes from all the slaves, it sends a pre-commit message so that every site knows the outcome before the final commit/abort phase; no acknowledgement is provided after this process.
1. Shared Memory Architecture:
In shared memory architecture, multiple CPUs are attached to an interconnection network and share a common main memory. It is a tightly coupled architecture.
Advantages:
1. It has high-speed data access for a limited number of processors.
2. The communication is efficient.
Disadvantages:
1. It cannot scale beyond 80 or 100 CPUs in parallel.
2. The bus or the interconnection network gets blocked as a large number of CPUs are added.
2. Shared Disk Architectures :
In Shared Disk Architecture, various CPUs are attached to an
interconnection network. In this, each CPU has its own memory and all of
them have access to the same disk. Also, note that here the memory is
not shared among CPUs therefore each node has its own copy of the
operating system and DBMS. Shared disk architecture is a loosely
coupled architecture optimized for applications that are inherently
centralized. They are also known as clusters.
(Figure: Shared Disk Architecture)
Advantages:
1. The interconnection network is no longer a bottleneck, since each CPU has its own memory.
2. Load balancing is easier in shared disk architecture.
3. There is better fault tolerance.
Disadvantages:
1. If the number of CPUs increases, the problems of interference and memory contention also increase.
2. There also exists a scalability problem.
3. Shared Nothing Architecture:
Shared nothing architecture is a multiple-processor architecture in which each processor has its own memory and disk storage. In this, multiple CPUs are attached to an interconnection network through a node. Also, note that no two CPUs can access the same disk area. In this architecture, no sharing of memory or disk resources is done. It is also known as massively parallel processing (MPP).
(Figure: Shared Nothing Architecture)
Advantages:
1. It has better scalability, as no sharing of resources is done.
2. Multiple CPUs can be added.
Disadvantages:
1. The cost of communication is higher, as it involves sending data and software interaction at both ends.
2. The cost of non-local disk access is higher than in shared disk architectures.
Parallel Query Evaluation
Parallel Scan: The data needed for the query is partitioned and distributed
across multiple nodes, with each node scanning its local data in parallel.
Parallel Join: Join operations involving multiple tables are parallelized by
partitioning and distributing the join keys across nodes, enabling parallel
processing of join operations.
Parallel Aggregation: Aggregate functions such as SUM, AVG, COUNT, etc., are computed in parallel across multiple nodes, with partial results combined at the end (see the sketch after this list).
Parallel Sorting: Sorting operations are parallelized by partitioning data
across nodes and performing parallel sorting within each partition, followed
by merging sorted partitions.
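As referenced above, a minimal sketch of parallel aggregation using Python's multiprocessing module: each worker computes a partial SUM over its partition, and the partial results are combined at the end, mirroring how a parallel DBMS evaluates SUM:

```python
# Sketch: partition -> local aggregation -> combine.
from multiprocessing import Pool

def partial_sum(partition):
    return sum(partition)                 # local aggregation on one "node"

if __name__ == "__main__":
    rows = list(range(1, 1001))           # hypothetical column values
    # Horizontal partitioning: split the rows across 4 workers.
    parts = [rows[i::4] for i in range(4)]
    with Pool(processes=4) as pool:
        partials = pool.map(partial_sum, parts)
    print(sum(partials))                  # combine step -> 500500
```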
Horizontal Partitioning: Tables are divided into disjoint subsets of rows, with
each subset assigned to a different node for parallel processing.
Vertical Partitioning: Attributes of a table are split into separate partitions,
with each partition assigned to a different node, enabling parallel processing
of queries involving specific attributes.
5. Load Balancing:
Work is distributed evenly across the nodes so that no single node becomes a bottleneck during parallel query execution.
Scaling with System Size: Parallel query evaluation techniques should scale
efficiently with the size of the parallel database system, supporting large-scale
deployments with thousands of nodes.
Performance Tuning: Performance monitoring and tuning mechanisms are
employed to optimize parallel query execution by identifying and addressing
performance bottlenecks.
Unit 4
Object-Oriented and Object-Relational Databases
Object-Oriented Databases
An object-oriented database (OODB) is a type of database management
system (DBMS) that supports the storage, retrieval, and management of
data in the form of objects, which are instances of classes or types in
object-oriented programming (OOP).
This allows for a more natural representation of complex data structures,
relationships, and behaviors. Developers can define classes to represent
real-world entities, and objects of these classes can encapsulate both data
and the operations that can be performed on that data.
Features of ODBMS:
An ODBMS supports object identity, encapsulation, inheritance, and complex user-defined types, and it typically integrates directly with an object-oriented programming language.
Advantages:
Complex data structures and relationships can be represented naturally, without a mapping layer between objects and relational tables.
Disadvantages:
ODBMSs are less mature and less standardized than relational systems, and ad hoc query support is generally weaker than SQL.
Object-relational databases (ORDBs) extend the relational model with object-oriented features such as user-defined types, inheritance, and methods.
Advantages of ORDBs
Efficient Handling of Structured Data: ORDBs excel in efficiently
handling structured data, making them suitable for applications that
require organized and structured information
Support for SQL Queries and Transactions: ORDBs maintain
compatibility with SQL queries and transactions, allowing users to
leverage the benefits of SQL while working with complex data structures
and relationships
Integration into Existing Systems: ORDBs can be seamlessly integrated
into existing systems, making them a practical choice for applications that
need to work with both relational and object-oriented data models
Good Support for Transactions: ORDBs provide robust support for
transactions, ensuring data integrity and consistency during complex
operations
Specialization
o Specialization is a top-down approach, and it is opposite to
Generalization. In specialization, one higher level entity can be broken
down into two lower level entities.
o Specialization is used to identify the subset of an entity set that shares
some distinguishing characteristics.
o Normally, the superclass is defined first, the subclass and its related attributes are defined next, and relationship sets are then added.
For example, an EMPLOYEE entity in an employee management system can be specialized into DEVELOPER, TESTER, etc. In this case, common attributes like E_NAME, E_SAL, etc. become part of the higher-level entity (EMPLOYEE), and specialized attributes like TES_TYPE become part of the specialized entity (TESTER).
Generalization
o Generalization is like a bottom-up approach in which two or more
entities of lower level combine to form a higher level entity if they have
some attributes in common.
o In generalization, an entity of a higher level can also combine with the
entities of the lower level to form a further higher level entity.
o Generalization is more like subclass and superclass system, but the only
difference is the approach. Generalization uses the bottom-up
approach.
o In generalization, entities are combined to form a more generalized
entity, i.e., subclasses are combined to make a superclass.
For example, Faculty and Student entities can be generalized and create a
higher level entity Person.
Aggregation
In aggregation, the relation between two entities is treated as a single entity. A relationship together with its corresponding entities is aggregated into a higher-level entity.
For example: a Center entity offering a Course entity acts as a single entity in a relationship with another entity, Visitor. In the real world, if a visitor visits a coaching center, he will never enquire about the course alone or about the center alone; he will enquire about both together.
Association
Association is a relation between two separate classes which is
established through their Objects. Association can be one-to-one, one-to-
many, many-to-one, many-to-many. In Object-Oriented programming, an
Object communicates to another object to use functionality and services
provided by that object. Composition and Aggregation are the two
forms of association.
Aggregation
Aggregation is a subset of association and represents a collection of different things. It represents a "has-a" relationship and is more specific than an association. It describes a part-whole or part-of relationship. It is a binary association, i.e., it only involves two classes. It is a kind of relationship in which the child can exist independently of its parent.
For example:
Here we are considering a car and a wheel example. A car cannot move
without a wheel. But the wheel can be independently used with the bike,
scooter, cycle, or any other vehicle. The wheel object can exist without the car
object, which proves to be an aggregation relationship.
Composition
Composition is a restricted form of aggregation that portrays the whole-part relationship. It depicts a dependency between a composite (parent) and its parts (children): if the composite is discarded, its parts are deleted with it.
As you can see from the example given below, the composition association
relationship connects the Person class with Brain class, Heart class, and Legs
class. If the person is destroyed, the brain, heart, and legs will also get
discarded.
Association vs. Aggregation vs. Composition
o Scope: In UML, an association can exist between two or more classes; aggregation is a part of the association relationship; composition is a part of the aggregation relationship.
o Linking: In association, objects are simply linked together; in aggregation, the linked objects are independent of each other; in composition, the linked objects are dependent on each other.
o Effect of deletion: In association, deleting one element may or may not affect the other associated element; in aggregation, deleting one element does not affect the other associated elements; in composition, deleting the whole affects its associated parts.
o Example: A tutor can associate with multiple students, or one student can associate with multiple teachers (association). A car needs a wheel for its proper functioning, but it may not require the same wheel; it may function with another wheel as well (aggregation). If a file is placed in a folder and that folder is deleted, the file residing inside that folder will also get deleted at the time of folder deletion (composition).
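A minimal sketch of the three relationships in Python, reusing the examples above; what distinguishes them in code is which object creates and owns which:

```python
# Sketch: association, aggregation, and composition as object lifetimes.
class Wheel:
    pass

class Car:                       # Aggregation: the wheel exists on its own
    def __init__(self, wheel):   # and is passed in; deleting the car does
        self.wheel = wheel       # not delete the wheel.

class Brain:
    pass

class Person:                    # Composition: the person creates and owns
    def __init__(self):          # its brain; the brain's lifetime is tied
        self.brain = Brain()     # to the person's.

class Student:
    pass

class Tutor:                     # Association: a tutor merely refers to
    def __init__(self):          # independently managed students.
        self.students = []

wheel = Wheel()
car = Car(wheel)
del car                          # the wheel survives: aggregation
person = Person()                # person.brain dies with person: composition
```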
Database Objects
A database object is any defined object in a database that is used to store or reference data. Anything we create with a CREATE command is known as a database object. It can be used to hold and manipulate data.
Table – Basic unit of storage; composed of rows and columns
View – Logically represents subsets of data from one or more tables
Sequence – Generates primary key values
Index – Improves the performance of some queries
Synonym – Alternative name for an object
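A minimal sketch creating some of these objects in sqlite3. Note that sequences and synonyms are Oracle-style objects which SQLite does not support, so they appear only as comments:

```python
# Sketch: CREATE statements for common database objects.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);

    -- View: a stored query presented as a virtual table.
    CREATE VIEW high_paid AS SELECT name FROM emp WHERE salary > 50000;

    -- Index: a secondary access path that speeds up lookups on name.
    CREATE INDEX idx_emp_name ON emp(name);

    -- In systems such as Oracle you would also have, e.g.:
    --   CREATE SEQUENCE emp_seq;        -- generates primary key values
    --   CREATE SYNONYM staff FOR emp;   -- alternative name for an object
""")
print(con.execute("SELECT name FROM sqlite_master").fetchall())
```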
Object Identity
Object Identity in DBMS refers to the property of data in an object data model
where each object is assigned a unique internal identifier, also known as an
Object Identifier (OID).
Object identity is a key property of object data models that allows for the unique identification of objects and supports object sharing and object updates. It is implemented through a system-generated object identifier that is immutable and unique to each object.
Equality
Equality of Objects: Equality of objects refers to determining whether two
objects have the same content or values. This is typically checked using
the equals() method, which compares the attributes of objects to ascertain if
they are equal in content.
Equality of References: On the other hand, equality of references involves
checking if two object references point to the same memory location. This is
evaluated using the == operator, which compares the memory addresses of
objects to establish if they refer to the same object in memory.
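The equals()/== terminology above is Java's. A minimal Python sketch of the same distinction, where == (via __eq__) compares content and 'is' compares references:

```python
# Sketch: content equality vs. reference identity.
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):     # content-based equality
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

p1 = Point(1, 2)
p2 = Point(1, 2)
p3 = p1

print(p1 == p2)   # True: same content
print(p1 is p2)   # False: two distinct objects in memory
print(p1 is p3)   # True: same object (reference equality)
```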