Understanding Database Transactions and ACID

The document discusses the transaction concept in databases, detailing transaction states, ACID properties, and concurrency control mechanisms. It explains various transaction states such as Active, Partially Committed, Failed, Aborted, Committed, and Terminated, along with the importance of ACID properties for maintaining data integrity and consistency. Additionally, it covers issues related to concurrent executions, such as lost updates and deadlocks, and introduces scheduling concepts like serial and concurrent schedules.


UNIT V: Transaction Concept: Transaction State, ACID Properties, Concurrent Executions, Serializability, Recoverability, Implementation of Isolation, Testing for Serializability, Lock-Based, Timestamp-Based, and Optimistic Concurrency Protocols, Deadlocks, Failure Classification, Storage, Recovery and Atomicity, Recovery Algorithm. Introduction to Indexing Techniques: B+ Trees, Operations on B+ Trees, Hash-Based Indexing.

Transaction: Any logical unit of work performed on the data of a database is known as a transaction. Logical work can be inserting a new value into the database, deleting existing values, or updating the current values in the database.

For example, adding a new member to the database of a team is a transaction.

To complete a transaction, we have to follow some steps which make a transaction successful. For example, withdrawing cash from an ATM is a transaction, and it can be done in the following steps:

o Initialization of the transaction
o Inserting the ATM card into the machine
o Choosing the language
o Choosing the account type
o Entering the cash amount
o Entering the PIN
o Collecting the cash
o Terminating the transaction
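The steps above form a single all-or-nothing unit: if any step fails (for example, insufficient balance), the whole withdrawal is aborted and nothing changes. A minimal Python sketch of this idea (the function and field names are illustrative, not from any real ATM system):

```python
def withdraw(account, amount):
    """Run the withdrawal as one all-or-nothing transaction."""
    balance = account["balance"]           # read operation (into a buffer)
    if amount > balance:
        return "aborted"                   # abort: nothing is written back
    account["balance"] = balance - amount  # write operation
    return "committed"

account = {"balance": 500}
print(withdraw(account, 200))  # committed, balance becomes 300
print(withdraw(account, 900))  # aborted, balance stays 300
```

Either the transaction runs to completion and the balance changes, or it aborts and the account is untouched.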
What is a Transaction State?
A transaction is a set of operations or tasks performed to complete a logical
process, which may or may not change the data in a database. To handle
different situations, like system failures, a transaction is divided into different
states.
A transaction state refers to the current phase or condition of a transaction
during its execution in a database. It represents the progress of the transaction
and determines whether it will successfully complete (commit) or fail (abort).
A transaction involves two main operations:
1. Read Operation: Reads data from the database, stores it temporarily in
memory (buffer), and uses it as needed.
2. Write Operation: Updates the database with the changed data using the
buffer.
From the start of executing instructions to the end, these operations are
treated as a single transaction. This ensures the database remains consistent
and reliable throughout the process.
Different Types of Transaction States in DBMS
These are different types of Transaction States :
1. Active State – This is the first stage of any transaction, when its instructions are being executed.
 Operations such as insertion, deletion, or updation are performed during this state.
 During this state, the data records under manipulation are not yet saved to the database; they remain in a buffer in main memory.
2. Partially Committed –
 The transaction has finished its final operation, but the changes are still not saved to the database.
 After completing all read and write operations, the modifications are initially stored in main memory or a local buffer. If the changes are made permanent in the database, the state changes to the "committed" state; in case of failure, it goes to the "failed" state.
3. Failed State –If any of the transaction-related operations cause an error
during the active or partially committed state, further execution of the
transaction is stopped and it is brought into a failed state. Here, the database
recovery system makes sure that the database is in a consistent state.
4. Aborted State – If a transaction reaches the failed state, the database recovery system will attempt to restore the database to a consistent state. If recovery is not possible, the transaction is rolled back or cancelled to ensure the database remains consistent.
In the aborted state, the DBMS recovery system performs one of two actions:
 Kill the transaction: The system terminates the transaction to prevent it
from affecting other operations.
 Restart the transaction: After making necessary adjustments, the system
reverts the transaction to an active state and attempts to continue its
execution.
5. Committed – This state is reached when all transaction-related operations have been executed successfully along with the Commit operation, i.e., the data is saved into the database after the required manipulations. This marks the successful completion of a transaction.
6. Terminated State – Once any rollback has completed, or the transaction arrives from the committed state, the system is consistent and ready for a new transaction, and the old transaction is terminated.
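The states and the transitions between them can be modelled as a small state machine. A sketch of the allowed transitions, simplified from the description above:

```python
# Allowed transitions between transaction states, per the description above.
TRANSITIONS = {
    "active": {"partially_committed", "failed"},
    "partially_committed": {"committed", "failed"},
    "failed": {"aborted"},
    "aborted": {"terminated", "active"},   # kill the transaction, or restart it
    "committed": {"terminated"},
    "terminated": set(),
}

def can_move(src, dst):
    """Is dst a legal next state from src?"""
    return dst in TRANSITIONS[src]

print(can_move("active", "committed"))            # False: must pass through partially committed
print(can_move("partially_committed", "failed"))  # True: failure while writing to disk
```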

ACID Properties in DBMS


A transaction is a single logical unit of work that interacts with the database, potentially modifying its content through read and write operations. To maintain database consistency both before and after a transaction, specific properties, known as the ACID properties, must be followed.

Atomicity:
By this, we mean that either the entire transaction takes place at once or
doesn’t happen at all. There is no midway i.e. transactions do not occur
partially. Each transaction is considered as one unit and either runs to
completion or is not executed at all. It involves the following two operations.
— Abort : If a transaction aborts, changes made to the database are not
visible.
— Commit : If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
Consider the following transaction T, consisting of operations T1 and T2: transfer of 100 from account X to account Y (T1: read(X), X = X − 100, write(X); T2: read(Y), Y = Y + 100, write(Y)).

If the transaction fails after completion of T1 but before completion of T2 ( say,


after write(X) but before write(Y) ), then the amount has been deducted from X
but not added to Y . This results in an inconsistent database state. Therefore,
the transaction must be executed in its entirety in order to ensure the
correctness of the database state.
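This behaviour can be sketched in Python: an injected failure between the two writes triggers a rollback, so the deduction from X never becomes visible (a simulation of atomicity, not a real database API):

```python
def transfer(db, amount, fail_between_writes=False):
    """Transfer `amount` from X to Y atomically; roll back on failure."""
    snapshot = dict(db)                 # saved state for rollback
    try:
        db["X"] -= amount               # T1: write(X)
        if fail_between_writes:
            raise RuntimeError("crash before write(Y)")
        db["Y"] += amount               # T2: write(Y)
    except RuntimeError:
        db.clear()
        db.update(snapshot)             # undo: restore the pre-transaction state
        return "aborted"
    return "committed"

db = {"X": 500, "Y": 200}
print(transfer(db, 100))                            # committed: X=400, Y=300
print(transfer(db, 100, fail_between_writes=True))  # aborted: X=400, Y=300 unchanged
```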
Consistency:
Consistency ensures that a database remains in a valid state before and after a
transaction. It guarantees that any transaction will take the database from one
consistent state to another, maintaining the rules and constraints defined for
the data.
Referring to the example above,
The total amount before and after the transaction must be maintained.
Total before T occurs = 500 + 200 = 700 .
Total after T occurs = 400 + 300 = 700 .
Therefore, the database is consistent . Inconsistency occurs in case T1
completes but T2 fails.
Isolation:
This property ensures that multiple transactions can occur concurrently without
leading to the inconsistency of the database state. Transactions occur
independently without interference. Changes occurring in a particular
transaction will not be visible to any other transaction until that particular
change in that transaction is written to memory or has been committed. This
property ensures that when multiple transactions run at the same time, the
result will be the same as if they were run one after another in a specific order.
Let X = 500, Y = 500.
Consider two transactions T and T''. Suppose T multiplies X by 100 and then deducts 50 from Y:
T: read(X), X = X * 100, write(X), read(Y), Y = Y − 50, write(Y)
while T'' simply reads both items and computes X + Y.
Suppose T has executed up to read(Y) when T'' starts. Because of the interleaving of operations, T'' reads the updated value of X but the old value of Y, and the sum computed by
T'': X + Y = 50,000 + 500 = 50,500
is not consistent with the sum at the end of transaction T:
T: X + Y = 50,000 + 450 = 50,450.
This results in database inconsistency, a discrepancy of 50 units. Hence, transactions must take place in isolation, and changes should be visible only after they have been committed.
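The arithmetic of this interleaving can be reproduced directly:

```python
X, Y = 500, 500

# T executes its first half: read(X), X = X * 100, write(X)
X = X * 100            # X is now 50,000

# T'' runs in the middle: it reads the new X but the old Y
dirty_sum = X + Y      # 50,000 + 500 = 50,500

# T finishes: read(Y), Y = Y - 50, write(Y)
Y = Y - 50             # Y is now 450

correct_sum = X + Y    # 50,000 + 450 = 50,450
print(dirty_sum, correct_sum, dirty_sum - correct_sum)  # 50500 50450 50
```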

Durability:
This property ensures that once the transaction has completed execution, the
updates and modifications to the database are stored in and written to disk and
they persist even if a system failure occurs. These updates now become
permanent and are stored in non-volatile memory. The effects of the
transaction, thus, are never lost.
Advantages of ACID Properties in DBMS
1. Data Consistency: ACID properties ensure that the data remains
consistent and accurate after any transaction execution.
2. Data Integrity: ACID properties maintain the integrity of the data by
ensuring that any changes to the database are permanent and cannot be
lost.
3. Concurrency Control: ACID properties help to manage multiple
transactions occurring concurrently by preventing interference between
them.
4. Recovery: ACID properties ensure that in case of any failure or crash, the
system can recover the data up to the point of failure or crash.
Disadvantages of ACID Properties in DBMS
1. Performance: The ACID properties can cause a performance overhead in
the system, as they require additional processing to ensure data consistency
and integrity.
2. Scalability: The ACID properties may cause scalability issues in large
distributed systems where multiple transactions occur concurrently.
3. Complexity: Implementing the ACID properties can increase the complexity
of the system and require significant expertise and resources.
Overall, the advantages of ACID properties in DBMS outweigh the
disadvantages. They provide a reliable and consistent approach to data
management, ensuring data integrity, accuracy, and reliability. However, in
some cases, the overhead of implementing ACID properties can cause
performance and scalability issues. Therefore, it’s important to balance the
benefits of ACID properties against the specific needs and requirements of
the system.
Concurrent Executions in DBMS
Concurrent execution refers to the simultaneous execution of more than one transaction.
This is a common scenario in multi-user database environments where many users or
applications might be accessing or modifying the database at the same time. Concurrent
execution is crucial for achieving high throughput and efficient resource utilization. However,
it introduces the potential for conflicts and data inconsistencies.

Advantages of Concurrent Execution

1. Increased System Throughput: Multiple transactions can be in progress at the same time, but at different stages.
2. Maximized Processor Utilization: If one transaction is waiting for I/O operations,
another transaction can utilize the processor.
3. Decreased Wait Time: Transactions no longer have to wait for other long
transactions to complete.
4. Improved Transaction Response Time: Transactions get processed faster
because they can be executed in parallel.

Potential Problems with Concurrent Execution

In a DBMS, concurrent execution can introduce a number of issues that need to be resolved in order to guarantee accurate and dependable database operation. Some of the issues with concurrent execution in a DBMS include the following –

• 1. Lost Update: A lost update happens when two or more transactions try to update the same data item at the same time, and the outcome depends on the sequence in which the transactions execute. The modifications made by one transaction are lost if another transaction overwrites them before they are committed. Lost updates can lead to inconsistent data and incorrect results.
• 2. Dirty Read: A dirty read occurs when a transaction reads data that has been updated by another transaction but not yet committed. The value read by the first transaction becomes invalid if the modifying transaction rolls back. Dirty reads can lead to data discrepancies and incorrect results.

• 3. Non-Repeatable Read: When a transaction reads the same data item twice and
the data is updated by another transaction between the two reads, this is known as a
non-repeatable read. This might result in discrepancies in outcomes and data.

• 4. Phantom Read: When a transaction reads a group of rows that meet a given criterion, and a subsequent transaction adds or deletes rows that meet the same criterion, this is known as a phantom read. When the first transaction reads the same set of data again, it sees rows that were not present the first time. This can lead to data discrepancies and incorrect results.

• 5. Deadlock: In a DBMS, a deadlock happens when transactions are blocked waiting for one another to release the resources they hold. Deadlocks can happen when resources are not released properly or are acquired by transactions in different orders. Deadlocks can result in decreased system performance or even system crashes.

• 6. Starvation: In a DBMS, starvation happens when a transaction is perpetually blocked from using a resource or finishing its work because that resource keeps being allocated to other transactions. Starvation can result when resources are not equitably distributed among transactions or when priorities are not correctly managed.
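The lost-update problem from the list above can be demonstrated by interleaving two read-modify-write sequences by hand (a deterministic simulation rather than real threads):

```python
balance = 100

# Both transactions read the same initial value before either one writes.
t1_read = balance          # T1: read(balance) -> 100
t2_read = balance          # T2: read(balance) -> 100

balance = t1_read + 50     # T1: write(150)
balance = t2_read - 30     # T2: write(70) -- overwrites T1's update

print(balance)             # 70: T1's +50 is lost; any serial order would give 120
```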

In conclusion, concurrent execution in a DBMS offers a number of advantages, including faster response times and higher system throughput, but it also raises issues that must be resolved to ensure accurate and dependable database operation: lost updates, dirty reads, non-repeatable reads, phantom reads, deadlocks, and starvation. Many concurrency control strategies, including locks, timestamps, and optimistic concurrency control, are employed to avoid these issues. Which concurrency control method is best depends on the specific needs of the DBMS and the applications it supports. Concurrent execution must be managed properly to guarantee a DBMS's accuracy and dependability.

What is a Schedule?

• A schedule is a series of operations from one or more transactions. A schedule can be of two types:

• Serial Schedule: When one transaction completely executes before another transaction starts, the schedule is called a serial schedule. A serial schedule is always consistent. For example, if a schedule S has a debit transaction T1 and a credit transaction T2, the possible serial schedules are T1 followed by T2 (T1 -> T2) or T2 followed by T1 (T2 -> T1). A serial schedule has low throughput and poor resource utilization.

• Concurrent Schedule: When the operations of a transaction are interleaved with operations of other transactions in a schedule, the schedule is called a concurrent schedule, e.g., a schedule interleaving debit and credit transactions. One needs to be careful here, because concurrency can lead to inconsistency in the database.
SERIALIZABILITY:

• When multiple transactions are running concurrently then there is a possibility that
the database may be left in an inconsistent state.
• Serializability is a concept that helps us to check which schedules are serializable.
• A serializable schedule is the one that always leaves the database in consistent
state.

What is a Serializable Schedule?

• A serializable schedule always leaves the database in a consistent state. A serial schedule is always serializable because, in a serial schedule, a transaction only starts after the previous transaction has finished executing. However, a non-serial schedule needs to be checked for serializability.

• A non-serial schedule of n transactions is said to be serializable if it is equivalent to a serial schedule of those n transactions. A serial schedule doesn't allow concurrency: only one transaction executes at a time, and the next starts only after the running transaction has finished.

Types of Serializability:

They are of two types:

• 1. Conflict Serializability: a criterion used to check whether a non-serial schedule is conflict serializable or not.
• 2. View Serializability: a criterion used to check whether a given schedule is view serializable or not.

CONFLICT SERIALIZABILITY:

• Conflicting operations: Two operations are said to be in conflict if they satisfy all the following three conditions:
• 1. The operations belong to different transactions.
• 2. The operations work on the same data item.
• 3. At least one of the operations is a write operation.
• Example 1: Operation W(X) of transaction T1 and operation R(X) of transaction T2 are conflicting operations, because they satisfy all three conditions: they belong to different transactions, they work on the same data item X, and one of the operations is a write operation.
• Example 2: Similarly, operations W(X) of T1 and W(X) of T2 are conflicting operations.
• Example 3: Operations W(X) of T1 and W(Y) of T2 are non-conflicting operations, because the two writes work on different data items, so they don't satisfy the second condition.
• Example 4: Similarly, R(X) of T1 and R(X) of T2 are non-conflicting operations, because neither is a write operation.
• Example 5: Similarly, W(X) of T1 and R(X) of T1 are non-conflicting operations, because both operations belong to the same transaction T1.

Conflict Equivalent Schedules:

Two schedules are said to be conflict equivalent if one schedule can be converted into the other by swapping non-conflicting operations.

Conflict Serializable check:

If a schedule is conflict equivalent to a serial schedule, then it is called a conflict serializable schedule. Let's take a few examples of schedules.
Conflict Equivalent:

In conflict equivalence, one schedule can be transformed into another by swapping non-conflicting operations. In the given example, S2 is conflict equivalent to S1 (S1 can be converted to S2 by swapping non-conflicting operations).

Two schedules are said to be conflict equivalent if and only if:

1. They contain the same set of transactions.

2. Each pair of conflicting operations is ordered in the same way.

Schedule S2 is a serial schedule because all operations of T1 are performed before any operation of T2 starts. Schedule S1 can be transformed into a serial schedule by swapping non-conflicting operations of T1 and T2. After swapping the non-conflicting operations, schedule S1 becomes:
T1          T2
Read(A)
Write(A)
Read(B)
Write(B)
            Read(A)
            Write(A)
            Read(B)
            Write(B)

Hence, S1 is conflict serializable.

VIEW SERIALIZABILITY: View Serializability is a criterion to find out whether a given schedule is view serializable or not.

• To check whether a given schedule is view serializable, we check whether the given schedule is view equivalent to a serial schedule. Let's take an example to understand what this means.
View Equivalence: Two schedules S1 and S2 are said to be view equivalent if they satisfy all the following conditions:

• 1. Initial Read: Initial read of each data item in transactions must match
in both schedules. For example, if transaction T1 reads a data item X
before transaction T2 in schedule S1 then in schedule S2, T1 should read X
before T2.

• 2. Final Write: Final write operations on each data item must match in
both the schedules. For example, a data item X is last written by
Transaction T1 in schedule S1 then in S2, the last write operation on X
should be performed by the transaction T1.

• 3. Update Read: If in schedule S1, the transaction T1 is reading a data


item updated by T2 then in schedule S2, T1 should read the value after
the write operation of T2 on same data item. For example, In schedule S1,
T1 performs a read operation on X after the write operation on X by T2
then in S2, T1 should read the X after T2 performs write on X.

If a schedule is view equivalent to a serial schedule, then the given schedule is said to be view serializable. Let's take an example.
• Let's check the three conditions of view serializability:

• Initial Read

• In schedule S1, transaction T1 first reads the data item X. In S2 also


transaction T1 first reads the data item X. Let's check for Y. In schedule S1,
transaction T1 first reads the data item Y. In S2 also the first read
operation on Y is performed by T1. We checked for both data items X & Y
and the initial read condition is satisfied in S1 & S2.

• Final Write

• In schedule S1, the final write operation on X is done by transaction T2. In


S2 also, transaction T2 performs the final write on X. Let's check for Y. In
schedule S1, the final write operation on Y is done by transaction T2. In
schedule S2, final write on Y is done by T2. We checked for both data
items X & Y and the final write condition is satisfied in S1 & S2.

• Update Read

• In S1, transaction T2 reads the value of X, written by T1. In S2, the same
transaction T2 reads the X after it is written by T1.

• In S1, transaction T2 reads the value of Y, written by T1. In S2, the same
transaction T2 reads the value of Y after it is updated by T1.

• The update read condition is also satisfied for both the schedules.

• Result: All three conditions for view equivalence are satisfied in this example, which means S1 and S2 are view equivalent. Also, since schedule S2 is the serial schedule of S1, we can say that schedule S1 is a view serializable schedule.

Recoverability: A transaction may not execute completely due to hardware failure, system crash, or software issues. In that case, we have to roll back the failed transaction. But some other transaction may also have used values produced by the failed transaction, so we have to roll back those transactions as well.

Recoverable Schedule: A schedule is recoverable if each transaction commits only after all the transactions whose changes it has read have committed. In such a schedule, a committed transaction never needs to be rolled back because of another transaction's failure.
Implementation of Isolation:

Isolation is one of the core ACID properties of a database transaction, ensuring that the
operations of one transaction remain hidden from other transactions until completion. It
means that no two transactions should interfere with each other and affect the other's
intermediate state.
Isolation Levels

Isolation levels define the degree to which a transaction must be isolated from the data modifications made by other transactions in the database system. There are four levels of transaction isolation defined by SQL –
• Read Uncommitted – Read Uncommitted is the lowest isolation level. In
this level, one transaction may read not yet committed changes made by
other transaction, thereby allowing dirty reads. In this level, transactions
are not isolated from each other.

• Read Committed – This isolation level guarantees that any data read is committed at the moment it is read. Thus it does not allow dirty reads. The transaction holds a read or write lock on the current row, and thus prevents other transactions from reading, updating, or deleting it.

• Repeatable Read – This is a more restrictive isolation level. The transaction holds read locks on all rows it references and write locks on all rows it inserts, updates, or deletes. Since other transactions cannot read, update, or delete these rows, non-repeatable reads are avoided.

• Serializable – This is the highest isolation level. A serializable execution is guaranteed to have the same effect as some serial execution: concurrently executing transactions appear to be executing one after another.
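The four levels can be summarized by which read anomalies each one permits. A small lookup table (standard SQL behaviour; a given DBMS may be stricter than the standard requires):

```python
# Per isolation level: (dirty read, non-repeatable read, phantom read)
# True means the anomaly can occur at that level.
ANOMALIES = {
    "read uncommitted": (True,  True,  True),
    "read committed":   (False, True,  True),
    "repeatable read":  (False, False, True),
    "serializable":     (False, False, False),
}

def allows_dirty_read(level):
    return ANOMALIES[level][0]

print(allows_dirty_read("read uncommitted"))  # True
print(allows_dirty_read("read committed"))    # False
```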

Testing for Serializability:

Serialization Graph is used to test the Serializability of a schedule.

Assume a schedule S. For S, we construct a graph known as a precedence graph. This graph is a pair G = (V, E), where V is a set of vertices and E is a set of edges. The set of vertices contains all the transactions participating in the schedule. The set of edges contains all edges Ti → Tj for which one of the following three conditions holds:

1. Create an edge Ti → Tj if Ti executes write(Q) before Tj executes read(Q).

2. Create an edge Ti → Tj if Ti executes read(Q) before Tj executes write(Q).

3. Create an edge Ti → Tj if Ti executes write(Q) before Tj executes write(Q).

If the precedence graph contains an edge Ti → Tj, then in any equivalent serial schedule all the instructions of Ti must execute before the first instruction of Tj.

If the precedence graph for schedule S contains a cycle, then S is non-serializable. If the precedence graph has no cycle, then S is serializable.

For example:

Explanation:

Read(A): In T1, no subsequent writes to A, so no new edges


Read(B): In T2, no subsequent writes to B, so no new edges

Read(C): In T3, no subsequent writes to C, so no new edges

Write(B): B is subsequently read by T3, so add edge T2 → T3

Write(C): C is subsequently read by T1, so add edge T3 → T1

Write(A): A is subsequently read by T2, so add edge T1 → T2

Write(A): In T2, no subsequent reads to A, so no new edges

Write(C): In T1, no subsequent reads to C, so no new edges

Write(B): In T3, no subsequent reads to B, so no new edges

Precedence graph for schedule S1:

The precedence graph for schedule S1 contains a cycle; that is why schedule S1 is non-serializable.

Example2:
Explanation:

Read(A): In T4, no subsequent writes to A, so no new edges

Read(C): In T4, no subsequent writes to C, so no new edges

Write(A): A is subsequently read by T5, so add edge T4 → T5

Read(B): In T5, no subsequent writes to B, so no new edges

Write(C): C is subsequently read by T6, so add edge T4 → T6

Write(B): B is subsequently read by T6, so add edge T5 → T6

Write(C): In T6, no subsequent reads to C, so no new edges

Write(A): In T5, no subsequent reads to A, so no new edges

Write(B): In T6, no subsequent reads to B, so no new edges

Precedence graph for schedule S2:


The precedence graph for schedule S2 contains no cycle; that is why schedule S2 is serializable.
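The cycle test in both examples can be automated. A sketch that builds the precedence graph from its edge list and checks for a cycle with depth-first search (the edge lists are the ones derived above for S1 and S2):

```python
def has_cycle(edges):
    """Detect a cycle in a directed graph given as a list of (src, dst) edges."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    state = {node: "unvisited" for node in graph}

    def dfs(node):
        state[node] = "in_progress"
        for nxt in graph[node]:
            if state[nxt] == "in_progress":
                return True                      # back edge found -> cycle
            if state[nxt] == "unvisited" and dfs(nxt):
                return True
        state[node] = "done"
        return False

    return any(state[n] == "unvisited" and dfs(n) for n in graph)

s1_edges = [("T2", "T3"), ("T3", "T1"), ("T1", "T2")]  # precedence graph of S1
s2_edges = [("T4", "T5"), ("T4", "T6"), ("T5", "T6")]  # precedence graph of S2
print(has_cycle(s1_edges))  # True  -> S1 is non-serializable
print(has_cycle(s2_edges))  # False -> S2 is serializable
```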

Concurrency Protocols: Concurrency control protocols in DBMS ensure that multiple transactions
execute concurrently without leading to data inconsistency. They maintain ACID (Atomicity,
Consistency, Isolation, Durability) properties by preventing issues like lost updates, dirty reads, and
uncommitted data dependencies.

Types of concurrency control protocols:

• Lock-Based Protocols

• Timestamp-Based Protocols

• Optimistic Concurrency Control

• Multi-Version Concurrency Control

• Graph based protocols

Lock-Based Protocols: Lock-based concurrency control is a method used to manage how multiple transactions access the same data. This protocol ensures data consistency and integrity when multiple users interact with the database simultaneously.

This method uses locks to manage access to data, ensuring transactions don’t clash and everything
runs smoothly when multiple transactions happen at the same time.

What is a Lock?

• A lock is a variable associated with a data item that indicates whether it is currently in use or
available for other operations. Locks are essential for managing access to data during
concurrent transactions. When one transaction is accessing or modifying a data item, a lock
ensures that other transactions cannot interfere with it, maintaining data integrity and
preventing conflicts. This process, known as locking, is a widely used method to ensure
smooth and consistent operation in database systems.

• Lock-Based Protocols in DBMS ensure that a transaction cannot read or write data until it
gets the necessary lock. Here’s how they work:

• These protocols prevent concurrency issues by allowing only one transaction to access a
specific data item at a time.

• Locks help multiple transactions work together smoothly by managing access to the
database items.

• Locking is a common method used to maintain the serializability of transactions.

• A transaction must acquire a read lock or write lock on a data item before performing any
read or write operations on it.

Types of Lock
• Shared Lock (S): Shared Lock is also known as Read-only lock. As the name suggests it can be
shared between transactions because while holding this lock the transaction does not have
the permission to update data on the data item. S-lock is requested using lock-S instruction.

• Exclusive Lock (X): Data item can be both read as well as written. This is Exclusive and
cannot be held simultaneously on the same data item. X-lock is requested using lock-X
instruction.

Rules of Locking

• The basic rules for Locking are given below :

• Read Lock (or) Shared Lock(S)

• If a Transaction has a Read lock on a data item, it can read the item but not update it.

• If a transaction has a Read lock on a data item, other transactions can obtain a Read Lock on the data item but no Write Lock.

• So, the Read Lock is also called a Shared Lock.

• Write Lock (or) Exclusive Lock (X)

• If a transaction has a write Lock on a data item, it can both read and update the data item.

• If a transaction has a write Lock on the data item, then other transactions cannot obtain
either a Read lock or write lock on the data item.

• So, the Write Lock is also known as Exclusive Lock.
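The rules above boil down to a small compatibility matrix: a requested lock can be granted only if it is compatible with every lock currently held on the item. A sketch:

```python
# COMPATIBLE[held][requested]: can the requested lock be granted
# while another transaction holds `held` on the same data item?
COMPATIBLE = {
    "S": {"S": True,  "X": False},   # a shared lock allows other shared locks only
    "X": {"S": False, "X": False},   # an exclusive lock allows nothing else
}

def can_grant(held_locks, requested):
    return all(COMPATIBLE[h][requested] for h in held_locks)

print(can_grant(["S", "S"], "S"))  # True: many readers may coexist
print(can_grant(["S"], "X"))       # False: a writer must wait for readers
print(can_grant([], "X"))          # True: no locks held, grant immediately
```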

Types of Lock Based Protocols:

1. Simplistic Lock Protocol: It is the simplest method for locking data during a transaction. Simple
lock-based protocols enable all transactions to obtain a lock on the data before inserting, deleting, or
updating it. It will unlock the data item once the transaction is completed.

Example:

Consider a database with a single data item X = 10.

• Transactions:

• T1: Wants to read and update X.

• T2: Wants to read X.

Steps:

• T1 requests an exclusive lock on X to update its value. The lock is granted.

– T1 reads X = 10 and updates it to X = 20.


• T2 requests a shared lock on X to read its value. Since T1 is holding an exclusive lock, T2 must
wait.

• T1 completes its operation and releases the lock.

• T2 now gets the shared lock and reads the updated value X = 20.

This example shows how simplistic lock protocols handle concurrency, but they do not prevent problems like deadlocks and they limit concurrency.

2. Pre-Claiming Lock Protocol: The Pre-Claiming Lock Protocol evaluates a transaction to identify all
the data items that require locks. Before the transaction begins, it requests the database
management system to grant locks on all necessary data elements. If all the requested locks are
successfully acquired, the transaction proceeds. Once the transaction is completed, all locks are
released. However, if any of the locks are unavailable, the transaction rolls back and waits until all
required locks are granted before restarting.

Example:

Consider two transactions T1 and T2 and two data items, X and Y:

• Transaction T1 declares that it needs:

– A write lock on X.

– A read lock on Y.

• Since both locks are available, the system grants them. T1 starts execution:

– It updates X.

– It reads the value of Y.

• While T1 is executing, Transaction T2 declares that it needs:

– A read lock on X.

• However, since T1 already holds a write lock on X, T2’s request is denied. T2 must wait until
T1 completes its operations and releases the locks.

• Once T1 finishes, it releases the locks on X and Y. The system now grants the read lock
on X to T2, allowing it to proceed.

This method is simple but may lead to inefficiency in systems with a high number of
transactions.
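The all-or-nothing behaviour of pre-claiming can be sketched as a two-pass check. The `try_preclaim` function and the lock-table layout are illustrative assumptions: either every requested lock is available and all are acquired, or none is taken and the transaction must wait and retry.

```python
def try_preclaim(lock_table, txn, requests):
    """requests: list of (item, mode). Grant all locks or none."""
    # First pass: verify that every requested lock is available.
    for item, mode in requests:
        held = lock_table.get(item)
        if held is None:
            continue
        held_mode, holders = held
        if not (held_mode == "S" and mode == "S"):
            return False          # one unavailable lock -> grant nothing
    # Second pass: everything is available, so acquire all locks.
    for item, mode in requests:
        if item in lock_table and lock_table[item][0] == "S" and mode == "S":
            lock_table[item][1].add(txn)
        else:
            lock_table[item] = (mode, {txn})
    return True

table = {}
assert try_preclaim(table, "T1", [("X", "X"), ("Y", "S")])  # T1 gets both
assert not try_preclaim(table, "T2", [("X", "S")])          # X is held exclusively
```

Because T2 gets nothing rather than a partial set of locks, this scheme cannot deadlock, at the cost of lower concurrency.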

3. Two-phase locking (2PL): A transaction is said to follow the Two-Phase Locking protocol if Locking
and Unlocking can be done in two phases :

• Growing Phase: New locks on data items may be acquired but none can be released.

• Shrinking Phase: Existing locks may be released but no new locks can be acquired.
• Does not prevent cascading rollbacks (if a transaction releases a lock early and another
transaction reads uncommitted data, a rollback of the first transaction may force others to
rollback).
• Prone to deadlocks because locks are released at different times.

Example:

T1              T2
X(A)
W(A)
Unlock(A)
                X(A)
                W(A)
X(B)
                Commit

In the above example, X(B) is not allowed: once T1 released its lock on A, it entered the shrinking
phase, and under the two-phase locking protocol no new locks can be acquired after that point.
Moreover, because T1 released A early, if T1 later rolls back, T2 may already have read uncommitted
data, forcing a cascading rollback.
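The phase rule above can be sketched as a small transaction object. The `TwoPhaseTxn` class is an illustrative assumption: the first unlock moves the transaction into its shrinking phase, after which any lock request is rejected, exactly the violation X(B) commits in the schedule above.

```python
class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.shrinking = False   # False = growing phase
        self.held = set()

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot lock {item} in shrinking phase")
        self.held.add(item)

    def unlock(self, item):
        self.held.discard(item)
        self.shrinking = True    # first release ends the growing phase

t1 = TwoPhaseTxn("T1")
t1.lock("A")            # X(A)
t1.unlock("A")          # Unlock(A): shrinking phase begins
try:
    t1.lock("B")        # X(B) after Unlock(A): forbidden under 2PL
    violated = False
except RuntimeError:
    violated = True
assert violated
```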

4. Strict Two-Phase Locking Protocol: Strict Two-Phase Locking requires that, in addition to the 2PL
rules, all Exclusive (X) locks held by a transaction not be released until after the transaction commits.

• It is a stronger version of 2PL.

• Prevents cascading rollbacks (ensures that transactions read only committed data).

• Still prone to deadlocks, but eliminates the problems caused by releasing locks early.


1. Deadlock

• In the given execution scenario, T1 holds an exclusive lock on B, while T2 holds a shared lock
on A. At Statement 7, T2 requests a lock on B, and at Statement 8, T1 requests a lock on A.
This situation creates a deadlock, as both transactions are waiting for resources held by the
other, preventing either from proceeding with their execution.

2. Starvation

• Starvation is also possible if concurrency control manager is badly designed. For example: A
transaction may be waiting for an X-lock on an item, while a sequence of other transactions
request and are granted an S-lock on the same item. This may be avoided if the concurrency
control manager is properly designed.

Conclusion

• In conclusion, lock-based concurrency control in a database management system (DBMS)


uses locks to control access, avoid conflicts, and preserve the integrity of the database
across multiple users. The protocol seeks to achieve a balance between concurrency and
integrity by carefully controlling the acquisition and release of locks by transactions.

Time-Stamp Based Protocols: Timestamp-based concurrency control is a method used in database


systems to ensure that transactions are executed safely and consistently without conflicts, even
when multiple transactions are being processed simultaneously. This approach relies on timestamps
to manage and coordinate the execution order of transactions. Refer to the timestamp of a
transaction T as TS(T).

What is Timestamp Ordering Protocol?

• The Timestamp Ordering Protocol is a method used in database systems to order


transactions based on their timestamps. A timestamp is a unique identifier assigned to each
transaction, typically determined using the system clock or a logical counter. Transactions
are executed in the ascending order of their timestamps, ensuring that older transactions
get higher priority.

• For example:

• If Transaction T1 enters the system first, it gets a timestamp TS(T1) = 007 (assumption).

• If Transaction T2 enters after T1, it gets a timestamp TS(T2) = 009 (assumption).

• This means T1 is “older” than T2 and T1 should execute before T2 to maintain consistency.

Key Features of Timestamp Ordering Protocol:

• Transaction Priority:

• Older transactions (those with smaller timestamps) are given higher priority.

• For example, if transaction T1 has a timestamp of 007 and transaction T2 has a
timestamp of 009, T1 will execute first as it entered the system earlier.

• Early Conflict Management:

• Unlike lock-based protocols, which manage conflicts during execution, timestamp-based


protocols start managing conflicts as soon as a transaction is created.

• Ensuring Serializability:

• The protocol ensures that the schedule of transactions is serializable. This means the
transactions can be executed in an order that is logically equivalent to their timestamp
order.
Basic Timestamp ordering protocol:

The basic timestamp ordering method makes sure that any conflicting read and
write operations are executed in timestamp order.

1. Check the following conditions whenever a transaction Ti issues a Read(X) operation:

o If W_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back.

o If W_TS(X) <= TS(Ti), then the operation is executed and R_TS(X) is set to
the larger of R_TS(X) and TS(Ti).
2. Check the following conditions whenever a transaction Ti issues
a Write(X) operation:

o If TS(Ti) < R_TS(X), then the operation is rejected and Ti is rolled back.

o If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back;
otherwise the operation is executed and W_TS(X) is set to TS(Ti).
Where,

TS(Ti) denotes the timestamp of the transaction Ti.

R_TS(X) denotes the Read time-stamp of data-item X.

W_TS(X) denotes the Write time-stamp of data-item X.

Advantages and Disadvantages of TO protocol:

o The TO protocol ensures conflict serializability, since every edge in the
precedence graph runs from an older transaction to a younger one, so no
cycle can form.
o The TO protocol ensures freedom from deadlock, which means no transaction
ever waits.
o But the schedule may not be recoverable and may not even be cascade-
free.
To illustrate this mechanism, consider two transactions T1 and T2. Each
transaction subtracts 200 from A and adds it to B. The timestamps of T1 and
T2 are 5:29 pm and 5:30 pm respectively; initially the system clock is set
to zero.

Once restarted, transaction T1 acquires a new timestamp (say 5:31 pm). It
can be verified that the schedule produces the correct result even though T1
aborted, but this is not always the case. The method guarantees that the
transactions are conflict serializable and that the results are equivalent
to a serial schedule in which the transactions execute in timestamp order,
as if all transactions had been executed one after the other without any
interference.

Basic Timestamp Ordering


The Basic Timestamp Ordering (TO) Protocol is a method in database systems that uses
timestamps to manage the order of transactions. Each transaction is assigned a unique
timestamp when it enters the system ensuring that all operations follow a specific order
making the schedule conflict-serializable and deadlock-free.
 Suppose, if an old transaction Ti has timestamp TS(Ti), a new transaction Tj is
assigned timestamp TS(Tj) such that TS(Ti) < TS(Tj).
 The protocol manages concurrent execution such that the timestamps determine the
serializability order.
 The timestamp ordering protocol ensures that any conflicting read and write operations
are executed in timestamp order.
 Whenever some Transaction T tries to issue a R_item(X) or a W_item(X), the Basic TO
algorithm compares the timestamp of T with R_TS(X) & W_TS(X) to ensure that the
Timestamp order is not violated.
This describes the Basic TO protocol in the following two cases:
Whenever a Transaction T issues a W_item(X) operation, check the following conditions:
 If R_TS(X) > TS(T) or W_TS(X) > TS(T), then abort and rollback T and reject the
operation; else,
 Execute the W_item(X) operation of T and set W_TS(X) to the larger of TS(T) and the
current W_TS(X).
Whenever a Transaction T issues a R_item(X) operation, check the following conditions:
 If W_TS(X) > TS(T), then abort and rollback T and reject the operation; else,
 If W_TS(X) <= TS(T), then execute the R_item(X) operation of T and set R_TS(X) to
the larger of TS(T) and the current R_TS(X).
Whenever the Basic TO algorithm detects two conflicting operations that occur in an
incorrect order, it rejects the latter of the two operations by aborting the Transaction that
issued it.
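The two checks above can be sketched directly in code. The `DataItem` class and the return convention (False meaning "reject and roll the transaction back") are illustrative assumptions, not a standard API.

```python
class DataItem:
    def __init__(self):
        self.r_ts = 0  # largest timestamp of any transaction that read the item
        self.w_ts = 0  # largest timestamp of any transaction that wrote the item

def read(item, ts):
    if item.w_ts > ts:                    # a younger transaction already wrote X
        return False                      # reject: abort and roll back the reader
    item.r_ts = max(item.r_ts, ts)
    return True

def write(item, ts):
    if item.r_ts > ts or item.w_ts > ts:  # a younger txn already read or wrote X
        return False                      # reject: abort and roll back the writer
    item.w_ts = max(item.w_ts, ts)
    return True

x = DataItem()
assert write(x, ts=10)     # T with TS=10 writes X
assert not read(x, ts=5)   # older T (TS=5) reads after the write: rejected
assert read(x, ts=12)      # younger T (TS=12) reads without conflict
```

Note that a rejected transaction restarts with a fresh, larger timestamp, which is why older transactions can starve.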
Advantages of Basic TO Protocol
 Conflict Serializable: Ensures all conflicting operations follow the timestamp order.
 Deadlock-Free: Transactions do not wait for resources, preventing deadlocks.
 Strict Ordering: Operations are executed in a predefined, conflict-free order based on
timestamps.
Drawbacks of Basic Timestamp Ordering (TO) Protocol
 Cascading Rollbacks : If a transaction is aborted, all dependent transactions must
also be aborted, leading to inefficiency.
 Starvation of Newer Transactions : Older transactions are prioritized, which can
delay or starve newer transactions.
 High Overhead: Maintaining and updating timestamps for every data item adds
significant system overhead.
 Inefficient for High Concurrency: The strict ordering can reduce throughput in
systems with many concurrent transactions.

Strict Timestamp Ordering


The Strict Timestamp Ordering Protocol is an enhanced version of the Basic Timestamp
Ordering Protocol. It ensures a stricter control over the execution of transactions to avoid
cascading rollbacks and maintain a more consistent schedule.
Key Features
 Strict Execution Order: Transactions must execute in the exact order of their
timestamps. Operations are delayed if executing them would violate the timestamp
order, ensuring a strict schedule.
 No Cascading Rollbacks: To avoid cascading aborts, a transaction must delay its
operations until all conflicting operations of older transactions are either committed or
aborted.
 Consistency and Serializability: The protocol ensures conflict-serializable schedules
by following strict ordering rules based on transaction timestamps.
For Read Operations (R_item(X)):
 A transaction T can read a data item X only if: W_TS(X), the timestamp of the last
transaction that wrote to X, is less than or equal to TS(T), and the transaction
that last wrote to X has committed.
 If these conditions are not met, T’s read operation is delayed until they are satisfied.
For Write Operations (W_item(X)):
 A transaction T can write to a data item X only if: R_TS(X), the timestamp of the last
transaction that read X, and W_TS(X), the timestamp of the last transaction that wrote
to X, are both less than or equal to TS(T) and all transactions that previously read or
wrote X have committed.
 If these conditions are not met, T’s write operation is delayed until all conflicting
transactions are resolved.

Validation Based Protocol is also called the Optimistic Concurrency Control Technique.
This protocol is used in a DBMS (Database Management System) to control concurrency
among transactions. It is called optimistic because of the assumption it makes, i.e.
that very little interference occurs, so there is no need for checking while a
transaction is executed.
In this technique, no checking is done while the transaction is being executed. Until
the end of the transaction is reached, its updates are not applied directly to the
database; all updates are applied to local copies of data items kept for the
transaction. At the end of transaction execution, a validation phase checks whether
any of the transaction's updates violate serializability. If there is no violation of
serializability, the transaction is committed and the database is updated; otherwise,
the transaction is aborted and then restarted.
Optimistic Concurrency Control is a three-phase protocol. The three phases for validation
based protocol:

1. Read Phase:
Values of committed data items from the database can be read by a transaction.
Updates are only applied to local data versions.

2. Validation Phase:
Checking is performed to make sure that there is no violation of serializability when the
transaction updates are applied to the database.

3. Write Phase:
On the success of the validation phase, the transaction updates are applied to the
database; otherwise, the updates are discarded and the transaction is aborted and
restarted.

The idea behind optimistic concurrency is to do all the checks at once; hence
transaction execution proceeds with a minimum of overhead until the validation phase
is reached. If there is not much interference among transactions, most of them will
validate successfully; otherwise, their results will be discarded and they will be
restarted later. Frequent restarts are unfavourable for the optimistic technique,
since its assumption of little interference is then not satisfied.
Validation based protocol is useful for rare conflicts. Since only local copies of data are
included in rollbacks, cascading rollbacks are avoided. This method is not favourable for
longer transactions because they are more likely to have conflicts and might be repeatedly
rolled back due to conflicts with short transactions.
In order to perform the validation test, each transaction should go through the
phases described above. We must also know the following three timestamps assigned
to transaction Ti, to check its validity:
1. Start(Ti): the time when Ti started its execution.
2. Validation(Ti): the time when Ti finished its read phase and began its validation
phase.
3. Finish(Ti): the time when Ti ended all its write operations in the database during
the write phase.
Two more terms that we need to know are:
1. Write_set: the set of data items that Ti writes.
2. Read_set: the set of data items that Ti reads.
In the validation phase for transaction Ti, the protocol checks that Ti does not
conflict with any other transaction currently in its validation phase or already
committed. For Ti to pass validation, one of the following conditions must hold for
every other transaction Tj:
1. Finish(Tj) < Start(Ti): Tj completes its write phase before Ti starts its
execution (read phase), so serializability is trivially maintained.
2. Ti begins its write phase after Tj completes its write phase, and the read_set of
Ti is disjoint from the write_set of Tj.
3. Tj completes its read phase before Ti completes its read phase, and both the
read_set and write_set of Ti are disjoint from the write_set of Tj.
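The first two validation conditions can be sketched as a pure function over the transaction timestamps and sets. The `validates` function name, its parameters, and the approximation of "Ti begins its write phase" by Validation(Ti) are illustrative assumptions; condition 3 is omitted for brevity.

```python
def validates(start_i, validation_i, finish_j, write_set_j, read_set_i):
    """Check whether Ti (start_i, validation_i) validates against Tj."""
    # Condition 1: Tj finished its write phase before Ti even started.
    if finish_j < start_i:
        return True
    # Condition 2: Tj finished writing before Ti begins its own write phase
    # (approximated here by Ti's validation time), and Tj wrote nothing Ti read.
    if finish_j < validation_i and not (write_set_j & read_set_i):
        return True
    return False

# Tj committed at time 5; Ti ran from 3 and validates at 8.
# Tj wrote {"y"} and Ti read {"x"}: disjoint, so condition 2 holds.
assert validates(start_i=3, validation_i=8,
                 finish_j=5, write_set_j={"y"}, read_set_i={"x"})
# If Ti had read "y", validation fails and Ti is restarted.
assert not validates(start_i=3, validation_i=8,
                     finish_j=5, write_set_j={"y"}, read_set_i={"y"})
```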
Ex: Here two transactions Ti and Tj are given; since TS(Tj) < TS(Ti), the validation
phase succeeds in Schedule-A. It is noteworthy that the final write operations to the
database are performed only after the validation of both Ti and Tj. Ti therefore reads
the old values x = 12 and y = 15 in its print(x+y) operation, because the final writes
have not yet taken place.
Schedule-A is a validated schedule.

Advantages:
1. Avoids cascading rollbacks: This validation-based scheme avoids cascading
rollbacks, since the final write operations to the database are performed only after
the transaction passes the validation phase. If the transaction fails validation, no
update is applied to the database, so no dirty read can occur and cascading rollbacks
are impossible.
2. Avoids deadlock: Since a strict timestamp-based technique maintains a specific
order of transactions, deadlock is not possible in this scheme.
Disadvantages:
1. Starvation: Long-running transactions may starve if a sequence of conflicting
short transactions causes them to be restarted repeatedly. To avoid starvation, the
conflicting transactions must be temporarily blocked for some time, to let the
long-running transaction finish.

Deadlocks:

A deadlock is a condition wherein two or more tasks are waiting for each
other in order to finish, but none of the tasks is willing to give up the
resources the other tasks need. In this situation no task ever finishes,
and all remain in a waiting state forever.

Coffman conditions
Coffman stated four conditions for deadlock occurrence. A deadlock may
occur if all of the following conditions hold true.

 Mutual exclusion condition: There must be at least one resource that


cannot be used by more than one process at a time.
 Hold and wait condition: A process that is holding a resource can
request for additional resources that are being held by other processes in
the system.
 No preemption condition: A resource cannot be forcibly taken from a
process. Only the process can release a resource that is being held by it.
 Circular wait condition: A condition where the first process is waiting for a
resource held by a second process, the second process is waiting for a
third, and so on, with the last process waiting for the first, thus
forming a circular chain of waiting processes.

Deadlock Handling
Ignore the deadlock (Ostrich algorithm)
Did that make you laugh? You may be wondering how ignoring a deadlock can
count as deadlock handling. But the Windows system you use on your PC takes
this very approach, and that is one reason it sometimes hangs and you have
to reboot it to get it working again. Not only Windows but UNIX also uses
this approach.
The question is why? Why, instead of dealing with a deadlock, do they
ignore it, and why is this called the Ostrich algorithm?

Well! Let me answer the second question first. This is known as the Ostrich
algorithm because in this approach we ignore the deadlock and pretend it
would never occur, just like the proverbial ostrich behavior of "sticking
one's head in the sand and pretending there is no problem."

Let’s discuss why we ignore it: when deadlocks are believed to be very rare
and the cost of handling them is high, ignoring them is a better solution
than handling them. For example, in the operating system case: if the time
required to handle a deadlock is greater than the time required to reboot
Windows, then rebooting is the preferred choice, given that deadlocks are
very rare in Windows.

Deadlock detection
The resource scheduler keeps track of the resources allocated to and
requested by processes. Thus, if there is a deadlock, it is known to the
resource scheduler. This is how a deadlock is detected.

Once a deadlock is detected, it can be corrected by the following methods:

 Terminating processes involved in the deadlock: Terminating all the
processes involved in the deadlock, or terminating them one by one until
the deadlock is resolved, are both possible solutions, but neither is good.
Terminating all processes has a high cost, and the partial work done by the
processes is lost. Terminating them one by one takes a lot of time, because
each time a process is terminated the system must check whether the
deadlock has been resolved. The best approach is therefore to consider
process age and priority when choosing which processes to terminate.
 Resource Preemption: Another approach is to preempt resources and
allocate them to other processes until the deadlock is resolved.
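Detection is commonly implemented with a wait-for graph: an edge Ti → Tj means Ti is waiting for a resource held by Tj, and a cycle means deadlock. The following DFS cycle check is a minimal sketch; the dict-of-sets graph representation is an assumption for illustration.

```python
def has_deadlock(wait_for):
    """wait_for: dict mapping each transaction to the set it waits on."""
    nodes = set(wait_for)
    for targets in wait_for.values():
        nodes |= targets
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited / on current path / done
    color = {t: WHITE for t in nodes}

    def dfs(t):
        color[t] = GRAY
        for nxt in wait_for.get(t, set()):
            if color[nxt] == GRAY:     # back edge: cycle found
                return True
            if color[nxt] == WHITE and dfs(nxt):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in nodes)

# T1 waits for T2 and T2 waits for T1: the classic two-transaction deadlock.
assert has_deadlock({"T1": {"T2"}, "T2": {"T1"}})
assert not has_deadlock({"T1": {"T2"}, "T2": set()})
```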

Deadlock prevention
We have learnt that if all the four Coffman conditions hold true then a
deadlock occurs so preventing one or more of them could prevent the
deadlock.
 Removing mutual exclusion: All resources must be sharable, meaning that
more than one process can use a resource at a time. This approach is
practically impossible.
 Removing the hold and wait condition: This can be removed if the process
acquires all the resources it needs before starting. Another way is to
enforce a rule that a process may request resources only when it holds
none.
 Preemption of resources: Preempting resources from a process can result
in rollback, and thus this needs to be avoided in order to maintain the
consistency and stability of the system.
 Avoiding the circular wait condition: This can be avoided if the
resources are maintained in a hierarchy and each process may hold
resources only in increasing order of precedence, which prevents a
circular wait. Another way is to enforce a one-resource-per-process rule:
a process may request a new resource only after releasing the resource it
currently holds. This also avoids the circular wait.

Deadlock Avoidance
Deadlock can be avoided if resources are allocated in such a way that it
avoids the deadlock occurrence. There are two algorithms for deadlock
avoidance.

 Wait-Die
 Wound-Wait

Both of these algorithms take process age (timestamp) into consideration
while determining the best possible way to allocate resources and avoid
deadlock:

 Wait-Die: if the requesting process is older than the holder, it waits;
if it is younger, it dies (is rolled back and restarted later with its
original timestamp).
 Wound-Wait: if the requesting process is older than the holder, it
wounds (preempts and rolls back) the younger holder; if it is younger, it
waits.
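The two schemes differ only in which side yields, so they can be sketched as two one-line decision functions (the function names and string return values are illustrative; smaller timestamp means older):

```python
def wait_die(requester_ts, holder_ts):
    """Older requester waits; younger requester dies (is rolled back)."""
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    """Older requester wounds (preempts) the holder; younger requester waits."""
    return "wound" if requester_ts < holder_ts else "wait"

# T1 (ts=5, older) requests a lock held by T2 (ts=9, younger):
assert wait_die(5, 9) == "wait"     # older process waits
assert wound_wait(5, 9) == "wound"  # older process preempts the younger holder
# T2 (ts=9, younger) requests a lock held by T1 (ts=5, older):
assert wait_die(9, 5) == "die"      # younger process is rolled back
assert wound_wait(9, 5) == "wait"   # younger process waits
```

In both schemes every wait edge points in one direction of age, so no cycle of waiting processes can form.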

One famous deadlock avoidance algorithm is the Banker’s algorithm.


Failure Classification:

To find that where the problem has occurred, we generalize a failure into the following categories:

1. Transaction failure

2. System crash

3. Disk failure

1. Transaction failure

A transaction failure occurs when a transaction fails to execute or reaches a point
from which it cannot proceed any further. If a transaction or process fails partway,
this is called a transaction failure.

Reasons for a transaction failure could be –

1. Logical errors: If a transaction cannot complete due to some code error or an internal error
condition, then the logical error occurs.

2. System errors: These occur when the DBMS itself terminates an active transaction
because the database system is unable to continue executing it. For example, the
system aborts an active transaction in case of deadlock or resource unavailability.

2. System Crash

System failure can occur due to power failure or other hardware or software failure. Example:
Operating system error.

Fail-stop assumption: In the system crash, non-volatile storage is assumed not to be corrupted.
3. Disk Failure

A disk failure occurs when hard-disk drives or storage drives fail. This was a common
problem in the early days of technology evolution.

Disk failure can be caused by the formation of bad sectors, a disk head crash,
unreachability of the disk, or any other failure that destroys all or part of disk
storage.

Recovery and Atomicity:

 When a system crashes, it may have several transactions being executed and
various files opened for them to modify the data items.
 But according to ACID properties of DBMS, atomicity of transactions as a
whole must be maintained, that is, either all the operations are executed or
none.
 Database recovery means restoring the data when it gets deleted, corrupted or
damaged accidentally.
 Atomicity must be preserved: whether or not a transaction completed, either its
effects are reflected permanently in the database or they do not affect the
database at all.
When a DBMS recovers from a crash, it should maintain the following −

 It should check the states of all the transactions, which were being executed.
 A transaction may be in the middle of some operation; the DBMS must ensure
the atomicity of the transaction in this case.
 It should check whether the transaction can be completed now or it needs to
be rolled back.
 No transactions would be allowed to leave the DBMS in an inconsistent state.

Log-Based Recovery
Log-based recovery is a widely used approach in database management
systems to recover from system failures and maintain atomicity and
durability of transactions. The fundamental idea behind log-based recovery
is to keep a log of all changes made to the database, so that after a failure,
the system can use the log to restore the database to a consistent state.

How Log-Based Recovery Works

1. Transaction Logging:
For every transaction that modifies the database, an entry is made in the log. This
entry typically includes:

 Transaction ID: A unique identifier for the transaction.


 Data item identifier: Identifier for the specific item being modified.
 OLD value: The value of the data item before the modification.
 NEW value: The value of the data item after the modification.
We represent an update log record as <Ti, Xj, V1, V2>, indicating that transaction
Ti has performed a write on data item Xj; Xj had value V1 before the write and value
V2 after the write. Other special log records exist to record significant events
during transaction processing, such as the start of a transaction and the commit or
abort of a transaction. Among the types of log records are:

 <Ti start>: Transaction Ti has started.
 <Ti commit>: Transaction Ti has committed.
 <Ti abort>: Transaction Ti has aborted.

2. Writing to the Log


Before any change is written to the actual database (on disk), the corresponding log
entry is stored. This is called the Write-Ahead Logging (WAL) principle. By
ensuring that the log is written first, the system can later recover and apply or undo
any changes.

3. Checkpointing
Periodically, the DBMS might decide to take a checkpoint. A checkpoint is a point of
synchronization between the database and its log. At the time of a checkpoint:

 All the changes in main memory (buffer) up to that point are written to disk.
 A special entry is made in the log indicating a checkpoint. This helps in
reducing the amount of log that needs to be scanned during recovery.

4. Recovery Process
 Redo: If a transaction is identified (from the log) as having committed but its
changes have not been reflected in the database (due to a crash before the
changes could be written to disk), then the changes are reapplied using the
'After Image' from the log.
 Undo: If a transaction is identified as not having committed at the time of the
crash, any changes it made are reversed using the 'Before Image' in the log to
ensure atomicity.
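The redo and undo passes above can be sketched over a list of log records. The tuple-based log format is a simplification of the <Ti, Xj, V1, V2> records described earlier, and `recover` is an illustrative name, not a real API.

```python
def recover(log):
    """Replay a log of ('start'|'commit', txn) and
    ('update', txn, item, old_value, new_value) records."""
    db, committed = {}, set()
    for rec in log:                       # analysis pass: who committed?
        if rec[0] == "commit":
            committed.add(rec[1])
    for rec in log:                       # redo pass: reapply committed writes
        if rec[0] == "update" and rec[1] in committed:
            db[rec[2]] = rec[4]           # NEW value (after image)
    for rec in reversed(log):             # undo pass: reverse uncommitted writes
        if rec[0] == "update" and rec[1] not in committed:
            db[rec[2]] = rec[3]           # OLD value (before image)
    return db

log = [("start", "T1"), ("update", "T1", "A", 100, 80), ("commit", "T1"),
       ("start", "T2"), ("update", "T2", "B", 50, 70)]  # T2 crashed mid-flight
assert recover(log) == {"A": 80, "B": 50}  # T1 is redone, T2 is undone
```

Undo scans the log in reverse so that the earliest before-image of each item wins, restoring the pre-transaction state.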

5. Commit/Rollback
Once a transaction is fully complete, a commit record is written to the log. If
a transaction is aborted, a rollback record is written, and using the log, the
system undoes any changes made by this transaction.

Benefits of Log-Based Recovery


 Atomicity: Guarantees that even if a system fails in the middle of a transaction, the
transaction can be rolled back using the log.
 Durability: Ensures that once a transaction is committed, its effects are permanent
and can be reconstructed even after a system failure.
 Efficiency: Since logging typically involves sequential writes, it is generally faster
than random access writes to a database.

Shadow paging - Its Working principle


Shadow Paging is an alternative disk recovery technique to the more common
logging mechanisms. It's particularly suitable for database systems. The fundamental
concept behind shadow paging is to maintain two page tables during the lifetime of a
transaction: the current page table and the shadow page table.
Here's a step-by-step breakdown of the working principle of shadow paging:

Initialization
When the transaction begins, the database system creates a copy of the current
page table. This copy is called the shadow page table.
The actual data pages on disk are not duplicated; only the page table entries are.
This means both the current and shadow page tables point to the same data pages
initially.

During Transaction Execution


When a transaction modifies a page for the first time, a copy of the page is made.
The current page table is updated to point to this new page.
Importantly, the shadow page table remains unaltered and continues pointing to the
original, unmodified page.
Any subsequent changes by the same transaction are made to the copied page, and
the current page table continues to point to this copied page.

On Transaction Commit
Once the transaction reaches a commit point, the shadow page table is discarded,
and the current page table becomes the new "truth" for the database state.
The old data pages that were modified during the transaction (and which the shadow
page table pointed to) can be reclaimed.

Recovery after a Crash


If a crash occurs before the transaction commits, recovery is straightforward. Since
the original data pages (those referenced by the shadow page table) were never
modified, they still represent a consistent database state.
The system simply discards the changes made during the transaction (i.e., discards
the current page table) and reverts to the shadow page table.
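The copy-on-write step at the heart of shadow paging can be sketched with two small dicts standing in for the page tables. All names and the page-id scheme are illustrative assumptions.

```python
pages = {"p0": "old A", "p1": "old B"}   # data pages on "disk"
shadow = {0: "p0", 1: "p1"}              # shadow page table (never modified)
current = dict(shadow)                   # current page table (starts as a copy)

def txn_write(logical_page, value):
    """Copy-on-write: the first modification of a page copies it."""
    if current[logical_page] == shadow[logical_page]:
        new_id = current[logical_page] + "'"       # allocate a fresh page id
        pages[new_id] = pages[current[logical_page]]
        current[logical_page] = new_id             # only the current table moves
    pages[current[logical_page]] = value

txn_write(0, "new A")
assert pages[shadow[0]] == "old A"    # shadow table still sees the original page
assert pages[current[0]] == "new A"   # current table sees the modified copy

# On commit, the current table becomes the truth; on a crash before commit,
# the system simply keeps using the shadow table.
committed = dict(current)
assert pages[committed[0]] == "new A"
```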

Storage
There are three types of storage:
 volatile storage
 nonvolatile storage
 stable storage

In stable storage, several copies of information are stored


on various disk blocks.
Data Access :
The transactions input information from the disk to the
main memory and then output the information back onto
the disk.

Physical Block: The blocks residing on the disk are


known as ‘Physical Block’.
Buffer Block: The blocks residing temporarily in the main
memory
are known as ‘Buffer Block’.

Disk Buffer: The area of memory where a block resides


temporarily is called the ‘Disk Buffer’.

Block Transfer:
a) Input (B): Transfers the physical block B to main
memory.
b) Output (B): Transfers the buffer block B to the disk
and replaces the appropriate physical block there.
c) Read (X): Assigns the value of data item (X) to the
local variable x. It executes the following operations :

i) If block Bx on which X resides is not in main memory, it


issues input(Bx).
ii) It assigns to x the value of X from the buffer block.
d) write (X): Assigns the value of local variable x to data
item X in the buffer block. It executes the following
operations :
i) If block Bx on which X resides is not in main memory, it
issues input(Bx).
ii) It assigns the value of x to X in buffer Bx.
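The four operations above can be sketched with dicts standing in for the disk, the buffer, and the transaction's local variables. The mapping of data item X to its block B_x is an assumption for illustration.

```python
disk = {"B_x": {"X": 10}}   # physical block B_x containing data item X
buffer = {}                  # buffer blocks in main memory
local = {}                   # transaction-local variables

def input_block(b):                  # input(B): disk block -> main memory
    buffer[b] = dict(disk[b])

def output_block(b):                 # output(B): buffer block -> disk
    disk[b] = dict(buffer[b])

def read(item, block):               # read(X): buffer block -> local variable
    if block not in buffer:          # issue input(Bx) if the block is absent
        input_block(block)
    local[item] = buffer[block][item]

def write(item, block, value):       # write(X): local value -> buffer block
    if block not in buffer:
        input_block(block)
    buffer[block][item] = value

read("X", "B_x")                     # local x = 10
write("X", "B_x", local["X"] + 5)    # X = 15 in the buffer block
assert disk["B_x"]["X"] == 10        # disk is unchanged until output(B)
output_block("B_x")
assert disk["B_x"]["X"] == 15
```

Note that write(X) changes only the buffer block; the physical block is updated only when output(B) is issued, which is exactly why logging must happen before the output.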

Recovery Algorithm
ARIES recovers from a system crash in three phases.

a) Analysis Pass: This pass determines which


transactions to undo, which pages were dirty at the time
of the crash, and the LSN from which the redo pass should
start.
b) Redo Pass: This pass starts from a position
determined during analysis, and performs a redo,
repeating history, to bring the database to a state it was
in before the crash.

c) Undo Pass: This pass rolls back all transactions that


were incomplete at the time of the crash.

Indexing
Indexing is used to optimise the performance of a
database system.
It reduces the required number of disk accesses to a
minimum when a query is executed.

There are 4 types of Indexing methods:


1. Ordered Indices
2. Primary Index
3. Clustering Index
4. Secondary Index

An index for a file in a database system works in the same


way as the index in a textbook.

The index record contains two parts :

 Search-key value
 Pointer

Here, the search-key value is the input and the pointer is


the output.

For a given search-key value the database system looks


for the record, to which the pointer indicates.
Indices are categorized in two ways:

1. Ordered Indices: Based on a sorted ordering of the values.

2. Hash Indices: Based on a uniform distribution of values across a

range of buckets.

The bucket to which a value is assigned is determined by


a function, called a ‘hash function’.
A file may have several indices, based on the different
search keys.

An index entry consists of a search-key value and the


pointers to one or more records with that value as their
search-key value.
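An ordered index can be sketched as (search-key, pointer) entries kept in sorted order, looked up by binary search. The record ids standing in for disk pointers are illustrative assumptions.

```python
import bisect

keys = [10, 20, 30, 40]                            # sorted search-key values
pointers = ["rec_a", "rec_b", "rec_c", "rec_d"]    # one pointer per key

def lookup(search_key):
    """Return the pointer for search_key, or None if the key is absent."""
    i = bisect.bisect_left(keys, search_key)       # binary search on the keys
    if i < len(keys) and keys[i] == search_key:
        return pointers[i]                          # follow pointer to the record
    return None

assert lookup(30) == "rec_c"
assert lookup(25) is None
```

A hash index would instead apply a hash function to the search key to pick a bucket, trading range-query support for constant-time point lookups.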

B+ Tree Index Files


A B+ Tree Index is a multilevel index.

A B+ Tree is a rooted tree satisfying the following


properties :

1. All paths from the root to a leaf are of the same length.

2. If a node is neither the root nor a leaf, it has between ⌈n/2⌉
and n children.
3. A leaf node holds between ⌈(n−1)/2⌉ and n−1 search-key values.
The structure of any node of this tree is :

Example- 1: Construct a B+ Tree for the following search


key values,
{10, 20, 30, 40 }
where n = 3 ( n is number of pointers)
Example- 2: Construct a B+ Tree for the following search
key values, Where n = 4.
{10, 30, 40, 50, 60, 70, 90 }

Now, let’s insert and delete some elements in this tree.
Insert 25, 75
When we insert an element, it is placed in the leaf node whose key range
covers it (the leaf to the right of the largest key value lower than the
inserted element); if the leaf overflows, it is split and the middle key is
copied up to the parent.

Delete 70

When an element is deleted, it is removed from its leaf node; if the deleted
key also appears in an internal node, it is replaced there by its successor
(the next key value to the right).
