Understanding Database Transactions and ACID
Transaction: Any logical unit of work performed on the data of a database is
known as a transaction. Logical work can be inserting a new value into the
current database, deleting existing values, or updating the current values in
the database.
o Initialization of transaction
o Inserting the ATM card into the machine
o Choosing the language
o Choosing the account type
o Entering the cash amount
o Entering the pin
o Collecting the cash
o Aborting the transaction
What is a Transaction State?
A transaction is a set of operations or tasks performed to complete a logical
process, which may or may not change the data in a database. To handle
different situations, like system failures, a transaction is divided into different
states.
A transaction state refers to the current phase or condition of a transaction
during its execution in a database. It represents the progress of the transaction
and determines whether it will successfully complete (commit) or fail (abort).
A transaction involves two main operations:
1. Read Operation: Reads data from the database, stores it temporarily in
memory (buffer), and uses it as needed.
2. Write Operation: Updates the database with the changed data using the
buffer.
From the start of executing instructions to the end, these operations are
treated as a single transaction. This ensures the database remains consistent
and reliable throughout the process.
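The read/write model above can be sketched as follows; the class and method names are illustrative, not a real DBMS API:

```python
# A minimal sketch of the read/write model: a transaction reads items from
# the "database" into a buffer in main memory, works on the buffered copies,
# and only commit pushes changes back to the database.

database = {"X": 100, "Y": 200}

class Transaction:
    def __init__(self, db):
        self.db = db
        self.buffer = {}              # temporary copies in main memory

    def read_item(self, name):
        # Read operation: copy the value from the database into the buffer.
        self.buffer[name] = self.db[name]
        return self.buffer[name]

    def write_item(self, name, value):
        # The change stays in the buffer until commit.
        self.buffer[name] = value

    def commit(self):
        # Flush buffered changes back to the database.
        self.db.update(self.buffer)

t = Transaction(database)
x = t.read_item("X")
t.write_item("X", x - 10)     # change exists only in the buffer so far
assert database["X"] == 100   # database unchanged before commit
t.commit()
assert database["X"] == 90    # change persisted after commit
```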
Different Types of Transaction States in DBMS
These are different types of Transaction States :
1. Active State – This is the first stage of any transaction, when its
instructions are being executed. Operations such as insertion, deletion, or
updation are performed during this state. While in this state, the data
records under manipulation are not yet saved to the database; they remain in
a buffer in main memory.
2. Partially Committed –
The transaction has finished its final operation, but the changes are not yet
saved to the database.
After completing all read and write operations, the modifications are initially
stored in main memory or a local buffer. If the changes are made permanent
in the database, then the state changes to the "committed state"; in case of
failure, it goes to the "failed state".
3. Failed State –If any of the transaction-related operations cause an error
during the active or partially committed state, further execution of the
transaction is stopped and it is brought into a failed state. Here, the database
recovery system makes sure that the database is in a consistent state.
4. Aborted State – If a transaction reaches the failed state due to a failed
check, the database recovery system will attempt to restore it to a consistent
state. If recovery is not possible, the transaction is either rolled back or
cancelled to ensure the database remains consistent.
In the aborted state, the DBMS recovery system performs one of two actions:
Kill the transaction: The system terminates the transaction to prevent it
from affecting other operations.
Restart the transaction: After making necessary adjustments, the system
reverts the transaction to an active state and attempts to continue its
execution.
5. Committed – This state is achieved when all the transaction-related
operations have been executed successfully along with the commit operation,
i.e. data is saved into the database after the required manipulations. This
marks the successful completion of a transaction.
6. Terminated State – If there is no rollback, or the transaction comes from
the "committed state", then the system is consistent and ready for a new
transaction, and the old transaction is terminated.
Atomicity:
By this, we mean that either the entire transaction takes place at once or
doesn’t happen at all. There is no midway i.e. transactions do not occur
partially. Each transaction is considered as one unit and either runs to
completion or is not executed at all. It involves the following two operations.
— Abort : If a transaction aborts, changes made to the database are not
visible.
— Commit : If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
Consider the following transaction T consisting of T1 and T2: transfer of 100
from account X to account Y.
Suppose T has been executed till Read(Y) and then another transaction T''
starts. As a result, interleaving of operations takes place, due to which T''
reads the correct value of X but the incorrect (not yet updated) value of Y,
and the sum computed by
T'': (X+Y = 50,000 + 500 = 50,500)
is thus not consistent with the sum at the end of the transaction:
T: (X+Y = 50,000 + 450 = 50,450).
This results in database inconsistency, due to a loss of 50 units. Hence,
transactions must take place in isolation, and changes should be visible only
after they have been written to main memory.
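The all-or-nothing transfer can be sketched with Python's sqlite3 module; the table name, account names, and balances here are illustrative, not taken from the example above. If a crash occurs after the debit but before the credit, rolling back restores the original balances:

```python
import sqlite3

# Autocommit mode, so BEGIN/ROLLBACK below are controlled explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('X', 500), ('Y', 200)")

try:
    conn.execute("BEGIN")
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'X'")
    raise RuntimeError("simulated crash before crediting Y")
    # The matching credit would run here:
    # conn.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'Y'")
except RuntimeError:
    conn.execute("ROLLBACK")   # atomicity: the debit of X is undone

total = conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]
assert total == 700            # no units lost: all-or-nothing held
```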
Durability:
This property ensures that once the transaction has completed execution, the
updates and modifications to the database are stored in and written to disk and
they persist even if a system failure occurs. These updates now become
permanent and are stored in non-volatile memory. The effects of the
transaction, thus, are never lost.
Advantages of ACID Properties in DBMS
1. Data Consistency: ACID properties ensure that the data remains
consistent and accurate after any transaction execution.
2. Data Integrity: ACID properties maintain the integrity of the data by
ensuring that any changes to the database are permanent and cannot be
lost.
3. Concurrency Control: ACID properties help to manage multiple
transactions occurring concurrently by preventing interference between
them.
4. Recovery: ACID properties ensure that in case of any failure or crash, the
system can recover the data up to the point of failure or crash.
Disadvantages of ACID Properties in DBMS
1. Performance: The ACID properties can cause a performance overhead in
the system, as they require additional processing to ensure data consistency
and integrity.
2. Scalability: The ACID properties may cause scalability issues in large
distributed systems where multiple transactions occur concurrently.
3. Complexity: Implementing the ACID properties can increase the complexity
of the system and require significant expertise and resources.
Overall, the advantages of ACID properties in DBMS outweigh the
disadvantages. They provide a reliable and consistent approach to data
management, ensuring data integrity, accuracy, and reliability. However, in
some cases, the overhead of implementing ACID properties can cause
performance and scalability issues. Therefore, it’s important to balance the
benefits of ACID properties against the specific needs and requirements of
the system.
Concurrent Executions in DBMS
Concurrent execution refers to the simultaneous execution of more than one transaction.
This is a common scenario in multi-user database environments where many users or
applications might be accessing or modifying the database at the same time. Concurrent
execution is crucial for achieving high throughput and efficient resource utilization. However,
it introduces the potential for conflicts and data inconsistencies.
In a DBMS, concurrent execution can introduce a number of issues that need to be
resolved in order to guarantee accurate and dependable database operation. Some of
the issues with concurrent execution in a DBMS include the following –
• 1. Lost Update: A lost update happens when two or more transactions try to update
the same data item at the same time, and the outcome depends on the sequence in
which the transactions are executed. If one transaction overwrites changes made
by another before they are committed, those changes are lost. Lost updates can
lead to inconsistent data and incorrect results.
• 2. Dirty Read: A dirty read occurs when a transaction reads data that has been
updated by another transaction but not yet committed. If the modifying
transaction rolls back, the data read by the first transaction becomes invalid.
Dirty reads can lead to data discrepancies and incorrect results.
• 3. Non-Repeatable Read: When a transaction reads the same data item twice and
the data is updated by another transaction between the two reads, this is known as a
non-repeatable read. This might result in discrepancies in outcomes and data.
• 4. Phantom Read: A phantom read occurs when a transaction reads a group of rows
that meet a given criterion, and a subsequent transaction adds or deletes rows
that meet the same criterion. When the initial transaction reads the same set
again, it finds rows that were not present the first time (or finds rows
missing). This may result in data discrepancies.
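The lost-update anomaly can be reproduced deterministically with a plain dictionary standing in for the database (the item name and values are illustrative):

```python
# T1 and T2 both read X = 100, each adds its own increment to its private
# copy, and whichever writes last silently overwrites the other's update.
db = {"X": 100}

t1_copy = db["X"]       # T1 reads X
t2_copy = db["X"]       # T2 reads X, before T1 writes back

db["X"] = t1_copy + 50  # T1 writes X = 150
db["X"] = t2_copy + 30  # T2 writes X = 130; T1's update is lost

assert db["X"] == 130   # not 180, as any serial order would give
```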
What is a Schedule?
• Serial Schedule: When one transaction completely executes before another
transaction starts, the schedule is called a serial schedule. A serial schedule
is always consistent. For example, if a schedule S has a debit transaction T1
and a credit transaction T2, the possible serial schedules are T1 followed by
T2 (T1->T2) or T2 followed by T1 (T2->T1). A serial schedule has low throughput
and less resource utilization.
• When multiple transactions are running concurrently then there is a possibility that
the database may be left in an inconsistent state.
• Serializability is a concept that helps us check whether a given concurrent
schedule is equivalent to some serial schedule.
• A serializable schedule is one that always leaves the database in a consistent
state.
Types of Serializability:
CONFLICT SERIALIZABILITY:
• Conflicting operations: Two operations are said to be in conflict if they satisfy
all of the following three conditions:
• 1. The operations belong to different transactions.
• 2. Both operations work on the same data item.
• 3. At least one of the operations is a write operation.
• Example 1: Operation W(X) of transaction T1 and operation R(X) of transaction T2
are conflicting operations, because they satisfy all three conditions mentioned
above: they belong to different transactions, they work on the same data item X,
and one of the operations is a write operation.
• Example 2: Similarly Operations W(X) of T1 and W(X) of T2 are conflicting
operations.
• Example 3: Operations W(X) of T1 and W(Y) of T2 are non-conflicting operations,
because the two writes are not working on the same data item, so they don't
satisfy the second condition.
• Example 4: Similarly, R(X) of T1 and R(X) of T2 are non-conflicting operations,
because neither of them is a write operation.
• Example 5: Similarly, W(X) of T1 and R(X) of T1 are non-conflicting operations,
because both operations belong to the same transaction T1.
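The three conflict conditions can be written directly as a small predicate; the tuple encoding (transaction, action, item) is an illustrative choice, not standard notation:

```python
# An operation is modelled as a tuple (transaction, action, item),
# where action is "R" (read) or "W" (write).
def conflicts(op1, op2):
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return (t1 != t2                  # 1. different transactions
            and x1 == x2              # 2. same data item
            and "W" in (a1, a2))      # 3. at least one write

# The five examples above:
assert conflicts(("T1", "W", "X"), ("T2", "R", "X"))        # Example 1
assert conflicts(("T1", "W", "X"), ("T2", "W", "X"))        # Example 2
assert not conflicts(("T1", "W", "X"), ("T2", "W", "Y"))    # Example 3
assert not conflicts(("T1", "R", "X"), ("T2", "R", "X"))    # Example 4
assert not conflicts(("T1", "W", "X"), ("T1", "R", "X"))    # Example 5
```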
Two schedules are said to be conflict equivalent if one schedule can be converted
into the other by swapping non-conflicting operations.
If a schedule is conflict equivalent to a serial schedule, then it is called a
conflict serializable schedule.
VIEW SERIALIZABILITY: View serializability is the process of determining whether a
given schedule is view serializable. Two schedules S1 and S2 are view equivalent if
the following conditions hold:
• 1. Initial Read: The initial read of each data item must match in
both schedules. For example, if transaction T1 reads a data item X
before transaction T2 in schedule S1, then in schedule S2, T1 should read X
before T2.
• 2. Update Read: If transaction Ti reads a value written by transaction Tj
in schedule S1, then in schedule S2, Ti should read the value written by Tj.
• 3. Final Write: The final write operation on each data item must match in
both schedules. For example, if a data item X is last written by
transaction T1 in schedule S1, then in S2 the last write operation on X
should also be performed by T1.
If a schedule is view equivalent to a serial schedule, then the given schedule is
said to be view serializable. Let's take an example.
• Let's check the three conditions of view serializability:
• Initial Read
• Final Write
• Update Read
• In S1, transaction T2 reads the value of X, written by T1. In S2, the same
transaction T2 reads the X after it is written by T1.
• In S1, transaction T2 reads the value of Y, written by T1. In S2, the same
transaction T2 reads the value of Y after it is updated by T1.
• The update read condition is also satisfied for both the schedules.
• Result: All three conditions that check whether two schedules are view
equivalent are satisfied in this example, which means S1 and S2 are view
equivalent. Also, since schedule S2 is the serial schedule of S1, we can
say that schedule S1 is a view serializable schedule.
Recoverable Schedule: A schedule is recoverable if each transaction commits only
after all the transactions from which it has read data have committed. This ensures
that if a transaction is rolled back, no committed transaction has already used its
uncommitted data.
Implementation of Isolation:
Isolation is one of the core ACID properties of a database transaction, ensuring that the
operations of one transaction remain hidden from other transactions until completion. It
means that no two transactions should interfere with each other and affect the other's
intermediate state.
Isolation Levels
Isolation levels define the degree to which a transaction must be isolated from the
data modifications made by other transactions in the database system. There are
four levels of transaction isolation defined by SQL:
• Read Uncommitted – Read Uncommitted is the lowest isolation level. At
this level, one transaction may read not-yet-committed changes made by
other transactions, thereby allowing dirty reads. At this level,
transactions are not isolated from each other.
• Read Committed – This isolation level guarantees that any data read is
committed at the moment it is read. Thus it does not allow dirty reads.
The transaction holds a read or write lock on the current row, and thus
prevents other transactions from reading, updating or deleting it.
• Repeatable Read – This level additionally guarantees that if a transaction
reads the same row twice, it sees the same data both times; read locks are
held on all rows the transaction references until it completes.
• Serializable – This is the highest isolation level. It guarantees that
concurrent transactions produce the same result as some serial execution
of those transactions.
Assume a schedule S. For S, we construct a graph known as a precedence graph. This
graph is a pair G = (V, E), where V is a set of vertices and E is a set of edges. The
set of vertices contains all the transactions participating in the schedule. The set
of edges contains all edges Ti -> Tj for which one of the following three conditions
holds:
1. Ti executes write(Q) before Tj executes read(Q).
2. Ti executes read(Q) before Tj executes write(Q).
3. Ti executes write(Q) before Tj executes write(Q).
If the precedence graph contains a single edge Ti -> Tj, then the schedule is
equivalent to a serial schedule in which all the instructions of Ti are executed
before the first instruction of Tj.
If a precedence graph for schedule S contains a cycle, then S is non-serializable. If the precedence
graph has no cycle, then S is known as serializable.
For example, consider a schedule S1 whose precedence graph contains a cycle;
schedule S1 is therefore non-serializable.
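The precedence-graph test can be sketched as follows: build an edge Ti -> Tj for every pair of conflicting operations in which Ti's operation appears first, then check the graph for a cycle. The schedules below are illustrative, not the S1 from the figure:

```python
# Operations are (transaction, action, item) tuples, in schedule order.
def precedence_edges(schedule):
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges.add((ti, tj))     # Ti's conflicting op comes first
    return edges

def has_cycle(edges):
    graph, nodes = {}, set()
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        nodes |= {u, v}
    WHITE, GREY, BLACK = 0, 1, 2        # unvisited / on stack / done
    color = {n: WHITE for n in nodes}
    def dfs(n):
        color[n] = GREY
        for m in graph.get(n, []):
            if color[m] == GREY or (color[m] == WHITE and dfs(m)):
                return True             # back edge found: cycle
        color[n] = BLACK
        return False
    return any(color[n] == WHITE and dfs(n) for n in nodes)

# Non-serializable: edges T1 -> T2 and T2 -> T1 form a cycle.
s1 = [("T1", "R", "X"), ("T2", "R", "X"), ("T2", "W", "X"), ("T1", "W", "X")]
assert has_cycle(precedence_edges(s1))

# Serializable: all of T1's operations precede all of T2's.
s2 = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X"), ("T2", "W", "X")]
assert not has_cycle(precedence_edges(s2))
```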
Concurrency Protocols: Concurrency control protocols in DBMS ensure that multiple transactions
execute concurrently without leading to data inconsistency. They maintain ACID (Atomicity,
Consistency, Isolation, Durability) properties by preventing issues like lost updates, dirty reads, and
uncommitted data dependencies.
• Lock-Based Protocols
• Timestamp-Based Protocols
Lock-Based Protocols: Lock-based concurrency control is a method used to manage how
multiple transactions access the same data. This protocol ensures data consistency
and integrity when multiple users interact with the database simultaneously.
This method uses locks to manage access to data, ensuring transactions don’t clash and everything
runs smoothly when multiple transactions happen at the same time.
What is a Lock?
• A lock is a variable associated with a data item that indicates whether it is currently in use or
available for other operations. Locks are essential for managing access to data during
concurrent transactions. When one transaction is accessing or modifying a data item, a lock
ensures that other transactions cannot interfere with it, maintaining data integrity and
preventing conflicts. This process, known as locking, is a widely used method to ensure
smooth and consistent operation in database systems.
• Lock-Based Protocols in DBMS ensure that a transaction cannot read or write data until it
gets the necessary lock. Here’s how they work:
• These protocols prevent concurrency issues by allowing only one transaction to access a
specific data item at a time.
• Locks help multiple transactions work together smoothly by managing access to the
database items.
• A transaction must acquire a read lock or write lock on a data item before performing any
read or write operations on it.
Types of Lock
• Shared Lock (S): Shared Lock is also known as Read-only lock. As the name suggests it can be
shared between transactions because while holding this lock the transaction does not have
the permission to update data on the data item. S-lock is requested using lock-S instruction.
• Exclusive Lock (X): With an exclusive lock, the data item can be both read and
written. This lock is exclusive and cannot be held simultaneously with any other
lock on the same data item. An X-lock is requested using the lock-X instruction.
Rules of Locking
• If a Transaction has a Read lock on a data item, it can read the item but not update it.
• If a transaction has a Read lock on a data item, other transactions can also
obtain a Read lock on that item, but no Write locks.
• If a transaction has a write Lock on a data item, it can both read and update the data item.
• If a transaction has a write Lock on the data item, then other transactions cannot obtain
either a Read lock or write lock on the data item.
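The locking rules above can be sketched as a small lock table. This is a simplification under stated assumptions: request() simply returns False where a real DBMS would queue the transaction, and lock upgrades are not handled:

```python
# A minimal lock table: each item maps to its current mode and holders.
class LockTable:
    def __init__(self):
        self.locks = {}   # item -> {"mode": "S" or "X", "holders": set}

    def request(self, txn, item, mode):
        entry = self.locks.get(item)
        if entry is None:
            self.locks[item] = {"mode": mode, "holders": {txn}}
            return True
        if entry["mode"] == "S" and mode == "S":
            entry["holders"].add(txn)   # shared locks are compatible
            return True
        return False                    # any X involvement conflicts

    def release(self, txn, item):
        entry = self.locks.get(item)
        if entry and txn in entry["holders"]:
            entry["holders"].discard(txn)
            if not entry["holders"]:
                del self.locks[item]    # item becomes free

lt = LockTable()
assert lt.request("T1", "A", "S")        # read lock granted
assert lt.request("T2", "A", "S")        # other readers allowed
assert not lt.request("T3", "A", "X")    # writer must wait
lt.release("T1", "A"); lt.release("T2", "A")
assert lt.request("T3", "A", "X")        # granted once readers finish
assert not lt.request("T1", "A", "S")    # X blocks readers too
```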
1. Simplistic Lock Protocol: It is the simplest method for locking data during a transaction. Simple
lock-based protocols enable all transactions to obtain a lock on the data before inserting, deleting, or
updating it. It will unlock the data item once the transaction is completed.
Example:
• Transactions: T1 updates X, and T2 reads X.
Steps:
• T1 acquires an exclusive lock on X, updates X to 20, and releases the lock.
• T2 now gets a shared lock and reads the updated value X = 20.
This example shows how simplistic lock protocols handle concurrency but do not
prevent problems like deadlocks or limited concurrency.
2. Pre-Claiming Lock Protocol: The Pre-Claiming Lock Protocol evaluates a transaction to identify all
the data items that require locks. Before the transaction begins, it requests the database
management system to grant locks on all necessary data elements. If all the requested locks are
successfully acquired, the transaction proceeds. Once the transaction is completed, all locks are
released. However, if any of the locks are unavailable, the transaction rolls back and waits until all
required locks are granted before restarting.
Example:
• Before it begins, transaction T1 requests:
– A write lock on X.
– A read lock on Y.
• Since both locks are available, the system grants them. T1 starts execution:
– It updates X.
• Meanwhile, transaction T2 requests:
– A read lock on X.
• However, since T1 already holds a write lock on X, T2’s request is denied. T2 must wait until
T1 completes its operations and releases the locks.
• Once T1 finishes, it releases the locks on X and Y. The system now grants the read lock
on X to T2, allowing it to proceed.
This method is simple but may lead to inefficiency in systems with a high number of
transactions.
3. Two-phase locking (2PL): A transaction is said to follow the Two-Phase Locking protocol if Locking
and Unlocking can be done in two phases :
• Growing Phase: New locks on data items may be acquired but none can be released.
• Shrinking Phase: Existing locks may be released but no new locks can be acquired.
• Does not prevent cascading rollbacks (if a transaction releases a lock early and another
transaction reads uncommitted data, a rollback of the first transaction may force others to
rollback).
• Prone to deadlocks because locks are released at different times.
Example:
T1              T2
X(A)
W(A)
Unlock(A)
                X(A)
                W(A)
X(B)
                Commit
In the above example, T1's request X(B) is not allowed: in the two-phase locking
protocol, once a transaction releases any lock it enters its shrinking phase,
meaning no new locks can be acquired. Moreover, because T1 released A early, if T1
later rolls back, T2 may have read incorrect data, causing a cascading rollback.
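The two-phase rule itself can be sketched as follows; inter-transaction lock conflicts are omitted so the growing/shrinking logic stays visible, and the class name is illustrative:

```python
# Each transaction tracks whether it has entered its shrinking phase.
class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.shrinking = False
        self.held = set()

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: 2PL violation, cannot "
                               f"acquire {item} after a release")
        self.held.add(item)

    def unlock(self, item):
        self.held.discard(item)
        self.shrinking = True   # growing phase ends at the first release

t1 = TwoPhaseTxn("T1")
t1.lock("A")                    # X(A)
t1.unlock("A")                  # Unlock(A): shrinking phase begins
try:
    t1.lock("B")                # X(B): not allowed under 2PL
    violated = False
except RuntimeError:
    violated = True
assert violated
```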
4. Strict Two-Phase Locking Protocol: Strict Two-Phase Locking requires that, in
addition to 2PL, all Exclusive (X) locks held by the transaction not be released
until after the transaction commits.
• Prevents cascading rollbacks (ensures that transactions read only committed data).
• In the given execution scenario, T1 holds an exclusive lock on B, while T2 holds a shared lock
on A. At Statement 7, T2 requests a lock on B, and at Statement 8, T1 requests a lock on A.
This situation creates a deadlock, as both transactions are waiting for resources held by the
other, preventing either from proceeding with their execution.
Starvation
• Starvation is also possible if concurrency control manager is badly designed. For example: A
transaction may be waiting for an X-lock on an item, while a sequence of other transactions
request and are granted an S-lock on the same item. This may be avoided if the concurrency
control manager is properly designed.
Timestamp-Based Protocols
• Timestamp-based protocols assign each transaction a unique timestamp when it
enters the system, and order conflicting operations by those timestamps. For
example:
• If transaction T1 enters the system first, it gets a timestamp TS(T1) = 007
(assumption). A transaction T2 arriving later gets a larger timestamp, say
TS(T2) = 009.
• This means T1 is "older" than T2, and T1 should execute before T2 to maintain
consistency.
• Transaction Priority:
• Older transactions (those with smaller timestamps) are given higher priority.
• For example, if transaction T1 has a timestamp of 007 and transaction T2 has a
timestamp of 009, T1 will execute first as it entered the system earlier.
• Ensuring Serializability:
• The protocol ensures that the schedule of transactions is serializable. This means the
transactions can be executed in an order that is logically equivalent to their timestamp
order.
Basic Timestamp Ordering Protocol:
The basic timestamp ordering method makes sure that any conflicting read and
write operations are executed in timestamp order. If an operation arrives "too
late" (for example, a transaction tries to write an item that a younger
transaction has already read), the transaction is aborted and restarted.
Once restarted, the transaction T1 will acquire a new timestamp (say 5:31 pm).
It can be verified that the schedule produces the correct result even if T1 has
aborted, but this is not always the case. This method guarantees that the
transactions are conflict serializable and the results are equivalent to a serial
schedule in which the transactions are executed in timestamp order. The basic
timestamp ordering method behaves as if all transactions were executed one after
the other without any interference.
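The basic timestamp-ordering checks can be sketched as follows; the timestamps and item name are illustrative, and the Thomas write rule (which ignores certain obsolete writes instead of aborting) is deliberately omitted:

```python
# Each item tracks the largest timestamps that have read and written it;
# any operation that arrives "too late" aborts its transaction.
class TOItem:
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def to_read(item, ts):
    if ts < item.write_ts:            # a younger txn already wrote it
        return "abort"
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def to_write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        return "abort"                # a younger txn already read/wrote it
    item.write_ts = ts
    return "ok"

x = TOItem()
assert to_read(x, 5) == "ok"          # T with ts=5 reads X
assert to_write(x, 7) == "ok"         # T with ts=7 writes X
assert to_read(x, 5) == "abort"       # ts=5 now reads too late
assert to_write(x, 6) == "abort"      # and ts=6 writes too late
```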
Validation-Based Protocol is also called the Optimistic Concurrency Control
technique. This protocol is used in a DBMS to avoid concurrency problems between
transactions. It is called optimistic because of the assumption it makes: that
very little interference occurs, and therefore there is no need for checking
while the transaction executes.
In this technique, no checking is done while the transaction is being executed.
Until the end of the transaction is reached, updates are not applied directly to
the database; all updates are applied to local copies of the data items kept for
the transaction. At the end of transaction execution, a validation phase checks
whether any of the transaction's updates violate serializability. If there is no
violation of serializability, the transaction is committed and the database is
updated; otherwise, the updates are discarded and the transaction is restarted.
Optimistic Concurrency Control is a three-phase protocol. The three phases for validation
based protocol:
1. Read Phase:
Values of committed data items from the database can be read by a transaction.
Updates are only applied to local data versions.
2. Validation Phase:
Checking is performed to make sure that there is no violation of serializability when the
transaction updates are applied to the database.
3. Write Phase:
If the validation phase succeeds, the transaction's updates are applied to the
database; otherwise, the updates are discarded and the transaction is restarted.
The idea behind optimistic concurrency control is to do all the checks at once;
hence transaction execution proceeds with minimal overhead until the validation
phase is reached. If there is not much interference among transactions, most of
them will pass validation; otherwise, their results are discarded and they are
restarted later. Heavy interference is unfavourable for this optimistic
technique, since its core assumption of little interference is then not satisfied.
The validation-based protocol is useful when conflicts are rare. Since only local
copies of data are involved in rollbacks, cascading rollbacks are avoided. This
method is not favourable for longer transactions, because they are more likely to
conflict and might be repeatedly rolled back due to conflicts with short
transactions.
In order to perform the validation test, each transaction must go through the
phases described above. We must also know the following three timestamps assigned
to a transaction Ti, to check its validity:
1. Start(Ti): The time when Ti started its execution.
2. Validation(Ti): The time when Ti finished its read phase and began its
validation phase.
3. Finish(Ti): The time when Ti finished all its write operations in the database
during the write phase.
Two more terms that we need to know are:
1. Write_set: the set of all write operations (data items written) that Ti performs.
2. Read_set: the set of all read operations (data items read) that Ti performs.
In the validation phase for transaction Ti, the protocol checks that Ti does not
overlap or interfere with any other transaction currently in its validation phase
or already committed. For Ti to pass validation, one of the following conditions
must hold for every other transaction Tj:
1. Finish(Tj) < Start(Ti): Tj completes its execution, including its write phase,
before Ti starts its execution (read phase); serializability is trivially
maintained.
2. Ti begins its write phase after Tj completes its write phase, and the read_set
of Ti is disjoint from the write_set of Tj.
3. Tj completes its read phase before Ti completes its read phase, and both the
read_set and the write_set of Ti are disjoint from the write_set of Tj.
Ex: Two transactions Ti and Tj are given. Since TS(Tj) < TS(Ti), the validation
phase succeeds in Schedule-A. It is noteworthy that the final write operations to
the database are performed only after both Ti and Tj have been validated; until
then, Ti reads the old values of x (12) and y (15) for its print(x+y) operation.
Schedule-A
Schedule-A is a validated schedule
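The three validation conditions can be sketched as a function over transaction records; the dictionary layout (start/validation/finish times plus read and write sets) is an illustrative assumption:

```python
# validate(ti, tj) returns True if Ti passes validation against an
# earlier-validated transaction Tj under one of the three conditions.
def validate(ti, tj):
    # Condition 1: Tj finished before Ti even started.
    if tj["finish"] < ti["start"]:
        return True
    # Condition 2: Tj finished writing before Ti starts its write phase,
    # and Tj's writes don't touch Ti's reads.
    if (tj["finish"] < ti["validation"]
            and not (ti["read_set"] & tj["write_set"])):
        return True
    # Condition 3: Tj finished its read phase before Ti did, and Tj's
    # writes are disjoint from both Ti's read set and Ti's write set.
    if (tj["validation"] < ti["validation"]
            and not (ti["read_set"] & tj["write_set"])
            and not (ti["write_set"] & tj["write_set"])):
        return True
    return False

t_old = {"start": 0, "validation": 2, "finish": 3,
         "read_set": {"x"}, "write_set": {"x"}}
t_new = {"start": 4, "validation": 6, "finish": 7,
         "read_set": {"x", "y"}, "write_set": {"y"}}
assert validate(t_new, t_old)           # condition 1: serial, always valid

t_overlap = {"start": 1, "validation": 5, "finish": 7,
             "read_set": {"x"}, "write_set": {"x"}}
assert not validate(t_overlap, t_old)   # overlapping, read/write sets clash
```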
Advantages:
1. Avoids cascading rollbacks: This validation-based scheme avoids cascading
rollbacks, since the final write operations to the database are performed only
after the transaction passes the validation phase. If the transaction fails
validation, no update operation is performed on the database, so no dirty read
can happen and no cascading rollback can occur.
2. Avoids deadlock: Since a strict timestamp-based technique is used to maintain a
specific order of transactions, deadlock isn't possible in this scheme.
Disadvantages:
1. Starvation: There is a possibility of starvation for long transactions: a
sequence of conflicting short transactions can cause a long transaction to be
restarted repeatedly. To avoid starvation, conflicting transactions must be
temporarily blocked for some time, to let the long transaction finish.
Deadlocks:
A deadlock is a condition wherein two or more tasks are waiting for each
other in order to finish, but none of the tasks is willing to give up the
resources that the other tasks need. In this situation, no task ever finishes
and all remain in a waiting state forever.
Coffman conditions
Coffman stated four conditions for a deadlock occurrence. A deadlock may
occur if all of the following conditions hold true:
1. Mutual exclusion – at least one resource is non-sharable and can be held
by only one process at a time.
2. Hold and wait – a process holds at least one resource while waiting for
resources held by other processes.
3. No preemption – a resource can be released only voluntarily by the process
holding it.
4. Circular wait – there is a set of waiting processes such that each one is
waiting for a resource held by the next process in the chain.
Deadlock Handling
Ignore the deadlock (Ostrich algorithm)
Did that make you laugh? You may be wondering how ignoring a deadlock can
count as deadlock handling. But the Windows you are using on your PC uses
this approach of deadlock handling, and that is the reason it sometimes hangs
and you have to reboot it to get it working. Not only Windows but UNIX also
uses this approach.
The question is: why do they ignore deadlocks instead of dealing with them,
and why is this called the Ostrich algorithm?
Well! Let me answer the second question first. This is known as the Ostrich
algorithm because in this approach we ignore the deadlock and pretend it will
never occur, just like the ostrich's behaviour of sticking its head in the
sand and pretending there is no problem.
Let’s discuss why we ignore it: When it is believed that deadlocks are very
rare and cost of deadlock handling is higher, in that case ignoring is better
solution than handling it. For example: Let’s take the operating system
example – If the time requires handling the deadlock is higher than the time
requires rebooting the windows then rebooting would be a preferred choice
considering that deadlocks are very rare in windows.
Deadlock detection
The resource scheduler keeps track of the resources allocated to and
requested by processes. Thus, if there is a deadlock, it is known to the
resource scheduler; this is how a deadlock is detected.
Deadlock prevention
We have learnt that if all four Coffman conditions hold true then a deadlock
can occur, so preventing one or more of them can prevent the deadlock.
Removing mutual exclusion: All resources must be sharable, meaning that
more than one process can hold a resource at a time. This approach is
practically impossible for many resources.
Removing hold and wait: This condition can be removed if the process
acquires all the resources it needs before starting. Another way to remove
it is to enforce a rule that a process may request resources only when it
holds none.
Preemption of resources: Resources can be preempted from a process, but
preemption can result in rollback, so it must be used carefully in order to
maintain the consistency and stability of the system.
Avoiding circular wait: This can be avoided if the resources are maintained
in a hierarchy and each process can request resources only in increasing
order of precedence; this avoids circular wait. Another way is to enforce a
one-resource-per-process rule: a process may request a new resource only
after it releases the resource it currently holds.
Deadlock Avoidance
Deadlock can be avoided if resources are allocated in such a way that it
avoids the deadlock occurrence. There are two algorithms for deadlock
avoidance.
Wait/Die
Wound/Wait
Here is how resources are allocated under each algorithm. Both algorithms take
transaction age (timestamp) into consideration while determining the best
possible way of resource allocation for deadlock avoidance:
• Wait/Die: an older transaction requesting an item held by a younger one
waits; a younger transaction requesting an item held by an older one dies
(aborts and restarts).
• Wound/Wait: an older transaction requesting an item held by a younger one
wounds (preempts) the younger; a younger transaction requesting an item held
by an older one waits.
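Both schemes can be sketched as small decision functions over transaction timestamps (smaller timestamp = older transaction); the return labels are illustrative:

```python
# Given the requester's and holder's timestamps, decide the outcome.
def wait_die(requester_ts, holder_ts):
    # Older requester may wait; younger requester dies (aborts, restarts).
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    # Older requester wounds (preempts) the holder; younger requester waits.
    return "wound" if requester_ts < holder_ts else "wait"

assert wait_die(5, 9) == "wait"     # old txn wants item held by young txn
assert wait_die(9, 5) == "die"      # young txn wants item held by old txn
assert wound_wait(5, 9) == "wound"
assert wound_wait(9, 5) == "wait"
```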
To find where the problem has occurred, we generalize failures into the following categories:
1. Transaction failure
2. System crash
3. Disk failure
1. Transaction failure
A transaction failure occurs when a transaction fails to execute or reaches a point
from which it can't go any further. If a few transactions or processes are affected,
this is called a transaction failure.
1. Logical errors: If a transaction cannot complete due to some code error or an internal error
condition, then the logical error occurs.
2. System errors: These occur when the DBMS itself terminates an active transaction
because the database system is not able to execute it. For example, the system
aborts an active transaction in case of deadlock or resource unavailability.
2. System Crash
System failure can occur due to power failure or other hardware or software failure. Example:
Operating system error.
Fail-stop assumption: In the system crash, non-volatile storage is assumed not to be corrupted.
3. Disk Failure
Disk failure occurs when hard-disk drives or storage drives fail; this was a common
problem in the early days of technology evolution. It occurs due to the formation of
bad sectors, a disk head crash, unreachability of the disk, or any other failure
that destroys all or part of the disk storage.
When a system crashes, it may have several transactions in execution and
various files opened for them to modify data items.
But according to the ACID properties of a DBMS, the atomicity of each
transaction as a whole must be maintained, that is, either all of its
operations are executed or none of them are.
Database recovery means restoring the data when it gets deleted, corrupted, or
damaged accidentally.
Atomicity is a must: whether or not a transaction completes, its effects should
either be reflected in the database permanently or not affect the database at all.
When a DBMS recovers from a crash, it should maintain the following −
It should check the states of all the transactions, which were being executed.
A transaction may be in the middle of some operation; the DBMS must ensure
the atomicity of the transaction in this case.
It should check whether the transaction can be completed now or it needs to
be rolled back.
No transactions would be allowed to leave the DBMS in an inconsistent state.
Log-Based Recovery
Log-based recovery is a widely used approach in database management
systems to recover from system failures and maintain atomicity and
durability of transactions. The fundamental idea behind log-based recovery
is to keep a log of all changes made to the database, so that after a failure,
the system can use the log to restore the database to a consistent state.
1. Transaction Logging:
For every transaction that modifies the database, an entry is made in the log. This
entry typically includes:
the transaction identifier;
the identifier of the data item being modified;
the old value of the item (the 'Before Image');
the new value of the item (the 'After Image').
3. Checkpointing
Periodically, the DBMS might decide to take a checkpoint. A checkpoint is a point of
synchronization between the database and its log. At the time of a checkpoint:
All the changes in main memory (buffer) up to that point are written to disk.
A special entry is made in the log indicating a checkpoint. This helps in
reducing the amount of log that needs to be scanned during recovery.
4. Recovery Process
Redo: If a transaction is identified (from the log) as having committed but its
changes have not been reflected in the database (due to a crash before the
changes could be written to disk), then the changes are reapplied using the
'After Image' from the log.
Undo: If a transaction is identified as not having committed at the time of the
crash, any changes it made are reversed using the 'Before Image' in the log to
ensure atomicity.
5. Commit/Rollback
Once a transaction is fully complete, a commit record is written to the log. If
a transaction is aborted, a rollback record is written, and using the log, the
system undoes any changes made by this transaction.
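The redo/undo pass over the log can be sketched in a few lines. This is a minimal illustration, not a real DBMS API: it assumes a log of `("write", txn, item, before, after)` records plus `("commit", txn)` markers, with the database modeled as a dict.

```python
# Hedged sketch of log-based recovery: redo committed work using the
# 'After Image', undo uncommitted work using the 'Before Image'.

def recover(log, db):
    committed = {t for op, t, *_ in log if op == "commit"}
    # Redo: reapply the changes of committed transactions in log order.
    for op, t, *rest in log:
        if op == "write" and t in committed:
            item, before, after = rest
            db[item] = after
    # Undo: reverse the changes of uncommitted transactions, newest first.
    for op, t, *rest in reversed(log):
        if op == "write" and t not in committed:
            item, before, after = rest
            db[item] = before
    return db

log = [
    ("write", "T1", "A", 100, 150),
    ("write", "T2", "B", 20, 30),
    ("commit", "T1"),
    # crash here: T2 never committed
]
print(recover(log, {"A": 100, "B": 30}))  # {'A': 150, 'B': 20}
```

A checkpoint would simply let recovery start the scan from the checkpoint record instead of the beginning of the log.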
Shadow Paging
Initialization
When the transaction begins, the database system creates a copy of the current
page table. This copy is called the shadow page table.
The actual data pages on disk are not duplicated; only the page table entries are.
This means both the current and shadow page tables point to the same data pages
initially.
On Transaction Commit
Once the transaction reaches a commit point, the shadow page table is discarded,
and the current page table becomes the new "truth" for the database state.
The old data pages that were modified during the transaction (and which the shadow
page table pointed to) can be reclaimed.
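The key idea, that only the page table is copied while data pages are shared until modified, can be sketched as follows. This is an illustrative toy model with dicts standing in for page tables and disk pages; the names are not from any real system.

```python
# Hedged sketch of shadow paging with copy-on-write data pages.

pages = {0: "alpha", 1: "beta"}   # physical data pages on "disk"
shadow_table = {0: 0, 1: 1}       # logical page number -> physical page

def begin_transaction(shadow):
    """Copy only the page table entries, not the data pages themselves."""
    return dict(shadow)

def write(current, pages, logical, value):
    """Copy-on-write: put the new value in a fresh physical page and
    repoint only the current page table at it."""
    new_phys = max(pages) + 1
    pages[new_phys] = value
    current[logical] = new_phys

current_table = begin_transaction(shadow_table)
write(current_table, pages, 1, "gamma")

# Before commit, the shadow table still sees the old, consistent state:
print(pages[shadow_table[1]])   # "beta"
# On commit, the current table becomes the new "truth":
shadow_table = current_table
print(pages[shadow_table[1]])   # "gamma"
```

If the transaction aborts or the system crashes before commit, the shadow table is simply kept and the new physical pages are reclaimed, so no undo log is needed.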
Storage
There are three types of storage:
volatile storage
nonvolatile storage
stable storage
Block Transfer:
a) Input (B): Transfers the physical block B to main
memory.
b) Output (B): Transfers the buffer block B to the disk
and replaces the appropriate physical block there.
c) Read (X): Assigns the value of data item X to the
local variable x. It executes the following operations: if the block B_X on
which X resides is not in main memory, it issues Input(B_X); it then assigns
to x the value of X from the buffer block.
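The three primitives above can be modeled with two dicts, one standing in for the disk and one for the buffer. This is only a sketch of the behavior described, with illustrative names.

```python
# Hedged sketch of the Input/Output/Read block-transfer primitives.

disk = {"B1": {"X": 10}}   # physical blocks on "disk"
buffer = {}                 # blocks currently in main memory

def input_block(b):
    """Input(B): transfer physical block B into main memory."""
    buffer[b] = dict(disk[b])

def output_block(b):
    """Output(B): write buffer block B back, replacing the disk block."""
    disk[b] = dict(buffer[b])

def read(x, block):
    """Read(X): bring X's block into memory if needed, then return X."""
    if block not in buffer:
        input_block(block)
    return buffer[block][x]

x = read("X", "B1")        # brings B1 into the buffer; x is 10
buffer["B1"]["X"] = x + 5  # update the buffered copy only
output_block("B1")         # disk now holds X = 15
```

Note that the disk is unchanged until Output(B) runs; this gap between the buffered write and the disk write is exactly what the recovery log must bridge.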
Recovery Algorithm
ARIES recovers from a system crash in three phases: Analysis, Redo, and Undo.
Indexing
Indexing is used to optimise the performance of a
database system.
It reduces the required number of disk accesses to a
minimum when a query is executed.
Each index entry has two fields:
Search-key value
Pointer
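A minimal dense index built from these two fields might look like the sketch below: each entry pairs a search-key value with a pointer (here, a record position). The structure and names are illustrative only.

```python
# Hedged sketch of a dense index: (search-key, pointer) entries kept in
# sorted key order so lookups can binary-search instead of scanning.
import bisect

records = [(10, "r0"), (20, "r1"), (30, "r2")]            # (key, record)
index = [(key, pos) for pos, (key, _) in enumerate(records)]
keys = [k for k, _ in index]

def lookup(search_key):
    """Binary-search the index, then follow the pointer to the record."""
    i = bisect.bisect_left(keys, search_key)
    if i < len(keys) and keys[i] == search_key:
        return records[index[i][1]]
    return None

print(lookup(20))   # (20, 'r1')
print(lookup(25))   # None
```

The binary search is what reduces disk accesses in practice: with a sorted index, a lookup touches O(log n) entries instead of scanning every record.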
Now, let's insert and delete some elements in this tree.
Insert 25, 75
When we insert an element, we place it immediately to the right of the
largest value lower than the inserted element, so the keys remain in
sorted order.
Delete 70
Here, when you delete an element, the deleted element is replaced with the
element to its right (its in-order successor).