Unit - 4 - Transaction Processing
• One criterion for classifying a database system is according to the number of users who can
use the system concurrently.
• A DBMS is single-user if at most one user at a time can use the system, and it is multiuser if
many users can use the system at a time.
• Single-user DBMSs are mostly restricted to personal computer systems; most other DBMSs are
multiuser.
• For example, an airline reservations system is used by hundreds of travel agents and
reservation clerks concurrently.
• Multiple users can access databases simultaneously because of the concept of
multiprogramming, which allows the computer to execute multiple programs or processes at
the same time.
• If only a single central processing unit (CPU) exists, it can actually execute at most one process
at a time. However, multiprogramming operating systems execute some commands from one
process, then suspend that process and execute some commands from the next process and
so on.
• A process is resumed at the point where it was suspended whenever it gets its turn to use the
CPU again.
• Hence, concurrent execution of processes is actually interleaved, as illustrated in the figure below,
which shows two processes A and B executing concurrently in an interleaved fashion. If the
computer system has multiple hardware processors (CPUs), parallel processing of multiple
processes is possible, as illustrated by processes C and D in the same figure.
Write about Various Transaction Operations.
• write_item(X): Writes the value of program variable X into the database item named X.
3. Copy item X from the program variable named X into its correct location in the buffer.
Several problems can occur when concurrent transactions execute in an uncontrolled manner. The
following are the types of problems we may encounter when transactions run concurrently.
• This problem occurs when two transactions that access the same database items have their
operations interleaved in a way that makes the value of some database items incorrect.
• Suppose the transactions T1 and T2 are submitted at approximately the same time, and
suppose that their operations are interleaved as shown in the above figure (a); then the final
value of item X is incorrect because T2 reads the value of X before T1 changes it in the
database, and hence the updated value resulting from T1 is lost.
• For example, if X=80 at the start, N=5, and M=4, the final result should be X=79; but in the
interleaving of operations shown in the above figure (a), it is X=84 because the update in T1
that removed the five seats from X was lost.
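The arithmetic in this example can be replayed with a minimal Python sketch. The dictionary standing in for the database and the function names are assumptions made for illustration; the sketch also assumes that T1 subtracts N from X and T2 adds M to X, which is what the values 79 and 84 imply.

```python
def lost_update_demo(x=80, n=5, m=4):
    """Replay the uncontrolled interleaving on a tiny in-memory 'database'."""
    db = {"X": x}

    t1_x = db["X"]       # T1: read_item(X)
    t1_x = t1_x - n      # T1: X := X - N (remove N seats)
    t2_x = db["X"]       # T2: read_item(X), still sees the old value
    t2_x = t2_x + m      # T2: X := X + M (add M seats)
    db["X"] = t1_x       # T1: write_item(X)
    db["X"] = t2_x       # T2: write_item(X), overwrites T1's update
    return db["X"]


def serial_demo(x=80, n=5, m=4):
    """Run T1 to completion, then T2, with no interleaving."""
    db = {"X": x}
    db["X"] = db["X"] - n    # all of T1
    db["X"] = db["X"] + m    # all of T2
    return db["X"]


print(lost_update_demo())  # interleaved result
print(serial_demo())       # serial result
```

The interleaved run ends with X=84, while the serial run gives the expected X=79.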
• This problem occurs when one transaction updates a database item and then the transaction
fails for some reason.
• The updated item is accessed by another transaction before it is changed back to its original
value.
• The above figure (b) shows an example where T1 updates item X and then fails before
completion, so the system must change X back to its original value. Before it can do so,
however, transaction T2 reads the temporary value of X, which will not be recorded
permanently in the database because of the failure of T1. This type of problem is known as
the dirty read problem.
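The dirty read can be sketched in the same style. The starting value X=80, the amount N=5, and the function name are assumptions chosen only for illustration.

```python
def dirty_read_demo(x=80, n=5):
    """Return (value seen by T2, final value of X) for a dirty read."""
    db = {"X": x}

    original = db["X"]       # old value kept so T1's change can be undone
    db["X"] = db["X"] - n    # T1: write_item(X) with a temporary value
    t2_seen = db["X"]        # T2: read_item(X) reads the uncommitted value
    db["X"] = original       # T1 fails; the system changes X back
    return t2_seen, db["X"]


seen, final = dirty_read_demo()
print(seen, final)
```

T2 ends up working with a value (75) that was never recorded permanently, because X is restored to 80.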
Failures are generally classified as transaction, system, and media failures. There are several possible
reasons for a transaction to fail in the middle of execution:
• A hardware, software, or network error occurs in the computer system during transaction
execution.
• Hardware crashes are usually media failures – for example, main memory failure.
• Some operations in the transaction may cause it to fail, such as integer overflow or division by
zero.
• Transaction failure may also occur because of erroneous parameter values or because of a
logical programming error.
• The concurrency control method may decide to abort the transaction, to be restarted later,
because several transactions are in a state of deadlock.
5. Disk failure:
• Some disk blocks may lose their data because of a read or write malfunction or because of a
disk read/write head crash.
• This refers to an endless list of problems that includes power or air-conditioning failure, fire,
theft, overwriting disks or tapes by mistake, etc.
• A transaction is an atomic unit of work that is either completed in its entirety or not done at
all.
• For recovery purposes, the system needs to keep track of when the transaction starts,
terminates, and commits or aborts.
• Therefore, the recovery manager keeps track of the following operations:
o begin_transaction: This marks the beginning of transaction execution.
o read or write: These specify read or write operations on the database items that are
executed as part of a transaction.
o end_transaction: This specifies that read and write transaction operations have ended
and marks the end of transaction execution.
o commit_transaction: This signals a successful end of the transaction so that any
changes (updates) executed by the transaction can be safely committed to the
database and will not be undone.
o rollback (or abort): This signals that the transaction has ended unsuccessfully, so that
any changes or effects that the transaction may have applied to the database must be
undone.
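The operations listed above can be sketched as a small Python class. This is only an illustration under stated assumptions: the class name SimpleTransaction, the state strings, and the idea of keeping old values in a dictionary so that rollback can undo writes are hypothetical choices, not a description of how a real DBMS recovery manager works. The sketch also assumes every written item already exists in the database.

```python
class SimpleTransaction:
    """Toy transaction over a dict 'database' (illustrative only)."""

    def __init__(self, db):
        self.db = db
        self.old_values = {}          # item name -> value before first write
        self.state = "not started"

    def begin_transaction(self):
        self.old_values = {}
        self.state = "active"

    def read_item(self, name):
        return self.db[name]

    def write_item(self, name, value):
        # Remember the original value only once per item.
        if name not in self.old_values:
            self.old_values[name] = self.db[name]
        self.db[name] = value

    def end_transaction(self):
        self.state = "ended"

    def commit_transaction(self):
        self.old_values = {}          # changes will not be undone any more
        self.state = "committed"

    def rollback(self):
        for name, value in self.old_values.items():
            self.db[name] = value     # undo every change
        self.old_values = {}
        self.state = "aborted"


db = {"X": 80}
t = SimpleTransaction(db)
t.begin_transaction()
t.write_item("X", t.read_item("X") - 5)
t.rollback()
print(db["X"], t.state)
```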
Transactions should possess several properties, often called the ACID properties. They should be
enforced by the concurrency control and recovery methods of the DBMS. The following are the ACID
properties:
1. Atomicity: A transaction is an atomic unit of processing. It is either performed in its entirety or not
performed at all.
4. Durability or permanency: The changes applied to the database by a committed transaction must
persist in the database. These changes must not be lost because of any failure.
Write about various Concurrency Control Techniques and different types of Locks.
Concurrency Control Technique: Some of the main techniques used to control concurrent execution
of transactions are based on the concept of locking data items.
A lock is a variable associated with a data item that describes the status of the item with respect to
possible operations that can be applied to it.
Generally, there is one lock for each data item in the database. Locks are used as a means of
synchronizing the access by concurrent transactions to the database items.
Types of locks: Several types of locks are used in concurrency control such as binary locks and
shared/exclusive locks.
• Binary Locks: A binary lock can have two states or values: locked and unlocked (or 1 and 0, for
simplicity).
o A distinct lock is associated with each database item X. If the value of the lock on X is
1, item X cannot be accessed by a database operation that requests the item. If the
value of the lock on X is 0, the item can be accessed when requested. We refer to the
current value (or state) of the lock associated with item X as lock(X).
o Two operations, lock_item and unlock_item, are used with binary locking.
o lock_item(X): A transaction requests access to an item X by first issuing a lock_item(X)
operation. If lock(X) = 1, the transaction is forced to wait. If lock(X) = 0, it is set to 1
(the transaction locks the item) and the transaction is allowed to access item X.
o unlock_item(X): When the transaction is through using the item, it issues an
unlock_item(X) operation, which sets lock(X) to 0 (unlocks the item) so that X may be
accessed by other transactions.
o Hence, a binary lock enforces mutual exclusion on the data item; i.e., at most one
transaction can hold the lock on an item at a time.
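The lock_item and unlock_item rules can be sketched as follows. This is a minimal single-threaded illustration, not a real lock manager: the class name BinaryLockTable, the waiting queue, and the convention that lock_item returns False instead of actually blocking are all assumptions.

```python
class BinaryLockTable:
    """One binary lock (0 or 1) per data item, with a queue of waiters."""

    def __init__(self):
        self.lock = {}       # item -> 0 (unlocked) or 1 (locked)
        self.waiting = {}    # item -> list of transactions waiting for it

    def lock_item(self, item, txn):
        """Return True if txn obtained the lock, False if it must wait."""
        if self.lock.get(item, 0) == 0:
            self.lock[item] = 1
            return True
        self.waiting.setdefault(item, []).append(txn)
        return False

    def unlock_item(self, item):
        """Unlock the item and return the next waiter to wake up (or None)."""
        self.lock[item] = 0
        queue = self.waiting.get(item, [])
        return queue.pop(0) if queue else None


locks = BinaryLockTable()
print(locks.lock_item("X", "T1"))   # T1 gets the lock
print(locks.lock_item("X", "T2"))   # T2 has to wait
print(locks.unlock_item("X"))       # T2 is the next transaction to retry
```

In this sketch a woken transaction must call lock_item again; a real system would block and resume the transaction instead.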
• Shared/Exclusive (or Read/Write) Lock:
o Shared lock: These locks are referred to as read locks.
o If a transaction T has obtained Shared-lock on data item X, then T can read X, but
cannot write X.
o Multiple shared locks can be placed simultaneously on a data item.
• Deadlocks:
o A deadlock is a condition in which two (or more) transactions in a set are waiting
simultaneously for locks held by some other transaction in the set.
o No transaction in the set can continue because each one is on a waiting
queue, waiting for one of the other transactions in the set to release the lock on an
item.
o Thus, a deadlock is an impasse that may result when two or more transactions are
each waiting for locks to be released that are held by the other.
o Transactions whose lock requests have been refused are queued until the lock can be
granted.
o A deadlock is also called a circular waiting condition where two transactions are
waiting (directly or indirectly) for each other.
o Thus, in a deadlock, two transactions are mutually excluded from accessing the next
record required to complete their transactions.
o Example: A deadlock exists between two transactions A and B in the following situation:
Transaction A = access data items X and Y
Transaction B = access data items Y and X
Here, Transaction A has acquired the lock on X and is waiting to acquire the lock on Y, while
Transaction B has acquired the lock on Y and is waiting to acquire the lock on X. Neither of
them can execute further.
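The circular waiting condition described above can be checked mechanically. The sketch below assumes the waiting relationships are given as a "wait-for" mapping (transaction -> set of transactions it is waiting for); this representation and the function name has_deadlock are assumptions for illustration, and looking for a cycle in such a graph is one common detection approach rather than the only one.

```python
def has_deadlock(waits_for):
    """Return True if the wait-for graph contains a cycle."""
    NEW, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(txn):
        state[txn] = IN_PROGRESS
        for other in waits_for.get(txn, ()):
            other_state = state.get(other, NEW)
            if other_state == IN_PROGRESS:
                return True            # found a path back to a waiting txn
            if other_state == NEW and visit(other):
                return True
        state[txn] = DONE
        return False

    return any(state.get(t, NEW) == NEW and visit(t) for t in list(waits_for))


# A holds X and waits for Y (held by B); B holds Y and waits for X (held by A).
print(has_deadlock({"A": {"B"}, "B": {"A"}}))
```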
• Time-Stamp Methods for Concurrency control:
o A timestamp is a unique identifier created by the DBMS to identify the relative starting
time of a transaction.
o Typically, timestamp values are assigned in the order in which the transactions are
submitted to the system.
o So, a timestamp can be thought of as the transaction start time.
o Therefore, time stamping is a method of concurrency control in which each
transaction is assigned a transaction timestamp.
• Multiversion concurrency control (MVCC):
o Multi-version protocol aims to reduce the delay for read operations.
o It maintains multiple versions of data items. Whenever a write operation is performed,
the protocol creates a new version of the transaction data to ensure conflict-free and
successful read operations.
o The newly created version contains the following information −
▪ Content − This field contains the data value of that version.
▪ Write_timestamp − This field contains the timestamp of the transaction that
created the new version.
▪ Read_timestamp − This field contains the timestamp of the latest transaction that
has read the value of this version.
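A version record with these three fields can be sketched as below. The class names, the integer timestamps, and the read rule (a reader sees the version with the largest write timestamp that is not greater than its own timestamp) are assumptions made for illustration; the rules a full multiversion protocol uses to reject late writes are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Version:
    content: object        # data value of this version
    write_timestamp: int   # timestamp of the transaction that created it
    read_timestamp: int    # largest timestamp of a transaction that read it


class VersionedItem:
    """A data item that keeps every version ever written (illustrative)."""

    def __init__(self, initial, ts=0):
        self.versions = [Version(initial, ts, ts)]

    def write(self, value, ts):
        # A write never overwrites; it appends a new version.
        self.versions.append(Version(value, ts, ts))

    def read(self, ts):
        # Assumes at least one version was written at or before ts.
        visible = [v for v in self.versions if v.write_timestamp <= ts]
        version = max(visible, key=lambda v: v.write_timestamp)
        version.read_timestamp = max(version.read_timestamp, ts)
        return version.content


x = VersionedItem(80)
x.write(75, ts=5)
print(x.read(ts=3))   # an older reader still sees the old version
print(x.read(ts=7))   # a newer reader sees the new version
```

Because the old version is kept, the reader with timestamp 3 does not have to wait for the writer with timestamp 5.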
With MVCC, the database can allow multiple transactions to read and write data without
locking the entire database.
• Fewer issues with multiple transactions trying to access the same data
MVCC helps reduce conflicts between transactions accessing the same data.
Since MVCC allows multiple transactions to read data at the same time, it improves the speed
of reading data.
MVCC ensures that data is protected from being changed by other transactions while a
transaction is making changes to it.
Deadlocks occur when two or more transactions are waiting for each other to release a lock,
causing the system to come to a halt. MVCC can reduce the number of these occurrences.
• The database can become bloated with multiple versions of records, which increases its overall
size.
Write about Characterizing schedules based on Serializability.
• Serializability is a concept that is used to ensure that the concurrent execution of multiple
transactions does not result in inconsistencies or conflicts in a database management system.
• In other words, it ensures that the results of concurrent execution of transactions are the same
as if the transactions were executed one at a time in some order.
• This means that if a schedule is serializable, it does not result in any inconsistencies or conflicts
in the database.
Types of Schedules:
There are two types of schedules: serial schedules and concurrent schedules.
A serial schedule is one where all transactions are executed one at a time, and a concurrent schedule
is one where the operations of multiple transactions are interleaved.
A schedule is considered serializable if it is equivalent to some serial schedule of the same
transactions. Two notions of equivalence are commonly used:
Conflict serializability − A schedule is conflict serializable if it can be transformed into some serial
schedule by swapping adjacent non-conflicting operations. Two operations conflict if they belong to
different transactions, access the same data item, and at least one of them is a write.
View serializability − A schedule is view serializable if it is view equivalent to some serial schedule:
each transaction reads the same values, and the final write on each data item is the same, as in that
serial schedule.
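Conflict serializability can be tested with a precedence graph: draw an edge from Ti to Tj whenever an operation of Ti comes before a conflicting operation of Tj, and check that the graph has no cycle. The sketch below is a minimal illustration of that standard test. The schedule encoding as (transaction, operation, item) tuples with "R" and "W" operations and the function name are assumptions.

```python
def is_conflict_serializable(schedule):
    """schedule: list of (txn, op, item) with op 'R' or 'W'."""
    # Build precedence edges Ti -> Tj for each pair of conflicting operations.
    edges = {}
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.setdefault(ti, set()).add(tj)

    # The graph is acyclic if we can keep removing transactions that have
    # no incoming edge from the remaining transactions.
    nodes = {txn for txn, _, _ in schedule}
    while nodes:
        free = [t for t in nodes
                if not any(t in edges.get(u, ()) for u in nodes if u != t)]
        if not free:
            return False      # every remaining transaction is on a cycle path
        nodes -= set(free)
    return True


interleaved = [("T1", "R", "X"), ("T2", "R", "X"),
               ("T1", "W", "X"), ("T2", "W", "X")]
serial = [("T1", "R", "X"), ("T1", "W", "X"),
          ("T2", "R", "X"), ("T2", "W", "X")]
print(is_conflict_serializable(interleaved))
print(is_conflict_serializable(serial))
```

The interleaved schedule produces edges in both directions between T1 and T2, so it is not conflict serializable; the serial one trivially is.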
Write about Characterizing schedules based on Recoverability.
Recoverability refers to the ability of a system to restore its state in the event of a failure. The
recoverability of a system is directly impacted by the type of schedule that is used.
A serial schedule is considered to be the most recoverable, as there is only one transaction executing
at a time, and it is easy to determine the state of the system at any given point in time.
A parallel schedule is less recoverable than a serial schedule, as it can be more difficult to determine
the state of the system at any given point in time.
A concurrent schedule is the least recoverable, as it can be very difficult to determine the state of the
system at any given point in time.
A schedule is called a recoverable schedule if every transaction that performs a dirty read from an
uncommitted transaction delays its own commit until that uncommitted transaction has either
committed or rolled back.
There are three types of recoverable schedules which are explained below with relevant examples −
• Cascading schedules
• Cascadeless Schedules
• Strict Schedules.
Recoverable schedule:
T1          T2
R(X)        -
W(X)        -
-           W(X)
-           R(X)
Commit      -
-           Commit
Here, transaction T2 is reading value written by transaction T1 and the commit of T2 occurs after the
commit of T1. Hence, it is a recoverable schedule.
• Cascading Schedule:
o A cascading schedule is classified as a recoverable schedule.
o A recoverable schedule is basically a schedule in which the commit operation of a
transaction that performs a read operation is delayed until the uncommitted
transaction it read from either commits or rolls back.
o A cascading rollback is a type of rollback in which the failure of one transaction causes
the rollback of other dependent transactions.
o The main disadvantage of cascading rollback is that it wastes CPU time.
T1          T2          T3          T4
Read(A)     -           -           -
Write(A)    -           -           -
-           Read(A)     -           -
-           Write(A)    -           -
-           -           Read(A)     -
-           -           Write(A)    -
-           -           -           Read(A)
-           -           -           Write(A)
Failure     -           -           -
The above schedule shows a cascading rollback: because T1 fails, T2 is rolled back; the rollback of T2
causes T3 to roll back, and the rollback of T3 causes T4 to roll back.
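The chain of rollbacks can be computed from the schedule itself. The sketch below assumes the schedule is encoded as (transaction, operation, item) tuples, that no transaction has committed when the failure occurs, and that a transaction depends on whichever other transaction last wrote the item it reads; the function name is hypothetical.

```python
def cascading_rollback(schedule, failed):
    """Return the set of transactions that must roll back when `failed` fails."""
    # Record who read a value written by whom.
    readers_of = {}      # writer -> set of transactions that read its writes
    last_writer = {}     # item -> transaction that wrote it most recently
    for txn, op, item in schedule:
        if op == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                readers_of.setdefault(writer, set()).add(txn)
        elif op == "W":
            last_writer[item] = txn

    # Follow the dependencies outward from the failed transaction.
    rolled_back, pending = {failed}, [failed]
    while pending:
        for reader in readers_of.get(pending.pop(), ()):
            if reader not in rolled_back:
                rolled_back.add(reader)
                pending.append(reader)
    return rolled_back


schedule = [("T1", "R", "A"), ("T1", "W", "A"),
            ("T2", "R", "A"), ("T2", "W", "A"),
            ("T3", "R", "A"), ("T3", "W", "A"),
            ("T4", "R", "A"), ("T4", "W", "A")]
print(sorted(cascading_rollback(schedule, "T1")))
```

For the four-transaction schedule above, a failure of T1 forces all of T1, T2, T3, and T4 to roll back, whereas a failure of T4 affects only T4.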
• Cascadeless Schedule:
o When a transaction is not allowed to read data until the last transaction which has
written it is committed or aborted, these types of schedules are called cascadeless
schedules.
T1          T2
R(X)        -
W(X)        -
-           W(X)
Commit      -
-           R(X)
-           Commit
Here, the updated value of X is read by transaction T2 only after the commit of transaction T1. Hence,
the schedule is a cascadeless schedule.
• Strict Schedule:
T1          T2
R(X)        -
-           R(X)
W(X)        -
Commit      -
-           W(X)
-           R(X)
-           Commit
Here, transaction T2 reads and writes the updated or written value of transaction T1 only after the
transaction T1 commits. Hence, the schedule is a strict schedule.
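The three levels can be told apart with a small checker. This is a sketch under stated assumptions: schedules are lists of (transaction, operation, item) tuples with operations "R", "W", and "C" (commit, with item None), aborts are not modelled, and the function name classify is hypothetical. It uses the usual definitions: a schedule is strict if no item is read or written while its last writer is another uncommitted transaction, cascadeless if no such item is read, and recoverable if every transaction commits only after the transactions it read from have committed.

```python
def classify(schedule):
    """Return 'strict', 'cascadeless', 'recoverable', or 'non-recoverable'."""
    committed = set()
    last_writer = {}      # item -> transaction that wrote it most recently
    dirty_reads = []      # (reader, writer) pairs, writer uncommitted at read
    recoverable = cascadeless = strict = True

    for txn, op, item in schedule:
        if op == "C":
            # Committing before a transaction we read from breaks recoverability.
            if any(r == txn and w not in committed for r, w in dirty_reads):
                recoverable = False
            committed.add(txn)
            continue
        writer = last_writer.get(item)
        if writer is not None and writer != txn and writer not in committed:
            strict = False
            if op == "R":
                cascadeless = False
                dirty_reads.append((txn, writer))
        if op == "W":
            last_writer[item] = txn

    if strict:
        return "strict"
    if cascadeless:
        return "cascadeless"
    return "recoverable" if recoverable else "non-recoverable"


strict_example = [("T1", "R", "X"), ("T2", "R", "X"), ("T1", "W", "X"),
                  ("T1", "C", None), ("T2", "W", "X"), ("T2", "R", "X"),
                  ("T2", "C", None)]
cascadeless_example = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "W", "X"),
                       ("T1", "C", None), ("T2", "R", "X"), ("T2", "C", None)]
# Hypothetical schedule: T2 reads T1's uncommitted write, then commits after T1.
dirty_read_example = [("T1", "W", "X"), ("T2", "R", "X"),
                      ("T1", "C", None), ("T2", "C", None)]

print(classify(strict_example))
print(classify(cascadeless_example))
print(classify(dirty_read_example))
```

The first two schedules are the strict and cascadeless examples given above; the third is an added illustration of a schedule that is recoverable but not cascadeless.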