DBMS Solved Paper
1. What is a DBMS?
A DBMS contains information about a particular enterprise
A large collection of interrelated data
Set of programs to create, maintain and access a database.
An environment that is both convenient and efficient to use
Ex. MySQL, Microsoft SQL Server, Oracle, Sybase, Microsoft Access
Models real world enterprise
i. Entities – students, faculty, courses
ii. Relationships between entities – students’ enrollment in courses, faculty teaching courses.
5. Responsibilities of a DBA.
DBA (Database Administrator) is responsible for
i. Schema Definition: Creates original database schema by executing a set of data definition statements in
DDL.
ii. Storage structure and access method definition
iii. Security and Authorization: responsible for ensuring that unauthorized data access is not permitted
iv. Data Availability and recovery from failures: must ensure that if the system fails, users can continue to
access as much of the uncorrupted data as possible
v. Database Tuning: responsible for modifying the database organization as requirements change, to improve performance
vi. Maintenance: Periodic back-up, ensuring performance, Upgrading disk space if required.
Network Model – Advantages
– Network Model is able to model complex relationships
– Can handle most situations for modeling using record types and relationship types
– Flexible pointer-based access paths to data
Network Model – Disadvantages
– Navigational and procedural nature of processing
– Restructuring can be difficult
Hierarchical Model – Advantages
– Simple to construct and operate on
– Easy to search and add new branches
– Corresponds to a number of natural hierarchically organized domains - e.g., personnel organization in
companies
Hierarchical Model – Disadvantages
– Deletion of parent from the database also deletes all the child nodes
– Difficult to depict all types of relationships
Relational Model – Advantages
Conceptual simplicity
Design, implementation, maintenance and usage ease
Flexibility
Ability to express complex queries
Security
Relational Model – Disadvantages
Hardware overheads
Ease of design can lead to poorly designed databases
2. Differentiate between
i. Generalization – Specialization
Specialization
Process of identifying subsets of an entity set that share some distinguishing characteristics
Consider an entity set Person {name, street, city}. A Person may be – Customer, Employee
Additional attributes – Customer {credit_rating}, Employee {salary}
The ISA relationship is also referred to as a superclass-subclass relationship.
It is a top-down design approach
An entity of a subclass inherits all attributes of the superclass
Generalization
Process of identifying some common characteristics of a collection of entity sets and creating a new entity set
that contains entities possessing these common characteristics
Used to express similarities between entity sets
Subclasses are defined before the superclass
Bottom up approach
Specialization and generalization are simple inversions of each other
iv. ER model
Describes data involved in a real-world enterprise in terms of objects and relationships among these
objects
Defines the conceptual view of a database.
Used to develop an initial database design
The basic things contained in an ER Model are-
i. Entity
ii. Attributes
iii. Relationship
v. Association
An association is a relationship among two or more entities.
Ex. Instructor Crick instructs Tanaka. Relationship instructs has Student and Instructor as the
participating entity sets.
Set of relationships of the same type is known as relationship set
If E1, E2, …, En are entity sets, then a relationship set R is a subset of
{ (e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En }
where (e1, e2, …, en) is a relationship and the entity sets E1, E2, …, En participate in the relationship set
Module 3
1. Relational data model – advantages/ disadvantages
Data is organized in two-dimensional tables called relations. A collection of tables is used to represent the data and the
relationships among those data. Each row in a relation is unique. Each column in a relation contains
values from the same domain.
Properties of a relational model
Each relation (or table) in a database has a unique name
Each row is unique
Each relation is defined to be a set of unique tuples
Each attribute within a table has a unique name
For all relations, the domain of all attributes should be atomic
The sequence of attributes (left to right) is insignificant
The sequence of tuples is insignificant
Advantages
Conceptual simplicity
Design, implementation, maintenance and usage ease
Flexibility
Ability to express complex queries
Security
Disadvantages
Hardware overheads
Ease of design can lead to poorly designed databases
Module 4
1. Explain hash based indexing. Discuss use of hash function in identifying a bucket to search.
Hashing allows us to avoid accessing an index structure. Hashing also provides a way of constructing indices.
Static hashing
A bucket is a unit of storage containing one or more records. Let K denote set of all search-key values and B denote set of
all bucket addresses. Hash function h is a function from the set of all search-key values K to the set of all bucket addresses
B. Hash function is used to locate records for access, insertion as well as deletion. To insert a record with search key value
Ki, compute h(Ki), which gives the address of the bucket for that record. If there is space in that bucket to store the record,
the record is stored in that bucket. Records with different search-key values may be mapped to the same bucket. Thus entire
bucket has to be searched sequentially to locate a record. Suppose two search-key values k5 and k7 have the same hash
value; h(k5) = h(k7). In a hash file organization we obtain the bucket of a record directly from its search-key value using a
hash function
An ideal hash function distributes the stored keys uniformly across all buckets, so that every bucket has the same number of
records. The worst hash function maps all search-key values to the same bucket. Choose a hash function that assigns
search-key values to buckets with the following qualities:
An ideal hash function is uniform, i.e., each bucket is assigned the same number of search-key values from the set
of all possible values
Ideal hash function is random, so each bucket will have the same number of records assigned to it irrespective of the
actual distribution of search-key values in the file
Example of Hash File Organization
▪ Hash Function :
▪ Sum of binary representation of characters of key
▪ Sum % number of buckets
▪ 10 buckets
▪ Binary representation of the ith character is assumed to be integer i.
▪ Result of the hash function, e.g. (see the Python sketch after this list):
▪ h(Perryridge) = 5
▪ h(Round Hill) = 3
▪ h(Brighton) = 3
▪ Hashing can be used not only for file organization, but also for index-structure creation
▪ A hash index organizes the search keys, with their associated record pointers, into a hash file structure
▪ Hash index is constructed as follows:
▪ Apply a hash function on the search key to identify a bucket
▪ Store the key and its associated pointers in the bucket
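A minimal Python sketch of the example above, under the stated simplification that the i-th letter of the alphabet is represented by the integer i and that there are 10 buckets. Each bucket holds (search-key, record-pointer) pairs, mirroring how a hash index keeps search keys with their record pointers; the pointer values here are illustrative.

NUM_BUCKETS = 10

def h(key: str) -> int:
    # Sum of the alphabet positions of the letters in the key, modulo the
    # number of buckets (spaces and other characters are ignored).
    total = sum(ord(c.upper()) - ord('A') + 1 for c in key if c.isalpha())
    return total % NUM_BUCKETS

# Each bucket holds (search-key, record-pointer) pairs, as in a hash index.
buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key: str, record_ptr: int) -> None:
    buckets[h(key)].append((key, record_ptr))

def lookup(key: str):
    # The whole bucket is scanned sequentially, since different search-key
    # values may be mapped to the same bucket.
    return [ptr for k, ptr in buckets[h(key)] if k == key]

insert("Perryridge", 101)   # h = 5
insert("Round Hill", 102)   # h = 3
insert("Brighton", 103)     # h = 3 (collides with Round Hill)
print(h("Perryridge"), h("Round Hill"), h("Brighton"))  # 5 3 3
print(lookup("Brighton"))   # [103]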
2. Differentiate-
i. B & B+ tree
B+ tree
▪ A B+-tree is a balanced rooted tree of order n if it satisfies following properties:
▪ All paths from root to leaf are of the same length
▪ Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
▪ Each node, including leaf nodes (except the root), has between ⌈(n–1)/2⌉ and n–1 values
▪ Special cases:
▪ If the root is not a leaf, it has at least 2 children.
▪ If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n–1) values.
▪ Contains a small number of levels.
▪ B+-Tree Node Structure
Unclustered Index
▪ An index whose search key specifies an order different from the sequential order of the file.
▪ A non-clustered index has no effect on the order in which the rows are stored.
▪ A table can have multiple non-clustered indexes.
▪ Also called Secondary index.
▪ Has to be dense.
Sparse Index
▪ It contains index records for only some search-key values
▪ Used when records are sequentially ordered on search-key, i.e. with clustering index.
▪ To locate a record with search-key value K we:
– Find the index record with the largest search-key value <= K
– Search file sequentially starting at the record to which the index record points
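A minimal Python sketch of this lookup procedure, using illustrative data: the sparse index holds entries for only some search keys (here, the first record of each block), so we find the largest indexed key <= K and scan the key-ordered file sequentially from the position that entry points to.

import bisect

file_records = [("Aria", 1), ("Brighton", 2), ("Downtown", 3),
                ("Mianus", 4), ("Perryridge", 5), ("Redwood", 6)]
# Sparse index: one entry per block, each with the file offset it points to.
sparse_index = [("Aria", 0), ("Mianus", 3)]

def lookup(key):
    keys = [k for k, _ in sparse_index]
    i = bisect.bisect_right(keys, key) - 1   # largest indexed key <= K
    if i < 0:
        return None                          # key precedes all index entries
    _, start = sparse_index[i]
    for k, rec in file_records[start:]:      # sequential scan from there
        if k == key:
            return rec
        if k > key:                          # file is ordered on the key; stop
            return None

print(lookup("Perryridge"))  # 5
print(lookup("Brighton"))    # 2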
3. What is an index on a file of records? What is a search key for an index? Why do we need indexes?
Indexing mechanisms used to speed up random access to desired data
An index record consists of a search-key value & pointers to one or more records with that value as their search-key
value
Search Key - attribute or set of attributes used to look up records in a file. (Not same as primary/ candidate key.)
Pointer consists of identifier of a disk block & an offset within the disk block to identify record within the block
An index file consists of records (called index entries) of the form ⟨search-key, pointer⟩
Index files are typically much smaller than the original file
Two basic kinds of indices:
o Ordered indices: search keys are stored in sorted order
o Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”
Module V
Normalisation - Advantages
• Reduction of data redundancy
• Elimination of insertion, deletion and update anomalies
• Smaller, better-structured relations that are easier to maintain and extend
Normalisation - Disadvantages
• Reduction in efficiency of certain data retrievals, as relations may need to be joined during retrieval (an increased
number of joins).
A relation is in First Normal Form (1NF) if ALL its attributes are ATOMIC.
A relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on the whole key
3) Optimise relations
A relation is in 3NF if it is in 2NF and each non-key attribute is only dependent on the whole key, and not
dependent on any non-key attribute. i.e. no transitive dependencies
3NF - Actions Required
1) Check each non-key attribute for dependency against other non-key fields
3) Create new relation comprising the attribute and non-key attribute which it depends on
5) Optimise
a) It is in BCNF
Multivalued Dependencies
Suppose that we have a relation with attributes course, teacher, and book, which we denote as CTB.
There are no FDs; the key is CTB. Recommended texts for a course are independent of the instructor.
The relation schema CTB is in BCNF; thus we would not consider decomposing it further if we looked only at
the FDs that hold over CTB.
There is redundancy. The fact that Green can teach Physics101 is recorded once per recommended text for the
course. Similarly, the fact that Optics is a text for Physics101 is recorded once per potential teacher.
The redundancy in this example is due to the constraint that the Books for a Course are independent of the
Teachers, which cannot be expressed in terms of FDs.
A multivalued dependency X →→ Y is said to hold over R if each X value is associated with a set of Y
values, and this set is independent of the values of the other attributes.
Let R be a relation schema, X and Y be nonempty subsets of the attributes of R, and F be a set of
dependencies that includes both FDs and MVDs.
R is said to be in fourth normal form (4NF) if, for every MVD X →→ Y that holds over R, one of the
following statements is true:
• Y ⊆ X or X ∪ Y = R (i.e., the MVD is trivial), or
• X is a superkey of R.
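A small Python sketch of the CTB example above, with a hypothetical instance (the teacher and book names are illustrative): the MVD Course →→ Book forces every teacher/book combination for a course to appear, and the 4NF decomposition into CT and CB removes the redundancy, while the natural join of the projections reconstructs CTB (a lossless join).

# Hypothetical CTB instance: every (teacher, book) combination for the
# course appears, so each fact is recorded redundantly.
ctb = {
    ("Physics101", "Green", "Mechanics"),
    ("Physics101", "Green", "Optics"),
    ("Physics101", "Brown", "Mechanics"),
    ("Physics101", "Brown", "Optics"),
}

# 4NF decomposition into CT (course, teacher) and CB (course, book).
ct = {(c, t) for c, t, b in ctb}
cb = {(c, b) for c, t, b in ctb}

# Natural join of the projections reconstructs CTB exactly (lossless join).
rejoined = {(c, t, b) for (c, t) in ct for (c2, b) in cb if c == c2}
assert rejoined == ctb
print(len(ctb), len(ct) + len(cb))  # 4 vs 2 + 2: CT and CB store each fact once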
Join Dependencies
A join dependency (JD) {R1, …, Rn} is said to hold over a relation R if R1, …, Rn is a lossless-join
decomposition of R.
Unlike FDs and MVDs, there is no set of sound and complete inference rules for JDs.
A relation R is said to be in 5NF or PJNF if and only if every join dependency that holds over R is implied by
the candidate keys of R
DECOMPOSITION
A decomposition of a relation schema R consists of replacing the relation schema by two (or more) relation
schemas that each contain a subset of the attributes of R and together include all attributes in R
Intuitively, we want to store the information in any given instance of R by storing projections of the instance
Lossless Join Decomposition
A decomposition of R into R1 and R2 is lossless-join if, for every legal instance r of R, the natural join of the
projections of r onto R1 and R2 is equal to r (no spurious tuples are created).
Dependency-Preserving Decomposition
A decomposition of R is dependency-preserving if every functional dependency in F can be checked using only the
decomposed relations, i.e. the union of the dependencies projected onto R1 and R2 implies all the dependencies in F.
Functional dependency: Attribute B has a functional dependency on attribute A if, for each value of attribute A,
there is exactly one value of attribute B. For example, Employee Address has a functional dependency on Employee
ID, because each Employee ID value is associated with exactly one Employee Address value. An attribute may be
functionally dependent either on a single attribute or on a combination of attributes.
• Or in other words, An attribute B is said to be functionally dependent on attribute A if, every value of A
uniquely determines the value of B. In this case, attribute A is determinant of B
Trivial functional dependency: A trivial functional dependency is a functional dependency of an attribute on a
superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address}
→ {Employee Address}.
Full functional dependency: An attribute is fully functionally dependent on a set of attributes X if it is
a) functionally dependent on X, and
b) not functionally dependent on any proper subset of X. {Employee Address} has a functional dependency on
{Employee ID, Skill}, but not a full functional dependency, for it is also dependent on {Employee ID}.
Transitive dependency: A transitive dependency is an indirect functional dependency, one in which X→Z only by
virtue of X→Y and Y→Z.
Multivalued dependency: A multivalued dependency is a constraint according to which the presence of certain
rows in a table implies the presence of certain other rows. Or in other words, a multivalued dependency is said to
exist between A and B if, for each value of attribute A, there are one or more associated values of attribute B.
Join dependency: A table T is subject to a join dependency if T can always be recreated by joining multiple tables
each having a subset of the attributes of T.
Non-prime attribute: A non-prime attribute is an attribute that does not occur in any candidate key. Employee
Address would be a non-prime attribute in the "Employees' Skills" table.
Given a set F set of functional dependencies, there are certain other functional dependencies that are logically
implied by F.
1. E.g. if A → B and B → C, then we can infer that A → C
The set of all functional dependencies logically implied by F is the closure of F.
We denote the closure of F by F+.
We can find all of F+ by applying Armstrong’s Axioms:
1. if β ⊆ α, then α → β (reflexivity)
2. if α → β, then γα → γβ (augmentation)
3. if α → β and β → γ, then α → γ (transitivity)
These rules are
sound (generate only functional dependencies that actually hold) and
complete (generate all functional dependencies that hold).
8. Pseudotransitivity rule: if α → β holds and γβ → δ holds, then αγ → δ holds.
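A minimal Python sketch (not from the notes) of how F+ questions are usually answered in practice: compute the attribute closure X+ under a set of FDs by repeatedly applying the dependencies, which follows directly from Armstrong's axioms.

def attribute_closure(attrs, fds):
    # fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If lhs is already implied, everything it determines is implied
            # too (transitivity in effect).
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

# Example: with A -> B and B -> C, the closure of {A} is {A, B, C},
# so A -> C is in F+.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(attribute_closure({"A"}, fds))  # the set {'A', 'B', 'C'} (order may vary)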
Module VI
Query optimization is a function of many RDBMS in which multiple query plans for satisfying a query are
examined and a good query plan is identified
This may or may not be the absolute best strategy, because there are many possible plans
There is a trade-off between the amount of time spent figuring out the best plan and the amount of time spent
running the plan
Cost based query optimizers evaluate the resource footprint of various query plans and use this as the basis for
plan selection
Typically the resources that add to cost are CPU path length, amount of disk buffer space, disk storage
service time, and interconnect usage between units of parallelism
The set of query plans examined is formed by examining possible access
paths (e.g., primary index access, secondary index access, full file scan) and various relational table join
techniques (e.g., merge join, hash join, product join)
The search space can become quite large depending on the complexity of the SQL query
There are two types of optimization. These consist of logical optimization, which generates a sequence of
relational algebra operations to solve the query
In addition there is physical optimization which is used to determine the means of carrying out each operation
A query evaluation plan (or simply plan) consists of an extended relational algebra tree, with additional
annotations at each node indicating the access methods to use for each table and the implementation method
to use for each relational operator
Consider the following SQL query:
• To obtain a fully specified evaluation plan, we must decide on an implementation for each of the algebra
operations involved
• We can use a page-oriented simple nested loops join with Reserves as the outer table and apply selections
and projections to each tuple in the result of the join as it is produced
• The result of the join before the selections and projections is never stored in its entirety
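A minimal Python sketch of such a plan, using an illustrative Sailors/Reserves schema (the original SQL query is not reproduced in these notes): a simple nested loops join with Reserves as the outer table, applying the selections and the projection to each joined tuple as it is produced, so the full join result is never materialized.

reserves = [{"sid": 22, "bid": 101, "day": "10/10"},
            {"sid": 58, "bid": 103, "day": "11/12"}]
sailors  = [{"sid": 22, "sname": "Dustin", "rating": 7},
            {"sid": 58, "sname": "Rusty",  "rating": 10}]

def join_select_project(outer, inner, bid, min_rating):
    for r in outer:                       # Reserves as the outer table
        for s in inner:                   # Sailors as the inner table
            if r["sid"] == s["sid"]:      # join condition
                # selections and projection applied on the fly, per tuple
                if r["bid"] == bid and s["rating"] > min_rating:
                    yield s["sname"]

print(list(join_select_project(reserves, sailors, bid=101, min_rating=5)))
# ['Dustin']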
Basic Steps in Query Processing: Evaluation of query
Relational Optimization
For each enumerated plan, we have to estimate its cost
There are two parts to estimating the cost of an evaluation plan for a query block:
1. For each node in the tree, we must estimate the cost of performing the corresponding operation. Costs are
affected significantly by whether pipelining is used or temporary relations are created to pass the output of an
operator to its parent.
2. For each node in the tree, we must estimate the size of the result and whether it is sorted. This result is the
input for the operation that corresponds to the parent of the current node, and the size and sort order in turn
affect the estimation of size, cost, and sort order for the parent
As we see, estimating costs requires knowledge of various parameters of the input relations, such as the
number of pages and available indexes
Such statistics are maintained in the DBMS's system catalog
We use the number of page I/Os as the metric of cost and ignore issues such as blocked access, for the sake of
simplicity
Module VII
Transaction Concept
A transaction is a unit of program execution that accesses and possibly updates various data items.
Transactions access data using two operations:
read(X), which transfers the data item X from the database to a
local buffer belonging to the transaction that executed the read operation.
write(X), which transfers the data item X from the local buffer of
the transaction that executed the write back to the database.
E.g. transaction to transfer $50 from account A to account B:
read(A)
A := A – 50
write(A)
read(B)
B := B + 50
write(B)
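A minimal Python sketch of the read/write model and the transfer above; the database is represented here as a plain dictionary with illustrative balances.

database = {"A": 500, "B": 200}            # illustrative balances

class Transaction:
    def __init__(self, db):
        self.db = db
        self.buffer = {}                   # local buffer of the transaction

    def read(self, x):
        self.buffer[x] = self.db[x]        # database -> local buffer

    def write(self, x):
        self.db[x] = self.buffer[x]        # local buffer -> database

# Transfer $50 from account A to account B, as in the six steps above.
t = Transaction(database)
t.read("A"); t.buffer["A"] -= 50; t.write("A")
t.read("B"); t.buffer["B"] += 50; t.write("B")
print(database)   # {'A': 450, 'B': 250}; the sum A + B is unchanged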
ACID properties
Consistency requirement in above money transfer example:
the sum of A and B is unchanged by the execution of the transaction
Without the consistency requirement, money could be created or destroyed by the transaction
In general, consistency requirements include
Explicitly specified integrity constraints, e.g. primary keys and foreign keys
Implicit integrity constraints
e.g. sum of balances of all accounts, minus sum of loan amounts must equal value of
cash-in-hand
A transaction must see a consistent database.
During transaction execution the database may be temporarily inconsistent.
When the transaction completes successfully the database must be consistent
Erroneous transaction logic can lead to inconsistency
Atomicity requirement
if the transaction fails after step 3 and before step 6, money will be “lost” leading to an inconsistent
database state
Thus, if the transaction never started or was guaranteed to complete, such an inconsistent state would
not be visible except during the execution of the transaction
the system should ensure that updates of a partially executed transaction are not reflected in the
database
Durability requirement — once the user has been notified that the transaction has completed (i.e.,
the transfer of the $50 has taken place), the updates to the database by the transaction must persist
even if there are software or hardware failures.
We can guarantee durability by ensuring that either
1. The updates carried out by the transaction have been written to disk before the transaction
completes.
2. Information about the updates carried out by the transaction and written to disk is sufficient to
enable the database to reconstruct the updates when the database system is restarted after the failure.
Isolation requirement — if between steps 3 and 6, another transaction T2 is allowed to access the partially
updated database, it will see an inconsistent database (the sum A + B will be less than it should be).
T1                                   T2
1. read(A)
2. A := A – 50
3. write(A)
                                     read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
Isolation can be ensured trivially by running transactions serially
that is, one after the other.
However, executing multiple transactions concurrently has significant benefits, as we will see later.
Schedules
Schedule – a sequence of instructions that specify the chronological order in which instructions of concurrent
transactions are executed
a schedule for a set of transactions must consist of all instructions of those transactions
must preserve the order in which the instructions appear in each individual transaction.
A transaction that successfully completes its execution will have a commit instruction as the last statement
(will be omitted if it is obvious)
A transaction that fails to successfully complete its execution will have an abort instruction as the last
statement (will be omitted if it is obvious)
Serial schedule
Each serial schedule consists of a sequence of instructions from various transactions, where the
instructions belonging to one single transaction appear together in that schedule.
Thus, for a set of n transactions, there exist n! different valid serial schedules.
Concurrent schedule
When the database system executes several transactions concurrently, the corresponding schedule no
longer needs to be serial.
Several execution sequences are possible, since the various instructions from both transactions may
now be interleaved. In general, it is not possible to predict exactly how many instructions of a
transaction will be executed before the CPU switches to another transaction. Thus, the number of
possible schedules for a set of n transactions is much larger than n!.
Serializability
A serializable schedule over a set S of committed transactions is a schedule whose effect on any consistent
database instance is identical to that of some complete serial schedule over S
The database instance that results from executing the given schedule is identical to the database instance that
results from executing the transactions in some serial order
Basic Assumption – Each transaction preserves database consistency.
Thus serial execution of a set of transactions preserves database consistency.
A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule
equivalence give rise to the notions of:
1. conflict serializability
2. view serializability
If S is a schedule in which there are two consecutive instructions li and lj of transactions Ti and Tj respectively, then li
and lj conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these
instructions is a write(Q).
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
Intuitively, a conflict between li and lj forces a (logical) temporal order between them.
If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if
they had been interchanged in the schedule.
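A minimal Python sketch of this conflict test: two instructions conflict iff they access the same item and at least one of them is a write.

def conflicts(op1, op2):
    # Each operation is an (action, item) pair, e.g. ("read", "Q").
    (action1, item1), (action2, item2) = op1, op2
    return item1 == item2 and "write" in (action1, action2)

print(conflicts(("read", "Q"), ("read", "Q")))    # False
print(conflicts(("read", "Q"), ("write", "Q")))   # True
print(conflicts(("write", "Q"), ("write", "Q")))  # True
print(conflicts(("write", "Q"), ("read", "R")))   # False: different items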
View Serializability
Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the following
three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must, in
schedule S´, also read the initial value of Q.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and if that value was produced by a
write(Q) operation executed by transaction Tj, then the read(Q) operation of transaction Ti must, in schedule S´, also
read the value of Q that was produced by the same write(Q) operation of transaction Tj
3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must
perform the final write(Q) operation in schedule S´.
As can be seen, view equivalence is also based purely on reads and writes alone.
Conditions 1 and 2 ensure that each transaction reads the same values in both schedules and, therefore,
performs the same computation
Condition 3, coupled with conditions 1 and 2, ensures that both schedules result in the same final
system state
Schedule 1 is not view equivalent to schedule 2, since, in schedule 1, the value of account A read by
transaction T2 was produced by T1,whereas this case does not hold in schedule 2
However, schedule 1 is view equivalent to schedule 3, because the values of account A and B read by
transaction T2 were produced by T1 in both schedules.
Concurrency Control
Lock-Based Protocols
▪ One way to ensure serializability is to require that data items be accessed in a mutually exclusive manner; that
is, while one transaction is accessing a data item, no other transaction can modify that data item
▪ The most common method used to implement this requirement is to allow a transaction to access a data item
only if it is currently holding a lock on that item
▪ A lock is a mechanism to control concurrent access to a data item
▪ Data items can be locked in two modes:
1. exclusive (X) mode. Data item can be both read as well as written. X-lock is requested using the lock-X
instruction.
2. shared (S) mode. Data item can only be read. S-lock is requested using the lock-S instruction.
▪ Lock requests are made to the concurrency-control manager. A transaction can proceed only after the request is
granted.
▪ We require that every transaction request a lock in an appropriate mode on data item Q, depending on the
types of operations that it will perform on Q
▪ The transaction makes the request to the concurrency-control manager. The transaction can proceed with the
operation only after the concurrency-control manager grants the lock to the transaction
▪ Given a set of lock modes, we can define a compatibility function on them as follows: Let A and B represent
arbitrary lock modes
▪ Suppose that a transaction Ti requests a lock of mode A on item Q on which transaction Tj (Ti != Tj)
currently holds a lock of mode B
▪ If transaction Ti can be granted a lock on Q immediately, in spite of the presence of the mode B lock, then we
say mode A is compatible with mode B. Such a function can be represented conveniently by a matrix
▪ An element comp(A, B) of the matrix has the value true if and only if mode A is compatible with mode B
Lock-compatibility matrix (the matrix itself is reconstructed in the sketch below)
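The matrix figure is not reproduced in these notes; the standard S/X compatibility function it describes is sketched below in Python, where comp(A, B) is true iff a requested lock of mode A can be granted while another transaction holds a mode B lock on the same item.

COMP = {
    ("S", "S"): True,   # any number of shared locks can coexist
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,  # exclusive is incompatible with every other lock
}

def comp(requested, held):
    return COMP[(requested, held)]

print(comp("S", "S"))  # True
print(comp("X", "S"))  # False: the requesting transaction must wait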
▪ A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on
the item by other transactions
▪ Any number of transactions can hold shared locks on an item,
– but if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on the
item.
▪ If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other
transactions have been released. The lock is then granted.
▪ Cascading roll-back is possible under two-phase locking. To avoid this, follow a modified protocol called
strict two-phase locking. Here a transaction must hold all its exclusive locks till it commits/aborts.
▪ This requirement ensures that any data written by an uncommitted transaction are locked in exclusive mode
until the transaction commits, preventing any other transaction from reading the data
▪ Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol
transactions can be serialized in the order in which they commit
Lock Conversions
– First Phase: can acquire a lock-S or a lock-X on a data item, and can upgrade a lock-S to a lock-X
– Second Phase: can release a lock-S or a lock-X, and can downgrade a lock-X to a lock-S
▪ This protocol assures serializability, but still relies on the programmer to insert the various locking
instructions.
Deadlock Handling
– Deadlock prevention ensures that the system will never enter a deadlock state.
– Deadlock detection & recovery allows the system to enter a deadlock state, and then tries to
recover by using a detection & recovery scheme.
Deadlock prevention protocols ensure that the system will never enter into a deadlock state.
▪ Prevention is used if the probability that the system enters a deadlock state is relatively high; otherwise detection
& recovery are more efficient.
– Require that each transaction locks all its data items before it begins execution (predeclaration), i.e.
either all data items are locked in one step or none are locked.
▪ Hard to predict, before the transaction begins, which data items need to be locked
▪ Data item utilization may be very low, as locked data items may be unused for a long time
– Impose an ordering of all data items and require that a transaction can lock data items only in a sequence
consistent with the ordering. (Graph-based protocols)
– A variation of the above approach is to use a total order of data items, in conjunction with two-phase
locking. Once a transaction has locked a data item, it cannot request locks on items that precede that
item in the ordering.
– Another approach is to use preemption & transaction rollbacks. It uses transaction timestamps to
control preemption. Two different deadlock prevention schemes using timestamps have been
proposed:
▪ Wait-die
▪ Wound-wait
1. wait-die scheme — non-preemptive
– older transaction may wait for younger one to release data item. Younger transactions
never wait for older ones; they are rolled back instead.
– a transaction may die several times before acquiring needed data item
2. wound-wait scheme — preemptive
– older transaction wounds (forces rollback) of younger transaction instead of waiting for
it. Younger transactions may wait for older ones.
– may be fewer rollbacks than wait-die scheme.
NOTE: Both in the wait-die and in the wound-wait scheme, a rolled-back transaction is restarted with its original
timestamp. Older transactions thus have precedence over newer ones, and starvation is hence avoided.
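A minimal Python sketch of the two schemes, assuming ts(T) is the transaction's start timestamp (a smaller timestamp means an older transaction):

def wait_die(ts_requester, ts_holder):
    # Non-preemptive: an older requester waits; a younger requester dies.
    return "wait" if ts_requester < ts_holder else "roll back requester"

def wound_wait(ts_requester, ts_holder):
    # Preemptive: an older requester wounds (rolls back) the lock holder;
    # a younger requester waits.
    return "roll back holder" if ts_requester < ts_holder else "wait"

print(wait_die(5, 10), "/", wait_die(10, 5))      # wait / roll back requester
print(wound_wait(5, 10), "/", wound_wait(10, 5))  # roll back holder / wait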
Deadlock Detection
▪ If a system does not employ some protocol that ensures deadlock freedom, then a detection and recovery
scheme must be used
▪ An algorithm that examines the state of the system is invoked periodically to determine whether a deadlock
has occurred
▪ If one has, then the system must attempt to recover from the deadlock
▪ To do so, the system must:
• Maintain information about the current allocation of data items to transactions, as well as any outstanding data item
requests.
• Provide an algorithm that uses this information to determine whether the system has entered a deadlock state.
• Recover from the deadlock when the detection algorithm determines that a deadlock exists
Recovery and Atomicity
▪ To ensure atomicity despite failures, we first output information describing the modifications to stable storage
without modifying the database itself.
▪ We study two approaches:
– log-based recovery, and
– shadow-paging
▪ We assume (initially) that transactions run serially, that is, one after the other.
Log-Based Recovery
– The log is a sequence of log records, and maintains a record of update activities on the database.
▪ Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where V1 is the value of X before the
write, and V2 is the value to be written to X.
– The log record notes that Ti has performed a write on data item Xj; Xj had value V1 before the write, and
will have value V2 after the write.
▪ When Ti finishes its last statement, the log record <Ti commit> is written
▪ Whenever a transaction performs a write, it is essential that the log record for that write be created before the
database is modified. Once a log record exists, we can output the modification to the database if that is
desirable. Also, we have the ability to undo a modification that has already been output to the database. We
undo it by using the old-value field in log records.
Two approaches using logs
▪ The deferred database modification scheme records all modifications to the log, but defers all the writes to
after partial commit.
▪ A write(X) operation results in a log record <Ti, X, V> being written, where V is the new value for X
▪ Finally, the log records are read and used to actually execute the previously deferred writes.
▪ redo(Ti) sets the value of all data items updated by transaction Ti to the new
values. The set of data items updated by Ti and their respective new values can be found in the log
▪ The redo operation must be idempotent; that is, executing it several times must be equivalent to executing
it once
▪ This characteristic is required if we are to guarantee correct behavior even if a failure occurs during the
recovery process
▪ After a failure, the recovery subsystem consults the log to determine which transactions need to be redone.
Transaction Ti needs to be redone if and only if the log contains both the record <Ti start> and the record <Ti
commit> (see the sketch below)
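A minimal Python sketch of this redo rule for the deferred-modification scheme, with an illustrative log: only transactions that have both <Ti start> and <Ti commit> records in the log are redone.

log = [
    ("start", "T0"),
    ("write", "T0", "A", 950),   # <T0, A, V>: only the new value is logged
    ("start", "T1"),
    ("write", "T1", "C", 600),
    ("commit", "T0"),
    # crash before <T1 commit>: T1 must not be redone
]

database = {"A": 1000, "C": 700}

committed = {rec[1] for rec in log if rec[0] == "commit"}
for rec in log:
    if rec[0] == "write" and rec[1] in committed:
        _, _, item, new_value = rec
        database[item] = new_value   # redo is idempotent: safe to repeat

print(database)  # {'A': 950, 'C': 700}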
▪ The immediate-modification technique allows database modifications to be output to the database while the
transaction is still in the active state
▪ In the event of a crash or a transaction failure, the system must use the old-value field of the log records
▪ Before a transaction Ti starts its execution, the system writes the record <Ti start> to the log
▪ During its execution, any write(X) operation by Ti is preceded by the writing of the appropriate new update
record to the log
▪ When Ti partially commits, the system writes the record <Ti commit> to the log
▪ As the information in the log is used in reconstructing the state of the database, we cannot allow the actual
update to the database to take place before the corresponding log record is written out to stable storage
▪ So it is required that, before execution of an output(B) operation, the log records corresponding to B be
written onto stable storage
▪ undo(Ti) restores the value of all data items updated by Ti to their old values, going backwards from
the last log record for Ti
▪ redo(Ti) sets the value of all data items updated by Ti to the new values, going forward from the first
log record for Ti
▪ Both operations must be idempotent: even if an operation is executed multiple times, the effect is the same as if it
were executed once
▪ Transaction Ti needs to be redone if the log contains both the record <Ti start> and the record <Ti
commit>; Ti needs to be undone if the log contains the record <Ti start> but not the record <Ti commit>.