
Database Schema Design and Refinement

This document covers the design and refinement of relational database schemas, focusing on schema design problems, redundancy, anomalies, and normalization forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF). It emphasizes the importance of functional dependencies, transaction processing, and guidelines for creating effective relational schemas. The document also discusses the implications of poor design, including insertion, deletion, and update anomalies, and the significance of minimizing NULL values and ensuring lossless joins.

Uploaded by likithmr8277
Copyright © All Rights Reserved

Relational Database Management Systems

Module 3
Designing and Refining Database Schema

Presidency University, Bengaluru


Topics:
Schema Design:
• Problems in schema design
• Redundancy and anomalies
Schema Refinement:
• Functional Dependencies
• Normal Forms based on Primary Keys (1NF, 2NF, 3NF)
• Boyce-Codd Normal Form
• Multivalued Dependency (Fourth Normal Form)
• Join Dependencies (Fifth Normal Form)
Fundamentals of Transaction:
• Introduction to Transaction Processing
• Transaction and System Concepts
• Desirable properties (ACID) of Transactions
Introduction
Each relation schema consists of a number of attributes, and a relational database schema consists of a number of relation schemas.
What is relational database design?
• The grouping of attributes to form "good" relation schemas
• It produces a set of relations.

Two levels of relation schemas:
• The logical level: how users interpret the relation schemas and the meaning of their attributes.
• The storage level (implementation level): how the tuples in a base relation are stored and updated.
1. Informal Design Guidelines for Relation Schemas
The following guidelines are used as measures to determine the quality of a relation schema design:
• Making sure attribute semantics are clear
• Reducing redundant information in tuples
• Reducing NULL values in tuples
• Disallowing the possibility of generating spurious tuples
1.1 Semantics of Attributes in Relations

Semantics of relation attributes: informally, each tuple in a relation should represent one entity or relationship instance.
• Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation.
• Only foreign keys should be used to refer to other entities.
• Entity and relationship attributes should be kept apart as much as possible.

Bottom line: design a schema that can be explained easily, relation by relation. The semantics of attributes should be easy to interpret.
Guideline 1
Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation.
Example of violating Guideline 1: Figure 15.3
Redundancy and Anomaly
• If a table is not properly normalized and has data redundancy, it will not only occupy extra memory space but will also be difficult to handle and update without facing data loss.
• Anomalies are very frequent if a database is not normalized or when there is redundancy in a relation.
Student table
• In the table above, we have data for 4 CSE students.
• Data for the fields branch, hod and office_tel is repeated for every student in the same branch of the college; this is data redundancy.
1.2 Redundant Information in Tuples and Update Anomalies
Mixing attributes of multiple entities may cause problems. Information is stored redundantly, which:
• wastes storage
• causes update anomalies:
– Insertion anomalies
– Deletion anomalies
– Modification anomalies
Insertion Anomaly
• An insertion anomaly is the inability to insert data into the database due to the absence of other data or attributes.

Ex: Consider the relation:
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Insert Anomaly: a project cannot be inserted unless an employee is assigned to it.
Conversely, an employee cannot be inserted unless he/she is assigned to a project.
Deletion Anomaly
• A deletion anomaly is the unintended loss of data due to the deletion of other data.
Ex: Consider the relation:
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Delete Anomaly:
• When a project is deleted, all the employees who work on that project are deleted with it.
• Conversely, if an employee is the sole employee on a project, deleting that employee also deletes the corresponding project.
Update (Modification) Anomaly
• These anomalies occur when modifying data in a database and can result in inconsistencies or errors.

(or)

• This occurs when the same data items are repeated with the same values and are not linked to each other, so every copy must be changed.

Ex: Consider the relation:
EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)
Update Anomaly: changing the name of project number 1 from “ProductX” to “Billing” requires the update to be made in the tuples of all employees working on project 1; if any tuple is missed, the database becomes inconsistent.
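As a concrete illustration of the update anomaly, the minimal Python sketch below stores EMP_PROJ as a list of dictionaries; the tuples themselves (employee numbers, names, hours) are hypothetical. Renaming project 1 in only one tuple leaves two different names for the same Proj#, which is the inconsistency described above; the correct update has to touch every tuple of project 1.

```python
# Hypothetical EMP_PROJ tuples: Pname is repeated for every employee of a project.
emp_proj = [
    {"Emp#": 101, "Proj#": 1, "Ename": "Smith", "Pname": "ProductX", "No_hours": 32},
    {"Emp#": 102, "Proj#": 1, "Ename": "Wong", "Pname": "ProductX", "No_hours": 10},
    {"Emp#": 103, "Proj#": 2, "Ename": "Zelaya", "Pname": "ProductY", "No_hours": 20},
]


def project_names(rows, proj_no):
    """Return the distinct Pname values stored for one project number."""
    return {r["Pname"] for r in rows if r["Proj#"] == proj_no}


# Careless update: only the first tuple of project 1 is changed.
for row in emp_proj:
    if row["Proj#"] == 1:
        row["Pname"] = "Billing"
        break
print(sorted(project_names(emp_proj, 1)))  # ['Billing', 'ProductX']: inconsistent

# Correct update: every tuple of project 1 must be changed.
for row in emp_proj:
    if row["Proj#"] == 1:
        row["Pname"] = "Billing"
print(sorted(project_names(emp_proj, 1)))  # ['Billing']
```

In a design that stores Pname once, in a separate PROJECT relation, the same change would be a single-tuple update.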
Two relation schemas suffering from update anomalies

Guideline 2
• Design a schema that does not suffer from insertion, deletion and update anomalies.
• If any anomalies are present, note them so that applications can be made to take them into account.
1.3 Null Values in Tuples
Guideline 3:
• Relations should be designed such that their tuples will have as few NULL values as possible.
• Attributes that are NULL frequently could be placed in separate relations (with the primary key).
Reasons for NULLs:
• attribute not applicable or invalid
• attribute value unknown (may exist)
• value known to exist, but unavailable
1.4 Generation of Spurious Tuples
• Bad designs for a relational database may produce erroneous results for certain JOIN operations.
• The "lossless join" property is used to guarantee meaningful results for join operations.
Guideline 4:
• Relations should be designed to satisfy the lossless join condition.
• No spurious tuples should be generated by doing a natural join of any relations.
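The sketch below shows how spurious tuples arise, using a hypothetical relation R(Ename, Pname, Plocation) with two tuples. Projecting R onto (Ename, Plocation) and (Pname, Plocation) and joining back on Plocation, which is not a key of either projection, yields tuples that were never in R.

```python
# Hypothetical relation R(Ename, Pname, Plocation).
R = {
    ("Smith", "ProductX", "Houston"),
    ("Wong", "ProductY", "Houston"),
}

# Decompose R on Plocation, which is not a key of either projection.
emp_locs = {(e, loc) for (e, _p, loc) in R}    # (Ename, Plocation)
proj_locs = {(p, loc) for (_e, p, loc) in R}   # (Pname, Plocation)

# Natural join of the two projections on the shared attribute Plocation.
joined = {
    (e, p, loc)
    for (e, loc) in emp_locs
    for (p, loc2) in proj_locs
    if loc == loc2
}

spurious = joined - R
print(len(joined), len(spurious))  # 4 2
print(sorted(spurious))
```

Since Plocation determines neither Ename nor Pname, this decomposition does not have the lossless join property.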
Functional Dependencies
• Are used to specify formal measures of the "goodness" of relational designs
• Together with keys, are used to define normal forms for relations
• Are constraints that are derived from the meaning and interrelationships of the data attributes
• An FD is denoted X -> Y, where X is a set of attributes that is capable of determining the value of Y.
X: determinant, Y: dependent
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
• X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y.
• For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].
• X -> Y in R specifies a constraint on all relation instances r(R).
Examples of FD constraints:
• Social security number determines employee name
– SSN -> ENAME
• Project number determines project name and location
– PNUMBER -> {PNAME, PLOCATION}
• Employee SSN and project number determine the hours per week that the employee works on the project
– {SSN, PNUMBER} -> HOURS
• An FD is a property of the attributes in the schema R.
• The constraint must hold on every relation instance r(R).
• If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).

X -> Y: if t1.X = t2.X, then t1.Y = t2.Y
X  Y
1  1
2  1
3  2
4  3
5  5
[Link] NAME MARKS DEPT COURSE
1 A 78 CS C1
2 B 60 EE C1
3 A 78 CS C2
4 B 60 EE C3
5 C 80 IT C3
6 D 80 EC C2

[Link] -> NAME
NAME -> [Link]
[Link] -> MARKS
DEPT -> COURSE
NAME, MARKS -> DEPT
NAME, MARKS -> DEPT, COURSE
NAME, MARKS -> MARKS
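These candidate FDs can be checked against the table mechanically. The minimal Python sketch below groups tuples by their left-hand-side values and reports a violation as soon as one group has two different right-hand-side values; the first column, shown as [Link] in the table, is given the hypothetical name `no`. A single instance can only refute an FD: True means these six rows do not violate it, not that the FD constrains every instance r(R).

```python
# Rows of the example table; the first column ([Link]) is named "no" here.
COLS = ("no", "NAME", "MARKS", "DEPT", "COURSE")
ROWS = [
    (1, "A", 78, "CS", "C1"),
    (2, "B", 60, "EE", "C1"),
    (3, "A", 78, "CS", "C2"),
    (4, "B", 60, "EE", "C3"),
    (5, "C", 80, "IT", "C3"),
    (6, "D", 80, "EC", "C2"),
]
TABLE = [dict(zip(COLS, row)) for row in ROWS]


def fd_holds(rows, lhs, rhs):
    """X -> Y is satisfied if tuples with equal X-values always have equal Y-values."""
    seen = {}
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on X but differ on Y
    return True


candidates = [
    (["no"], ["NAME"]),
    (["NAME"], ["no"]),
    (["no"], ["MARKS"]),
    (["DEPT"], ["COURSE"]),
    (["NAME", "MARKS"], ["DEPT"]),
    (["NAME", "MARKS"], ["DEPT", "COURSE"]),
    (["NAME", "MARKS"], ["MARKS"]),
]
for lhs, rhs in candidates:
    print(",".join(lhs), "->", ",".join(rhs), fd_holds(TABLE, lhs, rhs))
```

For example, NAME -> [Link] fails because the two tuples with NAME = A have [Link] values 1 and 3, and DEPT -> COURSE fails because CS appears with both C1 and C2.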
Practice
[Link] -> NAME, MARKS
DEPT, COURSE -> NAME
[Link], MARKS -> DEPT
NAME -> COURSE
NAME, MARKS, DEPT -> [Link]
Inference Rules for FDs

• Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.
Armstrong's inference rules:
– IR1. (Reflexive) If Y ⊆ X, then X -> Y
– IR2. (Augmentation) If X -> Y, then XZ -> YZ (notation: XZ stands for X ∪ Z)
– IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z
• IR1, IR2 and IR3 form a sound and complete set of inference rules:
– these rules hold, and all other rules that hold can be deduced from them.
Some additional inference rules that are useful:
• (Decomposition) If X -> YZ, then X -> Y and X -> Z
• (Union) If X -> Y and X -> Z, then X -> YZ
• (Pseudotransitivity) If X -> Y and WY -> Z, then WX -> Z
• These three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).
• The closure of a set F of FDs is the set F+ of all FDs that can be inferred from F.
• The closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X.
• X+ can be calculated by repeatedly applying IR1, IR2 and IR3 using the FDs in F.
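The "apply the FDs repeatedly" idea is usually implemented as a fixed-point loop. A minimal Python sketch, assuming FDs are given as pairs of attribute sets and applying those FDs directly rather than the raw rules IR1 to IR3, checked here against the FDs of Q1 (A -> B, B -> C, C -> D, D -> E):

```python
def closure(attrs, fds):
    """Compute X+, the set of attributes functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)  # IR1: X determines every attribute in X
    changed = True
    while changed:       # repeat until no FD adds anything new
        changed = False
        for lhs, rhs in fds:
            # If all of lhs is already in X+, then X -> lhs -> rhs (IR2/IR3).
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Single-letter attribute names, so set("AD") == {"A", "D"}.
F = [(set(l), set(r)) for l, r in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]]
print(sorted(closure(set("A"), F)))   # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure(set("CD"), F)))  # ['C', 'D', 'E']
```

Each pass either adds at least one attribute or stops, so the loop runs at most |R| + 1 times.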
Definitions of Keys and Attributes Participating in Keys
• A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
• A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
• If a relation schema has more than one key, each is called a candidate key.
– One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
Q1. R(A, B, C, D, E) with FDs:
A -> B
B -> C
C -> D
D -> E
1. Find the closures of A, AD, B and CD.
A+ = {A, B, C, D, E}
{AD}+ = {A, D, B, C, E}
B+ = {B, C, D, E}
{CD}+ = {C, D, E}
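2. Find the superkeys.
F = {SSN -> ENAME,
PNO -> PNAME, PLOC,
{SSN, PNO} -> HRS}

{SSN}+ = {SSN, ENAME}
{PNO}+ = {PNO, PNAME, PLOC}
{SSN, PNO}+ = {SSN, PNO, ENAME, PNAME, PLOC, HRS}
Since {SSN, PNO}+ contains every attribute, {SSN, PNO} is a superkey; neither {SSN} nor {PNO} alone is.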
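A set of attributes is a superkey exactly when its closure contains every attribute of R, and a candidate key is a minimal such set. The sketch below applies this to the SSN/PNO example by testing subsets in increasing size. It repeats a small closure helper so that it runs on its own; enumerating subsets is exponential, so it only suits small classroom schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(s, fds) == schema:
                keys.append(s)
    return keys


R = {"SSN", "ENAME", "PNO", "PNAME", "PLOC", "HRS"}
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNO"}, {"PNAME", "PLOC"}),
    ({"SSN", "PNO"}, {"HRS"}),
]
print(closure({"SSN"}, F) == R)         # False: {SSN} is not a superkey
print(closure({"SSN", "PNO"}, F) == R)  # True: {SSN, PNO} is a superkey
print(candidate_keys(R, F) == [{"SSN", "PNO"}])  # True
```

Every superset of {SSN, PNO} is also a superkey, but only {SSN, PNO} itself is minimal.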
Exercise 1
Consider a relation R(A, B, C, D) with FDs AB -> C, BC -> D, CD -> A.
(a) Find the closure of AB.
(b) Find the candidate keys.

Exercise 2
Consider a relation R(A, B, C, D, E) with the following functional dependencies: AB -> C, D -> E, DE -> B.
(a) Find the closure of AB.
(b) Find a superkey.
Partial Functional Dependency
• An FD X -> Y is a partial dependency if Y can be determined by some proper subset of X.
Ex: AB -> C is a partial dependency if C can be determined by A or by B alone:
A -> C
B -> C
Fully Functional Dependency
• An FD X -> Y is a full functional dependency if Y cannot be determined by any proper subset of X.
Ex: ABC -> D is a full FD if D cannot be determined by any proper subset of ABC, i.e., none of BC -> D, C -> D, A -> D, B -> D (and so on for every proper subset) holds.
Ex: {Emp_num, Proj_num} -> Hour
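Whether an FD is full or partial can be decided with attribute closures: X -> Y is partial exactly when some proper subset of X already determines Y. A minimal sketch, using the SSN/PNUMBER FDs from the 2NF examples (the FD list is an assumption chosen for illustration):

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) FDs given as sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """X -> Y is full if no proper subset of X determines Y."""
    for size in range(len(lhs)):  # sizes 0 .. len(X) - 1: proper subsets only
        for subset in combinations(sorted(lhs), size):
            if rhs <= closure(subset, fds):
                return False  # a proper subset already determines Y: partial
    return True


F = [
    ({"SSN"}, {"ENAME"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, F))  # True
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, F))  # False, since SSN -> ENAME
```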
Normalization
• This is the process that allows you to find and eliminate redundant data within your database.
• It involves restructuring the tables to successively meet higher normal forms.
• Normalization rules divide larger tables into smaller tables and link them using relationships.
• A properly normalized database should have the following characteristics:
– Scalar values in each field
– Absence of redundancy
– Minimal use of null values
– Minimal loss of information

Normalization
• Normalization is the process of minimizing redundancy from a relation or a set of relations.
• It is the process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations.
Practical Use of Normal Forms
• Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
• The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
• Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).
• Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Levels of Normalization
• Levels of normalization are based on the amount of redundancy in the database.
• Various levels of normalization are:
– First Normal Form (1NF)
– Second Normal Form (2NF)
– Third Normal Form (3NF)
– Boyce-Codd Normal Form (BCNF)
– Fourth Normal Form (4NF)
– Fifth Normal Form (5NF)
– Domain Key Normal Form (DKNF)
Figure: the levels of normalization compared by redundancy, number of tables and complexity.
Most databases should be in 3NF or BCNF in order to avoid the database anomalies.

Levels of Normalization
Figure: nested levels 1NF, 2NF, 3NF, 4NF, 5NF, DKNF.
Each higher level is a subset of the lower level.
Definitions of Keys and Attributes Participating in Keys
• A prime attribute is a member of some candidate key.
• A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
First Normal Form (1NF)
A table is considered to be in 1NF if all the fields contain only scalar values (as opposed to lists of values), i.e., the only attribute values permitted are single atomic (or indivisible) values.
Example (Not 1NF)
ISBN | Title | AuName | AuPhone | PubName | PubPhone | Price
0-321-32132-1 | Balloon | Sleepy, Snoopy, Grumpy | 321-321-1111, 232-234-1234, 665-235-6532 | Small House | 714-000-0000 | $34.00
0-55-123456-9 | Main Street | Jones, Smith | 123-333-3333, 654-223-3455 | Small House | 714-000-0000 | $22.95
0-123-45678-0 | Ulysses | Joyce | 666-666-6666 | Alpha Press | 999-999-9999 | $34.00
1-22-233700-0 | Visual Basic | Roman | 444-444-4444 | Big House | 123-456-7890 | $25.00

The AuName and AuPhone columns are not scalar.
First Normal Form
• Disallows:
– composite attributes
– multivalued attributes
– nested relations: attributes whose values for an individual tuple are non-atomic
• The only attribute values permitted are single atomic (or indivisible) values.
• Techniques to achieve first normal form:
– Remove the attribute and place it in a separate relation
– Expand the key
– Use several atomic attributes
1NF - Decomposition
To change to 1NF:
– Remove nested relation attributes into a new relation
– Propagate the primary key into it
– Unnest the relation into a set of 1NF relations
Example (1NF)
ISBN | Title | PubName | PubPhone | Price
0-321-32132-1 | Balloon | Small House | 714-000-0000 | $34.00
0-55-123456-9 | Main Street | Small House | 714-000-0000 | $22.95
0-123-45678-0 | Ulysses | Alpha Press | 999-999-9999 | $34.00
1-22-233700-0 | Visual Basic | Big House | 123-456-7890 | $25.00

ISBN | AuName | AuPhone
0-321-32132-1 | Sleepy | 321-321-1111
0-321-32132-1 | Snoopy | 232-234-1234
0-321-32132-1 | Grumpy | 665-235-6532
0-55-123456-9 | Jones | 123-333-3333
0-55-123456-9 | Smith | 654-223-3455
0-123-45678-0 | Joyce | 666-666-6666
1-22-233700-0 | Roman | 444-444-4444
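The unnesting step can be sketched in Python as follows. The sketch assumes that the i-th phone number in AuPhone belongs to the i-th name in AuName, and it uses only the first two books of the example to stay short.

```python
# Non-1NF rows: AuName and AuPhone hold parallel lists of values.
books = [
    {"ISBN": "0-321-32132-1", "Title": "Balloon",
     "AuName": ["Sleepy", "Snoopy", "Grumpy"],
     "AuPhone": ["321-321-1111", "232-234-1234", "665-235-6532"],
     "PubName": "Small House", "PubPhone": "714-000-0000", "Price": "$34.00"},
    {"ISBN": "0-55-123456-9", "Title": "Main Street",
     "AuName": ["Jones", "Smith"],
     "AuPhone": ["123-333-3333", "654-223-3455"],
     "PubName": "Small House", "PubPhone": "714-000-0000", "Price": "$22.95"},
]


def to_1nf(rows):
    """Move the repeating author group into its own relation keyed by ISBN."""
    book_rel, author_rel = [], []
    for r in rows:
        # Keep only the single-valued attributes in the BOOK relation.
        book_rel.append({k: v for k, v in r.items() if k not in ("AuName", "AuPhone")})
        # Propagate the primary key (ISBN) into one tuple per author.
        for name, phone in zip(r["AuName"], r["AuPhone"]):
            author_rel.append({"ISBN": r["ISBN"], "AuName": name, "AuPhone": phone})
    return book_rel, author_rel


book_rel, author_rel = to_1nf(books)
for a in author_rel:
    print(a["ISBN"], a["AuName"], a["AuPhone"])
```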
Second Normal Form (2NF)
A relation is in 2NF if:
• the relation is in first normal form, and
• every non-key (non-prime) attribute in the table is fully functionally dependent on the entire primary key, i.e., there is no partial dependency.

Examples:
• {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
• {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds.
Second Normal Form (2NF): More Examples
Example (Not 2NF)
Scheme: {City, Street, HouseNumber, HouseColor, CityPopulation}
1. Key: {City, Street, HouseNumber}
2. {City, Street, HouseNumber} -> {HouseColor}
3. {City} -> {CityPopulation}
4. CityPopulation does not belong to any key.
5. CityPopulation is functionally dependent on City, which is a proper subset of the key.
Example (Convert to 2NF)
Old Scheme: {City, Street, HouseNumber, HouseColor, CityPopulation}
New Scheme: {City, Street, HouseNumber, HouseColor}
New Scheme: {City, CityPopulation}
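Detecting the 2NF violation amounts to finding non-prime attributes that are determined by a proper subset of the key. The sketch below assumes the given key is the only candidate key, so every attribute outside it is non-prime.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) FDs given as sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, schema, fds):
    """Find non-prime attributes that depend on a proper subset of the key."""
    nonprime = schema - key  # assumption: key is the only candidate key
    found, reported = [], set()
    for size in range(1, len(key)):  # proper, non-empty subsets of the key
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & nonprime):
                if attr not in reported:  # report each attribute once, smallest LHS first
                    found.append((subset, attr))
                    reported.add(attr)
    return found


R = {"City", "Street", "HouseNumber", "HouseColor", "CityPopulation"}
key = {"City", "Street", "HouseNumber"}
F = [
    ({"City", "Street", "HouseNumber"}, {"HouseColor"}),
    ({"City"}, {"CityPopulation"}),
]
print(partial_dependencies(key, R, F))  # [(('City',), 'CityPopulation')]
```

The result points at {City} -> {CityPopulation}, the dependency that the conversion moves into the new {City, CityPopulation} scheme.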
Transitive Dependency
• An FD X -> Y is transitive if there exists a set of attributes Z, which is neither a candidate key nor a subset of any key, such that X -> Z and Z -> Y hold.
Ex:
X -> Z
Z -> Y
hence X -> Y is transitive
Third Normal Form (3NF)
A relation is in 3NF if:
• the relation is in second normal form, and
• no non-prime attribute is transitively dependent on the primary key.

Examples:
• SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
• SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Problematic FD X -> Y:
• Left-hand side X is part of the primary key (violates 2NF).
• Left-hand side X is a nonkey attribute (violates 3NF).
NOTE:
• In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key.
• When Y is a candidate key, there is no problem with the transitive dependency.
• E.g., consider EMP(SSN, Emp#, Salary). Here SSN -> Emp# -> Salary, and Emp# is a candidate key.
A relation schema R is in third normal form (3NF) if whenever a nontrivial FD X -> A holds in R, either:
(a) X is a superkey of R, or
(b) A is a prime attribute of R.
NOTE: Boyce-Codd normal form disallows condition (b) above.
Example (Not in 3NF)
Scheme: {Title, PubID, PageCount, Price}
1. Key: {Title, PubID}
2. {Title, PubID} -> {PageCount}
3. {PageCount} -> {Price}
4. Both Price and PageCount depend on the whole key, hence 2NF.
5. Transitively, {Title, PubID} -> {Price}, hence not in 3NF.
3NF - Decomposition
1. Move all items involved in transitive dependencies to a new entity.
2. Identify a primary key for the new entity.
3. Place the primary key for the new entity as a foreign key on the original entity.
Example 1 (Convert to 3NF)
Old Scheme: {Title, PubID, PageCount, Price}
New Scheme: {PubID, PageCount, Price}
New Scheme: {Title, PubID, PageCount}
General Definitions of Second and Third Normal Forms
Boyce-Codd Normal Form (BCNF)

• BCNF does not allow dependencies between attributes that belong to candidate keys.
• BCNF is a refinement of third normal form: it drops the 3NF exception that allows the right-hand side of an FD to be a prime (key) attribute.
• A table in 3NF can fail to be in BCNF only if all of the following conditions are true:
– the table has two or more candidate keys,
– at least two of the candidate keys are composed of more than one attribute, and
– the keys are not disjoint, i.e., the composite candidate keys share some attributes.
• Every relation in BCNF is also in 3NF.
– A relation in 3NF is not necessarily in BCNF.
Example 1 - Address (Not in BCNF)
Scheme: {City, Street, ZipCode}
1. Key1: {City, Street}
2. Key2: {ZipCode, Street}
3. No non-key attribute, hence 3NF
4. {City, Street} -> {ZipCode}
5. {ZipCode} -> {City}
6. Dependency between attributes belonging to a key
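Because BCNF drops the prime-attribute exception, the check reduces to asking whether the left-hand side of each nontrivial FD is a superkey. A minimal sketch for the Address example:

```python
def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) FDs given as sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial FDs X -> Y whose left-hand side X is not a superkey."""
    return [
        (sorted(lhs), sorted(rhs - lhs))
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != schema
    ]


R = {"City", "Street", "ZipCode"}
F = [
    ({"City", "Street"}, {"ZipCode"}),
    ({"ZipCode"}, {"City"}),
]
print(bcnf_violations(R, F))  # [(['ZipCode'], ['City'])]
```

The same FD passes the 3NF test, because City is part of the candidate key {City, Street}; this is why the relation is in 3NF but not in BCNF.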
BCNF - Decomposition
1. Place the two candidate primary keys in separate entities.
2. Place each of the remaining data items in one of the resulting entities according to its dependency on the primary key.
Example 1 (Convert to BCNF)
Old Scheme: {City, Street, ZipCode}
New Scheme1: {ZipCode, Street}
New Scheme2: {City, Street}
• Loss of the dependency {ZipCode} -> {City}
Alternate New Scheme1: {ZipCode, Street}
Alternate New Scheme2: {ZipCode, City}
Decomposition – Loss of Information
1. If a decomposition does not cause any loss of information, it is called a lossless decomposition.
2. If a decomposition does not cause any dependencies to be lost, it is called a dependency-preserving decomposition.
3. Any table scheme can be decomposed in a lossless way into a collection of smaller schemas that are in BCNF. However, dependency preservation is not guaranteed.
4. Any table can be decomposed in a lossless way into 3rd normal form in a way that also preserves the dependencies.
• 3NF may be better than BCNF in some cases.

Use your own judgment when decomposing schemas.
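For a decomposition into two relations, losslessness can be tested with the standard binary test: {R1, R2} is lossless with respect to F exactly when the closure of R1 ∩ R2 contains all of R1 or all of R2. A minimal sketch applied to the two Address decompositions:

```python
def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) FDs given as sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """{R1, R2} is lossless iff (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 follows from fds."""
    common = closure(r1 & r2, fds)
    return r1 <= common or r2 <= common


F = [
    ({"City", "Street"}, {"ZipCode"}),
    ({"ZipCode"}, {"City"}),
]
print(is_lossless_binary({"ZipCode", "Street"}, {"City", "Street"}, F))  # False
print(is_lossless_binary({"ZipCode", "Street"}, {"ZipCode", "City"}, F))  # True
```

The alternate decomposition is lossless, but {City, Street} -> {ZipCode} can no longer be checked within a single relation, which illustrates point 3.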
Multivalued Dependency
• A multivalued dependency X ->> Y specified on relation schema R, where X and Y are both subsets of R, specifies the following constraint on any relation state r of R: if two tuples t1 and t2 exist in r such that t1[X] = t2[X], then two tuples t3 and t4 should also exist in r with the following properties, where Z denotes R − (X ∪ Y):

■ t3[X] = t4[X] = t1[X] = t2[X]
■ t3[Y] = t1[Y] and t4[Y] = t2[Y]
■ t3[Z] = t2[Z] and t4[Z] = t1[Z]

Note: an MVD X ->> Y in R is called a trivial MVD if (a) Y is a subset of X, or (b) X ∪ Y = R.
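The MVD definition can be tested directly on a relation instance. Looping over all ordered pairs (t1, t2) that agree on X and requiring the swapped tuple t3 covers t4 as well, because t4 is t3 for the pair (t2, t1). The EMP tuples below are illustrative values in the style of the 4NF example.

```python
SCHEMA = ("ENAME", "PNAME", "DNAME")
EMP = [
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
]


def mvd_holds(rows, schema, x, y):
    """True if the instance satisfies X ->> Y (rows are tuples over schema)."""
    pos = {a: i for i, a in enumerate(schema)}
    present = set(rows)
    for t1 in rows:
        for t2 in rows:
            if any(t1[pos[a]] != t2[pos[a]] for a in x):
                continue  # the rule only applies when t1[X] = t2[X]
            # t3 takes X and Y from t1 and Z = R - (X u Y) from t2.
            t3 = tuple(t1[i] if a in x or a in y else t2[i] for i, a in enumerate(schema))
            if t3 not in present:
                return False
    return True


print(mvd_holds(EMP, SCHEMA, ["ENAME"], ["PNAME"]))      # True
print(mvd_holds(EMP[:3], SCHEMA, ["ENAME"], ["PNAME"]))  # False: (Smith, Y, John) missing
```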
Fourth Normal Form (4NF)
Fourth normal form eliminates independent many-to-one relationships between columns.

• To be in Fourth Normal Form:
– a relation must first be in Boyce-Codd Normal Form, and
– the relation should not have any nontrivial multivalued dependency (multivalued attribute) whose left-hand side is not a superkey.

Definition:
• A relation schema R is in 4NF with respect to a set of dependencies F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X ->> Y in F+, X is a superkey for R.
Multivalued Dependencies and Fourth Normal Form
(a) The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
(b) Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.
Join Dependencies and Fifth Normal Form
Definition:
A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation schema R, specifies a constraint on the states r of R.
• The constraint states that every legal state r of R should have a nonadditive join decomposition into R1, R2, ..., Rn; that is, for every such r we have
• *(πR1(r), πR2(r), ..., πRn(r)) = r

Note: an MVD is a special case of a JD where n = 2.
A join dependency JD(R1, R2, ..., Rn) specified on relation schema R is a trivial JD if one of the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R.
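The defining equation *(πR1(r), ..., πRn(r)) = r translates directly into code: project r onto each Ri, natural-join the projections, and compare the result with r. A minimal sketch with an illustrative SUPPLY(Sname, Part, Proj) instance:

```python
def project(rel, attrs):
    """Projection of a list of dict tuples onto attrs, without duplicates."""
    seen = {tuple((a, t[a]) for a in attrs) for t in rel}
    return [dict(items) for items in seen]


def natural_join(r, s):
    """Combine every pair of tuples that agree on all shared attributes."""
    return [
        {**t, **u}
        for t in r
        for u in s
        if all(t[a] == u[a] for a in t.keys() & u.keys())
    ]


def as_set(rows):
    """Order-independent form of a list of dict tuples, for comparison."""
    return {tuple(sorted(t.items())) for t in rows}


def jd_holds(rel, components):
    """JD(R1, ..., Rn) holds if joining all projections gives back exactly rel."""
    joined = project(rel, components[0])
    for attrs in components[1:]:
        joined = natural_join(joined, project(rel, attrs))
    return as_set(joined) == as_set(rel)


# Illustrative SUPPLY(Sname, Part, Proj) instance.
SUPPLY = [
    dict(zip(("Sname", "Part", "Proj"), row))
    for row in [
        ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
        ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
        ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
        ("Smith", "Bolt", "ProjY"),
    ]
]
R1, R2, R3 = ("Sname", "Part"), ("Sname", "Proj"), ("Part", "Proj")
print(jd_holds(SUPPLY, [R1, R2, R3]))  # True
print(jd_holds(SUPPLY, [R1, R2]))      # False: the binary join adds tuples
```

Here the three-way JD holds even though the binary JD over the first two projections (the MVD Sname ->> Part) does not.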
Fifth Normal Form (5NF)
• Fifth normal form is satisfied when all tables are broken into as many tables as possible in order to avoid redundancy. Once a relation is in fifth normal form, it cannot be broken into smaller relations without changing the facts or the meaning.

Definition:
A relation schema R is in fifth normal form (5NF) (or Project-Join Normal Form (PJNF)) with respect to a set F of functional, multivalued, and join dependencies if,
– for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F),
• every Ri is a superkey of R.
Domain Key Normal Form (DKNF)

• A relation is in DKNF when there can be no insertion or deletion anomalies in the database.
Exercise 1
Compute the closure of the following set F of functional dependencies for relation schema R = {A, B, C, D, E}:
A -> BC
CD -> E
B -> D
E -> A
List the candidate keys for R.

Exercise 2
Consider a relation R(A, B, C, D, E) with the following dependencies:
{AB -> C, CD -> E, DE -> B}
List all candidate keys.
Exercise 3
R(A, B, C, D) and FDs {AB -> C, C -> D, D -> A}.

(1) List all nontrivial FDs that can be inferred from the given FDs.
(2) Find all candidate keys.
(3) Find all BCNF violations.
(4) Decompose R into relations in BCNF.
(5) Which FDs are not preserved by the BCNF decomposition?
Transaction Management

• Introduction to Transaction Processing
• Transaction and System Concepts
• Desirable Properties of Transactions


Introduction to Transaction Processing

Transaction processing systems are systems with large databases and hundreds of concurrent users executing database transactions.
Examples of such systems include:
• airline reservations, banking, credit card processing, online retail purchasing, stock markets, supermarket checkouts, and many other applications.


Introduction to Transaction Processing
• A transaction is a unit of program execution that accesses and
possibly updates various data items.
– Read operations (database retrieval, such as SQL SELECT)
– Write operations (modify database, such as SQL INSERT, UPDATE,
DELETE)
• E.g., transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
• Note: each execution of a program is a distinct transaction with different parameters.
– Bank transfer program parameters: savings account number, checking account number, transfer amount
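The transfer above must behave atomically: either both writes happen or neither does. A minimal sketch using Python's standard sqlite3 module, with hypothetical table and column names; using the connection as a context manager commits on success and rolls back if an exception escapes the block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one transaction: both updates or neither."""
    try:
        with conn:  # commit on success, rollback if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
            cur = conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            if cur.rowcount != 1:
                raise ValueError("destination account not found")
        return True
    except (sqlite3.IntegrityError, ValueError):
        return False


print(transfer(conn, "A", "B", 50), balances(conn))   # True {'A': 50, 'B': 70}
print(transfer(conn, "A", "Z", 30), balances(conn))   # False {'A': 50, 'B': 70}: debit rolled back
print(transfer(conn, "A", "B", 500), balances(conn))  # False {'A': 50, 'B': 70}: CHECK failed
```

In the second call the debit of A has already executed when the missing destination is detected; the rollback undoes it, which is exactly the "all or nothing" behaviour a transaction promises.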


Introduction to Transaction Processing (contd.)

• A transaction (set of operations) may be:
– stand-alone, specified in a high-level language like SQL and submitted interactively, or
– made up of database operations embedded within a program (most transactions).
• Transaction boundaries: Begin and End transaction.
– Note: an application program may contain several transactions separated by Begin and End transaction boundaries.


Introduction to Transaction Processing (contd.)

• Transaction processing systems: large multi-user database systems supporting thousands of concurrent transactions (user processes) per minute.
• Two modes of concurrency:
– Interleaved processing: concurrent execution of processes is interleaved on a single CPU
– Parallel processing: processes are executed concurrently on multiple CPUs
– Basic transaction processing theory assumes interleaved concurrency
Introduction to Transaction Processing (contd.)
For transaction processing purposes, a simple database model is used:
• A database is a collection of named data items.
• Granularity (size) of a data item: a field (data item value), a record, or a whole disk block.
– Transaction processing concepts are independent of granularity.
• Basic operations on an item X:
– read_item(X): reads a database item named X into a program variable. To simplify our notation, we assume that the program variable is also named X.
– write_item(X): writes the value of program variable X into the database item named X.


Introduction to Transaction Processing (contd.)
READ AND WRITE OPERATIONS:
The basic unit of data transfer from disk to main memory is one disk block (or page). A data item X (what is read or written) will usually be a field of some record in the database, although it may be a larger unit such as a whole record or even a whole block.
• read_item(X) includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some main memory buffer).
3. Copy item X from the buffer to the program variable named X.
• write_item(X) includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if it is not already in some main memory buffer).
3. Copy item X from the program variable named X into its correct location in the buffer.
4. Store the updated block from the buffer back to disk (either immediately or at some later point in time).
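The read_item/write_item steps can be simulated with one dictionary standing in for the disk and another for the main-memory buffers. The block layout, the item-to-block mapping, and returning the value instead of setting a program variable are all simplifying assumptions of this sketch.

```python
# Hypothetical layout: block number -> items stored in that disk block.
disk = {0: {"X": 80, "Y": 10}, 1: {"Z": 5}}
block_of = {"X": 0, "Y": 0, "Z": 1}  # address of the block holding each item
buffers = {}                          # main-memory buffers: block number -> copy


def fetch(block):
    """Copy a disk block into a buffer unless it is already buffered."""
    if block not in buffers:
        buffers[block] = dict(disk[block])
    return buffers[block]


def read_item(name):
    block = block_of[name]     # 1. find the block that contains the item
    return fetch(block)[name]  # 2-3. buffer the block, copy the item out


def write_item(name, value, flush=False):
    block = block_of[name]      # 1. find the block that contains the item
    fetch(block)[name] = value  # 2-3. buffer the block, update the item in it
    if flush:                   # 4. write back now (or at some later point)
        disk[block] = dict(buffers[block])


X = read_item("X")
write_item("X", X - 5)
print(disk[0]["X"], buffers[0]["X"])  # 80 75: the change is only in the buffer
write_item("X", read_item("X"), flush=True)
print(disk[0]["X"])  # 75
```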
Transaction Notation

• Notation focuses on the read and write operations.
• Can also be written in shorthand notation:
– T1: b1; r1(X); w1(X); r1(Y); w1(Y); e1;
– T2: b2; r2(Y); w2(Y); e2;
• bi and ei specify transaction boundaries (begin and end).
• i specifies a unique transaction identifier (Tid).


Why Concurrency Control Is Needed

Without concurrency control, problems may occur with concurrent transactions:
• Lost Update Problem (Write-Write Conflict):
occurs when two transactions update the same data item but both read the same original value before either update is written, so one update overwrites the other.
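A minimal simulation of the lost update, following the shorthand schedule r1(X); r2(X); w1(X); w2(X). The values (X = 80, T1 subtracts N = 5, T2 adds M = 4) are illustrative.

```python
N, M = 5, 4  # illustrative amounts: T1 subtracts N from X, T2 adds M to X
db = {"X": 80}

x1 = db["X"]  # r1(X): T1 reads 80
x2 = db["X"]  # r2(X): T2 also reads 80, before T1 writes
x1 -= N
db["X"] = x1  # w1(X): X = 75
x2 += M
db["X"] = x2  # w2(X): X = 84, overwriting T1's write

serial_result = 80 - N + M  # either serial order (T1 then T2, or T2 then T1) gives 79
print(db["X"], serial_result)  # 84 79
```

T1's update is lost because T2 wrote a value computed from the stale copy it read before T1's write.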


Why Concurrency Control Is Needed (contd.)

• The Temporary Update (or Dirty Read) Problem:
this occurs when one transaction T1 updates a database item X, which is then accessed (read) by another transaction T2; then T1 fails for some reason (Figure b) and X is changed back (rolled back or UNDONE). T2 has read the value of X before it was rolled back.


Why We Need Concurrency Control (contd.)

• The Incorrect Summary Problem:
one transaction is calculating an aggregate summary function on a number of records (for example, the sum (total) of all bank account balances) while other transactions are updating some of these records (for example, transferring a large amount between two accounts, see Figure c); the aggregate function may read some values before they are updated and others after they are updated.


Why We Need Concurrency Control (contd.)

• The Unrepeatable Read Problem (Read-Write Conflict):
a transaction T1 may read an item (say, available seats on a flight); later, T1 may read the same item again and get a different value because another transaction T2 has updated the item (reserved seats on the flight) between the two reads by T1.


Why Recovery Is Needed
Causes of transaction failure:
1. A computer failure (system crash): a hardware or software error occurs during transaction execution. If the hardware crashes, the contents of the computer's internal main memory may be lost.
2. A transaction or system error: some operation in the transaction may cause it to fail, such as integer overflow or division by zero. Transaction failure may also occur because of erroneous parameter values or because of a logical programming error. In addition, the user may interrupt the transaction during its execution.
3. Local errors or exception conditions detected by the transaction:
• For example, data for the transaction may not be found.
• A condition, such as insufficient account balance in a banking database, may cause a transaction, such as a fund withdrawal, to be canceled.
• A programmed abort causes the transaction to fail.


Why Recovery Is Needed (contd.)

4. Concurrency control enforcement: the concurrency control method may decide to abort the transaction, to be restarted later, because it violates serializability or because several transactions are in a state of deadlock.
5. Disk failure: some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash.
6. Physical problems and catastrophes: this refers to an endless list of problems that includes power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake, and mounting of a wrong tape by the operator.


Transaction and System Concepts

A transaction is an atomic unit of work that is either completed in its entirety or not done at all. A transaction passes through several states (Figure 20.4, similar to process states in operating systems).
Transaction states:
• Active state (executing read and write operations)
• Partially committed state (ended, but waiting for system checks to determine success or failure)
• Committed state (transaction succeeded)
• Failed state (transaction failed, must be rolled back)
• Terminated state (transaction leaves the system)
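The states and the moves between them can be written as a small state machine. The transition table below assumes the usual state diagram: active to partially committed or failed, partially committed to committed or failed, and both committed and failed to terminated.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    TERMINATED = "terminated"


# Allowed transitions (assumed from the usual transaction state diagram).
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.COMMITTED: {State.TERMINATED},
    State.FAILED: {State.TERMINATED},
    State.TERMINATED: set(),
}


def advance(current, nxt):
    """Move to the next state, refusing transitions the diagram does not allow."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt


s = State.ACTIVE
for step in (State.PARTIALLY_COMMITTED, State.COMMITTED, State.TERMINATED):
    s = advance(s, step)
print(s.name)  # TERMINATED

try:
    advance(State.COMMITTED, State.FAILED)  # a committed transaction cannot fail
except ValueError as err:
    print(err)  # illegal transition COMMITTED -> FAILED
```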


Transaction and System Concepts
(contd.)
DBMS Recovery Manager needs system to keep track of the
following operations (in the system log file):
• begin_transaction: Start of transaction execution.
• read or write: Read or write operations on the database items
that are executed as part of a transaction.
• end_transaction: Specifies that the read and write operations of the transaction have ended. The system may still have to check whether the changes (writes) introduced by the transaction can be permanently applied to the database (commit transaction) or whether the transaction has to be rolled back (abort transaction) because it violates concurrency control or for some other reason.
• commit_transaction: Signals the successful end of the transaction; any changes (writes) executed by the transaction can be safely committed to the database and will not be undone.
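To make these operations concrete, here is a toy sketch in which a hypothetical LoggedTransaction class appends one tuple per operation to an in-memory list. The tuple tags reuse the log record names listed in this module (start_transaction, read_item, write_item, end_transaction, commit); the dictionary standing in for the database and the list standing in for the log file are assumptions.

```python
from typing import Any


class LoggedTransaction:
    """Hypothetical wrapper that records each operation in a shared log list."""

    def __init__(self, tid: str, db: dict[str, Any], log: list[tuple]) -> None:
        self.tid, self.db, self.log = tid, db, log
        self.log.append(("start_transaction", tid))  # begin_transaction

    def read(self, item: str) -> Any:
        self.log.append(("read_item", self.tid, item))
        return self.db[item]

    def write(self, item: str, new_value: Any) -> None:
        old_value = self.db[item]
        # The log record is appended before the item is changed.
        self.log.append(("write_item", self.tid, item, old_value, new_value))
        self.db[item] = new_value

    def end(self) -> None:
        self.log.append(("end_transaction", self.tid))

    def commit(self) -> None:
        self.log.append(("commit", self.tid))


db = {"X": 10, "Y": 20}
log: list[tuple] = []
t1 = LoggedTransaction("T1", db, log)
x = t1.read("X")
t1.write("X", x - 5)
t1.end()
t1.commit()
for record in log:
    print(record)
```

The write record keeps both the old and the new value, which is exactly what the undo and redo operations on the next slide need.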



Transaction and System Concepts (contd.)
The recovery manager keeps track of the following operations (contd.):
• abort_transaction (or rollback): Signals that the transaction has ended unsuccessfully; any changes or effects that the transaction may have applied to the database must be undone.

System operations used during recovery:
• undo(X): Similar to rollback except that it applies to a single write
operation rather than to a whole transaction.
• redo(X): This specifies that a write operation of a committed
transaction must be redone to ensure that it has been applied
permanently to the database on disk.
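The difference between undo(X) and redo(X) can be shown on write_item records, which carry both the old and the new value of an item. In this hypothetical sketch the database is a Python dictionary and log records are tuples: undo restores old_value, redo reapplies new_value, and rolling back a whole transaction is simply undo applied to that transaction's writes in reverse log order.

```python
from typing import Any

# A write_item log record: (tag, transaction, item, old_value, new_value)
WriteRecord = tuple[str, str, str, Any, Any]


def undo(db: dict[str, Any], record: WriteRecord) -> None:
    """Undo a single write: restore the item's old value."""
    _, _, item, old_value, _ = record
    db[item] = old_value


def redo(db: dict[str, Any], record: WriteRecord) -> None:
    """Redo a single write of a committed transaction: reapply its new value."""
    _, _, item, _, new_value = record
    db[item] = new_value


def rollback(db: dict[str, Any], log: list[tuple], tid: str) -> None:
    """Roll back a whole transaction by undoing its writes in reverse order."""
    for record in reversed(log):
        if record[0] == "write_item" and record[1] == tid:
            undo(db, record)


log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 10, 15),
    ("write_item", "T1", "X", 15, 20),
    ("commit", "T1"),
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 7, 99),
]
# Assumed disk state after a crash: T1's writes were lost, T2's write reached disk.
db = {"X": 10, "Y": 99}
for rec in log:
    if rec[0] == "write_item" and rec[1] == "T1":
        redo(db, rec)          # forward order for the committed transaction
rollback(db, log, "T2")        # T2 never committed, so its write is undone
print(db)  # {'X': 20, 'Y': 7}
```

Redo runs forward so the last committed value wins, while undo runs backward so the earliest old value is what remains.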



Transaction and System Concepts (contd.)
The System Log File
• The log is a sequential, append-only file that keeps track of all operations of all transactions in the order in which they occurred. This information is needed during recovery from failures.
• The log is kept on disk, so it is not affected by any type of failure except disk or catastrophic failure.
• As with other disk files, a main memory buffer for the log holds the records being appended until the whole buffer is appended to the end of the log file on disk.
• The log is periodically backed up to archival storage (tape) to guard against catastrophic failures.
• Recovery protocols that avoid cascading rollbacks do not require that read operations be written to the system log; most recovery protocols fall into this category.
• Strict protocols require simpler write entries that do not include new_value.
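The log buffer behaviour described above can be sketched as follows. LogManager and buffer_size are illustrative assumptions: records accumulate in a main-memory list and are appended to the end of the log file only when the buffer is full or flush() is called explicitly. Opening the file in append mode keeps it append-only, and os.fsync asks the operating system to push the data to disk.

```python
import os
import tempfile


class LogManager:
    """Hypothetical log manager: an append-only file plus a main-memory buffer."""

    def __init__(self, path: str, buffer_size: int = 4) -> None:
        self.path = path
        self.buffer: list[str] = []
        self.buffer_size = buffer_size  # measured in records (an assumed unit)

    def append(self, record: tuple) -> None:
        self.buffer.append(repr(record))
        if len(self.buffer) >= self.buffer_size:
            self.flush()  # the whole buffer goes to the end of the file

    def flush(self) -> None:
        if not self.buffer:
            return
        with open(self.path, "a", encoding="utf-8") as f:  # "a" = append-only
            f.write("\n".join(self.buffer) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the records onto disk
        self.buffer.clear()

    def records_on_disk(self) -> list[str]:
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return f.read().splitlines()


path = os.path.join(tempfile.mkdtemp(), "system.log")
lm = LogManager(path, buffer_size=3)
lm.append(("start_transaction", "T1"))
lm.append(("write_item", "T1", "X", 10, 5))
print(len(lm.records_on_disk()))  # 0: records are still in the memory buffer
lm.append(("commit", "T1"))
print(len(lm.records_on_disk()))  # 3: buffer was full and was appended to the file
```

Records that are only in the buffer would be lost in a system crash, which is why a real system forces the buffer to disk at specific moments rather than waiting for it to fill.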



Transaction and System Concepts (contd.)
Types of records (entries) in log file:
• [start_transaction,T]: Records that transaction T has started
execution.
• [write_item,T,X,old_value,new_value]: T has changed the value of
item X from old_value to new_value.
• [read_item,T,X]: T has read the value of item X (not needed in
many cases).
• [end_transaction,T]: T has ended execution.
• [commit,T]: T has completed successfully and committed.
• [abort,T]: T has been aborted.
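With log entries represented as Python tuples that mirror the bracketed notation above, a recovery procedure might begin by scanning the log to find each transaction's last recorded status. The classify function below is a hypothetical helper: it ignores read_item and write_item entries, and a transaction that has started but has no end, commit or abort entry is reported as still active.

```python
def classify(log: list[tuple]) -> dict[str, str]:
    """Return each transaction's latest status as recorded in the log."""
    status: dict[str, str] = {}
    for record in log:
        kind, tid = record[0], record[1]
        if kind == "start_transaction":
            status[tid] = "active"
        elif kind == "end_transaction":
            status[tid] = "ended"  # finished executing, outcome not yet decided
        elif kind == "commit":
            status[tid] = "committed"
        elif kind == "abort":
            status[tid] = "aborted"
        # read_item and write_item entries do not change the status
    return status


log = [
    ("start_transaction", "T1"),
    ("read_item", "T1", "X"),
    ("write_item", "T1", "X", 10, 5),
    ("end_transaction", "T1"),
    ("commit", "T1"),
    ("start_transaction", "T2"),
    ("write_item", "T2", "Y", 3, 4),
    ("abort", "T2"),
    ("start_transaction", "T3"),
    ("write_item", "T3", "Z", 1, 2),
]
print(classify(log))  # {'T1': 'committed', 'T2': 'aborted', 'T3': 'active'}
```

Committed transactions are candidates for redo, while transactions that never reached a commit entry are candidates for undo.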



Transaction and System Concepts (contd.)

Commit Point of a Transaction:
• Definition: A transaction T reaches its commit point when all its operations that access the database have been executed successfully and the effect of all the transaction operations on the database has been recorded in the log file (on disk). The transaction is then said to be committed.
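The commit point depends on the log records being on disk, not merely in a memory buffer. In the sketch below (the file name, the textual record format and all three helper functions are assumptions), the transaction's outstanding log records are force-written with os.fsync before the [commit,T] record is appended and forced in turn; only a transaction whose commit record is in the on-disk log counts as committed.

```python
import os
import tempfile


def force_write(log_path: str, lines: list[str]) -> None:
    """Append records to the log file and force them onto disk."""
    with open(log_path, "a", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
        f.flush()             # hand Python's buffer to the operating system
        os.fsync(f.fileno())  # ask the OS to write it to the disk


def commit_transaction(log_path: str, tid: str, pending: list[str]) -> None:
    """Force T's outstanding log records to disk, then record [commit,T]."""
    force_write(log_path, pending)  # all of T's effects are now in the on-disk log
    force_write(log_path, [f"[commit,{tid}]"])


def is_committed(log_path: str, tid: str) -> bool:
    """T counts as committed only if its commit record is in the on-disk log."""
    if not os.path.exists(log_path):
        return False
    with open(log_path, encoding="utf-8") as f:
        return f"[commit,{tid}]" in f.read().splitlines()


log_path = os.path.join(tempfile.mkdtemp(), "system.log")
commit_transaction(log_path, "T1", ["[start_transaction,T1]", "[write_item,T1,X,10,5]"])
print(is_committed(log_path, "T1"))  # True
print(is_committed(log_path, "T2"))  # False
```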



Desirable Properties of Transactions
These are called the ACID properties: Atomicity, Consistency, Isolation, Durability.

• Atomicity: A transaction is an atomic unit of processing; it is either performed in its entirety or not performed at all.
• Enforced by the recovery protocol.

• Consistency preservation: A correct execution of the transaction must take the database from one consistent state to another.
• Each transaction, executed on its own, must perform a correct action on the database. Application programmers and DBMS constraint enforcement are responsible for this.
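Atomicity and consistency preservation can be seen together in a funds transfer. In this sketch (SQLite via Python's sqlite3 module; the account table, its CHECK constraint and the transfer and balances functions are illustrative assumptions), the DBMS enforces a consistency rule that balances may not go negative. The credit is applied before the debit on purpose: when the debit violates the constraint, the whole transaction is rolled back, so the credit that already succeeded is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Consistency rule enforced by the DBMS: a balance may never be negative.
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(src: int, dst: int, amount: int) -> bool:
    """Move money between accounts as one transaction; True if it committed."""
    try:
        with conn:  # commit if both updates succeed, roll back otherwise
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


def balances() -> list[tuple[int, int]]:
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


print(transfer(1, 2, 30), balances())   # True [(1, 70), (2, 80)]
print(transfer(1, 2, 500), balances())  # False [(1, 70), (2, 80)]
```

The failed transfer leaves no trace: neither the credit to account 2 nor the debit from account 1 survives, which is atomicity; the rule that caused the abort is consistency enforced by the DBMS.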



Desirable Properties of Transactions (contd.)

ACID properties (contd.):
• Isolation: Even though transactions are executing concurrently, they should appear to be executed in isolation; that is, their final effect should be as if each transaction were executed in isolation from start to finish.
• This is the responsibility of the concurrency control protocol.

• Durability: Once a transaction is committed, its changes (writes) applied to the database must never be lost because of a subsequent failure.
• Enforced by the recovery protocol.


