Distributed Database Chapter 3 Modified
Chapter - 3
• Note: We added a new attribute (LOC) to the PROJ relation that indicates the place of each
project (Location).
Distribution Design Issues (2. Fragmentation Alternatives con…)
• Horizontal Fragmentation Example:
➡ The Figure below shows the PROJ relation divided horizontally into two sub-relations:
Distribution Design Issues (2. Fragmentation Alternatives con…)
• Vertical Fragmentation Example:
➡ The Figure below shows the PROJ relation divided vertically into two sub-relations:
PROJ1: information about project budgets
PROJ2: information about project names and locations
• Hybrid Fragmentation:
➡ Fragmentation may be nested. If the nested fragmentations are of different types
(horizontal within vertical, or vice versa), the result is hybrid fragmentation.
✦ Many real-life partitionings are hybrid.
Distribution Design Issues (3. Degree of Fragmentation)
•The degree of fragmentation is the extent to which the
database is fragmented; this choice affects the performance
of query execution. The options range between two extremes:
➡not fragmenting at all, and
➡fragmenting to the level of individual tuples (in the case
of horizontal fragmentation) or to the level of individual
attributes (in the case of vertical fragmentation)
Distribution Design Issues (4. Correctness Rules of Fragmentation)
• Completeness
➡ Decomposition of relation R into fragments R1, R2, ..., Rn is complete if
and only if each data item in R can also be found in some Ri
• Reconstruction
➡ If relation R is decomposed into fragments FR = {R1, R2, ..., Rn}, it
should be possible to define a relational operator ∇ such that
R = ∇ Ri, ∀ Ri ∈ FR
(∇ is union for horizontal fragments and join for vertical fragments).
• Disjointness
➡ If relation R is decomposed into fragments FR = {R1, R2, ..., Rn}, and data
item di is in Rj, then di should not be in any other fragment Rk (k ≠ j).
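The three rules can be checked mechanically for a horizontal fragmentation. The sketch below is only illustrative: the PROJ rows, the budget threshold, and the helper names (`is_complete`, `is_reconstructible`, `is_disjoint`) are assumptions made for this example, and union is used as the reconstruction operator ∇.

```python
# Toy PROJ relation as a set of (PNO, PNAME, BUDGET, LOC) tuples.
# The rows are assumed sample data, not taken from the slides.
PROJ = {
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P2", "Database Develop.", 135000, "New York"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
}

# Horizontal fragments defined by two complementary predicates on BUDGET.
PROJ1 = {t for t in PROJ if t[2] <= 200000}
PROJ2 = {t for t in PROJ if t[2] > 200000}


def is_complete(relation, fragments):
    """Every tuple of the relation appears in at least one fragment."""
    return all(any(t in f for f in fragments) for t in relation)


def is_reconstructible(relation, fragments):
    """The union of the fragments gives back exactly the relation."""
    return set().union(*fragments) == relation


def is_disjoint(fragments):
    """No tuple appears in more than one fragment."""
    seen = set()
    for f in fragments:
        if seen & f:
            return False
        seen |= f
    return True


fragments = [PROJ1, PROJ2]
print(is_complete(PROJ, fragments),
      is_reconstructible(PROJ, fragments),
      is_disjoint(fragments))
```

For vertical fragments the same idea applies, but reconstruction uses a join on the key and disjointness is judged on non-key attributes only.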
Distribution Design Issues (5. Allocation Alternatives)
• After the database is fragmented properly, one has to decide on the
allocation of the fragments to various sites on the network.
➡ Non-replicated
✦ partitioned : each fragment resides at only one site
➡ Replicated
✦ fully replicated : each fragment at each site
✦ partially replicated : each fragment at some of the sites
• Rule of thumb:
➡ If (read-only queries / update queries) ≥ 1, replication is advantageous; otherwise
replication may cause problems.
Distribution Design Issues (5. Allocation Alternatives con…)
• Comparison of Allocation Alternatives:
Distribution Design Issues (6. Information Requirements)
•Four categories of information needed for distribution
design:
✦ database information
✦ application information
✦ communication network information, and
✦ computer system information.
➡The first two categories are used in fragmentation
algorithms
➡The latter two categories are used in allocation models
rather than in fragmentation algorithms.
Fragmentation strategies and algorithms
Three alternatives of fragmentation:
✦ Horizontal Fragmentation (HF)
✦ Vertical Fragmentation (VF)
✦ Hybrid Fragmentation
• Horizontal Fragmentation:
o There are two versions:
➡ Primary horizontal fragmentation (PHF): performed using predicates
that are defined on the relation being fragmented.
➡ Derived horizontal fragmentation (DHF): partitioning of a relation that
results from predicates defined on another relation.
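The difference between the two versions can be sketched in a few lines. Everything here is an assumption made for illustration: the sample rows, the ASG (assignment) relation linked to PROJ through PNO, and the helper names `primary_hf` and `derived_hf`. PROJ is fragmented by predicates on its own LOC attribute (PHF), and ASG then follows the PROJ fragments through the shared PNO values (DHF, in effect a semijoin).

```python
# Assumed sample data: PROJ is the owner relation, ASG the member relation.
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
]
ASG = [
    {"ENO": "E1", "PNO": "P1"},
    {"ENO": "E2", "PNO": "P2"},
    {"ENO": "E3", "PNO": "P3"},
    {"ENO": "E4", "PNO": "P1"},
]


def primary_hf(relation, predicates):
    """PHF: one fragment per predicate defined on the relation itself."""
    return {name: [t for t in relation if pred(t)]
            for name, pred in predicates.items()}


def derived_hf(member, owner_fragments, join_attr):
    """DHF: a member tuple goes where its matching owner tuple went."""
    result = {}
    for name, fragment in owner_fragments.items():
        keys = {t[join_attr] for t in fragment}
        result[name] = [t for t in member if t[join_attr] in keys]
    return result


proj_frags = primary_hf(PROJ, {
    "montreal": lambda t: t["LOC"] == "Montreal",
    "new_york": lambda t: t["LOC"] == "New York",
})
asg_frags = derived_hf(ASG, proj_frags, "PNO")
print([t["ENO"] for t in asg_frags["montreal"]])
```

Note that ASG has no LOC attribute at all; its fragments are determined entirely by predicates on PROJ.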
Horizontal Fragmentation
• Information Requirements of Horizontal Fragmentation :
➡ Database Information: Concerns the global conceptual schema
✦ Important to know how database relations are connected to one another
with joins
✦ In relational models, directed links are drawn between relations that are
related to each other by an equijoin operation
✓ The relation at the tail of a link is called the owner of the link and the relation
at the head is called the member
✤ the functions owner and member both provide mappings from the set of links to the
set of relations.
✦ The quantitative information required about the database is the cardinality
of each relation R, denoted card(R).
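One possible way to record this database information is shown below. It is a minimal sketch: the `Link` class, the link name `L1`, the ASG relation, and the sample rows are all hypothetical, chosen only to show `owner`, `member`, and `card` as plain functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A directed equijoin link: tail = owner relation, head = member relation."""
    name: str
    owner: str
    member: str


# Assumed toy database: relation name -> list of tuples.
RELATIONS = {
    "PROJ": [("P1",), ("P2",), ("P3",)],
    "ASG": [("E1", "P1"), ("E2", "P2"), ("E3", "P3"), ("E4", "P1")],
}

# Hypothetical link from PROJ (owner) to ASG (member), joined on PNO.
L1 = Link(name="L1", owner="PROJ", member="ASG")


def owner(link):
    """Map a link to the relation at its tail."""
    return link.owner


def member(link):
    """Map a link to the relation at its head."""
    return link.member


def card(relation_name):
    """Cardinality of a relation: its number of tuples."""
    return len(RELATIONS[relation_name])


print(owner(L1), member(L1), card("PROJ"), card("ASG"))
```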
Horizontal Fragmentation(cont.…)
Example: Database Information
Vertical Fragmentation
• A vertical fragmentation of a relation R produces fragments R1, R2,…, Rr,
each of which contains a subset of R’s attributes as well as the primary key
of R.
• Has been studied within the centralized context
➡ design methodology
➡ physical clustering
• More difficult than horizontal, because more alternatives exist.
• Two approaches :
➡ Grouping: starts by assigning each attribute to one fragment, and at
each step, joins some of the fragments until some criterion is satisfied
➡ Splitting: starts with a relation and decides on beneficial partitionings
based on the access behavior of applications to the attributes.
Vertical Fragmentation(con…)
•Splitting generates non-overlapping fragments whereas
grouping typically results in overlapping fragments.
➡ We prefer non-overlapping fragments for disjointness.
✦ Non-overlapping refers only to non-primary key attributes.
✓ We do not consider the replicated key attributes to be
overlapping.
✤ Advantage: Easier to enforce functional dependencies
(for integrity checking etc.)
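A small sketch of non-overlapping vertical fragments with a replicated key follows. The PROJ rows and the helper names (`vertical_fragment`, `non_key_overlap`, `reconstruct`) are assumptions for illustration; the split into a budget fragment and a name/location fragment mirrors the PROJ1/PROJ2 example earlier in the chapter.

```python
KEY = "PNO"

# Assumed sample rows for PROJ(PNO, PNAME, BUDGET, LOC).
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
]


def vertical_fragment(relation, attrs, key=KEY):
    """Project the relation onto attrs, always keeping the primary key."""
    cols = [key] + [a for a in attrs if a != key]
    return [{c: t[c] for c in cols} for t in relation]


def non_key_overlap(attr_sets, key=KEY):
    """Return the non-key attributes that appear in more than one fragment."""
    seen, overlap = set(), set()
    for attrs in attr_sets:
        non_key = set(attrs) - {key}
        overlap |= seen & non_key
        seen |= non_key
    return overlap


def reconstruct(frag_a, frag_b, key=KEY):
    """Rebuild the relation by joining two fragments on the key."""
    by_key = {t[key]: t for t in frag_b}
    return [{**t, **by_key[t[key]]} for t in frag_a]


PROJ1 = vertical_fragment(PROJ, ["BUDGET"])          # project budgets
PROJ2 = vertical_fragment(PROJ, ["PNAME", "LOC"])    # names and locations
print(non_key_overlap([["BUDGET"], ["PNAME", "LOC"]]))
print(reconstruct(PROJ1, PROJ2) == PROJ)
```

The key PNO appears in both fragments, yet `non_key_overlap` reports nothing, which matches the convention that replicated key attributes are not counted as overlap.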
Vertical Fragmentation(con…)
• Information Requirements of Vertical Fragmentation:
o Application Information: The major information required for vertical
fragmentation is related to applications.
✦ Attribute affinities
✓ a measure that indicates how closely related the attributes are
Vertical Fragmentation(con…)
Information Requirements of Vertical Fragmentation:
• Example: Attribute usage values
Consider the following 4 queries for relation PROJ
q1: SELECT BUDGET
    FROM PROJ
    WHERE PNO=Value
q2: SELECT PNAME, BUDGET
    FROM PROJ
q3: SELECT PNAME
    FROM PROJ
    WHERE LOC=Value
q4: SELECT SUM(BUDGET)
    FROM PROJ
    WHERE LOC=Value
➡ Let A1 = PNO, A2 = PNAME, A3 = BUDGET, A4 = LOC. The usage values are defined in the
matrix below, where entry (i, j) is use(qi, Aj): 1 if query qi references attribute Aj, and 0 otherwise.
A1 A2 A3 A4
q1 1 0 1 0
q2 0 1 1 0
q3 0 1 0 1
q4 0 0 1 1
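The usage matrix can be derived directly from the set of attributes each query references. In the sketch below the attribute sets are read off the four queries by hand; the names `QUERY_ATTRS` and `usage_matrix` are made up for this example.

```python
# Attribute order A1..A4 as defined in the example.
ATTRS = ["PNO", "PNAME", "BUDGET", "LOC"]

# Attributes referenced by each query (in SELECT or WHERE), read off by hand.
QUERY_ATTRS = {
    "q1": {"PNO", "BUDGET"},
    "q2": {"PNAME", "BUDGET"},
    "q3": {"PNAME", "LOC"},
    "q4": {"BUDGET", "LOC"},
}


def usage_matrix(query_attrs, attrs):
    """use(qi, Aj) = 1 if query qi references attribute Aj, else 0."""
    return {q: [1 if a in used else 0 for a in attrs]
            for q, used in query_attrs.items()}


USE = usage_matrix(QUERY_ATTRS, ATTRS)
for q, row in USE.items():
    print(q, row)
```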
Vertical Fragmentation(con…)
Information Requirements of Vertical Fragmentation:
• Attribute Affinity Measure:
➡ The attribute affinity measure between two attributes Ai and Aj of a relation R[A1, A2, …,
An] with respect to the set of applications Q = {q1, q2, …, qq} is defined as follows:
$$aff(A_i, A_j) = \sum_{k \mid use(q_k, A_i) = 1 \wedge use(q_k, A_j) = 1} \; \sum_{\forall S_l} ref_l(q_k) \cdot acc_l(q_k)$$
✦ $ref_l(q_k)$ is the number of accesses to attributes (Ai, Aj) for each execution of application
qk at site Sl and
✦ $acc_l(q_k)$ is the application access frequency measure of application qk at site Sl.
Vertical Fragmentation(con…)
Information Requirements of Vertical Fragmentation:
• Example: Attribute Affinity Measure
➡ Assume each query in the previous example accesses the attributes once during each
execution.
➡ Also assume the following access frequencies (acc) of each query at three sites S1, S2, S3:
✦ acc1(q1) = 15, acc2(q1) = 20, acc3(q1) = 10
✦ acc1(q2) = 5, acc2(q2) = 0, acc3(q2) = 0
✦ acc1(q3) = 25, acc2(q3) = 25, acc3(q3) = 25
✦ acc1(q4) = 3, acc2(q4) = 0, acc3(q4) = 0
Vertical Fragmentation(con…)
Information Requirements of Vertical Fragmentation:
• Example: Attribute Affinity Measure
➡ Then the affinity measure between attributes A1 and A3:
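$$aff(A_1, A_3) = \sum_{k=1}^{1} \sum_{l=1}^{3} acc_l(q_k) = acc_1(q_1) + acc_2(q_1) + acc_3(q_1) = 15 + 20 + 10 = 45$$
(only q1 uses both A1 and A3, and each access count ref is 1)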
➡ and the attribute affinity matrix (AA) =
➡ Note: The diagonal values are not computed since they are
meaningless.
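The off-diagonal entries can be computed from the usage values and access frequencies of the example. This is a sketch under the example's stated assumption that each query accesses the attributes once per execution (ref = 1); the function names are made up, and the diagonal is left as `None` because it is not computed.

```python
# Usage values use(qk, Aj) for A1=PNO, A2=PNAME, A3=BUDGET, A4=LOC.
USE = {
    "q1": [1, 0, 1, 0],
    "q2": [0, 1, 1, 0],
    "q3": [0, 1, 0, 1],
    "q4": [0, 0, 1, 1],
}

# Access frequencies acc_l(qk) at sites S1, S2, S3, from the example.
ACC = {
    "q1": [15, 20, 10],
    "q2": [5, 0, 0],
    "q3": [25, 25, 25],
    "q4": [3, 0, 0],
}


def affinity(i, j, use=USE, acc=ACC, ref=1):
    """Sum ref * acc over all sites, for every query that uses both Ai and Aj."""
    return sum(ref * sum(acc[q])
               for q, row in use.items()
               if row[i] == 1 and row[j] == 1)


def affinity_matrix(n=4):
    """Attribute affinity matrix with the diagonal left uncomputed (None)."""
    return [[None if i == j else affinity(i, j) for j in range(n)]
            for i in range(n)]


AA = affinity_matrix()
for row in AA:
    print(row)
```

With 0-based indices, `AA[0][2]` corresponds to aff(A1, A3), and the matrix is symmetric because the definition does not depend on the order of the two attributes.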
Hybrid Fragmentation
Reading assignment !
Fragment Allocation
• Problem Statement
Given
F = {F1, F2, …, Fn} fragments
S ={S1, S2, …, Sm} network sites
Q = {q1, q2,…, qq} applications
Find the "optimal" distribution of F to S.
• Optimality
➡ Minimal cost
✦ Communication + storage + processing (read & update)
✦ Cost in terms of time (usually)
➡ Performance
Response time and/or throughput
➡ Constraints
✦ Per site constraints (storage & processing)
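To make the problem statement concrete, here is a toy exhaustive search over non-replicated allocations. It is a sketch under loud assumptions: the fragment sizes, site capacities, and access frequencies are invented, and the cost model (remote access frequency times fragment size, standing in for communication cost) is a simplification chosen for this example, not the allocation model of the chapter.

```python
from itertools import product

# All numbers below are invented for illustration.
FRAG_SIZE = {"F1": 10, "F2": 20}          # storage needed per fragment
SITE_CAPACITY = {"S1": 25, "S2": 25}      # per-site storage constraint
# ACCESS[(site, fragment)]: how often applications at the site access the fragment.
ACCESS = {("S1", "F1"): 50, ("S2", "F1"): 5,
          ("S1", "F2"): 2, ("S2", "F2"): 40}


def cost(assignment):
    """Assumed cost: each remote access ships the whole fragment once."""
    return sum(freq * FRAG_SIZE[frag]
               for (site, frag), freq in ACCESS.items()
               if assignment[frag] != site)


def feasible(assignment):
    """Check the per-site storage constraint."""
    used = {site: 0 for site in SITE_CAPACITY}
    for frag, site in assignment.items():
        used[site] += FRAG_SIZE[frag]
    return all(used[s] <= SITE_CAPACITY[s] for s in SITE_CAPACITY)


def best_allocation():
    """Try every fragment-to-site mapping and keep the cheapest feasible one."""
    frags, sites = list(FRAG_SIZE), list(SITE_CAPACITY)
    best = None
    for choice in product(sites, repeat=len(frags)):
        assignment = dict(zip(frags, choice))
        if not feasible(assignment):
            continue
        c = cost(assignment)
        if best is None or c < best[1]:
            best = (assignment, c)
    return best


print(best_allocation())
```

Exhaustive search examines m^n mappings for n fragments and m sites, so it is only usable on tiny instances; it is meant to show what "optimal under constraints" means, not how to solve the problem at scale.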
Allocation
File Allocation (FAP) vs Database Allocation (DAP):
➡ Fragments are not individual files
Decision Variable
➡ Retrieval Cost
Allocation Model(cont.…)
• Constraints
➡ Response Time
execution time of query ≤ max. allowable response time for
that query
➡ Storage Constraint (for a site)