Homework #3 Join Algorithms After - 12
IMPORTANT:
• Upload this PDF with your answers to Gradescope by 11:59pm on Sunday Oct 24, 2021.
• Plagiarism: Homework may be discussed with other students, but all homework is to be
completed individually.
• You have to use this PDF for all of your answers.
For your information:
• Graded out of 100 points; 2 questions total
• Rough time estimate: ≈ 1 - 2 hours (0.5 - 1 hours for each question)
Revision : 2021/10/12 17:35
15-445/645 (Fall 2021) Homework #3 Page 2 of 4
(b) [5 points] Again, assume that the DBMS has six buffers. What is the total I/O cost to
sort the file?
□ 60,000,000 □ 120,000,000 □ 144,000,000 □ 240,000,000 □ 480,000,000
(c) [10 points] What is the smallest number of buffers B with which the DBMS can sort the
target file using only two passes?
□ 172 □ 173 □ 174 □ 2,450 □ 2,451 □ 2,452 □ 2,827 □ 2,828
□ 2,829 □ 3,999,999 □ 4,000,000 □ 4,000,001
(d) [10 points] What is the smallest number of buffers B with which the DBMS can sort the
target file using only six passes?
□ 14 □ 15 □ 16 □ 1,240 □ 1,241 □ 1,242 □ 1,256 □ 1,257
□ 1,258 □ 2,934 □ 2,935 □ 2,936 □ 3,999,999 □ 4,000,000
□ 4,000,001
(e) [5 points] Suppose the DBMS has twenty-four buffers. What is the largest database file
(expressed in terms of N, the number of pages) that can be sorted with external merge
sort using six passes?
□ 65,610 □ 65,601 □ 131,071 □ 131,072 □ 3,590,490 □ 3,590,940
□ 49,251,980 □ 49,521,980 □ 154,472,230 □ 154,472,232
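For reference, the standard external merge sort cost model behind parts (b) through (e) can be sketched in Python. This is a minimal sketch and not part of the original handout: it assumes pass 0 uses all B buffers to produce ceil(N / B) sorted runs, every later pass performs a (B - 1)-way merge, and each pass reads and writes every page once. The function names and the sample values in the note after the code are hypothetical.

```python
def sort_passes(n_pages: int, n_buffers: int) -> int:
    """Passes needed: 1 + ceil(log_{B-1}(ceil(N / B))), computed with integers."""
    runs = -(-n_pages // n_buffers)          # ceil(N / B) sorted runs after pass 0
    passes = 1
    while runs > 1:
        runs = -(-runs // (n_buffers - 1))   # one (B-1)-way merge pass
        passes += 1
    return passes


def sort_io_cost(n_pages: int, n_buffers: int) -> int:
    """Total I/O: every pass reads and writes each page once (2N per pass)."""
    return 2 * n_pages * sort_passes(n_pages, n_buffers)


def max_pages(n_buffers: int, passes: int) -> int:
    """Largest file sortable in the given number of passes: B * (B-1)^(passes-1)."""
    return n_buffers * (n_buffers - 1) ** (passes - 1)


def min_buffers(n_pages: int, passes: int) -> int:
    """Smallest B such that max_pages(B, passes) >= N (simple linear search)."""
    b = 3  # assumed minimum: two input buffers plus one output buffer
    while max_pages(b, passes) < n_pages:
        b += 1
    return b
```

For a hypothetical 108-page file and 5 buffers, sort_passes(108, 5) gives 4 and sort_io_cost(108, 5) gives 864. The integer loop is used instead of a floating-point logarithm so that exact powers of B - 1 are not misrounded.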
(a) [5 points] Block nested loop join with R as the outer relation and S as the inner relation:
□ 11,200 □ 23,000 □ 56,400 □ 85,000 □ 92,600
(b) [5 points] Block nested loop join with S as the outer relation and R as the inner relation:
□ 31,200 □ 43,000 □ 43,600 □ 52,900 □ 55,400
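The block nested loop join cost in parts (a) and (b) depends on which relation is the outer one. A minimal Python sketch of the usual cost model follows, assuming B - 2 buffers hold a block of the outer relation while one buffer scans the inner relation and one holds output. The function name and the sample page counts in the note after the code are hypothetical and are not the values from this question.

```python
def bnlj_cost(outer_pages: int, inner_pages: int, n_buffers: int) -> int:
    """I/O cost of block nested loop join: M + ceil(M / (B - 2)) * N."""
    block = n_buffers - 2                      # pages of the outer relation held at once
    outer_blocks = -(-outer_pages // block)    # ceil(M / (B - 2))
    # Read the outer relation once, and scan the inner relation once per outer block.
    return outer_pages + outer_blocks * inner_pages
```

With hypothetical sizes of 1,000 and 500 pages and 102 buffers, bnlj_cost(1000, 500, 102) gives 6000, while swapping the roles, bnlj_cost(500, 1000, 102), gives 5500. In this model, putting the smaller relation on the outside tends to be cheaper.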
(c) Hash join with S as the outer relation and R as the inner relation. You may ignore recursive
partitioning and partially filled blocks.
i. [5 points] What is the cost of the partition phase?
□ 2,800 □ 4,400 □ 5,000 □ 5,800 □ 7,200
ii. [5 points] What is the cost of the probe phase?
□ 2,800 □ 4,400 □ 3,600 □ 4,800 □ 7,200
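Under the simplification stated in part (c) (no recursive partitioning, no partially filled blocks), the two phases of a partitioned hash join can be costed as in the Python sketch below. It assumes the partition phase reads and writes every page of both relations once and the probe phase reads every partitioned page once. The function name and sample page counts are hypothetical.

```python
def hash_join_costs(outer_pages: int, inner_pages: int) -> tuple[int, int, int]:
    """Return (partition cost, probe cost, total) in page I/Os."""
    partition = 2 * (outer_pages + inner_pages)  # read + write both relations
    probe = outer_pages + inner_pages            # read each partition once
    return partition, probe, partition + probe
```

For hypothetical relations of 1,000 and 500 pages, hash_join_costs(1000, 500) gives (3000, 1500, 4500). In this model the cost does not depend on which relation is the outer one.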
(d) [10 points] Assume that the tables do not fit in main memory and that a large number
of distinct values hash to the same bucket under your hash function h1. Which of the
following approaches works best?
□ Create hash tables for the inner and outer relation using h1 and rehash into an
embedded hash table using h1 for large buckets
□ Create hash tables for the inner and outer relation using h1 and rehash into an
embedded hash table using h2 != h1 for large buckets
□ Use linear probing for collisions and page in and out parts of the hash table needed
at a given time
□ Create 2 hash tables half the size of the original one, run the same hash join
algorithm on the tables, and then merge the hash tables together
(e) Sort-merge join with S as the outer relation and R as the inner relation:
i. [4 points] What is the cost of sorting the tuples in R on attribute a?
□ 3,000 □ 5,600 □ 7,400 □ 9,600 □ 10,800
ii. [4 points] What is the cost of sorting the tuples in S on attribute a?
□ 3,400 □ 4,000 □ 6,400 □ 7,600 □ 8,800
iii. [10 points] What is the cost of the merge phase assuming there are no duplicates in
the join attribute?
□ 1,400 □ 1,800 □ 3,600 □ 4,400 □ 4,800
iv. [10 points] What is the cost of the merge phase in the worst-case scenario?
□ 1,080,000 □ 2,880,000 □ 3,080,000 □ 4,750,000 □ 10,080,000
v. [2 points] Now consider joining R and S and then joining the result with T. What is the
cost of the merge phase assuming there are no duplicates in the join attribute?
□ 1,000 □ 2,000 □ 3,000 □ 5,000 □ 2,000,000
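The two merge-phase scenarios in part (e) can be contrasted with a short Python sketch. It assumes the common textbook model: with no duplicate join values, each sorted relation is scanned once, and in the worst case (every tuple in both relations shares one join value) the merge degenerates to pairing every outer page with every inner page. The function names and sample page counts are hypothetical.

```python
def merge_cost_no_duplicates(outer_pages: int, inner_pages: int) -> int:
    """Each sorted input is read exactly once: M + N."""
    return outer_pages + inner_pages


def merge_cost_worst_case(outer_pages: int, inner_pages: int) -> int:
    """All join values equal, so the inner side is rescanned per outer page: M * N."""
    return outer_pages * inner_pages
```

For hypothetical inputs of 1,000 and 500 pages, merge_cost_no_duplicates(1000, 500) gives 1500 and merge_cost_worst_case(1000, 500) gives 500000. Neither figure includes the cost of sorting the inputs first.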
End of Homework #3