
Homework #3 Join Algorithms After - 12

The document is a homework assignment on database systems with questions about sorting and join algorithms. It provides details on sorting a file of 6 million pages using external merge sort with different numbers of buffers. It also provides details on computing the I/O costs of different join algorithms, including block nested loop, hash, and sort-merge joins, on relations of varying sizes.


CARNEGIE MELLON UNIVERSITY

COMPUTER SCIENCE DEPARTMENT

15-445/645 – DATABASE SYSTEMS (FALL 2021)
ANDREW CROTTY & LIN MA

Homework #3 (by Sophie Qiu)


Due: Sunday Oct 24, 2021 @ 11:59pm

IMPORTANT:
• Upload this PDF with your answers to Gradescope by 11:59pm on Sunday Oct 24, 2021.
• Plagiarism: Homework may be discussed with other students, but all homework is to be
completed individually.
• You have to use this PDF for all of your answers.
For your information:
• Graded out of 100 points; 2 questions total
• Rough time estimate: ≈ 1 - 2 hours (0.5 - 1 hours for each question)
Revision: 2021/10/12 17:35

Question              Points   Score
Sorting Algorithms        40
Join Algorithms           60
Total:                   100

15-445/645 (Fall 2021) Homework #3 Page 2 of 4

Question 1: Sorting Algorithms [40 points]


We have a database file with six million pages (N = 6,000,000 pages), and we want to sort it
using external merge sort. Assume that the DBMS is not using double buffering or blocked
I/O, and that it uses quicksort for in-memory sorting. Let B denote the number of buffers.
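The pass and I/O counts asked for below follow from the usual external merge sort cost model. We assume that model here; the assignment does not state it. Pass 0 reads B pages at a time, sorts them in memory, and writes ⌈N/B⌉ sorted runs. Each later pass merges up to B - 1 runs, because one buffer is reserved for output. That gives 1 + ⌈log_{B-1}⌈N/B⌉⌉ passes in total. Every pass reads and writes each page once, so the total I/O is 2N times the number of passes. The following is a minimal Python sketch of that model. The function names (ceil_div, num_passes, total_io) are hypothetical helpers for illustration only.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b using exact integer arithmetic."""
    return -(-a // b)


def num_passes(n_pages: int, buffers: int) -> int:
    """Passes of external merge sort over n_pages using `buffers` buffers.

    Pass 0 sorts B pages at a time into one run each; every later pass
    merges up to B - 1 runs into one (one buffer holds the output page).
    """
    if buffers < 3:
        raise ValueError("a (B - 1)-way merge needs at least 3 buffers")
    runs = ceil_div(n_pages, buffers)  # sorted runs produced by pass 0
    passes = 1
    while runs > 1:
        runs = ceil_div(runs, buffers - 1)  # one (B - 1)-way merge pass
        passes += 1
    return passes


def total_io(n_pages: int, buffers: int) -> int:
    """Each pass reads and writes every page once: 2 * N * (number of passes)."""
    return 2 * n_pages * num_passes(n_pages, buffers)
```

The sketch counts the passes with exact integer division. Evaluating the logarithm in floating point instead can produce off-by-one errors when ⌈N/B⌉ is an exact power of B - 1.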
(a) [10 points] Assume that the DBMS has six buffers. How many passes does the DBMS
need to perform in order to sort the file?
□ 5   □ 7   □ 8   □ 10   □ 12

(b) [5 points] Again, assume that the DBMS has six buffers. What is the total I/O cost to
sort the file?
□ 60,000,000   □ 120,000,000   □ 144,000,000   □ 240,000,000   □ 480,000,000

(c) [10 points] What is the smallest number of buffers B with which the DBMS can sort the
target file in only two passes?
□ 172   □ 173   □ 174   □ 2,450   □ 2,451   □ 2,452   □ 2,827   □ 2,828
□ 2,829   □ 3,999,999   □ 4,000,000   □ 4,000,001

(d) [10 points] What is the smallest number of buffers B with which the DBMS can sort the
target file in only six passes?
□ 14   □ 15   □ 16   □ 1,240   □ 1,241   □ 1,242   □ 1,256   □ 1,257
□ 1,258   □ 2,934   □ 2,935   □ 2,936   □ 3,999,999   □ 4,000,000
□ 4,000,001
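Parts (c) and (d) ask for the smallest B that meets a pass budget. Under the assumed model, p passes suffice exactly when the ⌈N/B⌉ runs from pass 0 can be merged into one run in p - 1 merge passes. That condition is ⌈N/B⌉ ≤ (B - 1)^(p - 1). The condition only gets easier as B grows, so a binary search over B finds the threshold. The helper names below are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b using exact integer arithmetic."""
    return -(-a // b)


def sorts_within(n_pages: int, buffers: int, passes: int) -> bool:
    """True if external merge sort needs at most `passes` passes:
    the ceil(N / B) runs from pass 0 must merge down to a single run
    in passes - 1 merge passes of fan-in B - 1."""
    return ceil_div(n_pages, buffers) <= (buffers - 1) ** (passes - 1)


def min_buffers(n_pages: int, passes: int) -> int:
    """Smallest B >= 3 that sorts n_pages in at most `passes` passes.

    sorts_within is monotone in B, so binary search is valid; the upper
    bound max(3, N) always works because pass 0 then yields a single run.
    """
    if n_pages < 1 or passes < 1:
        raise ValueError("need n_pages >= 1 and passes >= 1")
    lo, hi = 3, max(3, n_pages)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorts_within(n_pages, mid, passes):
            hi = mid  # mid buffers are enough; try fewer
        else:
            lo = mid + 1  # mid buffers are too few
    return lo
```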

(e) [5 points] Suppose the DBMS has twenty-four buffers. What is the largest database file
(expressed in terms of N, the number of pages) that can be sorted with external merge
sort using six passes?
□ 65,610   □ 65,601   □ 131,071   □ 131,072   □ 3,590,490   □ 3,590,940
□ 49,251,980   □ 49,521,980   □ 154,472,230   □ 154,472,232
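Part (e) reverses the same condition. With B buffers and p passes, pass 0 produces runs of B pages, and each of the remaining p - 1 passes multiplies the run length by B - 1. The largest sortable file is therefore N_max = B · (B - 1)^(p - 1). A one-function sketch under the same assumed model follows (the name is hypothetical):

```python
def max_sortable_pages(buffers: int, passes: int) -> int:
    """Largest N that external merge sort finishes in `passes` passes with
    `buffers` buffers: runs of B pages after pass 0, each further pass
    multiplying the run length by the merge fan-in B - 1."""
    if buffers < 3 or passes < 1:
        raise ValueError("need buffers >= 3 and passes >= 1")
    return buffers * (buffers - 1) ** (passes - 1)
```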


Question 2: Join Algorithms [60 points]


Consider relations R(a, b), S(a, c, d), and T(a, e) to be joined on the common attribute
a. Assume that there are no indexes available on the tables to speed up the join algorithms.
• There are B = 60 pages in the buffer
• Table R spans M = 1,400 pages with 60 tuples per page
• Table S spans N = 2,200 pages with 200 tuples per page
• The joining result of R and S spans K = 2,000 pages
• Table T spans L = 1,000 pages with 200 tuples per page
Answer the following questions on computing the I/O costs for the joins. You can assume the
simplest cost model where pages are read and written one at a time. You can also assume that
you will need one buffer block to hold the evolving output block and one input block to hold
the current input block of the inner relation. You may ignore the cost of writing the final
results.

(a) [5 points] Block nested loop join with R as the outer relation and S as the inner relation:
□ 11,200   □ 23,000   □ 56,400   □ 85,000   □ 92,600

(b) [5 points] Block nested loop join with S as the outer relation and R as the inner relation:
□ 31,200   □ 43,000   □ 43,600   □ 52,900   □ 55,400
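The buffer assumptions stated above reserve one page for output and one for the current inner page. Under those assumptions, a block nested loop join reads the outer relation in chunks of B - 2 pages and scans the inner relation once per chunk. This gives Cost = |outer| + ⌈|outer| / (B - 2)⌉ · |inner|, excluding the final writes. A minimal sketch of that formula follows. The names are illustrative and not part of the assignment.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b using exact integer arithmetic."""
    return -(-a // b)


def bnlj_cost(outer_pages: int, inner_pages: int, buffers: int) -> int:
    """I/O cost of block nested loop join, excluding writing the result.

    B - 2 buffers hold a chunk of the outer relation; the other two hold the
    current inner page and the output page. The outer is read once and the
    inner is scanned once per outer chunk.
    """
    chunk = buffers - 2
    if chunk < 1:
        raise ValueError("block nested loop join needs at least 3 buffers")
    return outer_pages + ceil_div(outer_pages, chunk) * inner_pages
```

The number of inner scans depends on ⌈|outer| / (B - 2)⌉. Because of that rounding, swapping which relation is the outer one can change the cost in either direction.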

(c) Hash join with S as the outer relation and R as the inner relation. You may ignore recursive
partitioning and partially filled blocks.
i. [5 points] What is the cost of the partition phase?
□ 2,800   □ 4,400   □ 5,000   □ 5,800   □ 7,200
ii. [5 points] What is the cost of the probe phase?
□ 2,800   □ 4,400   □ 3,600   □ 4,800   □ 7,200
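The question says to ignore recursive partitioning and partially filled blocks. Under those conditions, the usual accounting for a partitioned (Grace) hash join charges the partition phase 2(M + N). That covers reading both relations and writing them back out as partitions. The probe phase is charged M + N, for reading every partition once. A sketch of that accounting follows; the function name is hypothetical.

```python
def hash_join_costs(outer_pages: int, inner_pages: int) -> tuple[int, int]:
    """(partition, probe) I/O of a partitioned (Grace) hash join.

    Assumes no recursive partitioning (each build partition fits in memory)
    and fully packed partition pages, as the question allows.
    """
    total = outer_pages + inner_pages
    partition = 2 * total  # read each relation once, write its partitions once
    probe = total  # read each pair of matching partitions once
    return partition, probe
```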

(d) [10 points] Assume that the tables do not fit in main memory and that a large number of
distinct values hash to the same bucket under your hash function h1. Which of the
following approaches works best?
□ Create hashtables for the inner and outer relations using h1 and rehash into an embedded
hash table using h1 for large buckets

□ Create hashtables for the inner and outer relations using h1 and rehash into an embedded
hash table using h2 != h1 for large buckets

□ Use linear probing for collisions and page in and out parts of the hashtable needed
at a given time

□ Create 2 hashtables half the size of the original one, run the same hash join algorithm
on the tables, and then merge the hashtables together


(e) Sort-merge join with S as the outer relation and R as the inner relation:
i. [4 points] What is the cost of sorting the tuples in R on attribute a?
□ 3,000   □ 5,600   □ 7,400   □ 9,600   □ 10,800
ii. [4 points] What is the cost of sorting the tuples in S on attribute a?
□ 3,400   □ 4,000   □ 6,400   □ 7,600   □ 8,800
iii. [10 points] What is the cost of the merge phase assuming there are no duplicates in
the join attribute?
□ 1,400   □ 1,800   □ 3,600   □ 4,400   □ 4,800
iv. [10 points] What is the cost of the merge phase in the worst-case scenario?
□ 1,080,000   □ 2,880,000   □ 3,080,000   □ 4,750,000   □ 10,080,000
v. [2 points] Now consider joining R, S and then joining the result with T. What is the
cost of the merge phase assuming there are no duplicates in the join attribute?
□ 1,000   □ 2,000   □ 3,000   □ 5,000   □ 2,000,000
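The following is a sketch of the sort-merge join accounting in part (e). It assumes the same external merge sort model as Question 1, with the buffer count passed in (B = 60 here). Sorting a relation costs 2 · pages · passes. With no duplicate join keys, the merge scans each sorted input once, costing M + N. In the usual page-level worst case, every tuple shares one join value and the inner relation is rescanned for every outer page, costing M · N. The function names are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b using exact integer arithmetic."""
    return -(-a // b)


def sort_cost(pages: int, buffers: int) -> int:
    """External merge sort I/O: 2 * pages per pass, passes counted as in Question 1."""
    if buffers < 3:
        raise ValueError("a (B - 1)-way merge needs at least 3 buffers")
    runs = ceil_div(pages, buffers)  # pass 0 produces sorted runs of B pages
    passes = 1
    while runs > 1:
        runs = ceil_div(runs, buffers - 1)  # one (B - 1)-way merge pass
        passes += 1
    return 2 * pages * passes


def merge_cost_no_duplicates(outer_pages: int, inner_pages: int) -> int:
    """Distinct join keys: one sequential scan of each sorted input."""
    return outer_pages + inner_pages


def merge_cost_worst_case(outer_pages: int, inner_pages: int) -> int:
    """All tuples share one join key: the inner is rescanned once per outer page."""
    return outer_pages * inner_pages
```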

End of Homework #3
