The Chinese University of Hong Kong
Department of Systems Engineering and Engineering
Management
SEEM3550 Fundamentals of Information Systems
2019-2020 Assignment 3: Query Processing and Query
Optimization
This is an individual assignment.1 This assignment is about query processing
and query optimization, and is worth of 14% of the total assessment. The due
date for the assignment 3 is 11:59pm, May 4, 2020. Submissions need to be
submitted to the online Blackboard system. The late penalty will be 10% per
day. A submission will not be accepted five days after the deadline.
• As requested by CUHK, every assignment handed in should be accompa-
nied by a signed declaration. The best is to do as follows.
1. submit your assignment to Veriguide and then sign the declaration
form, and
2. submit both your assignment and declaration form together to the
blackboard system.
• The declaration form is also available at
https://www.cuhk.edu.hk/policy/academichonesty/Eng htm files (2013-14)/p10.htm
• You must submit your assignment together with the declaration form.
Without your declaration form, we cannot give you a mark for your as-
signment.
In the website for the textbook http://db-book.com, the solutions to practice
exercises in most chapters are given. It is strongly recommended for you to try
the practice exercises and see whether your answers are correct by checking the
answers provided at the website.
1Departmental Guideline for Plagiarism (Department of Systems Engineering and Engi-
neering Management): If a student is found plagiarizing, his/her case will be reported to the
Department Examination Panel. If the case is proven after deliberation, the student will auto-
matically fail the course in which he/she committed plagiarism. The definition of plagiarism
includes copying of the whole or parts of written assignments, programming exercises, reports,
quiz papers, mid-term examinations and final examinations. The penalty will apply to both
the one who copies the work and the one whose work is being copied, unless the latter can
prove his/her work has been copied unwittingly. Furthermore, inclusion of others’ works or
results without citation in assignments and reports is also regarded as plagiarism with similar
penalty to the offender. A student caught plagiarizing during tests or examinations will be
reported to the Faculty office and appropriate disciplinary authorities for further action, in
addition to failing the course.
1
2019-2020 2
Question 1: Two relational algebra expressions are said to be equiva-
lent if the two expressions generate the same set of tuples. To obtain a re-
lational algebra expression from another, we use equivalence rules, as discussed
in ch13.pptx.
Consider the following SQL for a user query: “Find the names of all instructors
in the Music department who have taught a course in 2009, along with the titles
of the courses that they taught.”
select I.name, C.title
from instructor I, teaches T, course C
where I.dept_name = "Music"
and T.year = 2009
and I.ID = T.ID
and T.course_id = C.course_id
(1) For the SQL given above, show its initial relational algebra expression.
Hint: follow the slide 6.38 in ch6.pptx.
(2) Given the relational algebra expression you have in (1), show how to get
the relational algebra expression as shown in Figure (a) on the slide 1.14
in ch13.pptx, without the projection of Πcourse id,title.
(3) Consider the relational algebra expression you get in (2). Show how you
add projections based on Rule 8(a) and Rule 8(b). You need to add a
projection for each of the three input relations.
(4) Let the initial relational algebra expression you have for (1) be E. Show
how you add 4 different equivalent relational algebra expressions beside
E into EQ following the algorithm given in the slide 1.19 in ch13.pptx
in details. You need to show which one you pick up as Ei, which ei is
selected, and which rule you use for each new relational algebra expression
E′ to be added.
Question 2: We discussed the merge-join algorithm for an equi-join
r ✶r.A=s.B s, when both relations are sorted in ch12.pptx. The pseudo code
is shown in the slide 12.32 in ch12.pptx and discussed in Section 12.5.4 in the
textbook.
(1) Suppose that relation r is sorted on the attribute A but relation s is not
sorted on the attribute B. Here, we assume that there is B+-tree, as a
secondary index, built for the join attribute B in s. Recall that there
are pointer and search key pairs, (pi, ki), in a leaf node in the secondary
B+-tree. For a secondary B+-tree, such a pointer pi points to a list of
pointers, Di = {d1, d2, · · · }, where each dj = (bl, sk) points to a data
record (or a tuple) that has the search key ki located at the slot sk in the
block bi in the data file using the slotted page structure (refer to the slide
2019-2020 3
10.14 in ch11.ppt for the slotted page structure and refer to the slide
12.10 in ch12.pptx for Di). Give a pseudo code to do this “hybrid merge
join” using the secondary B+-tree built on the attribute s.B.
• A naive approach is to do something using the similar idea presented
in indexed nested-loop. Note that it does not make use of sorting for
both the sorted attribute A and the secondary B+-tree built on B.
• You need to give an algorithm that utilizes the sorting on both sides
(e.g., the sorted attribute A and the secondary B+-tree built on the
attribute B), and minimizes the block transfers. Hint: for minimiz-
ing the block transfer, you need to know the drawback of secondary
B+-tree (refer to the slide 12.9 when we discuss the selection using
indices).
(2) Suppose that relation r is not sorted on the attribute A and the relation
s is not sorted on the attribute B. However, there is a secondary B+-tree
built for relation r on the attribute A and there is a secondary B+-tree
built for relation s on the attribute B. Show a pseudo code to do this
“hybrid merge join” making use of both secondary B+-trees to minimize
the block transfer.