COP 4710 Final Exam (take home)
Due: 11:30pm, May 9, 2020
Please type or clearly write your answers. Scan the solutions into a single PDF file named S20Final-
xxx.pdf where xxx is your netid. Only pdf files submitted via Canvas will be graded. This is an open-book
exam, however any discussions / exchange of information between two students are considered
cheating.
Problem I. Queries. (25 points)
Consider the following schema:
Suppliers(sid: integer, sname: string, address: string)
Parts(pid: integer, pname: string, color: string)
Catalog(sid: integer, pid: integer, price: real)
The Suppliers relation describes suppliers of parts. The Parts relation contains information about each
part. The Catalog relation lists the prices in dollars charged for parts by suppliers. (The keys are
underlined: sid is a key for Suppliers, (sid,pid) is a key for Catalog, and pid is a key for Parts.)
Write the following queries. For each query, it is stated in which query language you should express the
query.
1. Find the sids of suppliers who supply all blue parts;
2. Find the names of the most expensive parts supplied by suppliers named `Santa Claus’. Write
this query in relational algebra; you can use any relational algebra operator;
3. Find the names and pids of parts that are supplied by every supplier at a price of strictly less
than 200 dollars. (If a supplier does not supply the part or if a supplier charges more than 200
dollars for it, the part should not be output.) Write this query in SQL;
4. Find the name of the part(s) with the second highest price. Write this query in relational
algebra;
5. Can you pose a query such that the information in the database is sufficient to answer the query
(e.g., you can use the contents of the Suppliers, Parts, and Catalog relation to somehow
compute the answer to the query), but you cannot express the query in SQL? State such a query
in English.
Problem II. Functional dependencies (FD), Normal Forms (25pts)
Let a relational schema R = (A, B, C, D, E, F) and we have (and only have) the following the functional
dependencies:
A à B
B à C
B à D
CE à F
1. (2pts) Compute B+;
2. (2pts) Compute A+;
3. (5pts) Find all the candidate keys of R;
Suppose we have another schema R’ = (A, B, C, D) and the following are the only functional
dependencies we identified about R’: B à C, D à A .
4. (4pts) Is R’ in Boyce-Codd Normal Form? Explain your answer.
5. (4pts) Is R’ in 3rd Normal Form? Explain your answer.
Let us again consider table R’ but the set of all FDs are changed to: ABC à D, D à A. (i.e., the functional
dependency Bà C is no longer true).
6. (4 pts) Is R’ in Boyce-Codd Normal Form? Explain your answer. To get full credits, you have to list
all steps of reasoning including what candidate keys you found.
7. (4 pts) Is R’ in 3rd Normal Form? Explain your answer.
Problem III. Indexing (23pts)
In this problem, you should follow two rules: 1) when splitting an overflowing node with an odd number
of values, you put one more value in the right hand side node; 2) In redistribution, you can only borrow
values from the left hand side sibling.
Draw the intermediate trees for partial credits.
1. (8pts) Construct a B+-tree for the following set of key values:
27, 18, 45, 41, 68, 19, 56, 34, 12, 76, 50, 11, 92, 62, 70, 103
Assume the tree is initially empty and the values are added in the order specified by the above list. The
number of search key values that will fit in one node is FOUR.
Given the following B+-tree:
2. (4pts) Draw the tree after deleting key values 1 and 6 from the above tree;
3. (4pts) Draw the tree after deleting key value 32 from the original tree (not the one you got from
question 2) shown in the above figure.
4. (4pts) Draw the tree after deleting key value 73 from the original tree (not the one you got from
question 3) shown in the figure.
5. (3pts) Can you think of three key values that, once deleted from the original tree, will decrease the
tree height by one?
Problem IV. Query Processing (27pts)
Given two relations r and s, and the following statistics:
Relation Total number of pages Average rows per page
r 10,000 10
s 5000 20
Estimate the number of tuples in the following relational algebraic expressions
1. (3 pts) r x s ;
2. (2 pts) ⋈ , where the join condition is on and (only) on a foreign key in s referencing r;
3. (2 pts) !"∩!$(), where θ1 and θ2 are two selection conditions with selectivity 0.5 and 0.8,
respectively. The selectivity of a selection condition is defined as the ratio of the number of returned
rows selected by the condition to that of the original table;
The following questions are about the I/O cost:
4. (5 pts) What is the I/O cost of joining r and s using a page-based nested loop join?
5. (5 pts) What is the I/O cost of joining r and s using a block-based nested loop join?
6. (5 pts) What is the I/O cost of the join if a B+-tree exists for every attribute of s?
7. (2 pts) What is the best way of processing the following simple query if no indexes exist, and what is
the I/O cost of your solution?
SELECT r.a, r.b FROM r WHERE r.c < 30000 AND r.d = 15;
The following problem is about relational algebra:
8. (3pts) Show that the following equivalence rule for transforming relational expressions holds true:
!(" − #) = !(") − !(#)
where θ is a selection condition, and E1 and E2 are two relations whose schemas are designed such
that the set operations between them make sense. A formal proof is not required but you should
clearly write down your reasoning. If you use a graph to show this, you need to clearly state what
set each section of the graph represents.
Problem V. Extra Credit – Transaction Management (15pts) Content related to this problem was not
specifically covered in class, therefore you need to read your book (Chapter 17.1) and relevant
documents (Wiki page for serializability) to solve the problem.
1. (8pts) Explain what the ACID features are for transaction management in databases. Briefly explain
each item.
A
C
I
D
Consider the following transaction schedule, where time increases from top to bottom.
T1 T2 T3 T4
Read (X)
Read(Y)
Read(Z)
Read(Y)
Write(Y)
Write(Z)
Read(U)
Read(Y)
Write(Y)
Read(Z)
Write(Z)
Read(U)
Write(U)
Answer the following questions:
2. (4 pts) Draw the precedence graph of the above schedule.
3. (4 pts) Is this schedule conflict serializable? If yes, show what serial schedule(s) it is equivalent to. If
no, explain why.
4. (4 pts) Is this schedule view serializable? If yes, show what serial schedule(s) it is equivalent to. If no,
explain your answer.