ISAD363  
ADVANCED DATABASES   
COURSEWORK ASSIGNMENT  
MODULE LEADER  
Dr Marco Palomino   
DEADLINE  
Tuesday 5 May 2020 at 16:00  
ASSESSED MODULE LEARNING OUTCOMES  
ALO1: Critically compare and contrast the differences between  
relational and non-relational (NoSQL) databases.  
ALO2: Critically appraise NoSQL database strengths and  
weaknesses.  
ALO3: Demonstrate the ability to perform all the CRUD  
operations (namely, create, retrieve, update and delete) on  
NoSQL databases.  
1. BACKGROUND  
Computer systems that make recommendations are called recommender systems. Netflix  
recommends movies you may like to watch, Amazon recommends products you may be interested  
in purchasing, and Facebook recommends people you may want to contact and become friends  
with. The actual algorithms used by these companies to make such suggestions are closely  
guarded trade secrets, and they are not available for public scrutiny. However, in this assignment,  
you will have an opportunity to learn the basics behind recommender systems. You will use  
Facebook as a case study for recommender systems. First, you will have to create a database to  
store Facebook users and such a database will have to be a Neo4j graph database.  
2. FRIENDSHIP GRAPHS  
Figure 1 displays the relationships (friendships) graph for the main characters in Romeo and  
Juliet, the tragedy published by William Shakespeare in 1597.  
A link between character A and character B in Figure 1 means that A considers B a friend, and  
also B considers A a friend—these links are undirected, despite the arrows. Note that Figure 1  
shows the characters who belong to the House of Montague in orange, the characters who belong  
to Ruling House of Verona in green, the characters who belong to the House of Capulet in pink,  
the Franciscans in purple, and the Nurse who works for the House of Capulet in blue. The colours  
are just to help us interpret the graph—colours are not mandatory in your assignment.  
Figure 1.  Romeo and Juliet friendship graph  
3. FRIEND RECOMMENDATIONS  
Formally speaking, recommending a new friend is equivalent to answering the following  
question: “For user X, list some non-friends of X in order, starting with the best friend  
recommendation and ending with the worst.” A non-friend is a user who is not X and is not a  
friend of X. For example, for the user Mercutio in Figure 1, the list of non-friends is:  
 Benvolio  
 Capulet  
 Friar Lawrence  
 Juliet  
 Montague  
 The Nurse  
 Tybalt  
3.1. Recommend by number of common friends  
If non-friend Y is your friend’s friend, then maybe Y should be your friend too. If user Y is the  
friend of many of your friends, then Y is an even better recommendation. The best friend  
recommendation is the user with whom you have the largest number of mutual friends. For  
example, consider Mercutio in Figure 1.  
 Mercutio has one friend in common with Benvolio (Romeo).  
 Mercutio has two friends in common with Capulet (Paris and Prince Escalus).  
 Mercutio has one friend in common with Friar Lawrence (Romeo).  
 Mercutio has one friend in common with Juliet (Romeo).  
 Mercutio has two friends in common with Montague (Prince Escalus and Romeo).  
 Mercutio has no friends in common with the Nurse.  
 Mercutio has no friends in common with Tybalt.  
Therefore, Capulet and Montague are the best friend recommendations for Mercutio, and the  
Nurse and Tybalt are the worst recommendations.  
Note that the recommendations might not be symmetric: the best friend recommendation for  
Montague might be Mercutio, but the best friend recommendation for Mercutio might be Capulet,  
rather than Montague.  
3.2. Recommend by influence scoring  
We will now study a different algorithm. Consider the following hypothetical situation.  
 Two of your friends are J. K. Rowling and James Corden.  
 J. K. Rowling has only two friends (you and one other person).  
 James Corden has 7,000,000,000 friends.  
 J. K. Rowling and James Corden have no friends in common (apart from you).  
Since J. K. Rowling is highly selective in terms of friendship, and is a friend of yours, you are  
likely to have a lot in common with J. K. Rowling’s other friend. On the other hand, James Corden  
is indiscriminate and there is little reason to believe that you should be friendly with any particular  
one of James Corden’s other friends.  
Let us incorporate the above idea into a friend recommendation algorithm. Here is the concrete  
way to do so—note that we call the technique “influence scoring”: Suppose that PersonA and  
PersonB have three friends in common: Friend1, Friend2, and Friend3. When recommending  
friends by influence scoring, the score for PersonB as a friend of PersonA is 1/numfriends(Friend1)  
+ 1/numfriends(Friend2) + 1/numfriends(Friend3), where numfriends(f) is the number of friends  
that f has. In other words, each friend F of PersonA has a total influence score of 1 to contribute,  
and divides it equally among all of her friends.  
In the example above, J. K. Rowling’s other friend would have a score of 1/2, and each of James  
Corden’s friends would have a score of 1/7000000000.  
3.3. Algorithm comparison  
Does the change of recommendation algorithm make any difference? As you should be able to  
find out, Mercutio gets the same friend recommendations with both recommendation algorithms  
(number of common friends and influence scoring). In fact, there are 5 characters in the Romeo  
and Juliet scenario for whom the recommendations are the same, regardless of the algorithm.  
However, there are 6 characters for whom the recommendations are different.  
4. THE DATASET  
The assignment employs a Facebook dataset provided as a courtesy by the Max Planck Institute  
for Software Systems. The data is described in more detail in the following publication:  
Viswanath, B., Mislove, A., Cha, M. and Gummadi, K.P., 2009, August. On  
the Evolution of User Interaction in Facebook. In Proceedings of the 2nd ACM  
Workshop on Online Social Networks (pp. 37-42). ACM.  
The Facebook dataset contains, exactly, 63,731 Facebook users—no duplicates are included— 
and 817,090 friendship relationships among them—this is the total number of links established  
via friendships. Do not be alarmed if importing the Facebook dataset into Neo4j takes a while.  
The number of friendships is large! However, if it takes longer than 10 minutes, it probably means  
that you are doing something wrong—perhaps, you did not index the right attributes in your  
database. Also, do not try to draw the whole Facebook graph at once—this may cause your  
computer to “freeze”. Besides, even if you successfully draw the entire graph, you would not learn  
much from a tangled mess of 817,090 links.  
The Facebook dataset for your coursework can be downloaded from the module DLE website  
under the Coursework Section—look for a number of files stored in a folder named Facebook  
Dataset. Although the dataset is spread across different files, you are expected to import every  
single data item into the database that you will submit—marks will be deducted for failing to  
include all the data provided.  
4. THE PROBLEM  
4.1. Recommend by number of common friends  
For every Facebook user in the dataset with an ID number that is a multiple of 980, provide a list  
containing the first 10 friend recommendations, as determined by the number of common friends.  
If there are fewer than 10 recommendations, provide all the recommendations. For each case, you  
will have to provide the Neo4j CQL query that generated the friend recommendations, as well as  
the actual recommendations. Note that the recommendations alone will not receive any marks,  
and the Neo4j CQL queries must be provided in text format (screenshots will be awarded zero  
marks). Also, note that your queries will be tested. Hence, queries that do not produce the  
submitted results will invalidate the answers and zero marks will be awarded.  
An example of the format in which your results should be delivered is listed below in Table 1.  
Please, note that the CQL query and the recommendations listed in Table 1 are shown only to  
illustrate the format, and they are not actual solutions to the problem in question.  
User ID CQL Query Friend recommendation  
… … …  
13720 Match (friend:FacebookUser {id:  
‘13790’})-[r:FRIENDS_WITH]-(  
user:FacebookUser) RETURN friend  
17125, 7033, 15462,  
33049, 51105, 16424,  
23, 7996, 1539,  
17420  
14700 Match (friend:FacebookUser {id:  
‘14775’})-[r:FRIENDS_WITH]-(  
user:FacebookUser) RETURN friend  
14473, 14495, 17951,  
19611, 22749, 23259,  
30002, 3154, 8269,  
862  
15680 Match (friend:FacebookUser {id:  
‘15760’})-[r:FRIENDS_WITH]-(  
user:FacebookUser) RETURN friend  
28606  
… … …  
Table 1  
4.2. Recommend by influence scoring  
For every Facebook user in the dataset with an ID that is a multiple of 980, provide a list  
containing the first 10 friend recommendations, as determined by influence scoring. If there are  
fewer than 10 recommendations, provide all the recommendations.  
Once again, the output format should be the same shown in Table 1. You are expected to provide  
the Neo4j CQL query that generated the friend recommendations and the actual  
recommendations. Note that the recommendations alone will not receive any marks, and the  
Neo4j CQL queries must be provided in text format (screenshots will be awarded zero marks).  
Also, note that your queries will be tested. Hence, queries that do not produce the submitted results  
will invalidate the answers and zero marks will be awarded.  
4.3. Algorithm comparison  
Considering only those 65 Facebook users with an ID that is a multiple of 980, identify the  
Facebook users who have the same first 10 friend recommendations under both recommendation  
algorithms, and the Facebook users who have different first 10 friend recommendations under the  
two algorithms. Your answers should appear in the format depicted in Table 2 (see the following  
page. Please, note that the answers listed in Table 2 are shown only to illustrate the format in  
which you should submit the solutions; yet, they may not coincide with the actual solutions to the  
problem in question.  
The code used for comparing the output of both algorithms has to be implemented using C# or  
Python (no other choices will be allowed), and it has to be included as part of your submission. If  
you do it manually, without using Python or C#, your solution will not be accepted and zero marks  
will be awarded.  
4.4. Report: Which algorithm and database model do you propose to use?  
Produce a report (1,000 words maximum) to explain which algorithm you propose to recommend  
new friends in Facebook, and whether you will implement it using a relational or non-relational  
database. Your report must be supported by evidence derived from the output you obtained in  
Section 4.1, Section 4.2 and Section 4.3. For example, if you propose to use influence scoring,  
you will have to use the results of the previous sections to explain why this is the best option.  
IDs of users having the same output under both algorithms:  
1960  
4900  
5880  
…  
IDs of users having different output under both algorithms:  
980  
2940  
3920  
…  
Table 2  
Before reaching a conclusion regarding which algorithm you wish to propose, you may want to  
compare both algorithms using more than the 65 Facebook users whose ID is a multiple of 980,  
but you will have to refer to the additional results as evidence to support your decision. It may  
also be the case that you propose a combination of the two algorithms explored above. While this  
is a suitable alternative, you have to be very clear in terms of explaining how exactly you attempt  
to combine the algorithms (which one you will apply first, how much weight is assigned to each  
algorithm, what happens if the algorithms agree or disagree, etc.).  
Finally, you may want to combine the two algorithms explored with other ideas of your own (or  
ideas you have researched separately). You will still have to be very clear in terms of how your  
proposed algorithm will work and what evidence you present to support this.   
5. DELIVERABLES  
Submit the following two files via the DLE.  
(1) A WORD document containing (documents that are not in WORD format will not be  
reviewed and zero marks will be awarded):  
(a) The solutions for Section 4.1 in the format illustrated by Table 1.  
(b) The solutions for Section 4.2. Once again, the solutions should be submitted  
following the format illustrated by Table 1.  
(c) The solutions for Section 4.3. The solutions should be submitted following the  
format of Table 2.  
(d) A report explaining which algorithm you propose to use to make friend  
recommendations and whether you will implement it using a relational or non- 
relational database.  
(2) The C# or Python code employed to obtain the answers for Section 4.3.  
6. ASSESSMENT AND GRADE CRITERIA  
Task Marks  
1 Recommend by number of common friends  
Explanation: See Section 4.1.  
Note: This is a core module for the MSc Data Science and Business  
Analytics; thus, if the submitted Neo4j CQL queries do not execute,  
marks will not be granted.  
Learning Outcomes: ALO2, ALO3.  
25  
2 Recommend by influence scoring  
Explanation: See Section 4.2.  
Note: This is a core module for the MSc Data Science and Business  
Analytics; thus, if the submitted Neo4j CQL queries do not execute,  
marks will not be granted.  
Learning Outcomes: ALO2, ALO3.  
25  
3 Algorithm comparison (answers) and the source code (C# or  
Python) employed to compare the results of both algorithms.  
Explanation: See Section 4.3.  
Note: The code you submit is expected to execute correctly. If the  
submitted code does not compile and execute, the mark for this part  
of the assignment will be zero (0).  
Learning Outcomes: ALO2, ALO3.  
25  
5 Report: Which algorithm and database model do you propose to  
use?  
Explanation: See Section 4.4.  
Learning Outcomes: ALO1, ALO2.  
25  
7. EXTENUATING CIRCUMSTANCES AND PLAGIARISM  
Please note that the University enforces a penalty of zero percent for work submitted after the  
published deadline without valid extenuating circumstances (ECs). The University’s policy on  
ECs and the EC application form are available at:  
https://www.plymouth.ac.uk/student-life/your- 
studies/essential-information/exams/exam-rules-and- 
regulations/extenuating-circumstances  
Also, note that this is an individual assignment and must reflect your own work. Thus, while you  
may discuss, in general terms, the assignment with your fellow students, the assignment MUST  
be your own work. Do not share designs or code with anyone, OR submit a program design or  
code that is, wholly or partially, someone else’s work.  
Marks will be awarded according to the following criteria  
Mark Grade Criteria  
Unprofessional  
(0-39%)  
The quality of the work has not met the learning outcomes. Understanding  
and application of fundamental concepts and techniques is questionable.  
Work of this quality would not be acceptable in professional employment.  
Poor  
(40-49%)  
The quality of work has only met the threshold level but still requires further  
work to get it to a better standard. Your submission contains logical and  
analytical errors related to analysis and design techniques. Also, it only  
demonstrates a basic understanding of the subject competence. Further  
improvement is required to demonstrate personal thoroughness, effort and  
independent learning.  
Fair  
(50-59%)  
The quality of work submitted suggests that you have demonstrated a fair  
understanding of the analysis and design techniques. Still, the work you have  
submitted contains some errors and incomplete analysis and design. Also, it  
demonstrates you are able to apply your knowledge but need to improve  
understanding of the subject competence and personal thoroughness, effort  
and independent learning.  
Good  
(60-69%)  
The quality of the work submitted suggests that you are able to apply the  
analysis and design techniques well. The work you have submitted is  
substantially correct and complete. Also, it demonstrates a good  
understanding of subject competence and personal thoroughness, effort and  
independent learning.  
Excellent  
(70-100%)  
The quality of work is outstanding with no significant flaws. It demonstrates  
a high level of subject knowledge and competence; personal thoroughness,  
effort and independent learning; and possibly significant additional  
analytical/critical thought. Well done!