The University of Sydney Page 1
QBUS6860
Visual Data Analytics
Weekly Assignment 6
Dr Demetris Christodoulou
Discipline of Accounting
MEAFA Research Group
http://sydney.edu.au/business/research/meafa
o The dataset sailor_performance.xlsx provided on Canvas >
Datasets used in lectures and assignments, contains data on the
performance of young sailors from a sailing club. This is a real
dataset, but for confidentiality reasons we protect the identity of
the club and the sailors. All sailor names are therefore fictional.
o The dataset holds observations for a specific sailing class in which
young men compete only against other men, and young women
compete only against other women. That is, competition is separated
by gender.
o Sailing performance is measured with the variable rank. This is the
target variable: the sailing club wants to understand how it is
determined, and we need to help the club discover any potential
determinants. The next slide gives more information about
how rank is specified.
o Performance is measured using the variable rank. This is the standing
rank of a sailor at the end of each month. This rank indicates
the national rank for this class of young sailors across clubs.
o The lower the rank, the better the performance, i.e. the sailor
ranked no. 1 is the best sailor in this class. Rank is therefore a
relative measure of performance across all competing sailors.
o The rank can only be determined at the completion of a race and it
remains unchanged between races. Note that a race usually lasts
many days.
o Females compete in female-only races and males compete in
male-only races. This means that two sailors of the same gender cannot
share the same rank in any given month, although a female and a male
sailor can.
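The uniqueness rule for rank can be turned into a quick data check. The following is a minimal sketch in Python, assuming the spreadsheet rows have already been loaded as dictionaries keyed by name, year, month, gender and rank. The helper name find_shared_ranks, the sample rows and the "F"/"M" gender coding are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def find_shared_ranks(rows):
    """Return {(year, month, gender, rank): [names]} for ranks held by more than one sailor."""
    holders = defaultdict(list)
    for row in rows:
        # Rank should be unique within each gender for a given year and month.
        key = (row["year"], row["month"], row["gender"], row["rank"])
        holders[key].append(row["name"])
    return {key: names for key, names in holders.items() if len(names) > 1}


# Hypothetical rows for illustration only (not from sailor_performance.xlsx).
sample_rows = [
    {"name": "Sailor A", "year": 2020, "month": 1, "gender": "F", "rank": 3},
    {"name": "Sailor B", "year": 2020, "month": 1, "gender": "M", "rank": 3},  # allowed: other gender
    {"name": "Sailor C", "year": 2020, "month": 1, "gender": "F", "rank": 3},  # clashes with Sailor A
]

shared = find_shared_ranks(sample_rows)
print(shared)
```

Any key returned by this check would point to a possible recording error that is worth inspecting before graphing.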
o In addition to rank, we are given the following information that
could help us investigate what drives performance:
– name: the name and information about the sailor.
– year: the calendar year when observations were made.
– month: the month of the year when observations were made.
– date_joined: the date that the sailor joined the club.
– gender: the gender of the sailor.
– training_days: the number of training days that the sailor had
for each month.
– race_days: the number of competitive race days for each
month.
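Because date_joined is a calendar date while year and month identify each observation, one possible derived variable is the time a sailor has spent in the club. This is only an assumption about a transformation that might be useful, not something the assignment prescribes. The sketch below assumes that month is stored as a number from 1 to 12 and that date_joined has been parsed into a Python date; the function name months_since_joining is hypothetical.

```python
from datetime import date


def months_since_joining(year, month, date_joined):
    """Whole calendar months between the joining month and the observation month."""
    return (year - date_joined.year) * 12 + (month - date_joined.month)


# Hypothetical example: observation in March 2021, sailor joined on 15 November 2019.
tenure = months_since_joining(2021, 3, date(2019, 11, 15))
print(tenure)
```

A negative result would mean that an observation predates the joining date, which is itself a data property worth checking.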
o The graph objective is concerned with the analysis of
“Drivers of sailor performance”. The sailing club wants to know what
drives individual performance in terms of sailor rank, given the
data provided.
o We do not know the answer to this question in advance, and we
suspect that any of the provided data may help explain
performance.
o You are required to adopt an exploratory approach in
order to discover what drives sailor performance. You are
required to produce suitable EDA and appropriate data
graphs that would be presented to the sailing club to help it
understand what drives sailor performance.
o There is no need to do extra research on this topic, and you do not
need to describe the data generating process. The data were
recorded by hand by the sailing coach at the end of each month
using the spreadsheet provided.
o However, it is important that you validate the data properties.
o You are required to analyse the graph objective with Tableau
using two data graphs.
o These graphs do not need to be interactive, but do use any form of
interactivity if it helps you encode the data. That is, I will
accept both interactive and non-interactive graphs.
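Since the data were recorded by hand, validating the data properties could begin with simple range checks on each monthly observation. The following is a minimal sketch in Python, not a prescribed method: it assumes each row is a dictionary with numeric year, month, rank, training_days and race_days, and the function name validate_row and the sample rows are hypothetical.

```python
import calendar


def validate_row(row):
    """Return a list of human-readable problems found in one monthly observation."""
    problems = []
    if not 1 <= row["month"] <= 12:
        problems.append("month outside 1-12")
        return problems  # the remaining checks need a valid month
    if row["rank"] < 1:
        problems.append("rank below 1")
    if row["training_days"] < 0 or row["race_days"] < 0:
        problems.append("negative day count")
    # monthrange returns (weekday of the first day, number of days in the month).
    days_in_month = calendar.monthrange(row["year"], row["month"])[1]
    if max(row["training_days"], row["race_days"]) > days_in_month:
        problems.append("day count exceeds days in the month")
    return problems


# Hypothetical rows for illustration only (not from sailor_performance.xlsx).
good = {"year": 2021, "month": 2, "rank": 5, "training_days": 10, "race_days": 4}
bad = {"year": 2021, "month": 2, "rank": 0, "training_days": 30, "race_days": -1}

print(validate_row(good))
print(validate_row(bad))
```

Rows that return a non-empty list are candidates for correction or exclusion in the managed data file, with the decision documented in the report.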
o You will be evaluated on how successfully you work with this dataset,
and on your ability to apply basic EDA methods to learn important
insights about this data.
Hint: not every EDA tool is useful with this data. It is important that
you first spend sufficient time understanding the data properties
before you start analysing the data using EDA.
o You will be evaluated on the quality of the data report and data
management, the correct choice of visual implantations and retinal
variables, the appropriate application of graph identification and
graph enhancement tools, and the decoding discussion.
o You are required to submit on Canvas a Word or PDF report, the
Tableau .twbx file and the Excel file of the managed data.