STATS 2DA3 Fall 2024
ASSIGNMENT 2
1. (6 MARKS)
(a) Describe k-means clustering and k-medoids clustering.
(b) Would you expect k-means clustering to work well on the 3 clusters depicted below?
Explain your answer.
2. (2 MARKS) Consider the silhouette plots shown below:
(a) What is the best number of clusters to choose?
(b) What is the worst number of clusters to choose?
3. (10 MARKS) Using the zagat dataset from the smss package, complete the following tasks. (Note: you may need to load additional libraries to answer the questions.) This dataset contains information on 193 ratings of Italian restaurants in Boston, London, and New York. First, read about the dataset (using ?? in the R console), then take a look at the data using the str and head functions.
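The set-up steps described above (load the package, read the documentation, inspect the data) can be sketched in R as follows. This is a minimal sketch that assumes the smss package is already installed; it covers only the data inspection and not the graded plotting tasks.

```r
# Assumes smss is installed; if not, run install.packages("smss") first.
library(smss)

# Load the zagat data frame from the smss package.
data(zagat)

# In an interactive session, open the help page with ?zagat (or search with ??zagat).

# Inspect the structure (variable names and types) and the first few rows.
str(zagat)
head(zagat)
```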
(a) Create a box-and-whisker plot of “Cost” against “City” (“City” should be on the x-axis), and a bar chart of “City” (i.e. it should show the number of restaurants rated per city). Give each graph a title.
(b) Display the 2 graphs in one image using R code (i.e. do not just screen grab the 2 images and combine them).
(c) Create a parallel co-ordinates plot using the predictor variables “Food”, “Decor”, “Service”, and “Cost”. Colour-code the plot by “City” and scale it using “uniminmax”.
4. (2 MARKS) Consider the parallel co-ordinates plot below; it displays 3 different types of response, which are colour coded.
(a) In your opinion, what is the best predictor variable for separating out the responses?