MTH6101 Introduction to Machine Learning
Coursework four: submit electronically by 09:00 (GMT)
on Thursday 30th April 2020.
Read the following instructions carefully:
• Coursework is to be submitted individually and must fit on the equivalent of a single
A4 page, which may be written on both sides (so the maximum is two sides of A4).
Submission will be by email; details of how to submit will be confirmed closer to the
submission date.
• Even though this is an electronic document, write your name and student number
clearly at the top of the first page of your coursework.
• The question you must answer depends on the LAST digit of your student
number. Make sure you submit the answer to the correct question, as submitting an
answer to a question not allocated to you will lead to zero marks. If in doubt, ask
the Module organizer which question is yours.
• You will perform some computations numerically using R and your submission could
be written using markdown, typed with a word processor or even done by hand.
• Strictly speaking, all the activities described can be done without running R
commands, since I have already trained the models. Nevertheless, I give all the
necessary data in case you want to run the analyses yourself.
• You are expected to only include relevant material to the question, and anything
you put must be there for a reason. You are not to include raw R output.
• This coursework, like every other, contributes to the Module mark, so polish what
you submit. Plagiarism will be punished.
Description of activities
1. Classification
The data for this question are 0/1 classification data in the variable y, with
associated variables x1 and x2. They are given below, split into training and
testing/validation sets.
With the training data, three models were fitted:
M1 logistic classifier, for which the coefficients of the linear predictor are given;
M2 classification tree, for which predicted classes are given; and
M3 K-nearest neighbors, for which also predicted classes are given.
For model M1, compute predicted probabilities at each observation in the
validation data. By thresholding, compute confusion matrices and performance
figures FPR and TPR. Summarize your results of M1 with a plot of the ROC
curve of M1. Mark in the ROC curve the point for threshold equal to 1/2.
For each of models M2 and M3, compute performance figures FPR and TPR
and add the corresponding points into the ROC curve for model M1.
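The classification steps above can be sketched in a few lines. The coursework itself refers to R; the sketch below uses Python with the standard library only, purely as an illustration. It assumes that the linear predictor of M1 is β0 + β1 x1 + β2 x2 with a logistic link, and that an observation is classified as 1 when its predicted probability is at least the threshold. All function names are hypothetical, and the coefficients and data in the example are made up; they are not taken from any of the data sets below.

```python
import math

def predict_proba(beta, rows):
    """Logistic probabilities P(y = 1 | x1, x2) for a list of (x1, x2) rows."""
    b0, b1, b2 = beta
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2))) for x1, x2 in rows]

def confusion(y_true, y_pred):
    """Return the counts (TP, FP, TN, FN) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def fpr_tpr(y_true, y_pred):
    """FPR = FP / (FP + TN) and TPR = TP / (TP + FN).
    Assumes both classes occur in y_true, so neither denominator is zero."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    return fp / (fp + tn), tp / (tp + fn)

def roc_points(y_true, probs):
    """(FPR, TPR) at each distinct threshold, running from (0, 0) to (1, 1)."""
    # An infinite threshold classifies everything as 0 and gives the point (0, 0).
    thresholds = [float("inf")] + sorted(set(probs), reverse=True)
    points = []
    for thr in thresholds:
        y_pred = [1 if p >= thr else 0 for p in probs]
        points.append(fpr_tpr(y_true, y_pred))
    return points

# Made-up example (illustration only, not coursework data).
beta = (0.0, 1.0, 0.0)
rows = [(-2, -1), (-1, -1), (1, 1), (2, 1)]
y = [0, 1, 0, 1]

probs = predict_proba(beta, rows)
roc = roc_points(y, probs)

# Point to mark on the ROC curve for threshold 1/2.
point_half = fpr_tpr(y, [1 if p >= 0.5 else 0 for p in probs])

# A classifier that only gives predicted classes (as M2 and M3 do)
# contributes a single (FPR, TPR) point to the same plot.
hard_pred = [0, 1, 1, 1]
point_hard = fpr_tpr(y, hard_pred)
```

Plotting `roc` as a step curve, then marking `point_half` and `point_hard` on the same axes, gives the kind of figure asked for in the report.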
2. Regression
The following data are measurements obtained in a series of experiments of a
reaction rate y in terms of variables x1, x2, and so on. The lasso was fitted to
these data, and a table of coefficients is given.
Complete the given table of coefficients by determining, at each breakpoint, the
proportion of shrinkage $\|\hat{\beta}^{L}(\lambda)\|_1 / \max_{\lambda} \|\hat{\beta}^{L}(\lambda)\|_1$.
With this information, produce a plot of the lasso path (coefficients vs shrinkage
proportion). In your plot, clearly label the path for each variable, and choose a
suitable scale for the vertical axis.
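The shrinkage proportion can be computed row by row from a table of coefficients. The following is a minimal Python sketch (the coursework itself refers to R), assuming each row of the table holds the lasso coefficients at one breakpoint and that the maximum is taken over the rows of the table. The function name is hypothetical and the small table is made up for illustration; it is not one of the tables below.

```python
def shrinkage_proportions(coef_rows):
    """L1 norm of each coefficient row divided by the largest L1 norm in the table."""
    l1_norms = [sum(abs(b) for b in row) for row in coef_rows]
    max_l1 = max(l1_norms)  # assumes at least one row has a nonzero coefficient
    return [norm / max_l1 for norm in l1_norms]

# Made-up table with two variables, rows ordered by decreasing lambda
# (illustration only, not coursework data).
coef_rows = [
    [0.0, 0.0],
    [0.5, 0.0],
    [1.0, -0.5],
    [1.5, -0.5],
]

props = shrinkage_proportions(coef_rows)

# One path per variable: the column of coefficients paired with the proportions.
paths = [list(zip(props, column)) for column in zip(*coef_rows)]
```

Each entry of `paths` is the list of (shrinkage proportion, coefficient) pairs for one variable, which is what gets drawn as one labelled line in the lasso path plot.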
3. Report
For the classification data, report the predicted probabilities from model M1.
For each model, report the performance figures FPR and TPR.
Include the ROC curve with the added points (FPR, TPR), and briefly comment
on and compare the classifiers.
For the regression problem, report the augmented table of coefficients and the
plot of the lasso path.
Grading: Classification 50%, Regression 35%, Presentation 15%; Total 100%.
Data sets
ID ending in 0. Classification data, training and validation results.
Training data
x1 x2 y
-3 -1 0
-2.2941 -1 0
-1.5882 -1 0
-0.8824 -1 1
-0.1765 -1 1
0.5294 1 1
1.2353 1 0
1.9412 1 1
2.6471 1 1
Test/validation data Predictions
x1 x2 y M2 M3
-2.6471 -1 0 1 0
-1.9412 -1 1 1 0
-1.2353 -1 0 1 0
-0.5294 -1 0 1 1
0.1765 1 1 1 1
0.8824 1 1 1 1
1.5882 1 1 1 1
2.2941 1 1 1 1
3 1 1 1 1
Training results from model M1: $\hat{\beta}_0 = 0.4945$, $\hat{\beta}_1 = 2.3612$, $\hat{\beta}_2 = -2.3958$.
ID ending in 0. Regression data and training results.
Training data
x1 x2 x3 x4 x5 y
0 1 0 1 1 0
1 0 0 1 0 0
0 0 1 1 0 1
0 1 1 0 1 -1
0 -1 0 1 -1 1
-1 -1 -1 -1 1 0
0 -1 0 1 -1 -1
0 1 1 1 0 1
0 -1 1 0 -1 1
1 0 1 1 -1 -1
1 -1 1 0 -1 1
1 1 -1 0 -1 -1
0 1 -1 -1 -1 0
-3 0 -3 -5 4 -1
Lasso results
lambda x1 x2 x3 x4 x5
6 0 0 0 0 0
3.1351 0 0 0.1892 -0.027 0
2.7514 0 0 0.2144 -0.038 -0.0114
0.8008 0 -0.1816 0.314 -0.0467 -0.0231
0.4059 -0.2529 -0.1579 0.3595 0 -0.1343
0.3358 -0.4851 -0.1306 0.4366 0 -0.2575
0 -0.7002 -0.1104 0.4753 0.0397 -0.3521
ID ending in 1. Classification data, training and validation results.
Training data
x1 x2 y
-2 -1 0
-1.5294 -1 0
-1.0588 -1 1
-0.5882 -1 1
-0.1176 -1 0
0.3529 1 1
0.8235 1 0
1.2941 1 1
1.7647 1 1
Test/validation data Predictions
x1 x2 y M2 M3
-1.7647 -1 1 1 0
-1.2941 -1 1 1 1
-0.8235 -1 0 1 1
-0.3529 -1 1 1 1
0.1176 1 1 1 1
0.5882 1 1 1 1
1.0588 1 0 1 1
1.5294 1 1 1 1
2 1 1 1 1
Training results from model M1: $\hat{\beta}_0 = 0.3592$, $\hat{\beta}_1 = 1.0108$, $\hat{\beta}_2 = -0.261$.
ID ending in 1. Regression data and training results.
Training data
x1 x2 x3 x4 x5 y
-1 -1 1 -1 1 -1
1 0 1 -1 0 1
0 0 -1 1 -1 -1
0 0 1 1 0 0
0 1 -1 1 0 0
1 0 1 -1 0 1
0 -1 -1 -1 1 -1
1 0 -1 1 1 -1
1 0 -1 -1 0 -1
1 -1 1 1 1 1
1 -1 0 0 0 -1
0 0 0 0 0 1
-5 3 0 0 -3 2
Lasso results
lambda x1 x2 x3 x4 x5
9 0 0 0 0 0
7 -0.0625 0 0 0 0
6.1429 -0.0536 0.0714 0 0 0
4.4286 0 0.2857 0.2143 0 0
0.7561 0 0.6098 0.6463 0 0
0.1111 0.1181 0.8056 0.75 0 0
0.0588 0.1287 0.8235 0.7574 -0.0074 0
0 0.1362 0.8632 0.7665 -0.0165 0.0283
ID ending in 2. Classification data, training and validation results.
Training data
x1 x2 y
-8 -1 0
-6.3158 -1 0
-4.6316 -1 1
-2.9474 -1 0
-1.2632 -1 1
0.4211 1 1
2.1053 1 0
3.7895 1 1
5.4737 1 1
7.1579 1 1
Test/validation data Predictions
x1 x2 y M2 M3
-7.1579 -1 0 0 0
-5.4737 -1 0 0 0
-3.7895 -1 0 0 1
-2.1053 -1 1 0 1
-0.4211 -1 1 0 1
1.2632 1 1 1 1
2.9474 1 1 1 1
4.6316 1 1 1 1
6.3158 1 1 1 1
8 1 1 1 1
Training results from model M1: $\hat{\beta}_0 = 0.8802$, $\hat{\beta}_1 = 0.5544$, $\hat{\beta}_2 = -1.1189$.
ID ending in 2. Regression data and training results.
Training data
x1 x2 x3 x4 y
-1 -1 1 0 -1
-1 0 -1 1 1
0 -1 -1 1 0
0 -1 -1 -1 -1
0 -1 1 -1 -1
-1 0 1 1 -1
-1 -1 -1 1 0
1 -1 1 -1 -1
0 0 0 -1 1
0 0 1 0 1
3 6 -1 0 2
Lasso results
lambda x1 x2 x3 x4
16 0 0 0 0
3.1667 0 0.3056 0 0
1.0882 0 0.3272 -0.1949 0
0.5879 0 0.3335 -0.2261 0.05
0.1044 -0.1648 0.4138 -0.2742 0
0.0169 -0.1874 0.4252 -0.2806 0
0 -0.2004 0.4312 -0.2846 -0.0108
ID ending in 3. Classification data, training and validation results.
Training data
x1 x2 y
-2 -1 0
-1.5789 -1 0
-1.1579 -1 1
-0.7368 -1 1
-0.3158 -1 1
0.1053 1 1
0.5263 1 1
0.9474 1 0
1.3684 1 1
1.7895 1 1
Test/validation data Predictions
x1 x2 y M2 M3
-1.7895 -1 0 1 0
-1.3684 -1 1 1 1
-0.9474 -1 0 1 1
-0.5263 -1 0 1 1
-0.1053 -1 1 1 1
0.3158 1 0 1 1
0.7368 1 1 1 1
1.1579 1 1 1 1
1.5789 1 0 1 1
2 1 1 1 1
Training results from model M1: $\hat{\beta}_0 = 1.4488$, $\hat{\beta}_1 = 2.2175$, $\hat{\beta}_2 = -1.6875$.
ID ending in 3. Regression data and training results.
Training data
x1 x2 x3 x4 x5 y
-1 1 0 1 -1 1
-1 0 1 0 -1 1
1 0 -1 0 0 -1
1 1 1 -1 0 -1
-1 1 -1 -1 -1 -1
-1 0 0 0 0 1
1 0 0 -1 1 1
-1 0 -1 -1 1 1
0 1 1 1 1 0
1 -1 1 0 -1 -1
1 0 0 -1 0 -1
1 -1 0 1 0 -1
0 -1 -1 0 1 0
-1 -1 0 2 0 1
Lasso results
lambda x1 x2 x3 x4 x5
8 0 0 0 0 0
2.8571 -0.4286 0 0 0 0
1.5429 -0.5714 0 0 0 0.2
1.36 -0.6 0 0.04 0 0.24
1.0587 -0.6443 0 0.1029 0.009 0.3056
0.8609 -0.6854 -0.0397 0.1556 0 0.3477
0.1379 -0.8224 -0.1676 0.3346 0 0.5011
0 -0.8647 -0.2127 0.3853 -0.0393 0.531
ID ending in 4. Classification data, training and validation results.
Training data
x1 x2 y
-7 -1 0
-5.5263 -1 0
-4.0526 -1 0
-2.5789 -1 0
-1.1053 -1 0
0.3684 1 1
1.8421 1 1
3.3158 1 1
4.7895 1 0
6.2632 1 1
Test/validation data Predictions
x1 x2 y M2 M3
-6.2632 -1 0 0 0
-4.7895 -1 0 0 0
-3.3158 -1 0 0 0
-1.8421 -1 0 0 0
-0.3684 -1 0 1 0
1.1053 1 0 1 1
2.5789 1 1 1 1
4.0526 1 1 1 1
5.5263 1 1 1 1
7 1 1 1 1
Training results from model M1: $\hat{\beta}_0 = -9.8016$, $\hat{\beta}_1 = -0.4954$, $\hat{\beta}_2 = 13.1315$.
ID ending in 4. Regression data and training results.
Training data
x1 x2 x3 x4 x5 y
1 1 0 -1 1 0
0 1 -1 0 0 -1
0 0 1 -1 0 1
1 0 0 1 0 -1
1 1 1 -1 1 0
1 1 -1 1 1 0
-1 0 0 0 1 1
1 -1 1 0 1 1
1 0 0 1 1 -1
1 0 0 0 1 0
0 1 -1 1 1 -1
1 0 1 -1 1 -1
0 0 -1 0 1 0
-7 -4 0 0 -10 2
Lasso results
lambda x1 x2 x3 x4 x5
21 0 0 0 0 0
8.0588 0 0 0 0 -0.1176
4.486 -0.2011 0 0 0 -0.0112
4.3273 -0.2091 -0.0182 0 0 0
3.0145 -0.1812 -0.1159 0 0 0
2.9696 -0.1824 -0.1155 0.0061 0 0
0.992 -0.1738 -0.2043 0.1 -0.1885 0
0 -0.5022 -0.399 0.1913 -0.2555 0.312
ID ending in 5. Classification data, training and validation results.
Training data
x1 x2 y
-3 -1 0
-2.2 -1 0
-1.4 -1 0
-0.6 -1 1
0.2 1 0
1 1 1
1.8 1 1
2.6 1 1
Test/validation data Predictions
x1 x2 y M2 M3
-2.6 -1 0 1 0
-1.8 -1 0 0 0
-1 -1 0 1 0
-0.2 -1 1 1 0
0.6 1 1 0 1
1.4 1 1 0 1
2.2 1 1 0 1
3 1 1 0 1
Training results from model M1: $\hat{\beta}_0 = 12.1138$, $\hat{\beta}_1 = 60.5692$, $\hat{\beta}_2 = -48.6892$.
ID ending in 5. Regression data and training results.
Training data
x1 x2 x3 x4 x5 y
0 0 1 1 -1 0
0 0 1 0 0 -1
0 0 0 -1 1 1
1 0 -1 -1 0 -1
0 0 1 -1 0 1
0 1 -1 -1 1 1
1 0 0 -1 1 1
1 1 -1 1 -1 -1
1 0 0 1 -1 0
1 0 0 -1 0 -1
-1 0 1 -1 1 1
-4 -2 -1 4 -1 -1
Lasso results
lambda x1 x2 x3 x4 x5
8 0 0 0 0 0
4.5333 0 0 0 -0.1333 0
2.3929 0 0 0 -0.0595 0.369
1.5 0 0 0.119 0 0.5476
0.7347 0 0 0.2041 0 0.6327
0.6303 0 0.0148 0.2162 0 0.6405
0.323 -0.0952 0.206 0.2654 0 0.6487
0 -0.0218 0.4304 0.4018 0.3012 1.0117
ID ending in 6. Classification data, training and validation results.
Training data
x1 x2 y
-4 -1 0
-3.1579 -1 0
-2.3158 -1 0
-1.4737 -1 0
-0.6316 -1 1
0.2105 1 0
1.0526 1 1
1.8947 1 1
2.7368 1 1
3.5789 1 1
Test/validation data Predictions
x1 x2 y M2 M3
-3.5789 -1 0 0 0
-2.7368 -1 0 0 0
-1.8947 -1 0 0 0
-1.0526 -1 0 0 0
-0.2105 -1 1 1 0
0.6316 1 0 1 1
1.4737 1 1 1 1
2.3158 1 1 1 1
3.1579 1 1 1 1
4 1 1 1 1
Training results from model M1: $\hat{\beta}_0 = 11.7791$, $\hat{\beta}_1 = 55.9443$, $\hat{\beta}_2 = -47.3233$.
ID ending in 6. Regression data and training results.
Training data
x1 x2 x3 x4 x5 y
0 -1 0 0 0 -1
-1 1 -1 -1 1 -1
1 0 -1 1 -1 0
1 1 1 0 1 0
-1 0 1 1 1 1
-1 1 -1 -1 -1 1
-1 0 0 0 -1 -1
0 0 1 1 -1 0
0 1 -1 0 -1 1
1 -1 1 1 -1 1
1 0 1 0 0 -1
0 1 -1 -1 -1 0
1 1 1 -1 1 1
-1 -4 -1 0 3 -1
Lasso results
lambda x1 x2 x3 x4 x5
6 0 0 0 0 0
2.3077 0 0.1538 0 0 0
1.8512 0 0.1653 0 0 -0.0165
1.7632 0 0.1668 0.0075 0 -0.0205
0.7382 0 0.2492 0.0279 0.172 -0.0102
0.1204 0.03 0.2972 0.028 0.2742 0
0.0094 0.035 0.3047 0.0289 0.291 0
0 0.0358 0.3065 0.028 0.2941 0.002
ID ending in 7. Classification data, training and validation results.
Training data
x1 x2 y
-7 -1 0
-5.3529 -1 1
-3.7059 -1 0
-2.0588 -1 1
-0.4118 -1 1
1.2353 1 0
2.8824 1 1
4.5294 1 1
6.1765 1 1
Test/validation data Predictions
x1 x2 y M2 M3
-6.1765 -1 0 1 0
-4.5294 -1 0 1 1
-2.8824 -1 0 1 1
-1.2353 -1 1 1 1
0.4118 1 0 1 1
2.0588 1 1 1 1
3.7059 1 1 1 1
5.3529 1 1 1 1
7 1 1 1 1
Training results from model M1: $\hat{\beta}_0 = 1.3543$, $\hat{\beta}_1 = 1.012$, $\hat{\beta}_2 = -3.2558$.
ID ending in 7. Regression data and training results.
Training data
x1 x2 x3 x4 x5 y
0 0 0 1 -1 1
-1 1 0 0 -1 1
-1 0 0 0 0 0
1 -1 0 1 1 1
1 1 -1 -1 0 0
1 0 0 -1 0 0
0 1 0 -1 1 1
0 -1 1 -1 -1 -1
0 -1 0 1 0 1
1 -1 1 -1 0 0
1 1 -1 0 1 0
-1 1 1 -1 1 0
-2 -1 -1 3 -1 -4
Lasso results
lambda x1 x2 x3 x4 x5
9 0 0 0 0 0
7.3636 0 0 0 -0.0909 0
3.8175 0.2336 0 0 -0.1971 0
2.4414 0.3625 0.1523 0 -0.1726 0
2.0472 0.4961 0.3504 0.2756 0 0
1.7 0.525 0.4 0.35 0 0
0.8735 0.5824 0.507 0.5234 0 0.0345
0.5414 0.8233 0.8496 0.9812 0.3459 0
0.1206 1.108 1.2613 1.5452 0.7739 0
0 1.216 1.4081 1.7273 0.9095 -0.0562
ID ending in 8. Classification data, training and validation results.
Training data
x1 x2 y
-9 -1 0
-6.8824 -1 0
-4.7647 -1 0
-2.6471 -1 1
-0.5294 -1 0
1.5882 1 1
3.7059 1 1
5.8235 1 0
7.9412 1 1
Test/validation data Predictions
x1 x2 y M2 M3
-7.9412 -1 0 0 0
-5.8235 -1 0 0 0
-3.7059 -1 0 0 0
-1.5882 -1 1 0 0
0.5294 1 1 0 1
2.6471 1 1 0 1
4.7647 1 1 0 1
6.8824 1 1 0 1
9 1 1 0 1
Training results from model M1: $\hat{\beta}_0 = -0.1495$, $\hat{\beta}_1 = 0.0941$, $\hat{\beta}_2 = 0.8121$.
ID ending in 8. Regression data and training results.
Training data
x1 x2 x3 x4 x5 y
-1 1 -1 1 -1 -1
0 1 0 0 1 1
0 0 0 -1 1 0
0 0 -1 1 -1 0
0 1 1 1 0 1
1 0 0 -1 0 0
-1 0 0 1 0 1
0 0 0 0 1 1
0 0 1 1 0 -1
1 0 -1 -1 -1 -1
1 0 0 0 -1 -1
-1 -3 1 -2 1 0
Lasso results
lambda x1 x2 x3 x4 x5
5 0 0 0 0 0
2.3333 0 0 0 0 0.3333
1.5 0 0 0 0.125 0.5
0.4314 0 0.0643 0 0.2435 0.7169
0.1329 0 0.0364 -0.1416 0.3205 0.8532
0.0741 0.0503 0 -0.1905 0.373 0.9101
0.0356 0.0656 0 -0.2079 0.3851 0.9335
0 0.1136 -0.0459 -0.252 0.4391 0.9817
ID ending in 9. Classification data, training and validation results.
Training data
x1 x2 y
-3 -1 0
-2.3684 -1 0
-1.7368 -1 0
-1.1053 -1 0
-0.4737 -1 0
0.1579 1 1
0.7895 1 1
1.4211 1 0
2.0526 1 1
2.6842 1 1
Test/validation data Predictions
x1 x2 y M2 M3
-2.6842 -1 0 0 0
-2.0526 -1 0 0 0
-1.4211 -1 0 0 0
-0.7895 -1 0 0 0
-0.1579 -1 1 1 0
0.4737 1 1 1 1
1.1053 1 1 1 1
1.7368 1 1 1 1
2.3684 1 1 1 1
3 1 1 1 1
Training results from model M1: $\hat{\beta}_0 = -9.5899$, $\hat{\beta}_1 = 10^{-4}$, $\hat{\beta}_2 = 10.9763$.
ID ending in 9. Regression data and training results.
Training data
x1 x2 x3 x4 x5 y
1 1 -1 -1 1 0
0 -1 1 0 1 1
-1 -1 0 0 0 0
0 1 1 0 -1 1
1 -1 0 -1 -1 -1
1 1 1 1 0 0
0 0 0 1 1 1
-1 -1 -1 1 -1 1
0 -1 0 1 -1 1
-1 -1 1 -1 -1 0
-1 1 -1 1 1 -1
-1 0 1 0 -1 1
-1 -1 1 0 1 -1
3 3 -3 -2 1 -3
Lasso results
lambda x1 x2 x3 x4 x5
11 0 0 0 0 0
8.75 0 0 0.125 0 0
8.05 -0.025 0 0.15 0 0
7.9269 -0.0261 0 0.154 0.0078 0
3.849 -0.0097 -0.0885 0.2608 0.2772 0
1.5957 0 -0.1011 0.2952 0.4388 -0.1569
0.0242 0 -0.1067 0.3181 0.5488 -0.2663
0 0.0069 -0.11 0.3196 0.5533 -0.268