ECMM422 Machine Learning
Course Assessment 1
This course assessment (CA1) represents 40% of the overall module assessment.
This is an individual exercise and your attention is drawn to the College and University
guidelines on collaboration and plagiarism, which are available from the College website.
Note:
- do not change the name of this notebook, i.e. the notebook file has to be named: ca1.ipynb
- do not remove/delete any cell
- do not add any cell (you can work on a draft notebook and only copy the function implementations here)
- do not add your name or student code in the notebook or in the file name
Evaluation criteria:
- Each question asks for one or more functions to be implemented.
- Each question is awarded a number of marks.
- A (hidden) unit test is going to evaluate whether all desired properties of the required function(s) are met.
- If the test passes, all the associated marks are awarded; if it fails, 0 marks are awarded. The large number of questions allows fine-grained grading.
Notes:
- In the rest of the notebook, the term data matrix refers to a two-dimensional numpy array where instances are encoded as rows, e.g. a data matrix with 100 rows and 4 columns is to be interpreted as a collection of 100 instances, each with four features.
- When a required function can be implemented directly by a library function, it is intended that the candidate writes their own implementation of the function, e.g. a function to compute the accuracy or the cross validation.
- Some questions are just check-points, i.e. they are for you to see that you are correctly implementing all functions. Since those check-points use functions that you have already implemented and that have already been marked, those questions are not going to be marked (i.e. they appear as having marks 0).
In [ ]: %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
Question 1 [marks 6]
a) Make a function data_matrix = make_data_classification(mean, std, n_centres, inner_std, n_samples, random_seed=42) to create a data matrix according to the following rules:
- mean is an n-dimensional vector (say [1,1], but the function should allow vectors of any dimension)
- n_centres is the number of centres (say 3)
- std is the standard deviation (say 1)
- the centres are sampled from a Normal distribution with mean mean and standard deviation std
- from each centre, sample n_samples points from a Normal distribution with the centre as the mean and standard deviation inner_std; so if mean=[1,1], n_centres=3 and n_samples=10, then the data matrix will be a 30 rows x 2 columns numpy array.
b) Make a function data_matrix, targets = make_data_regression(mean, std, n_centres, inner_std, n_samples_list, random_seed=42) to create a data matrix and a target vector according to the following rules:
- the data matrix is constructed in the same way as in make_data_classification
- the targets are the Euclidean distance between each sample and the centre of the generating Normal distribution
See Question 3 for a graphical example of the expected output.
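One possible reading of these rules, written as a rough sketch rather than a reference solution (the _sketch names are hypothetical so they do not clash with the graded stubs below, and the exact order of the random draws is an assumption, so a hidden unit test may expect a different sequence):

import numpy as np

def make_data_classification_sketch(mean, std, n_centres, inner_std, n_samples, random_seed=42):
    # hypothetical sketch: sample the centres, then sample a blob of points around each centre
    rng = np.random.RandomState(random_seed)
    mean = np.asarray(mean, dtype=float)
    centres = rng.normal(loc=mean, scale=std, size=(n_centres, mean.size))
    blobs = [rng.normal(loc=c, scale=inner_std, size=(n_samples, mean.size)) for c in centres]
    return np.vstack(blobs)

def make_data_regression_sketch(mean, std, n_centres, inner_std, n_samples, random_seed=42):
    # hypothetical sketch: as above, but also return each point's distance to its own centre
    rng = np.random.RandomState(random_seed)
    mean = np.asarray(mean, dtype=float)
    centres = rng.normal(loc=mean, scale=std, size=(n_centres, mean.size))
    data, targets = [], []
    for c in centres:
        samples = rng.normal(loc=c, scale=inner_std, size=(n_samples, mean.size))
        data.append(samples)
        targets.append(np.linalg.norm(samples - c, axis=1))
    return np.vstack(data), np.concatenate(targets)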
In [ ]:
import scipy as sp

# unit test utilities: you can ignore these functions
def is_approximately_equal(test, target, eps=1e-2):
    return np.mean(np.fabs(np.array(test) - np.array(target))) < eps

def assert_test_equality(test, target):
    assert is_approximately_equal(test, target), 'Expected:\n %s \nbut got:\n %s' % (target, test)
In [ ]:
def make_data_classification(mean, std, n_centres, inner_std, n_samples, random_seed=42):
    # YOUR CODE HERE
    raise NotImplementedError()

def make_data_regression(mean, std, n_centres, inner_std, n_samples, random_seed=42):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
Question 2 [marks 2]
a) Make a function data_matrix, targets = get_dataset_classification(n_samples, std, inner_std) to create a data matrix and a target vector for a binary classification problem according to the following rules:
- the instances from the positive class are generated according to the same rules provided for make_data_classification; so are the instances from the negative class
- instances from the positive class have as mean the vector [10,10] and those from the negative class the vector [-10,-10]
- the number of centres is fixed to 3
- the random seed is fixed to 42
- n_samples indicates the total number of instances finally available in the output data_matrix
b) Make a function data_matrix, targets = get_dataset_regression(n_samples, std, inner_std) to create a data matrix according to the following rules:
- the instances are generated according to the same rules provided for make_data_regression
- the targets are generated according to the same rules provided for make_data_regression
- instances have as mean the vector [10,10]
- the number of centres is fixed to 3
- the random seed is fixed to 42
- n_samples indicates the total number of instances finally available in the output data_matrix
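A hedged sketch of how these wrappers might be assembled, assuming the Question 1 functions are implemented; the even split of n_samples between the classes and across the 3 centres, the +1/-1 class labels, and the reuse of the fixed seed are all assumptions, not requirements stated above:

import numpy as np

def get_dataset_classification_sketch(n_samples, std, inner_std):
    # assumption: half of n_samples per class, split evenly over the 3 centres
    per_centre = (n_samples // 2) // 3
    pos = make_data_classification([10, 10], std, n_centres=3, inner_std=inner_std,
                                   n_samples=per_centre, random_seed=42)
    neg = make_data_classification([-10, -10], std, n_centres=3, inner_std=inner_std,
                                   n_samples=per_centre, random_seed=42)
    data_matrix = np.vstack([pos, neg])
    targets = np.array([1] * len(pos) + [-1] * len(neg))   # assumed label convention
    return data_matrix, targets

def get_dataset_regression_sketch(n_samples, std, inner_std):
    # assumption: n_samples split evenly over the 3 centres
    return make_data_regression([10, 10], std, n_centres=3, inner_std=inner_std,
                                n_samples=n_samples // 3, random_seed=42)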
Question 3 [marks 1]
Make a function plot(X,y) to display the scatter plot of a data matrix of two dimensional
instances using the array y to assign the colour to the instances.
When running
X, y = get_dataset_regression(n_samples=600, std=30, inner_std=5)
plot(X,y)
you should get something like
In [ ]:
def get_dataset_classification(n_samples, std, inner_std):
    # YOUR CODE HERE
    raise NotImplementedError()

def get_dataset_regression(n_samples, std, inner_std):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
and when running
X, y = get_dataset_classification(n_samples=600, std=30, inner_std=5)
plot(X,y)
you should get something like
Question 4 [marks 1]
Make a function classification_error(targets, preds) to compute the fraction of
times that the entries in targets do not agree with the corresponding entries in preds .
Note: do not use library functions to compute the result directly but implement your own
version.
Question 5 [marks 2]
Make a function regression_error(targets, preds) to compute the mean squared error
between targets and preds .
Note: do not use library functions to compute the result directly but implement your own
version.
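Both error measures are one-liners over numpy arrays; a minimal sketch of what is being asked, written against hypothetical _sketch names so as not to overwrite the graded stubs:

import numpy as np

def classification_error_sketch(targets, preds):
    # fraction of positions where the two arrays disagree
    return np.mean(np.asarray(targets) != np.asarray(preds))

def regression_error_sketch(targets, preds):
    # mean squared error, computed directly from its definition
    diff = np.asarray(targets) - np.asarray(preds)
    return np.mean(diff ** 2)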
Question 6 [marks 7]
Make a function make_bootstrap(data_matrix, targets) to extract a bootstrapped
replicate of an input dataset.
In [ ]:
def plot(X,y):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
def classification_error(targets, preds):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (T_i - P_i)^2$$
In [ ]:
def regression_error(targets, preds):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
The function should return the following 6 elements (in this order): bootstrap_data_matrix, bootstrap_targets, bootstrap_sample_ids, oob_data_matrix, oob_targets, oob_samples_ids, where:
- bootstrap_data_matrix: a data matrix encoding the bootstrapped replicate of the data matrix
- bootstrap_targets: the corresponding bootstrapped replicate of the target vector
- bootstrap_sample_ids: an array containing the instance indices of the bootstrapped replicate of the data matrix
- oob_data_matrix: a data matrix encoding the out of bag instances
- oob_targets: the corresponding out of bag entries of the target vector
- oob_samples_ids: an array containing the instance indices of the out of bag instances
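A minimal sketch of the bootstrap extraction described above: sample indices with replacement for the bootstrap part and take the complement of the drawn indices as the out-of-bag part (the use of np.random.choice, rather than some other sampler, is an assumption):

import numpy as np

def make_bootstrap_sketch(data_matrix, targets):
    n = len(targets)
    # draw n instance indices with replacement
    bootstrap_sample_ids = np.random.choice(n, size=n, replace=True)
    # out-of-bag = indices that were never drawn
    oob_sample_ids = np.setdiff1d(np.arange(n), bootstrap_sample_ids)
    targets = np.asarray(targets)
    return (data_matrix[bootstrap_sample_ids], targets[bootstrap_sample_ids], bootstrap_sample_ids,
            data_matrix[oob_sample_ids], targets[oob_sample_ids], oob_sample_ids)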
In [ ]:
def make_bootstrap(data_matrix, targets):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
Question 7 [marks 10]
Consider the following functional blueprints: estimator = train(X_train, y_train, param) and test(X_test, estimator). A function of type train takes in input a data matrix X_train, a target vector y_train and a single value param (not a list of parameters). A function of type train outputs an object that represents an estimator. A function of type test takes in input a data matrix X_test and the fit object estimator, and outputs the predicted targets.
Using this blueprint, write the specialised train and test functions for the following classifiers and regressors (use the function signatures provided in the next cell, e.g. train_ab for training an adaboost classifier):
Classifiers:
a) k-nearest-neighbor: the parameter controls the number of neighbors (you may use KNeighborsClassifier from scikit) [train_knn, test_knn]
b) adaboost: the parameter controls the maximal depth of the decision tree used as weak classifier (you may use the DecisionTreeClassifier from scikit but you should provide your own implementation of the boosting algorithm) [train_ab, test_ab]
c) random forest: the parameter controls the maximal depth of the tree (you may use the DecisionTreeClassifier from scikit but you should provide your own implementation of the bagging algorithm) [train_rfc, test_rfc]
Regressors:
d) decision tree: the parameter controls the maximal depth of the tree (you may use the DecisionTreeRegressor from scikit) [train_dt, test_dt]
e) svm linear: the parameter controls the regularization constant C (you may use SVR from scikit) [train_svm_1, test_svm]
f) svm with a polynomial kernel of degree 2: the parameter controls the regularization constant C (you may use SVR from scikit) [train_svm_2, test_svm]
g) svm with a polynomial kernel of degree 3: the parameter controls the regularization constant C (you may use SVR from scikit) [train_svm_3, test_svm]
h) random forest: the parameter controls the maximal depth of the tree (you may use the DecisionTreeRegressor from scikit but you should provide your own implementation of the bagging algorithm) [train_rf, test_rf]
For the algorithms adaboost and random forest, the size of the ensemble should be fixed to 100.
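To make the blueprint concrete, here is a hedged sketch of one classifier pair and one hand-rolled bagged regressor pair following the estimator = train(X, y, param) / test(X, estimator) convention. The ensemble size of 100 comes from the text above; everything else (the _sketch names, resampling with np.random.choice, averaging the tree predictions) is an assumption:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeRegressor

def train_knn_sketch(X_train, y_train, param):
    # param = number of neighbours
    return KNeighborsClassifier(n_neighbors=param).fit(X_train, y_train)

def test_knn_sketch(X_test, est):
    return est.predict(X_test)

def train_rf_sketch(X_train, y_train, param, n_estimators=100):
    # param = maximal depth; bagging done by hand on bootstrap resamples
    models = []
    for _ in range(n_estimators):
        ids = np.random.choice(len(y_train), size=len(y_train), replace=True)
        models.append(DecisionTreeRegressor(max_depth=param).fit(X_train[ids], y_train[ids]))
    return models

def test_rf_sketch(X_test, models):
    # average the predictions of the individual trees
    return np.mean([m.predict(X_test) for m in models], axis=0)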
In [ ]:
# classifiers
from sklearn.neighbors import KNeighborsClassifier

def train_knn(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()

def test_knn(X_test, est):
    # YOUR CODE HERE
    raise NotImplementedError()

from sklearn.tree import DecisionTreeClassifier

def train_ab(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()

def test_ab(X_test, models):
    # YOUR CODE HERE
    raise NotImplementedError()

from sklearn.tree import DecisionTreeClassifier

def train_rfc(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()

def test_rfc(X_test, models):
    # YOUR CODE HERE
    raise NotImplementedError()

# regressors
from sklearn.tree import DecisionTreeRegressor

def train_dt(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()

def test_dt(X_test, est):
    # YOUR CODE HERE
    raise NotImplementedError()
from sklearn.svm import SVR

def train_svm_1(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()

def train_svm_2(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()

def train_svm_3(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()

# Note: you do not need to specialise the svm test function for each degree
def test_svm(X_test, est):
    # YOUR CODE HERE
    raise NotImplementedError()

from sklearn.tree import DecisionTreeRegressor

def train_rf(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()

def test_rf(X_test, models):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
Question 8 [marks 0]
This is just a check-point, i.e. it is for you to see that you are correctly implementing all functions. Since this cell uses functions that you have already implemented and that have already been marked, this Question is not going to be marked.
Make a dataset using
X, y = get_dataset_classification(n_samples=240, std=30, inner_std=10)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=.3)
and check that the classification error for
- k-nearest-neighbor
- random forest classifier
- adaboost
is approximately comparable.
Question 9 [marks 0]
This is just a check-point, i.e. it is for you to see that you are correctly implementing all
functions. Since this cell uses functions that you have already implemented and that have
already been marked, this Question is not going to be marked.
Make a dataset using
X, y = get_dataset_regression(n_samples=120, std=30, inner_std=10)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=.3)
and check that the regression error for these regressors
- decision tree
- svm with polynomial kernel of degree 2
- svm with polynomial kernel of degree 3
is approximately comparable.
In [ ]:
# Just run the following code, do not modify it
X, y = get_dataset_classification(n_samples=240, std=30, inner_std=10)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=.3)
param=3
e_knn = classification_error(y_test, test_knn(X_test, train_knn(X_train, y_train, param)))
e_rfc = classification_error(y_test, test_rfc(X_test, train_rfc(X_train, y_train, param)))
e_ab = classification_error(y_test, test_ab(X_test, train_ab(X_train, y_train, param)))
print(e_knn, e_rfc, e_ab)
In [ ]:
# Just run the following code, do not modify it
X, y = get_dataset_regression(n_samples=120, std=30, inner_std=10)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=.3)
param=3
e_dt = regression_error(y_test, test_dt(X_test, train_dt(X_train, y_train, param)))
e_svm2 = regression_error(y_test, test_svm(X_test, train_svm_2(X_train, y_train, param)))
e_svm3 = regression_error(y_test, test_svm(X_test, train_svm_3(X_train, y_train, param)))
print(e_dt, e_svm2, e_svm3)
Question 10 [marks 10]
Make a function sizes, train_errors, test_errors = compute_learning_curve(train_func, test_func, param, X, y, test_size, n_steps, n_repetitions) to compute the train and test errors as mandated by the learning curve approach.
The regressor will be trained via train_func on the problem data_matrix, targets with parameter param. The estimate will be done by averaging a number of replicates equal to n_repetitions, i.e. the code needs to repeat the process n_repetitions times (say 10) and average the error.
Note that a fraction of the data as indicated by test_size (say 0.33 for 33%) is going to be reserved for testing purposes. The remaining amount of data can be used in the training phase.
The learning curve should be computed for an amount of training material that varies from a minimum of 2 instances up to all the instances available for training.
You should use the function regression_error to compute the error.
Note: do not use library functions (e.g. learning_curve in scikit) to compute the result directly but implement your own version.
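One way to read this specification, as a hedged sketch that assumes regression_error is implemented; how the training sizes are spaced between 2 and the full training set, and the use of train_test_split to draw each repetition, are assumptions:

import numpy as np
from sklearn.model_selection import train_test_split

def compute_learning_curve_sketch(train_func, test_func, param, X, y, test_size, n_steps, n_repetitions):
    train_errors, test_errors = [], []
    n_train_max = int(len(y) * (1 - test_size))
    # assumption: n_steps training sizes, evenly spaced from 2 to all training instances
    sizes = np.linspace(2, n_train_max, n_steps).astype(int)
    for size in sizes:
        tr_errs, te_errs = [], []
        for _ in range(n_repetitions):
            X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size)
            est = train_func(X_tr[:size], y_tr[:size], param)
            tr_errs.append(regression_error(y_tr[:size], test_func(X_tr[:size], est)))
            te_errs.append(regression_error(y_te, test_func(X_te, est)))
        train_errors.append(np.mean(tr_errs))
        test_errors.append(np.mean(te_errs))
    return sizes, np.array(train_errors), np.array(test_errors)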
Question 11 [marks 1]
Make a function plot_learning_curve(sizes, train_errors, test_errors) to
display the train and test error as a function of the size of the training set.
You should get something like:
Question 12 [marks 3]
Make a function estimate_asymptotic_error(sizes, train_errors, test_errors)
that returns an estimate of the asymptotic error, i.e. the error made in the limit of an infinitely
large training set.
In [ ]:
def compute_learning_curve(train_func, test_func, param, X, y, test_size, n_steps, n_repetitions):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
def plot_learning_curve(sizes, train_errors, test_errors):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
Question 13 [marks 0]
This is just a check-point, i.e. it is for you to see that you are correctly implementing all
functions. Since this cell uses functions that you have already implemented and that have
already been marked, this Question is not going to be marked.
When you run:
X, y = get_dataset_regression(n_samples=800, std=30, inner_std=10)
train_func, test_func = train_dt, test_dt
param=5
sizes, train_errors, test_errors = compute_learning_curve(train_func,
test_func, param, X, y, test_size=.3, n_steps=10, n_repetitions=100)
e = estimate_asymptotic_error(sizes, train_errors, test_errors)
print('Asymptotic error: %.1f'%e)
plot_learning_curve(sizes, train_errors, test_errors)
you should get something like
Question 14 [marks 6]
Make a function bias2, variance = compute_bias_variance(predictions_dict,
targets) that takes in input a dictionary of lists of predictions indexed by the instance index,
and the target vector. The function should compute the squared bias component of the error
and the variance components of the error for each instance.
As a toy example consider: predictions_dict={0:[1,1,1], 1:[1,-1], 2:[-1,-1,-1,1]} and targets=[1,1,-1], that is, for the instance with index 0 there are 3 predictions available [1,1,1], while for the instance with index 1 there are only 2 predictions available [1,-1], etc. In this case, you should get bias2=[0. , 1. , 0.25] and variance=[0. , 1. , 0.75].
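One reading that reproduces the toy example above: per instance, the squared bias is the squared gap between the mean prediction and the target, and the variance is the (population) variance of the predictions. A minimal sketch with a hypothetical _sketch name:

import numpy as np

def compute_bias_variance_sketch(predictions_dict, targets):
    targets = np.asarray(targets, dtype=float)
    bias2 = np.array([(np.mean(predictions_dict[i]) - targets[i]) ** 2 for i in range(len(targets))])
    variance = np.array([np.var(predictions_dict[i]) for i in range(len(targets))])
    return bias2, variance

b2, v = compute_bias_variance_sketch({0: [1, 1, 1], 1: [1, -1], 2: [-1, -1, -1, 1]}, [1, 1, -1])
print(b2, v)   # matches the toy example: [0. 1. 0.25] [0. 1. 0.75]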
In [ ]:
def estimate_asymptotic_error(sizes, train_errors, test_errors):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
# Just run the following code, do not modify it
X, y = get_dataset_regression(n_samples=800, std=30, inner_std=10)
train_func, test_func = train_dt, test_dt
param=5
sizes, train_errors, test_errors = compute_learning_curve(train_func, test_func, param, X, y, test_size=.3, n_steps=10, n_repetitions=100)
e = estimate_asymptotic_error(sizes, train_errors, test_errors)
print('Asymptotic error: %.1f'%e)
plot_learning_curve(sizes, train_errors, test_errors)
In [ ]:
def compute_bias_variance(predictions_dict, targets):
    # YOUR CODE HERE
    raise NotImplementedError()
Question 15 [marks 10]
Make a function bias2, variance = bias_variance_decomposition(train_func,
test_func, param, data_matrix, targets, n_bootstraps) to compute the bias
variance decomposition of the error of a regressor on a given problem. The regressor will be
trained via train_func on the problem data_matrix , targets with parameter param .
The estimate will be done using a number of replicates equal to n_bootstraps .
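A hedged sketch of one way to assemble this from the earlier pieces, assuming make_bootstrap, compute_bias_variance and the train/test pairs from Questions 6, 7 and 14 are implemented. Collecting predictions on the out-of-bag instances of each bootstrap replicate is an assumption (predicting on the whole dataset would be another reasonable reading), and instances that never fall out-of-bag are not handled here:

def bias_variance_decomposition_sketch(train_func, test_func, param, data_matrix, targets, n_bootstraps):
    # collect, for every instance index, the predictions obtained when it was out-of-bag
    predictions_dict = {i: [] for i in range(len(targets))}
    for _ in range(n_bootstraps):
        boot_X, boot_y, boot_ids, oob_X, oob_y, oob_ids = make_bootstrap(data_matrix, targets)
        est = train_func(boot_X, boot_y, param)
        preds = test_func(oob_X, est)
        for idx, pred in zip(oob_ids, preds):
            predictions_dict[idx].append(pred)
    return compute_bias_variance(predictions_dict, targets)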
Question 16 [marks 2]
Consider the following regression problem (it does not matter that the target is only 1 and -1):
from sklearn.datasets import load_iris
def make_iris_data():
    X,y = load_iris(return_X_y=True)
    X=X[:,[0,2]]
    y[y==2]=0
    y[y==0]=-1
    return X,y
Estimate the squared bias and variance component for each instance.
Consider as regressor a linear svm and a polynomial svm with degree 3.
What is the class of the instances that have the highest bias error on average?
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
def bias_variance_decomposition(train_func, test_func, param, data_matrix, targets, n_bootstraps):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
# Just run the following code, do not modify it
from sklearn.datasets import load_iris
def make_iris_data():
    X,y = load_iris(return_X_y=True)
    X=X[:,[0,2]]
    y[y==2]=0
    y[y==0]=-1
    return X,y
X,y = make_iris_data()
bias2, variance = bias_variance_decomposition(train_svm_1, test_svm, param=2, da
print(np.mean(bias2[y==1]) , np.mean(bias2[y==-1]))
bias2, variance = bias_variance_decomposition(train_svm_3, test_svm, param=2, da
print(np.mean(bias2[y==1]) , np.mean(bias2[y==-1]))
Question 17 [marks 6]
Make a function bs, vs = compute_bias_variance_decomposition(train_func, test_func, params, data_matrix, targets, n_bootstraps) to compute the average squared bias error component and the average variance component of the error for each parameter setting in the vector params. The regressor will be trained via train_func on the problem data_matrix, targets with each parameter in params in turn. The estimate will be done using a number of replicates equal to n_bootstraps. To be clear, the vector bs contains the average squared bias error for each parameter in params and the vector vs contains the average variance error for each parameter in params.
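A short sketch of the parameter sweep described above, assuming bias_variance_decomposition from Question 15 is available:

import numpy as np

def compute_bias_variance_decomposition_sketch(train_func, test_func, params, data_matrix, targets, n_bootstraps):
    bs, vs = [], []
    for param in params:
        bias2, variance = bias_variance_decomposition(train_func, test_func, param, data_matrix, targets, n_bootstraps)
        bs.append(np.mean(bias2))   # average squared bias over the instances
        vs.append(np.mean(variance))   # average variance over the instances
    return np.array(bs), np.array(vs)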
Question 18 [marks 1]
Make a function plot_bias_variance_decomposition(train_func, test_func,
params, data_matrix, targets, n_bootstraps, logscale=False) .
You should plot the individual components, i.e. the squared bias, the variance and the total error.
You should allow the possibility to employ a logarithmic scale for the horizontal axis via the logscale flag.
You should get something like:
Question 19 [marks 2]
Make a function find_best_param_with_bias_variance_decomposition(train_func,
test_func, params, data_matrix, targets, n_bootstraps) that uses the bias
variance decomposition analysis to determine which parameter among params achieves the
smallest estimated predictive error.
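Since each parameter's estimated predictive error can be read as the sum of its average squared bias and average variance, one hedged reading of this question is simply an argmin over that sum (assuming compute_bias_variance_decomposition from Question 17; treating bias2 + variance as the total estimated error is itself an assumption, as any noise term is ignored):

import numpy as np

def find_best_param_with_bias_variance_decomposition_sketch(train_func, test_func, params, data_matrix, targets, n_bootstraps):
    bs, vs = compute_bias_variance_decomposition(train_func, test_func, params, data_matrix, targets, n_bootstraps)
    # total estimated error per parameter = squared bias + variance
    return params[np.argmin(bs + vs)]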
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
def compute_bias_variance_decomposition(train_func, test_func, params, data_matrix, targets, n_bootstraps):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
def plot_bias_variance_decomposition(train_func, test_func, params, data_matrix, targets, n_bootstraps, logscale=False):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
def find_best_param_with_bias_variance_decomposition(train_func, test_func, params, data_matrix, targets, n_bootstraps):
    # YOUR CODE HERE
    raise NotImplementedError()
Question 20 [marks 6]
When you execute the following code
data_matrix, targets = get_dataset_regression(n_samples=400, std=10, inner_std=7)
params = np.linspace(1,30,30).astype(int)
train_func, test_func = train_dt, test_dt
p = find_best_param_with_bias_variance_decomposition(train_func, test_func, params, data_matrix, targets, n_bootstraps=60)
print('Best parameter:%s'%p)
plot_bias_variance_decomposition(train_func, test_func, params, data_matrix, targets, n_bootstraps=50, logscale=False)
you should get something like:
The next unit tests will run your function find_best_param_with_bias_variance_decomposition on an undisclosed dataset using as regressors:
- decision tree
- svm degree 3
and 3 marks will be awarded for each correct optimal parameter identified.
Question 21 [marks 5]
Make a function conf_mtx = confusion_table(targets, preds) to output the
confusion matrix as a 2 x 2 Numpy array. Rows indicate the prediction and columns the target.
The cell element with index [0,0] should report the true positive count.
Running the following code:
from sklearn.datasets import load_iris
X,y = load_iris(return_X_y=True)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=.3)
models = train_knn(X_train, y_train, param=3)
preds = test_knn(X_test, models)
conf_mtx = confusion_table(y_test, preds)
print(conf_mtx)
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
you should obtain something similar to
[[16. 1.]
[ 0. 28.]]
Note: the exact values can differ in your run
Note: do not use library functions to compute the result directly but implement your own
version.
Question 22 [marks 1]
Make a function error_from_confusion_table(confusion_table_func, targets,
preds) that takes in input the previous confusion_table function and returns the error, i.e.
the fraction of predictions that do not agree with the targets.
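A minimal sketch for the two helpers above, written for the binary +1/-1 label convention used earlier (treating +1 as the positive class is an assumption); the [0,0] cell holds the true positive count as required, with rows indexing the prediction and columns the target:

import numpy as np

def confusion_table_sketch(targets, preds):
    targets, preds = np.asarray(targets), np.asarray(preds)
    conf_mtx = np.zeros((2, 2))
    conf_mtx[0, 0] = np.sum((preds == 1) & (targets == 1))   # true positives
    conf_mtx[0, 1] = np.sum((preds == 1) & (targets != 1))   # false positives
    conf_mtx[1, 0] = np.sum((preds != 1) & (targets == 1))   # false negatives
    conf_mtx[1, 1] = np.sum((preds != 1) & (targets != 1))   # true negatives
    return conf_mtx

def error_from_confusion_table_sketch(confusion_table_func, targets, preds):
    conf_mtx = confusion_table_func(targets, preds)
    # off-diagonal counts are the disagreements between predictions and targets
    return (conf_mtx.sum() - np.trace(conf_mtx)) / conf_mtx.sum()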
Question 23 [marks 12]
Make a function predictions, out_targets =
cross_validation_prediction(train_func, test_func, param, data_matrix,
targets, kfold) that estimates the predictions of a classifier trained via the function
train_func with parameter param on the problem data_matrix, targets using a kfold
cross validation strategy with the number of folds indicated by kfold .
Since the order of the instances associated with the predictions can be different from the original order, the function is required to also output the corresponding target values in the array out_targets (i.e. the value in position 10 in predictions corresponds to the target value in position 10 in out_targets).
Note: do not use library functions (such as KFold or StratifiedKFold ) but implement
your own version of the cross validation.
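A hedged sketch of a hand-rolled k-fold split: partition the instance indices into kfold roughly equal folds, train on all folds but one and predict on the held-out fold. Shuffling the indices before splitting is an assumption:

import numpy as np

def cross_validation_prediction_sketch(train_func, test_func, param, data_matrix, targets, kfold):
    targets = np.asarray(targets)
    indices = np.random.permutation(len(targets))   # assumption: shuffle before splitting
    folds = np.array_split(indices, kfold)          # k roughly equal-sized folds
    predictions, out_targets = [], []
    for k in range(kfold):
        test_ids = folds[k]
        train_ids = np.concatenate([folds[j] for j in range(kfold) if j != k])
        est = train_func(data_matrix[train_ids], targets[train_ids], param)
        predictions.append(test_func(data_matrix[test_ids], est))
        out_targets.append(targets[test_ids])
    return np.concatenate(predictions), np.concatenate(out_targets)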
In [ ]:
def confusion_table(targets, preds):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
def error_from_confusion_table(confusion_table_func, targets, preds):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
def cross_validation_prediction(train_func, test_func, param, data_matrix, targets, kfold):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]: # This cell is reserved for the unit tests. Do not consider this cell.
Question 24 [marks 5]
Make a function mean_errors =
compute_errors_with_crossvalidation(train_func, test_func, params,
data_matrix, targets, kfold, n_repetitions) that returns the estimated average
error for each parameter in params . The classifier is trained via the function train_func
with parameters taken from params on the problem data_matrix, targets using a k-fold
cross validation strategy with the number of folds indicated by kfold . The error estimate is
repeated a number of times indicated in n_repetitions . The error should be computed
using the function error_from_confusion_table. The output vector mean_errors has as many entries as there are parameters in params.
Note: do not use library functions (such as cross_val_score ) but implement your own
version of the code.
Question 25 [marks 2]
Make a function find_best_param_with_crossvalidation(train_func, test_func,
params, data_matrix, targets, kfold, n_repetitions) that uses cross validation to determine which parameter among params achieves the smallest estimated predictive error.
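A hedged sketch of the last two helpers, assuming confusion_table, error_from_confusion_table and cross_validation_prediction from the previous questions are implemented (the _sketch names are hypothetical):

import numpy as np

def compute_errors_with_crossvalidation_sketch(train_func, test_func, params, data_matrix, targets, kfold, n_repetitions):
    mean_errors = []
    for param in params:
        errors = []
        for _ in range(n_repetitions):
            preds, out_targets = cross_validation_prediction(train_func, test_func, param, data_matrix, targets, kfold)
            errors.append(error_from_confusion_table(confusion_table, out_targets, preds))
        mean_errors.append(np.mean(errors))
    return np.array(mean_errors)

def find_best_param_with_crossvalidation_sketch(train_func, test_func, params, data_matrix, targets, kfold, n_repetitions):
    mean_errors = compute_errors_with_crossvalidation(train_func, test_func, params, data_matrix, targets, kfold, n_repetitions)
    return params[np.argmin(mean_errors)]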
Question 26 [marks 0]
This is just a check-point, i.e. it is for you to see that you are correctly implementing all
functions. Since this cell uses functions that you have already implemented and that have
already been marked, this Question is not going to be marked.
You should be able to run the following code:
from sklearn.datasets import load_wine
data_matrix, targets = load_wine(return_X_y=True)
params = [3,5,7,9,11]
train_func, test_func = train_knn, test_knn
kfold = 5
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
def compute_errors_with_crossvalidation(train_func, test_func, params, data_matrix, targets, kfold, n_repetitions):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
In [ ]:
def find_best_param_with_crossvalidation(train_func, test_func, params, data_matrix, targets, kfold, n_repetitions):
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.
n_repetitions = 5
best_param = find_best_param_with_crossvalidation(train_func, test_func,
params, data_matrix, targets, kfold, n_repetitions)
print(best_param)
and get a value around 3.
In [ ]:
# Just run the following code, do not modify it
from sklearn.datasets import load_wine
data_matrix, targets = load_wine(return_X_y=True)
params = [3,5,7,9,11]
train_func, test_func = train_knn, test_knn
kfold = 5
n_repetitions = 5
best_param = find_best_param_with_crossvalidation(train_func, test_func, params, data_matrix, targets, kfold, n_repetitions)
print(best_param)