Coursework 1
Guidelines
Setting up the coursework
To start, download the file "cw1.zip" from the module’s Keats website. Once this is done:
1. Unzip the file "cw1.zip" in a folder of your choice. We will refer to this folder as "
"
2. Change the name of your unzipped folder to your k-number. For instance, if your k-number is "k12345678", the
file "run_coursework.m" should be located at "
/k12345678/run_coursework.m"
3. Open your MATLAB editor, and make sure that the file explorer (upper left section of your editor) is located at
"
/k12345678/" (otherwise, the code will not run).
Instructions
You should now find two MATLAB scripts ("run_coursework.m" and "check_dimensions.m"), a series of MATLAB function
files, two ".mat" files, and this file ("coursewor_1.pdf").
In this coursework, you will complete a series of functions that will be called in the script "run_coursework.m". By running
the file "run_coursework.m", you will be able to inspect the results of the functions you modified.
You should modify ONLY the code inside the mentioned functions at each question. Most importantly, you should NOT
edit the scripts "run_coursework.m" or "check_dimensions.m"!
The functions to edit will be indicated at the end of each question. We will also specify the format of the inputs and the
required format of the outputs.
For instance, for a function named "example", which sums the entries of a vector v, the question will specify the
following:
----
Function file: "example.m"
Input format: vector v
Output format: scalar value
Function signature (name of the inputs and outputs in the code):
function out = example(v)
----
The file "example.m" will then contain the following
function out = example(v)
% Write your code here
out = rand(1); % Placeholder (to delete and replace upon completion of the question)
end
By default, each function will have placeholder output ("out = rand(1)" in this example) that ensures that the code
runs, even if the result is purely random. If you skip a question, you should leave the placeholder to ensure the file
"run_coursework.m" still runs. If you complete a question, you will have to remove the placeholder and replace it with
your answer.
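As an illustration only, a completed version of the "example" function described above could look like the sketch below. It assumes that v is a numeric vector and relies on the built-in function "sum"; the placeholder line has been removed and replaced by the answer.

```matlab
function out = example(v)
% Return the sum of the entries of the vector v.
out = sum(v); % replaces the placeholder "out = rand(1)"
end
```

For instance, calling example([1 2 3]) returns 6. The function name and signature are left untouched, as required.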
You ONLY need to modify the code INSIDE the function. You must NOT modify the name of the function, and you must
NOT modify the signature of the function (i.e., the names of the inputs/outputs)!
Once a function has been coded, you should run the script "run_coursework.m" to make sure the function does the
intended operation. Note that the file "run_coursework.m" will run all of the coursework at once, so you might want to
run only the lines of "run_coursework.m" up to your current question. Comments in "run_coursework.m" indicate which
lines concern which questions.
You are not allowed to use MATLAB toolboxes: the coded functions should only contain built-in MATLAB functions such
as the ones seen in lectures or tutorials (e.g., "mean", "sum", "*", ".*", "binornd", ...).
Please avoid submitting functions that display text. It is recommended to use the MATLAB debugger instead of the
function "disp" to inspect the behaviour of your code.
Submitting your coursework
Before submitting your work, you should clear your workspace (right click on the "Workspace" section of your editor >
"Clear Workspace") and verify that the script "run_coursework.m" runs well and gives the intended results.
You should also run the file "dimension_check.m" to make sure that the outputs of each function have the right
dimensions.
No points will be awarded for functions that do not output the right dimensions or for functions that raise an error.
To submit your work, compress the folder containing the MATLAB files into a ZIP file with your k-number as its name. For
instance, if your k-number is "k12345678", the ZIP file should be "k12345678.zip". Please verify that the ZIP file directly
contains your MATLAB files, and not an intermediary folder.
Finally, submit your ZIP file on KEATS.
Introduction
This coursework will explore how machine learning can be used to analyze text data. It is divided into two parts:
• Part I: prediction of the next word given the previous word in a sentence based on a given model;
• Part II: training of a classifier that can predict the next word based on the previous words.
A written sentence can be viewed as a sequence of words. We will use the notation $s_K = (w_1, w_2, \dots, w_K)$ to denote a sequence of K words.
For instance, the sentence "Hello, my name is Sam." is given by the 5-word sequence (disregarding the punctuation and
upper/lower cases):
$s_5 = (\text{hello}, \text{my}, \text{name}, \text{is}, \text{sam})$.
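Note that two different orderings of the same words represent two distinct sequences.
For instance, $(\text{my}, \text{name})$ and $(\text{name}, \text{my})$ are two different sentences.
The words composing a sentence are taken from a discrete vocabulary set $V = \{v_1, v_2, \dots, v_M\}$ of M different words, i.e., $w_k \in V$ for
$k = 1, \dots, K$.
For instance, with the vocabulary set $V = \{\text{hello}, \text{my}, \text{name}, \text{is}, \text{sam}\}$, we can create
the sequences:
$(\text{my}, \text{name}, \text{is}, \text{sam})$ and
$(\text{sam}, \text{is}, \text{my}, \text{name})$.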
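In order to represent the M words of the vocabulary as numbers, we define the discrete set
$\{1, 2, \dots, M\}$, where the integer $m$ represents the $m$-th word in V.
For instance, if the vocabulary is $V = \{\text{hello}, \text{my}, \text{name}, \text{is}, \text{sam}\}$, the sequence
$(\text{hello}, \text{my}, \text{name}, \text{is}, \text{sam})$ can be expressed as the vector $[1, 2, 3, 4, 5]$, and the
sequence $(\text{my}, \text{name}, \text{is})$ can be represented by the vector $[2, 3, 4]$.
From a probabilistic perspective, a sequence of K words is modelled as a discrete random vector
$\mathbf{x}_K = [x_1, x_2, \dots, x_K]$, where the random variable $x_k$ represents the k-th word in the sequence. For $k = 1, \dots, K$, the random
variable $x_k$ takes values in the set $\{1, 2, \dots, M\}$, and a realization $x_k = m$ represents the $m$-th word in the vocabulary V.
For instance, a realization $\mathbf{x}_3 = [2, 3, 4]$ represents the sequence $(\text{my}, \text{name}, \text{is})$, if the
vocabulary is $V = \{\text{hello}, \text{my}, \text{name}, \text{is}, \text{sam}\}$.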
Throughout this coursework, we will work with a vocabulary V of M = 10 words given as
V = [
"it", "is", "a", "the", "nice", ...
"good", "day", "evening", "not", "or" ...
];
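To make the integer representation concrete, the sketch below encodes a short list of words as indices into V. The variable names "words" and "x" are hypothetical and chosen for illustration; the sketch assumes that every word appears exactly once in V.

```matlab
% Vocabulary as given in the coursework.
V = [ ...
    "it", "is", "a", "the", "nice", ...
    "good", "day", "evening", "not", "or" ...
];
% Hypothetical sentence to encode (illustration only).
words = ["it", "is", "a", "nice", "day"];
x = zeros(1, numel(words));
for k = 1:numel(words)
    % Position of the k-th word in the vocabulary.
    x(k) = find(V == words(k));
end
% x now holds [1 2 3 5 7].
```

Each entry of x is the position of the corresponding word in V, which is the representation used throughout the coursework.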
Part I
In the first part of this coursework, we will analyze sequences of two words taken from the vocabulary set V.
Accordingly, each 2-word sequence will be represented by a random vector $\mathbf{x}_2 = [x_1, x_2]$, where $x_1$ and $x_2$ take values
in the set $\{1, 2, \dots, M\}$.
Throughout this part, some functions will take as input the $M \times M$ matrix
$P_{\text{joint}}$ (denoted as P_joint in the code), such that the element at the i-th row and j-th column of
the matrix represents the joint probability $p(x_1 = i, x_2 = j)$ for $i = 1, \dots, M$ and $j = 1, \dots, M$.
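As a small sanity check on what such a matrix looks like, the sketch below builds a hypothetical 3-by-3 joint probability matrix and verifies that it is a valid distribution. The values and the names "P_toy", "is_valid" and "p_13" are invented for illustration and are not the coursework data; the sketch assumes that rows index the first word and columns index the second word.

```matlab
% Hypothetical 3-by-3 joint probability matrix (toy values only).
P_toy = [0.10 0.20 0.05;
         0.05 0.10 0.15;
         0.20 0.05 0.10];
% A joint distribution has non-negative entries that sum to one.
is_nonnegative = all(P_toy(:) >= 0);
total_mass = sum(P_toy(:));
is_valid = is_nonnegative && abs(total_mass - 1) < 1e-12;
% Joint probability of (first word = 1, second word = 3) under the assumed convention.
p_13 = P_toy(1, 3);
```

The same two checks can be applied to any joint matrix before it is used in later computations.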
Question 1 [10 points]
Complete the function "marginalx1" that takes as input the matrix $P_{\text{joint}}$ defined above, and returns the marginal
probability distribution $p(x_1)$.
----
Function file: "marginalx1.m"
Input format: matrix $P_{\text{joint}}$ (denoted as P_joint in the code)
Output format: vector $p(x_1)$ (denoted as px1 in the code)
Function signature:
function px1 = marginalx1(P_joint)
----
Question 2 [10 points]
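Complete the function "probNextWord" that takes as input the matrix $P_{\text{joint}}$ defined at the beginning of Part I and
the marginal probability distribution $p(x_1)$ defined in Question 1, and returns the conditional probability distribution
$p(x_2 \mid x_1)$ as the matrix $P_{\text{cond}}$ (denoted as P_cond in the code).
----
Function file: "probNextWord.m"
Input format:
• matrix $P_{\text{joint}}$ (denoted as P_joint in the code),
• vector $p(x_1)$ (denoted as px1 in the code)
Output format: matrix $P_{\text{cond}}$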
Function signature:
function P_cond = probNextWord(P_joint, px1)
----
Question 3 [10 points]
Complete the function "sampleNextWord" that takes as inputs the matrix $P_{\text{cond}}$ defined in Question 2 and a
realization of $x_1$, and returns a realization of the next word $x_2$ given $x_1$, i.e., a sample $x_2 \sim p(x_2 \mid x_1)$.
Note that $x_2$ takes values in $\{1, 2, \dots, M\}$ only.
----
Function file: "sampleNextWord.m"
Input format:
• matrix $P_{\text{cond}}$,
• scalar value $x_1$;
Output format: scalar value $x_2$
Function signature:
function x2 = sampleNextWord(P_cond, x1)
----
Question 4 [5 points]
Complete the function "sampleSequence" that takes as inputs:
• the matrix $P_{\text{cond}}$ defined in Question 2,
• a realization $x_1$ of the first word of the sequence,
• the number $K$ of words in the sequence;
and returns a vector $\mathbf{x}_K$ corresponding to a realization of the random vector
$[x_1, x_2, \dots, x_K]$ given $x_1$.
We assume here that the distribution of each word $x_k$ only depends on its previous word $x_{k-1}$
for $k = 2, \dots, K$, i.e., $p(x_k \mid x_1, \dots, x_{k-1}) = p(x_k \mid x_{k-1})$.
----
Function file: "sampleSequence.m"
Input format:
• matrix $P_{\text{cond}}$,
• scalar value $x_1$,
• integer $K$;
Output format: vector $\mathbf{x}_K$
Function signature:
function x_K = sampleSequence(P_cond, x1, K)
----
Question 5 [5 points]
Complete the function "sequenceToWords" that takes as inputs:
• the vocabulary row-vector V,
• a vector $\mathbf{x}_K = [x_1, x_2, \dots, x_K]$ with $x_k \in \{1, 2, \dots, M\}$;
and returns the sequence of words $s_K$ represented by the vector $\mathbf{x}_K$.
You can use the MATLAB function "strings" to initialize a row-vector with empty strings.
----
Function file: "sequenceToWords.m"
Input format:
• row-vector V,
• vector $\mathbf{x}_K$;
Output format: row-vector $s_K$ of text values
Function signature:
function s_K = sequenceToWords(V, x_K)
----
Part II
In this second part, we are given a dataset of N sentences of length K, where $x_k^{(n)}$
is the k-th word in the n-th sentence, for $n = 1, \dots, N$ and $k = 1, \dots, K$. As explained in the
introduction, the integer $x_k^{(n)} = m$ represents the $m$-th word in the given vocabulary of $M = 10$ words.
The objective of this part will be to train a hard predictor with parameter vector $\theta_k$ capable of
predicting the next word $x_{k+1}$ based on the k previous words represented by the vector $\mathbf{x}_k = [x_1, \dots, x_k]$,
with $k < K$. For this, we will use a fraction of the available dataset to train the hard predictor,
which rounds the inner product $\theta_k^T \mathbf{x}_k$ to the nearest integer, so as to minimize the mean squared error (MSE).
The remaining fraction of the dataset will be used
to assess the performance of the trained predictor.
Accordingly, for a given $k$, we will regroup all the predictor inputs available in the dataset into
an $N \times k$ input matrix $X_k$, whose n-th row $\mathbf{x}_k^{(n)}$ represents the k first words of the
n-th sentence, for $n = 1, \dots, N$. The corresponding predictor targets (i.e., the next word after $\mathbf{x}_k^{(n)}$) are grouped into an
$N \times 1$ target vector $t_k$, whose n-th entry $x_{k+1}^{(n)}$ represents the $(k+1)$-th word of the n-th sentence. We
will also use the notation $X$ to refer to the $N \times K$ data matrix containing all of the dataset.
We provide a function "leastSquaresSolver" which takes as input an $n \times k$ input data matrix $X_k$ and
its corresponding targets as an $n \times 1$ vector $t_k$, for any number of rows $n$; and outputs the optimal
parameter vector $\theta_k$ of the predictor with respect to the MSE. This function can be found in the file
"leastSquaresSolver.m" and its function signature is:
function theta_k = leastSquaresSolver(X_k, t_k)
This function can be called at any point in the code to obtain the optimal parameter vector $\theta_k$.
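The internals of "leastSquaresSolver.m" are not shown in this document. As a rough illustration of what a least-squares fit computes, the sketch below uses MATLAB's backslash operator on toy data. The names "X_toy", "t_toy", "theta_toy" and "mse_toy" are hypothetical, the values are invented, and the provided solver may be implemented differently.

```matlab
% Toy data: 3 sentences with k = 2 input words each (illustration only).
X_toy = [1 2;
         2 1;
         3 3];
% Toy targets, constructed so that t_toy = X_toy * [1; 2] exactly.
t_toy = [5; 4; 9];
% Least-squares solution of X_toy * theta = t_toy via the backslash operator.
theta_toy = X_toy \ t_toy;
% Mean squared error of the real-valued fit (no rounding applied here).
residual = X_toy * theta_toy - t_toy;
mse_toy = mean(residual .^ 2);
```

Because the toy targets are an exact linear function of the inputs, the fit recovers the vector [1; 2] and the error is zero up to rounding. With real data the residual is generally non-zero.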
Question 6 [10 points]
Complete the function "splitDatasetTrainTest" that takes as inputs:
• the data matrix $X$ corresponding to the entire dataset defined at the beginning of Part II,
• a scalar value $r$ representing the train/test ratio split;
and outputs the training set as the training data matrix $X_{tr}$ with $N_{tr}$ rows, and the
test dataset as the test data matrix $X_{te}$ with $N_{te} = N - N_{tr}$ rows. Note that the training set
will contain the first $N_{tr}$ rows of the matrix $X$, i.e., the rows ranging from 1 to $N_{tr}$ (included), while the test set will
contain the remaining rows of $X$, i.e., the rows of $X$ ranging from $N_{tr} + 1$ to N.
This partition must not involve any randomness or re-ordering of the rows, and it must use the function "round" available
in MATLAB.
----
Function file: "splitDatasetTrainTest.m"
Input format:
• input matrix $X$ (denoted as X in the code),
• scalar $r$;
Output format (in order):
• training data matrix $X_{tr}$,
• test data matrix $X_{te}$;
Function signature:
function [X_tr, X_te] = splitDatasetTrainTest(X, r)
----
Question 7 [10 points]
Complete the function "splitInputTarget" which takes as input:
• input data matrix $X$ with $K$ columns,
• integer $k$ corresponding to the number of words to select in each sentence;
and outputs the input matrix $X_k$ corresponding to the first k columns of the data matrix $X$ (i.e., the
columns of $X$ ranging from 1 to k included), and the target vector $t_k$ corresponding to the $(k+1)$-th column of
$X$.
----
Function file: "splitInputTarget.m"
Input format:
• data matrix $X$ (denoted as X in the code),
• scalar value $k$
Output format:
• input matrix $X_k$ corresponding to the first k words of each row in $X$,
• target vector $t_k$ corresponding to the $(k+1)$-th column of $X$;
Function signature:
function [X_k, t_k] = splitInputTarget(X, k)
----
Question 8 [10 points]
Complete the function "rowWiseInnerProduct" which takes as input:
• a matrix $X_k$ composed of the first k columns of $X$,
• a parameter vector $\theta_k$;
and outputs the vector $o_k$ corresponding to the inner product of each row in $X_k$ with $\theta_k$, i.e., where the i-th
element in $o_k$ corresponds to the inner product $\theta_k^T \mathbf{x}_k^{(i)}$, for $i = 1, \dots, n$, where $n$ is the number of rows of $X_k$.
----
Function file: "rowWiseInnerProduct.m"
Input format:
• matrix $X_k$,
• a parameter vector $\theta_k$;
Output format: vector $o_k$
Function signature:
function o_k = rowWiseInnerProduct(X_k, theta_k)
----
Question 9 [10 points]
Complete the function "predictNextWord" which takes as input the number M of words in the vocabulary
V and a vector $o_k$ corresponding to the inner products of a parameter
vector $\theta_k$ with n sentences, represented as vectors $\mathbf{x}_k^{(i)}$, for $i = 1, \dots, n$; and outputs the vector
$\hat{t}_k$ representing the outputs of the predictor for each input sentence $\mathbf{x}_k^{(i)}$.
----
Function file: "predictNextWord.m"
Input format:
• scalar M,
• vector $o_k$;
Output format: vector $\hat{t}_k$
Function signature:
function t_hat_k = predictNextWord(M, o_k)
----
Question 10 [10 points]
Complete the function "mseLoss" which takes as input a vector of predicted targets $\hat{t}$ and a vector of
true targets t, and outputs the scalar value corresponding to the mean squared error (MSE).
----
Function file: "mseLoss.m"
Input format:
• vector $\hat{t}$ of predicted labels (denoted as t_hat in the code),
• vector t of true labels
Output format: scalar value
Function signature:
function mse = mseLoss(t_hat, t)
----
Question 11 [10 points]
We provide a function "trainAndTest" in the file "trainAndTest.m" which takes as input
• the input matrix X and the label vector t defined at the beginning of Part II,
• a scalar value $r$ representing the train/test ratio split (see Question 6),
• an integer $k$ representing the number of features to select in X (see Question 7);
and applies the following steps:
• split the dataset between training data and test data using the function "splitDatasetTrainTest" (Question 6),
• split the training data between training inputs and training targets using the function "splitInputTarget" (Question 7),
• split the test data between test inputs and test targets using the function "splitInputTarget" (Question 7),
• compute the optimal parameter $\theta_k$ of the predictor with respect to the MSE by training on the training inputs and
training targets, using the provided function "leastSquaresSolver",
• compute the training MSE loss of the predictor on the train dataset, using the functions "rowWiseInnerProduct"
(Question 8), "predictNextWord" (Question 9), and "mseLoss" (Question 10),
• compute the test MSE loss of the predictor on the test dataset, using the functions "rowWiseInnerProduct"
(Question 8), "predictNextWord" (Question 9), and "mseLoss" (Question 10),
• output both the training MSE loss and the test MSE loss.
Using the provided function "trainAndTest", we plot the training and test MSE loss as a function of the number
of features k for a fixed train-test ratio split (last section of the script "run_coursework.m").
For what value of k does the trained predictor underfit the data? Which value of k yields the best fit in terms
of generalization error?
The answers to these questions should be provided as the output of the function below.
----
Function file: "analyzePlot.m"
Input format: none
Output format:
• scalar value of k for which the trained predictor underfits the data,
• scalar value of k for which the trained predictor yields the best generalization error;
Function signature:
function [k_underfit, k_best] = analyzePlot()
----
Do not forget to run the file "dimension_check.m" to verify that the outputs of your coded functions match the
required formats!