Python programming tutoring, program assistance
Instructions: Submit two files. One should be a write-up of all solutions and observations, as LastnameFirstnameSolution.pdf.
The second should be an archive LastnameFirstnameCode.zip containing code and any results
files.
1 [5 pts.] Irreducible data example
In class we discussed that not all datasets’ dimensionality can be successfully reduced using PCA.
(a)[1 pts] Discuss the cases when PCA will fail.
(b)[2 pts] How do we quantify that it fails?
(c)[2 pts] Provide an example dataset of 2D points (specify the points as vectors of numbers) in which PCA will
not work well for dimensionality reduction. Explain why. Hint: Think of 2D points and reduction to 1D.
2 [40 pts] Dimensionality reduction
For this question you can use the cloud.data.pdf from the UCI ML repository: https://archive.ics.uci.edu/ml/datasets/Cloud.
Read about it to get familiar with what is measured. Within the data, there are two datasets: DB
#1 and DB #2. For this homework, just use the 1024 vectors in DB #1. Use python for all your programming. You
will have to submit your code in LastnameFirstnameCode.zip together with the relevant write-up in the main
solution file LastnameFirstnameSolution.pdf.
(a)[5 pts] Load the data into a python program and center it. Note: there should be a function called center()
in your code that achieves this.
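A minimal sketch of what center() could look like, assuming the data has already been loaded into a NumPy array with one instance per row. Loading is left out because the file layout is not described here, and the small matrix X below is hypothetical, not the Cloud data.

```python
import numpy as np

def center(D):
    """Subtract the column-wise mean so that every attribute has mean zero."""
    D = np.asarray(D, dtype=float)
    return D - D.mean(axis=0)

# Hypothetical toy matrix (3 instances, 2 attributes), only for illustration.
X = np.array([[1.0, 2.0],
              [3.0, 6.0],
              [5.0, 10.0]])
Z = center(X)
print(Z)               # centered data
print(Z.mean(axis=0))  # column means, numerically zero
```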
(b)[5 pts] Compute the covariance matrix of the data Σ. Hint: use the definition of sample covariance, either as a
matrix product or as a sum of outer products. See book for details. Use NumPy for linear algebra computations
(https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.linalg.html). As a result you should have a function
covar() in your code which does not use the built-in covariance functions.
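The two formulations in the hint can be sketched as follows. This is an illustration under one assumption: the normalization is 1/n (ddof=0); pass ddof=1 for the 1/(n-1) convention if that is what the book uses. The name covar_outer and the random test data are hypothetical.

```python
import numpy as np

def covar(D, ddof=0):
    """Covariance as a matrix product Z^T Z / (n - ddof) on centered data."""
    Z = np.asarray(D, dtype=float)
    Z = Z - Z.mean(axis=0)          # centering again is harmless if already centered
    return (Z.T @ Z) / (Z.shape[0] - ddof)

def covar_outer(D, ddof=0):
    """Same quantity, accumulated as a sum of outer products z_i z_i^T."""
    Z = np.asarray(D, dtype=float)
    Z = Z - Z.mean(axis=0)
    S = np.zeros((Z.shape[1], Z.shape[1]))
    for z in Z:
        S += np.outer(z, z)
    return S / (Z.shape[0] - ddof)

# Hypothetical random data, only to compare the two versions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
print(np.allclose(covar(X), covar_outer(X)))
```

np.cov is useful for checking the result during development, even though covar() itself must not call it.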
(c)[5 pts] Compute the eigenvectors and eigenvalues of Σ. The numpy linear algebra module referenced above has a
function that can help.
(d)[10 pts] Plot the percentage of retained variance as a function of the number of principal components used.
Determine the number of principal components (PCs) r that will ensure 90% retained variance. How did you
compute this? Provide a function in your code that determines r based on an arbitrary percentage α of retained
variance.
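One possible way to organize parts (c) and (d), sketched with NumPy's eigh, which is intended for symmetric matrices such as Σ. The function names sorted_eigh, retained_variance and choose_r are hypothetical, and the eigenvalues in the usage lines are made up for illustration.

```python
import numpy as np

def sorted_eigh(S):
    """Eigenvalues (descending) and matching eigenvectors of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(S)      # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

def retained_variance(eigvals):
    """Fraction of total variance kept by the first 1, 2, ..., d components."""
    eigvals = np.asarray(eigvals, dtype=float)
    return np.cumsum(eigvals) / eigvals.sum()

def choose_r(eigvals, alpha=0.9):
    """Smallest r whose retained-variance fraction is at least alpha."""
    frac = retained_variance(eigvals)
    # The small tolerance guards against round-off when frac equals alpha exactly.
    return int(np.argmax(frac >= alpha - 1e-12) + 1)

# Made-up eigenvalues, already sorted in descending order.
lam = np.array([4.0, 3.0, 2.0, 1.0])
print(retained_variance(lam))
print(choose_r(lam, alpha=0.9))
```

The array returned by retained_variance can be passed directly to a plotting library for the requested curve.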
(e)[10 pts] Compute the reduced dimension data matrix A with two dimensions by projection on the first two PCs.
Plot the points using a scatter plot (two dimensional diagram that places each sample i according to its new
dimensions ai1, ai2). Discuss the observations. Are there clusters of close-by points? Argue for or against whether
these are sufficient dimensions.
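A sketch of the projection step, assuming the 1/n covariance convention and using hypothetical random data in place of the Cloud vectors. The function name project is an assumption. Eigenvector signs are arbitrary, so a scatter plot may appear mirrored between implementations.

```python
import numpy as np

def project(D, r=2):
    """Project centered data onto the r leading principal components."""
    Z = np.asarray(D, dtype=float)
    Z = Z - Z.mean(axis=0)
    S = (Z.T @ Z) / Z.shape[0]            # covariance, 1/n convention (assumed)
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1][:r]    # indices of the r largest eigenvalues
    U_r = vecs[:, order]                  # d x r basis of principal directions
    return Z @ U_r                        # n x r reduced data matrix A

# Hypothetical data with very different spread per attribute.
rng = np.random.default_rng(0)
D = rng.normal(size=(100, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
A = project(D, r=2)
print(A.shape)
```

The two columns of A are the coordinates ai1, ai2 that go on the axes of the scatter plot.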
(f)[5 pts] Study the PCA implementation in Python's sklearn library:
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA.
Do PCA using the library on the same data. Do the eigenvalues approximately match what you computed above?
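A sketch of how such a comparison might be set up, on hypothetical random data. One detail worth knowing: scikit-learn's explained_variance_ uses the n-1 denominator, so if the hand-written covariance uses 1/n (an assumption here), the eigenvalues differ by a factor n/(n-1), which is close to 1 for 1024 vectors.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data standing in for the Cloud vectors.
rng = np.random.default_rng(0)
D = rng.normal(size=(200, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

# Manual eigenvalues of the 1/n covariance matrix, sorted in descending order.
Z = D - D.mean(axis=0)
n = Z.shape[0]
eig_manual = np.sort(np.linalg.eigvalsh(Z.T @ Z / n))[::-1]

# Library eigenvalues (all components are kept when n_components is not set).
pca = PCA().fit(D)
eig_sklearn = pca.explained_variance_

print(eig_manual)
print(eig_sklearn)
print(np.allclose(eig_sklearn, eig_manual * n / (n - 1)))
```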
3 [20 pts.] Kernel methods
Consider the problem of finding the most dissimilar diametric pair (MDDP): this is a pair of data points that are
dissimilar from the mean and also dissimilar from each other. Below is an algorithm that would find such a pair
given a data matrix D:
Algorithm 1: MDDP(D)
Result: a, b - the most dissimilar diametric points in D
Compute the data mean µ = mean(D);
s = +∞;
for i in (1 . . . n) do
    for j in (i + 1 . . . n) do
        temp = x_i^T µ + x_j^T µ + x_i^T x_j;
        if temp < s then
            s = temp;
            a = x_i;
            b = x_j;
        end
    end
end
The algorithm computes the sum of inner products x_i^T µ + x_j^T µ + x_i^T x_j for each pair of points and returns the
pair with the lowest such quantity.
(a)[5 pts] Demonstrate the execution of this algorithm on the following data matrix of 2D instances (one instance per row):
D =
[0 1]
[1 3]
[5 0]
[2 4]
Show the steps and the resulting MDD pair of points.
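The MDDP pseudo-code translates almost line by line into Python. The sketch below assumes the data matrix in part (a) holds four 2D instances, one per row, and prints the score of every pair so the steps can be followed. The function name mddp and the verbose flag are hypothetical.

```python
import numpy as np

def mddp(D, verbose=False):
    """Return (a, b, s): the pair minimizing x_i^T mu + x_j^T mu + x_i^T x_j."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    mu = D.mean(axis=0)
    s = np.inf
    a = b = None
    for i in range(n):
        for j in range(i + 1, n):
            temp = D[i] @ mu + D[j] @ mu + D[i] @ D[j]
            if verbose:
                print(f"i={i + 1}, j={j + 1}, temp={temp}")
            if temp < s:
                s, a, b = temp, D[i], D[j]
    return a, b, s

# Data matrix from part (a), read as four rows of 2D points (assumed layout).
D = np.array([[0, 1],
              [1, 3],
              [5, 0],
              [2, 4]])
a, b, s = mddp(D, verbose=True)
print(a, b, s)
```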
(b)[15 pts] As we discussed in class, sometimes we would like to kernelize methods to handle non-linearity in data.
Provide pseudo-code for a kernel version of the MDDP algorithm above. The goal is to kernelize the algorithm
for an arbitrary kernel. Hint: Assume that you can compute the kernel matrix K, corresponding to
some mapping φ(), and then use the basic kernel operations we discussed in class and also in the
book to derive the steps of MDDP in terms of elements in K.
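One way to see how the hint plays out: in feature space the mean is µ_φ = (1/n) Σ_k φ(x_k), so φ(x_i)^T µ_φ is the mean of row i of K, and φ(x_i)^T φ(x_j) is K[i, j]. The sketch below is an illustration of that idea, not a model solution. The name kernel_mddp is hypothetical, and the linear kernel is used only as a sanity check against the original algorithm.

```python
import numpy as np

def kernel_mddp(K):
    """Return ((i, j), s) minimizing mean(K[i, :]) + mean(K[j, :]) + K[i, j]."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    row_mean = K.mean(axis=1)   # phi(x_i)^T mu_phi = (1/n) * sum_k K[i, k]
    s = np.inf
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            temp = row_mean[i] + row_mean[j] + K[i, j]
            if temp < s:
                s, best = temp, (i, j)
    return best, s

# Sanity check with the linear kernel K = D D^T on a small 2D matrix
# (four points, one per row); indices are zero-based.
D = np.array([[0, 1], [1, 3], [5, 0], [2, 4]], dtype=float)
K_linear = D @ D.T
best, s = kernel_mddp(K_linear)
print(best, s)
```

With a linear kernel the result should coincide with the non-kernel algorithm, since K[i, j] is then just x_i^T x_j.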
4 [10 pts.] Orthogonality of Error in Regression:
Prove that Ŷ^T ε = 0, where Ŷ is the predicted response and ε = Y − Ŷ is the error between the actual and predicted
response. Hint: Use the solution for the predicted response as a transformation of Y through the
hat matrix.
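A numerical sanity check can help build intuition before writing the proof. It is not a proof. The sketch assumes the usual least-squares hat matrix H = X (X^T X)^{-1} X^T, with X a hypothetical random design matrix that includes a column of ones, and checks that the predicted response is orthogonal to the residual Y − Ŷ.

```python
import numpy as np

# Hypothetical random regression problem.
rng = np.random.default_rng(0)
n, d = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, d))])  # intercept + d predictors
Y = rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix (assumes X has full column rank)
Y_hat = H @ Y                          # predicted response
resid = Y - Y_hat                      # error between actual and predicted response

print(Y_hat @ resid)                   # numerically zero
print(np.allclose(H, H.T), np.allclose(H @ H, H))  # H is symmetric and idempotent
```

The two properties printed on the last line, symmetry and idempotence of H, are the facts the hint points toward.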
5 [25 pts.] Regression to understand CO2 pollution:
For this task, you will use data about CO2 emissions in European cities included as an excel sheet. The data has
total emissions per city (Column E) as well as other variables including number of airports, buildings, etc
(columns G and later). Use only data for cities (Column C: admin level=8). Feel free to prepare the data into
simpler text format before you use code.
The goal will be to train linear regression models of the total CO2 as a function of the predictor variables in
columns G and on.
(a) [7pts] Use scikit-learn's linear regression, linear_model.LinearRegression, to learn an unregularized model.
Report the corresponding coefficient for each predictor variable.
(b) [7pts] Use scikit-learn's ridge regression, linear_model.Ridge, to learn a regularized model. Try 2 values for
alpha: 1 and 10. Report the coefficients of the predictor variables for the two models.
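A sketch of how the three models could be fitted and compared. The CO2 spreadsheet is not reproduced here, so the predictors X and response y below are synthetic and hypothetical; with the real data, X would hold columns G onward and y the total emissions in column E, filtered to admin level 8.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic stand-in for the city data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
true_w = np.array([3.0, 0.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=200)

models = {
    "ols": LinearRegression(),
    "ridge_1": Ridge(alpha=1.0),
    "ridge_10": Ridge(alpha=10.0),
}

results = {}
for name, model in models.items():
    model.fit(X, y)
    resid = y - model.predict(X)
    results[name] = {"coef": model.coef_, "sse": float(resid @ resid)}
    print(name, np.round(model.coef_, 3), round(results[name]["sse"], 3))
```

On the training data, a larger alpha shrinks the coefficient vector and cannot decrease the residual sum of squares, which gives a starting point for the discussion in part (c).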
(c) [11pts] Discuss your findings. Which are the most important factors determining the CO2 emissions? How
do the loading coefficients for factors differ between regularized and non-regularized models? How does the
residual error differ? Use visualizations as necessary to discuss your findings.