DRAFT
Statistical Learning Assignment - Semester 1, 2021
• INSTRUCTIONS:
1. The assignment must be typed (not handwritten). You may either use Microsoft Word (or similar)
or R markdown in RStudio for the assignment. Note that the final project will require the use of
R markdown. Your answer should be no longer than 10 A4 pages [single sided],
with a font size no smaller than 11 point.
2. The assignment due date is listed on the Wattle (Turn-it-in) site. Upload the assignment through
Wattle using Turn-it-in. You should submit your assignment in two different parts. If you are
using R markdown:
(a) A PDF file [or HTML file] of your assignment (this should include important R code to highlight
what you have done).
(b) A ‘.Rmd’ file [an R markdown file].
If you are using Microsoft Word (or similar):
(a) A Word file of your assignment (this should include important R code to highlight what you
have done).
(b) A ‘.R’ file of your R code.
3. In answering the questions, write your answers clearly and succinctly. Use appropriate graphs and
tables when you think they help to describe your point or thinking process. Do not just “print” a
set of results. Every result should be discussed and have a reason for being presented. No points
will be awarded unless you clearly discuss what you are doing.
4. No late assignments will be accepted.
5. You should not discuss the assignment (questions, solutions, code, etc.) with your
classmates or other individuals. You can discuss these with me or your tutor (Dr.
Ha Nguyen) during our consultation times. You must independently write your own
solutions. This includes all computer code, English, and mathematics. University
policies on academic integrity will be strictly enforced. See
http://www.anu.edu.au/students/program-administration/assessments-exams/academic-honesty-plagiarism
for more details.
6. Have fun with the exploration!
1. (100 points) We will explore some of the techniques we are considering by examining data on housing
prices. We will use the data from the prediction competition available on Kaggle:
https://www.kaggle.com/c/house-prices-advanced-regression-techniques
For this question you will need to create an account on Kaggle. Please let me know if you
don’t want to use Kaggle because of privacy concerns.
(a) Create an account on Kaggle. What is your Kaggle username? Download the training and test
data.
(b) Consider a multiple regression model to examine the relationship between housing sale prices (Y)
in Ames, Iowa, USA from 2006 to 2010 and their covariate information (x). While 79 covariates
are available, for this assignment we will only use a few of them. Only consider the following
covariates: LotArea, OverallCond, GrLivArea, FullBath, TotRmsAbvGrd, PoolArea. As the real
test data does not contain the response Y (SalePrice), split the training data in half. The first
half will be the new training data and the second half will be your personal test data. For this
assignment set α = 0.05.
i. (20 points) Using all of the training data together (personal training and test data), conduct
an exploratory data analysis. In doing your analysis make sure to identify any unusual points
and discuss why they are unusual. For this assignment do not remove any unusual points, only
comment on them (if they exist). In addition to visualisations of the raw data, consider the
natural log transformation of the response. You may also consider any transformations of the
covariates. For the rest of the assignment, if you believe the transformations are appropriate
(provide justification - this can simply be a discussion), use those transformations.
ii. (6 points) Using just your personal training data and the covariate GrLivArea, based on
traditional regression approaches (possibly: t-tests, F-tests, etc.), determine if there exists
a non-linear (quadratic, cubic, etc.) relationship between the covariate and the response. How flexible
should the model be? Make sure to fully outline any tests and conclusions.
iii. (6 points) Using your personal training and personal testing data, along with the notion of
squared error loss, determine if there exists a non-linear (quadratic, cubic, etc.) relationship
between the covariate and the response. How flexible should the model be?
iv. (6 points) Consider all the covariates which we are using in this assignment: LotArea, OverallCond,
GrLivArea, FullBath, TotRmsAbvGrd, PoolArea. Using just your personal training
data and traditional regression approaches, determine if any of the variables are statistically
significant. Are you able to reduce the model (i.e. not use all the covariates)? Here you do
not need to consider any non-linearities or interactions. Make sure to fully outline any tests
and conclusions.
v. (6 points) Based on the ordering of the covariates in your final model in the previous question,
using your personal training and personal testing data, along with the notion of squared error
loss, determine which covariates should be included in the model.
vi. (6 points) Consider all the covariates which we are using in this assignment: LotArea, OverallCond,
GrLivArea, FullBath, TotRmsAbvGrd, PoolArea. Using just your personal training
data and traditional regression approaches, determine if PoolArea has a statistically significant
interaction with any of the other covariates. You may have up to five interactions in
your model. Make sure to fully outline any tests and conclusions.
vii. (6 points) Based on the ordering of the covariates in your final model in the previous question,
using your personal training and personal testing data, along with the notion of squared error
loss, determine which interactions should be included in the model.
viii. (6 points) Consider all the covariates which we are using in this assignment: LotArea,
OverallCond, GrLivArea, FullBath, TotRmsAbvGrd, PoolArea. You may now consider any
modelling that you wish using your personal training data. You may also consider any type of
model selection approach (i.e. traditional or based on squared-error loss for the testing data).
Make sure to fully outline any tests and conclusions. Calculate the mean-squared error on
your personal testing data.
ix. (6 points) Using your final model from Question 1(b)viii and the Kaggle test data, submit a
prediction file to Kaggle. See Kaggle for details on what the file should look like. What was
your score and rank?
• Note: as discussed on the site (https://www.kaggle.com/c/titanic/details/evaluation),
“[t]he Kaggle leader-board has a public and private component. 50% of your predictions
for the test set have been randomly assigned to the public leader-board (the same 50%
for all users). Your score on this public portion is what will appear on the leader-board.
At the end of the contest, we will reveal your score on the private 50% of the data, which
will determine the final winner. This method prevents users from ‘over-fitting’ to the
leader-board.”
x. (6 points) Examining the leader-board, you can see that one individual has a perfect score
(when I last looked). Is this surprising? What explanation might there be for this?
xi. (6 points) This Kaggle competition is using Root Mean Squared Logarithmic Error instead
of Mean Squared Error. Provide a discussion about the difference between the two criteria.
xii. (20 points) Provide a full discussion of your final model from Question 1(b)viii. This may
include, but is not limited to, discussions of the coefficients, visualisations of the fitted model,
and model checking.
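
The evaluation setup described in Question 1(b) (splitting the training data in half, squared error loss, and the Root Mean Squared Logarithmic Error used by Kaggle) can be sketched as follows. This is only an illustration of the definitions, not a solution: the assignment itself calls for R, whereas the sketch uses Python; the function names and the toy prices are hypothetical; and the RMSLE shown takes the natural log of the prices directly, which is an assumption (some definitions use log(1 + y) instead).

```python
import math


def split_in_half(rows):
    """Return (first half, second half) of a sequence of rows.

    The first half plays the role of the personal training data and the
    second half the personal test data. With an odd number of rows the
    second half gets the extra row (an arbitrary choice for this sketch).
    """
    mid = len(rows) // 2
    return rows[:mid], rows[mid:]


def mse(actual, predicted):
    """Mean squared error on the original (dollar) scale."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmsle(actual, predicted):
    """Root mean squared error between the natural logs of the prices.

    Assumption: plain log(y); some definitions use log(1 + y).
    """
    squared_log_errors = [
        (math.log(p) - math.log(a)) ** 2 for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(actual))


# Hypothetical sale prices, each over-predicted by exactly 10%.
actual = [100000, 200000, 400000]
predicted = [110000, 220000, 440000]

train, test = split_in_half(list(range(10)))  # 10 made-up row indices
print(len(train), len(test))
print(mse(actual, predicted))
print(rmsle(actual, predicted))
```

In this toy example every prediction is off by the same proportion, so each house contributes equally to the log-scale criterion, while the squared error on the dollar scale is dominated by the most expensive house.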