46-886 Machine Learning Fundamentals HW 1
Homework 1
Due: Sunday, March 23, 11:59pm
• Upload your assignment to Canvas (only one person per team needs to submit)
• Include a writeup containing your answers to the questions below (and your team
composition), and a Python notebook with your code. Your code should run without
error when we test it.
• Please note that this assignment has two parts: A & B.
• Cite all sources used (beyond course materials)
• Finally, let’s review the instructions for using Google Colab, and submitting the final
writeup and Python notebook on Canvas.
1. Visit colab.research.google.com, and log in using your CMU ID.
2. Create a new notebook. Save it. Optionally, share it with your partner.
3. Upload climate_change.csv to Colab after downloading it from Canvas. [1]
4. Complete the assignment. Remember to save the notebook when exiting Colab.
5. File → Download → Download .ipynb downloads the notebook.
6. Submit this notebook and a writeup to Canvas.
7. Remember to indicate if you had a partner at this stage.
[1] You may need to do this on every fresh run, i.e., whenever Colab reinitializes your interpreter. If read_csv
complains that climate_change.csv does not exist, that is a sure sign.
Part A: Climate Change
A.1 In this problem, we will study the relationship between average global temperature and
several other environmental factors that affect the climate. The file climate_change.csv
(available on Canvas) contains monthly climate data from May 1983 to December
2008. You can (and should) familiarize yourself with the data in Excel. A brief description
of all the variables can be found below.
Variable                        Description
Year                            Observation year
Month                           Observation month, given as a numerical value (1 = January, 2 = February, 3 = March, etc.)
Temp                            Difference in degrees Celsius between the average global temperature in that period and a reference value
CO2, N2O, CH4, CFC-11, CFC-12   Atmospheric concentrations of carbon dioxide (CO2), nitrous oxide (N2O), methane (CH4), trichlorofluoromethane (CFC-11) and dichlorodifluoromethane (CFC-12), respectively. CO2, N2O and CH4 are expressed in ppmv (parts per million by volume). CFC-11 and CFC-12 are expressed in ppbv (parts per billion by volume).
Aerosols                        Mean stratospheric aerosol optical depth at 550 nm. This variable is linked to volcanoes, as volcanic eruptions result in new particles being added to the atmosphere, which affect how much of the sun’s energy is reflected back into space.
TSI                             Total Solar Irradiance (TSI) in W/m² (the rate at which the sun’s energy is deposited per unit area). Due to sunspots and other solar phenomena, the amount of energy that is given off by the sun varies substantially with time.
MEI                             Multivariate El Nino Southern Oscillation index (MEI): a measure of the strength of the El Nino/La Nina-Southern Oscillation (a weather effect in the Pacific Ocean that affects global temperatures).
We are interested in studying whether and how changes in environmental factors predict
future temperatures. To do this, first read the dataset climate_change.csv into Python
(do not forget to place this file in the same folder as your Python notebook, usually
/content on Colab). Then split the data into a training set, consisting of all the observations
up to and including 2002, and a test set consisting of the remaining years.
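The year-based split described above can be sketched as follows. This is a minimal illustration, not a prescribed solution: the helper name split_by_year and the tiny demo frame are assumptions made for the example, and in the notebook the frame would come from pd.read_csv("climate_change.csv") instead.

```python
import pandas as pd


def split_by_year(df, last_train_year=2002):
    """Return (train, test): rows with Year <= last_train_year, then the rest."""
    train = df[df["Year"] <= last_train_year]
    test = df[df["Year"] > last_train_year]
    return train, test


# Tiny made-up frame standing in for pd.read_csv("climate_change.csv").
demo = pd.DataFrame({
    "Year": [2001, 2002, 2002, 2003, 2004],
    "Month": [12, 1, 2, 1, 1],
    "Temp": [0.1, 0.2, 0.3, 0.4, 0.5],
})
train, test = split_by_year(demo)
print(len(train), len(test))  # prints: 3 2
```

Splitting on Year rather than at random keeps every test observation later in time than every training observation, which matches the goal of predicting future temperatures.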
(a) Build a linear regression model to predict the dependent variable Temp, using CO2,
CH4, N2O, CFC-11, CFC-12, Aerosols, TSI and MEI as features (Year and Month
should NOT be used as features in the model). As always, use only the training set to
train your model. What are the in-sample and out-of-sample R², MSE, and MAE?
(b) Build another linear regression model, this time with only N2O, Aerosols, TSI, and
MEI as features. What are the in-sample and out-of-sample R², MSE, and MAE?
(c) Between the two models built in parts (a) and (b), which performs better in-sample?
Which performs better out-of-sample?
(d) For each of the two models built in parts (a) and (b), what was the regression coefficient
for the N2O feature, and how should this coefficient be interpreted?
(e) Given your responses to parts (c) and (d), which of the two models should you prefer
to use moving forward?
Hint: The current scientific opinion is that N2O is a greenhouse gas: a higher concentration
traps more heat from the sun, and thus contributes to the heating of the Earth.
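One possible way to organize the metric reporting asked for in parts (a) and (b) is sketched below with scikit-learn. The helper regression_report and the synthetic data are assumptions for illustration; they are not the climate dataset and do not reproduce any expected answer. Note that r2_score measures R² against the mean of whichever y it is given, so if the course defines out-of-sample R² relative to the training-set mean, that quantity would need to be computed by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(model, X, y):
    """Return R^2, MSE, and MAE of an already fitted model on (X, y)."""
    pred = model.predict(X)
    return {
        "R2": r2_score(y, pred),
        "MSE": mean_squared_error(y, pred),
        "MAE": mean_absolute_error(y, pred),
    }


# Synthetic data for illustration only (not the climate dataset).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 3))
X_test = rng.normal(size=(20, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y_train = X_train @ true_coef + rng.normal(scale=0.1, size=80)
y_test = X_test @ true_coef + rng.normal(scale=0.1, size=20)

# Fit on the training set only, then evaluate on both sets.
model = LinearRegression().fit(X_train, y_train)
in_sample = regression_report(model, X_train, y_train)
out_of_sample = regression_report(model, X_test, y_test)
print(in_sample)
print(out_of_sample)
print(model.coef_)  # one coefficient per feature, in column order
```

Each entry of model.coef_ is the predicted change in the target for a one-unit increase in that feature with the other features held fixed, which is the reading part (d) asks for.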
Part B: Baseball Analytics (No knowledge of baseball is needed to complete this problem)
Sports analytics started with, and was popularized by, the data-driven approach to player
assessment and team formation of the Oakland Athletics. In the 1990s, the “A’s” were
one of the financially poorest teams in Major League Baseball (MLB). Player selection was
primarily done through scouting: baseball experts would watch high school and college games
to identify future talent. Under the leadership of Billy Beane and Paul DePodesta, the A’s
started to use data to identify undervalued players. Quickly, they met success on the field,
reaching the playoffs in 2002 and 2003 despite a much lower payroll than their competitors.
This started a revolution in sports: analytics is now a central component of every team’s
strategy. [2]
In this problem, you will predict the salary of baseball players. The dataset in the included
baseball.csv file contains information on 263 players. Each row represents a single player.
The first column reports the players’ annual salaries (in $1,000s), which we aim to predict.
The other columns contain four sets of variables: offensive statistics during the last season,
offensive statistics over each player’s career, defensive statistics during the last season, and
team information. These are described in the table below.
Read the baseball.csv file into Python. Note that three of the features are categorical
(League, Division, and NewLeague) and thus need to be one-hot encoded. Do that before
proceeding to the questions below.
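A minimal sketch of the one-hot encoding step, using pandas.get_dummies. The four rows are made up; only the column names follow the variable table. Dropping the first level of each feature (drop_first=True) is one common choice, assumed here, that avoids perfectly redundant columns in an unregularized linear model.

```python
import pandas as pd

# Made-up rows; only the column names follow the assignment's variable table.
players = pd.DataFrame({
    "Salary": [500.0, 750.0, 90.0, 1200.0],
    "Hits": [120, 150, 60, 170],
    "League": ["N", "A", "A", "N"],
    "Division": ["E", "W", "C", "E"],
    "NewLeague": ["N", "A", "N", "N"],
})

encoded = pd.get_dummies(
    players,
    columns=["League", "Division", "NewLeague"],
    drop_first=True,  # drop one level per feature; it becomes the baseline
    dtype=int,
)
print(list(encoded.columns))
```

With this choice the alphabetically first level of each feature (A for League and NewLeague, C for Division) has no column of its own, and the remaining dummy coefficients are read relative to it.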
B.1 Before building any machine learning models, explore the dataset: try plotting Salary
against some features, one at a time. When you have identified a feature that you feel may
be useful for predicting Salary, include that plot in your writeup, and comment on what
you have observed in the plot (one sentence will suffice).
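The exploration step could be done with a small plotting helper like the one below. The function name scatter_against_salary and the four data points are assumptions for illustration; in the notebook the frame would be the one read from baseball.csv.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def scatter_against_salary(df, feature):
    """Scatter-plot Salary against one feature and return the Axes."""
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.scatter(df[feature], df["Salary"], alpha=0.6)
    ax.set_xlabel(feature)
    ax.set_ylabel("Salary ($1,000s)")
    ax.set_title(f"Salary vs. {feature}")
    return ax


# Made-up values for illustration only.
demo = pd.DataFrame({
    "Hits": [60, 120, 150, 170],
    "Salary": [90.0, 500.0, 750.0, 1200.0],
})
ax = scatter_against_salary(demo, "Hits")
```

Calling the helper once per candidate feature makes it easy to compare plots before choosing the one to include in the writeup.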
B.2 Split the data into a training set (70%) and test set (30%). Train an “ordinary” linear
regression model (i.e. no regularization), and report the following:
(a) The in-sample and out-of-sample R²
(b) The value of the coefficient for the feature you identified in question B.1, and an
interpretation of that value.
(c) The effect on salary that your model predicts for a player that switches divisions from
East to West.
(d) The effect on salary that your model predicts for a player that switches divisions from
West to Central.
(e) The effect on salary that your model predicts for a player that switches divisions from
Central to East.
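The division-switching questions come down to differences between dummy coefficients. The sketch below shows the arithmetic on synthetic data, assuming Division was one-hot encoded with its first level dropped; the helper switch_effect and the generated salaries are hypothetical and unrelated to baseball.csv.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def switch_effect(coefs, prefix, old, new):
    """Predicted change in the target when a one-hot feature moves old -> new.

    A level with no dummy column (the dropped baseline) contributes 0.
    """
    return coefs.get(f"{prefix}_{new}", 0.0) - coefs.get(f"{prefix}_{old}", 0.0)


# Synthetic data: salary depends on Hits plus a fixed bump per division.
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "Hits": rng.integers(40, 200, size=n),
    "Division": rng.choice(["E", "C", "W"], size=n),
})
bump = df["Division"].map({"C": 0.0, "E": 50.0, "W": -30.0})
df["Salary"] = 3.0 * df["Hits"] + bump + rng.normal(scale=5.0, size=n)

X = pd.get_dummies(df[["Hits", "Division"]], columns=["Division"],
                   drop_first=True, dtype=int)
y = df["Salary"]
# 70% training, 30% test; random_state fixed only for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
coefs = pd.Series(model.coef_, index=X.columns)
east_to_west = switch_effect(coefs, "Division", "E", "W")
print(east_to_west)  # close to -30 - 50 = -80 for this synthetic data
```

Because each effect is a difference of two coefficients, the three switches East to West, West to Central, and Central to East form a cycle whose predicted effects sum to zero, which is a useful sanity check.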
B.3 Train a model using ridge regression with 10-fold cross-validation to select the tuning
parameter. The choice of which tuning parameters to try is up to you (this does not mean
there is not a wrong answer). Report the following:
[2] For more details, see the book Moneyball: The Art of Winning an Unfair Game by Michael Lewis and
the film Moneyball.
Variable        Description
Salary          The player’s annual salary (in $1,000s)
AtBats          Number of at bats this season
Hits            Number of hits this season
HmRuns          Number of home runs this season
Runs            Number of runs this season
RBIs            Number of runs batted in this season
Walks           Number of walks this season
Years           Number of years in MLB
CareerAtBats    Number of at bats over career
CareerHits      Number of hits over career
CareerHmRuns    Number of home runs over career
CareerRuns      Number of runs over career
CareerRBIs      Number of runs batted in over career
CareerWalks     Number of walks over career
PutOuts         Number of putouts this season
Assists         Number of assists this season
Errors          Number of errors this season
League          League in which player plays (N=National, A=American)
Division        Division in which player plays (E=East, C=Central, W=West)
NewLeague       League in which player plays next year (N=National, A=American)
(a) The in-sample and out-of-sample R²
(b) The final value of the tuning parameter (i.e. after cross-validation)
(c) The value of the coefficient for the feature you identified in question B.1, and an
interpretation of that value. Compared to your model from question B.2, has this feature become
more or less “important”?
(d) Of the two models so far, which one should be used moving forward?
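A minimal sketch of ridge regression with 10-fold cross-validation, using scikit-learn's RidgeCV. The candidate grid np.logspace(-3, 3, 25), the standardization step, and the synthetic data are all assumptions made for the example; the assignment leaves the grid to you.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only.
rng = np.random.default_rng(2)
X_train = rng.normal(size=(140, 5))
y_train = X_train @ np.array([3.0, 0.0, -1.0, 0.5, 0.0]) + rng.normal(size=140)

alphas = np.logspace(-3, 3, 25)       # hypothetical candidate grid
ridge = make_pipeline(
    StandardScaler(),                 # the ridge penalty is scale-sensitive
    RidgeCV(alphas=alphas, cv=10),    # 10-fold CV over the grid
)
ridge.fit(X_train, y_train)

chosen_alpha = ridge.named_steps["ridgecv"].alpha_
train_r2 = ridge.score(X_train, y_train)  # in-sample R^2
print(chosen_alpha, train_r2)
```

If the selected value sits at either end of the grid, that usually suggests widening the grid in that direction and refitting.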
B.4 Train a model using LASSO with 10-fold cross-validation to select the tuning parameter.
The choice of which tuning parameters to try is up to you (this does not mean there is not
a wrong answer). Report the following:
(a) The in-sample and out-of-sample R²
(b) The final value of the tuning parameter (i.e. after cross-validation)
(c) The number of features with non-zero coefficients (Hint: there should be at least one
feature with coefficient equal to 0)
(d) Of the three models so far, which one should be used moving forward?
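The LASSO step can be sketched the same way with scikit-learn's LassoCV, counting how many coefficients survive. The grid, the standardization step, and the synthetic data (where five of the eight true coefficients are zero) are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only: five of the eight features are pure noise.
rng = np.random.default_rng(3)
X_train = rng.normal(size=(140, 8))
true_coef = np.array([4.0, 0.0, 0.0, -2.0, 0.0, 0.0, 1.0, 0.0])
y_train = X_train @ true_coef + rng.normal(size=140)

lasso = make_pipeline(
    StandardScaler(),
    # Hypothetical candidate grid; 10-fold CV picks one value from it.
    LassoCV(alphas=np.logspace(-3, 1, 30), cv=10, max_iter=10_000),
)
lasso.fit(X_train, y_train)

fitted = lasso.named_steps["lassocv"]
n_nonzero = int(np.sum(fitted.coef_ != 0))  # features LASSO kept
print(fitted.alpha_, n_nonzero)
```

Unlike ridge, the LASSO penalty can set coefficients exactly to zero, so comparing n_nonzero with the total number of features shows how much selection the chosen tuning parameter performs.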