Problem Set 1
XXX
Due: 10/10/2024
Introduction
This document was produced in R using RMarkdown. To complete this week's assignment, we will ask you to
complete a series of analytical and coding exercises. The Analytical Exercises require no coding, whereas
the Coding Exercises require you to use R. The nice thing about RMarkdown is that you can do both your
analytical and coding exercises in the same document. For each part of the Coding Exercises, we provide
an empty code chunk (the area highlighted in grey with a header of the form “## ~~ Problem XX ~~
##”).
To ease your introduction to R, Problem 2 is a short tutorial on the R programming environment.
Hopefully, you have already downloaded RStudio on your computer. If not, please do that now. You can
download the latest version at https://www.rstudio.com/products/rstudio/download/.
Once you have downloaded RStudio, you will be able to open the R Markdown script (the assignment_1.rmd
file) that created this assignment_1.pdf file. We ask you to fill in your code in the code chunk sections (the
areas highlighted in grey and bounded by ``` marks) of the .rmd file for each subpart of the questions. You
will see that you can include LaTeX code in these documents. You are not required to use LaTeX for the
analytical exercises (i.e. those without coding), but it is good practice for improving your LaTeX skills.
In order to compile a markdown document (i.e. turn your code into a PDF file), you must have a version of LaTeX
installed on your computer. I suggest you download MiKTeX (https://miktex.org/howto/install-miktex).
If at any time you are confused about R, or are not sure what a command does or which additional arguments are
available for it, there are two tried and true methods to resolve the issue. In the R console, you can
use the help command, where you supply the name of the command you are confused about. Alternatively,
Google is your friend.
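As a small illustration of the help command, the sketch below looks up a function and lists its arguments. The function `paste` is only an example picked for this sketch; any function name works the same way.

```r
# Look up the documentation for a function (here `paste`, chosen only as an example).
# At the console you would simply type help("paste") or ?paste; assigning the
# result, as done here, avoids opening the help viewer when the script is run
# non-interactively.
help_entry <- help("paste")

# List the arguments a function accepts without opening its help page.
arg_names <- names(formals(paste))
print(arg_names)
```

Typing `?paste` at the console is shorthand for `help("paste")`.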
Installing MiKTeX
R Markdown was recently updated, and this update has issues with missing packages
in MiKTeX. To deal with these issues, you need to allow MiKTeX to install any missing packages
without asking you first. To do so, when you are installing MiKTeX, the ‘Settings’ screen asks whether to install
missing packages on-the-fly. Please select Yes on this screen. If you have already installed MiKTeX, you
can go to the MiKTeX console -> Settings, where the same option appears.
Packages to Install
Each week, we will list the packages that you need to install in R in order to complete the assignment.
This list also serves as a handy record of the packages you have learned throughout this course.
The packages used this week are:
• stats
• ggplot2 (optional)
Code Setup
## This is a code chunk: it is outlined in grey and has R code inside of it
## Note: you can change what is shown in the final .pdf document using arguments
## inside the curly braces at the top {r, comment='\t\t'}. For example, you
## can hide the source code in the .pdf by adding 'echo=FALSE'
## i.e. changing the header to {r, comment='\t\t',echo=FALSE}
## ~~~~~~~~~~~~~~ CODE SETUP ~~~~~~~~~~~~~~ ##
# ~ This bit of code will be hidden after Problem Set 1 ~
#
# This section sets up the correct directory structure so that
# the working directory for your code is always in the data folder
# Retrieve the code working directory
#script_dir = dirname(sys.frame(1)$ofile)
initial_options <- commandArgs(trailingOnly = FALSE)
render_command <- initial_options[grep('render',initial_options)]
script_name <- gsub("'", "",
                    regmatches(render_command,
                               gregexpr("'([^']*)'",
                                        render_command))[[1]][1])
# Determine OS (backslash versus forward slash directory system)
sep_vals = c(length(grep('\\\\',script_name))>0,length(grep('/',script_name))>0)
file_sep = c("\\\\","/")[sep_vals]
# Get data directory
split_str = strsplit(script_name,file_sep)[[1]]
len_split = length(split_str) - 2
data_dir = paste(c(split_str[1:len_split],'data',''),collapse=file_sep)
# Check that the data directory contains the files for this week's assignment
data_files = list.files(data_dir)
# setequal() compares the two sets of file names without a length-mismatch warning
if(!setequal(data_files,c('us_air_rev.csv','us_load_factor.csv'))){
  cat("ERROR: DATA DIRECTORY NOT CORRECT\n")
  cat(paste("data_dir variable set to: ",data_dir,collapse=""))
}
Problem 1 (Analytical Exercise)
Consider a simple AR(1) model:
$Y_t = \alpha Y_{t-1} + \varepsilon_t$ with $\varepsilon_t \sim N(0, \sigma^2)$ for $t \in \{1, \dots, T\}$ and $Y_0 = 0$
1. What is the distribution of $Y_1$? What is the distribution of $Y_2$?
2. What is the distribution of $Y_t$ for $|\alpha| < 1$ as $t \to \infty$?
3. What is the definition of stationarity? Explain why in this model we can check for stationarity by
looking at the mean and the variance of $Y_t$.
4. Suppose that $\alpha = 1$. Why does this imply that the model is nonstationary? Can you think of a simple
transformation that makes the model stationary?
5. Now suppose that $|\alpha| < 1$. Find a formula for the $j$th autocorrelation $\rho_j = \mathrm{corr}(Y_t, Y_{t-j})$.
6. Explain how we could use estimates of $\rho_j$ for $j = 1, 2, \dots$ to check whether some actual time series data
were generated by an AR(1) model like the one described above.
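For intuition about the model above, here is a minimal simulation sketch that generates one AR(1) path starting from Y0 = 0 and computes its sample autocorrelations. The values alpha = 0.7, sigma = 1 and n_obs = 500 are assumptions picked for illustration, not part of the problem.

```r
set.seed(1)

# Assumed illustration values (not given in the problem)
alpha <- 0.7
sigma <- 1
n_obs <- 500

# Draw the shocks and build the series recursively, starting from Y_0 = 0
eps <- rnorm(n_obs, mean = 0, sd = sigma)
y <- numeric(n_obs)
y_prev <- 0
for (t in seq_len(n_obs)) {
  y[t] <- alpha * y_prev + eps[t]
  y_prev <- y[t]
}

# Sample autocorrelations at lags 1 to 5 (the first entry of $acf is lag 0)
sample_acf <- acf(y, lag.max = 5, plot = FALSE)
rho_hat <- as.numeric(sample_acf$acf)[-1]
print(round(rho_hat, 3))
```

Rerunning the sketch with different values of alpha shows how the pattern of sample autocorrelations changes.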
Problem 2 (Coding Exercise)
This problem will take you through a few tasks to familiarize yourself with R, as well as some basic time
series concepts:
(a) Loading data into R
(b) Doing simple data analysis
(c) Doing time series analysis
For this problem, we have pulled two separate datasets from the FRED database, maintained by the Federal
Reserve Bank of St. Louis (https://fred.stlouisfed.org/). The datasets cover the aggregate revenue
and load factor of domestic US flights from 2000 to 2018. In the last two decades, airlines have begun
using sophisticated algorithms to increase the capacity utilization of flights (i.e. flights tend to be fuller).
Furthermore, during the same time period, airline revenues have increased. The point of this exercise is
to understand the role of these productivity increases in “explaining” increased revenues in the airline
industry.
The two separate datasets you will be working with are:
1. US Domestic Air Travel Revenue Passenger Miles (filename = us_air_rev.csv): this dataset contains
monthly data detailing the number of miles traveled by paying passengers in domestic US air travel.
2. US Domestic Air Travel Load Factor (filename = us_load_factor.csv): this dataset contains
monthly data detailing the percentage of seats filled (capacity utilization) in domestic US air travel.
A Small Detour: Brief introduction to print statements
We ask you to print a number of your
results in this exercise. In R, there are two different ways to print results:
1. print
2. cat
There are some deep programmatic differences underlying what each of these does, but for our purposes we only
care about how easy your outputs are to read.
Printing Strings
Let’s say you have a list of numbers, [4,5,6], and I want you to print out the following
statement:
The first element of the list is: 4
The second element of the list is: 5
The third element of the list is: 6
Below I show you three ways to do so. The first way simply uses print without any additional arguments.
The second way uses print with an additional argument, quote=FALSE, which gets rid of the quotes around
the strings. The third approach, using cat, also drops the quotes and additionally omits the [1] index
prefix, which is useful for printing output.
## Define a list called x with 3 elements
x = c(4,5,6)
## Retrieve 1st, 2nd, 3rd element of list
first_elem = x[1] #1st element
second_elem = x[2] #2nd element
third_elem = x[3] #3rd element
## Create strings stating 'The XXXX element of the list is:'
first_str = 'The first element of the list is:'
second_str = 'The second element of the list is:'
third_str = 'The third element of the list is:'
## Concatenate the list elements and the string to create the whole phrase
first_phrase = paste(first_str,first_elem,sep=' ')
second_phrase = paste(second_str,second_elem,sep=' ')
third_phrase = paste(third_str,third_elem,sep=' ')
## ~~ (1) Print without extra arguments ~~ ##
print('~~ (1) Print without extra arguments ~~')
print(first_phrase)
print(second_phrase)
print(third_phrase)
## ~~ (2) Print with extra argument turning off quotes ~~ ##
print('~~ (2) Print with extra argument turning off quotes ~~',quote=FALSE)
print(first_phrase, quote=FALSE)
print(second_phrase, quote=FALSE)
print(third_phrase, quote=FALSE)
## ~~ (3) Print without quotes and without trailing # ~~ ##
cat("\n")
cat("~~ (3) Print without quotes and without trailing # ~~\n")
cat(paste(first_phrase,"\n",sep=''))
cat(paste(second_phrase,"\n",sep=''))
cat(paste(third_phrase,"\n",sep=''))
[1] "~~ (1) Print without extra arguments ~~"
[1] "The first element of the list is: 4"
[1] "The second element of the list is: 5"
[1] "The third element of the list is: 6"
[1] ~~ (2) Print with extra argument turning off quotes ~~
[1] The first element of the list is: 4
[1] The second element of the list is: 5
[1] The third element of the list is: 6
~~ (3) Print without quotes and without trailing # ~~
The first element of the list is: 4
The second element of the list is: 5
The third element of the list is: 6
Printing Dataframes
The main object you will be working with in R is called a dataframe (think of an
Excel spreadsheet). We will discuss these objects more fully in the following section. However, you
will often be asked to print out dataframes. In this case, using print is your best option. This is due to differences
between cat and print that are beyond the scope of this introduction.
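To make the difference concrete, the sketch below builds a tiny dataframe and passes it to both functions. The dataframe and its column names are made up for illustration. The point is that cat only handles simple vectors, so it stops with an error on a dataframe, whereas print lays it out as a table.

```r
# A small made-up dataframe (hypothetical columns, for illustration only)
df <- data.frame(month = c("2000-01", "2000-02"), value = c(1.5, 2.5))

# print() shows the dataframe as a table with column headers
print(df)

# cat() cannot handle a dataframe; capture the error instead of stopping the script
cat_result <- tryCatch({
  cat(df)
  "cat succeeded"
}, error = function(e) conditionMessage(e))
print(cat_result)
```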
(a) Loading Data
The first thing we want you to do is to load both datasets, us_air_rev.csv and us_load_factor.csv, into
R.
Please load the data in the section below.
## ~~ Problem 2: Part (a) Load Data into R ~~ ##
## Change working directory to data directory
setwd(data_dir) # <- This is set in CODE SETUP section
                # If you are having issues, you can manually
                # set this variable to the folder where the data is
## Load Air Revenue Data ##
## Load Air Load Factor Data ##
There are two ways to view data that you have loaded into memory in R:
1. View only the first (or last) few rows using the head (tail) command
2. View the entire dataset in a separate window using the View command
Note that for very large datasets it is not a good idea to use the View command, as it is very memory (RAM)
intensive.
Other checks you always want to do when loading data include:
1. Check the column names using colnames
2. Check the data types for each column using a loop and xxx
3. Check the dimension (number of rows and columns) using the dim command
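The viewing and checking commands above can be tried on a toy dataset first. The sketch below reads a three-row CSV from an in-memory string, so it does not depend on the assignment files. The column names DATE and VALUE are assumptions made for this sketch and may not match the real data.

```r
# A tiny CSV held in a string (hypothetical columns, not the assignment data)
csv_text <- "DATE,VALUE
2000-01-01,10.5
2000-02-01,11.0
2000-03-01,12.25"

# read.csv() normally takes a file name; the text argument reads from a string instead
toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

print(head(toy, 2))   # first two rows
print(tail(toy, 1))   # last row
print(colnames(toy))  # column names
print(dim(toy))       # number of rows, then number of columns
```

With a real file, the first argument of `read.csv` would be the file name instead of `text = ...`.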
We now want you to run the following checks on both of your loaded datasets:
(1) Print the column names.
(2) Print off the first 20 rows.
(3) Print off the number of rows and columns.
(4) Print the data types of all the columns.
Note, for part (4) I have already built the for loop statement to get the data types of each of the columns.
For those familiar with for loops in other environments, R also has a built-in family of apply functions that
differ in what they return (lapply returns a list, vapply returns a vector of a declared type, etc.). If you are
unfamiliar with for loops, give it a google.
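For readers new to either style, here is a minimal sketch comparing a for loop with vapply on a toy dataframe. The dataframe is made up, and using class() to report the type is an assumption of this sketch rather than something the assignment prescribes.

```r
# Toy dataframe with hypothetical columns
toy <- data.frame(DATE = c("2000-01-01", "2000-02-01"),
                  VALUE = c(10.5, 11.0),
                  stringsAsFactors = FALSE)

# Version 1: explicit for loop over the column positions
types_loop <- character(ncol(toy))
for (j in seq_len(ncol(toy))) {
  types_loop[j] <- class(toy[[j]])[1]
}
names(types_loop) <- colnames(toy)

# Version 2: vapply applies the same function to every column and
# checks that each result is a single character string
types_vapply <- vapply(toy, function(col) class(col)[1], character(1))

print(types_loop)
print(identical(types_loop, types_vapply))
```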
## ~~ Problem 2: Part (a) Run Data Checks ~~ ##
(b) Doing simple data analysis
In the next part, we will have you doing some actual time series analysis. But generally we are interested in
decomposing time series into trend, seasonal and stochastic components. One clear form of seasonality is
month to month variation in the data. An “approximation” for trend components is to look at year to year
changes. We will have you investigate these below.
We now want you to do the following:
(1) Calculate the average revenue and load factor, by year. Do this two ways: (1) Using aggregate and
mean, (2) Using aggregate and sum.
(2) Calculate the average revenue and load factor, by month. Do this two ways: (1) Using aggregate and
mean, (2) Using aggregate and sum.
(3) Plot graphs for part (1) and (2) on the same plot, using your favorite plotting function. Note, you can
either use the built-in plot function or the popular external library ggplot2.
For parts (1) and (2), I want you to build a better understanding of using R. I am asking you to compute
averages using two different methods. In Method (1), you can use the built-in mean function to have R do
the work for you. In Method (2), you will do the average calculation yourself by summing over observations
and dividing by the number of observations.
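The two methods can be compared on a toy example before applying them to the real data. In the sketch below, the dataframe and its column names (year, value) are made up for illustration.

```r
# Toy data: three observations in 2000 and two in 2001 (hypothetical values)
toy <- data.frame(year  = c(2000, 2000, 2000, 2001, 2001),
                  value = c(1, 2, 3, 10, 20))

# Method (1): let R compute the group averages directly
avg_mean <- aggregate(value ~ year, data = toy, FUN = mean)

# Method (2): sum within each group, count within each group, then divide
group_sum   <- aggregate(value ~ year, data = toy, FUN = sum)
group_count <- aggregate(value ~ year, data = toy, FUN = length)
avg_manual  <- data.frame(year  = group_sum$year,
                          value = group_sum$value / group_count$value)

print(avg_mean)
print(avg_manual)
```

Both methods should return the same averages, which is a useful sanity check on the manual calculation.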
## ~~ Problem 2: Part (b) Simple Data Analysis ~~ ##
(c) Doing time series analysis
In R, there are already built-in functions that allow us to do these seasonality and trend decompositions
with far fewer lines of code. To do so, we must convert our data into time series objects. What separates
a normal vector of data from a vector of time series data is that the latter has some time frequency of
observations. In our case, the time frequency is monthly.
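As a minimal sketch of what a time series object adds to a plain vector, the code below wraps four years of simulated monthly values in a ts object and decomposes it. The simulated series, its start date and its length are assumptions made for illustration and do not come from the assignment data.

```r
set.seed(42)

# Simulated monthly series: linear trend + yearly cycle + noise (made-up data)
n_months <- 48
monthly_values <- 100 + 0.5 * (1:n_months) +
  10 * sin(2 * pi * (1:n_months) / 12) + rnorm(n_months)

# frequency = 12 tells R there are 12 observations per year;
# start = c(2000, 1) means the first observation is January 2000
monthly_ts <- ts(monthly_values, start = c(2000, 1), frequency = 12)

print(frequency(monthly_ts))
print(start(monthly_ts))

# decompose() needs the frequency in order to split the series into components
parts <- decompose(monthly_ts)
print(names(parts))
```

The same plain vector without a frequency could not be decomposed, because R would have no way of knowing how long one seasonal cycle is.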
We now want to return to the main question of this section: how much does capacity utilization explain increases
in airline revenue?
Fixing notation, we have:
• $t \in T = \{01/2000, \dots, 12/2018\}$ indexes month-year combinations
• $Rev_t$ is revenue for each month-year combination
• $Load_t$ is load factor for each month-year combination
We now want you to do the following:
(1) Create a time series object using the ts command for each series. Be sure to specify the correct
frequency for the data.
(2) Plot an autocorrelation function between our two time series: $\{Rev_t\}_{t \in T}$ and $\{Load_t\}_{t \in T}$.
(3) Run the following linear regression, reporting the coefficients and $R^2$:
$Rev_t = \alpha + \beta Load_t + \epsilon_t$
(4) Decompose both series into cyclical and trend components using the decompose command. Plot
these cyclical and trend components separately for each of the series.
(5) Using the dataframe created in part (3), redo parts (2) and (3). What differs from part (2)? Why?
What can we conclude about the impact of capacity utilization changes on revenues?
## ~~ Problem 2: Part (c) Time Series Analysis ~~ ##
Problem 3 (Coding Exercise)
In class, you have learned about the Wold decomposition, a fundamental result in time series analysis. This
exercise will attempt to walk through Wold’s theorem in practice. We have provided simulated time series
data, in an .rda file called “ts_simulation.rda”, where $Y_t$ is the $t$th observation from our data. To open this
file, use the load command. The name of the dataframe is “sim”.
We now want you to do the following:
(1) Verify the stationarity of the process. Do this in two ways:
(a) “Heuristic”: show that the first and second moments do not depend on $t$.
(b) “Testing”: use a Dickey-Fuller test to test for stationarity. Interpret your results.
(2) Estimate three separate autoregressive models: AR(1), AR(3) and AR(6). For each of the separate
models, retrieve the residuals, $\hat{\epsilon}_{t,p}$, where $p$ is the order of the AR process. Using each set of residuals
of the AR process, estimate an MA(2) model, where $\hat{\eta}_{t,p,q}$ are the residuals of this second step.
Verify whether the assumptions of Wold are violated:
$\mathrm{Corr}[\hat{\epsilon}_{t,p}, Y_s] = 0$ such that $s < t$
$E[\eta_{t,p,q}] = 0$
$\mathrm{Var}(\eta_{t,p,q}) = \sigma^2$
(3) To find the right ARMA(p,q) process, we add new lags (increase p), estimate our model, use an
information criterion to determine the increase in fit and stop once new models do not improve fit. To
simplify the problem, assume q = 2. Build a series of ARMA(p,q) models, using the Akaike Information
Criterion (AIC) to find the right p. (Note: A for loop over p would be a good idea.)
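The loop over p can be sketched on simulated data as follows. This is only an illustration under stated assumptions: the series is generated here with arima.sim rather than loaded from the provided file, the grid p = 0, ..., 4 is arbitrary, and instead of stopping at the first p that fails to improve fit, the sketch simply picks the smallest AIC over the whole grid.

```r
set.seed(123)

# Simulated stationary series (an assumption; not the provided "sim" data)
y_sim <- arima.sim(model = list(ar = c(0.5, -0.3)), n = 300)

q <- 2          # MA order held fixed, as in the problem statement
p_grid <- 0:4   # candidate AR orders (arbitrary grid for this sketch)
aic_by_p <- rep(NA_real_, length(p_grid))

for (i in seq_along(p_grid)) {
  # Fit ARMA(p, q); a failed fit leaves NA in place instead of stopping the loop
  fit <- tryCatch(
    suppressWarnings(arima(y_sim, order = c(p_grid[i], 0, q), method = "ML")),
    error = function(e) NULL
  )
  if (!is.null(fit)) {
    aic_by_p[i] <- AIC(fit)
  }
}
names(aic_by_p) <- paste0("p=", p_grid)

# which.min() ignores NA entries and returns the position of the smallest AIC
best_p <- p_grid[which.min(aic_by_p)]
print(round(aic_by_p, 2))
print(best_p)
```

Lower AIC indicates a better trade-off between fit and the number of parameters, so the selected p is the one with the smallest value in the printed vector.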
## ~~ Problem 3: Wold-Decomposition ~~ ##