MTH783P Time Series Analysis for Business Spring 2020
Assessed Coursework Silvia Liverani
The dataset for this assessment is a modified version of the Air Quality Data Set from the
UCI Machine Learning Repository. It contains 390 daily responses from an array of several
metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device. The
device was deployed in the field, at road level, in a significantly polluted area of an
Italian city. Data were recorded from March 2004 to April 2005 (about one year).
Ground-truth daily averaged concentrations of Total Nitrogen Oxides (NOx) and Nitrogen
Dioxide (NO2) are provided, together with information on weather conditions. Missing
values are tagged with the value -200. The variables are described in Table 1.
Table 1: Description of the variables
Variable  Description
Date      Date (dd/mm/yyyy)
NOx       True hourly averaged NOx concentration in ppb
NO2       True hourly averaged NO2 concentration in µg/m³
Temp      Temperature in °C
RH        Relative Humidity (%)
AH        Absolute Humidity
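Because missing values are coded as -200 rather than left blank, they need to be recoded before any plot or summary statistic is computed. The following is a minimal sketch of one way to do this in base R. It assumes that the column names match Table 1, and the three rows of data are invented for illustration; they are not taken from the coursework file.

```r
# Toy data in the layout of Table 1 (values are made up for illustration)
csv_text <- "Date,NOx,NO2,Temp,RH,AH
10/03/2004,166,113,13.6,48.9,0.76
11/03/2004,-200,92,13.3,47.7,0.73
12/03/2004,131,-200,11.9,54.0,0.75"

air <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Parse the dd/mm/yyyy strings into Date objects
air$Date <- as.Date(air$Date, format = "%d/%m/%Y")

# Recode the missing-value tag -200 as NA in every numeric column
num_cols <- sapply(air, is.numeric)
air[num_cols] <- lapply(air[num_cols], function(x) {
  x[which(x == -200)] <- NA
  x
})

# Number of missing values per variable
colSums(is.na(air))
```

With the real file, the `text = csv_text` argument would be replaced by the path to the CSV file; the rest of the sketch stays the same provided the column names are as assumed.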
The dataset is available on QMplus as AirQualityCoursework.csv. Use R to analyse
the dataset and address the following tasks.
1. (5 points) Split the data into two datasets: a training dataset and a test dataset. The
aim is to use the training dataset to forecast the daily NOx concentration in
January 2005. The training dataset should therefore include the first 296
observations (up to the last observation of 2004). The test dataset should include the
31 daily observations for January 2005.
2. (25 points) Explore the training dataset: plot and produce summary statistics to
identify the key characteristics of the data and produce a report of your main findings.
The topics that you might choose to discuss include: possible issues with the data
collection, identification of possible outliers or mistakes in the data, role of missing
data (if any), distribution of the variables provided, relationships between variables.
3. Fit a statistical model to the training data and use it to forecast the NOx
concentration every day in January 2005.
(a) (20 points) How did you decide which model to fit? Include details of other
models that you tried, if any.
(b) (10 points) What are the underlying assumptions of the model that you have
chosen? Carry out a residual analysis to ensure that the assumptions are
satisfied.
(c) (10 points) Forecast the NOx concentration every day in January 2005 and
discuss the results.
(d) (10 points) Discuss any weaknesses of this analysis.
4. (10 points) All tables and plots that you include in your report should be
reproducible. Therefore, include in your submission on QMplus a text file with the R
commands that can be used to reproduce your results, including tables and plots.
This text file should include all, and only, the lines of code used to produce the
results presented in the report, and it should be written in a clear and readable way.
5. (10 points) Marks will be given for the overall presentation of the coursework, the
quality of figures and writing.
All modelling and forecasting choices and assumptions must be justified.
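The split, fit, forecast and residual-check steps described in tasks 1 and 3 can be sketched in base R as follows. This is only an illustration of the workflow under stated assumptions: the series is simulated (it is not the coursework data), and the AR(1) model is a placeholder rather than a recommended choice. Any model actually used would need to be selected and justified as the tasks require.

```r
set.seed(1)  # reproducible toy data

# Simulated stand-in for 390 daily NOx values (NOT the coursework data)
nox <- as.numeric(arima.sim(model = list(ar = 0.6), n = 390)) * 40 + 250

# Task 1: split by position, as described in the brief
train <- nox[1:296]    # observations up to the end of 2004
test  <- nox[297:327]  # the 31 daily observations for January 2005

# Placeholder model: AR(1) with a mean term, fitted with stats::arima
fit <- arima(train, order = c(1, 0, 0))

# 31-step-ahead forecasts with approximate 95% limits
fc    <- predict(fit, n.ahead = length(test))
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se

# Out-of-sample accuracy on the held-out month
rmse <- sqrt(mean((test - fc$pred)^2))

# Residual check: Ljung-Box test for remaining autocorrelation
# (fitdf = 1 accounts for the single AR coefficient)
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
```

Holding out January 2005 and comparing `fc$pred` with `test` gives an honest measure of forecast error, while the Ljung-Box p-value in `lb$p.value` indicates whether the residuals still show autocorrelation that the model has not captured.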
Requirements for the coursework submission:
• The submission deadline is 15:00 on Tuesday 28th April.
• The submission should include a document in .pdf format containing the answers
to questions 2 and 3 (with a 3-page limit, including figures and discussions) and a
text file (with extension .txt) containing the R code used for the results presented
in the report. The minimum font size is 12.
• While discussing the coursework with your classmates is encouraged, the
submission must be your own independent work. Every submission will be checked
for plagiarism using an automated system. Please refer to the QMUL Academic
Regulations for more information about the definition of plagiarism and the
related penalties:
https://qmplus.qmul.ac.uk/mod/book/view.php?id=1007932&chapterid=103814
• The School of Mathematical Sciences policy for late submissions will apply.
You can read the policy here:
https://qmplus.qmul.ac.uk/mod/book/view.php?id=1007932&chapterid=103810