EXAM 2
MSCI:3250
SPRING 2020
• This exam comprises 4 questions and is worth 150 points
• You will have 90 minutes to complete this exam
• Early submission bonus: Submissions before 10:05 PM will receive a 3-point
bonus
• Late penalty: Submissions after 10:15 PM will lose 3 points for each minute
late
• The exam is open-book, open-notes, and you can use the Internet, but no
communication with other individuals (with the exception of the instructor) is
allowed
• By taking this exam, you agree to abide by the Tippie Honor Code and Honor
Pledge below
HONOR PLEDGE
On my honor, I pledge that during this examination I have neither
given nor received assistance, and that I did not have advance
knowledge of the exam content.
Specifications
• Submit an R script file with your code for all questions
o Name your file “lastname_exam2.R”
o Add a comment with your name at the top of the file and comments denoting
each question number
o You may add other comments for clarification
o Add the command rm(list=ls()) at the top of your file to clear the workspace
• The solution for each question must be generated as R variables or plots with specific
names as instructed
o All solutions should be generated by running your code without any
customization or modification by the instructor
o Load required packages with the library() command. Your script should not
include any unnecessary packages or install.packages() commands
o Assume all input files are in the working directory. Do not include the setwd()
command in your script
Background
April 20th has become known as “Weed
Day”, prompting annual celebrations and
rallies across the country. For this exam,
we will analyze crime and demographic
data from Denver, CO, where recreational
marijuana has been decriminalized.
Carefully review all provided files
(“mj_crimes1.csv”, “mj_crimes2.csv”, and
“neighborhoods.txt”) before beginning.
Then answer the following questions:
1. (30 points) Read “mj_crimes1.csv” and “mj_crimes2.csv” into data frames and then
merge them. Do not convert strings to factors. Treat empty cells as missing values.
Output variables:
o part1 (10 pt): data frame created from “mj_crimes1.csv”
o part2 (10 pt): data frame created from “mj_crimes2.csv”
o crime (20 pt): data frame created by vertically merging part1 and part2
(Hint: Assume that the column names in part1 are correct. The result should be
a data frame with 1,203 rows and 12 columns)
2. (50 points) Read “neighborhoods.txt” into a data frame (do not convert strings to factors,
treat empty cells as missing values). Then merge with crime.
Output variables:
o nbhd (13 pt): data frame created from “neighborhoods.txt”.
o mj_df (47 pt): data frame created by horizontally merging crime and nbhd.
Only include neighborhoods that have crime reports. (Hint: Make sure that the
values in the shared columns match. There are 2 values in crime that should be
corrected. Assume that the values in nbhd are correct. The result should be a
data frame with 1,203 rows and 21 columns)
3. (30 points) Analyze marijuana industry-related crimes:
Output variables:
o industry_table: frequency table that counts the number of crime reports that
were industry or non-industry related for each offense category. Display the
offense categories as rows, industry vs. non-industry as columns.
o industry_mod: logistic regression that models the likelihood of a crime being
marijuana industry-related based on whether the crime was violent, plus the
neighborhood’s population, age, vacant housing units, and home value
o industry_r2: calculate the pseudo R² for industry_mod using the following
formula:
$$R^2_{\text{pseudo}} = 1 - \frac{D_{\text{full}}}{D_{\text{null}}}$$
where D_full represents the deviance of the full model and D_null represents the
deviance of the intercept-only model
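The pseudo R² formula compares the deviance of the fitted model with the deviance of a model that has only an intercept. A fitted glm object in R stores both values, so the ratio can be read directly from the object. The sketch below is not a solution to Question 3. It uses R's built-in mtcars data as stand-in data, and the model vs ~ mpg and the variable names are hypothetical choices made only for illustration.

```r
# Minimal sketch of pseudo R^2 = 1 - D_full / D_null for a logistic regression.
# Stand-in data (an assumption, not the exam data): built-in mtcars,
# modelling the 0/1 engine-shape variable vs from mpg.
fit <- glm(vs ~ mpg, data = mtcars, family = binomial)

# Intercept-only model, fitted explicitly to show where D_null comes from
null_fit <- glm(vs ~ 1, data = mtcars, family = binomial)

# fit$deviance is the full-model deviance (D_full);
# fit$null.deviance is the intercept-only deviance (D_null)
pseudo_r2 <- 1 - fit$deviance / fit$null.deviance
print(pseudo_r2)
```

Because the full model contains the intercept-only model as a special case, D_full cannot exceed D_null. The ratio therefore falls between 0 and 1.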
4. (40 points) Analyze crime reports by neighborhood:
Output variables and plots:
o nbhd_summary: dplyr summary table that calculates the total crime reports
(“TotalReports”), median age (“MedianAge”) and median poverty rate
(“MedianPoverty”) for each neighborhood
o age_cor: use nbhd_summary to calculate the correlation between a
neighborhood’s total crime reports and median age
o Create a scatterplot that visualizes the relationship between a neighborhood's
total crime reports and poverty rate. Add appropriate labels/titles. Draw the
points as triangles (point up) filled with the color green (Hint: use the pch
argument to change the marker type)