End of Semester Exam
MATH 161B Spring 2020 Professor Gottlieb
Tagner et al. (in two separate publications in 1979 and 1983) reported analyses of a study aimed at
assessing children’s pulmonary function in the absence and presence of cigarette smoke, including
exposure to second-hand smoke from at least one parent. These papers were some of the earliest attempts to
systematically study the effects of second-hand smoking on children. The variables in the data set are
Fev (forced expiratory volume, the amount of air a child can exhale in the first second of a forceful
exhale, in liters), the age of the child (in years), the sex (coded 0 for female and 1 for male), and the smoking
status (coded 0 for not exposed to cigarette smoke, 1 for exposed to cigarette smoke). A total of 654
children participated in the study.
1. For this problem you will only be using the variables Fev and Age.
(a) Use JMP to create a scatterplot of Fev against Age and include the scatterplot in your
exam. Describe the general pattern you observe.
(b) Fit a simple linear regression model with Fev as response and Age as predictor. Include
the JMP output (Summary of Fit, ANOVA, and coefficients tables) in your exam.
(c) Write down the fitted model and use this model to predict the Fev of a 14-year-old.
(d) Would it be reasonable to use this model to predict the Fev of a college student? Justify
your answer.
(e) Interpret the estimated slope $\hat{\beta}_1$ in the context of the problem. Also report and interpret a
95% confidence interval for the population slope $\beta_1$.
(f) Is there a meaningful interpretation of the intercept parameter in this problem? If so, provide
this interpretation. If not, explain why.
(g) Find the value of $R^2$ and interpret this value in the context of the problem.
(h) The population model for the line you fit in part (b) can be written as
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where $y$ = Fev, $x$ = Age, and $\varepsilon \overset{\text{IID}}{\sim} N(0, \sigma^2)$. Find an estimate for $\sigma^2$.
(i) Find the values of the largest and smallest residual.
(j) Conduct a residual analysis to investigate the model assumptions. Use the studentized
residuals and include both the Q-Q plot and the plot of studentized residuals vs. predicted values
in your exam. Are any model assumptions violated? If so, how?
(k) Transform the response variable Fev using a square-root transformation. That is, fit a new
model using $\sqrt{\text{Fev}}$ instead of Fev as the response. Conduct a residual analysis on this
updated model and include the two plots mentioned in part (j). Do the model assumptions
seem reasonably satisfied in this case?
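
For readers who want to check the quantities in Problem 1 outside JMP, the following is a minimal sketch in Python using only NumPy. It is not part of the exam and does not use the study data: the arrays `age` and `response` are synthetic, generated so that the square root of the response is linear in age, and the helper name `fit_simple_ols` is hypothetical. The studentized residuals are computed in the internally studentized form, residual divided by sqrt(MSE * (1 - leverage)), which is assumed here to match what JMP reports.

```python
import numpy as np


def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x with basic summary quantities."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    X = np.column_stack([np.ones(n), x])            # design matrix [1, x]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # (b0, b1)
    resid = y - X @ beta
    sse = float(resid @ resid)                      # error sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))        # total sum of squares
    mse = sse / (n - 2)                             # estimate of sigma^2
    xtx_inv = np.linalg.inv(X.T @ X)
    # Leverage h_ii = x_i' (X'X)^{-1} x_i, the diagonal of the hat matrix
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    # Internally studentized residuals (assumed to match JMP's definition)
    studentized = resid / np.sqrt(mse * (1.0 - leverage))
    return {
        "b0": float(beta[0]),
        "b1": float(beta[1]),
        "se_b1": float(np.sqrt(mse * xtx_inv[1, 1])),
        "r2": 1.0 - sse / sst,
        "mse": mse,
        "resid": resid,
        "leverage": leverage,
        "studentized": studentized,
    }


# Synthetic illustration only; these are NOT the FEV study data.
rng = np.random.default_rng(0)
age = rng.integers(3, 20, size=200).astype(float)
response = (1.0 + 0.05 * age + rng.normal(0.0, 0.1, size=200)) ** 2

fit_raw = fit_simple_ols(age, response)             # analogue of parts (b)-(j)
fit_sqrt = fit_simple_ols(age, np.sqrt(response))   # analogue of part (k)

print("raw slope:", fit_raw["b1"], "R^2:", fit_raw["r2"])
print("raw sigma^2 estimate (MSE):", fit_raw["mse"])
print("largest / smallest residual:",
      fit_raw["resid"].max(), fit_raw["resid"].min())
print("sqrt-model slope:", fit_sqrt["b1"], "MSE:", fit_sqrt["mse"])
```

Plotting `fit["studentized"]` against the fitted values, and against normal quantiles, would give the two diagnostic plots requested in parts (j) and (k).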
2. For this problem you will use all the variables in the data set with $x_1$ = Age, $x_2$ = Sex, and
$x_3$ = Smoke. Continue to use the transformed response $\sqrt{\text{Fev}}$.
(a) First, consider the multiple regression model with only two predictors: $x_1$ = Age and $x_2$ =
Sex.
$$\sqrt{\text{Fev}} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon.$$
Write out the assumptions for $\varepsilon$ and specify the model for boys and girls separately.
(b) Fit this model in JMP and include a copy of the coefficients table in your exam. (Friendly
Reminder: By default, JMP will center interaction terms. If you prefer to fit a model without
centering use the following: In the Fit Model dialog, you will find in the upper left corner of
the dialog window there is a red triangle right next to the words “Model Specification.” Click
on that red triangle to bring up a context menu which includes as the first entry “Center
Polynomials.” Uncheck this box.)
(c) Write down the fitted model for males. Use this model to predict the $\sqrt{\text{Fev}}$ for a
10-year-old boy.
(d) For a fixed period of development (for example, 1 year), is the average change in $\sqrt{\text{Fev}}$ the
same for boys and girls? Refer to the model specified in part (a) and identify an appropriate
hypothesis test involving one of the β parameters. State both the null and the alternative
hypotheses.
(e) Conduct the test from part (d). Find the test statistic value and corresponding p-value from
the JMP output, and formulate a conclusion in the context of the problem.
(f) Finally, consider the full multiple linear regression model with the following interaction terms:
$$\sqrt{\text{Fev}} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3 + \beta_{123} x_1 x_2 x_3 + \varepsilon$$
How many different regression lines do we fit in this model? Write down the regression line
equation for each sub-population. What are the relationships (if any) among the intercepts and
slopes of the individual regression lines?
(g) Fit the model from (f) in JMP. Include a copy of the coefficients table in your exam. Which
of the βs are significantly different from zero (at significance level α = 0.05) given the other
predictors are in the model?
(h) Write down the fitted line that you would use to predict the average $\sqrt{\text{Fev}}$ of boys who
were not exposed to second-hand smoke. Use this fitted model to predict the Fev of a
non-exposed 10-year-old boy.
(i) Write down the model again, but omitting the β terms that are not significantly different
from zero. What does the exclusion of these terms mean for the resulting regression model?
That is, describe how the relationships between slopes and intercepts in the updated model
differ from the ones you have described in (f). In particular, which, if any, of the regression
lines in this new model are parallel or share an intercept? (You do not need to fit this model
in JMP.)
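
As a supplement to Problem 2, the sketch below illustrates in Python (NumPy only) how an interaction model with a 0/1 group indicator encodes one regression line per group, and what centering the interaction term does to the coefficients. It is not part of the exam: the data are synthetic, the generating coefficients are arbitrary, and the helper `fit_ols` is a hypothetical name. Centering is assumed here to mean subtracting each predictor's sample mean before forming the product; JMP's exact centering convention should be checked against its documentation.

```python
import numpy as np


def fit_ols(X, y):
    """Return least-squares coefficients for design matrix X and response y."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Synthetic illustration only; these are NOT the FEV study data.
rng = np.random.default_rng(1)
n = 300
age = rng.uniform(3.0, 19.0, size=n)
sex = rng.integers(0, 2, size=n).astype(float)      # 0 = female, 1 = male
y = (1.0 + 0.05 * age - 0.10 * sex + 0.02 * age * sex
     + rng.normal(0.0, 0.05, size=n))               # arbitrary coefficients

ones = np.ones(n)

# Uncentered interaction: columns [1, x1, x2, x1*x2]
X_raw = np.column_stack([ones, age, sex, age * sex])
b0, b1, b2, b12 = fit_ols(X_raw, y)

# Group-specific lines implied by the fitted interaction model
line_female = (b0, b1)                # x2 = 0: intercept b0, slope b1
line_male = (b0 + b2, b1 + b12)       # x2 = 1: intercept b0+b2, slope b1+b12

# The same male line from a separate regression on males only
male = sex == 1.0
male_only = fit_ols(np.column_stack([ones[male], age[male]]), y[male])

# Centered interaction: columns [1, x1, x2, (x1 - mean)(x2 - mean)]
X_cen = np.column_stack(
    [ones, age, sex, (age - age.mean()) * (sex - sex.mean())]
)
c0, c1, c2, c12 = fit_ols(X_cen, y)

print("female line (intercept, slope):", line_female)
print("male line   (intercept, slope):", line_male)
print("interaction coefficient, raw vs centered:", b12, c12)
print("main-effect slope, raw vs centered:", b1, c1)
```

In this sketch the two parameterizations give the same fitted values and the same interaction coefficient, while the intercept and main-effect coefficients differ. This is why the per-group equations are easiest to read off from the uncentered fit.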