ECON 3720: Introduction to Econometrics
Problem Set 03
Fall Semester 2024
Due: September 27th 2024
Please submit the problem set to your TA’s mailbox in the Monroe Hall basement no later than 5 PM on September 27th 2024. Failure to do so will result in a grade of 0. Remember to show all of your work. Good luck!
1) Note: This is Problem 2-1 in the Wooldridge textbook. Let kids denote the number of children ever born to a woman, and let educ denote years of education for the woman. A simple model relating fertility to years of education is
$$kids = \beta_0 + \beta_1 educ + \varepsilon$$
where $\varepsilon$ is the unobserved error.
a) What kinds of factors are contained in $\varepsilon$? Are these likely to be correlated with the level of education?
b) Will a simple regression analysis uncover the ceteris paribus effect of education on fertility? Explain.
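To see why the correlation between the error and educ matters, here is a small simulation sketch in Python (standard library only). Everything in it is hypothetical: z stands in for some unobserved factor that raises education and also sits inside $\varepsilon$, and the coefficient values are invented. In this setup $Cov(educ, \varepsilon) = 2\gamma$ and $Var(educ) = 8$, so the OLS slope settles near $\beta_1 + \gamma/4$ rather than $\beta_1$.

```python
import random

def ols(x, y):
    """Simple OLS of y on x; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def simulate(beta0=4.0, beta1=-0.2, gamma=-0.5, n=50_000, seed=1):
    """Draw (educ, kids) from kids = beta0 + beta1*educ + eps.

    z is a hypothetical unobserved factor: it raises educ and also enters
    eps with weight gamma, so Cov(educ, eps) = 2*gamma whenever gamma != 0.
    """
    rng = random.Random(seed)
    educ, kids = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        e = 12 + 2 * z + rng.gauss(0, 2)   # Var(educ) = 4 + 4 = 8
        eps = gamma * z + rng.gauss(0, 1)  # the unobserved error
        educ.append(e)
        kids.append(beta0 + beta1 * e + eps)
    return educ, kids

_, slope_biased = ols(*simulate(gamma=-0.5))  # eps correlated with educ
_, slope_clean = ols(*simulate(gamma=0.0))    # eps unrelated to educ
```

With gamma=0 the error no longer moves with educ, and the estimated slope settles near beta1 itself; the gap between slope_biased and slope_clean is the bias that comes from leaving z in the error.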
2) Note: This is Problem 2-4 in the Wooldridge textbook. The data set BWGHT contains data on births to women in the United States. Two variables of interest are the dependent variable, infant birth weight in ounces (bwght), and an explanatory variable, average number of cigarettes the mother smoked per day during pregnancy (cigs). The following simple regression was estimated using data on n = 1,388 births:
$$\widehat{bwght} = 119.77 - 0.514\, cigs$$
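A predicted value comes from plugging a value of cigs into the fitted line, and finding the cigs that delivers a target prediction means solving that same line for cigs. A minimal Python sketch using the two coefficients reported above; the function names are illustrative only and do not come from any package.

```python
# Coefficients of the estimated line reported above.
B0_HAT = 119.77   # intercept, in ounces
B1_HAT = -0.514   # slope, ounces per cigarette smoked per day

def predicted_bwght(cigs):
    """Fitted birth weight (ounces) for a given number of cigarettes per day."""
    return B0_HAT + B1_HAT * cigs

def cigs_for_bwght(target):
    """Invert the fitted line: the cigs value whose prediction equals target."""
    return (target - B0_HAT) / B1_HAT
```

For example, predicted_bwght(20) evaluates the line at one pack per day, and cigs_for_bwght(125) solves it for a 125-ounce prediction. Whether the resulting number makes sense as a value of cigs is a separate question from the algebra.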
a) What is the predicted birth weight when cigs = 0? What about when cigs = 20 (one pack per day)? Comment on the difference.
b) Does this simple regression necessarily capture a causal relationship between the child’s birth weight and the mother’s smoking habits? Explain.
c) To predict a birth weight of 125 ounces, what would cigs have to be? Comment.
d) The proportion of women in the sample who do not smoke while pregnant is about 0.85. Does this help reconcile your finding from part (c)?
Hint: This means that 85% of the women in our sample have cigs = 0. Is there a lot of variation in the (strictly) positive part of the cigs distribution? What would this mean for the statistical properties of our regression slope estimator?
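The hint points to a standard textbook result: under the simple-regression assumptions, including homoskedasticity, the sampling variance of the OLS slope is $Var(\hat{\beta}_1 \mid x) = \sigma^2 / SST_x$, where $SST_x = \sum_i (x_i - \bar{x})^2$. The Python sketch below compares $SST_x$ for two made-up samples of 100 mothers. Neither sample is the BWGHT data, and $\sigma = 20$ ounces is likewise an assumed value chosen only for illustration.

```python
import math

def sst(x):
    """Total sum of squares: sum of squared deviations of x from its mean."""
    x_bar = sum(x) / len(x)
    return sum((xi - x_bar) ** 2 for xi in x)

def slope_se(sigma, x):
    """Homoskedastic standard error of the OLS slope, sqrt(sigma^2 / SST_x),
    treating the error standard deviation sigma as known."""
    return math.sqrt(sigma ** 2 / sst(x))

SIGMA = 20.0  # assumed error s.d. in ounces (hypothetical)

# Hypothetical sample A: 85 non-smokers plus 15 smokers who all report 10 cigs/day.
cigs_a = [0] * 85 + [10] * 15
# Hypothetical sample B: 100 mothers spread evenly over 0, 5, 10, 15, 20 cigs/day.
cigs_b = [0, 5, 10, 15, 20] * 20

se_a = slope_se(SIGMA, cigs_a)
se_b = slope_se(SIGMA, cigs_b)
```

Sample A has about a quarter of sample B's $SST_x$ (1,275 versus 5,000), so with the same $\sigma$ its slope standard error is roughly twice as large.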
3) Download the data set auto.dta from Canvas. This data set contains data on 74 cars sold in the US in 1978, along with various attributes of each car.
a) Draw a scatterplot of price versus weight (see the documentation for twoway scatter) and describe in words what you see.
b) Run a regression of price on weight. Report the constant (i.e., the intercept) and slope coefficients, $\hat{\beta}_0$ and $\hat{\beta}_1$.
c) Report a 95% confidence interval for your estimate of the slope coefficient. Is your estimate significant at the 5% level?
d) What is the predicted price of a car weighing 2,500 lb? What about a car weighing 4,000 lb?
e) What is the expected price difference between two cars when one is 500 lb heavier than the other?
f) What is the predicted price of a car weighing 500 lb? Do you think this number is very meaningful? Why or why not?
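In Stata, regress price weight reports the coefficients, their standard errors, and 95% confidence intervals directly. To make the underlying formulas explicit, here is a from-scratch Python sketch. The weight and price numbers are invented toy values, not the auto.dta data, and the interval uses the large-sample normal critical value 1.96, whereas Stata's regress uses a t critical value with n − 2 degrees of freedom, which is slightly larger.

```python
import math

def simple_ols(x, y):
    """Simple OLS of y on x with the homoskedasticity-only slope standard error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ssr / (n - 2)  # degrees-of-freedom-adjusted error variance
    se_b1 = math.sqrt(sigma2_hat / sxx)
    return b0, b1, se_b1

def conf_int(b, se, crit=1.96):
    """Two-sided interval b +/- crit*se (normal approximation by default)."""
    return b - crit * se, b + crit * se

# Toy numbers for illustration only; these are NOT the auto.dta values.
weight = [2000, 2400, 2800, 3200, 3600, 4000]
price = [4100, 4700, 4900, 5800, 6000, 6900]

b0, b1, se_b1 = simple_ols(weight, price)
lo, hi = conf_int(b1, se_b1)
significant = not (lo <= 0 <= hi)  # 0 outside the 95% interval: reject H0: beta1 = 0
pred_2500 = b0 + b1 * 2500         # fitted price at 2,500 lb
diff_500 = b1 * 500                # predicted price change for 500 extra lb
```

Because the model is linear, the predicted difference for 500 extra pounds is the same wherever along the weight range the comparison is made; it depends only on the slope.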
4) Download the cps08.dta data set from Canvas. We will consider the gender wage gap using regression analysis.
a) Run a regression of hourly wages (ahe) on gender (female). The variable female is a dummy variable that equals 1 if the person is female and 0 otherwise (in this dataset, female = 0 implies that the person is male). Report the intercept and slope coefficients, $\hat{\beta}_0$ and $\hat{\beta}_1$.
b) What are the average wages of men in the sample? What are the average wages of women in the sample? What is the gender pay gap? Report these numbers using only information from the regression results. What do these numbers correspond to in terms of $\hat{\beta}_0$ and $\hat{\beta}_1$?
c) What percent of the variability in hourly wages does a person’s gender explain in the CPS data? What does this imply about the economic significance of gender for determining wages?
d) Report a 95% confidence interval for the gender wage gap. Do you reject the null hypothesis that the gender wage gap is 0 at the 5% significance level? Why?
e) In your opinion, is your estimate of the gender wage gap economically significant? Why?
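A regression on a single 0/1 dummy has a convenient algebraic property: the intercept equals the sample mean of the outcome for the group with the dummy equal to 0, and intercept plus slope equals the mean for the group with the dummy equal to 1. The Python sketch below checks this on eight invented wage observations (not the cps08.dta data) and also computes $R^2$ and a 95% interval for the slope. The standard error here is the homoskedasticity-only one and the interval uses the normal value 1.96; heteroskedasticity-robust standard errors (for example, adding the robust option to Stata's regress) can differ.

```python
import math

def ols_with_dummy(d, y):
    """Regress y on a 0/1 dummy d; return intercept, slope, slope s.e., and R^2."""
    n = len(d)
    d_bar, y_bar = sum(d) / n, sum(y) / n
    sdd = sum((di - d_bar) ** 2 for di in d)
    sdy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    b1 = sdy / sdd
    b0 = y_bar - b1 * d_bar
    ssr = sum((yi - b0 - b1 * di) ** 2 for di, yi in zip(d, y))  # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                     # total
    se_b1 = math.sqrt(ssr / (n - 2) / sdd)  # homoskedasticity-only
    r2 = 1 - ssr / sst                      # share of variation explained
    return b0, b1, se_b1, r2

# Hypothetical hourly wages, NOT the cps08.dta values.
female = [0, 0, 0, 0, 1, 1, 1, 1]
ahe = [20.0, 24.0, 18.0, 26.0, 17.0, 21.0, 15.0, 19.0]

b0, b1, se_b1, r2 = ols_with_dummy(female, ahe)
mean_men = sum(w for f, w in zip(female, ahe) if f == 0) / female.count(0)
mean_women = sum(w for f, w in zip(female, ahe) if f == 1) / female.count(1)
ci = (b1 - 1.96 * se_b1, b1 + 1.96 * se_b1)  # large-sample normal approximation
```

Here mean_men coincides with b0 and mean_women with b0 + b1; that identity holds for any regression on one 0/1 dummy, not just this toy sample.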