BSc Econometrics: Take-Home Exam
Fall 2024
To complete this course you are required to write a take-home exam (worth 30 points) in addition to the written exam (worth 60 points).
Deadline to submit your solution 10th of Nov. 2024, 23:00 CET.
How to submit your solution Via Canvas.
What to submit One pdf file containing your solutions, with your name and student ID in the header of every page, AND one Python script which you have used to conduct the empirical analyses (incomplete submissions will receive zero points). These files should be named SurnameStudentID; for example, Mustermann5048003.
Editorials Use a font size of 12, 1.5 line-spacing and 2cm margins at each side of the pages when writing your solutions. Figures and/or tables in the pdf file (i) must be numbered, discussed and referred to in the pdf solution and (ii) must have clear and concise caption texts, titles and labels. All pages should be numbered. Your pdf file must not include any Python programs and it must not include screenshots from Python programs. Your Python program should include comments so that it is clear which part of the program is used for which part of the question. Comments which you add in the Python program to answer the questions will have no points (these will not be evaluated). Provide your email address on the first page of the pdf file. Your pdf file should be a maximum of 12 pages (excluding the title page).
Evaluation The evaluation is done individually; that is, submitting solutions which overlap with those of others will lead to zero points. When evaluating your submission, I may contact you via email and ask you to attend a personal meeting to explain your submitted solutions and programs. Failing to attend any such meeting will result in zero points for the take-home exam. Further, some of the students enrolled in the course are not eligible to write the take-home exam (this does not concern those who take this course for the first time). Submissions from students who are not eligible will not be evaluated.
Python programs You may use packages, functions or any procedures in your Python programs which we have not covered in class; however, it is mandatory that you explain these clearly and thoroughly. Failing to do so will result in zero points for the corresponding parts.
Honorable Declaration Your pdf solution must include a signed version of the “Honorable Declaration” which you find at the bottom of this document; otherwise your solution will not be corrected, that is, you will receive zero points for the take-home exam.
Maximum points 30.
Question Statement
In April 2008, the unemployment rate in the United States stood at 5.0%. By April 2009, it had increased to 9.0%, and it had increased further, to 10.0%, by October 2009. Were some groups of workers more likely to lose their jobs than others during the Great Recession? For example, were young workers more likely to lose their jobs than middle-aged workers? What about workers with a college degree versus those without a degree, or women versus men? The data set which you should use for this exercise is Employment_08_09, available from DropBox together with a detailed description. Use these data to answer the following questions.
A Provide a table of descriptive statistics for all of the variables, including sample average, standard deviation, maximum and minimum value, median, skewness and kurtosis. Briefly explain these measures for some of these variables (focus on those variables you find more interesting and relevant for the analysis; you may wish to read the whole task first and then choose which variables to explain in this part).
B Provide boxplots for the variables. Are there any possible outliers in the data? Explain. Note: to generate boxplots using Python you may use this page.
C What fraction of workers in the sample were employed in April 2009? Use your answer to compute a 95% confidence interval for the probability that a worker was employed in April 2009, conditional on being employed in April 2008.
D Regress Employed on Age and Age², using a linear probability model.
D1 Based on this regression, was age a statistically significant determinant of employment in April 2009?
D2 Is there evidence of a nonlinear effect of age on the probability of being employed?
D3 Compute the predicted probability of employment for a 20-year-old worker, a 40-year-old worker, and a 60-year-old worker.
E Repeat (D) using a probit regression.
F Are there important differences in your answers to (D) and (E)? Explain. Further, provide a plot of the fitted values using both models from (D) and (E) and carefully compare these noting similarities or differences.
G The data set includes variables measuring the workers’ educational attainment, sex, race, marital status, region of the country, and weekly earnings in April 2008.
G1 Construct a table like Table 11.2 of the reference book of the course to investigate whether the conclusions on the effect of age on employment from (D) and (E) are affected by omitted variable bias. Run statistical tests (at the 5% significance level) whenever possible (note that whenever you conduct a hypothesis test, you must write out the pair of hypotheses).
G2 Use the regressions in your table from (G1) to discuss the characteristics of workers who were hurt most by the Great Recession.
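The mechanics behind parts (D) and (E) can be illustrated with a minimal sketch on synthetic data. Everything specific here is an assumption made for illustration only: the column names employed, age and age2, the simulated data-generating process, and the choice of heteroskedasticity-robust (HC1) standard errors are not taken from the exam data set or its description.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only (NOT the exam data set).
# Assumed data-generating process: employment probability is hump-shaped in age.
rng = np.random.default_rng(42)
n = 2000
age = rng.integers(18, 65, size=n)
prob = 0.5 + 0.02 * age - 0.00025 * age**2
df = pd.DataFrame({"employed": rng.binomial(1, prob), "age": age})
df["age2"] = df["age"] ** 2  # squared term as an explicit column

# Linear probability model: OLS on a binary outcome, with robust standard errors.
lpm = smf.ols("employed ~ age + age2", data=df).fit(cov_type="HC1")

# Probit model with the same regressors.
probit = smf.probit("employed ~ age + age2", data=df).fit(disp=False)

# Joint test that both age terms are zero in the linear probability model.
joint_test = lpm.f_test("age = 0, age2 = 0")

# Predicted employment probabilities at selected ages.
new = pd.DataFrame({"age": [20, 40, 60]})
new["age2"] = new["age"] ** 2
pred_lpm = lpm.predict(new)
pred_probit = probit.predict(new)

print(joint_test)
print(pd.DataFrame({"age": new["age"], "lpm": pred_lpm, "probit": pred_probit}))
```

Keeping the squared term as its own column makes both the joint test and the prediction step explicit. Probit predictions always lie between 0 and 1, whereas predictions from a linear probability model are not restricted to that range.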
Appendix
To export regression tables produced by statsmodels you can use the following procedures.
Regression tables from statsmodels to csv
Say you estimate a model using statsmodels and have saved the estimated results in a variable called results. Then you can use the following piece of program to save the regression table in csv format (which you can easily import into MS Excel and MS Word):
with open('TableName.csv', 'w') as f:
    f.write(results.summary().as_csv())
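For reference, a self-contained sketch of this export step is given here. The data are synthetic and the variable names x and y are hypothetical; writing to the system's temporary directory is likewise just a choice made for this example.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only (hypothetical variables x and y).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=200)})
df["y"] = 1.0 + 2.0 * df["x"] + rng.normal(size=200)

# Estimate a model and keep the fitted results in `results`.
results = smf.ols("y ~ x", data=df).fit()

# Save the regression table as a csv file (here in the temporary directory).
csv_path = os.path.join(tempfile.gettempdir(), "TableName.csv")
with open(csv_path, "w") as f:
    f.write(results.summary().as_csv())

print("Table written to", csv_path)
```

The resulting file can then be opened in a spreadsheet program and tidied before it is placed in the pdf solution.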
Regression tables from summary_col to csv
To generate tables like the one asked for in part (G1), that is, to bag many regression results together, you can use summary_col, which you can import using
from statsmodels.iolib.summary2 import summary_col
That is, when you have regression results obtained from statsmodels, say you call these result1, result2, ..., result5, then you can generate one summary table (as asked for in part (G1) above) which includes result1, result2, ..., result5 using the following
ALLresults = summary_col([result1, result2, result3, result4, result5])
Note that you may have more or fewer than 5 individual regression results; you just pass all the results you wish to bag together to summary_col.
When you use summary_col to summarize the results of more than one estimated regression equation (let the summary table be called ALLresults), then you can use the following piece of program to save ALLresults as a csv file:
ALLresults.tables[0].to_csv('TableName.csv')
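A complete sketch of the summary_col workflow with two regressions might look as follows. The data are synthetic, the variable names x1, x2 and y are hypothetical, and the optional arguments model_names, stars and float_format are shown only as one possible way to format the table.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

# Synthetic data for illustration only (hypothetical variables x1, x2 and y).
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 0.5 + 1.5 * df["x1"] - 0.8 * df["x2"] + rng.normal(size=n)

# Two nested specifications to be placed side by side.
result1 = smf.ols("y ~ x1", data=df).fit()
result2 = smf.ols("y ~ x1 + x2", data=df).fit()

# One column per regression; standard errors appear below the coefficients.
ALLresults = summary_col(
    [result1, result2],
    model_names=["(1)", "(2)"],
    stars=True,
    float_format="%.3f",
)
print(ALLresults)

# The combined table is stored as a pandas DataFrame in tables[0].
csv_path = os.path.join(tempfile.gettempdir(), "TableName.csv")
ALLresults.tables[0].to_csv(csv_path)
```

Regressors that are missing from a specification are left blank in that column, which makes it easy to see which controls enter each regression.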