COMM5000
Data Literacy
Sandbox PwC Distribution Project
Case Study Business Report
Term 1, 2024
Assessment Administrative Details
Turnitin
Turnitin is an originality checking and plagiarism prevention tool that enables checking of submitted written work for improper citation or misappropriated content. Each Turnitin assignment is checked against other students' work, the Internet and key resources selected by your Course Coordinator.
If you are instructed to submit your assessment via Turnitin, you will find the link to the Turnitin submission in your Moodle course site. You can submit your assessment well before the deadline and use the Similarity Report to
improve your academic writing skills before submitting your final version.
You can find out more information on the Turnitin information site for students.
Late Submissions
The parameters for late submissions are outlined in the UNSW Assessment Implementation Procedure. For
COMM5000, if you submit your assessments after the due date, you will incur penalties for late submission unless you have Special Consideration (see below). Late submission is 5% per day (including weekends), calculated from the
marks allocated to that assessment (not your grade). Assessments will not be accepted more than 5 days late.
Extensions
You are expected to manage your time to meet assessment due dates. If you do require an extension to your
assessment, please make a request as early as possible before the due date via the special consideration portal on myUNSW (My Student profile > Special Consideration). You can find more information on Special Consideration and the application process below. Lecturers and tutors do not have the ability to grant extensions.
Special Consideration
Special consideration is the process for assessing the impact of short-term events beyond your control (exceptional circumstances), on your performance in a specific assessment task.
What are circumstances beyond my control?
These are exceptional circumstances or situations that may:
. Prevent you from completing a course requirement,
. Keep you from attending an assessment,
. Stop you from submitting an assessment,
. Significantly affect your assessment performance.
Available here is a list of circumstances that may be beyond your control. This is only a list of examples, and your exact circumstances may not be listed.
You can find more detail and the application form. on the Special Consideration site, or in the UNSW Special Consideration Application and Assessment Information for Students.
PricewaterhouseCoopers (PwC) Distribution Project Statement
Wholesale distribution companies typically purchase products from manufacturers/suppliers and then sell them to retail stores, making them available for consumers. Typically, wholesale distributors deal in large quantities of goods and are set up to have warehouses, distribution centres, and logistic functions to manage and deliver inventory to retail stores. We are interested in better understanding the profitability of wholesale distribution companies.
By examining the profitability of wholesale distribution companies globally over the past five years (PwC to provide an Excel file containing raw data), we aim to determine whether there is a correlation (positive or negative) between their profitability and their local jurisdiction's GDP, along with other key economic metrics or events (e.g., the COVID- 19 pandemic, Ukraine war, Chinese property sector crisis, global inflation, interest rate rises, etc.). If such correlations exist, what might be the reasons? Please provide both quantitative and qualitative analysis to support your findings.
Additionally, considering their straightforward business model, wholesale distributors are not typically involved in other key business functions such as manufacturing, R&D, retail trade, etc. Can researchers review publicly available information of major global distribution companies and validate their key functions, assets, and risks across various jurisdictions (e.g., comparing the activities performed, assets held, and risks borne by wholesale distributors based in the US vs. China)? This comparison may reveal other drivers of profitability. Please also provide any supporting analysis for these additional considerations.
The key jurisdictions we are interested in are the US, UK, China, Japan, South Korea, Australia, and New Zealand.
The context of COMM5000
This project presents a genuine business inquiry that PwC is exploring. As a consultant engaged by PwC, you are tasked with employing the COMM5000 data analysis toolkit, encompassing both descriptive and inferential statistical methods, to dissect and interpret complex datasets.
This Sandbox project tackles a real-world data problem and provides a rare opportunity to acquire and hone analytical skills that are highly sought after in the workplace. Engaging with actual data from a leading global firm allows you to bridge the gap between theoretical knowledge and practical application, cultivating competencies that are quintessential for business graduates. This hands-on experience aligns perfectly with the UNSW program learning outcomes. It aims to instil a deep understanding of data interpretation and strategic decision-making, ensuring that graduates are well- equipped to meet the evolving demands of the global business.
The work will be scaffolded into two milestones (M1 and M2, each worth 20%) and a final project report (60%). Each milestone will require you to apply what you have learned to address specific aspects of the data. Typically, M1 consists of exploratory data analysis, while M2 focuses on identifying hypotheses and formulating key inferential questions. The final project report will utilize insights from M1 and M2 to model the data and answer the project's questions.
PwC schedule of engagement
It is imperative that you attend these sessions, where PwC delegates will conduct live synchronous sessions to discuss the importance of analysing the factors affecting the profitability of distribution companies to their operations. You are encouraged to ask questions and discuss any project aspects during these sessions. Please note, these sessions will not be recorded.
1) Week 3, Monday 26th February (Stream1 at 12:00pm and Stream 2 at 4:00pm): A 30-minute presentation followed by a 15-minute Q&A.
2) Week 9, Monday 8th April, at 12:00pm (Stream 1) and 4:00pm (Stream 2):: A 30-minute Q&A and mentoring session.
Sandbox Project Showcase at PwC
Students who express interest in their assessment submission may be considered for the Project Showcase at PwC. High- Of the students who express interest, the highest performers in the assessment will be placed into teams (based on common countries, industries etc.) and invited to present their analysis and insights of the PwC distribution project problem in person at PwC’s Sydney office.
Each team will present for up to 15 minutes followed by 5 minutes of Q&A. In addition to their data literacy, students are invited to showcase their sense making capabilities in the presentation.
To facilitate this, teams may choose to investigate additional sources of information (e.g. IBISWorld Industry Reports, economics journals etc.) to assist them in developing a narrative of their choice of data analysis approach, the reasons behind the identified data trends / correlations (or apparent lack thereof) and what it may mean for businesses in the industry making decisions for the future.
Note that this presentation is not part of COMM5000 assessments. This is an important career opportunity to be invited to showcase your work to an industry leader. A selection of the five best reports in Week 11. Those selected are invited to present their analysis of the PwC distribution project problem in person.
Business Report
Report details
Week 11, Friday 26th April 11:59 PM
60%
Case Study business report. Individual work. Reports will be checked for plagiarism.
Maximum 2500 Words (excluding tables, charts, etc)
Via Moodle course site
Description of assessment task
In this final report, you will use the information you have collected in your analysis in M1 and M2 to build a linear regression model. This model will capture the key objectives we have been pursuing since M1.
(1) We want to investigate what company characteristics are determinant factors of its profitability.
(2) We want to know if the country of jurisdiction has a premium or effect on a company's profitability.
(3) We want to address the question of whether Covid19 has affected the profitability of wholesale companies.
In M2, you have performed a series of hypothesis testing to see whether there is evidence that profits differ between countries (at least for some industries) and whether there was a significant change from 2019 to the Covid year 2020. Hypothesis testing is very informative but does not provide us with a means to control for confounding effects. For example, suppose you conclude that you reject a null hypothesis of equal population means of operating revenues growth rates between the UK and Australia. In that case, you can’t conclude that this difference is due solely to the country's effect. Other factors may affect revenues even if you control for the country of jurisdiction. Meaning that rejecting the null hypothesis here is driven by factors other than differences in the country.
To provide a ceteris paribus analysis, we need to use a regression model to identify and estimate the country effect (if any) while controlling for other factors. Other factors are defined by the data we are given. We will use the information on the company characteristics to control for the factors that may drive profitability and estimate the effect of the country of jurisdiction. The same goes for the Covid19 effect. We can estimate this effect using dummy variable(s) while controlling for company characteristics.
DATA Considerations
(1) Profitability Variables
For the dependent variable in your regression model, choose the variables you have already analysed in M2.
i. Operating Revenue ($’000)
ii. EBITDA: earnings before interest, taxes, depreciation, and amortisation
https://www.investopedia.com/terms/e/ebitda.asp
Team PwC: Those who want to target PwC (whom I call TeamPwC hereinafter ) must consider analysing multiple profit variables. In the Q&A session, PwC mentioned the following profits variables as targets: EBIT/Revenue, EBIT/Total Cost, EBIT/Operating Costs, Gross Profit/Revenue, Gross Profit/Operating Expenses, Gross Profit/COGS, EBIT/Total Assets. You don’t need to have all of them, but my understanding is if you consider 2-3 variables, that should align with their expectations.
(2) Countries for analysis
You need to use the TWO countries allocated to you in M1 and M2 to analyse the country effect.
Team PwC: To make your analysis more complete, you should consider at least: Australia, UK, and China.
Note We use the insights from M2 to motivate using dummy variables to capture the country effect. But none of the computations done in M2 will be used in the regression model.
(3) Company Classification
Use the THREE industry selection allocated to you in M2.
Team PwC: PwC seems to like the idea of using some classification of the companies in each country. The wording they have used is as follows: ‘ ..part of the exercise where looking across a few regions; the students find industries with a meaningful number of companies in each jurisdiction (i.e., pharmaceutical goods distribution is common while tobacco distribution is very uncommon) SIC codes can be grouped for ease of analysis - students should go by the broad category, i.e., automobile vehicles industry and automobile parts industry can be grouped.’ Other classification like company size is encouraged if you find more meaningful.
Statistical Analysis
Let Yi be (your selected) profit variable for the wholesale company i. The aim is to build and estimate a linear regression model in the form.
Yi = β0 + β1X1i +. . . +βkXk + i
Task (A): Within-Country Analysis (choice of per industry or pooled industries)
(1) We investigate what company characteristics determine factors for the selected profits variable Yi in for example Australia. First, consider the year 2019 and estimate a regression model for Y.
How do you select what X variables go into the regression? One simple way is to start by putting all the variables in the Australian dataset in the regression. Then perform a process of eliminating those that are not statistically significant.
Model_Australia2019: Consider all the 2019 company variables and the past years' variables as regressors, including the Y variable's past values.
(i) Estimate the model and use the estimation results to remove all the variables that are not statistically significant.
(ii) Repeat the regression with these significant variables and check if all are statistically significant.
(iii) Repeat this process until you get a model where all regressors are significant.
(2) Model_Australia2020: Now, you can use your selected set of regressors from the 2019 model as your regressors
for the 2020 model. Estimate this model and check if your regressors are all significant.
(3) Pooled Model for Covid19 Effect for Australian Wholesale companies:
To estimate the Covid19 effect in the case of Australia, we have to pool both 2019 and 2020 observations and estimate one full model.
In the Excel spreadsheet for Australia:
(i) stack the column for 2019 on top of the 2020 columns for both the Y and X variables.
(ii) create a dummy variable covid19i that takes the value ‘0’ for each ‘i’ observation from the year 2019 and
takes the value ‘1’ for each observation ‘i’ from the year 2020.
(iii) Regress the stacked Yi on the stacked Xi and the covid19i
Table 1: Example of regression table
You can use a table similar to the one above for summarising the results of the three regression models in (1)-(3).
(4) Analysis of the within-country results for Australia: using the results from the three estimated regression models,
what are the determinant factors for profits?
(i) Consider two of the determinant factors in 2019 and interpret the meaning of their estimated coefficients. What is the effect of a unit change of these factors on the profit variable Y?
(ii) Is there a change in the factors determining Y when you compare it to the 2020 results? Explain.
(iii) Lastly, what is the estimated effect of Covid19 as captured by the dummy variable? Interpret the coefficient on the dummy variable covid19i
Team PwC: You can analyse the within-country models by company classification. You may find that what affects one class of companies doesn’t affect another, and you may find some groups of companies are more affected by Covid19. PwC didn’t seem to require an analysis of all industries but a selection of interesting classes or industries. Those more ambitious can keep the data together and add a class dummy. If you have, say 3 classes or industries, you can use a dummy classi that takes 0 if the company ‘i’ belongs to class 1, 1 if it belongs to class 2, 3 if it belongs to class 3.
Task (B): Between-Country Analysis
Consider now the TWO countries' data. We will use the determinants (regressors) we have selected in Task (A) for Australia and use them as a starting set of regressors for the between-country models. For illustration in this document, assume that the TWO countries are Australia and the UK.
(1) Consider the year 2019, estimate a regression model of Y on the selected regressors in (A) for both the Australia and UK.
(2) Country effect: To estimate the country effect, we need to stack the data from the TWO countries in one file. We use a dummy variable to capture the regional effect or effect of the country of jurisdiction.
Pooled Model: create two dummies, one for each country: AustTaliai = 1 if company ‘i’ is in Australia and 0 otherwise; and uki = 1 if company ‘i’ is in the UK.
Caution! Beware of the dummy variable trap!
Estimate the THREE regression models in (1) and (2) above. You can use a Regression Table to present the results.
Table 2: Between-Country Regressions
Analysis: Using the estimated regressions in Table 2:
(i) Discuss differences in the significance of the determinant factors you have identified in Model (1) (Task (A)) with their effects on profits in the TWO countries.
(ii) Interpret the effect of the dummy in the pooled model. What does this effect mean when comparing the effect of the regressors between the countries?
Team PwC: For PwC THREE should be analysed here. However, PwC are interested in the trend over the years. You need to consider a few years instead of just one year. Looking at all 5 or 6 years, you can plot bar charts of the
country effect coefficients and see whether it changes over the years. Please also continue the analysis using the classification of companies you have chosen. Some insights may be found by looking at how industries are affected by different factors depending on the country and over the years. To analyse the effect of the years, you can add
multiple dummy variables of the 6 years 2017 to 2022. Caution! Beware of the dummy variable trap!
Ethics Considerations
Considering what you have learnt in Week 8, are there any ethical considerations around data collection, data analysis and use/implementation of the report’s recommendations? Refer to the PwC code of Ethics:
https://www.pwc.com/gx/en/ethics-business-conduct/pdf/pwc-code-of-conduct-april-2021-v2.pdf
Can you recommend to PwC how to make ethically informed decisions related to your report's statistical analysis and results?
Executive Report
The final report unifies the insights you have gathered in M1 and M2 and this final analysis. It showcases all the statistical techniques you have learnt in COMM5000. This report is a final take on this problem and should include the key results you wish the PwC team to get from the data. Your report will consist of the following core components. You can
restructure the section as you see fit for your report.
. Executive summary This is the punch line of this case study. This should tell PwC what you have found in clear and precise language. This shouldn’t be technical. It should refer to PwC description of the problem and their expectations.
. Introduction: The introduction should present the problem analysed in this case study. The background and the expected outcomes or target questions of the analysis. Your introduction should include some key conclusions from M1 or M2 and how these conclusions provide some basis for developing the model for wholesale company profit characteristics. This is not a copy/paste of all your reports for the first two milestones. It should report the take-home message(s) from the detailed analysis you have completed—no need to re- report the tables and graphs from those milestones.
. Section 1: Data considerations, including what countries you are analysing, any major limitations in the data like a significant % of missing points for some variables/countries. This will inform. the reader that you will restrict the analysis according to data availability.
. Section 2: Analysis of the determinants of profitability (growth rate of operating revenues) in Australia, including your analysis of the covid19 effect. Is there a Covid19 premium? Can you express it in the per cent change of growth in the profit variable?
. Section 3: Analysis of the countries' differences in profitability determinants. How important is the country's effect? Is it the same size across the countries you have analysed? Is there a premium or penalty to a company located in a specific country compared to Australia?
. Section 4: Robustness analysis and model limitations. Any statistical analysis is based on assumptions that ensure that the inference you perform, and the estimation of the model parameters is also correct. Some points
to discuss in your analysis are the key assumptions:
o Zero conditional means: does your model satisfies the exogeneity condition? This will ensure that no confounding factors will bias the estimates of your key target parameters in the model. A brief explanation of what this assumption implies for the context of the model(s) you are estimating. Whether you believe it holds and why? What can be done to ensure it is satisfied if you had more data and more time to develop the model? If it isn’t satisfied, what are the implications on the model estimation and inference?
o Multicollinearity: With the selected set of regressors in your models, check whether the regressors satisfy the assumption of no perfect collinearity. Explain what it means in your context and the implications of the regressors failing the condition on the validity of the inference.
. Concluding remarks. In this section, you should summarise the data's key take-home messages. Your view on ethics considerations and how this impacts the collection of the data and the use of your results can be discussed in conclusion.
(Team PwC: recommendations to PwC on what they should look at when evaluating wholesale company profits reports. You should articulate clearly what the data has to say about the project aims as described in the original project description. If there are aspects you are not investigating, explain why. It could be a lack of data, time, etc)
General advice:
- For each section, you should summarise the key points and state briefly the statistical evidence/argument that support those key points.
- Defer all the Excel regression outputs and any other data analysis output to the appendix. Keep the body of the report flowing with only some storytelling charts that must be included in the body. If you choose the regression tables to summarise the results of the regressions, include these in the analysis of the results in the report.
Submission instructions
- Sign the UNSW declaration form. and attach to your report.
- Team PwC Please add ‘Team PwC’ under your name. COMM5000 teaching team with PwC will select those reports with the most compelling and complete analysis. Those selected will be notified to prepare for the PwC showcase event.
- A submission link on the course Moodle site will open in Week 10, a week prior to the deadline.