Hands on Data Analytics - Group Project Guidelines
Submission deadlines and format:
· Proposal: max 10 slides (.pptx or .pdf) excluding appendix (check template). Deadline November 9, 2024
· Final presentation: max 20 slides (.pptx or .pdf) excluding appendix (check template). Deadline November 30, 2024
Submission notes:
· One team member shall submit the project for the whole group.
· Relevant material that is not presented during your oral presentation shall be included in an Appendix section of the slide deck (for example, screenshots of your KNIME workflow, data descriptions, additional analysis results, …). The appendix will be subject to evaluation.
· All submissions will be checked for plagiarism. It is better to submit weak original work than to copy good work.
Project Scope
This is a team project. The recommended team size is 4-5 people (ask your instructor for approval if you want to form a bigger or smaller team).
Your goal is to show your understanding of, and your practical implementation skills in, a data analysis project covering the following 3 main areas:
1. Import data and create basic exploratory plots (descriptive statistics and data visualization)
2. Data cleaning, transformation, merging, and plotting
3. Train a machine learning model
Evaluation Rubric
· [15%] Presentation proposal and teamwork.
· [50%] Presentation coverage and style. This includes the originality of the presentation and of the project work.
· [35%] Technical level and overall difficulty. Solutions to difficult steps in the data analysis will be evaluated favourably (for example, cleaning very dirty data).
How to ask for help
· Tutorials after week 8
o Session 1: Tue 13:00-13:50
o Session 2: Thu 13:00-13:50
Guidelines
Choose a Dataset
Available datasets:
· 15 datasets are provided in the “Group Project” section on iSpace. These datasets have been chosen by our instructors, so they should be usable and feasible for the project.
· Alternatively, you may look for one on a public dataset hub such as:
a. https://www.kaggle.com/datasets
b. https://archive-beta.ics.uci.edu/ml/datasets
Please be careful with the size and quality of the datasets. Some of them are too big or too small for our project, and some are of poor quality. If your team would like to choose its own dataset, please check with your instructor for approval, so that the usability of the dataset and the feasibility of the project can be assured.
Rule of thumb – Choosing a dataset that you are interested in and curious about may give you a better chance to succeed in the project.
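The size and quality check mentioned above can be sketched in a few lines before a dataset is proposed. The project workflow itself is built in KNIME; the Python below is only an illustration, and the inline sample data, the column names and the `summarize_quality` helper are hypothetical and not part of the course material.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file.
SAMPLE_CSV = """customer_id,age,income
1,34,52000
2,,61000
3,45,
4,29,48000
"""

def summarize_quality(csv_text):
    """Return the row count and the share of empty cells per column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {"rows": 0, "missing_share": {}}
    missing_share = {}
    for column in rows[0].keys():
        # Count cells that are empty or contain only whitespace.
        empty = sum(1 for row in rows if (row[column] or "").strip() == "")
        missing_share[column] = empty / len(rows)
    return {"rows": len(rows), "missing_share": missing_share}

report = summarize_quality(SAMPLE_CSV)
print(report["rows"], report["missing_share"])
```

A very small row count or a high share of missing values in key columns is a signal to discuss the dataset with your instructor before committing to it.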
Choose the Analysis Question
After selecting a dataset, you should decide on one or a few analysis questions for your project topic.
· Good questions are asked after you have done exploratory data analysis and basic visualizations. Better questions are asked if you have domain knowledge about the subject. For example, if the project involves home loan data, an understanding of the US real estate market, mortgage rates and default trends, as well as general consumer sentiment, will help you create interesting questions.
· These analysis questions should be meaningful. That is, the questions your team tries to answer should be beneficial to some stakeholder, whether it is yourself, a business organization, or the public.
The list below provides a few sample questions for your reference. These general questions can be further customized based on your own project and dataset. The list is not exhaustive, and you can certainly design your own analysis questions.
a. What types of something (e.g., customers, weather, market, etc.) are we dealing with?
b. What is the probability that something occurs?
c. What are the deciding factors for something to occur?
d. What could happen if some phenomenon happens? (e.g., the beer-diaper example)
…
Recommended Analysis Steps
Before starting the work, meet with all your teammates and produce a document organizing your work. For example, you could decide how to split the work among team members. This document can be included in the presentation slides.
The following steps of the analysis will be subject to evaluation:
Step 1. [Data Import] Import the data into KNIME and understand all the data fields. Identify your modelling goal (data field to predict or main technique to use) and produce a basic data summary plot.
Step 2. [Data Preparation] Clean and filter the data, and produce new data features (feature engineering) that you think are useful for your modelling goal.
Step 3. [Modelling] Apply the machine learning model (e.g. regression, classification, clustering, ...) that is most appropriate for your modelling goal.
Step 4. [Result] Create plots and tables, together with an interpretation, to summarize your results.
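The steps above can be sketched end to end on a toy example. The project itself is expected to be built in KNIME, so the Python below is only an illustration of how the steps fit together; the inline data, the column names `spend` and `sales`, and the choice of a simple linear regression are all hypothetical.

```python
import csv
import io
from statistics import mean

# Hypothetical data: advertising spend and resulting sales, with one dirty row.
RAW_CSV = """spend,sales
1,3
2,5
3,
4,9
5,11
"""

# Step 1: import the data and produce a basic summary.
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
print("rows imported:", len(rows))

# Step 2: clean (drop rows with a missing value) and convert text to numbers.
clean = [(float(r["spend"]), float(r["sales"]))
         for r in rows if r["spend"] and r["sales"]]

# Step 3: fit a simple linear regression by ordinary least squares.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
x_bar, y_bar = mean(xs), mean(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in clean)
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Step 4: summarize the result as a fitted equation and a small table.
print(f"sales = {intercept:.2f} + {slope:.2f} * spend")
for x, y in clean:
    print(x, y, round(intercept + slope * x, 2))
```

In a real project each step would be considerably richer (plots in Step 1, feature engineering in Step 2, model evaluation in Step 3), but the order of the work stays the same.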
Project Presentation
· Your group presentation shall last at most 10 minutes, plus 5 minutes for Q&A. Your instructor may interrupt you to ask questions (especially if you are reading from a script).
· We recommend that you show the most interesting points in your presentation instead of presenting the whole data analysis process. Suggested contents:
- Business problem and data description (Introduction)
- Data preparation you applied
- Modelling techniques you applied
- Result / Conclusion from your analysis
· Use nice plots and images during your presentation.
· All team members must participate in the presentation. Team members who fail to present will be heavily penalized and could receive a score of 0.
· Do not read a prewritten script during the presentation. However, you can use notes with keywords (flashcards). If your English is not good, try to speak a few short sentences that convey the message. You can prepare those points in advance.