DEPARTMENT OF COMPUTER SCIENCE
FITE7410 Financial Fraud Analytics
First Semester, 2024-2025
Assignment 1 – Exploratory Data Analysis (EDA)
(Due Date: 11 Oct, 2024 (Fri) 23:59)
Assessment Criteria:
· Plagiarism: Please follow the guidelines laid down by our department.
· You may discuss the assignment with your classmates, but you must submit your own individual work. Any direct copy and paste is PROHIBITED and will be considered PLAGIARISM.
· Assignments will be marked on logic, presentation and understanding of the problem, not only on accuracy.
· LATE PENALTIES: 50% of assignment marks will be deducted for late submissions. Submissions more than 2 weeks late will receive 0 marks.
Objectives of this assignment:
· Perform data cleaning and preparation.
· Explore and visualize the data to identify patterns and trends.
· Engineer new features based on domain knowledge or insights from EDA.
· Prepare a report summarizing the findings from EDA.
Instructions of this assignment:
1. (50%) Exploratory Data Analysis
a. Use the provided dataset for the mini-case study.
Download the dataset (A1_data.csv) from Moodle. It is a modified version of the IEEE-CIS Fraud Dataset.
b. Using R, conduct an exploratory analysis of the downloaded dataset.
· Identify and handle missing values, outliers, and inconsistencies, if applicable.
· Explore the distribution of features (e.g. univariate and bi-/multivariate analysis) using histograms, box plots, scatter plots, correlation plots, etc.
· Create new features that may be relevant for fraud detection.
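As an illustration of the cleaning step only, the following is a minimal sketch in base R of one possible way to summarise missing values, impute a numeric column, and flag outliers. It runs on a small synthetic data frame, not on A1_data.csv. The column names (TransactionAmt, card_type, isFraud) are assumptions borrowed from the original IEEE-CIS data and may not match the modified dataset. Median imputation and the 1.5 * IQR rule are illustrative choices, not required methods.

```r
# Synthetic stand-in for the real data; column names are assumptions.
df <- data.frame(
  TransactionAmt = c(25, 40, NA, 31, 5000, 28, 35, NA, 30, 33),
  card_type      = c("debit", "credit", "debit", NA, "credit",
                     "debit", "debit", "credit", "debit", "credit"),
  isFraud        = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 1)
)

# Share of missing values in each column.
missing_share <- colMeans(is.na(df))
print(missing_share)

# Record which amounts were missing before imputing, so the
# information carried by missingness itself is not lost.
df$amt_missing <- as.integer(is.na(df$TransactionAmt))

# Impute missing amounts with the median (one simple option among many).
amt_median <- median(df$TransactionAmt, na.rm = TRUE)
df$TransactionAmt[is.na(df$TransactionAmt)] <- amt_median

# Flag (rather than delete) outliers using the 1.5 * IQR rule.
q   <- unname(quantile(df$TransactionAmt, c(0.25, 0.75)))
iqr <- q[2] - q[1]
df$amt_outlier <- df$TransactionAmt < q[1] - 1.5 * iqr |
                  df$TransactionAmt > q[2] + 1.5 * iqr

print(df[df$amt_outlier, ])
```

Flagging outliers instead of dropping them is deliberate in this sketch: in fraud data, extreme values may be exactly the cases of interest, so any removal should be justified in the report.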
NOTE: A sample R script is provided, but you still need to complete the program. Alternatively, you may write the program yourself and use whatever libraries you like.
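The exploration and feature-engineering bullets above could be approached along the following lines. This is a hedged sketch in base R on synthetic data, independent of the sample script. The column names (TransactionDT, TransactionAmt, isFraud), the reading of TransactionDT as seconds elapsed from a reference time, and the "night" cut-off of 06:00 are all assumptions made for illustration; verify them against the actual dataset before relying on them.

```r
# Synthetic stand-in for the real data; column names are assumptions.
df <- data.frame(
  TransactionDT  = c(3600, 43200, 90000, 129600, 180000, 226800),
  TransactionAmt = c(20, 35, 900, 15, 1200, 40),
  isFraud        = c(0, 0, 1, 0, 1, 0)
)

# Univariate view: amounts are usually right-skewed, so plot log(1 + x).
hist(log1p(df$TransactionAmt),
     main = "Distribution of log(1 + TransactionAmt)",
     xlab = "log(1 + TransactionAmt)")

# Bivariate view: amount by fraud label, on a log scale.
boxplot(TransactionAmt ~ isFraud, data = df, log = "y",
        main = "Transaction amount by fraud label",
        xlab = "isFraud", ylab = "TransactionAmt (log scale)")

# Engineered features (illustrative, assuming TransactionDT is in seconds).
df$log_amt  <- log1p(df$TransactionAmt)          # tames the skew
df$hour     <- (df$TransactionDT %/% 3600) %% 24 # hour of day
df$is_night <- as.integer(df$hour < 6)           # hypothetical cut-off

# Does the new feature separate the classes? Compare fraud rates.
fraud_by_night <- tapply(df$isFraud, df$is_night, mean)
print(fraud_by_night)

# Simple numeric check of association between amount and the label.
amt_fraud_cor <- cor(df$log_amt, df$isFraud)
print(amt_fraud_cor)
```

Comparing the fraud rate across levels of a new feature, as done with fraud_by_night here, is one simple way to support the rationale for an engineered feature in the report.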
2. (50%) Write a short report on the following:
a. Describe the dataset based on the EDA result, including:
· A description of the data cleaning and preparation process.
· Visualizations of the data, with clear labels and explanations.
· A discussion of the key findings from EDA, including insights and potential hypotheses.
· A description of the engineered features and their rationale.
NOTE: The short report should consist of a main body of at most 2-3 pages, focusing on your analysis and insights. Additional figures and diagrams can be included in a separate Appendix to support your report.
3. Submission on Moodle:
a. R script.
b. A PDF version of the report.
NOTE: Report submissions will be checked for similarity using Turnitin.