BISM7217 – 2020 S1 – Assignment 1
Advanced Business Data Analytics
BISM7217
ASSIGNMENT 1
Summary
• Type: Project report
• Learning Objectives Assessed: 1, 2, 3, 4, 5
• Due Date: 20 Apr 2020 11 AM
• Deliverable: A written report submitted via TurnItIn and a RapidMiner process
• Weight: 30%
This is an individual assignment. The aim is to provide experience in the steps involved in creating,
evaluating and improving classification models, and finally in presenting and interpreting the model in
a business report. You are strongly encouraged to commence this assignment by the end of the third
week of the semester and to progress thoughtfully through the steps. Hasty decisions made early in the
design process may result in much more work later.
Feel free to discuss concepts and ideas with peers, but remember that your submission must be your own work.
Be careful not to allow anyone to copy your work.
Specification
Direct marketing is a form of advertising that allows organizations to communicate directly with
customers through a variety of media, including mobile phone calls and emails. As selecting the best set
of clients, i.e., those most likely to subscribe to a product, is a complex task (Nobibon et al., 2011),
various technologies should be employed to improve marketing by focusing on specific customers, thus
allowing companies to build longer-lasting relationships aligned with their business strategies (Rust et
al., 2010). Centralizing remote customer interactions in a contact center eases the operational
management of campaigns, and communicating with customers by telephone is one way to conduct direct
marketing activities (Moro et al., 2014). Marketing operationalized through a contact center is called
telemarketing (Kotler et al., 2009). In the banking industry, deciding on the target customers for
telemarketing is of crucial importance, given the growing pressure to increase profits and reduce costs.
Banks are now under pressure to meet higher capital requirements in various ways, including by
capturing more long-term deposits (Moro et al., 2014).
In this context, predictive modeling based on previous data to predict the result of a telemarketing
phone call to sell long-term deposits is a valuable tool to support the client selection decisions of
bank campaign managers. As an analyst at BOP, a Portuguese bank, you are going to propose a
classification model that can predict the result of a phone call to sell long-term deposits. Such a model
would help BOP managers prioritize and select the next customers to be contacted during bank
marketing campaigns. Your model will help managers, including the Director of
Market Intelligence, to estimate the probability of success. Consequently, the time and costs of such
campaigns would be reduced, and by making fewer but more effective phone calls, client stress and
intrusiveness would be diminished.
Dataset
The data are related to direct marketing campaigns of BOP bank. The marketing campaigns were based
on phone calls. Often, more than one contact with the same client was required to determine whether
the client was interested in the product. The provided dataset contains 41,188 records and 20 input
variables, ordered by date (from May 2008 to November 2010). The classification goal is to predict
whether the client will subscribe to a term deposit (yes/no), as recorded in the Subscription variable.
There are four groups of input variables and one target (label/special) variable:
A) Bank client data:
1. Age (type: numeric)
2. Job: type of job (type: categorical)
3. Marital: marital status (type: categorical)
4. Education (type: categorical)
5. Default: has credit in default? (type: categorical)
6. Housing: has a housing loan? (type: categorical)
7. Loan: has a personal loan? (type: categorical)
B) Related to the last contact of the current campaign:
1. Contact: contact communication type (type: categorical)
2. Month: last contact month of the year (type: categorical)
3. Day_of_week: last contact day of the week (type: categorical)
4. Duration: last contact duration, in seconds (type: numeric)
Important note: The duration attribute strongly affects the output target. For example, if the
duration is zero, then Subscription would most likely be “no”. Yet the duration is not known before a
call is performed, and by the end of the call Subscription is already known. Thus, you should discard
the duration attribute if you intend to have a realistic predictive model.
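To make the note above concrete, the following Python sketch drops Duration from a set of call records before modeling while keeping Subscription as the label. It is illustrative only: your submitted analysis must still be done in RapidMiner (for example, by excluding the attribute with an operator such as Select Attributes). The helper name, the attribute spellings and the two sample rows are hypothetical, not records from the dataset.

```python
# Illustrative sketch only: the assignment itself must be done in RapidMiner.
# Attribute names are assumed to match the data dictionary above.
LEAKY_ATTRIBUTES = {"Duration"}  # only known once the call has ended


def drop_leaky_attributes(records, leaky=LEAKY_ATTRIBUTES):
    """Return copies of the records without attributes unavailable before a call."""
    return [{name: value for name, value in row.items() if name not in leaky}
            for row in records]


# Hypothetical rows for illustration; Subscription is kept as the label.
calls = [
    {"Age": 41, "Contact": "cellular", "Duration": 0, "Subscription": "no"},
    {"Age": 29, "Contact": "telephone", "Duration": 312, "Subscription": "yes"},
]
model_inputs = drop_leaky_attributes(calls)
print(model_inputs[0])  # {'Age': 41, 'Contact': 'cellular', 'Subscription': 'no'}
```

The original records are left untouched, so the full data remain available for exploration while the model sees only attributes known before the call.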
C) Other attributes:
1. Campaign: number of contacts performed during this campaign and for this client (type:
numeric)
Note: This attribute includes the last contact.
2. Pdays: number of days that passed after the client was last contacted in a previous
campaign (type: numeric)
Note: 999 means the client was not previously contacted.
3. Previous: number of contacts performed before this campaign and for this client (type: numeric)
4. Poutcome: outcome of the previous marketing campaign (type: categorical)
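Because 999 in Pdays is a code rather than a real number of days, treating it as an ordinary number would suggest that never-contacted clients were contacted a very long time ago. The Python sketch below, using a hypothetical helper named split_pdays, shows one way to separate that code from genuine day counts. It only illustrates the logic; in your submission this step belongs in your RapidMiner process.

```python
# Illustrative sketch only; in the submitted process this step belongs in RapidMiner.
NOT_PREVIOUSLY_CONTACTED = 999  # code documented for the Pdays attribute


def split_pdays(pdays):
    """Split a raw Pdays value into (previously_contacted, days_since_last_contact).

    The day count is None for never-contacted clients, so the 999 code is not
    mistaken for a genuine (and very long) gap since the last contact.
    """
    if pdays == NOT_PREVIOUSLY_CONTACTED:
        return False, None
    return True, pdays


print(split_pdays(999))  # (False, None)
print(split_pdays(6))    # (True, 6)
```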
D) Social and economic context attributes:
1. Emp.var.rate: employment variation rate - quarterly indicator (type: numeric)
2. Cons.price.idx: consumer price index - monthly indicator (type: numeric)
3. Cons.conf.idx: consumer confidence index - monthly indicator (type: numeric)
4. Euribor3m: Euribor 3-month rate - daily indicator (type: numeric)
5. Nr.employed: number of employees - quarterly indicator (type: numeric)
E) Output variable (desired target):
• Subscription: indicates if the client subscribed to a term deposit (type: binary)
Deliverables
Your report should include the following parts:
• Executive summary: Present the results that are most significant for your strategy development and
recommendations, and justify them.
• Introduction or data exploration
• Model building
• Model evaluation
It is up to you to decide what proportion of your report goes to each part. You may include tables and
charts of your analysis and models. At the end of your analysis, export your RapidMiner process to
your desktop or laptop in .rmp format and upload it along with your report.
Your .rmp file will be checked for consistency with the results in your report. You do not need to
provide screenshots of your RapidMiner process, as the marker can view it in your .rmp file. Consider
the following points when designing your process:
• You need to create only one .rmp file, with as many operators and outputs as are needed.
• You should not modify the “BISM7217_2020_S1_A1_Data.xlsx” file before importing it into
RapidMiner.
• All of your analysis should be done in RapidMiner after importing “BISM7217_2020_S1_A1_Data.xlsx”,
not in Excel or any other analytical tool.
• Your process should start by loading the “BISM7217_2020_S1_A1_Data.xlsx” file from your
desktop.
Formatting and professionalism
The project report is to be written to a professional standard. This requires a formal writing style (do
not use dot points) and a professional tone. Given the report’s nature, you may choose to write it in
the first person. The report must be consistent with the University’s policies on academic integrity,
plagiarism, and consequences as noted below. The report should be typed (in Times Roman 12-point
font or larger, single-spaced) and should be 1500 words (+/- 10%) in total length. The word count
excludes the title page, tables, footnotes and references (if required). The word limit must be
observed, or the assessment will be affected as noted in the Rubric. No appendices are to be
provided.
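As a quick check of the +/- 10% band, the hypothetical helpers below compute the acceptable range for a 1500-word target, assuming both bounds are inclusive. The word count you pass in should already exclude the title page, tables, footnotes and references.

```python
def word_limit_band(target=1500, tolerance_pct=10):
    """Return the (lower, upper) word counts allowed by a +/- tolerance_pct band."""
    margin = target * tolerance_pct // 100  # 1500 * 10 // 100 = 150 words
    return target - margin, target + margin


def within_word_limit(word_count, target=1500, tolerance_pct=10):
    """True if word_count (excluding title page, tables, footnotes, references) is in the band."""
    lower, upper = word_limit_band(target, tolerance_pct)
    return lower <= word_count <= upper  # both bounds treated as inclusive (an assumption)


print(word_limit_band())  # (1350, 1650)
```

Integer arithmetic is used for the margin so that the bounds come out as whole word counts without floating-point rounding.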
Submission
To be done through Blackboard Assignment Submission and TurnItIn as indicated in Learn.UQ.
Acceptable submission formats are Microsoft Word and PDF for the report and .rmp for the
process. The files MUST be named in the format BISM7217_StudentLastName_StudentID.pdf (or
with a .docx or .doc extension). If your ID is 41724593 and your surname is Mory, the name of your files
would be BISM7217_Mory_41724593.pdf. The written assignment file should not be zipped.
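The naming rule can be expressed as a small Python function. The function name is hypothetical, and applying the same pattern to the .rmp file is an assumption based on the statement that the files must follow this format.

```python
# Extensions accepted for the report (.pdf, .docx, .doc) and the process (.rmp).
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".doc", ".rmp"}


def submission_filename(last_name, student_id, extension=".pdf"):
    """Build a file name of the form BISM7217_StudentLastName_StudentID<extension>."""
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {extension}")
    return f"BISM7217_{last_name}_{student_id}{extension}"


print(submission_filename("Mory", "41724593"))  # BISM7217_Mory_41724593.pdf
```

The student ID is handled as a string so that it is reproduced exactly as written.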
Plagiarism
It is understandable that students talk with each other regularly, and discuss problems and potential
solutions. However, it is expected that the submitted assignment is a unique document; all parts of the
assignment are to be completed solely by the individual student. In cases where an assignment is
perceived not to be unique work, a loss of marks and other implications can result. For further
information about academic integrity, plagiarism and consequences, please visit:
http://ppl.app.uq.edu.au/content/3.60.04-student-integrity-and-misconduct.
Frequently Asked Questions
Question: How can I format my report?
Answer: The most common approach is to use four parts: 1) Executive summary, 2) Introduction, 3)
Model building and 4) Model evaluation. You may wish to add other sections, such as Conclusion
(optional) or References (optional).
Question: What should I include in ES?
Answer: The Executive Summary (ES) captures the essence of your work and should be very brief. Since
your report is limited to about 1500 words, it is better not to aim for more than 200 words for the ES,
but again the choice is yours; what matters is that you provide a high-quality and persuasive report.
Question: What can I discuss in the model building section?
Answer: You can discuss the following items in this section: How did you build the various models?
Did you change any parameters, and why? Did you try to improve your models, and how? Were you able
to improve them?
Question: What should I include in the model evaluation section?
Answer: How did you evaluate your models? What metrics did you use, and why? Which model
performed better, and why do you think so? Can you rely on your results, and why?
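When choosing metrics, bear in mind that accuracy alone can be misleading if one outcome is much more common than the other. The sketch below computes accuracy, precision and recall from confusion-matrix counts, treating “yes” (subscribed) as the positive class. The function name and the counts are hypothetical and are not results from the BOP data; RapidMiner's performance operators can report such measures directly, so the code only illustrates the arithmetic.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall, treating "yes" (subscribed) as the positive class.

    tp/fp/fn/tn are confusion-matrix counts: true positives, false positives,
    false negatives and true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Share of predicted "yes" calls that really subscribed (0.0 if none predicted).
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Share of actual subscribers that the model identified (0.0 if none exist).
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical counts, not results from the BOP data.
metrics = classification_metrics(tp=30, fp=20, fn=70, tn=880)
print(metrics)
```

With these made-up counts, accuracy is 0.91 even though recall is only 0.3, i.e. the model finds just 30% of the clients who actually subscribed.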
Question: What are the expectations when describing a Decision Tree (DT)? Do we need to talk about
every branch?
Answer: The advantage of DTs is that they are very intuitive, and you can interpret them by elaborating
on their branches. So, yes, but you do not need to elaborate on all of them. You can pick the most
indicative ones and elaborate on them. You can also use model improvement techniques, such as
AdaBoost, Bagging, and Random Forests, along with decision trees and elaborate on them as well.
Question: Do I have to include all the DTs with different configurations and different model
improvement methods in the .rmp file, to show how I tried different models? Or is it OK to include only
the models that I am satisfied with and decide to use in my report?
Answer: You only need to submit the process for the models that you discuss in your report. However,
it is worth mentioning in your report the additional work you have done.
Question: How can I export the figures generated in RapidMiner to my report?
Answer: You can use the Windows Snipping Tool.
Question: Which is more important, accuracy or presentation? And how high an accuracy are we
expected to reach?
Answer: Your approach, the steps you undertake, and their justification are more important than the
final accuracy level. You need to show that you tried your best, but if the available data are not
sufficient to achieve higher accuracy, that is not your fault; it simply reflects the maximum that can be
learned from the available data.
Question: Can I upload multiple .rmp files?
Answer: We prefer only one process.
Question: When I choose Export Process from the File menu and save the process on my computer, I
cannot find it. What should I do?
Answer: Make sure to choose your desktop as the destination when exporting the .rmp file.
Administrative Requirements
Consultation sessions
To ensure that an equal and sufficient amount of time is allocated for every student who attends
consultation sessions regarding the practical aspects of BISM7217, the average consultation time
(during busy consultation times) will be limited to 5 minutes per student. The main aim of this restriction
during busy periods is to ensure student equity and minimise waiting time. However, in circumstances
where no other students are waiting, longer consultation times will be provided.
Tutors will advise you of their consultation times during tutorials; these details are also available on
the BISM7217 Blackboard site under “Contacts”.
Submission Date
11 AM 9th April 2020
For each calendar day (i.e. including Saturdays and Sundays) or part thereof after the submission
deadline, a penalty of 5% of the total possible assignment marks will be deducted until the assignment
is submitted.
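The late penalty can be worked through as simple arithmetic. The sketch below assumes that each started 24-hour period after the deadline counts as one day and that the total deduction cannot exceed the total possible marks; both are assumptions, as is the helper's name. The example uses the deadline stated above and the 20-mark total given under the marking rubric.

```python
import math
from datetime import datetime, timedelta

PENALTY_PCT_PER_DAY = 5  # % of total possible marks per calendar day or part thereof


def late_penalty(deadline, submitted, total_marks):
    """Marks deducted for late submission, counting each started 24-hour period as a day."""
    if submitted <= deadline:
        return 0.0
    days_late = math.ceil((submitted - deadline) / timedelta(days=1))
    deduction = total_marks * PENALTY_PCT_PER_DAY * days_late / 100
    return min(deduction, float(total_marks))  # assumed: cannot exceed the total


deadline = datetime(2020, 4, 9, 11, 0)  # 11 AM 9th April 2020
print(late_penalty(deadline, datetime(2020, 4, 9, 11, 30), total_marks=20))  # 1.0
print(late_penalty(deadline, datetime(2020, 4, 11, 9, 0), total_marks=20))   # 2.0
```

Submitting 30 minutes late therefore costs 1 mark out of 20 under these assumptions, and submitting just under two days late costs 2 marks.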
Deadline extensions
An extension to the assignment deadline will only be considered for legitimate reasons and with
supporting documentation (e.g. medical certificate). A request for an extension is assessed by the
Assessment, Examinations Misconducts Coordinator. You may discuss your situation with your
course coordinator, but you still need to make a formal extension request using the form identified on
the Electronic Course Profile for this course. Extensions will not be granted where the School is not
satisfied that you took reasonable measures to avoid the circumstances that contributed to you not
submitting by the due date. The following are not grounds for an extension:
• holiday arrangements (including overseas travel)
• misreading a due date
• social and leisure events
• moving house
• the pressure of work/competing deadlines
• computer issues
Please refer to the Electronic Course Profile for this course for more detail.
Marking rubric
Your report will be graded on its structure, rationale, arguments, use of academic support/sources, and
overall presentation quality. This assignment is worth 20 marks. The marking rubric on the next page
is designed to reflect a marking scheme of 100 points that is scaled back to 20 marks. Part marks are
rounded up or down to the nearest half mark.
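The conversion from rubric points to marks is a linear scaling followed by rounding to the nearest half mark. The sketch below assumes that exact ties (for example, 14.25 marks) are rounded up, since the rubric does not say how ties are handled; the function name is hypothetical.

```python
import math


def scale_to_marks(points, max_points=100, max_marks=20):
    """Scale rubric points to marks, rounded to the nearest half mark.

    Ties (e.g. exactly 14.25 marks) are rounded up here; that is an assumption.
    """
    half_marks = points * max_marks * 2 / max_points  # score expressed in half marks
    return math.floor(half_marks + 0.5) / 2


print(scale_to_marks(73))  # 14.5
```

For example, 73 points scale to 14.6 marks, which rounds to 14.5.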