School of Computing and Information Systems
The University of Melbourne
COMP30027, Machine Learning, 2021
Project 2: How long will it take to cook this?
Task: Build a classifier to predict the cooking time of recipes
Due: Stage I: Wednesday 19 May, 5pm UTC+10 (Australian Eastern Standard Time)
Stage II: Wednesday 26 May, 5pm UTC+10 (Australian Eastern Standard Time)
Stage I: Friday 21 May, 5pm UTC+10 (Australian Eastern Standard Time)
Stage II: Friday 28 May, 5pm UTC+10 (Australian Eastern Standard Time)
Submission: Stage I: Report (PDF) and code to Canvas; test outputs to Kaggle in-class competition
Stage II: Peer reviews and reflection to Canvas
Marks: The Project will be marked out of 20, and will contribute 20% of your total mark.
Groups: Groups of 1 or 2, with commensurate expectations for each (see Sections 2 and 6).
1 Overview
The goal of this Project is to build and critically analyse supervised Machine Learning methods, to predict the
cooking time for recipes based on their steps, ingredients and other features. The cooking time of a recipe has
been categorised into three classes, corresponding to quick, medium and slow.
This assignment aims to reinforce the largely theoretical lecture concepts surrounding data representation, classifier
construction, and evaluation, by applying them to an open-ended problem. You will also have an opportunity
to practice your general problem-solving skills, written communication skills, and creativity.
This project has two stages. The main focus of these stages will be the written report, where you will demonstrate
the knowledge that you have gained and the critical analysis you have conducted in a manner that is
accessible to a reasonably informed reader.
2 Deliverables
More details about deliverables are given in the Submission (Section 6).
Stage I:
1. Report: an anonymous written report, of 1000-1500 words (for a group of one person) or 2000-2500
words (for a group of two people)
2. Output: the output of your classifiers, comprising predictions of labels for the test instances, submitted
to the Kaggle [1] in-class competition described below.
3. Code: one or more programs, written in Python, which implement machine learning models, make
predictions, and evaluate the results.
Stage II:
1. Peer review: reviews of two reports written by other students, of 200-300 words each (for a group of one
person) or 300-400 words each (for a group of two people).
2. Reflection: a written reflection piece of 400-600 words. This deliverable is individual work.
[1] https://www.kaggle.com/
3 Terms of Use
The data has been collected from Food.com (formerly GeniusKitchen), under the provision that any resulting
work should cite this resource:
Generating Personalized Recipes from Historical User Preferences. Bodhisattwa Prasad Majumder,
Shuyang Li, Jianmo Ni, Julian McAuley. In Proceedings of the 2019 Conference on Empirical
Methods in Natural Language Processing and the 9th International Joint Conference on Natural
Language Processing (EMNLP-IJCNLP), 2019.
This reference must be cited in the bibliography. We reserve the right to mark any submission lacking this
reference with a 0, due to violation of the Terms of Use.
Please note that the dataset is a sample of actual data posted to the World Wide Web. As such, it may contain
information that is in poor taste, or that could be considered offensive. We would ask you, as much as possible,
to look beyond this to the task at hand. For example, it is generally not necessary to read individual records.
The opinions expressed within the data are those of the anonymised authors, and in no way express the official
views of the University of Melbourne or any of its employees; using the data in an educative capacity does not
constitute endorsement of the content contained therein.
If you object to these terms, please contact us (kris.ehinger@unimelb.edu.au or
ling.luo@unimelb.edu.au) as soon as possible.
4 Data
The data files are available via Canvas, and are described in a corresponding README.
The recipes are collected from Food.com [2], which is a platform that allows users to publish recipes and
comment on others' recipes. In our dataset, each recipe contains:
• recipe features: name, ingredients, steps, number of steps, and number of
ingredients
• text features: produced by various text encoding methods for name, ingredients, and steps. Each
feature is provided as a single file with rows corresponding to the file of recipe features.
• class label: duration, the preparation time of a recipe (3 possible levels: 1, 2 or 3)
You will be provided with a training set and a test set. The training set contains the recipe features, text features,
and the duration, which is the "class label" of our task. The test set contains only the recipe and text features,
without the label.
The files provided are:
• recipe_train.csv: recipe features and class label of training instances.
• recipe_test.csv: recipe features of test instances.
• recipe_text_features_*.zip: preprocessed text features for training and test sets, 1 zipped file for each text
encoding method. Details about using these text features are provided in README.
5 Task
You are expected to develop Machine Learning models to predict the preparation time of a recipe based on its features
(e.g. name, ingredients, steps, etc.). You will implement and compare different machine learning models and
explore which features are effective for this task.
[2] https://www.food.com/
• The training-evaluation phase: holdout or cross-validation approaches can be applied to the training
data provided.
• The test phase: the trained classifiers will be evaluated on the unlabelled test data. The predicted labels
of test cases should be submitted as part of the Stage I deliverable.
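The two evaluation strategies named above can be sketched as follows. This is a minimal illustration only: it uses synthetic data from sklearn's make_classification as a stand-in for the real recipe features, and the choice of a decision tree, the 80/20 split and the 5 folds are assumptions, not requirements of the project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the labelled training data (3 classes, like the duration label).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
y = y + 1  # relabel the classes as 1, 2, 3

# Holdout: set aside 20% of the labelled data for evaluation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
clf = DecisionTreeClassifier(max_depth=5, random_state=0)
clf.fit(X_train, y_train)
holdout_acc = clf.score(X_val, y_val)

# 5-fold cross-validation: every labelled instance is used for evaluation exactly once.
cv_scores = cross_val_score(
    DecisionTreeClassifier(max_depth=5, random_state=0), X, y, cv=5)

print(f"holdout accuracy: {holdout_acc:.3f}")
print(f"cross-validation accuracy: {cv_scores.mean():.3f} (+/- {cv_scores.std():.3f})")
```

Cross-validation costs more training time than a single holdout split, but it gives an estimate that depends less on which instances happen to land in the evaluation set.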
Various machine learning techniques have been (or will be) discussed in this subject (0R, Naive Bayes, Decision
Trees, kNN, SVM, neural network, etc.); many more exist. You may use any machine learning method you
consider suitable for this problem. You are strongly encouraged to make use of machine learning software
and/or existing libraries (such as sklearn) in your attempts at this project.
In addition to different learning algorithms, there are many different ways to encode text for these algorithms.
The files in recipe_text_features_*.zip are some possible representations of the name, ingredients and steps of
recipes we have provided. For example, one of the encoding methods is CountVectorizer in sklearn,
which converts text documents into “Bag of Words” – the documents are described by word occurrences while
ignoring the relative position information of the words. You can use these representations to develop your
classifiers, but you should also feel free to extract your own features from the raw recipe features, according to
your needs. Just keep in mind that any data representation you use for the text in the training set will need to be
able to generalise to the test set.
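As an illustration of the point about generalising to the test set, the sketch below fits a CountVectorizer on training text only and then reuses the learned vocabulary for test text. The example sentences are invented for illustration and are not taken from the dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Invented example texts (hypothetical, not from the provided data).
train_steps = [
    "preheat oven and bake the cake",
    "chop onions and fry",
    "bake bread in the oven",
]
test_steps = [
    "fry the onions",
    "microwave the leftovers",
]

vectorizer = CountVectorizer()
# fit_transform learns the vocabulary from the training text only.
X_train = vectorizer.fit_transform(train_steps)
# transform reuses that vocabulary, so train and test share the same columns.
X_test = vectorizer.transform(test_steps)

print(sorted(vectorizer.vocabulary_))
print(X_train.shape, X_test.shape)
# Words unseen in training (e.g. "microwave") have no column and are ignored.
print(X_test.toarray())
```

Fitting the vectorizer again on the test text would produce a different set of columns, and a classifier trained on the first representation could not be applied to the second.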
6 Submission
The report, code, peer reviews and reflections should be submitted via Canvas; the predictions on test data
should be submitted to Kaggle.
Stage I submissions will be open one week before the due date. Stage II submissions will be open as soon as
the reports are available (24 hours following the Stage I submission deadline).
6.1 Individual vs. Team Participation
You have the option of participating as a “group” of one individual, or in a group of two. In the case that you opt
to participate individually, you will be required to implement at least 1 and up to 4 distinct Machine Learning
models. Groups of two will be required to implement at least 3 and up to 5 distinct Machine Learning models,
of which one is to be an ensemble model – stacking based on the other models. The report length requirement
also differs, as detailed below:
Group size    Distinct models required    Report length
1             1–4                         1,000–1,500 words
2             3–5                         2,000–2,500 words
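For groups of two, a stacking ensemble built on the other models might look like the sketch below. It assumes sklearn's StackingClassifier, synthetic data in place of the recipe features, and an arbitrary choice of three base models with logistic regression as the meta-learner; none of these choices is prescribed by the specification.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data standing in for the real features.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Base models: their cross-validated predictions become the meta-learner's inputs.
base_models = [
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # internal folds used to generate the meta-features
)
stack.fit(X_train, y_train)
stack_acc = stack.score(X_val, y_val)
print(f"stacking accuracy: {stack_acc:.3f}")
```

Comparing the ensemble against each base model on the same evaluation split shows whether stacking adds anything beyond its strongest component.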
If you wish to form a group of 2, only one of the members needs to register on Canvas by Wednesday 5 May,
via the survey “Assignment 2 Group Registration”. For a group of 2, only one of the members needs to
submit deliverables.
Note that once you have signed up for a given group, you will not be allowed to change groups. If you do not
register before the deadline above, we will assume that you will be completing the assignment as an individual,
even if you were in a two-person group for Assignment 1.
6.2 Stage I: Report
The report should be 1,000-1,500 words (groups of one person) or 2,000-2,500 words (groups of two people)
in length and provide a basic description of:
1. the task, and a short summary of some related work
2. what you have done, including any learners that you have used, or features that you have engineered.
This should be at a conceptual level; a detailed description of the code is not appropriate for the report.
The description should be similar to what you would see in a machine learning conference paper.
3. evaluation of your classifiers.
You should also aim to have a more detailed discussion, which:
4. Contextualises the behaviour of the method(s), in terms of the theoretical properties we have identified in
the lectures
5. Attempts some error analysis of the method(s)
And don’t forget:
6. A bibliography, which includes the paper listed in Terms of Use, and other related work
Note that we are more interested in seeing evidence that you have thought about the task and investigated the
reasons for the relative performance of different methods, rather than in the raw scores of the different methods.
This is not to say that you should ignore the relative performance of different runs over the data, but rather that
you should think beyond simple numbers to the reasons that underlie them, and connect these to the theory that
we have discussed in this class.
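One possible starting point for error analysis is a confusion matrix, which shows which classes are mistaken for which. The labels and predictions below are invented for illustration; in practice they would come from a held-out portion of the training data.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Invented true labels and predictions for the three duration classes.
y_true = [1, 1, 2, 2, 2, 3, 3, 2, 1, 2]
y_pred = [1, 2, 2, 2, 1, 2, 3, 2, 1, 2]

# Rows are true classes, columns are predicted classes, both in the order 1, 2, 3.
cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3])
print(cm)

# Per-class precision, recall and F1 highlight classes the model handles poorly.
print(classification_report(y_true, y_pred, labels=[1, 2, 3], zero_division=0))
```

Off-diagonal cells with large counts point to the class pairs worth inspecting further, for example by looking at which features the misclassified instances share.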
Reports must be submitted in the form of a single PDF file. If a report is submitted in any format other than
PDF, we reserve the right to return the report with a mark of 0.
To facilitate anonymous peer-review, your name and student ID should not appear anywhere in the
report, including the metadata (filename, etc.).
6.3 Stage I: Predictions of test data
To give you the possibility of evaluating your models on the test set, we will be setting up this project as a Kaggle
in-class competition. You can submit results on the test set there, and get immediate feedback on your model's
performance. A leaderboard will allow you to see how well you are doing compared to other
classmates participating online. The Kaggle in-class competition URL and instructions will be announced on
Canvas shortly.
You will receive marks for submitting (at least) one set of predictions for the unlabelled test dataset to the
competition and achieving reasonable accuracy, e.g. better than Zero-R. The focus of this assignment is
on the quality of your critical analysis and your report, rather than the performance of your Machine Learning
models.
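The Zero-R baseline mentioned above always predicts the most frequent training class. A minimal sketch, assuming sklearn's DummyClassifier and invented labels, is:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Invented labels: class 2 is the most frequent in the training data.
y_train = np.array([1, 1, 2, 2, 2, 2, 3, 2, 1, 2])
y_val = np.array([2, 1, 2, 3, 2])

# Zero-R ignores the features, so placeholder feature columns are enough.
X_train = np.zeros((len(y_train), 1))
X_val = np.zeros((len(y_val), 1))

zero_r = DummyClassifier(strategy="most_frequent")
zero_r.fit(X_train, y_train)

baseline_pred = zero_r.predict(X_val)   # always the majority class
baseline_acc = zero_r.score(X_val, y_val)
print(baseline_pred, baseline_acc)
```

Any trained classifier should be compared against this number on the same evaluation data; accuracy at or below it suggests the model has learned nothing useful from the features.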
6.4 Stage II: Reviews
During the reviewing process, you will read two submissions by other students. This is to help you contemplate
some other ways of approaching the project, and to ensure that students get some extra feedback. For each
report, you should aim to write 200-400 words total (200-300 words if you work alone, or 300-400 words if
you work in a group of two people), responding to three “questions”:
• Briefly summarise what the author has done
• Indicate what you think that the author has done well, and why
• Indicate what you think could have been improved, and why
Please be courteous and professional in the reviewing process. A brief guideline for reviewers published
by IEEE can be found at
https://www.ieee.org/content/dam/ieee-org/ieee/web/org/members/students/reviewer_guidelines_final.pdf.
6.5 Stage II: Reflections
You should write a comprehensive reflection piece of 400-600 words, summarising your critical reflection
on the following topics:
1. the process of completing this project
2. things that you are satisfied with and those that can be improved in your Stage I deliverables, e.g. modelling,
evaluation, analysis and discussion.
3. if you have worked in a group of two people, what are the individual contributions?
The reflection report is individual and not anonymous. Everyone must submit their own reflection on Canvas.
7 Assessment Criteria
The Project will be marked out of 20, and is worth 20% of your overall mark for the subject. The mark
breakdown will be:
Report                       14 marks
Performance of classifier     2 marks
Reviews                       2 marks
Reflection                    2 marks
TOTAL                        20 marks
The report will be marked according to the rubric, which will be announced via Canvas.
The performance of classifier component (2 marks) is for submitting (at least) one set of model predictions to the Kaggle
competition and achieving reasonable accuracy, e.g. better than Zero-R.
Since all of the documents exist on the World Wide Web, it is inconvenient but possible to “cheat” and identify
some of the class labels from the test data using non-Machine Learning methods. If there is any evidence of
this, the performance of classifier will be ignored, and you will instead receive a mark of 0 for this component.
The code will not be directly assessed, but if you do not submit it, it will be assumed that you are attempting
to circumvent the Machine Learning requirement, and you will receive a mark of 0 for the performance of
classifier.
8 Using Kaggle
The Kaggle in-class competition URL will be announced on Canvas shortly. To participate in the competition:
• Each student should create a Kaggle account (unless they have one already) using their Student ID
• You may make up to 8 submissions per day. An example submission file can be found on the Kaggle site.
• Submissions will be evaluated by Kaggle for accuracy, against just 30% of the test data, forming the
public leaderboard.
• Prior to competition close, you may select a final submission out of the ones submitted previously – by
default the submission with highest public leaderboard score is selected by Kaggle.
• After competition close, public 30% test scores will be replaced with the private leaderboard 100% test
scores.
9 Changes/Updates to the Assignment Specifications
We will use Canvas to advertise any (hopefully small-scale) changes or clarifications in the assignment
specifications. Any addendums made to the assignment specifications via Canvas will supersede information contained
in this version of the specifications.
10 Late Submissions
Late submissions will bring disruption to the reviewing process. You are strongly encouraged to submit by the
date and time specified above. If circumstances do not permit this, then the marks will be adjusted as follows:
• Each day (or part of the day) that the report is submitted after the specified due date and time for Stage I,
10% will be deducted from the marks available, up until 7 days (1 week) has passed, after which regular
submissions will no longer be accepted. A late report submission will mean that your report might not
participate in the reviewing process, and so you will probably receive less feedback.
• Any late submission of the reviews will incur a 50% penalty (i.e. 1 of the 2 marks), and will not be
accepted more than 7 days (1 week) after the Stage II deadline.
• Any late submission of the reflection will incur a 50% penalty (i.e. 1 of the 2 marks), and will not be
accepted more than 7 days (1 week) after the Stage II deadline.
11 Academic Honesty
While it is acceptable to discuss the assignment with others in general terms, excessive collaboration with
students outside of your group is considered cheating. Your submissions will be examined for originality and
will invoke the University's Academic Misconduct policy where either inappropriate levels of collaboration or
plagiarism are deemed to have taken place.