297.201-2025 Semester 1 Massey University
Project 1
Deadline: Hand in by midnight March 25th 2025
Evaluation: 33% of your final course grade.
Late Submission: Refer to the course guide.
Work: This assignment is to be done individually.
Purpose: Gain experience in performing data wrangling, data visualization and introductory data analysis using Python with suitable libraries. Begin developing skills in formulating a problem from data in a given domain, asking questions of the data, and extracting insights from a real-world dataset. This addresses learning outcomes 1, 3 and 4 from the course outline.
Please note that all data manipulation must be written in Python code in the Jupyter Notebook environment. No marks will be awarded for any data wrangling that is completed in Excel.
Also, demonstrating your own skills, critical thinking and understanding is still the most important aspect of assessment, so you must keep a record of all your AI prompts and outputs and submit these as an appendix to your completed assessment. Failure to keep an accurate record of your AI use for an assessment may be a breach of the Use of Artificial Intelligence in Assessment Policy, and of the Academic Integrity Policy. Refer to Stream for the level of generative AI use that is permitted and what this means. This particular assignment is designated as permitting ‘AI Planning’.
Download this Word document to guide you in creating your AI use statement.
In addition, do not copy the work of others – there are many ways to solve the problems below, and we expect that no
two answers will produce the same code. Copying the work of others (even if object/variable names are changed) will be
considered plagiarism.
The dataset problem domain: Analysis of Professional Tennis Match Results (ATP – men, or WTA – women)
You are asked to download a curated dataset on the topic of professional tennis and use the data wrangling and visualisation skills covered in the first few weeks to answer a series of analytics questions. You do not have to be an expert in tennis to answer these questions and complete this assignment, but you will need to learn some basics. Once you enter the workforce as a data scientist, you will need to quickly learn about domains previously unknown to you in order to perform your job, so this is an exercise in practising how to do this. Some helpful information on tennis and various tournaments can be found here:
https://www.olympics.com/en/news/tennis-rules-regulations-how-to-play-basics
https://www.tennisleo.com/basic-tennis-rules/
https://thetennisbros.com/tennis-tips/what-are-the-major-tournaments-in-tennis/
Keep in mind that some questions can be interpreted in slightly different ways, so depending on your interpretation and assumptions, you might come up with slightly different answers; this is perfectly acceptable. The purpose of this assignment is not to answer all questions in the ‘right’ way, but to develop your technical and problem-solving skills. Therefore, you will not be marked down for having slightly different answers as long as you have stated your assumptions clearly and have answered the questions in a technically sound and reasonable manner.
The datasets we’re after can be found below.
Dataset source: http://tennis-data.co.uk/alldata.php
You will need to download this dataset from home since Massey’s filter restricts access to this website due to its
categorization as a gambling site.
If for some reason you’re unable to download it from home, please let us know. Alternatively, you can download similar data from the GitHub sources below, but some columns are not present in this data, so we would prefer that you use the tennis-data.co.uk source instead:
For men: https://github.com/JeffSackmann/tennis_atp (use only: atp_matches_
.csv)
For women: https://github.com/JeffSackmann/tennis_wta (use only: wta_matches_
.csv)
Task 1: Wrangling, reshaping, EDA (20 marks)
- Collect data covering 10 years (2015–2024) from the above website. Read each Excel dataset using Python and combine them into a single dataset.
- Check that all the data has been read. Check that all the data in the combined dataset is sorted by the date column.
- What other data-checking operations could you perform to make sure that the data is ready for analysis? Use various approaches, including some plotting, to sanity-check the data, and discuss your findings.
- Create 6–8 EDA visualisations of the dataset and explain each one. Be curious and creative. Ensure that the plots are clean and that each one is interpreted.
Task 2: Analysis questions and plotting (20 marks)
- Who are the top 10 players by total wins in the dataset, and how many wins do they have? Plot and discuss this.
- Who are the top 10 players by number of first-round tournament losses across all 10 years? Plot and discuss this.
- Identify the 5 biggest upsets for each year in the dataset based on ranking differentials. List the player names, their rankings, which player won and which lost, the score, the tournament name, and the difference in rankings at the time of the match; a table is fine.
- Who were the top 10 players at year-end in 2019? How have their rankings changed over the period of 2015 to
2024? Plot and discuss this.
Task 3: Advanced analysis questions (20 marks)
- Which tournaments have had, on average, the most upsets (where a lower-ranked player defeated a higher-ranked player)? List the top 10 and plot their averages.
- Determine the top 10 players by ranking at the end of 2024. Then calculate their head-to-head win-loss records against each other for all the matches they played in 2024. Present this result and discuss.
- List the top 5 players who had the longest winning streaks between 2015 and 2024. List their names, the lengths of their winning streaks and the year(s) in which they occurred.
- In tennis, a set is normally won by the first player to win 6 games with a margin of at least two games, which is why some sets finish 7-5. If the score reaches 6-6, a tiebreak is usually played to decide the set, which is then recorded as 7-6. For this question, treat a set won 7-6 as a tiebreak, as distinct from a set won 7-5. Tiebreaks are stressful, and some players perform better than others in them. Count how many tiebreaks each player in the entire dataset has played. Then calculate the percentage of tiebreaks that each player has won. List the top 10 players according to the percentage of tiebreaks won.
Task 4: Open questions and analyses (30 marks)
- Come up with 3 more questions of your own.
- As you answer the questions, try to demonstrate the use of more advanced data wrangling functionality, such as group-by operations, pivots, etc.
- Create several plots and discuss them.
A Jupyter notebook template will be provided for you. Please use it for this assignment.
Hand-in: Submit a single zipped file via the Stream assignment submission link. It should contain one notebook with all the answers embedded and an HTML version of the notebook with its output showing, in case we have issues running your code. You must also submit the AI use statement.
Use of Generative AI in This Assignment
In industry, AI and online resources are commonly used to improve efficiency and productivity. However, at university, the primary goal is to develop your understanding and ability to work through problems independently. We need you to master these skills first so that you will be able to use AI tools more effectively and efficiently later on. This means that while AI can be a helpful tool for learning, it should not replace your own thought process or problem-solving efforts, as doing so will short-circuit your learning and development.
Allowed Uses of AI for assignment 1
You may use AI along the lines of the following prompts to:
• Understand background knowledge related to professional tennis, tournament structures, and general
concepts about tennis matches.
o Example: “Explain the rules of a tennis match and how scoring works.”
• Seek feedback on your problem-solving approach without directly generating code.
o Example: "I plan to find the top 10 players by total wins using pandas. Does this approach make sense?"
• Clarify error messages or debugging hints, as long as you are the one writing the code.
o Example: “Why am I getting a KeyError in pandas when trying to merge two dataframes?”
• Find alternative ways to visualize data for inspiration, but not for direct copying.
o Example: “What are common ways to visualize win-loss records in sports data?”
Prohibited Uses of AI for assignment 1
You must NOT:
• Copy AI-generated code directly into your submission.
• Input the assignment questions directly into AI and use its responses as your own.
• Paraphrase AI-generated explanations/code and present them as original work.
• Ask AI to write step-by-step solutions to any of the assignment tasks.
Academic Integrity & AI Use Statement