ARS - Coursework Guide – 24/25
Version History
1.0 29/09/24 First version.
1.1 12/11/24 Fleshed out marking criteria for task 2 report
Summary
Title: Reinforcement Learning using Gymnasium environments
Hand-in: Programs AND a written report will need to be submitted online via Moodle. Check
the module’s Moodle page for the precise deadline.
Late policy: The coursework deadlines (task 1 and task 2) are absolute. Late submissions are
subject to a 5% deduction of the overall coursework mark per day.
Informal Description
The coursework consists of two tasks as described below. Your aim is to build several reinforcement
learning agents and to design, implement and run several basic research-based experiments. You
will hand in software and a report that discusses your work on these tasks. Briefly, task 1 is about
implementing some basic RL prototypes (with noise injection and basic modularity) for your chosen
environment(s) and identification of key literature, gaps, and research questions, whereas task 2 is
about designing, developing and running experiments based on the research questions identified in
task 1.
Aims and Outcomes
• If you take the labs seriously, at the end of the semester you should be:
o comfortable with implementing and modifying reinforcement learning agents,
o capable of adapting your RL solutions to different kinds of robotic problems with
well-defined states, actions and rewards,
o comfortable with neural network approaches for mapping complex,
high-dimensional states to actions (if you choose to use neural-network-based RL
solutions),
o comfortable with setting up experiments pertaining to noise and studying and
mitigating its impact,
o comfortable with designing modular AI solutions,
o capable of scanning the literature in order to understand modern RL techniques, and
incorporating/extending these in your own solutions,
o capable of identifying gaps, and/or weaknesses/limitations in state-of-the-art
research, and using this to define research questions for guiding your research,
o capable of studying and evaluating algorithm performance objectively,
o capable of designing innovative algorithms and experiments, and reporting the
results of these in a clear and well-structured manner.
Rough Timetable
Week  Main Lab        Main activities
1     01/10/24        Getting started. Familiarization with Gymnasium.
2     08/10/24        Task 1.
3     15/10/24        Task 1.
4     22/10/24        Task 1.
5     (28|29)/10/24   Task 1. Demos for task 1 (we may need both the Mon. and Tue. slots).
6     05/11/24        Task 2.
7     12/11/24        Task 2.
8     19/11/24        Task 2.
9     26/11/24        Task 2.
10    (02|03)/12/24   Task 2. Demos for task 2 (we may need both the Mon. and Tue. slots).
Laboratory notes
• You will work individually.
• We need to start working hard from the very first day to make the most of the lab sessions.
In the first week you will learn the basics of Gymnasium, will experiment with several
environments, and will even try some small heuristics on simple control problems (e.g.
cartpole).
• Rough time estimation:
o Total hours: 20 credits ≈ 200 hours
o Subtract lectures (22 hours) and labs (20 hours) = 200 – 42 = 158
o Divide the remainder by 12 weeks = 158 / 12 ≈ 13 hours per week for everything
else, e.g.: studying, researching, reading, thinking, coding, testing, analyzing, writing.
Getting Started
Preliminary steps
• Check the following three main Gymnasium resources:
o Farama’s general documentation page for Gymnasium.
o Basic usage page in the above documentation.
o Gymnasium GitHub page – includes installation instructions.
• Install Gymnasium.
• For the purpose of the coursework it is sufficient to work with the “classic control” set of
environments, however do feel free to install and use other categories of environments (e.g.
MuJoCo and Atari), if you wish.
• Go through the Basic Usage page.
• You can install Gymnasium on your own machines, or in your local directory in UNM’s HPC, or you
can also use Google Colaboratory. Please note that in the past there were ways to render
environments properly in Colab (e.g. have a look at this tutorial) however this may change
from time to time. For an example of a Jupyter notebook for the cart pole example, refer to
the module’s Moodle page. I suggest not bothering with rendering, except for some
debugging exercises, since performance metrics are the key concern.
• As mentioned, if you want to use any of the MuJoCo environments you can. DeepMind
acquired MuJoCo and made it open source, which means there are no more licensing
issues. You are not required to use MuJoCo, but if you really want to, you are free to install
it and get the environments set up.
• To see what environments are available use:
import gymnasium as gym
print(gym.envs.registry.keys())
• To better understand some Gymnasium environments, consult this Wiki or scroll to
“environments” on the Gymnasium GitHub page, and search for your environment. For
example, for the cart pole environment have a look at this page.
Try to come up with some heuristic solutions for Cart Pole
• Try to come up with some simple heuristics to keep the pole up based on your
understanding of the environment. You can start from and modify the (failing) heuristic
example provided in the Moodle page (i.e. sol-H1-cart-pole-v0).
• Difficult? Let's see whether reinforcement learning helps.
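As one possible starting point, the sketch below pushes the cart towards the side the pole is leaning or rotating towards. It assumes the standard CartPole observation layout (cart position, cart velocity, pole angle, pole angular velocity) and the two actions 0 (push left) and 1 (push right). The names `heuristic_action` and `run_episode` and the weight `velocity_weight` are illustrative assumptions; this is not the heuristic provided on the Moodle page.

```python
def heuristic_action(observation, velocity_weight=0.5):
    """Return 1 (push right) if the pole leans/rotates right, else 0 (push left).

    Assumes the CartPole layout: [cart position, cart velocity,
    pole angle, pole angular velocity]. velocity_weight is an untuned guess.
    """
    _, _, angle, angular_velocity = observation
    lean = angle + velocity_weight * angular_velocity
    return 1 if lean > 0 else 0


def run_episode(env, policy, max_steps=500):
    """Run one episode on any env that follows the Gymnasium reset/step API."""
    observation, _info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, _info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward
```

With Gymnasium installed, a call such as `run_episode(gym.make("CartPole-v1"), heuristic_action)` should return the total reward of one episode, which gives a baseline to compare learned agents against.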
Have a look at a Q-learning solution
• Example: s1cart-pole-v0-sol1.
• Try to run the code.
• Read the code. Try to understand it as much as possible, although note, it will only fully
make sense once we have done Q-Learning in the lectures.
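For readers who want a preview before the lectures, here is a minimal sketch of tabular Q-learning: continuous observations are binned into discrete states, actions are chosen epsilon-greedily, and each step moves Q(s, a) towards r + gamma * max Q(s', a'). This is not the code in s1cart-pole-v0-sol1; the function names, the bin width and the default alpha and gamma are assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict


def discretize(observation, bin_width=0.25):
    """Map a continuous observation to a hashable tuple of bin indices."""
    return tuple(math.floor(value / bin_width) for value in observation)


def make_q_table(n_actions):
    """Q-table that lazily creates a row of zeros for each unseen state."""
    return defaultdict(lambda: [0.0] * n_actions)


def choose_action(q_table, state, n_actions, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    row = q_table[state]
    return max(range(n_actions), key=lambda a: row[a])


def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99):
    """One Q-learning step; `done` means the episode terminated at next_state."""
    best_next = 0.0 if done else max(q_table[next_state])
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]
```

A training loop would call `discretize` on each observation, `choose_action` to pick a move, and `q_update` after every environment step, typically with epsilon decayed over episodes.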
Task Description
• Requirements for Task 1:
o Title. Prototypes, literature, gaps, and research questions.
o Prototypes:
▪ Environment selection. Select two environments to work on throughout
the whole assignment. Select one environment from within the control
category (e.g. CartPole-v1) and one environment from any category
(including the control one). Please recall that different environments
may impose significant changes to your reinforcement learning
algorithm since, for example, they may involve continuous action spaces,
or other representational differences. To simplify matters you might
want to constrain yourself to environments with discrete action spaces.
▪ Core method required: reinforcement learning. If you want to use other
methods for other integrated modules, that is fine.
▪ Additional requirements: (1) noise injection at the inputs and/or
outputs, (2) some modularity (e.g. RL component and denoising
component).
▪ Aim: for each environment develop at least one viable proof of concept
based on RL.
o Literature:
▪ Steps:
• Explore the recent RL literature in relation to the topic of noise
and/or modularity.
• Select 1-3 good papers from the date range 2022-2023 and
highlight their gaps (i.e. limitations and/or open
questions/problems). Note that although these 1-3 papers will
be your “core/seed” papers, you should still study the literature
more broadly (i.e. your report should cite other papers apart
from the core papers).
• Select your gaps for further investigation. Justify your choices.
• Design at least 2 research questions based on your selected
gaps.
▪ Aim: clearly outline 1-3 selected papers, overall gaps, selected gaps, and
research questions. Note that it is crucial for the papers, gaps and
research questions to be 100% credible, i.e.: (1) the papers must be
recent and good, (2) the gaps must be genuine open problems, and (3)
the research questions must sit squarely in the gaps and must point in
useful directions.
▪ Constraint 1: Every student must have a different set of core papers
and/or a different set of gaps and/or a different set of research
questions (RQs). Once a student has defined their selected papers, gaps,
and RQs, they must email them to me, in order for me to check and
approve them. Please note that this process will operate on a “first
come first served” basis. Please also note that if two students share the
same papers, they can still be different in terms of the chosen gaps or
RQs, however, it is preferable if all elements are distinct.
▪ Constraint 2: The selected research questions must include, or focus on,
(1) noise, (2) modularity, or (3) both.
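The noise-injection and modularity requirements for the task 1 prototypes could be sketched as follows. The choices here are assumptions, not requirements of the coursework: zero-mean Gaussian noise on observations, random action replacement as output noise, and a moving-average filter as the denoising component. All names are hypothetical, and the policy can be any callable (a heuristic or a learned RL agent).

```python
import random
from collections import deque


def noisy_observation(observation, sigma, rng):
    """Input noise: add zero-mean Gaussian noise to every component."""
    return [value + rng.gauss(0.0, sigma) for value in observation]


def noisy_action(action, n_actions, flip_prob, rng):
    """Output noise: with probability flip_prob, use a random action instead."""
    if rng.random() < flip_prob:
        return rng.randrange(n_actions)
    return action


class MovingAverageDenoiser:
    """Denoising module: component-wise mean of the last `window` observations."""

    def __init__(self, window=3):
        self.history = deque(maxlen=window)

    def reset(self):
        """Forget past observations; call at the start of each episode."""
        self.history.clear()

    def __call__(self, observation):
        self.history.append(list(observation))
        n = len(self.history)
        return [sum(column) / n for column in zip(*self.history)]


class ModularAgent:
    """Chains a denoiser and a policy so either part can be swapped on its own."""

    def __init__(self, denoiser, policy):
        self.denoiser = denoiser
        self.policy = policy

    def act(self, raw_observation):
        return self.policy(self.denoiser(raw_observation))
```

Keeping the noise functions outside the agent makes it straightforward to run the same agent with and without noise, or with and without the denoiser, which is the kind of internal comparison the experiments call for.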
• Requirements for Task 2:
o Title. Research questions and experiments.
o Environment selection. You must use the same two environments you selected
for task 1.
o Core method required: reinforcement learning. As before, if you want to use
other methods for other integrated modules, that is fine.
o Goals. Keywords: novel experiments and insights. The aim of this task is for you
to design, develop, run, and analyze, experiments that address the research
questions you listed in task 1. The main tasks are: (1) design experiments
that address the research questions, (2) implement the experiments, (3) debug
and fine-tune your code, (4) run the experiments and collect results, (5) analyze
the results and assess whether they answered the research questions, (6) either
proceed back to step 1 with adjustments to the experiments/solutions, or
proceed with additional experiments (depending on time and completion
status). Document your findings.
• Requirements for all tasks (i.e. tasks 1 and 2):
o Performance. Define one or more valid performance measures, apart from the
default/compulsory one, i.e. the average number of episodes needed before
solving the problem (see below for more information).
o Evaluation. Run your experiments and report your results for both of your
chosen environments consistently.
o Four I’s. Try to maximize your work along the following dimensions: (1)
informedness (i.e. it is based on a solid understanding of the literature), (2)
innovativeness (i.e. novel), (3) inventiveness (i.e. not technically trivial), (4)
impactfulness (e.g. generates new knowledge).
o Core themes. The core themes for both tasks are: (1) reinforcement learning, (2)
noise, (3) modularity. Please note that the research questions can be exclusively
about noise, or modularity, or both, however, the models must always include
elements of noise and modularity.
• Demo. Show and explain the performance of your solutions, and the results of your
experiments.
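The design-run-analyze loop described for task 2, together with the requirement to report results consistently for both environments, might be organized as in the sketch below. It runs every (environment, noise level) condition over several random seeds and summarizes each condition by its mean and sample standard deviation. `run_trial` is a hypothetical user-supplied function, and the use of seeds and of these two statistics is an assumption rather than something the guide prescribes.

```python
import itertools
import statistics


def run_grid(run_trial, env_ids, noise_levels, seeds):
    """Run every (environment, noise level) condition once per seed.

    run_trial(env_id, noise, seed) is a user-supplied (hypothetical) function
    returning one scalar score, e.g. episodes needed to solve the task.
    """
    results = {}
    for env_id, noise in itertools.product(env_ids, noise_levels):
        results[(env_id, noise)] = [run_trial(env_id, noise, seed) for seed in seeds]
    return results


def summarize(results):
    """Return {condition: (mean, sample standard deviation)}."""
    summary = {}
    for condition, scores in results.items():
        spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
        summary[condition] = (statistics.mean(scores), spread)
    return summary
```

Using the same grid, seeds and summary for both environments keeps the reported numbers directly comparable across conditions.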
Performance Evaluation
• Since you will be injecting noise into your sensor data and/or actions, your results are
not directly comparable to solutions on external leaderboards (e.g.:
https://github.com/openai/gym/wiki/Leaderboard). Your focus will be on internal
comparisons (i.e. your own experimental conditions) and innovation.
• One key performance measure that you should recall is the number of episodes required
before solving the problem. In other words, here you are interested in the speed of
learning. Care must be taken in being explicit and consistent regarding what constitutes
having solved the problem.
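One way to make "solved" explicit is to fix a return threshold and an averaging window, and report the first episode at which the mean return over the most recent window reaches the threshold. The sketch below does that; the function name, the default window of 100 and the criterion itself are assumptions that you would state and justify in your report.

```python
def episodes_to_solve(episode_returns, threshold, window=100):
    """Return the 1-based episode count at which the task first counts as solved.

    "Solved" here means: the mean return over the most recent `window`
    episodes is at least `threshold`. Returns None if that never happens.
    """
    running_sum = 0.0
    for index, value in enumerate(episode_returns):
        running_sum += value
        if index >= window:
            # Drop the return that just fell out of the window.
            running_sum -= episode_returns[index - window]
        if index + 1 >= window and running_sum / window >= threshold:
            return index + 1
    return None
```

Applying the same threshold and window to every experimental condition keeps the learning-speed comparison consistent.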
Assessment – Overall
Component | Marks (100) | Description | Main Criteria
Task 1 - demo | 5 | Demo of work so far. | Evidence of understanding of the base code. Evidence of solid understanding of literature, gaps, questions, and innovation.
Task 1 - report | 20 | Report (1-2 pages) summarizing task 1. | Are the core papers (1-3) well explained? Are the overall gaps well identified and explained? Are the selected gaps justified properly? Are the research questions grounded in the gaps, and are they clear, concrete, and heading in the right direction?
Task 2 - demo | 5 | Demo of work so far. | Evidence of understanding of the base code. Good explanation of gaps, question, experimental design, results, analyses, and conclusions. Solid argumentation vis-à-vis the 4 I’s. Strong justifications and arguments. Clear communication.
Task 2 - paper | 50 | Mini-conference paper (4 pages) summarizing all of the work done on both tasks. | Are the structure, grammar and argumentation of the paper/report good? Are the introduction, background, methods, results and analyses clear, comprehensive and insightful? Does the paper show critical and creative thinking?
Task 2 - software | 20 | Multiple files organized with a clear structure. | Is the code complete? Is the code well-designed, clean, elegant, and well commented? Is the code complex/challenging enough?
Assessment Criteria for the Report (task 1) and Paper (task 2)
• 1st: an excellent, well-written report/paper demonstrating extensive understanding and
good insight.
• 2:1: a comprehensive, well-written report/paper demonstrating thorough understanding and
some insight.
• 2:2: a competent report/paper demonstrating good understanding of the implementation.
• 3rd: an adequate report/paper covering all specified topics at a basic level of understanding.
• F: an inadequate report/paper failing to cover the specified topics.
Report guide (task 1)
• The report for task 1 has no fixed format, as long as it is well structured and well organized.
The only constraint is that it should be 1-2 pages long. No appendices are allowed, and to be
fair to all, no material on page 3 onwards (if you exceed 2 pages) will be included in the
assessment. The font size of the main text should not be smaller than 11.
• This report will exclusively focus on: (1) a very brief summary of your prototypes, (2) brief
summaries of your selected core papers, and why they were chosen, (3) lengthier
explanations on the weaknesses/gaps of the papers, (4) an explanation and justification of
your selected gaps, and (5) an explanation and justification of your research questions, and
how they are grounded in the gaps.
Paper Guide (task 2)
You should design your final report as a conference paper. The paper should contain:
• [8 marks] Introduction (about 1 page). Brief explanation of the motivation and main
concepts, a problem statement, an extremely brief overview of the key papers and their
gaps, the research questions, and a brief summary of your main contributions. Key marking
criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation,
(5) Insightfulness, (6) Critical and creative thinking.
• [8 marks] Background (about 0.5 pages). Brief overview of the field and the key papers
closely related to your work (this will include the core 1-3 papers and other relevant papers).
The selected core papers, their gaps, and why they were selected must be
clearly explained. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3)
Comprehensiveness, (4) Argumentation, (5) Insightfulness, (6) Critical and creative thinking.
• [8 marks] Methods (about 1 page). A detailed and concise description of how you
implemented task 2 (e.g. algorithms and experimental design). Key marking criteria: (1)
Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation.
• [10 marks] Results (about 1 page). An overview of your key results encompassing
performance measures and other results leading to insights about the problem and/or your
solutions. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3)
Comprehensiveness, (4) Argumentation, (5) Insightfulness.
• [10 marks] Discussion (about 0.5 pages). Your interpretation of the results, your conclusions,
and proposed future work. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3)
Comprehensiveness, (4) Argumentation, (5) Insightfulness, (6) Critical and creative thinking.
• [6 marks] References & Appendices (not included in the word count). Key marking criteria:
(1) Consistency of references, (2) Comprehensiveness of references, (3) Structure and clarity
of appendices, (4) Insightfulness of appendices.
Note: Writing a concise report/paper is a core part of the assignment. The total number of pages for
your paper (i.e. main sections, excluding references and appendices) cannot exceed 4 pages, with a
minimum page margin of 2.5cm on each side, single line spacing, a two-column format, and a
minimum font size of 11.