School of Computing and Information Systems
comp10002 Foundations of Algorithms
Semester 2, 2023
Assignment 2
Learning Outcomes
In this project, you will demonstrate your understanding of dynamic memory and linked data structures (Chapter
10) and extend your program design, testing, and debugging skills. You will learn about the problem of language
generation and implement a simple algorithm for generating text based on the context provided by input prompts.
Background
Figure 1: A prefix automaton.
The recent success of generative tools has spawned many new applications and debates in society. A generative tool is trained on a massive
dataset, for example, pictures or texts. Then, given a new input, referred to as a prompt, patterns in the prompt get matched to the frequent
patterns in the model learned by the tool, and the contextual extensions
of the recognized pattern get generated.
Artem and Alistair want to design a tool for generating text statements from prompts. In the first version of the tool, they decide to
learn a frequency prefix automaton from training statements. A sample trained automaton is shown in Figure 1. In the automaton, nodes
are annotated with unique identifiers and frequencies with which they
were observed during training, and arcs are annotated with statement
fragments. For instance, the root of the automaton has identifier 0 and
a frequency of 8 (f=8). Given a prompt, for example, “Hi”, the tool
should identify the context by replaying the prompt starting from the initial node of the automaton, that is, identify that the prompt leads to the node with identifier 9. Then, given the context, the tool should generate the most
likely extension of the prompt, that is, the extension that follows the most frequent subsequent nodes. Thus, the
tool should generate the statement “Hi#Sir” for the example prompt. Your task is to implement this tool.
Input Data
Your program should read input from stdin and write output to stdout. The input will list statements, instructions, and prompts, one per input line, each terminating either with “\n”, that is, one newline character, or
EOF, that is, the end-of-file constant defined in the stdio.h
header file. The following file test0.txt uses
eighteen lines to specify an example input to your program.
The input always starts with statements, each provided as a non-empty sequence of characters. For example,
lines 1–8 in the test0.txt file specify eight input statements. If the line that follows the input statements is
empty, that is, consists of a single newline character, it is the instruction to proceed to Stage 1 of the program
(see line 9 in the example input). Subsequent lines in the input specify prompts, one per line, to be processed
in Stage 1 (lines 10–13 in test0.txt). The prompts can be followed by another empty line, which denotes
the instruction to proceed to Stage 2 (line 14). The Stage 2 input starts with the instruction to compress input
statements, given as a non-negative integer (line 15), followed by prompts to be processed in Stage 2 of the
program (lines 16–18). In general, the input can contain an arbitrary number of statements and prompts.
The input will always follow the proposed format. You can make your program robust by handling inputs
that deviate from this format. However, such extensions to the program are not expected and will not be tested.
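Because the number of statements and their lengths are unbounded, one way to read the input is with a growing buffer. The following is a minimal sketch only, not part of the specification: the function name read_line and the initial buffer size are arbitrary choices, and error handling is reduced to exiting on a failed allocation.

#include <stdio.h>
#include <stdlib.h>

/* Read one input line of unbounded length.  Returns a malloc'ed,
   '\0'-terminated string without the trailing newline, or NULL if
   EOF is reached before any character is read. */
char *read_line(FILE *fp) {
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int c;
    if (buf == NULL) {
        exit(EXIT_FAILURE);
    }
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 == cap) {          /* keep room for the '\0' */
            cap *= 2;
            buf = realloc(buf, cap);
            if (buf == NULL) {
                exit(EXIT_FAILURE);
            }
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {        /* no more input left */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}

With a helper like this, an empty returned string corresponds to the empty line that signals the transition to the next stage.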
Stage 0 – Reading, Analyzing, and Printing Input Data (12/20 marks)
The first version of your program should read statements from input, construct their frequency prefix automaton, and print basic information about the automaton to the output. The first four lines from the listing below
correspond to the output your program should generate for the test0.txt input file in Stage 0.
1 ==STAGE 0============================
2 Number of statements: 8
3 Number of characters: 50
4 Number of states: 29
5 ==STAGE 1============================
6 Hey#...you
7 Hi#T...
8 Hey...#you
9 Hel...lo
10 ==STAGE 2============================
11 Number of states: 17
12 Total frequency: 26
13 -------------------------------------
14 Hi...#Sir
15 Hi#t...here
16 He...y#you
17 ==THE END============================
Line 1 of the output prints the Stage 0 header. Lines 2 and 3 print the total number of statements and the total
number of characters in all the statements read from the input, respectively. Finally, line 4 reports the number of
nodes, also called states, in the frequency prefix automaton constructed from the input statements.
Figure 2e shows the prefix automaton constructed from the eight input statements in test0.txt, whereas
Figures 2a to 2d show intermediate automata constructed from subsets of the input statements. Nodes of an
automaton are states, each with a unique identifier, while arcs encode characters. Note that state identifiers
are used for presentation only and do not impact the output your tool should generate. In an automaton, the
node without incoming arcs is the initial state, and a node without outgoing arcs is a leaf state. The characters
encountered on the arcs traversed on a walk from the initial state to a leaf state (without visiting the same state
twice) define a statement. The statements defined by an automaton are all and only statements the automaton
was constructed from. Note that the automaton in Figure 2e has 29 states reported on line 4 of the output listing.
Figure 2: Prefix automata constructed from input statements in the test0.txt file.
For example, the automaton in Figure 2a was constructed from the first input statement in test0.txt. In
this automaton, the initial state has identifier 0, and the only leaf state has identifier 8. The arcs encountered
while walking from state 0 to state 8 define the statement from line 1 of the input. States are annotated with
frequencies of traversing, that is, arriving at and departing from, them when performing all the walks that define
the statements used to construct the automaton. For instance, all the states of the automaton in Figure 2a except
for the leaf state are annotated with the frequency of one (f=1), whereas the leaf state is annotated with the
frequency of zero (f=0). Figure 2b, 2c, and 2d show automata constructed from the statements in the first two,
four, and six lines in the input, respectively. For any two statements, the resulting automaton reuses the states
and arcs that correspond to their longest common prefix (see states 0 and 1 and arc “H” in Figure 2b).
You should not make assumptions about the maximum number of statements and the number of characters
in the statements provided as input to your program. Use dynamic memory and data structures of your choice,
for example, arrays or linked lists, to process the input and construct the prefix automaton.
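As an illustration only, one possible linked representation and the insertion of a single statement are sketched below. The names state_t, arc_t, and insert_statement are not prescribed by the specification; arc labels are stored as strings so that the same structure can later hold the multi-character labels of the compressed automaton, and allocation failures are not checked for brevity.

#include <stdlib.h>
#include <string.h>

typedef struct state state_t;

typedef struct arc {
    char *label;            /* characters on the arc (one char before compression) */
    state_t *target;        /* state the arc leads to */
    struct arc *next;       /* next outgoing arc of the same state */
} arc_t;

struct state {
    unsigned freq;          /* traversal frequency recorded during construction */
    arc_t *arcs;            /* outgoing arcs kept sorted by label; NULL for a leaf */
};

/* Insert one statement into the automaton rooted at the initial state,
   reusing the arcs of the longest common prefix and incrementing the
   frequency of every state the walk departs from (leaves stay at f=0). */
void insert_statement(state_t *root, const char *stmt) {
    state_t *cur = root;
    for (size_t i = 0; stmt[i] != '\0'; i++) {
        cur->freq++;
        arc_t *a = cur->arcs;
        while (a != NULL && a->label[0] != stmt[i]) {
            a = a->next;
        }
        if (a == NULL) {                      /* no matching arc: add one */
            a = malloc(sizeof(*a));
            a->label = malloc(2);
            a->label[0] = stmt[i];
            a->label[1] = '\0';
            a->target = calloc(1, sizeof(state_t));
            arc_t **p = &cur->arcs;           /* keep the arc list sorted */
            while (*p != NULL && strcmp((*p)->label, a->label) < 0) {
                p = &(*p)->next;
            }
            a->next = *p;
            *p = a;
        }
        cur = a->target;
    }
}

Counting the statements and characters for the Stage 0 report can be done while reading the input; counting the states is then a single traversal of this structure.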
Stage 1 – Process Prompts (16/20 marks)
The output of Stage 1 of your program should start with the header (line 5 in the listing).
Extend your program from Stage 0 to process the Stage 1 prompts, given on input lines 10–13 in test0.txt. To
process a prompt, it is first replayed on the automaton, and then the continuation of the prompt is generated.
The replay of a prompt starts in the initial state and follows the arcs that correspond to the characters in the
prompt. While following the arcs, the encountered characters should be printed to stdout. If the entire prompt
was replayed, print an ellipsis (three consecutive dots) to denote the start of text generation. To generate text,
one proceeds with the walk from the state reached during the replay to a leaf state by selecting the most frequent
following states. If, at some encountered state, two or more next states have the same frequency, the one that is
reached via the ASCIIbetically greater label on the arc (the label with the first non-matching character greater in
ASCII) should be chosen. Again, the characters encountered along the arcs should be printed to stdout.
For instance, the replay of the input prompt on line 10 in test0.txt leads to state 4 in the automaton
in Figure 2e; the states and arcs visited along the replay are highlighted in green. The generation phase then
continues the walk from state 4 to state 28; these are highlighted in blue in the figure. State 26 is chosen to
continue the walk from state 4 because it is reached via character "y", which has ASCII code 121, whereas label
"P", which leads from state 4 to state 5 and has the same frequency (f=1), has the smaller ASCII code of 80. The output
that results from processing the prompt on line 10 of the input is shown on line 6 of the output listing.
If the automaton does not support a replay of the entire prompt, the output should be terminated once the first
non-supported character is encountered. The replayed characters must be followed by an ellipsis in the output,
and no generation must be performed; see the output on line 7 in the listing for the input prompt on line 11
in test0.txt. Every output triggered by an input prompt, including the replayed characters, the ellipsis, and the
generated characters, should be truncated to 37 characters; see the example output for the test1.txt input file.
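Assuming the state_t/arc_t sketch from Stage 0, the replay-and-generate procedure could look roughly as follows. The names process_prompt, putlimit, and MAXOUT are illustrative, and the single-character label matching shown here covers Stage 1 only; the compressed labels of Stage 2 additionally require matching the prompt against label prefixes.

#include <stdio.h>
#include <string.h>

#define MAXOUT 37                   /* output budget per prompt */

static int emitted;                 /* characters printed for this prompt */

/* Print s, stopping once MAXOUT characters have been emitted. */
static void putlimit(const char *s) {
    for (; *s && emitted < MAXOUT; s++, emitted++) {
        putchar(*s);
    }
}

void process_prompt(state_t *root, const char *prompt) {
    emitted = 0;
    state_t *cur = root;
    /* replay: follow the arcs matching the prompt characters */
    for (size_t i = 0; prompt[i] != '\0'; i++) {
        arc_t *a = cur->arcs;
        while (a != NULL && a->label[0] != prompt[i]) {
            a = a->next;
        }
        if (a == NULL) {            /* replay failed: ellipsis, no generation */
            putlimit("...");
            putchar('\n');
            return;
        }
        putlimit(a->label);
        cur = a->target;
    }
    putlimit("...");
    /* generation: repeatedly take the most frequent next state,
       breaking ties by the ASCIIbetically greater arc label */
    while (cur->arcs != NULL && emitted < MAXOUT) {
        arc_t *best = cur->arcs;
        for (arc_t *a = best->next; a != NULL; a = a->next) {
            if (a->target->freq > best->target->freq ||
                (a->target->freq == best->target->freq &&
                 strcmp(a->label, best->label) > 0)) {
                best = a;
            }
        }
        putlimit(best->label);
        cur = best->target;
    }
    putchar('\n');
}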
Stage 2 – Compress Automaton & Process Prompts (20/20 marks)
The output of Stage 2 should start with the header (line 10 in the listing).
Extend your program to compress the automaton obtained in Stage 1 and use the compressed automaton to
process the input prompts of Stage 2 (lines 16–18 in test0.txt). The first line of the input of Stage 2 (line 15)
specifies the number of compression steps to perform. Each next compression step should be performed on the
automaton resulting from all the previous compression steps. A single compression step of an automaton is
defined by its arc. To find the arc that defines the next compression step to perform, traverse the automaton states
starting from the initial state in the depth-first order, prioritizing states reachable via smaller (in ASCII) labels.
The arc between the currently visited state x and the next visited state y in the traversal of the states leads
to the next compression step if: (i) x has a single outgoing arc and (ii) y has one or more outgoing arcs. The
compression step is performed by first adding a new arc from x to every state reachable from y via an outgoing
arc and then deleting y and all arcs that connect to y. The label of an added arc is the concatenation of the
labels of the deleted arcs on the walk from the source to the target of this new arc in the original automaton. The
automaton in Figure 3a is the result of the first compression step in the automaton in Figure 2e defined by the arc
from state 0 to state 1. Figure 3b is the result of compressing the automaton in Figure 3a using the arc between
states 18 and 19; highlighted in red in Figure 3a. The depth-first order of the states in the automaton in Figure 3a
that starts from the initial state and prioritizes smaller labels is 0, 2, 18, 19, 20, 3, 4, 5, 6, 7, 8, 26, 27, 28, 9, 10,
14, 15, 16, 17, 11, 12, 13, 21, 22, 23, 24, 25, and the arc from state 18 to state 19 is the first arc between two
consecutive states in this order that satisfies the compression conditions. The automaton in Figure 3c is obtained
by compressing the arc between states 3 and 4 in the automaton in Figure 3b. The automaton in Figure 1 results
from applying 12 requested compression steps to the original automaton constructed in Stage 1 of the program. It
allows for one further compression step, defined by the arc between states 21 and 24, but that step was not requested.
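A sketch of a single compression step under the same assumed representation is given below. It relies on each arc list being kept in ascending label order (as in the earlier insertion sketch) so that the depth-first search prioritizes smaller labels; compress_from returns as soon as one qualifying arc has been handled, and the caller repeats the call for the requested number of steps.

#include <stdlib.h>
#include <string.h>

/* Concatenate two labels into a fresh malloc'ed string. */
static char *concat(const char *a, const char *b) {
    char *s = malloc(strlen(a) + strlen(b) + 1);
    strcpy(s, a);
    strcat(s, b);
    return s;
}

/* Perform at most one compression step in the subtree rooted at x,
   searching depth-first with smaller labels first.  Returns 1 if a
   step was performed, 0 if no arc in the subtree qualifies. */
static int compress_from(state_t *x) {
    arc_t *xy = x->arcs;
    if (xy != NULL && xy->next == NULL && xy->target->arcs != NULL) {
        /* x has a single outgoing arc, and y has outgoing arcs */
        state_t *y = xy->target;
        for (arc_t *a = y->arcs; a != NULL; a = a->next) {
            char *lab = concat(xy->label, a->label);   /* new arc x -> a->target */
            free(a->label);
            a->label = lab;
        }
        x->arcs = y->arcs;          /* y's outgoing arcs now leave x */
        free(xy->label);
        free(xy);                   /* delete the arc x -> y ...    */
        free(y);                    /* ... and the state y itself   */
        return 1;
    }
    for (arc_t *a = x->arcs; a != NULL; a = a->next) {
        if (compress_from(a->target)) {
            return 1;
        }
    }
    return 0;
}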
The prompt replay and extension generation in Stage 2 must follow the corresponding principles described
in Stage 1 but should be performed on the compressed automaton. The output should report the number of states
in the compressed automaton (line 11 in the listing), the sum of frequencies of all the states in the compressed
automaton (line 12), and all the generated statements (lines 14–16) after the delimiter line of 37 “-” characters
(line 13). Every run of your program should terminate by printing the end message (line 17 in the output listing).
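With a linked representation like the one sketched earlier, the two reported quantities are simple recursive traversals; count_states and total_frequency are illustrative names, not required ones.

/* Number of states and sum of state frequencies in the automaton rooted at s. */
unsigned count_states(state_t *s) {
    unsigned n = 1;                          /* count s itself */
    for (arc_t *a = s->arcs; a != NULL; a = a->next) {
        n += count_states(a->target);
    }
    return n;
}

unsigned total_frequency(state_t *s) {
    unsigned f = s->freq;
    for (arc_t *a = s->arcs; a != NULL; a = a->next) {
        f += total_frequency(a->target);
    }
    return f;
}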
Pay Attention!
The output your program should generate for the test0.txt input file is provided in the test0-out.txt output
file. Two further test inputs and outputs are provided. The outputs generated by your program should be exactly
the same as the sample outputs for the corresponding inputs. Use malloc and dynamic data structures of your
choice to read and store input logs. Before your program terminates, all the malloc’ed memory must be free’d.
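Since the prefix automaton, compressed or not, remains a tree, a post-order traversal releases every state, arc, and label exactly once. The sketch below assumes the illustrative state_t/arc_t types from above.

#include <stdlib.h>

/* Release every state, arc, and label of the automaton rooted at s. */
void free_automaton(state_t *s) {
    arc_t *a = s->arcs;
    while (a != NULL) {
        arc_t *next = a->next;
        free_automaton(a->target);   /* free the subtree first */
        free(a->label);
        free(a);
        a = next;
    }
    free(s);
}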
Figure 3: Compressed versions of the prefix automaton in Figure 2e; panel (c) shows the automaton after the first three compression steps.
Important...
This project is worth 20% of your final mark, and is due at 6:00pm on Friday 13 October.
Submissions that are made after the deadline will incur penalty marks at the rate of two marks per day
or part day late. Students seeking extensions for medical or other “outside my control” reasons should email
ammoffat@unimelb.edu.au as soon as possible after those circumstances arise. If you attend a GP or other
health care service as a result of illness, be sure to obtain a letter from them that describes your illness and their
recommendation for treatment. Suitable documentation should be attached to all extension requests.
You need to submit your program for assessment via Gradescope (Assignment 2). Submission is not
possible through Grok. Multiple submissions may be made; only the last submission that you make before the
deadline will be marked. If you make any late submission at all, your on-time submissions will be ignored, and
if you have not been granted an extension, the late penalty will be applied.
As you can submit your program multiple times, test your program in the test environment early. The
compilation in the Gradescope environment may differ from Grok and from your laptop environment. Note that
Gradescope may become overloaded, and may even fail, on the assignment due date when many students submit their programs,
and if that does happen, it will not be a basis for extension requests. Plan to start early and to finish early!!
A rubric explaining the marking expectations is linked from the LMS. Marks will be available on the LMS
approximately two weeks after submissions close, and feedback will be made available via Gradescope.
Academic Honesty: You may discuss your work during your workshop, and with others in the class, but what
gets typed into your program must be individual work, not copied from anyone else. So, do not give hard copy
or soft copy of your work to anyone else; do not have any “accidents” that allow others to access your work;
and do not ask others to give you their programs “just so that I can take a look and get some ideas, I won’t
copy, honest”. The best way to help your friends in this regard is to say a very firm “no” if they ask to see
your program, pointing out that your “no”, and their acceptance of that decision, are the only way to preserve
your friendship. See https://academicintegrity.unimelb.edu.au for more information. Note also that
solicitation of solutions via posts to online forums, whether or not there is payment involved, is also Academic
Misconduct. In the past students have had their enrolment terminated for such behavior.
The assignment page contains a link to a program skeleton that includes an Authorship Declaration that you
must “sign” and include at the top of your submitted program. Marks will be deducted (see the rubric) if you
do not include the declaration, or do not sign it, or do not comply with its expectations. A sophisticated
program that undertakes deep structural analysis of C code identifying regions of similarity will be run over
all submissions. Students whose programs are identified as containing significant overlaps will have
substantial mark penalties applied, or be referred to the Student Center for possible disciplinary action.
Nor should you post your code to any public location (github, codeshare.io, etc) while the assignment is
active or prior to the release of the assignment marks.
© The University of Melbourne, 2023. The project was prepared by Artem Polyvyanyy, artem.polyvyanyy@unimelb.edu.au.