CSCI-1200 Data Structures — Spring 2021
Homework 10 — Performance and Big ’O’ Notation
In this final assignment for Data Structures, you will carry out a series of tests on the fundamental data
structures in the Standard Template Library to first hypothesize and then measure the relative performance
(running time & memory usage) of these data structures and solidify your understanding of algorithm
complexity analysis using Big ‘O’ Notation. The five fundamental data structures we will study are: vector,
list, binary search tree (set / map), priority queue, and hash table (unordered_set / unordered_map).
Be sure to read the entire handout before beginning your implementation.
Overview of Operations
We will consider the following six simple, but moderately compute-intensive, operations that are common
subtasks of many interesting real-world algorithms and applications. We will test these operations using
integers and/or STL strings as noted below.
• Sort - we’ll use the default operator< for the specific data type.
• Remove duplicates from a sequence - without otherwise changing the overall order (keeping the first
occurrence of the element).
• Determine the mode – most frequently occurring element. If there is a tie, you may return any of the
most frequently occurring elements.
• Identify the closest pair of items within the dataset - integer data only. We’ll use operator- to
measure distance. If there is a tie in distance, you may return any of the pairs of closest distance.
• Output the first/smallest f items - a portion of the complete sorted output.
• Determine the longest matching substring between any two elements - STL string data only. For
example, if the input contains the words ‘antelope’, ‘buffalo’ and ‘elephant’, the longest substring
match is ‘ant’ (found within both ‘antelope’ and ‘elephant’). If there is a tie, you may return any of
the longest matching substrings.
See also the provided sample output for each of these operations.
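To make the ‘antelope’ / ‘elephant’ example concrete, here is a minimal sketch of the pairwise comparison at the heart of the longest matching substring operation. The function name longest_common_substring is hypothetical and not part of the provided framework. It is a brute-force comparison that uses no extra container, roughly O(l^3) per pair of strings; it is not claimed to be the intended or the fastest solution.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the provided code): returns one longest
// substring that occurs in both a and b, or "" if they share no character.
std::string longest_common_substring(const std::string& a, const std::string& b) {
  std::size_t best_start = 0;  // start of the best match inside a
  std::size_t best_len = 0;    // length of the best match so far
  for (std::size_t i = 0; i < a.size(); ++i) {
    for (std::size_t j = 0; j < b.size(); ++j) {
      // Extend the match that starts at a[i] and b[j] as far as possible.
      std::size_t len = 0;
      while (i + len < a.size() && j + len < b.size() && a[i + len] == b[j + len]) {
        ++len;
      }
      if (len > best_len) {  // strict '>' keeps the first match found on ties
        best_len = len;
        best_start = i;
      }
    }
  }
  return a.substr(best_start, best_len);
}
```

Calling longest_common_substring("antelope", "elephant") yields "ant". The full operation would apply a comparison like this to pairs of elements in the dataset.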
“Rules” for comparison: For each operation, we will analyze the cost of a program/function that reads
the input from an STL input stream object (e.g., std::cin or std::ifstream) and writes the answer to an
STL output stream (e.g., std::cout or std::ofstream). The function should read through the input only
once and construct and use a single instance of the specified STL data structure to compute the output. The
function may not use any other data structure to help with the computation (e.g., storing data in a C-style
array).
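As an illustration of these rules, the sketch below reads the input once from a stream, stores it in a single std::vector, and writes the sorted result to an output stream. The function name sort_with_vector is an assumption made for this example; the provided framework may organize its functions differently.

```cpp
#include <algorithm>
#include <iostream>
#include <sstream>  // only needed for the in-memory usage described below
#include <string>
#include <vector>

// Hypothetical example of a rule-abiding function: one pass over the input,
// one instance of the chosen STL container, answer written to a stream.
template <class T>
void sort_with_vector(std::istream& in, std::ostream& out) {
  std::vector<T> data;
  T value;
  while (in >> value) {  // single pass over the input
    data.push_back(value);
  }
  std::sort(data.begin(), data.end());  // uses the default operator<
  for (const T& v : data) {
    out << v << '\n';
  }
}
```

Because the function takes std::istream and std::ostream references, it works unchanged with std::cin / std::cout, with file streams, or with std::istringstream / std::ostringstream for quick in-memory checks.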
Your Initial Predictions of Complexity Analysis
Before doing any implementation or testing, think about which data structures are better, similarly good, or
useless for tackling each operation.
Fill in the table on the next page with the big ’O’ notation for both the runtime and memory usage to
complete each operation using that data structure. If it is not feasible/sensible to use a particular data
structure to complete the specified operation put an X in the box. Hint: In the first 3 columns there should
only be 2 X’s! If two or more data structures have the same big ‘O’ notation for one operation, predict and
rank the data structures from fastest to slowest running time for large data. We combine set & map (and
unordered_set & unordered_map) in this analysis, but be sure to specify which of the two datatypes makes
the most sense for each operation.
For your answers, n is the number of elements in the input, f is the requested number of values in the output
(only relevant for the ‘first sorted’ operation), and l is the maximum length of each string (only use this
variable for the ‘longest substring match’ operation). Type your answers into your README.txt file.
You’ll also paste these answers into Submitty for autograding.
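                                           |      | remove     |      | closest | first f | longest substring
                                           | sort | duplicates | mode | pair    | sorted  | match
-------------------------------------------+------+------------+------+---------+---------+------------------
vector                                     |      |            |      |         |         |
list                                       |      |            |      |         |         |
BST (set/map)                              |      |            |      |         |         |
priority queue / binary heap               |      |            |      |         |         |
hash table (unordered_set / unordered_map) |      |            |      |         |         |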
Provided Framework
We provide a framework to implement and test these operations with each data structure and measure the
runtime and overall memory usage. The input will come from a file, redirected to std::cin on the command
line. Similarly, the program will write to std::cout and we can redirect that to write to a file. Some basic
statistics will be printed to std::cerr to help with complexity analysis. Here are examples of how to compile
and run the provided code:
clang++ -g -Wall -Wextra performance*.cpp -o perf.out
./perf.out vector mode string < small_string_input.txt
./perf.out vector remove_duplicates string < small_string_input.txt > my_out.txt
diff my_out.txt small_string_output_remove_duplicates.txt
./perf.out vector closest_pair integer < small_integer_output_remove_duplicates.txt
./perf.out vector first_sorted string 3 < small_string_input.txt
./perf.out vector longest_substring string < small_string_output_remove_duplicates.txt
./perf.out vector sort string < medium_string_input.txt > vec_out.txt 2> vec_stats.txt
./perf.out list sort string < medium_string_input.txt > list_out.txt 2> list_stats.txt
diff vec_out.txt list_out.txt
The first example reads string input from small_string_input.txt, uses an STL vector to find the most
frequently occurring value (implemented by first sorting the data), and then outputs that string (the mode)
to std::cout.
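The "sort first, then find the mode" idea mentioned above can be sketched as follows. This is an illustration only; the name mode_by_sorting is hypothetical and the provided vector code may differ in its details. After sorting, equal values are adjacent, so one linear scan for the longest run finds the mode, for O(n log n) total time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: takes the data by value so the caller's order is kept.
template <class T>
T mode_by_sorting(std::vector<T> data) {
  assert(!data.empty());  // the mode of an empty input is undefined
  std::sort(data.begin(), data.end());
  T best = data[0];
  std::size_t best_count = 1;  // length of the longest run seen so far
  std::size_t run = 1;         // length of the current run of equal values
  for (std::size_t i = 1; i < data.size(); ++i) {
    run = (data[i] == data[i - 1]) ? run + 1 : 1;
    if (run > best_count) {
      best_count = run;
      best = data[i];
    }
  }
  return best;
}
```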
The second example uses an STL vector to remove the duplicate values (without otherwise changing the
order) from small_string_input.txt, storing the answer in my_out.txt, and then uses diff to compare
that file to the provided answer.
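One way to remove duplicates with only a vector, while keeping the first occurrence of each value, is sketched below. The name remove_duplicates_in_place is an assumption for this example. Each element is compared against the unique values kept so far, so the running time is O(n^2) in the worst case and no second container is needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch: compacts data so that only the first occurrence of
// each value remains, preserving the original relative order.
template <class T>
void remove_duplicates_in_place(std::vector<T>& data) {
  std::size_t kept = 0;  // data[0..kept) holds the unique values found so far
  for (std::size_t i = 0; i < data.size(); ++i) {
    auto unique_end = data.begin() + static_cast<std::ptrdiff_t>(kept);
    bool seen = std::find(data.begin(), unique_end, data[i]) != unique_end;
    if (!seen) {
      if (kept != i) {
        data[kept] = data[i];  // move the new unique value into the prefix
      }
      ++kept;
    }
  }
  data.resize(kept);  // drop the leftover tail
}
```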
The next 3 command lines show examples of how to run the closest_pair, first_sorted, and
longest_substring operations. Note that the first_sorted operation takes an additional argument, the
number of elements to output from the sorted order. Also note that the closest_pair and longest_substring
operations are more interesting when the input does not contain duplicate values.
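For the closest_pair operation on integers, one plausible vector-based approach is to sort and then compare only neighboring values, since the two closest values must be adjacent in sorted order. The sketch below uses the hypothetical name closest_pair_by_sorting and assumes the differences fit in an int, which holds for the 3-5 digit inputs described in this handout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: O(n log n) for the sort plus an O(n) scan.
std::pair<int, int> closest_pair_by_sorting(std::vector<int> data) {
  assert(data.size() >= 2);  // a pair needs at least two values
  std::sort(data.begin(), data.end());
  std::size_t best = 1;  // the pair (data[best - 1], data[best]) is closest so far
  for (std::size_t i = 2; i < data.size(); ++i) {
    if (data[i] - data[i - 1] < data[best] - data[best - 1]) {
      best = i;
    }
  }
  return {data[best - 1], data[best]};
}
```

If the input contains duplicate values, the closest distance is trivially 0, which is why this operation is more interesting on de-duplicated input.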
The final example sorts a larger input of random strings first using an STL vector, and then using an STL
list and confirms that the answers match.
Generating Random Input
We provide a small standalone program to generate input data files with random strings or integers. Here’s how
you compile and use this program to generate a file named medium_string_input.txt with 10,000 strings, each
with 5 random letters (‘a’-‘z’), and a file named medium_integer_input.txt with 10,000 integers, each
with 3-5 digits (ranging in value from 100 to 99999).
clang++ -g -Wall -Wextra generate_input.cpp -o generate_input.out
./generate_input.out string 10000 5 5 > medium_string_input.txt
./generate_input.out integer 10000 3 5 > medium_integer_input.txt
Measuring Performance
First, create and save several large randomly generated input files with different numbers of elements. Test the
vector code for each operation with each of your input files. The provided code uses the clock() function
to measure the processing time of the computation. The resolution of the timing mechanism is
system and hardware dependent and may be in seconds, milliseconds, or something else. Make sure you use
large enough inputs so that your running time for the largest test is about a second or more (to ensure the
measurement isn’t just noise). Record the results in a table like this:
Sorting random 5 letter strings using STL vector

# of strings    vector sort operation time (sec)
10000           0.031
20000           0.067
50000           0.180
100000          0.402
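The clock()-based measurement described above can be sketched as a small helper that times any callable. The name seconds_of is hypothetical; the provided framework already contains its own timing code, so this only illustrates the mechanism. Note that clock() reports processor time used by the program, not wall-clock time.

```cpp
#include <ctime>

// Hypothetical helper: runs work() once and returns the processor time
// it consumed, in seconds, as measured by std::clock().
template <class Work>
double seconds_of(Work work) {
  std::clock_t start = std::clock();
  work();
  std::clock_t end = std::clock();
  return static_cast<double>(end - start) / CLOCKS_PER_SEC;
}
```

For very short computations the result may be 0 because of the limited clock resolution, which is one reason to use inputs large enough that the largest test runs for about a second.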
As the dataset grows, does your predicted big ‘O’ notation match the raw performance numbers? We know
that the running time for sorting with the STL vector sorting algorithm is $O(n \log_2 n)$, and we can estimate
the coefficient $k$ in front of the dominant term from the collected numbers:
$$t_{\text{vector sort}}(n) = k_{\text{vector sort}} \cdot n \log_2 n$$
Thus, on the machine which produced these numbers, the coefficient $k_{\text{vector sort}} \approx 2.3 \times 10^{-7}$ sec. Of course,
these constants will be different on different operating systems, different compilers, and different hardware!
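The coefficient estimate can be reproduced by dividing each measured time by n log2 n. The helper below is a minimal sketch with the hypothetical name coefficient_n_log_n; how the per-row values are combined into a single estimate (average, median, or a fit) is left open, since the handout does not say.

```cpp
#include <cmath>

// Hypothetical helper: solves time = k * n * log2(n) for k, given one
// measurement of n elements taking the given number of seconds.
double coefficient_n_log_n(double n, double seconds) {
  return seconds / (n * std::log2(n));
}
```

Applied to the four rows of the table above, this gives values between roughly 2.3 x 10^-7 and 2.4 x 10^-7 sec, in line with the estimate quoted in the text. A roughly constant k across input sizes is evidence that the n log2 n model fits the measurements.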
These constants will allow us to compare data structures / algorithms with the same big ‘O’ notation. The
STL list sorting algorithm is also $O(n \log_2 n)$, but what is the estimate for $k_{\text{list sort}}$?
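For reference, a list-based sort might look like the sketch below (the name sort_with_list is hypothetical). The generic std::sort algorithm requires random-access iterators, which std::list does not provide, so the list's own sort() member function is used instead.

```cpp
#include <iostream>
#include <list>
#include <sstream>  // only needed for in-memory testing
#include <string>

// Hypothetical sketch: same interface as a vector-based version, but the
// data lives in a std::list and is sorted with the member function sort().
template <class T>
void sort_with_list(std::istream& in, std::ostream& out) {
  std::list<T> data;
  T value;
  while (in >> value) {
    data.push_back(value);
  }
  data.sort();  // std::sort(data.begin(), data.end()) would not compile here
  for (const T& v : data) {
    out << v << '\n';
  }
}
```

Timing a function like this on the same inputs as the vector version gives the measurements needed to estimate the list coefficient.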
Be sure to try different random string lengths because this number will impact the number of repeated/
duplicate values in the input. The ratio of the number of input strings to number of output strings is
reported to std::cerr with the operation running time. Which operations are impacted by the number of
repeated/duplicate values? What is the relative impact?
Operation Implementation using Different Data Structures
The provided code includes the implementation of each operation (except longest_substring) for the vector
datatype. Your implementation task for this assignment is to extend the program to the other data structures
in the table. You should carefully consider the most efficient way (minimize the running time) to use each
data structure to complete the operation.
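As one example of adapting an operation to a different data structure, the sketch below finds the mode with a std::map from value to count. The name mode_with_map is an assumption, and this is one possible approach rather than a claim about the most efficient one. Each insertion or lookup costs O(log u) for u distinct values, and memory grows with u rather than n.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <map>
#include <sstream>  // only needed for in-memory testing
#include <string>

// Hypothetical sketch: counts occurrences while reading, then scans the
// map once for the largest count.
template <class T>
T mode_with_map(std::istream& in) {
  std::map<T, std::size_t> counts;
  T value;
  while (in >> value) {
    ++counts[value];  // operator[] value-initializes a missing count to 0
  }
  assert(!counts.empty());  // the mode of an empty input is undefined
  auto best = counts.begin();
  for (auto it = counts.begin(); it != counts.end(); ++it) {
    if (it->second > best->second) {
      best = it;
    }
  }
  return best->first;
}
```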
Focus on the first three operations from the table first (sort, remove duplicates, and mode). Once those
are debugged and tested, and you’ve analyzed the relative performance, you can proceed to implement the
other operations.
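Among the later operations, first_sorted is a natural fit for a priority queue. The sketch below (hypothetical name first_sorted_with_heap) builds a min-heap by passing std::greater as the comparison, then pops only the f smallest values, for roughly O(n log n + f log n) time with this push-one-at-a-time construction.

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <queue>
#include <sstream>  // only needed for in-memory testing
#include <string>
#include <vector>

// Hypothetical sketch: outputs the f smallest values in increasing order.
template <class T>
void first_sorted_with_heap(std::istream& in, std::ostream& out, std::size_t f) {
  // std::greater turns the default max-heap into a min-heap.
  std::priority_queue<T, std::vector<T>, std::greater<T>> heap;
  T value;
  while (in >> value) {
    heap.push(value);
  }
  for (std::size_t i = 0; i < f && !heap.empty(); ++i) {
    out << heap.top() << '\n';
    heap.pop();
  }
}
```

If f is larger than the number of input values, the loop simply stops when the heap is empty.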
Estimate of Total Memory Usage
When you upload your code to Submitty, the autograder will measure not only the running time, but also
the total memory usage. Compare the memory used by the different data structures to perform the same
operation on the same input dataset. Does the total memory usage match your understanding of the relative
memory requirements for the internal representation of each data structure?
You can also run this tool on your local GNU/Linux machine (it may not work on other systems):
clang runstats.c -o runstats.out
./runstats.out ./perf.out vector sort string < medium_string_input.txt > my_out.txt
Results and Discussion
For each data type and each operation, run several sufficiently large tests and collect the operation time output
by the program. Organize these timing measurements in your README.txt file and estimate the coefficients
for the dominant term of your Big ‘O’ Notation. Do these measurements and the overall performance match
your predicted Big ‘O’ Notation for the data type and operation? Did you update your initial answers for
the Big ‘O’ Notation of any cell in the table?
Compare the relative coefficients for different data types that have the same Big ‘O’ Notation for a specific
operation. Do these match your intuition? Are you surprised by any of the results? Will these results impact
your data structure choices for future programming projects?
Submission
You must do this assignment on your own, as described in the “Collaboration Policy &
Academic Integrity”. If you did discuss the problem or error messages, etc. with anyone,
please list their names in your README.txt file. Important Note: Do not include any large test datasets
with your submission, because this may easily exceed the submission size limit. Instead, describe any datasets you
created, citing the original source of the data as appropriate.