CS257 Advanced Computer Architecture
Coursework Assignment
Term 2, 2023/24
Contents
1 Introduction
2 Submission
3 Introduction to ACACGS
4 Compiling and Running the Code
4.1 Visualisation Generation
5 Hardware Details
6 How will my code be tested for performance?
7 Rules
8 Where do I start?
9 Instructions for Submission
10 Support
1 Introduction
The purpose of this coursework is to give you some hands-on experience in code optimisation. By the time you read
this, you will have encountered a variety of code optimisation techniques including loop unrolling and vectorisation.
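As a minimal illustration of loop unrolling, applied to a simple array sum rather than to any kernel in the provided code, the sketch below compares a single-accumulator loop with a 4-way unrolled version that keeps four independent partial sums. The function names are hypothetical. Because floating-point addition is not associative, compilers generally will not regroup the single-accumulator loop on their own without relaxed math options; splitting the sum by hand is one way to expose independent additions that can overlap in the pipeline.

```c
#include <stddef.h>

/* Single accumulator: every addition depends on the result of the previous one. */
double sum_simple(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* 4-way unrolled reduction with independent partial sums s0..s3.
   The summation order differs from sum_simple, so for general data
   the result may differ in the last bits. */
double sum_unrolled4(const double *x, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++)   /* remainder loop when n is not a multiple of 4 */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}
```

For small integer-valued inputs such as {1, 2, ..., 7} both functions return exactly 28.0; the difference only shows up when intermediate sums are rounded.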
2 Submission
Your submission will consist of two parts:
1. Optimised Code (60%)
A piece of C code based on the initial implementation provided. This C code will be assessed with respect
to your selection and understanding of optimisations, functional correctness, i.e., producing the right answer,
and execution speed.
2. Written Report (40%)
A report (4 pages maximum, excluding references) detailing your design and implementation decisions. Your
report will be evaluated with respect to your understanding of code optimisation techniques and the optimisations you attempted. This means that your report should explain:
(a) which optimisations you did and did not use;
(b) why your chosen optimisations improve performance; and
(c) how your chosen optimisations affect floating-point correctness.
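Point (c) matters because floating-point arithmetic is not associative: any optimisation that regroups operations, such as the partial sums introduced when unrolling or vectorising a reduction, can change the computed result. A minimal sketch with hypothetical function names:

```c
/* The same three values summed with two different groupings. */
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }
```

With a = 1.0, b = 1e100 and c = -1e100, sum_left returns 0.0 (the 1.0 is lost when added to 1e100) while sum_right returns 1.0.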
Given that you may apply many different optimisations, a sensible approach is to build your solution incrementally, saving each partial solution and documenting the impact of each optimisation you make. This means that it
is in your interest to attempt as many different optimisations or combinations of optimisations as you can.
You may discuss optimisation techniques with others but you are not allowed to collaborate on solutions to this
assignment. Please remember that the University takes all forms of plagiarism seriously.
3 Introduction to ACACGS
ACACGS is a conjugate gradient proxy application for a 3D mesh. The simulation runs either for a fixed number of
timesteps or until the residual falls below a given threshold. The mesh size is passed in at runtime through
command-line arguments.
In this proxy application, a force is applied to each edge boundary of the cuboid and then propagated throughout
the mesh. As each timestep passes, the force dissipates within the mesh, until either the residual is small enough
for the simulation to stop (as there are no more calculations to perform) or a set number of timesteps has passed.
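For readers new to the method, here is a minimal textbook conjugate gradient sketch on a small dense symmetric positive-definite matrix, using the same two stopping conditions described above (an iteration limit and a residual threshold). It is not the ACACGS implementation, which works on a sparse 3D mesh, and all names are hypothetical. The operations inside the loop (dot products, vector updates and a matrix-vector product) appear to correspond to the ddot, waxpby and sparsemv kernels whose timings the program reports.

```c
#include <stddef.h>

/* Hypothetical helper: dot product of two length-n vectors. */
static double dot(const double *a, const double *b, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* Hypothetical helper: y = A * x for a dense, row-major n x n matrix. */
static void matvec(const double *A, const double *x, double *y, size_t n) {
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (size_t j = 0; j < n; j++)
            s += A[i * n + j] * x[j];
        y[i] = s;
    }
}

/* Textbook CG for symmetric positive-definite A, starting from x = 0.
   Stops after max_iter iterations or once ||r||_2 < tol (tested as r.r < tol^2).
   r, p and Ap are caller-provided scratch arrays of length n.
   Returns the number of iterations performed. */
static int cg_solve(const double *A, const double *b, double *x, size_t n,
                    int max_iter, double tol,
                    double *r, double *p, double *Ap) {
    for (size_t i = 0; i < n; i++) {
        x[i] = 0.0;
        r[i] = b[i];   /* initial residual r = b - A*0 */
        p[i] = b[i];   /* first search direction */
    }
    double rr = dot(r, r, n);
    int k = 0;
    while (k < max_iter && rr >= tol * tol) {
        matvec(A, p, Ap, n);
        double alpha = rr / dot(p, Ap, n);   /* step length along p */
        for (size_t i = 0; i < n; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rr_new = dot(r, r, n);
        double beta = rr_new / rr;           /* keeps directions A-conjugate */
        for (size_t i = 0; i < n; i++)
            p[i] = r[i] + beta * p[i];
        rr = rr_new;
        k++;
    }
    return k;
}
```

For A = [[4, 1], [1, 3]] and b = (1, 2), this converges in two iterations to x = (1/11, 7/11), as CG does in at most n steps in exact arithmetic.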
In addition to providing numeric solutions, the code can also generate visuals which depict the pressure within
the mesh throughout the simulation run. Creating the visualisations relies on two optional packages, Silo and VisIt,
which are available on the DCS systems.
Figure 1: Pressure Matrix Visualisation
4 Compiling and Running the Code
The code includes a Makefile to build the program. You can compile all of the code using the command make.
You should not modify the Makefile, but examining it may prove helpful in some situations.
While the DCS machines do include a version of gcc, it is preferable to use a more recent version. On the DCS
systems, you can make version 9 the default by using the module load gcc9 command. Once this is loaded you
can simply type make to build the code, which will create an executable named acacgs in the directory. To clean
up the directory, you can run make clean.
To run the code, you need to provide the three dimensions for the mesh as three parameters to the executable.
For example, to execute the provided code on a small 10x10x10 mesh, you would enter ./acacgs 10 10 10. On my
system, the output for the code is shown below. This information is also stored in a file named after the wallclock
date and time at which the program was first executed (for example, 2023_01_26_12_00_00.txt).
===== Final Statistics =====
Executable name: ./acacgs
Dimensions: 10 10 10
Number of iterations: 149
Final residual: 2.226719e-92
=== Time ==
Total: 1.126600e-02 seconds
ddot Kernel: 8.390000e-04 seconds
waxpby Kernel: 1.087000e-03 seconds
sparsemv Kernel: 9.123000e-03 seconds
=== FLOP ==
Total: 9.536000e+06 floating point operations
ddot Kernel: 5.960000e+05 floating point operations
waxpby Kernel: 8.940000e+05 floating point operations
sparsemv Kernel: 8.046000e+06 floating point operations
=== MFLOP/s ==
Total: 8.464406e+02 MFLOP/s
ddot Kernel: 7.103695e+02 MFLOP/s
waxpby Kernel: 8.224471e+02 MFLOP/s
sparsemv Kernel: 8.819467e+02 MFLOP/s
Difference between computed and exact = 1.110223e-15
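The MFLOP/s figures in this output follow directly from the FLOP counts and times: MFLOP/s = FLOP / seconds / 10^6. For the total, 9.536e+06 / 1.126600e-02 s gives about 846.44 MFLOP/s, and for ddot, 5.96e+05 / 8.39e-04 s gives about 710.37 MFLOP/s. A one-line helper (hypothetical name) makes the relationship explicit when comparing your own runs:

```c
/* Millions of floating-point operations per second. */
double mflops(double flop, double seconds) {
    return flop / seconds / 1.0e6;
}
```

Because the FLOP count for a given mesh size is fixed by the algorithm, a higher MFLOP/s for the same problem simply reflects a shorter runtime.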
You will find more detailed instructions to build the code in the README.md file, including flags to turn on
verbose mode, which will output details for each timestep in the simulation, and flags for enabling visualisation.
4.1 Visualisation Generation
To enable visualisation outputs, you must build your code using make SILO=1. This compiles your code so that it
produces files suitable for visualisation in VisIt. Before building, make sure you have loaded the SILO module
(module load cs257-silo). If you are working remotely and want to visualise the coursework, it will be quicker and
easier to copy the files to your local machine and run VisIt there to visualise the cuboid.
When the program is run with visualisation enabled, each timestep produces a SILO file within a directory named
after the wallclock date and time (for example, 2023_01_26_12_00_00). This directory contains a collection of
.silo files, each named outputXXXX.silo, where XXXX is the timestep the file relates to.
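If you want to locate or post-process these outputs from your own tooling, the naming scheme can be reproduced in C. This is a sketch under two assumptions not confirmed by the code: that the directory name uses the format year_month_day_hour_minute_second, as the example suggests, and that XXXX is the timestep zero-padded to four digits. Function names are hypothetical.

```c
#include <stdio.h>
#include <time.h>

/* Formats a run name such as 2023_01_26_12_00_00 from a broken-down time.
   The format string is inferred from the example name, not taken from the code. */
size_t format_run_name(const struct tm *t, char *buf, size_t len) {
    return strftime(buf, len, "%Y_%m_%d_%H_%M_%S", t);
}

/* Builds "<dir>/outputXXXX.silo", assuming XXXX is the timestep number
   zero-padded to four digits. Returns the length snprintf would write. */
int silo_path(char *buf, size_t len, const char *dir, int step) {
    return snprintf(buf, len, "%s/output%04d.silo", dir, step);
}
```

To name something after the current local time, pass localtime(&now), where now comes from time(NULL).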
Once the program has finished, the SILO files can be opened in VisIt. To do so, load the VisIt module (module load
cs257-visit) and start VisIt with the command visit. Two windows will open. The smaller, narrower one is the control
window, which is used to manage everything that will be displayed; the larger one is the display window. In the
control window, select Open and navigate to the directory containing the SILO files. You should then be able to
select them.
Now that the SILO files have been loaded, you can draw some of their variables. To do this, click Add and select
a plot type and the variable to view. A good choice is Volume with either x_nodal or p_nodal. When you have
finished adding plots, click Draw. This generates an image in the display window, which you can drag to view the
cuboid from different angles. The control window also has a play button, which steps through each timestep.
Visualisations are nice to have, but for performance purposes we turn them off as they write a significant amount
of data to disk.
Table 1: Visualisation Data File Sizes
x     y     z     Cells        Approximate Data Size
10    10    10    1,000        4MB
25    25    25    15,625       39MB
50    50    50    125,000      301MB
100   100   100   1,000,000    2.4GB
200   200   200   8,000,000    19.3GB
There is the potential to go significantly over your DCS disk quota with large meshes. I recommend that you
do not exceed 30x30x30 for producing visualisations on the DCS machines. If you are developing your solution on
your personal machine then you may wish to produce larger visualisations.
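Table 1 grows roughly linearly with the number of cells: from 50x50x50 upwards it works out at about 2.4 KB per cell (for example, 2.4GB / 1,000,000 cells). Under that assumption, a quick ballpark check before a visualisation run could look like the sketch below. It ignores the extra per-cell overhead seen on the smallest mesh (about 4 KB per cell at 10x10x10) and assumes a run length similar to the runs behind Table 1; the function name is hypothetical.

```c
/* Rough SILO output size in bytes, assuming ~2.4 KB per cell as implied
   by the larger meshes in Table 1. A ballpark figure, not a guarantee. */
double estimate_silo_bytes(long x, long y, long z) {
    const double bytes_per_cell = 2.4e3;   /* ~2.4GB / 1,000,000 cells */
    return (double)x * (double)y * (double)z * bytes_per_cell;
}
```

For the recommended 30x30x30 limit this gives about 65 MB (27,000 cells).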
5 Hardware Details
On a Linux system, you can read the processor information using the command cat /proc/cpuinfo or lscpu.
This will provide full details on the CPU in the machine, including the CPU model, number of cores, the clock
frequency and supported extensions. I strongly recommend taking a look at this on your development machine.
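The same extension information that lscpu lists under Flags can also be queried from C, which is a quick way to confirm that a machine offers the extensions (such as avx2 and fma) you plan to rely on. This sketch uses the GCC/Clang built-ins __builtin_cpu_init and __builtin_cpu_supports, which are only available on x86 targets; the function name is hypothetical. Note that it checks the CPU at runtime only; it says nothing about which instructions the provided Makefile allows the compiler to emit.

```c
/* Returns 1 if the running CPU reports both AVX2 and FMA, 0 otherwise. */
int cpu_has_avx2_fma(void) {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();   /* harmless if already initialised */
    return __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
#else
    return 0;               /* non-x86 target: these extensions do not apply */
#endif
}
```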
For the purposes of assessment, your code will be run on a DCS machine with 4 cores. The output from lscpu
can be seen below:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 158
Model name: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
Stepping: 9
CPU MHz: 3400.000
CPU max MHz: 3800.0000
CPU min MHz: 800.0000
BogoMIPS: 6816.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm
constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid
aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3
sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd
ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1
avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec
xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear
flush_l1d arch_capabilities
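The cache sizes above allow a quick working-set calculation, assuming the mesh data are stored as 8-byte doubles. L1d holds 4,096 doubles, L2 holds 32,768 and L3 holds 786,432. A single per-cell vector on a 100x100x100 mesh occupies 8,000,000 bytes, more than the 6,291,456-byte L3, so at that size a sweep over a vector will likely be served from main memory. Similarly, one 256-bit AVX2 register holds four doubles. The helpers below are hypothetical.

```c
#include <stddef.h>

/* Number of doubles that fit in a cache of the given size (KiB, as lscpu reports it). */
size_t doubles_in_cache(size_t kib) {
    return kib * 1024 / sizeof(double);
}

/* Bytes needed for one double per cell on an nx x ny x nz mesh. */
size_t vector_bytes(size_t nx, size_t ny, size_t nz) {
    return nx * ny * nz * sizeof(double);
}
```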
Machines matching this specification are available in the cs257 queue of the Batch Compute System in the
Department (referred to as kudu in the labs). You will learn how to use this system during the lab sessions, so
there will be time to get used to it.
6 How will my code be tested for performance?
Your submission will be tested on a range of input sizes to evaluate how robust your performance improvements
are. It is recommended that you try testing your solution on inputs that are not cubes to see if there are any
weaknesses in your optimisation strategies. The 7-pt stencil option will not be used for testing your code.
Your code will be executed five times for each problem size on the target hardware. The highest and lowest
runtimes will be discarded, and the mean of the three remaining values will be taken as your runtime for that
problem size.
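The scoring rule above is a trimmed mean: sort the five runtimes, discard the fastest and slowest, and average the middle three. Applying the same rule to your own measurements keeps your comparisons consistent with the assessment. A minimal sketch (hypothetical function name):

```c
#include <stddef.h>

/* Mean of five runtimes after discarding the highest and lowest.
   The input array is left unmodified. */
double trimmed_mean5(const double t[5]) {
    double v[5];
    for (size_t i = 0; i < 5; i++)
        v[i] = t[i];
    /* insertion sort: adequate for five values */
    for (size_t i = 1; i < 5; i++) {
        double key = v[i];
        size_t j = i;
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
    return (v[1] + v[2] + v[3]) / 3.0;
}
```

For runtimes {1.2, 0.9, 5.0, 1.0, 1.1} this gives 1.1: the single slow outlier (5.0) does not affect the result.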
7 Rules
Your submitted solution must:
• Compile on the DCS workstations.
Your submitted solution must not:
• Alter the Makefile or add or edit any compiler flags;
• Use instruction sets not supported by the DCS machines;
• Require additional hardware e.g., GPUs;
• Add relaxed math options to the compile line, e.g., -ffast-math. Note: Manual use of approximate math
functions is acceptable.
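One example of a manual approximation that the last rule permits is replacing repeated division by multiplication with a precomputed reciprocal, which is typically cheaper because a multiply has lower latency than a divide. This sketch uses hypothetical function names; it is not taken from the provided code.

```c
#include <stddef.h>

/* Reference version: divides every element by d. */
void scale_div(double *x, size_t n, double d) {
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] / d;
}

/* Manual approximation: one division up front, then multiplications.
   1.0 / d is itself rounded, so results are not always bit-identical to x / d. */
void scale_mul_recip(double *x, size_t n, double d) {
    const double r = 1.0 / d;
    for (size_t i = 0; i < n; i++)
        x[i] = x[i] * r;
}
```

The two agree to within about one unit in the last place, but not always exactly: for x = 3.0 and d = 10.0, the division yields the double nearest 0.3, while the reciprocal version yields 0.30000000000000004. Whether such a difference is acceptable is the kind of floating-point correctness trade-off the report asks you to discuss.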
8 Where do I start?
This can seem like a daunting project, but we can break it down into a number of steps.
1. Compile and run the code as provided. This is a quick and easy check that your environment is set up
correctly.
2. Read the code. Start in main.c and follow it through. The functions are well documented with Doxygen
comments. Don’t panic - you are not expected to understand the physics in the code.
3. Measure the runtime of the code for reference purposes.
4. Figure out where the most intensive sections of the code are.
5. Develop a small optimisation.
6. Run the code and review the impact of your changes.
7. Repeat steps 5 and 6 until you have exhausted your performance ideas.
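The program already reports total and per-kernel times, but when you want to time a region you have changed on its own (steps 3 and 6), a monotonic wall clock avoids jumps caused by system clock adjustments. A minimal sketch using POSIX clock_gettime with CLOCK_MONOTONIC; the helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Wall-clock time in seconds from a monotonic clock. */
double wall_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
```

Call it before and after the region of interest and subtract; repeating the measurement several times and applying the same trimmed-mean rule used in assessment gives more stable comparisons.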
9 Instructions for Submission
Your solution should be submitted using Tabula. Please ensure that your code works on DCS machines prior to
submission.
Submission Deadline: Wednesday 20th March 2024 @ 12 Noon
Files Required: A single file named coursework.zip which should contain all of your code at the top-level (i.e.
no subdirectories) and the report file as a PDF. All files should be submitted through Tabula.
10 Support
Support is available from the Teaching Assistants, Stephen Xu (stephen.xu@warwick.ac.uk) and James
Macer-Wright (james.macer-wright@warwick.ac.uk), or from the module organiser via email.