首页
网站开发
桌面应用
管理软件
微信开发
App开发
嵌入式软件
工具软件
数据采集与分析
其他
首页
>
> 详细
代做program、代写Python/C++设计程序
项目预算:
开发周期:
发布时间:
要求地区:
Assessed Coursework
Course Name Database Systems (H)
Coursework Number 1/1
Deadline Time: 16h30 Date: 14/03/2025
% Contribution to final
course mark
20%
Solo or Group ✓ Solo Group ✓
Anticipated Hours Average 20 hours
Submission Instructions
Please, read the description.
Please Note: This Coursework cannot be Re-Assessed
Code of Assessment Rules for Coursework Submission
Deadlines for the submission of coursework which is to be formally assessed will be published in course
documentation, and work which is submitted later than the deadline will be subject to penalty as set out
below.
The primary grade and secondary band awarded for coursework which is submitted after the published
deadline will be calculated as follows:
(i) in respect of work submitted not more than five working days after the deadline
a. the work will be assessed in the usual way;
b. the primary grade and secondary band so determined will then be reduced by two secondary
bands for each working day (or part of a working day) the work was submitted late.
(ii) work submitted more than five working days after the deadline will be awarded Grade H.
Penalties for late submission of coursework will not be imposed if good cause is established for the late
submission. You should submit documents supporting good cause via MyCampus.
Penalty for non-adherence to Submission Instructions is: 2 bands
Data Engineering 2025
Task 1 [60 Marks]
Your team is hired by the HM Revenue & Customs (HMRC) to execute an analytics query over the last 4 years
estimating the moving average of the collected city council tax across all the UK City Councils. Specifically, you
are given access to the relation HMRC (Council, RecordedDate, Tax). The attribute Council is 128 bytes with
320 different values, each one associated with a city council regional authority. The attribute RecordedDate is
64 bytes with 48 different values, each one representing the date of the first day of each month from ‘01-01-
2020’ to ’01-12-2024’. The Tax attribute is a positive real number with size 64 bytes representing the revenue
(in GBP) of a specific Council in a specific month. An example of this relation is provided below (the tuple values
do not reflect any real figures):
HMRC
Council RecordedDate Tax
Birmingham City Council 01-01-2020 261,000,000
Birmingham City Council 01-02-2020 236,000,000
Birmingham City Council 01-03-2020 240,000,000
… … …
Birmingham City Council 01-12-2024 310,000,000
Liverpool City Council 01-01-2020 110,000,000
Liverpool City Council 01-02-2020 98,000,000
…
Liverpool City Council 01-12-2024 135,000,000
Glasgow City Council 01-01-2020 280,000,000
…. … …
Glasgow City Council 01-12-2024 199,000,000
… …
For each city council, there are 48 tax revenue values across 48 months (one per month), thus, the HMRC
relation has 320*48 = 15,360 tuples. The file accommodating the HMRC relation is not sorted by any attribute.
Your team is asked to execute a window function-based analytics query showing the moving average of the tax
per month per city. Given a city, the tax moving average of a month X takes the average of the tax values of the
months X-1, X, an X+1. For example, for Birmingham City Council, the moving average for ’01-02-2020’ is:
(£261M + £236M + £240M)/3 = £245.67M. The outcome of the moving average is stored in the attribute
MovingAVG with size 64 bytes.
The analytics query is provided below:
SELECT Council, RecordedDate, Tax,
AVG (Tax) OVER (PARTITION BY Council
ORDER BY RecordedDate ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
AS MovingAVG FROM HMRC;
Objective: Practice with processing advanced analytics SQL operators using File and Indexing
Structures
Given that this query may be deployed across diverse operational scenarios—such as high-frequency real-time
dashboards for finance departments, annual compliance reporting for councils, or ad-hoc analytics for external
auditors—your team must investigate how the underlying file and indexing structures perform under varying
workloads, access patterns, and resource constraints (e.g., memory, latency tolerance, or infrastructure costs).
Currently, three strategies are proposed for further consideration:
Strategy 1: Sorting File Structure [20 Marks]
Your team proposes to sort the HMRC file by the Council attribute using external sorting. You are asked to
devise a query processing algorithm implementing the query using the sorted HMRC file only. For the external
sorting, the degree of merging M = 2 and the available memory for sorting a ‘sub-file’ is 1,280 blocks.
Strategy 2: Hashing File Structure [20 Marks]
Your team proposes to hash the HMRC file using the Council attribute. You are asked to devise a query
processing algorithm implementing the query using the hashed HMRC file only. The available memory
dedicated for hashing is 10,000 blocks.
Strategy 3: Indexing Structure with B+ Tree [20 Marks]
Your team proposes to create a B+ Tree (secondary index) to index the HMRC file using the non-ordering, no unique Council attribute. Each internal node is of order p = 4 (4 pointers/3 key values). Each leaf node stores 3
key values along with 3 pointers and 1 sibling pointer to the ‘next’ leaf node. Each node is stored in 1 block.
You are asked to devise a query processing algorithm implementing the query using the B+ Tree. The cost for
building the B+ Tree is equal to the cost for accessing all the blocks of the HMRC file.
For each Strategy:
• Optional: Provide a simple sketch/diagram of the underlying data structure(s), which can help you
visualizing the query processing steps of your solutions.
• Optional: You can use pgAdmin4 to create the relation HMRC populating with fictitious tuples. Then,
use the EXPLAIN tool over the provided query and interpret the query processing outcome from your
DB server. You could also explore more options, like creating an index over the Council and/or
RecordedDate, and then use the EXPLAIN tool again to check whether there you obtain more efficient
query processing.
• Compulsory:
o Describe the steps of your method/algorithm. You must explain the steps you follow to access
the blocks and how you calculate the moving average having accessed the required records.
Note: the outcome of the query is stored in the main memory where the moving average is
stored in the attribute MovingAVG (defined in the query). Assume that there is sufficient
memory for storing the outcome of the query. [10 marks per strategy, 30 total]
o Report on (i) the expected number of block accesses of each proposed query processing
strategy and (ii) the required storage of the proposed structure, if any (e.g., size of B+ Tree).
[5 marks per strategy, 15 total]
o Calculate the total memory (in blocks) required to store the results of the query. Justify your
answer. [5 marks total]
o Recommend the best overall strategy for different use cases (e.g., high-frequency queries vs.
annual reporting) supported by your cost analysis. [10 marks total]
A file block has size 512 bytes, and any pointer has size 16 bytes.
Notes & Submission
Note 1: The Assessed Group Work is graded out of 60 Marks. The group submits one product, and all group
members receive the same grade.
Note 2: If there are any issues regarding the load contribution/effort of members, please let me know by
emailing the whole group and me.
Note 3: Answer ALL tasks.
Note 4: Only one member of your group will need to submit a document including your group details and
answers in one PDF document file.
软件开发、广告设计客服
QQ:99515681
邮箱:99515681@qq.com
工作时间:8:00-23:00
微信:codinghelp
热点项目
更多
代写cs918 sentiment classifi...
2025-04-02
代做llp714 corporate social ...
2025-04-02
代做cs 338 – winter 2025 as...
2025-04-02
代做21797 strategic supply c...
2025-04-02
代做ee 5711: power electroni...
2025-04-02
代写llaw6055 law of internat...
2025-04-02
代写dts208tc data analytics ...
2025-04-02
代做bees2041 data analysis f...
2025-04-02
代做econ154 business statist...
2025-04-02
代写cit 596 - hw5代做留学生j...
2025-04-02
代做data driven business代写...
2025-04-02
代写envi5705 – assessment 2...
2025-04-02
代写econ154 - statistical fo...
2025-04-02
热点标签
mktg2509
csci 2600
38170
lng302
csse3010
phas3226
77938
arch1162
engn4536/engn6536
acx5903
comp151101
phl245
cse12
comp9312
stat3016/6016
phas0038
comp2140
6qqmb312
xjco3011
rest0005
ematm0051
5qqmn219
lubs5062m
eee8155
cege0100
eap033
artd1109
mat246
etc3430
ecmm462
mis102
inft6800
ddes9903
comp6521
comp9517
comp3331/9331
comp4337
comp6008
comp9414
bu.231.790.81
man00150m
csb352h
math1041
eengm4100
isys1002
08
6057cem
mktg3504
mthm036
mtrx1701
mth3241
eeee3086
cmp-7038b
cmp-7000a
ints4010
econ2151
infs5710
fins5516
fin3309
fins5510
gsoe9340
math2007
math2036
soee5010
mark3088
infs3605
elec9714
comp2271
ma214
comp2211
infs3604
600426
sit254
acct3091
bbt405
msin0116
com107/com113
mark5826
sit120
comp9021
eco2101
eeen40700
cs253
ece3114
ecmm447
chns3000
math377
itd102
comp9444
comp(2041|9044)
econ0060
econ7230
mgt001371
ecs-323
cs6250
mgdi60012
mdia2012
comm221001
comm5000
ma1008
engl642
econ241
com333
math367
mis201
nbs-7041x
meek16104
econ2003
comm1190
mbas902
comp-1027
dpst1091
comp7315
eppd1033
m06
ee3025
msci231
bb113/bbs1063
fc709
comp3425
comp9417
econ42915
cb9101
math1102e
chme0017
fc307
mkt60104
5522usst
litr1-uc6201.200
ee1102
cosc2803
math39512
omp9727
int2067/int5051
bsb151
mgt253
fc021
babs2202
mis2002s
phya21
18-213
cege0012
mdia1002
math38032
mech5125
07
cisc102
mgx3110
cs240
11175
fin3020s
eco3420
ictten622
comp9727
cpt111
de114102d
mgm320h5s
bafi1019
math21112
efim20036
mn-3503
fins5568
110.807
bcpm000028
info6030
bma0092
bcpm0054
math20212
ce335
cs365
cenv6141
ftec5580
math2010
ec3450
comm1170
ecmt1010
csci-ua.0480-003
econ12-200
ib3960
ectb60h3f
cs247—assignment
tk3163
ics3u
ib3j80
comp20008
comp9334
eppd1063
acct2343
cct109
isys1055/3412
math350-real
math2014
eec180
stat141b
econ2101
msinm014/msing014/msing014b
fit2004
comp643
bu1002
cm2030
联系我们
- QQ: 9951568
© 2021
www.rj363.com
软件定制开发网!