首页 > > 详细

代做program、代写Python/C++设计程序

项目预算:   开发周期:  发布时间:   要求地区:
Assessed Coursework
Course Name Database Systems (H)
Coursework Number 1/1
Deadline Time: 16h30 Date: 14/03/2025
% Contribution to final
course mark
20%
Solo or Group ✓ Solo Group ✓
Anticipated Hours Average 20 hours
Submission Instructions
Please, read the description.
Please Note: This Coursework cannot be Re-Assessed
Code of Assessment Rules for Coursework Submission
Deadlines for the submission of coursework which is to be formally assessed will be published in course
documentation, and work which is submitted later than the deadline will be subject to penalty as set out
below.
The primary grade and secondary band awarded for coursework which is submitted after the published
deadline will be calculated as follows:
(i) in respect of work submitted not more than five working days after the deadline
a. the work will be assessed in the usual way;
b. the primary grade and secondary band so determined will then be reduced by two secondary
bands for each working day (or part of a working day) the work was submitted late.
(ii) work submitted more than five working days after the deadline will be awarded Grade H.
Penalties for late submission of coursework will not be imposed if good cause is established for the late
submission. You should submit documents supporting good cause via MyCampus.
Penalty for non-adherence to Submission Instructions is: 2 bands
Data Engineering 2025
Task 1 [60 Marks]
Your team is hired by the HM Revenue & Customs (HMRC) to execute an analytics query over the last 4 years
estimating the moving average of the collected city council tax across all the UK City Councils. Specifically, you
are given access to the relation HMRC (Council, RecordedDate, Tax). The attribute Council is 128 bytes with
320 different values, each one associated with a city council regional authority. The attribute RecordedDate is
64 bytes with 48 different values, each one representing the date of the first day of each month from ‘01-01-
2020’ to ’01-12-2024’. The Tax attribute is a positive real number with size 64 bytes representing the revenue
(in GBP) of a specific Council in a specific month. An example of this relation is provided below (the tuple values
do not reflect any real figures):
HMRC
Council RecordedDate Tax
Birmingham City Council 01-01-2020 261,000,000
Birmingham City Council 01-02-2020 236,000,000
Birmingham City Council 01-03-2020 240,000,000
… … …
Birmingham City Council 01-12-2024 310,000,000
Liverpool City Council 01-01-2020 110,000,000
Liverpool City Council 01-02-2020 98,000,000

Liverpool City Council 01-12-2024 135,000,000
Glasgow City Council 01-01-2020 280,000,000
…. … …
Glasgow City Council 01-12-2024 199,000,000
… …
For each city council, there are 48 tax revenue values across 48 months (one per month), thus, the HMRC
relation has 320*48 = 15,360 tuples. The file accommodating the HMRC relation is not sorted by any attribute.
Your team is asked to execute a window function-based analytics query showing the moving average of the tax
per month per city. Given a city, the tax moving average of a month X takes the average of the tax values of the
months X-1, X, an X+1. For example, for Birmingham City Council, the moving average for ’01-02-2020’ is:
(£261M + £236M + £240M)/3 = £245.67M. The outcome of the moving average is stored in the attribute
MovingAVG with size 64 bytes.
The analytics query is provided below:
SELECT Council, RecordedDate, Tax,
AVG (Tax) OVER (PARTITION BY Council
ORDER BY RecordedDate ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
AS MovingAVG FROM HMRC;
Objective: Practice with processing advanced analytics SQL operators using File and Indexing
Structures
Given that this query may be deployed across diverse operational scenarios—such as high-frequency real-time
dashboards for finance departments, annual compliance reporting for councils, or ad-hoc analytics for external
auditors—your team must investigate how the underlying file and indexing structures perform under varying
workloads, access patterns, and resource constraints (e.g., memory, latency tolerance, or infrastructure costs).
Currently, three strategies are proposed for further consideration:
Strategy 1: Sorting File Structure [20 Marks]
Your team proposes to sort the HMRC file by the Council attribute using external sorting. You are asked to
devise a query processing algorithm implementing the query using the sorted HMRC file only. For the external
sorting, the degree of merging M = 2 and the available memory for sorting a ‘sub-file’ is 1,280 blocks.
Strategy 2: Hashing File Structure [20 Marks]
Your team proposes to hash the HMRC file using the Council attribute. You are asked to devise a query
processing algorithm implementing the query using the hashed HMRC file only. The available memory
dedicated for hashing is 10,000 blocks.
Strategy 3: Indexing Structure with B+ Tree [20 Marks]
Your team proposes to create a B+ Tree (secondary index) to index the HMRC file using the non-ordering, no unique Council attribute. Each internal node is of order p = 4 (4 pointers/3 key values). Each leaf node stores 3
key values along with 3 pointers and 1 sibling pointer to the ‘next’ leaf node. Each node is stored in 1 block.
You are asked to devise a query processing algorithm implementing the query using the B+ Tree. The cost for
building the B+ Tree is equal to the cost for accessing all the blocks of the HMRC file.
For each Strategy:
• Optional: Provide a simple sketch/diagram of the underlying data structure(s), which can help you
visualizing the query processing steps of your solutions.
• Optional: You can use pgAdmin4 to create the relation HMRC populating with fictitious tuples. Then,
use the EXPLAIN tool over the provided query and interpret the query processing outcome from your
DB server. You could also explore more options, like creating an index over the Council and/or
RecordedDate, and then use the EXPLAIN tool again to check whether there you obtain more efficient
query processing.
• Compulsory:
o Describe the steps of your method/algorithm. You must explain the steps you follow to access
the blocks and how you calculate the moving average having accessed the required records.
Note: the outcome of the query is stored in the main memory where the moving average is
stored in the attribute MovingAVG (defined in the query). Assume that there is sufficient
memory for storing the outcome of the query. [10 marks per strategy, 30 total]
o Report on (i) the expected number of block accesses of each proposed query processing
strategy and (ii) the required storage of the proposed structure, if any (e.g., size of B+ Tree).
[5 marks per strategy, 15 total]
o Calculate the total memory (in blocks) required to store the results of the query. Justify your
answer. [5 marks total]
o Recommend the best overall strategy for different use cases (e.g., high-frequency queries vs.
annual reporting) supported by your cost analysis. [10 marks total]
A file block has size 512 bytes, and any pointer has size 16 bytes.
Notes & Submission
Note 1: The Assessed Group Work is graded out of 60 Marks. The group submits one product, and all group
members receive the same grade.
Note 2: If there are any issues regarding the load contribution/effort of members, please let me know by
emailing the whole group and me.
Note 3: Answer ALL tasks.
Note 4: Only one member of your group will need to submit a document including your group details and
answers in one PDF document file.

软件开发、广告设计客服
  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp
热点标签

联系我们 - QQ: 9951568
© 2021 www.rj363.com
软件定制开发网!