Coursework - EMATM0051 Large Scale Data Engineering (Resit)
Summary
This coursework is divided into two parts:
Part 1: A written task (only) related to the knowledge gained in the AWS Academy Cloud Foundations course (weeks 1-7).
Part 2: A combined practical and written activity architecting a scaling application on the Cloud. You will use knowledge gained, plus a little further research, to implement the scaling infrastructure, then write a report that focuses on your experience in the practical activity together with knowledge gained in the entire LSDE course.
Weighting: This assessment is worth 100% of the mark for this 20-credit unit.
Pre-requisites:
• You must have completed the AWS Academy Cloud Foundations course set in weeks 1-8
• You will require an AWS Academy Lab account for the practical activity. You should receive an invite when this document is released. Please contact the LSDE Unit Director if you have no invitation email or are having issues with the registration.
• A Secure Shell (SSH) client, such as macOS Terminal or PuTTY on Windows, for server admin.
Submission:
Via the LSDE BlackBoard coursework assessment page, submit one .pdf file (named ‘your_username.pdf’, e.g. tl18303.pdf), containing:
• A report (‘report.pdf’) in PDF format containing:
o Part 1
o Part 2
o Your AWS Academy account credentials (username, password)
In this document we provide a detailed explanation of the tasks and the approach to marking.
Task 1: (25%)
Write a maximum of 1000 words (minimum: 600) debating the statement:
“Transform the retail industry by leveraging cloud technology”. Some background information is available at: https://anywhere.tech/cloud-services/cloud-computing-in-retail/
Include your own descriptions of the following:
• At least 5 AWS features or services introduced in the Cloud Foundations course that make cloud services advantageous for the retail industry.
• At least 3 different scenarios in which cloud services may pose challenges for transforming the retail industry.
Task 2: Scaling the WordPress Application (75%)
Overview
WordPress is by far the most popular open-source software for hosting online blogs and small-scale websites. It is a PHP application, backed by a MySQL database (NOTE: you are NOT expected to understand or modify the source code in any way).
WordPress includes a password-secured browser admin interface that enables blog posts and other content to be created, management of users, review of blog metrics, installation of extensions (known as ‘plugins’), and so on.
WordPress is typically installed on a single EC2 server, but as we saw in the Cloud Foundations course, a single server has limitations in availability, scalability, performance, etc. This can affect the speed of response (latency) and thus performance and cost (see this article).
Your task will be to take a default, minimal installation of WordPress and implement a resilient, highly available, scalable, cost effective and secure architecture for it on AWS. This will include performing load testing on your application to demonstrate improved performance under stress.
You will be required to initially set up and test the application, using instructions given with the zip download file. You will then need to identify how to scale and improve the application architecture, based on principles learned in the CF course. Finally, you will write a report covering this process, along with some extra material.
Task A – Install the Application
Ensure you have set up access to your AWS Academy Lab account and have at least $40 credit (you are provided with $100 to start with). If you are running short of credit, please inform your instructor.
Refer to the WordPress installation instructions in the coursework.zip download on the BlackBoard site, to install and configure the application in your AWS Academy Lab account. These instructions do not cover every step – you are assumed to be confident in certain tasks, such as in the use of IAM permissions, launching an EC2 instance, etc.
You will set up a single server installation of WordPress, using a pre-built community AMI, then configure it appropriately for this assessment.
Before moving on to the next task, ensure that:
• You can access the WordPress administration interface and can create & manage blog posts.
• You have the required plugin(s) installed and configured.
• You have successfully set up SSH (command line) access to the WordPress instance.
• You have successfully set up the load testing site and run some trial load tests.
NOTE: The application and plugin code are programmed in the PHP language. You are NOT expected to understand or modify it. Any code changes will be ignored and may lose marks.
Task B - Design and Implement Auto-scaling
Review the architecture of the existing application. Although the website is usable for one visitor (client), when run under the load tester for multiple clients the response (latency) becomes noticeably slow (5000ms / 5 seconds or more to load a page).
To better handle multiple clients, we need to add scaling to the application. This should function as follows:
- When the chosen performance metric exceeds a given maximum threshold, an identical WordPress instance is launched (up to a maximum of 3 instances) and also begins to respond to incoming requests.
- When the chosen performance metric falls below a given minimum threshold, the most recently launched WordPress instance is removed (terminated).
- There must always be at least one WordPress instance available to respond to incoming requests when the WordPress website architecture is 'live'.
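The three rules above can be sketched as a small decision function. This is only an illustration of the intended behaviour, not how AWS Auto Scaling is implemented. It reads the scale-in rule as "the metric drops below the minimum threshold". The function name, the use of average CPU utilisation as the metric, and the threshold values (70 and 30) are all assumptions made for the example.

```python
MIN_INSTANCES = 1  # at least one instance must always be available
MAX_INSTANCES = 3  # the brief caps the group at three instances


def next_instance_count(current, metric, scale_out_above, scale_in_below):
    """Return the instance count the scaling rules would ask for.

    `metric` is a hypothetical performance reading (for example average
    CPU utilisation in percent); both thresholds are assumed values.
    """
    if metric > scale_out_above:
        # Scale out: add one instance, but never exceed the maximum.
        return min(current + 1, MAX_INSTANCES)
    if metric < scale_in_below:
        # Scale in: remove one instance, but never drop below the minimum.
        return max(current - 1, MIN_INSTANCES)
    # Metric is between the two thresholds: leave the group unchanged.
    return current


if __name__ == "__main__":
    count = 1
    for reading in [85, 90, 95, 50, 20, 10, 5]:
        count = next_instance_count(count, reading, scale_out_above=70, scale_in_below=30)
        print(f"metric={reading:>3}  instances={count}")
```

Leaving a gap between the two thresholds keeps the group from repeatedly launching and terminating instances when the metric hovers around a single value.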
Using the knowledge gained from the Cloud Foundations course, architect and implement auto-scaling functionality for the WordPress application. You can refer to Lab 6 in Module 10, which also deals with a web application. You will need to identify a CloudWatch performance metric to use for the ‘scale out’ and ‘scale in’ rules – it’s wise to review CloudWatch metrics for the EC2 service after running the application for a while under load, to pinpoint the most appropriate metric.
NOTE: The free version of loader.io only provides 1-minute tests, so you may need to run tests manually one after another to demonstrate the performance of auto-scaling.
Task C - Perform Load Testing
Once you have set up your auto-scaling infrastructure, test that it works. Set (edit) the test in the load tester to use initially 250 clients, then 350 clients, then 500 clients.
If your autoscaling functionality is configured correctly, you should be able to achieve a response latency of about 1 second with 500 clients. If the load tester produces an error during testing, the response time is too high and you will need to fine-tune your auto-scaling parameters further.
• Watch the behaviour of your WordPress application, to check the scale out (add instances) and scale in (remove instances) behaviour works.
• Take screenshots of the EC2 instance page showing launched / terminated instances along with the load tester graphs.
• Try to optimise the scaling operation so that instances are launched quickly when required and terminated soon (but not immediately) when not required. Note the settings you used and the fastest processing time you can achieve.
• Try using a few different EC2 instance types – with more CPU power, memory, etc. Note down any changes in processing time.
You will need to list all the configurations of scaling policies and instance types used for testing the performance of your auto-scaling and record the time for each experiment. Detailed discussion about how to optimise the operation is required.
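One way to keep the required record of configurations and timings is to hold each run in a small structure and print the runs as a table. The sketch below is only a suggestion: the field names are hypothetical, and the sample rows are made-up placeholder values, not real measurements.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    instance_type: str      # EC2 instance type used for the run
    scale_out_above: float  # assumed scale-out threshold for the chosen metric
    scale_in_below: float   # assumed scale-in threshold for the chosen metric
    clients: int            # number of clients set in the load tester
    avg_response_ms: float  # average response time reported by the load tester


def fastest(experiments, clients):
    """Return the run with the lowest average response time for a client count."""
    candidates = [e for e in experiments if e.clients == clients]
    if not candidates:
        raise ValueError(f"no experiments recorded for {clients} clients")
    return min(candidates, key=lambda e: e.avg_response_ms)


def as_table(experiments):
    """Format the recorded runs as a plain-text table for the report."""
    header = f"{'type':<10} {'out>':>5} {'in<':>5} {'clients':>7} {'avg ms':>8}"
    rows = [
        f"{e.instance_type:<10} {e.scale_out_above:>5.0f} {e.scale_in_below:>5.0f} "
        f"{e.clients:>7d} {e.avg_response_ms:>8.0f}"
        for e in experiments
    ]
    return "\n".join([header, *rows])


if __name__ == "__main__":
    # Placeholder numbers for illustration only; replace with your own results.
    runs = [
        Experiment("t2.micro", 70, 30, 500, 1800.0),
        Experiment("t2.small", 70, 30, 500, 950.0),
    ]
    print(as_table(runs))
    print("fastest at 500 clients:", fastest(runs, 500).instance_type)
```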
NOTE: Academy Lab accounts are limited in which EC2 Types and services they can use.
Task D - Secure and Optimise the WordPress Architecture
Using services, features and techniques learned from the Cloud Foundations course, improve the architecture in the areas of 1) security, 2) availability and reliability, and 3) cost optimization. Your discussion should cover all three of these aspects.
NOTE: You are NOT expected to modify the Web application.
NOTE: The Academy Lab account is limited in enabling full configuration of security features, so if your account prevents you from implementing your requirements, explain this in the final report.
Task E - Create the Final Report
Write a report of no more than 3500 words and 20 A4 pages (there is NO minimum), including:
• A brief summary of the benefits and pitfalls of the initial WordPress architecture in Task A.
• Your design process to architect the scaling behaviours (task B).
• An overview of the testing and your results, including screenshots (task C).
• Your optimisation steps (task D).
• Details of any issues you had and whether you resolved them.
Add one final section:
• Discuss (briefly): If you add a blog post to an instance in your scaled architecture, this post may not always appear if you have two or more instances running. Why is this?
• Explain (at a high level): Based on services and frameworks covered in the LSDE course, describe step by step how you could fix the above issue using a relevant scalable, highly available, managed service of your choice.
[Do not implement this idea, just explain the basic workflow of configuring this for your WordPress architecture].
Finally, add your lab credentials on the last page.
The report should be submitted as a PDF. It does not need to follow any academic format, but you should use grammar and spelling checkers on it and make good use of paragraphs and sub-headings.
Double-spacing is not required. Use diagrams where they make sense and include captions & references from the text.
[IMPORTANT: All text not originally created by you must be cited, leading to a final numbered reference section (based on e.g. the British Standard Numeric System) to avoid accusations of plagiarism.]
[IMPORTANT: Disable autoscaling at the end of each lab session: Desired capacity = 0; Minimum capacity = 0. This saves credit and prevents multiple instances from launching and terminating when starting / stopping a lab session]
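The end-of-session step can be set in the console, but it could also be scripted. As a minimal sketch, the function below passes the two values from the note to the `update_auto_scaling_group` call of a boto3 Auto Scaling client. The group name `wordpress-asg` is a hypothetical placeholder, and the client is passed in as an argument so the function can be tried without AWS access.

```python
def disable_autoscaling(client, group_name):
    """Set minimum and desired capacity to 0 so no instances keep running.

    `client` is expected to behave like boto3.client("autoscaling").
    """
    client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=0,
        DesiredCapacity=0,
    )


# Example use with lab credentials configured (not run here):
#   import boto3
#   disable_autoscaling(boto3.client("autoscaling"), "wordpress-asg")  # hypothetical name
```

The minimum and desired capacity then need to be restored at the start of the next session before any further load tests.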
AWS Academy Learner Lab
You are given an AWS Academy Learner Lab account for this coursework. Each account has $100 assigned to it, which is updated every 24 hours and displayed on the Academy Lab page.
To access the lab from AWS Academy, select Courses > AWS Academy Learner Lab > Modules > Learner Lab - Foundation Services. On this page click ‘Start Lab’ to start a new lab session, then the ‘AWS’ link to open the AWS Console once the button beside the link is green.
Please note:
• Ensure you shut down (stop or terminate) EC2 instances when you are not using them. These will use the most credit in your account in this exercise. Note that the Learner Lab will stop running instances when a session ends, then restart them when a new session begins.
• AWS Learner Lab accounts have only a limited subset of AWS services / features available to them; see the Readme file on the Lab page (Service usage and other restrictions) or this AWS Academy page.
• If you have installed the AWS CLI on your PC and wish to access your Learner Lab account, you will need the credentials (access key ID & secret access key) shown by pressing the AWS Details button on the Lab page. Note that these only remain valid for the current session.
• If you have any issues with AWS Academy or the Learner Lab, please book an Office Hours session to seek help FIRST. Email the instructors only if there is no other option.
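For the AWS CLI point above, the credentials are typically stored as a `[default]` profile in the CLI credentials file (usually `~/.aws/credentials`). The following is a minimal sketch that only renders that text from values you paste in. The function name is hypothetical, and the optional session token is an assumption: temporary lab credentials may include one alongside the two keys mentioned above.

```python
import configparser
import io


def render_credentials(access_key_id, secret_access_key, session_token=None):
    """Return text for a [default] profile in the AWS CLI credentials file format."""
    # Interpolation is disabled so that any '%' in a value is kept literally.
    config = configparser.ConfigParser(interpolation=None)
    config["default"] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    if session_token:
        # Assumption: temporary lab credentials may also include a session token.
        config["default"]["aws_session_token"] = session_token
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values only; paste the real ones from the AWS Details panel.
    print(render_credentials("YOUR_ACCESS_KEY_ID", "YOUR_SECRET_ACCESS_KEY"))
```

Because the credentials only remain valid for the current session, the file would need to be refreshed each time a new lab session is started.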
Support
The normal options for support are available for you.
• Office Hours: will vary over the Summer – please check with the instructor
Marking
Below are the marking bands, with the maximum mark range achievable for the approximate scope of work described.
+80% Outstanding report and implementation. Extensive exploration, analysis and implementation demonstrating deep understanding and reading outside of the CF course and lectures.
70 - 80% Excellent report. Well architected, fully functional auto-scaling, great optimisation techniques, very good understanding of cloud principles gained in the CF course.
60 - 70% Report of correct length, fully functional auto-scaling, good optimisation techniques, good understanding of cloud principles gained in the CF course.
50 - 60% Report of correct length, basic but functional auto-scaling, some good ideas about optimisation techniques, correct understanding of main cloud principles in the CF course.
<50% (Fail) Report is not at an appropriate standard, auto-scaling not implemented. Objectives of the assignment have not been demonstrated.