Assignment 6
5/1/2024
10 Points Possible
Unlimited Attempts Allowed
4/22/2024
Details
For this assignment, you will submit a README.md with your answers to the questions below, along with the code you used to
produce your answers (including all boto3 scripts necessary to reproduce your cloud infrastructure, where relevant). You should
commit your Assignment 6 file(s) to your private “a6” GitHub repository (click here (https://classroom.github.com/a/jXPdPm3s) to
accept the GitHub Classroom invitation to access this repository) and submit a link to your repository here on Canvas (click
the “Submit Assignment” button to make your submission). You must work alone on this assignment. Before submitting your
assignment, please take a look at the tips that one of the previous TAs for the course (Jinfei Zhu) compiled for writing a grader-friendly
README file and organizing your assignment GitHub repository (https://github.com/lsc4ss-a21/assignment-submission?template), if you have not already done so.
1. (6 Points Total) This first prompt builds on the survey submission pipeline you have been working on in Assignments 4 and 5. As
a final step in your survey submission pipeline, you will write a Python function that can be invoked on a survey participant’s
mobile device when they complete a survey to send their survey submission into an SQS queue, which should then trigger the
Lambda function you wrote in Assignments 4 and 5.
Note that each survey submission is initially saved as a JSON file (on the mobile device) when a participant completes a survey
via the mobile app (example files are provided with the assignment). For the purposes of this prompt, you do
not need to worry about the implementation of the mobile app or the creation of these JSON files. Your job is to write a Python
function that will send a string representation of this JSON data (an individual survey) into an AWS SQS queue (your function
will then be incorporated into the mobile app by another researcher). The SQS queue should then trigger your AWS Lambda
function from Assignment 5, which will take this survey submission data and perform the necessary processing and storage
operations in the cloud. You should accomplish all of these tasks programmatically (using boto3) to ensure reproducibility of
your architecture. Specifically, you should complete the following tasks:
a. (1 Point) Write a Python function send_survey (which you can assume will be installed with the mobile app and will
automatically be invoked after a survey is saved as a JSON file on the device) that has the following signature:
def send_survey(survey_path, sqs_url):
    '''
    Input: survey_path (str): path to JSON survey data
                              (e.g. './survey.json')
           sqs_url (str): URL for SQS queue
    Output: StatusCode (int): indicating whether the survey
            was successfully sent into the SQS queue (200) or not (400)
    '''
In the function body, you should use boto3 to send the data from a survey (a JSON file on the mobile device, converted into
a string representation) into an AWS SQS queue.
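As a rough illustration of part (a), the sketch below reads the survey file, serializes it with json.dumps, and passes the resulting string to the SQS client's send_message method. The optional sqs_client argument is a hypothetical addition that is not part of the required signature; it is there only so the function can be tried with a stub instead of a live queue. Treating any exception or non-200 HTTP status as a failure (400) is also an assumption about how the status code should be derived.

```python
import json


def send_survey(survey_path, sqs_url, sqs_client=None):
    '''
    Input: survey_path (str): path to JSON survey data
           sqs_url (str): URL for SQS queue
           sqs_client: hypothetical optional argument (not in the
                       assignment's signature); a boto3 SQS client or stub
    Output: StatusCode (int): 200 if the message was sent, otherwise 400
    '''
    try:
        if sqs_client is None:
            # Imported lazily so a stub client can be used without boto3
            import boto3
            sqs_client = boto3.client('sqs')

        # Loading and re-dumping also checks that the file is valid JSON
        with open(survey_path) as f:
            survey = json.load(f)

        response = sqs_client.send_message(
            QueueUrl=sqs_url,
            MessageBody=json.dumps(survey),
        )
    except Exception:
        # Assumption: any failure (missing file, bad JSON, AWS error) maps to 400
        return 400

    status = response.get('ResponseMetadata', {}).get('HTTPStatusCode')
    return 200 if status == 200 else 400
```

Called as send_survey('./survey.json', sqs_url) with only the two required arguments, the function falls back to a default boto3 client, which in turn relies on AWS credentials and a region being configured on the device.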
b. (2 Points) Create an SQS queue and configure it to act as a trigger for your Lambda function from Assignment 5 (which will
process your data and write it to storage).
Note that if you test your full survey submission pipeline using the example JSON files provided above (in a loop,
using time.sleep(10) in between survey submissions, as in Assignment 5), you should see the following keys in your S3
Bucket:
['0001092821120000.json', '0001092921120000.json', '0001093021120300.json',
'0002092821120000.json', '0003092821120001.json', '0004092821120002.json',
'0005092821122000.json']
You should also see the following records if you query your DynamoDB table:
{'q1': Decimal('1'), 'q2': Decimal('1'), 'user_id': '0001',
'q3': Decimal('2'), 'q4': Decimal('2'), 'q5': Decimal('2'),
'num_submission': Decimal('3'),
'freetext': "I lost my car keys this afternoon at lunch, so I'm more stressed than normal"}
{'q1': Decimal('4'), 'q2': Decimal('1'), 'user_id': '0002',
'q3': Decimal('1'), 'q4': Decimal('1'), 'q5': Decimal('3'),
'num_submission': Decimal('1'),
'freetext': "I'm having a great day!"}
{'q1': Decimal('1'), 'q2': Decimal('3'), 'user_id': '0003',
'q3': Decimal('3'), 'q4': Decimal('1'), 'q5': Decimal('4'),
'num_submission': Decimal('1'),
'freetext': 'It was a beautiful, sunny day today.'}
{'q1': Decimal('1'), 'q2': Decimal('1'), 'user_id': '0004',
'q3': Decimal('1'), 'q4': Decimal('1'), 'q5': Decimal('1'),
'num_submission': Decimal('1'),
'freetext': 'I had a very bad day today...'}
{'q1': Decimal('3'), 'q2': Decimal('3'), 'user_id': '0005',
'q3': Decimal('3'), 'q4': Decimal('3'), 'q5': Decimal('3'),
'num_submission': Decimal('1'),
'freetext': "I'm feeling okay, but not spectacular"}
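One possible shape for part (b) is sketched below: create the queue, look up its ARN, and register it as an event source for the Lambda function. The helper name create_queue_with_lambda_trigger, the batch size of 10, and the queue and function names in the usage comment are all hypothetical choices, not values given in the assignment. The sketch also assumes the Lambda function's execution role already allows it to receive and delete messages from the queue.

```python
def create_queue_with_lambda_trigger(sqs, aws_lambda, queue_name, function_name):
    '''
    Create an SQS queue and configure it as a trigger for a Lambda function.

    sqs, aws_lambda: boto3 clients for 'sqs' and 'lambda' (or stubs)
    queue_name, function_name: hypothetical names chosen by the caller
    Returns (queue_url, queue_arn).
    '''
    # Create the queue (returns the existing URL if an identical queue exists)
    queue_url = sqs.create_queue(QueueName=queue_name)['QueueUrl']

    # The event source mapping needs the queue's ARN rather than its URL
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=['QueueArn'],
    )
    queue_arn = attrs['Attributes']['QueueArn']

    # Ask Lambda to poll the queue and invoke the function with batches of messages
    aws_lambda.create_event_source_mapping(
        EventSourceArn=queue_arn,
        FunctionName=function_name,
        Enabled=True,
        BatchSize=10,  # assumption: illustrative batch size
    )
    return queue_url, queue_arn


# Example usage (requires AWS credentials; names are placeholders):
#   import boto3
#   url, arn = create_queue_with_lambda_trigger(
#       boto3.client('sqs'), boto3.client('lambda'), 'a6_queue', 'a5_lambda')
```

Passing the clients in as arguments keeps the helper easy to exercise with stubs. When the Lambda function is triggered this way, each invocation receives a batch of messages under event['Records'], with the survey string in each record's 'body' field, so a handler written for direct invocation may need a small adjustment.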
c. (3 Points) Your PI, who is overseeing this project, is worried that if all of the participants in the study (potentially thousands)
submit surveys at the same time of day, this might cause the system to crash and your lab might lose data (this
happened to your PI when they ran a similar digital survey via on-premises servers in the early 2000s). How would you
reassure your PI that your architecture is scalable and will be able to handle such spikes in demand? Your response should
be at least 200 words and discuss the scalability of each of the cloud services you used in your pipeline in detail.
2. (4 points) For this prompt, we ask you to declare whether you will complete a Final Project or a Final Exam as your capstone
assignment for the course. You are welcome to meet with course staff and discuss your options and ideas with us before
making your election and submitting your answer to this prompt.
If you wish to complete a Final Project, you should additionally write a ~250-word proposal in your README for this
assignment, detailing your plan for the project (see expectations and sample projects on the Final Exam/Final Project
Assignment page). You should explain why your project
idea helps to solve a social science research problem using large-scale computing methods and outline a schedule for
completing the project by the deadline. If you are working in a group, you should also write down the names of your group
members and describe how you are going to split up the work amongst yourselves.
If you wish to take a Final Exam, you should instead write one question for possible inclusion in the Final Exam and submit it
in your README for this assignment. The better the question you submit, the higher the likelihood you will see the question
(or a closely related one) on the exam. We will additionally post the best questions to the Final Exam page on Canvas so that
you can use them as study material for the exam. Note: YOU WILL NEED TO PROVIDE THE SOLUTION FOR YOUR
QUESTIONS. A good question is one that goes beyond memorization and asks the student to apply a concept in a way that
is similar to what we do in our in-class activities and conceptual questions in assignments (we will not ask implementation
questions that involve writing code from scratch). Specifically, we plan to include questions of the following types (for
additional examples, you can take a look at past examples of questions used on the exam on the Final Exam/Final Project
Assignment page):
Applied Conceptual Questions, such as:
You are conducting a large digital experiment, in which you have designed an online music sharing application and
recruited participants to use the platform over the course of a month. During the experiment, you will manipulate features
of the website in order to test your research hypotheses. In order to run the experiment, you need to be able to
collect/record thousands of data points per second; for instance: tracking the songs that participants download, the
treatments that they were exposed to (by you the researcher), as well as all of the things that participants click on. When
the experiment is over, you would like to perform a statistical analysis on a subset of the data to identify experimental
interventions that caused participants to change their clicking/downloading behavior. Ultimately, when your work is
published, you would also like to have your (de-identified) data publicly accessible, so that future scholars can replicate
your statistical analysis.
What databases and/or storage solutions would you use to solve these problems (storing data while you run the
experiment, as well as afterwards) in the AWS cloud ecosystem? Why? How about if you scaled the experiment up by
several orders of magnitude to include millions of participants? Would this change your data storage/management
solution?
Code Interpretation Questions, such as:
Below is a serial version of a Monte Carlo simulation to estimate π that is written in Python. Identify parts of this code
that could be accelerated using a GPU, as well as those that would best be run on a CPU – attempting to accelerate the
estimation of π as much as possible. For each section of code, you should explain why your answers are the best
hardware options for optimal performance (e.g. thinking in terms of some of the key bottlenecks and hardware limitations
for CPUs vs. GPUs).
# NumPy Pi Estimation with Monte Carlo Simulation
import numpy as np
import time

t0 = time.time()
n_runs = 10 ** 8

# Simulate Random Coordinates in the Square [-1, 1] x [-1, 1]:
ran = np.random.uniform(low=-1, high=1, size=(2, n_runs))

# Identify Random Coordinates that fall within Unit Circle and count them
result = ran[0] ** 2 + ran[1] ** 2 <= 1
n_in_circle = np.sum(result)

# Estimate Pi
print("Pi Estimate: ", 4 * n_in_circle / n_runs)
print("Time Elapsed: ", time.time() - t0)
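For readers unfamiliar with the estimator in the final print statement, the short derivation below shows where the factor of 4 comes from. It assumes the points are drawn independently and uniformly from the square with corners at (±1, ±1), as in the call to np.random.uniform above; the standard-error expression is the usual binomial approximation and is included only to give a sense of the accuracy at this sample size.

```latex
\[
P\bigl(x^2 + y^2 \le 1\bigr)
  = \frac{\text{area of unit circle}}{\text{area of square}}
  = \frac{\pi}{4}
\quad\Longrightarrow\quad
\hat{\pi} = 4\,\frac{n_{\text{in circle}}}{n_{\text{runs}}}
\]
\[
\operatorname{SE}(\hat{\pi})
  = \sqrt{\frac{\pi\,(4-\pi)}{n_{\text{runs}}}}
  \approx 1.6 \times 10^{-4}
  \quad \text{for } n_{\text{runs}} = 10^{8}
\]
```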
Troubleshooting Questions, such as:
You are training a linear regression model to predict the price of an AirBnB listing given a variety of text features derived
from the listing’s description on AirBnB (note that AirBnB publishes this data in CSV format for listings across the world
and the data is updated on a monthly basis).
You have written a machine learning workflow in PySpark that does the following on an AWS EMR cluster composed of 3
m5.xlarge EC2 instances (1 resource manager and 2 core instances), with 10 GB in EBS storage available on each
instance:
1. Cleans the description text data (e.g. drops stop words and punctuation) from all AirBnB listings around the world
from the past month (prior to the current month).
2. Engineers features based on the clean description data (such as categorical and binary features indicating whether
the description contains certain types of words).
3. Uses MLlib’s CrossValidator to identify the optimal hyperparameters for your linear regression model given a grid of
possible values used to tune the model (i.e. a grid search).
4. Trains the regression model using the optimal hyperparameters from (3) and makes predictions on the prices of AirBnB
listings from the current month.
Having successfully run this workflow on one previous month of data, you want to increase your training data size to
several years’ worth of data. As you increase the amount of data entering into your pipeline, though, you begin to observe
an unexpected (i.e. nonlinear) decline in performance (in terms of speed), and beyond a certain data size, your job will not
complete at all: it keeps running indefinitely.
Describe at least two possible root causes of this slowdown (considering both hardware and software). Why would these
be concerns? Is it possible to remedy them? What would be your solutions?
Some hints for writing good questions:
You shouldn’t make the question needlessly complicated or overly verbose.
Try to be clear about what you’re asking and what you’re looking for.
Try to cover multiple topics from the course – i.e. a question that touches on the memory hierarchy, GPU vs. CPU
Parallelism, and Spark’s execution model, would be better than one that is narrowly relevant to invoking a Lambda function.
Anything we’ve covered in the class is fair game (and you’re welcome to continue submitting relevant questions through
Wednesday of Week 9 related to material we cover after this assignment – you just will not receive additional credit).
Clear social science tie-ins are preferred.