CS3103: Operating Systems
Spring 2021
Programming Assignment 2
1 Goals
The purpose of this assignment is to help you:
• get familiar with multi-threaded programming using pthread
• get familiar with mutual exclusion using mutexes
• get familiar with synchronization using semaphores
2 Background
Sentiment analysis, which is a powerful technique based on natural language processing,
has a wide range of applications, including consumer reviews analysis, recommender
system, political campaigning, stock speculation, etc. A sentiment analysis model requires a
large text corpus, which consists of classified articles grabbed from the internet using web
crawlers.
In the simplest scenario, a text corpus can be built by two components: a web crawler and a
classifier. The crawler browses through web pages and grabs articles from websites. The
grabbed articles are stored in a buffer, from which the classifier processes articles and
classifies them.
Considering the complexity of modern websites, it usually takes a long time for a crawler to
locate and grab an article from the web page. So, the speed of crawlers is usually too slow
for the classifier. Thus, multiple crawlers would be a better choice.
3 Components and Requirements
You are required to design and implement three crawlers, a buffer and a classifier in
C/C++ on Linux (other languages are not allowed). Mutual exclusion and synchronization
must be done with the mutexes and semaphores provided by the pthread and semaphore
libraries.
3.1 crawler
Each crawler thread is created to grab articles from websites and load them into the
buffer. It repeatedly grabs and loads one article, each such job taking time interval_A, until
the buffer is full. It then waits until the classifier deletes an article from
the buffer.
A function, char* str_generator(void), is provided to generate articles for the
crawler to grab. Each article is represented by a string of 50 characters.
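The waiting behaviour described above matches the classic bounded-buffer pattern. The following is a minimal sketch of that pattern, assuming POSIX semaphores and a pthread mutex; the names BoundedBuffer, buffer_put and buffer_take are hypothetical, and the payload is an int stand-in rather than a 50-character article.

```cpp
#include <pthread.h>
#include <semaphore.h>

// Hypothetical names throughout: a sketch of the bounded-buffer pattern,
// not the required design.
constexpr int kSlots = 12;  // the buffer stores up to 12 articles

struct BoundedBuffer {
    int items[kSlots];      // stand-in payload; the assignment stores articles
    int head = 0;           // index of the oldest item
    int tail = 0;           // index of the next free slot
    sem_t free_slots;       // counts empty slots, starts at kSlots
    sem_t used_slots;       // counts filled slots, starts at 0
    pthread_mutex_t lock;   // protects items, head and tail
};

void buffer_init(BoundedBuffer& b) {
    sem_init(&b.free_slots, 0, kSlots);
    sem_init(&b.used_slots, 0, 0);
    pthread_mutex_init(&b.lock, nullptr);
}

// Crawler side: blocks while the buffer is full.
void buffer_put(BoundedBuffer& b, int item) {
    sem_wait(&b.free_slots);
    pthread_mutex_lock(&b.lock);
    b.items[b.tail] = item;
    b.tail = (b.tail + 1) % kSlots;
    pthread_mutex_unlock(&b.lock);
    sem_post(&b.used_slots);
}

// Classifier side: blocks while the buffer is empty.
int buffer_take(BoundedBuffer& b) {
    sem_wait(&b.used_slots);
    pthread_mutex_lock(&b.lock);
    int item = b.items[b.head];
    b.head = (b.head + 1) % kSlots;
    pthread_mutex_unlock(&b.lock);
    sem_post(&b.free_slots);
    return item;
}

void buffer_destroy(BoundedBuffer& b) {
    sem_destroy(&b.free_slots);
    sem_destroy(&b.used_slots);
    pthread_mutex_destroy(&b.lock);
}
```

In this sketch the mutex lets only one thread touch the buffer at a time, which is the simplest arrangement and not necessarily the most concurrent one. A crawler that must record the wait and s-wait activities could, for instance, try sem_trywait first and fall back to sem_wait only when that fails.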
3.2 buffer
The buffer structure is a first-in-first-out (FIFO) queue. It is used to store the grabbed
articles from crawlers temporarily, until they are taken by the classifier. It can store up to
12 articles at the same time. You need to implement your own queue. You are not allowed
to use the C++ standard library (e.g., queue or other containers provided by the Standard
Template Library) or third-party libraries.
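One common way to build such a queue without standard containers is a fixed-size circular array. The sketch below assumes that approach; ArticleQueue, kCapacity and kArticleLen are hypothetical names, and the code is not thread-safe on its own, so locking has to be added around it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical names; sizes taken from the text (12 articles, 50 characters).
constexpr std::size_t kCapacity = 12;
constexpr std::size_t kArticleLen = 50;

struct ArticleQueue {
    char slots[kCapacity][kArticleLen + 1];  // +1 for the terminating '\0'
    std::size_t head = 0;                    // index of the oldest article
    std::size_t count = 0;                   // number of stored articles

    bool full() const { return count == kCapacity; }
    bool empty() const { return count == 0; }

    // Copies the article into the tail slot; returns false if the queue is full.
    bool push(const char* article) {
        assert(article != nullptr);
        if (full()) return false;
        std::size_t tail = (head + count) % kCapacity;
        std::strncpy(slots[tail], article, kArticleLen);
        slots[tail][kArticleLen] = '\0';
        ++count;
        return true;
    }

    // Returns the oldest article without removing it, or nullptr if empty.
    const char* front() const { return empty() ? nullptr : slots[head]; }

    // Removes the oldest article; returns false if the queue is empty.
    bool pop() {
        if (empty()) return false;
        head = (head + 1) % kCapacity;
        --count;
        return true;
    }
};
```

Keeping front() and pop() separate mirrors the classifier's procedure, which first copies the article at the head and deletes it only after classification.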
3.3 classifier
A classifier thread is created to classify the articles grabbed by the crawlers in FIFO order.
Specifically, there are two steps in the procedure:
1. Pre-processing: the classifier makes a copy of the article at the head of the buffer, changes
all uppercase letters (‘A’-‘Z’) to lowercase letters (‘a’-‘z’) and deletes any symbol that is not
a letter.
2. Classification: the classifier classifies the article into one of the 13 classes based on the
first letter, x, of the processed article as follows.
Class label = int(x - ‘a’) % 13 + 1
Next, an auto-incrementing key starting from 1 is given to the classified article (so the
keys of classified articles are 1, 2, 3, …). Finally, the key, the class label and the original
article are stored in the text corpus, which is a text file. Then the classifier deletes the
classified article from the buffer. The whole procedure takes time interval_B.
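The two steps of the procedure can be sketched as follows. The function names preprocess and class_label are hypothetical, and returning 0 for an article with no letters is an assumption, since the text does not say what should happen in that case.

```cpp
#include <cstddef>

// Step 1: copies only the letters of `article` into `out`, lowercased.
// `out` must have room for the article length plus one byte.
// Returns the length of the processed text.
std::size_t preprocess(const char* article, char* out) {
    std::size_t n = 0;
    for (const char* p = article; *p != '\0'; ++p) {
        char c = *p;
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
        if (c >= 'a' && c <= 'z') out[n++] = c;
    }
    out[n] = '\0';
    return n;
}

// Step 2: class label = int(x - 'a') % 13 + 1 for the first processed letter x.
// Returns 0 when no letter survives (an assumption, see the note above).
int class_label(const char* processed) {
    if (processed[0] == '\0') return 0;
    return (processed[0] - 'a') % 13 + 1;
}
```

For example, "Hi, Bob-42!" is processed to "hibob", whose first letter h gives (7 % 13) + 1 = 8. The letters a and n both map to class 1, and z maps to class 13.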
3.4 termination
The articles are divided into 13 classes. Denote the number of articles in each class as C1, C2,
… C13, and p = min{ C1, C2, … C13}. When p ≥ 5, the classifier notifies all crawlers to quit after
finishing the current job at hand, and then the program terminates.
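The termination condition p ≥ 5 can be checked after each stored article. The sketch below assumes the classifier keeps a per-class counter array; enough_articles, kNumClasses and kMinPerClass are hypothetical names.

```cpp
// Hypothetical names; 13 classes and the threshold of 5 come from the text.
constexpr int kNumClasses = 13;
constexpr int kMinPerClass = 5;

// counts[i] holds the number of stored articles with class label i + 1.
// Returns true once p = min{C1, ..., C13} has reached the threshold.
bool enough_articles(const int counts[kNumClasses]) {
    int p = counts[0];
    for (int i = 1; i < kNumClasses; ++i) {
        if (counts[i] < p) p = counts[i];
    }
    return p >= kMinPerClass;
}
```

How the crawlers are then notified is left open here; a shared flag protected by a mutex is one possibility.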
3.5 input arguments
Your program has to accept the following two arguments in input order:
interval_A, interval_B: integer, unit: microsecond.
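A minimal sketch of validating one such argument, assuming that negative or non-numeric values should be rejected (the text does not specify error handling); parse_interval is a hypothetical helper name.

```cpp
#include <cerrno>
#include <cstdlib>

// Parses a non-negative interval in microseconds from a command-line string.
// Returns false on empty, malformed, out-of-range or negative input.
bool parse_interval(const char* text, long& out) {
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' || value < 0) return false;
    out = value;
    return true;
}
```

In main, the program could check that argc is 3 and call this helper on argv[1] and argv[2] to obtain interval_A and interval_B.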
3.6 sample outputs
The outputs of your program are:
• A table with multiple columns shown on the screen, each column shows the
activities of a single thread in time order, and each row shows only one single
activity of a thread.
• The text corpus, each line consists of a key, a class label and an article separated by a
space.
All activities that need to be recorded for each thread are listed below, together with their
abbreviations.
Crawler:
start – crawler starts.
grab – crawler starts to grab an article.
f-grab – an article has been grabbed and loaded into the buffer.
wait – crawler starts waiting for available space in the buffer.
s-wait – crawler stops waiting.
quit – crawler has finished all its jobs and is about to quit.
Classifier:
start – classifier starts.
clfy – classifier starts to classify an article.
f-clfy – the article has been classified and deleted from the buffer.
k-enough – k articles have been classified and the classifier notifies all
threads to quit.
n-stored – a total of n articles have been stored in the text corpus.
quit – classifier has finished all its jobs and is about to quit.
Below are sample outputs of the table on the screen and of the text corpus. For example, in the
table, crawler1 started at t1, then crawler2 started at t2 and grabbed at t3, and so on.
[Figure: sample table output, showing the beginning of the table and the end of the table]
[Figure: sample text corpus]
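The output formats described in this section can be sketched with two small helpers. The fixed column width and the helper names format_row and format_corpus_line are assumptions; the exact table layout should follow the sample output.

```cpp
#include <cstddef>
#include <cstdio>

// Writes one table row into `out`: `num_columns` cells of at least `width`
// characters, with `activity` left-aligned in cell `column` and the other
// cells blank. Returns the number of characters written, or -1 if `out`
// is too small.
int format_row(char* out, std::size_t size, int column, int num_columns,
               const char* activity, int width) {
    if (size == 0) return -1;
    out[0] = '\0';
    std::size_t used = 0;
    for (int i = 0; i < num_columns; ++i) {
        const char* cell = (i == column) ? activity : "";
        int n = std::snprintf(out + used, size - used, "%-*s", width, cell);
        if (n < 0 || static_cast<std::size_t>(n) >= size - used) return -1;
        used += static_cast<std::size_t>(n);
    }
    return static_cast<int>(used);
}

// Writes one corpus line: key, class label and article separated by spaces.
int format_corpus_line(char* out, std::size_t size, int key, int label,
                       const char* article) {
    return std::snprintf(out, size, "%d %d %s", key, label, article);
}
```

Since several threads print rows, the call that actually writes a row to the screen would need to be protected so that rows from different threads do not interleave.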
4 Challenge
This challenge is for students who wish to get an A+ grade in this programming
assignment and to take one more step toward real-world applications.
Most modern websites are under anti-crawler protection. Thus, crawlers should be updated
with new IP addresses and cookies periodically to get through the barrier.
A strategy manager thread is created to update the crawlers with a new IP and cookies. Each
crawler notifies the strategy manager to update its IP and cookies after every M articles are
grabbed. The update takes time interval_C. The input and extra output are listed below.
Your program has to accept the following arguments in input order:
interval_A, interval_B, interval_C: integer, unit: microsecond, M: integer.
Crawler: two more activities have to be recorded:
rest – crawler starts resting.
s-rest – crawler stops resting.
Strategy-Manager:
start – manager starts.
get-crx – manager gets a notification from crawler x.
up-crx – manager updated crawler x with new IP and cookies.
quit – manager has finished all its jobs and is about to quit.
5 Helper Program and Hint
5.1 generator.cpp
The function char* str_generator(void) is provided in the file generator.cpp. It
returns a string (char array) of length 50. Use it by declaring a prototype in your code and
compiling it along with your source code.
5.2 hint
Multi-threading needs careful handling. A flawed program may appear correct in the
first several tests but fail in later ones. Test your program multiple times,
preferably with different arguments.
6 Marking Scheme
Your program will be tested on our CSLab Linux servers (cs3103-01, cs3103-02, cs3103-03).
You should describe clearly how to compile and run your program as comments in your
source program file. If an executable file cannot be generated and run
successfully on our Linux servers, the submission will be considered unsuccessful.
A. Design and use of multi-threading (15%)
• Thread-safe multithreaded design and correct use of thread-management
functions
• Non-multithreaded implementation (0%)
B. Design and use of mutexes (15%)
• Complete, correct and non-excessive use of mutexes
• Useless/unnecessary use of mutexes (0%)
C. Design and use of semaphores (30%)
• Complete, correct and non-excessive use of semaphores
• Useless / unnecessary use of semaphores (0%)
D. Degree of concurrency (15%)
• A design with higher concurrency is preferable to one with lower
concurrency.
o An example of lower concurrency: only one thread can access the buffer at
a time.
o An example of higher concurrency: several threads can access the buffer
at the same time, each working on a different article.
• No concurrency (0%)
E. Program correctness (15%)
• Complete and correct implementation of other features including:
o correct logic and coding of thread functions
o correct coding of queue and related operations
o passing parameters to the program on the command line
o program output conforms to the format of the sample output
o successful program termination
• Failure to pass the g++ compiler on our Linux servers to generate a runnable
executable file (0%)
F. Programming style and documentation (10%)
• Good programming style
• Clear comments in the program to describe the design and logic
• Unreadable program without any comment (0%)
7 Submission
• This assignment is to be done individually or by a group of two students. You are
encouraged to discuss the high-level design of your solution with your classmates but
you must implement the program on your own. Academic dishonesty such as copying
another student’s work or allowing another student to copy your work, is regarded as a
serious academic offence.
• Each submission consists of two files: a source program file (.cpp file) and a text file (.txt
file) containing the table output by your program and the text corpus.
• Write down your name(s), eid(s), student ID(s), and the command lines to compile and run
your program at the beginning of your program as comments.
• Use your student ID(s) to name your submitted files, such as 5xxxxxxx.cpp and
5xxxxxxx.txt for individual submission, or 5xxxxxxx_5yyyyyyy.cpp and
5xxxxxxx_5yyyyyyy.txt for group submission. You may ignore the version number
appended by Canvas to your files. Only one submission is required for each group.
• Submit the files to Canvas. As long as you follow the above submission procedure, there
is no need to add comments repeating your information in Canvas.
• The deadline is 11:00am, 11-MAR-2021 (Thu). No late submission will be accepted.
8 Questions?
• This is not a programming course. You are encouraged to debug the program on your
own first.
• If you have any questions, please submit them to Mr Wu Wei via the Discussion
board “Programming Assignment #2” on Canvas.
• To avoid possible plagiarism, do not post your source code on the Discussion board.
• If necessary, you may also contact Mr Wu Wei at weiwu56-c@my.cityu.edu.hk.