University of Macau
CISC3025 - Natural Language Processing
Project#3, 2023/2024
(Due date: 18th April)
Person Name ('Named Entity') Recognition
This is a group project with two students at most. You need to enroll in a group here. In this project,
you will be building a maximum entropy model (MEM) for identifying person names in newswire
texts (Label=PERSON or Label=O). We have provided all of the machinery for training and testing
your MEM, but we have left the feature set woefully inadequate. Your job is to modify the code
for generating features so that it produces a much more sensible, complete, and higher-performing
set of features.
NOTE: In this project, we expect you to design a web application for demonstrating your final
model. You need to design a web page that provides at least this simple function: 1) the user inputs
a sentence; 2) the page outputs the named entity recognition results. Of course, more functionality
in your web application is highly encouraged. For example, you can integrate the previous project’s
work, i.e., text classification, into your project (it would be very cool!).
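As a rough illustration of the minimal web page described above, the sketch below uses only Python's
standard library (wsgiref). It is not part of the starter code. The names app and serve, the query
parameter "sentence", and the port are hypothetical, and predict_sentence here is a placeholder that
simply tags capitalized words; in your project it would call your trained model instead.

```python
from html import escape
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def predict_sentence(sentence):
    """Placeholder tagger (assumption): replace with a call to your trained model."""
    return [(word, "PERSON" if word[:1].isupper() else "O")
            for word in sentence.split()]


def app(environ, start_response):
    """WSGI app: read ?sentence=... and render one table row per word."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    sentence = query.get("sentence", [""])[0]
    rows = "".join(
        "<tr><td>%s</td><td>%s</td></tr>" % (escape(word), label)
        for word, label in predict_sentence(sentence)
    )
    page = (
        "<form method='get'>"
        "<input name='sentence' value='%s'>"
        "<button type='submit'>Tag</button>"
        "</form><table>%s</table>" % (escape(sentence), rows)
    )
    body = page.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port=8000):
    """Serve the page at http://localhost:<port>/ until interrupted."""
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

Calling serve() and opening http://localhost:8000/ in a browser would show the input form; a web
framework of your choice can replace wsgiref without changing the idea.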
You NEED to submit:
• Runnable program
    o You need to implement a Named Entity Recognition model based on the given starter
      code.
• Model file
    o Once you have finished designing your features and the program works well, it
      will dump a model file (‘model.pkl’) automatically. We will use it to evaluate
      your model.
• Web application
    o You also need to develop a web application (freestyle, no restriction on programming
      languages) to demonstrate your NER model or even more NLP functions.
    o Obviously, you need to learn how to call your Python project when building the web
      application.
• Report
    o You should write a report to introduce your work on this project. Your report should
      contain the following content:
        ▪ Introduction;
        ▪ Description of the methods, implementation, and additional considerations to
          optimize your model;
        ▪ Evaluations and discussions about your findings;
        ▪ Conclusion and future work suggestions.
• Presentation
    o You need to give an 8-minute presentation in class to introduce your work, followed
      by a 3-minute Q&A session. The content of the presentation may refer to the report.
Starter Code
In the starter code, we have provided you with three simple starter features, but you should be able
to improve substantially on them. We recommend experimenting with orthographic information,
gazetteers, and the surrounding words, and we also encourage you to think beyond these
suggestions.
The file you will be modifying is MEM.py.
Adding Features to the Code
You will create the features for the word at the given position, with the given previous label. You
may condition on any word in the sequence (and its relative position), not just the current word,
because they are all observed. You may not condition on any labels other than the previous one.
You need to give each feature a unique name. The system will use this unique name in training
to set the weight for that feature. At test time, the system will use the name of this feature
and its weight to make a classification decision.
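As one possible way to condition on neighbouring words while keeping every feature name unique, the
sketch below encodes the relative position in the name. The helper context_features, the window
size, and the "<PAD>" marker are assumptions made for illustration; they are not part of the
starter code.

```python
def context_features(words, position, window=2):
    """Build one uniquely named feature per neighbouring word within the window."""
    features = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the current word is already covered by the baseline features
        index = position + offset
        if 0 <= index < len(words):
            neighbour = words[index].lower()
        else:
            neighbour = "<PAD>"  # marks positions before the start or after the end
        # The signed offset in the name keeps "previous word" and "next word" distinct.
        features["word[%+d]=%s" % (offset, neighbour)] = 1
    return features
```

Inside features() in MEM.py, the returned dictionary could be merged with features.update(...).
Because only words are consulted, this respects the rule that no label other than the previous one
may be used.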
Types of features to include
Your features should not just be the words themselves. The features can represent any property of
the word, context, or additional knowledge.
For example, the case of a word is a good predictor for a person's name, so you might want to add
a feature to capture whether a given word was lowercase, Titlecase, CamelCase, ALLCAP, etc.
    def features(self, words, previous_label, position):
        features = {}
        """ Baseline Features """
        current_word = words[position]
        features['has_(%s)' % current_word] = 1
        features['prev_label'] = previous_label
        if current_word[0].isupper():
            features['Titlecase'] = 1
        #===== TODO: Add your features here =======#
        #...
        #=============== TODO: Done ================#
        return features
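The case distinctions mentioned above (lowercase, Titlecase, CamelCase, ALLCAP) could be captured
along the following lines. This is only a sketch: the helper name case_features and the feature
names are hypothetical, and how you define each category is a design choice.

```python
def case_features(word):
    """Return at most one case feature describing the capitalization of the word."""
    features = {}
    if word.islower():
        features["case=lower"] = 1
    elif word.isupper():
        features["case=ALLCAP"] = 1
    elif word[:1].isupper() and word[1:].islower():
        features["case=Title"] = 1
    elif any(ch.isupper() for ch in word[1:]):
        # An uppercase letter after the first character, e.g. "McDonald".
        features["case=Camel"] = 1
    return features
```

Tokens with no cased letters, such as dates or punctuation, receive no case feature under these
definitions.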
Imagine you saw the word “Jenny”. In addition to the feature for the word itself (as above), you
could add a feature to indicate it was in Title case, like:
    if current_word[0].isupper():
        features['Titlecase'] = 1
You might encounter an unknown word in the test set, but if you know it begins with a capital letter,
then this might be evidence that helps with the correct prediction.
Choosing the correct features is an important part of natural language processing. It is as much art
as science: some trial and error is inevitable, but you should see your accuracy increasing as you
add new types of features.
The name of a feature is nothing more than an ID. You can assign any name to a feature as long as
it is unique. For example, you can use “case=Title” instead of “Titlecase”.
Running the Program
We have provided you with a training set and a development set. We will be running your programs
on an unseen test set, so you should try to make your features as general as possible. Your goal
should be to increase F1 on the dev set, which is the harmonic mean of the precision and the recall.
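For reference, the harmonic mean mentioned above is F1 = 2PR / (P + R), where P is precision and R
is recall. The sketch below computes all three at the token level for the PERSON label. The helper
name and the token-level counting are assumptions; the provided run.py may compute its scores
differently.

```python
def precision_recall_f1(gold, predicted, positive="PERSON"):
    """Compute token-level precision, recall, and F1 for the positive label."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == positive and p == positive)
    false_pos = sum(1 for g, p in pairs if g != positive and p == positive)
    false_neg = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because the harmonic mean is dominated by the smaller of the two values, a model with high precision
but low recall still receives a modest F1.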
You can use three different command flags (‘-t’, ‘-d’, ‘-s’) to train, test, and show respectively.
These flags can be used independently or jointly. If you run the program as it is, you should see the
following training process:
    $ cd NER
    $ python run.py -t
    Training classifier...
    ==> Training (5 iterations)
    Iteration    Log-Likelihood    Accuracy
    ---------------------------------------
    1            -0.69315          0.055
    2            -0.09383          0.946
    3            -0.08134          0.968
    4            -0.07136          0.969
    Final        -0.06330          0.969
Afterward, it can print out your score on the dev set.
    $ python run.py -d
    Testing classifier...
    f_score = 0.8715
    accuracy = 0.9641
    recall = 0.7143
    precision = 0.9642
You can also give it an additional flag, -s, and have it show verbose sample results. The first column
is the word, the last two columns are your program's prediction of the word’s probability to be
PERSON or O. The star ‘*’ indicates the gold result. This should help you do error analysis and
properly target your features.
    $ python run.py -s
    Words          P(PERSON)    P(O)
    ----------------------------------------
    EU              0.0544      *0.9456
    rejects         0.0286      *0.9714
    German          0.0544      *0.9456
    call            0.0286      *0.9714
    to              0.0284      *0.9716
    boycott         0.0286      *0.9714
    British         0.0544      *0.9456
    lamb            0.0286      *0.9714
    .               0.0281      *0.9719
    Peter          *0.4059       0.5941
    Blackburn      *0.5057       0.4943
    BRUSSELS        0.4977      *0.5023
    1996-08-22      0.0286      *0.9714
    The             0.0544      *0.9456
    European        0.0544      *0.9456
    Commission      0.0544      *0.9456
    said            0.0258      *0.9742
    on              0.0283      *0.9717
    Thursday        0.0544      *0.9456
    it              0.0286      *0.9714
Where to make your changes?
1. Function ‘features()’ in MEM.py
2. You can modify the “Customization” part in run.py in order to debug more efficiently and
properly. Note that your final submitted model should be trained for at least 20 iterations.
    #====== Customization ======
    BETA = 0.5
    MAX_ITER = 5 # max training iteration
    BOUND = (0, 20) # the desired position bound of samples
    #==========================
3. You may need to add a function “predict_sentence( )” in class MEM( ) to output predictions
and integrate with your web applications.
Changes beyond these, if you choose to make any, should be done with caution.
Grading
The assignment will be graded based on your code, report, and, most importantly, final
presentation.
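For the “predict_sentence( )” function suggested under “Where to make your changes?”, one possible
shape is a greedy left-to-right pass in which each predicted label becomes the previous label for
the next word. The sketch below is written as a standalone function so that it runs on its own.
The parameters feature_fn and classify_fn, the whitespace tokenization, and the initial label "O"
are all assumptions; inside class MEM you would use self.features and your trained classifier, and
the starter code may initialise the previous label differently.

```python
def predict_sentence(sentence, feature_fn, classify_fn, start_label="O"):
    """Greedily tag each word, feeding every prediction forward as the previous label."""
    words = sentence.split()  # naive whitespace tokenization (assumption)
    previous_label = start_label
    tagged = []
    for position in range(len(words)):
        feats = feature_fn(words, previous_label, position)
        label = classify_fn(feats)
        tagged.append((words[position], label))
        previous_label = label
    return tagged


# Stand-ins for the real feature extractor and classifier, for demonstration only.
def toy_features(words, previous_label, position):
    feats = {"prev_label": previous_label}
    if words[position][0].isupper():
        feats["Titlecase"] = 1
    return feats


def toy_classify(feats):
    return "PERSON" if "Titlecase" in feats else "O"


result = predict_sentence("Peter Blackburn said", toy_features, toy_classify)
```

Returning (word, label) pairs keeps the output easy to render in a web page.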
Tips
• Start early! This project may take longer than the previous assignments if you are aiming for
the perfect score.
• Generalize your features. For example, if you're adding the above "case=Title" feature, think
about whether there is any pattern that is not captured by the feature. Would the "case=Title"
feature capture "O'Gorman"?
• When you add a new feature, think about whether it would have a positive or negative weight
for PERSON and O tags (these are the only tags for this assignment).
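To reason about the sign of a weight, it can help to see how a maximum entropy model turns weights
into probabilities: each label's score is the sum of the weights of its active features, and the
scores are exponentiated and normalized. The sketch below assumes this standard log-linear form with
one weight per (feature name, label) pair. The weights are invented for illustration, and the
starter code's internal representation may differ.

```python
import math


def label_probabilities(active_features, weights, labels=("PERSON", "O")):
    """Return P(label | features) for a log-linear (maximum entropy) model."""
    scores = {
        label: sum(weights.get((name, label), 0.0) for name in active_features)
        for label in labels
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Invented weights, for illustration only.
toy_weights = {
    ("case=Title", "PERSON"): 1.0,
    ("case=Title", "O"): -1.0,
}
probs = label_probabilities(["case=Title"], toy_weights)
```

With these toy weights the Title-case feature pushes probability toward PERSON; with no active
features both labels are equally likely.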