Department of Computer Science
Summative Coursework Set Front Page
Module Title: Text Mining and Natural Language Processing
Module Code: CS3TM
Type of Assignment (e.g., technical report, set exercise, in-class test): Technical report
Individual or Group Assignment: Individual
Weighting of the Assignment: 50%
Word count/page limit: 12 pages (excluding appendix)
Expected hrs spent for the assignment (set by lecturer): 8 hours
Items to be submitted: Individual report in PDF with commented code
Work to be submitted on-line via Blackboard Ultra by: 20/5/2025 noon
Work will be marked and returned by: 15 working days after the submission deadline
Artificial Intelligence Tools (select one of these): May not be used
Note
By submitting this work, you are certifying that you have read the assessment guidelines, which are displayed in the Assessment folder of the Blackboard course for this module, and that you understand and have conformed to the associated policies and practices, including those on:
• Submitting your own work, not that of other people or systems (including those using artificial intelligence), and the associated penalties for Academic Misconduct
• Submitting by the specified deadline, and the penalties associated with late submission (if allowed)
• The exceptional circumstances system
• For students with relevant needs, attaching a green sticker
1. Assessment classifications
First Class (>= 70%)
The coursework demonstrates:
· Exceptional understanding of the principles of natural language processing
· Solid knowledge of the techniques/algorithms used for text processing and excellent technical skills in implementing these algorithms
· Comprehensive analysis of results from the implemented algorithms
· Excellent presentation of the report
Upper Second (60-69%)
The coursework demonstrates:
· Good understanding of the principles of natural language processing
· Appropriate use of techniques/algorithms for text processing
· Good technical skills in implementing these algorithms, with good analysis of results
· Clear presentation of the report
Lower Second (50-59%)
The coursework demonstrates:
· Basic understanding of the principles of natural language processing
· Basic use of algorithms for text processing
· Moderate technical skills in implementation
· Clear presentation of the report
Third (40-49%)
The coursework demonstrates:
· Satisfactory understanding of the principles of natural language processing
· Satisfactory use of algorithms for text processing
· Satisfactory technical skills in implementation
Pass (35-39%)
The coursework demonstrates:
· Satisfactory understanding of the principles of natural language processing
· Satisfactory knowledge of how to implement text-processing algorithms
Fail (0-34%)
The coursework fails to demonstrate understanding of NLP techniques and skills in implementing these techniques.
2. Assignment description
Summary:
A technical report is required. Please refer to the report structure and marking scheme below.
The report should describe the relevant concepts of text mining and NLP techniques, and present the experimental results from two tasks. The experiments are:
Task 1: Apply NLP analysis methods at the linguistic levels of morphology, lexicon, syntax, and semantics to process text inputs and extract features.
Task 2: Train a logistic regression classifier (or another classifier) on two newsgroups and predict the group labels of your own two-class data set.
· Use a tf-idf weighted unigram bag-of-words model as the baseline model.
· Add more text extraction methods (optional)
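As an illustration only, the Task 2 baseline could be sketched as follows. This is a minimal sketch, not the skeleton code from Blackboard: it assumes scikit-learn is installed, and the toy documents, the two group names, and the helper name `build_baseline` are hypothetical stand-ins for the two newsgroups assigned to you and for your own two-class data set.

```python
# Illustrative sketch only: the toy documents below are invented stand-ins
# for the two newsgroups assigned by the skeleton code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: three short documents per group.
train_docs = [
    "rocket launch reached orbit",
    "nasa plans orbit mission",
    "shuttle rocket engines tested",
    "goalie stopped puck shot",
    "hockey team wins playoff game",
    "puck hit goalie mask",
]
train_labels = [
    "sci.space", "sci.space", "sci.space",
    "rec.sport.hockey", "rec.sport.hockey", "rec.sport.hockey",
]


def build_baseline():
    """Return a tf-idf weighted unigram bag-of-words + logistic regression pipeline."""
    return make_pipeline(
        TfidfVectorizer(ngram_range=(1, 1)),  # unigrams only, tf-idf weighted
        LogisticRegression(max_iter=1000),
    )


model = build_baseline()
model.fit(train_docs, train_labels)

# Hypothetical "own" two-class documents whose group labels are predicted.
own_docs = ["rocket orbit mission", "goalie puck game"]
predicted = model.predict(own_docs)

for doc, label in zip(own_docs, predicted):
    print(f"{label}: {doc}")
```

With the real data, the same pipeline would be fitted on the documents from your two assigned newsgroups and checked on held-out documents before being applied to your own data set.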
Skeleton code is provided in the Week 7 folder on Blackboard to assist your implementation. The original code (with detailed comments) should be attached at the end of the report as an appendix.
You will obtain your own version of the scikit-learn 20 newsgroups text dataset by entering your student number at the beginning of the skeleton code below. You need to modify the code accordingly to complete the two tasks above.
Your two newsgroups will be downloaded based on your student number.