ECE 456/556 - Pattern Recognition and Machine Learning
Final Project
Due on 6 May 2020
** No late submissions accepted **
You have been learning about pattern recognition all term. We started the semester with a discussion of
the overall structure of a pattern recognition system. Feature selection and feature spaces were
explored in homeworks and were recurring themes during the semester. For your programming projects
you used moment features for an optical character recognition application and implemented 3
classifiers to see how each classifier worked as well as how the options for each classifier affected the
classification performance.
For the final project you will expand on this base knowledge to do an alternate exploration. This should
give you a chance to collect and reflect on what you have learned in this course and show your
understanding of the underlying concepts. The final project will focus on a cybersecurity application.
You should have working code written for the Nearest Neighbor classifier, the Bayesian Classifier and
the Neural Network. For this project you will go back through these classifiers (reusing code, possibly
with corrections), and evaluate the performance of each on a new dataset. You need to choose the
features you will use and the parameters for your classifier(s), and work to achieve the best performance
on the test set among all teams in the course. You will write a report that summarizes these studies and
gives a convincing argument as to why you chose the classifier you did.
Data:
Dr Loo gave you a paper describing the dataset, and a brief overview of computer network security.
There are 10 classes: Normal and nine types of attacks, {Fuzzers, Analysis, Backdoors, DoS, Exploits,
Generic, Reconnaissance, Shellcode and Worms}.
The ~50 features are mostly described in the UNSW-NB15_features.csv file. Choose which features you
want to use, and justify your choice.
Use the UNSW_NB15_training-set.csv training data for training. Only after you have chosen your
classifier and its parameters (and not until then), use the UNSW_NB15_testing-set.csv test data for
testing. Both files are available at that link. During development, you should break up the training
data into a training set and an evaluation set, and do so in an N-fold cross validation structure
(DHS 9.6.2 and lecture notes). You choose N.
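As an illustration of the N-fold structure described above, here is a minimal sketch in plain Python. The function name k_fold_indices and its parameters are hypothetical and not part of the assignment. It only produces row indices; how you load the CSV and which N you pick are left to you.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, eval_idx) index lists for N-fold cross validation.

    Hypothetical helper: each sample lands in the evaluation set exactly once.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, eval_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, eval_idx


# Example: 10 rows of training data split into 5 folds.
splits = list(k_fold_indices(10, 5))
for train_idx, eval_idx in splits:
    print(len(train_idx), "training rows,", len(eval_idx), "evaluation rows")
```

For each fold, train on the rows in train_idx, score on the rows in eval_idx, and average the scores across folds. The test file stays untouched throughout.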
Assignment:
Design one classifier to produce two outputs, normal and attack. Design a second classifier to produce
10 outputs, normal and each of the 9 attacks.
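One way to derive the targets for both classifiers from a single category string is sketched below. The helper names are hypothetical, and the class names are copied from the list in this handout; the spelling used in the CSV files may differ, so check the actual values before relying on this mapping.

```python
# Class names as listed in this handout (assumption: the CSV may spell some differently).
CLASSES = [
    "Normal", "Fuzzers", "Analysis", "Backdoors", "DoS",
    "Exploits", "Generic", "Reconnaissance", "Shellcode", "Worms",
]
CLASS_TO_INDEX = {name: i for i, name in enumerate(CLASSES)}


def ten_class_target(category):
    """Return an index 0..9 for the 10-output classifier (hypothetical helper)."""
    return CLASS_TO_INDEX[category.strip()]


def binary_target(category):
    """Return 0 for normal traffic and 1 for any attack (hypothetical helper)."""
    return 0 if category.strip() == "Normal" else 1


for name in ["Normal", "DoS", "Worms"]:
    print(name, binary_target(name), ten_class_target(name))
```

An unknown category raises a KeyError in ten_class_target, which is a deliberate choice here so that spelling mismatches with the data surface early rather than being silently mislabeled.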
Submit a report describing the problem, your choice of features and classifier(s), your experimental
method, and your results. As you will be working in teams, at the end of the report (in an appendix)
describe the contribution each of you made to the project and the writing.
Grading:
The project will be graded on the depth of the exploration and the quality of the writing describing the results attained.
Bonus points will go to the team with the best recognition performance on the test dataset.
Additional data about this scenario is available at
https://cloudstor.aarnet.edu.au/plus/s/ds5zW91vdgjEj9i?path=%2F
if you want (need) bonus points for working beyond the base assignment.