ENGI 46415
Artificial Intelligence and Deep Learning
2024-2025
Introduction
Welcome to the coursework on Artificial Intelligence and Deep Learning. In this assignment, you will engage with a locally gathered dataset, handled in strict accordance with ethical guidelines. This dataset is anonymised and labeled to support classification tasks distinguishing between healthy individuals and those diagnosed with Multiple Sclerosis (MS). Please remember, the dataset is provided solely for the purposes of this coursework and must not be used elsewhere.
Multiple Sclerosis and Its Relationship with the Eyes
Multiple Sclerosis (MS) is a chronic autoimmune disorder that targets the central nervous system [1], often resulting in symptoms such as fatigue, muscle weakness, and coordination difficulties. The disease can also significantly impact visual function.
A common visual complication linked to MS is optic neuritis, an inflammation of the optic nerve that may lead to blurred vision, diminished color perception, and eye pain [2]. Early detection of optic neuritis can aid in the prompt diagnosis and management of MS.
Advancements in non-invasive imaging have transformed diagnostic practices. One such technology is Scanning Laser Ophthalmoscopy (SLO), which captures high-resolution retinal images using a low-intensity laser [3]. This method ensures both patient safety and comfort, offering a valuable tool for early identification of neurological conditions like MS.
Project Overview and Significance
This project is centred on enhancing the early detection of Multiple Sclerosis using retinal imaging data. By applying machine learning and deep learning approaches, the goal is to build models that can accurately differentiate between healthy subjects and those showing early MS-related signs.
Analysing retinal images with advanced algorithms offers a non-invasive pathway to identify ocular indicators of MS, supporting earlier diagnosis and intervention. Early detection is vital, as it enables timely treatment that may slow disease progression and improve patient quality of life.
Beyond its technical depth, the project highlights the transformative role of emerging technologies in healthcare. It provides a meaningful opportunity to apply computational methods to real-world medical challenges.
Throughout the coursework, you will engage with tasks such as supervised and unsupervised learning, image segmentation, deep learning, and data augmentation, all aimed at contributing to the broader effort of improving MS diagnostics.
The Dataset
The dataset comprises grayscale Scanning Laser Ophthalmoscope (SLO) images. For Tasks 1 and 2, these images have been preprocessed, and relevant numerical features describing blood vessel morphology are provided in the Excel file titled SLO_features.xlsx.
To ensure proper evaluation and prevent data leakage, the data must be split on a per-patient basis rather than per image. This ensures that no information from a single patient appears in both training and testing sets.
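The per-patient split described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn's GroupShuffleSplit and a patient identifier for every image; the synthetic arrays and the names patient_ids, X, and y are hypothetical stand-ins, not the actual layout of SLO_features.xlsx.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)

# Synthetic stand-in for the feature table: 20 hypothetical patients,
# two images each, 5 features per image, one label per patient.
n_patients, images_per_patient, n_features = 20, 2, 5
patient_ids = np.repeat(np.arange(n_patients), images_per_patient)
X = rng.normal(size=(patient_ids.size, n_features))
y = np.repeat(rng.integers(0, 2, size=n_patients), images_per_patient)

# Split on patients (groups), not on individual images.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))

# Check that no patient contributes images to both sets.
train_patients = set(patient_ids[train_idx])
test_patients = set(patient_ids[test_idx])
overlap = train_patients & test_patients
print(f"train images: {train_idx.size}, test images: {test_idx.size}, "
      f"shared patients: {len(overlap)}")
```

The same idea extends to cross-validation by swapping in a group-aware splitter such as GroupKFold.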
In Task 3, you will use the original SLO images, available in the archive named SLO_hc_MS.zip. During this task, account for laterality (whether the image is from the right or left eye) by handling image flipping appropriately as part of your preprocessing.
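One possible way to handle laterality is to mirror every image to a common orientation before training. The sketch below assumes that an eye label ("left" or "right") is available for each image; how that label is obtained from the archive is not specified here, and the function name to_common_laterality is hypothetical.

```python
import numpy as np

def to_common_laterality(image, eye, target="right"):
    """Mirror a 2-D grayscale image horizontally when its eye differs from `target`."""
    if eye not in ("left", "right") or target not in ("left", "right"):
        raise ValueError("eye and target must be 'left' or 'right'")
    return image if eye == target else np.fliplr(image)

# Tiny 2x3 array standing in for an SLO image.
img = np.array([[1, 2, 3],
                [4, 5, 6]])
flipped = to_common_laterality(img, eye="left")     # columns reversed
unchanged = to_common_laterality(img, eye="right")  # returned as-is
```

Whether to normalise laterality in this way, or instead to treat left and right eyes separately, is a design choice to justify in the report.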
You will develop both machine learning and deep learning models to address the following tasks:
1. Task 1: Supervised Machine Learning and Dimensionality Reduction
Objective: Design a supervised machine learning model that classifies normal and Multiple Sclerosis (MS) cases using the numerical features extracted from the scanning laser ophthalmoscope (SLO) images, and explore the impact of dimensionality reduction on classification performance.
Requirements:
1-1. Selecting Models: Implement three supervised machine learning models including support vector machines, k-nearest neighbours, and neural networks. You may receive partial marks if you implement only one or two models.
1-2. Dimensionality Reduction: Apply three dimensionality reduction techniques to the data, including Principal Component Analysis (PCA), t-Distributed Stochastic Neighbour Embedding (t-SNE) and autoencoders. Using only one or two dimensionality reduction techniques may result in partial marks. Visualisations should be integrated within this step to understand the impact of dimensionality reduction on data distribution.
1-3. Model Comparison: Experiment with the three selected models and compare their performances. Utilise relevant metrics such as accuracy, precision, recall, F1-score, and ROC-AUC. Provide insights into why certain models perform better than others for this specific task. You may receive partial marks if you compare fewer than three models.
1-4. Evaluation of Dimensionality Reduction: For the purpose of this evaluation, select the best-performing model and test it with each of the three dimensionality reduction techniques listed in 1-2. Compare the performance metrics (accuracy, precision, recall, F1-score, ROC-AUC) across the three dimensionality reduction methods. Analyse and discuss how dimensionality reduction affects the model's performance, highlighting its strengths and limitations.
By conducting this task, you'll gain a deeper understanding of how dimensionality reduction techniques impact the performance of your chosen model. This approach encourages a thorough exploration of the problem space while managing the workload effectively.
2. Task 2: Unsupervised Learning
Objective: In this task, your goal is to design an unsupervised learning model for clustering the data and compare its performance with the provided labels.
Requirements:
2-1. Clustering Algorithm: Choose an appropriate unsupervised clustering algorithm such as k-means.
2-2. Comparative Study: Perform a comparative study between the clustering results and the ground truth labels. Use relevant metrics for evaluating clustering performance.
2-3. Discussion: Discuss the findings, highlighting any insights gained from the unsupervised clustering. Identify any discrepancies or agreements between clustering and the labelled classes.
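The comparison between clusters and labels in Task 2 can be sketched as follows. This is a minimal example on synthetic data, assuming k-means with two clusters; the adjusted Rand index, normalised mutual information, and silhouette score are shown as examples of suitable metrics, not as a required set.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (adjusted_rand_score,
                             normalized_mutual_info_score,
                             silhouette_score)
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: two well-separated groups of 50 samples, 4 features each.
y_true = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 4)) + y_true[:, None] * 6.0

X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_scaled)

# External metrics compare clusters with labels and ignore cluster numbering.
ari = adjusted_rand_score(y_true, cluster_ids)
nmi = normalized_mutual_info_score(y_true, cluster_ids)
# Internal metric: uses only the data and the cluster assignment.
sil = silhouette_score(X_scaled, cluster_ids)

print(f"ARI={ari:.3f}  NMI={nmi:.3f}  silhouette={sil:.3f}")
```

Because cluster indices are arbitrary, plain accuracy is only meaningful after matching clusters to classes; permutation-invariant scores such as the two external metrics above avoid that step.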
3. Task 3: Convolutional Neural Network (CNN) for Disease Classification and Data Augmentation
Objective: Design a Convolutional Neural Network (CNN) architecture from scratch for classifying normal and Multiple Sclerosis (MS) cases from the scanning laser ophthalmoscope (SLO) images. Additionally, apply data augmentation techniques to assess their impact on the CNN's performance, and fine-tune a pre-trained classifier.
Requirements:
3-1. Network Design and Hyperparameter Optimization: Create a custom CNN architecture, defining the number of layers, types of layers (e.g., convolutional, pooling), activation functions, and other architectural choices. Discuss the optimisation of hyperparameters and network design using techniques such as Optuna, grid search, or any other method of your choice.
3-2. Data Augmentation: Apply four data augmentation techniques (rotation, flipping, scaling, and adding noise) to increase the diversity of the dataset. It's important to test all of them (with different ranges) and describe their suitability for this task. If any augmentation is deemed unsuitable, provide a clear explanation and exclude it from further consideration or limit its range.
3-3. Performance Analysis and Metrics: Show the performance of the CNNs with learning curves and analyse these curves in detail to understand how the model's performance evolves during training. Calculate performance metrics such as accuracy, precision, recall, F1-score, and ROC-AUC. Analyse these metrics in the context of disease classification and discuss the impact of data augmentation on these metrics. Make sure to compare the performance before and after augmentation to highlight its effects.
3-4. Fine-Tuning with Pre-trained Model: Select one pre-trained classifier (VGG16), and fine-tune it for the disease classification task. Discuss the depth of freezing in the pre-trained model and why you made this choice. Evaluate the performance of the fine-tuned model using the same performance metrics and learning curve analysis.
This comprehensive approach combines the analysis of learning curves and the assessment of performance metrics, including a clear comparison of performance before and after data augmentation. It encourages a thorough evaluation of the CNN's performance and the impact of data augmentation on disease classification.
Report
Use the report to explain the methods you have implemented and to discuss your results. In particular, you must address every requirement in Tasks 1 to 3 and include details of each model's design, the choices you made and your justification for them, and any supporting diagrams or quantitative evidence. Feel free to discuss any other aspect of your work that you consider interesting, within the report's space limitations.
References
[1] A. Thompson, S. Baranzini, J. Geurts, B. Hemmer, and O. Ciccarelli, "Multiple sclerosis," The Lancet, vol. 391, no. 10130, pp. 1622-1636, 2018.
[2] R. C. Kenney et al., "The role of optical coherence tomography criteria and machine learning in multiple sclerosis and optic neuritis diagnosis," Neurology, vol. 99, no. 11, pp. e1100-e1112, 2022.
[3] J. Fischer, T. Otto, F. Delori, L. Pace, and G. Staurenghi, "Scanning laser ophthalmoscopy (SLO)," High resolution imaging in microscopy and ophthalmology: new frontiers in biomedical optics, pp. 35-57, 2019.
[4] C. Chen, J. H. Chuah, R. Ali, and Y. Wang, "Retinal vessel segmentation using deep learning: a review," IEEE Access, vol. 9, pp. 111985-112004, 2021.
[5] G. Du, X. Cao, J. Liang, X. Chen, and Y. Zhan, "Medical image segmentation based on u-net: A review," Journal of Imaging Science and Technology, 2020.