EE514 Assignment 2023/2024
Release Date: 16-10-2023
Due Date: 01-12-2023
Total Marks: 25

This assignment is designed for students to demonstrate their ability to perform machine learning and data analysis and to gain hands-on experience. Two datasets are provided for this assignment. The first consists of sensor data of human activity and the second contains a set of images of the Mars surface. Details of each dataset are provided later in this assignment description; both datasets are well-known benchmarks for data analysis and machine learning techniques.

Assignment Tasks:
For each of the datasets you are required to attempt the following tasks:
1. Briefly describe the dataset and apply different pre-processing techniques.
2. Perform exploratory data analysis by generating relevant summary statistics (where relevant) and using visualisation techniques.
3. Define the objectives of the problem you wish to solve. You can select one or two challenges for each dataset; clearly explain your objectives for these challenges and their relevance to the datasets. You are recommended, but not restricted, to:
   a. For the ExtraSensory dataset, choose activity detection (except walking activity detection, for which the relevant code is already available on their webpage) or any of the open challenges listed on their webpage http://extrasensory.ucsd.edu/.
   b. For the Mars surface image dataset, select image classification, clustering or pattern recognition approaches. Dataset available at: https://www.kaggle.com/datasets/brsdincer/mars-surface-and-curiosity-image-set-nasa
4. Perform model selection and train on your datasets using any supervised or unsupervised machine learning models and/or neural networks, or a combination of these (an illustrative baseline sketch is included after the dataset descriptions below).
5. Evaluate the performance of your trained model using different error detection or accuracy measurement techniques.
6. Revisit your results and try to improve the evaluation results (if required) through model re-selection or hyperparameter fine-tuning.
7. Prepare a Jupyter Notebook or Google Colab notebook and submit your ipynb file with any additional data, required resources and installation/execution guidelines.
8. Prepare a comprehensive report on your approach (approx. 20-30 pages) and explain how you performed all of the above tasks, including your own interpretation of the data analysis and machine learning outcomes.

Dataset 1: ExtraSensory Dataset
Our first dataset uses sensor data to predict human activity and is based on the ExtraSensory dataset, created by Ph.D. students and staff at the Department of Electrical and Computer Engineering, University of California, San Diego. You can read more about the dataset on their website http://extrasensory.ucsd.edu/. A copy of the dataset will be made available on Loop.

Dataset 2: Mars Surface Image (Curiosity Rover) Labelled Dataset
Our second dataset is an image dataset containing 6691 images of the Mars surface collected by the Mars Science Laboratory (MSL, Curiosity). The dataset is labelled and spans 24 classes. The provided dataset contains low-resolution images of roughly 256 x 256 pixels each. A high-resolution version of the dataset is also available; however, for this assignment you should only use the low-resolution image dataset. You can find further details of the dataset at: https://data.nasa.gov/Space-Science/Mars-surface-image-Curiosity-rover-labeled-data-se/cjex-ucks. A copy of the dataset will be made available on Loop.
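As an illustration of tasks 4 and 5, the sketch below shows one possible baseline workflow for the Mars surface image dataset: loading and down-sampling the images, splitting them into training and test sets, fitting a simple classifier, and reporting per-class metrics. The directory layout (one sub-folder per class), the path name and the file extension are assumptions rather than properties of the provided dataset, so adapt the loading code to your own copy; this is only an illustrative starting point, not a required approach.

from pathlib import Path

import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Assumed (hypothetical) layout: mars_images/<class_name>/<image>.jpg
DATA_DIR = Path("mars_images")
IMG_SIZE = (64, 64)  # down-sample the ~256x256 images for a quick baseline

X, y = [], []
for class_dir in sorted(DATA_DIR.iterdir()):
    if not class_dir.is_dir():
        continue
    for img_path in class_dir.glob("*.jpg"):  # adjust the extension if needed
        img = Image.open(img_path).convert("L").resize(IMG_SIZE)
        X.append(np.asarray(img, dtype=np.float32).ravel() / 255.0)
        y.append(class_dir.name)

X, y = np.array(X), np.array(y)

# Hold out a test set, fit a simple linear classifier, and report per-class metrics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))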
Submission Guidelines
The final submission should be made on Loop before the due deadline. A final submission should contain:
1. A report (PDF or Word document) documenting all assumptions, design decisions, and findings. Include visualisations, plots, and tables. You should strive to make your work completely reproducible using only the report document: include details on everything you tested and all results. Document and justify all design decisions.
2. Two or more separate notebooks (at least one for each dataset) containing all of your relevant code. If you have tested multiple versions of the datasets (e.g. after cleansing) or trained different models for the same task, do not overwrite the previous code. You can either submit multiple versions of the notebook or separate code sections for each iteration/repetition of a task.
3. Any additional resources (e.g. datasets or installation, preparation or execution guidelines) bundled as archive file(s), e.g. zip. If your zip file is larger than 250 MB, you must split it into multiple zip files, each smaller than 250 MB, so that they can be uploaded to Loop.
Important: do NOT include code as images (e.g. screenshots of code) in your report. Include code snippets as text.

Compute platforms
Fitting deep neural networks is computationally expensive and is greatly accelerated by the use of GPU hardware. Most modern deep learning frameworks support acceleration via NVIDIA's CUDA toolkit. If you have a desktop or laptop with a high-end NVIDIA graphics card that has CUDA support and sufficient on-board memory, you should be able to train deep networks on your own machine. Unfortunately, the GPUs typically used in deep learning research are quite expensive. Fortunately, there are some excellent online platforms that offer free access to NVIDIA GPU hardware for small projects and demonstration purposes. Two such environments are Google Colab and Gradient. These platforms allow you to interactively edit and execute Jupyter notebooks on cloud GPUs and support all of the most popular deep learning frameworks (e.g. PyTorch and TensorFlow). A short sketch for checking GPU availability is included at the end of this document.

Plagiarism
Please read and strictly adhere to the DCU Academic Integrity and Plagiarism Policy. Note that reports are automatically checked against each other and against external web sources for plagiarism. Any suspected plagiarism will be treated seriously and may result in penalties and a zero grade (see Sec. 6.2 of the DCU Academic Integrity and Plagiarism Policy). You are not allowed to copy any code (in full or in part) from Jupyter notebooks provided by the dataset owners or third parties for their own analyses. In cases of suspected code plagiarism, you will be asked to attend an interview to demonstrate your programming skills and your ability to code independently.

Grading
The assignment is worth 25% of the overall mark for the module. Marks will be awarded based on the quality of the resulting report. In particular, I will be checking to see whether you are handling the data correctly, carrying out exploratory analysis to gain insights, correctly performing model selection, and, critically, documenting everything in a clear and concise way. The submitted code will also be checked to ensure that the work is your own.
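If you are unsure whether your environment (your own machine, Colab or Gradient) can see a CUDA GPU, a quick check along the following lines may help. The snippet uses standard PyTorch calls; whether a GPU is actually detected depends on your runtime configuration (in Colab, for example, you must first select a GPU runtime). This is a convenience sketch only and is not part of the assignment requirements.

import torch

# Pick a CUDA device if one is visible to the runtime, otherwise fall back to CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("Using GPU:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("No CUDA GPU detected; falling back to CPU.")

# Tensors and models can then be moved to the selected device, e.g.:
x = torch.randn(4, 3, 256, 256).to(device)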