EE514 Assignment 2023/2024
Release Date: 16-10-2023
Due Date: 01-12-2023
Total Marks: 25
This assignment is designed for students to demonstrate their ability to perform
machine learning and data analysis and to gain hands-on experience.
There are two datasets provided for this assignment. The first consists of
sensor data of human activity and the second contains a set of images of the
Mars surface. Details of each dataset are provided later in this assignment
description; both datasets are well-known benchmarks for data analysis and
machine learning techniques.
Assignment Tasks:
For each of the datasets you are required to attempt the following tasks:
1. Briefly describe the dataset and apply different pre-processing techniques.
2. Perform exploratory data analysis by generating relevant summary statistics
(where applicable) and appropriate visualisations.
3. Define the objectives of the problem you wish to solve. You can select one or two
challenges for each dataset; clearly explain your objectives for these challenges
and their relevance to the datasets. You are recommended, but not restricted, to
choose from the following:
a. For the ExtraSensory dataset you can choose activity detection (except
walking activity detection, for which the relevant code is already
available on their webpage) or any of the open challenges listed on
their webpage http://extrasensory.ucsd.edu/ .
b. For the Mars Surface Image dataset you can select image classification,
clustering or pattern recognition approaches. The dataset is available at:
https://www.kaggle.com/datasets/brsdincer/mars-surface-and-curiosityimage-set-nasa
4. Perform model selection and train models on your datasets using any one of, or a
combination of, supervised or unsupervised machine learning models and/or neural
networks (see the illustrative sketch after this list).
5. Evaluate the performance of your trained models using appropriate error or
accuracy measurement techniques.
6. Revisit your results and, if required, try to improve them through model
reselection or hyperparameter fine-tuning.
7. Prepare a Jupyter or Google Colab notebook and submit your .ipynb file with
any additional data, required resources and installation/execution guidelines.
8. Prepare a comprehensive report on your approach (approx. 20-30 pages) and
explain how you performed all of the above-listed tasks, including your own
interpretation of the data analysis and machine learning outcomes.
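
To illustrate how tasks 4-6 might fit together, the sketch below trains a
scikit-learn classifier on a train/test split, evaluates it on held-out data and
runs a small hyperparameter grid search. It is a minimal sketch only: the random
placeholder data, the random forest model and the parameter grid are illustrative
assumptions, not requirements, and should be replaced with the features, labels
and models you choose for your own datasets and objectives.

    # Illustrative workflow for tasks 4-6: train/test split, model training,
    # evaluation and a small hyperparameter grid search (scikit-learn).
    # The random placeholder data below should be replaced with the features
    # (X) and labels (y) prepared for your chosen dataset and challenge.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.metrics import accuracy_score, classification_report

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))    # placeholder feature matrix
    y = rng.integers(0, 2, size=500)  # placeholder binary labels

    # Hold out a test set for the final evaluation (task 5).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)

    # Hyperparameter search via cross-validation on the training set (task 6).
    param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10]}
    search = GridSearchCV(RandomForestClassifier(random_state=42),
                          param_grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)

    # Evaluate the best model on the held-out test set.
    y_pred = search.best_estimator_.predict(X_test)
    print("Best parameters:", search.best_params_)
    print("Test accuracy:", accuracy_score(y_test, y_pred))
    print(classification_report(y_test, y_pred))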
Dataset 1: Extrasensory Dataset
Our first dataset uses sensor data to predict human activity and is based on the
ExtraSensory dataset, created by PhD students and staff at the Department of
Electrical and Computer Engineering, University of California, San Diego. You can
read more about the dataset on their website http://extrasensory.ucsd.edu/. A copy
of the dataset will be made available on Loop.
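
As a starting point, the sketch below shows one way to load a single ExtraSensory
user file with pandas and separate the sensor features from the activity labels.
The file name, the "label:" column prefix and the "label:SITTING" target used
here are assumptions based on the public release and are only illustrative;
adjust them to match the copy provided on Loop.

    # Minimal sketch: load one ExtraSensory user file and separate sensor
    # features from activity labels. The file name, the "label:" prefix and
    # the "label:SITTING" target are assumptions based on the public release.
    import pandas as pd

    df = pd.read_csv("some_user_uuid.features_labels.csv.gz")  # hypothetical file name

    label_cols = [c for c in df.columns if c.startswith("label:")]
    feature_cols = [c for c in df.columns if c not in label_cols + ["timestamp"]]

    X = df[feature_cols]
    y = df["label:SITTING"].fillna(0)  # hypothetical target; NaN = label not reported

    print(df.shape)
    print(X.describe().iloc[:, :5])    # quick summary statistics for a few features
    print(y.value_counts())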
Dataset 2: Mars Surface Image (Curiosity Rover) Labelled Dataset
Our second dataset is an image dataset containing 6691 images of the Mars surface
collected by the Mars Science Laboratory (MSL, Curiosity) rover. The dataset is
labelled and spans 24 classes. The provided dataset has low-resolution images of
roughly 256 x 256 pixels each. A high-resolution version of the dataset is available;
however, for this assignment you should only use the low-resolution image dataset.
You can get further details of the dataset at their website:
https://data.nasa.gov/Space-Science/Mars-surface-image-Curiosity-rover-labeled-data-se/cjex-ucks. A copy of the dataset will be made available on Loop.
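
The sketch below shows one possible way to load the low-resolution images with
torchvision, assuming they are organised into one subfolder per class; the
"mars_images/" path is hypothetical, and if the copy on Loop instead provides the
labels as separate text files you will need to adapt the loading code accordingly.

    # Minimal sketch: load the low-resolution Mars surface images with
    # torchvision, assuming one subfolder per class under "mars_images/"
    # (a hypothetical path; adjust to the copy provided on Loop).
    import torch
    from torchvision import datasets, transforms

    transform = transforms.Compose([
        transforms.Resize((256, 256)),   # images are already roughly 256 x 256
        transforms.ToTensor(),
    ])

    dataset = datasets.ImageFolder("mars_images/", transform=transform)
    loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

    print(len(dataset), "images across", len(dataset.classes), "classes")
    images, labels = next(iter(loader))
    print(images.shape, labels[:8])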
Submission Guidelines
The final submission should be uploaded to Loop before the due date. A
final submission should contain:
1. A report (PDF or Word document) documenting all assumptions, design
decisions, and findings. Include visualisations, plots, and tables. You should
strive to make your work completely reproducible using only the report
document: include details on everything you tested and all results. Document
and justify all design decisions.
2. Two or more separate notebooks (at least one for each dataset) containing all
of your relevant code. If you have tested multiple versions of the
datasets (e.g. after cleansing) or trained different models for the same task,
do not overwrite the previous code: either submit multiple versions of
the notebook or separate code sections for each iteration/repetition of a
task.
3. Any additional resources (e.g. datasets or installation, preparation or
execution guidelines) bundled as archive file(s), e.g. zip. If your archive
is larger than 250 MB, split it into multiple zip files, each less than
250 MB, so that they can be uploaded to Loop.
Important: do NOT include code as images (e.g. screenshots of code) in your
report. Include code snippets as text.
Compute platforms
Fitting deep neural networks is computationally expensive and is greatly accelerated
by the use of GPU hardware. Most modern deep learning frameworks support
acceleration via NVIDIA’s CUDA toolkit. If you have a desktop or laptop with a
high-end NVIDIA graphics card that has CUDA support and sufficient on-board
memory, you should be able to train deep networks on your own machine.
Unfortunately, the GPUs typically used in deep learning research are quite
expensive. Fortunately, there are some excellent online platforms that offer access to
NVIDIA GPU hardware for free for small projects and demonstration purposes. Two
such environments are Google Colab and Gradient. These platforms allow you to
interactively edit and execute Jupyter notebooks on cloud GPUs and support all the
most popular deep learning frameworks (e.g. PyTorch and TensorFlow).
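
Whichever platform you use, it is worth confirming at the start of your notebook
that a GPU is actually visible to your chosen framework; a minimal PyTorch check
might look like the following.

    # Quick check that a CUDA GPU is visible to PyTorch (e.g. on Colab or Gradient).
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("Using device:", device)
    if device.type == "cuda":
        print("GPU:", torch.cuda.get_device_name(0))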
Plagiarism
Please read and strictly adhere to the DCU Academic Integrity and Plagiarism Policy.
Note that reports are automatically checked against each other and against external
web sources for plagiarism. Any suspected plagiarism will be treated seriously and
may result in penalties and a zero grade (see Sec 6.2 of the DCU Academic Integrity
and Plagiarism Policy). You are not allowed to copy any code (in full or in part) from
Jupyter notebooks provided by the dataset owners or by third parties for their own analyses.
In case of suspected code plagiarism, you will be asked to attend an interview to
demonstrate your programming skills and ability to code independently.
Grading
The assignment is worth 25% of the overall mark for the module. Marks will be
awarded based on the quality of the resulting report. In particular, I will be checking
to see if you are handling data correctly, carrying out exploratory analysis to gain
insights, correctly performing model selection, and critically, documenting everything
in a clear and concise way. The submitted code will also be checked to ensure that
the work is your own.