MATH 3220 Assignment #6 - Self-Organizing Map Exercise [Messy Data version]

This exercise will allow you to experiment with a multivariate clustering algorithm. The algorithm that we'll use is the Self-Organizing Map (SOM) network. The SOM is an artificial neural network algorithm that maps multivariate data onto a two-dimensional grid. The resulting map has the property that proximity of the map nodes correlates strongly with similarity of the vectors associated with those nodes.

For this assignment you will use the Diabetes.xls dataset as your training data. Use the dataset posted on the course website; there are other versions on the web.

You must design your own experiment to answer the following questions:

1. How effective is the SOM algorithm in clustering data?
2. Is this effectiveness sensitive to differences in scale among the data vector attributes?
3. How tolerant is the SOM algorithm to noise?

The Diabetes dataset is a complicating issue in this assignment. This dataset contains the values of eight attributes for individuals classified as either healthy or sick based on a medical study conducted in 1994. Several of the attributes have missing or incorrect data, so you will have to preprocess the data before training the SOM. You can consult the medical research literature to determine which attributes are relevant or irrelevant. This research can help guide your data preprocessing methodology.

Write a COMPLETE Analysis Report describing your analysis and recommendations. Your research report is due in two weeks.

References:

Self-Organizing Map: http://en.wikipedia.org/wiki/Self_organizing_map
Diabetes: http://en.wikipedia.org/wiki/Diabetes
Diabetes dataset: http://archive.ics.uci.edu/ml/datasets/Diabetes
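As one possible starting point for the experiment described in this assignment, the sketch below shows a minimal preprocessing step and a small SOM written with NumPy. It is an illustration under stated assumptions, not the required method: the function names (impute_and_standardize, train_som, best_matching_unit), the 6x6 grid, the linear decay schedules, and the median imputation of missing values (assumed here to be encoded as NaN) are all hypothetical choices, and the data is synthetic rather than taken from Diabetes.xls.

```python
import numpy as np

def impute_and_standardize(X):
    """Replace NaNs with column medians, then z-score each column.
    Assumption: missing values are encoded as NaN."""
    X = np.asarray(X, dtype=float).copy()
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - X.mean(axis=0)) / std

def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)

def train_som(X, grid=(6, 6), n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a rectangular SOM with a Gaussian neighborhood function.
    The linear decay of learning rate and radius is an illustrative choice."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = grid
    weights = rng.normal(size=(n_rows, n_cols, X.shape[1]))
    # Grid coordinates of every node, shape (n_rows, n_cols, 2).
    coords = np.stack(
        np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij"),
        axis=-1,
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate shrinks toward 0
        sigma = sigma0 * (1.0 - frac) + 0.5      # radius shrinks toward 0.5
        x = X[rng.integers(len(X))]              # pick one random training vector
        bmu = best_matching_unit(weights, x)
        d2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))     # neighborhood strength per node
        weights += lr * h[..., None] * (x - weights)
    return weights

# Synthetic stand-in data: two groups, three attributes on very different scales.
rng = np.random.default_rng(42)
scales = np.array([1.0, 100.0, 0.01])
group_a = rng.normal(0.0, 1.0, size=(50, 3)) * scales
group_b = (rng.normal(0.0, 1.0, size=(50, 3)) + 5.0) * scales
X = np.vstack([group_a, group_b])
X[::10, 1] = np.nan                              # simulate missing entries

Z = impute_and_standardize(X)
weights = train_som(Z)
bmu = best_matching_unit(weights, Z[0])
print("map shape:", weights.shape, "BMU of first sample:", bmu)
```

To probe the scale question, the same training call could be run on the imputed but unstandardized data and the resulting maps compared. To probe noise tolerance, Gaussian noise of increasing variance could be added to Z before training.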