T-SNE ASSIGNMENT Research Question: I intent to re-run the problem of deciding to invest in a Credit card business. The objective was to find out if a person subscribed to a credit card would repay the bill or not using SVMs. I will be using t-SNE algorithm to perform a dimensionality reduction and then use SVM as a classifier in this assignment. Methodology: t-SNE is a technique used to visualize complex, high-dimensional data by reducing it to two or three dimensions. This helps to reveal hidden patterns and relationships in the data that are difficult to see in higher dimensions. These are the key ideas in t-SNE algorithm: High-Dimensional Data: Imagine you have a dataset where each item has many attributes (e.g., customer data with age, income, spending habits, etc.). This is called high-dimensional data because each item is described by many features. Dimensionality Reduction: t-SNE reduces the number of dimensions while preserving the structure and relationships within the data. It transforms the highdimensional data into a low-dimensional map (usually 2D or 3D) that is easier to visualize and understand. Preserving Relationships: The key idea is to keep similar items close together and dissimilar items far apart in the low-dimensional space, much like they are in the high-dimensional space. This means if two customers have similar behaviours, they will be close on the t-SNE map. The below are the steps of t-SNE. Calculate Pairwise Similarities: t-SNE first measures how similar each pair of data points is in the high-dimensional space. Similarity is often measured using a probability distribution that reflects the likelihood that two points are neighbors. Map to Low Dimensions: It then creates a similar map in the low-dimensional space, trying to maintain these similarities. This involves placing points in the 2D or 3D space such that similar points remain close together. Optimize the Map: The algorithm iteratively adjusts the positions of the points in the low-dimensional space to best match the similarities from the highdimensional space. This is done by minimizing a mathematical function that measures the difference between the two sets of similarities. Output: The result is a 2D or 3D plot where points that were close in the original high-dimensional data remain close, and those that were far apart remain far apart. This visual representation makes it easier to see clusters and patterns. The dimensionality of the data was reduced to both 2 and 3 using t-SNE and then applied SVC with gamma = 0.001. K-Fold validation was used, and the accuracy measured was the averaged accuracy over all folds. Results: The previous result showed us that we shouldn't invest in the company as the accuracy is 78.14 % which is less than 85%. 77.88 % was the observed accuracy for the both 2D and 3D reduced data (after the application of t-SNE) which is less than 85%. Hence, we shouldn’t invest in the company. Answer: The results have changed after the application of t-SNE for the worse but the ultimate decision should remain unchanged, that is we shouldn’t invest in the company.