Uploaded by dasariujjwal2000

t-SNE & SVM for Credit Card Investment Analysis

advertisement
T-SNE ASSIGNMENT
Research Question:
I intent to re-run the problem of deciding to invest in a Credit card business. The
objective was to find out if a person subscribed to a credit card would repay the bill or
not using SVMs. I will be using t-SNE algorithm to perform a dimensionality reduction
and then use SVM as a classifier in this assignment.
Methodology:
t-SNE is a technique used to visualize complex, high-dimensional data by reducing it to
two or three dimensions. This helps to reveal hidden patterns and relationships in the
data that are difficult to see in higher dimensions.
These are the key ideas in t-SNE algorithm:



High-Dimensional Data: Imagine you have a dataset where each item has many
attributes (e.g., customer data with age, income, spending habits, etc.). This is
called high-dimensional data because each item is described by many features.
Dimensionality Reduction: t-SNE reduces the number of dimensions while
preserving the structure and relationships within the data. It transforms the highdimensional data into a low-dimensional map (usually 2D or 3D) that is easier to
visualize and understand.
Preserving Relationships: The key idea is to keep similar items close together
and dissimilar items far apart in the low-dimensional space, much like they are in
the high-dimensional space. This means if two customers have similar
behaviours, they will be close on the t-SNE map.
The below are the steps of t-SNE.



Calculate Pairwise Similarities: t-SNE first measures how similar each pair of
data points is in the high-dimensional space. Similarity is often measured using a
probability distribution that reflects the likelihood that two points are neighbors.
Map to Low Dimensions: It then creates a similar map in the low-dimensional
space, trying to maintain these similarities. This involves placing points in the 2D
or 3D space such that similar points remain close together.
Optimize the Map: The algorithm iteratively adjusts the positions of the points in
the low-dimensional space to best match the similarities from the highdimensional space. This is done by minimizing a mathematical function that
measures the difference between the two sets of similarities.

Output: The result is a 2D or 3D plot where points that were close in the original
high-dimensional data remain close, and those that were far apart remain far
apart. This visual representation makes it easier to see clusters and patterns.
The dimensionality of the data was reduced to both 2 and 3 using t-SNE and then
applied SVC with gamma = 0.001. K-Fold validation was used, and the accuracy
measured was the averaged accuracy over all folds.
Results:
The previous result showed us that we shouldn't invest in the company as the accuracy
is 78.14 % which is less than 85%.
77.88 % was the observed accuracy for the both 2D and 3D reduced data (after the
application of t-SNE) which is less than 85%. Hence, we shouldn’t invest in the
company.
Answer:
The results have changed after the application of t-SNE for the worse but the ultimate
decision should remain unchanged, that is we shouldn’t invest in the company.
Download