B. Ramamurthy

[Figure: the data science process. Labels: Data; EDA (intuition/understanding); big-data analytics; statistical algorithms (StatsAlgs); statistical inference; discoveries/intelligence; decisions/answers/results]


Three types of algorithms:
1. Data preparation algorithms, such as sorting, workflows, and pipelines to prepare data
2. Optimization algorithms, such as stochastic gradient descent and least squares…
3. Machine learning algorithms…
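The optimization algorithms listed above include stochastic gradient descent (SGD) and least squares. As a minimal sketch of how the two fit together, the code below fits a straight line by SGD on the squared error. The helper name sgd_least_squares, the toy data (points lying exactly on y = 2x + 1), the learning rate, and the epoch count are all hypothetical choices for illustration, not values from these slides.

```python
import random

def sgd_least_squares(xs, ys, lr=0.01, epochs=2000, seed=0):
    """Hypothetical helper: fit y ~ w*x + b by SGD on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the points in a new random order each epoch
        for i in order:
            err = (w * xs[i] + b) - ys[i]  # residual of a single point
            # Step against the gradient of 0.5 * err**2 with respect to w and b
            w -= lr * err * xs[i]
            b -= lr * err
    return w, b

# Hypothetical toy data lying exactly on y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = sgd_least_squares(xs, ys)
print(round(w, 3), round(b, 3))
```

Because the toy data has no noise, the estimates should settle very close to w = 2 and b = 1. On real data a fixed learning rate leaves the estimates fluctuating near the least-squares solution rather than landing on it exactly.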







Machine learning algorithms come from artificial intelligence.
No underlying generative process is assumed.
They are built to predict or classify something.
Three basic algorithms: linear regression, k-NN, and k-means.
We already looked at linear regression as a case study for R/RStudio.
We will start with k-means…






K-means is unsupervised: there is no prior knowledge of the "right answer".
The goal of the algorithm is to determine the definition of the right answer by finding clusters in the data.
Typical data: satisfaction survey data, incident report data.
Assume data {age, gender, income, state, household size}; your goal is to segment the users.
K-means is the simplest of the clustering
algorithms.
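K-means alternates two steps until no point changes cluster. A compact statement of the standard formulation (Lloyd's algorithm), assuming Euclidean distance, points x_1, …, x_n, and K centroids mu_1, …, mu_K (this notation is introduced here, not taken from the slides):

```latex
% Assignment step: each point x_i joins the cluster of its nearest centroid
c_i = \arg\min_{j \in \{1,\dots,K\}} \lVert x_i - \mu_j \rVert^2
% Update step: each centroid moves to the mean of the points assigned to it
\mu_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i, \qquad C_j = \{\, i : c_i = j \,\}
% Neither step can increase the within-cluster sum of squares
J = \sum_{j=1}^{K} \sum_{i \in C_j} \lVert x_i - \mu_j \rVert^2
```

Since J never increases and there are finitely many possible assignments, the iteration stops, though possibly at a local minimum that depends on the initial centroids.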
Let's understand k-means using an example.






Data attributes: {age, income range, education, skills, social, paid work}
Let's take just the age: {23, 25, 24, 23, 21, 31, 32, 30, 31, 30, 37, 35, 38, 37, 39, 42, 43, 45, 43, 45}
Cluster this data using k-means.
Let's assume K = 3, i.e., 3 groups.
Give me a guess of the centroids. Let's assume the initial centroids are {21, 30, 40}.
First let's calculate by hand, and then use RStudio.
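By hand: with initial centroids {21, 30, 40}, the first assignment gives {21, 23, 23, 24, 25}, {30, 30, 31, 31, 32, 35} and {37, 37, 38, 39, 42, 43, 43, 45, 45}. The value 35 is equidistant from 30 and 40; putting it in the third cluster instead gives centroids 30.8 and 40.4, and 35 moves back to the second cluster on the next pass, so the result is the same. The new centroids are 116/5 = 23.2, 189/6 = 31.5 and 369/9 = 41. Reassigning against these changes nothing, so the algorithm stops. The sketch below checks this in plain Python as a stand-in for the RStudio step. kmeans_1d is a hypothetical helper implementing Lloyd's algorithm on one-dimensional data; it breaks ties toward the lower-numbered centroid. R's own kmeans() defaults to a different variant (Hartigan-Wong), so its intermediate steps may differ.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Hypothetical helper: Lloyd's k-means on 1-D data.

    Returns (centroids, clusters). Ties go to the lower-numbered centroid.
    """
    centroids = list(centroids)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point
        new_assignment = [
            min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
            for x in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its points
        for j in range(len(centroids)):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = sum(members) / len(members)
    clusters = [[x for x, a in zip(points, assignment) if a == j]
                for j in range(len(centroids))]
    return centroids, clusters

ages = [23, 25, 24, 23, 21, 31, 32, 30, 31, 30,
        37, 35, 38, 37, 39, 42, 43, 45, 43, 45]
centroids, clusters = kmeans_1d(ages, [21, 30, 40])
print(centroids)  # [23.2, 31.5, 41.0]
print(clusters)
```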






Supervised ML
You know the "right answers", or at least have data that is "labeled": the training set.
A set of objects has already been classified or labeled (training set).
Another set of objects is yet to be labeled or classified (test set).
Your goal is to automate the process of labeling the test set.
The intuition behind k-NN is to find the most similar items (similarity defined by their attributes), look at their existing labels, and assign the object a label.
Age   Loan (x1000)   Default
25    40             N
35    60             N
45    80             N
20    20             N
35    120            N
52    18             Y
23    95             Y
40    62             Y
60    100            Y
48    220            Y
33    150            Y



K = 3: decide whether you can lend money to a person aged 48 requesting a loan amount of 142K.
K = 5: repeat the same.
We need a lot more data to apply k-NN in practice.
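A minimal sketch of the two loan exercises. The helper knn_classify and the list name loans are hypothetical. It assumes Euclidean distance on the raw (age, loan in thousands) values and a simple majority vote. Because the features are not scaled, loan amount dominates the distance; normalizing the columns first could change which neighbors are chosen. Under these assumptions the three nearest rows to (48, 142) are (33, 150, Y) at distance 17, (35, 120, N) at about 25.6 and (60, 100, Y) at about 43.7, so K = 3 votes Y. K = 5 adds (23, 95, Y) at about 53.2 and (45, 80, N) at about 62.1, giving 3 Y to 2 N, so it also votes Y. The applicant is classified as a likely default in both cases.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Hypothetical helper: majority label among the k nearest training rows."""
    # Sort training rows by Euclidean distance from the query point
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# (age, loan in thousands) -> default label, copied from the table above
loans = [
    ((25, 40), "N"), ((35, 60), "N"), ((45, 80), "N"), ((20, 20), "N"),
    ((35, 120), "N"), ((52, 18), "Y"), ((23, 95), "Y"), ((40, 62), "Y"),
    ((60, 100), "Y"), ((48, 220), "Y"), ((33, 150), "Y"),
]

for k in (3, 5):
    print(k, knn_classify(loans, (48, 142), k))
```

With two labels and an odd K the vote cannot tie. With an even K or more classes, a tie-breaking rule would be needed.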