IEOR165 Discussion
Week 7
Sheng Liu
University of California, Berkeley
Mar 3, 2016
Outline
1 Nadaraya-Watson Estimator
2 Partially Linear Model
3 Support Vector Machine
Nadaraya-Watson Estimator: Motivation
Given the data {(X1, Y1), . . . , (Xn, Yn)}, we want to find the relationship
between Yi (response) and Xi (predictor):

Yi = g(Xi) + εi

Essentially, we want to find g(x), which is

g(x) = E(Yi | Xi = x)

where we assume E(εi | Xi) = 0.
Parametric methods: assume the form of g(x), e.g. linear,
polynomial, exponential... Then we only need to estimate the
parameters that characterize g(x).
Nonparametric methods: make no assumption about the form of
g(x); the unknown is the whole function, so in effect the number of
parameters is infinite.
An Intuitive Nonparametric Method
K-nearest neighbor average:
ĝ(x) = (1/k) Σ_{Xi ∈ Nk(x)} Yi
     = Σ_{i=1}^n [I(Xi ∈ Nk(x)) / k] · Yi
     = Σ_{i=1}^n [I(Xi ∈ Nk(x)) / Σ_{j=1}^n I(Xj ∈ Nk(x))] · Yi
     = [Σ_{i=1}^n I(Xi ∈ Nk(x)) · Yi] / [Σ_{i=1}^n I(Xi ∈ Nk(x))]

where Nk(x) denotes the set of the k sample points Xi nearest to x (so the indicators sum to k).
You can compare this with our intuitive method for estimating a pdf.
Drawback: ĝ(x) is not continuous in x, and hence not differentiable.
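As a concrete illustration, here is a minimal Python sketch of this k-NN average; the function name and the sample data are made up for the example:

```python
import numpy as np

def knn_regress(x, X, Y, k):
    """k-nearest-neighbor average: mean of Yi over the k sample
    points Xi closest to the query point x."""
    idx = np.argsort(np.abs(X - x))[:k]  # indices of the k nearest Xi
    return Y[idx].mean()

# illustrative data: Y = sin(X) + noise
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0 * np.pi, 200)
Y = np.sin(X) + 0.3 * rng.normal(size=200)
print(knn_regress(np.pi / 2, X, Y, k=10))  # should be near sin(pi/2) = 1
```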
Nadaraya-Watson Estimator
To obtain smoothness, replace the indicator function with a kernel function:

ĝ(x) = [Σ_{i=1}^n K((Xi − x)/h) · Yi] / [Σ_{i=1}^n K((Xi − x)/h)]

This is the Nadaraya-Watson estimator.
Weighted average of the Yi with kernel weights
Relation to kernel density estimation:
f̂(x) = (1/(nh)) Σ_{i=1}^n K((Xi − x)/h)

f̂(x, y) = (1/(nh²)) Σ_{i=1}^n K((Xi − x)/h) K((Yi − y)/h)
And by the formula

g(x) = E(Y | X = x) = ∫ y f(y|x) dy = ∫ y [f(x, y)/f(x)] dy,

substituting f̂(x, y) and f̂(x) for the unknown densities recovers exactly the Nadaraya-Watson estimator above, since for a symmetric kernel ∫ y (1/h) K((Yi − y)/h) dy = Yi.
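A minimal NumPy sketch of the Nadaraya-Watson estimator, assuming a Gaussian kernel and an illustrative bandwidth h:

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def nadaraya_watson(x, X, Y, h):
    """Kernel-weighted average of Y at query point x with bandwidth h."""
    w = gaussian_kernel((X - x) / h)  # kernel weights K((Xi - x)/h)
    return np.sum(w * Y) / np.sum(w)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0 * np.pi, 200)
Y = np.sin(X) + 0.3 * rng.normal(size=200)
print(nadaraya_watson(np.pi / 2, X, Y, h=0.3))  # near sin(pi/2) = 1
```

Unlike the k-NN average, this estimate varies smoothly in x because the kernel weights are smooth functions of x.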
[Figure: Nadaraya-Watson estimator]
Partially Linear Model
A compromise between parametric and nonparametric models:

Yi = Xi′β + g(Zi) + εi
Predictor: (Xi , Zi )
A parametric part: linear model
A nonparametric part: g(Zi )
Estimation:
E(Yi | Zi) = E(Xi | Zi)β + g(Zi)
⇒ Yi − E(Yi | Zi) = (Xi − E(Xi | Zi))β + εi

For E(Yi | Zi) and E(Xi | Zi), use Nadaraya-Watson estimators; β can then be estimated by least squares on the residuals, as sketched below.
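A minimal Python sketch of this two-step approach (often associated with Robinson's estimator) for scalar Xi and Zi; the data, bandwidth, and helper name are illustrative:

```python
import numpy as np

def nw_fit(query, Z, V, h):
    """Nadaraya-Watson estimate of E(V | Z = z) at each z in `query`."""
    out = np.empty(len(query))
    for j, z in enumerate(query):
        w = np.exp(-0.5 * ((Z - z) / h) ** 2)  # Gaussian kernel weights
        out[j] = np.sum(w * V) / np.sum(w)
    return out

# illustrative data: Y = 2*X + sin(Z) + noise
rng = np.random.default_rng(0)
n = 500
Z = rng.uniform(0.0, 2.0 * np.pi, n)
X = rng.normal(size=n)
Y = 2.0 * X + np.sin(Z) + 0.2 * rng.normal(size=n)

h = 0.3                                  # illustrative bandwidth
Y_res = Y - nw_fit(Z, Z, Y, h)           # Yi - E^(Yi | Zi)
X_res = X - nw_fit(Z, Z, X, h)           # Xi - E^(Xi | Zi)
beta_hat = np.sum(X_res * Y_res) / np.sum(X_res ** 2)  # OLS on residuals
print(beta_hat)                          # should be close to 2
```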
Classification
When the response is qualitative (categorical), predicting the response
amounts to classification. For example:
Predictor: Income and Balance (X)
Response: Default or not (Y)
Classification
How to classify? Draw a hyperplane X′β + β0 = 0
Support Vector Machine
How do we find the optimal hyperplane X′β + β0 = 0 when the data are
perfectly separable?

Objective: maximize the margin, which amounts to minimizing ||β||
Constraints: separate the data correctly
Support Vector Machine
If Xi′β + β0 ≥ 1, then yi = 1.
If Xi′β + β0 ≤ −1, then yi = −1.
So the constraint is

yi(Xi′β + β0) ≥ 1,  ∀i = 1, . . . , n
The optimization problem is

min_{β,β0} ||β||
s.t. yi(Xi′β + β0) ≥ 1,  ∀i = 1, . . . , n

or, equivalently,

min_{β,β0} ||β||²
s.t. yi(Xi′β + β0) ≥ 1,  ∀i = 1, . . . , n
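As a sketch, the ||β||² form is a quadratic program and can be handed to a generic convex solver; here is an illustrative formulation using cvxpy (the toy data are made up, and cvxpy is just one possible solver choice):

```python
import cvxpy as cp
import numpy as np

# toy separable data: two point clouds with labels +1 / -1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=2.0, size=(20, 2)),
               rng.normal(loc=-2.0, size=(20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

beta = cp.Variable(2)
beta0 = cp.Variable()
# minimize ||beta||^2 subject to yi(Xi'beta + beta0) >= 1
constraints = [cp.multiply(y, X @ beta + beta0) >= 1]
prob = cp.Problem(cp.Minimize(cp.sum_squares(beta)), constraints)
prob.solve()
print(beta.value, beta0.value)
```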
Support Vector Machine
When the data are not perfectly separable:

Objective: minimize ||β|| together with the classification errors
Constraints: separate the data correctly, up to some allowed errors
Support Vector Machine
When not perfectly separable, the optimization problem is

min_{β,β0,ξ} ||β||² + λ Σ_{i=1}^n ξi
s.t. yi(Xi′β + β0) ≥ 1 − ξi,  ∀i = 1, . . . , n
     ξi ≥ 0,  ∀i = 1, . . . , n
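A sketch of this soft-margin problem in the same illustrative cvxpy formulation as before; the value of λ and the toy data are made up (larger λ penalizes slack more heavily, and many texts write the slack penalty as C with a ½||β||² objective instead):

```python
import cvxpy as cp
import numpy as np

# toy overlapping data with labels +1 / -1
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=1.0, size=(30, 2)),
               rng.normal(loc=-1.0, size=(30, 2))])
y = np.hstack([np.ones(30), -np.ones(30)])

lam = 1.0                            # illustrative choice of lambda
beta = cp.Variable(2)
beta0 = cp.Variable()
xi = cp.Variable(60, nonneg=True)    # slack variables, xi_i >= 0
constraints = [cp.multiply(y, X @ beta + beta0) >= 1 - xi]
prob = cp.Problem(cp.Minimize(cp.sum_squares(beta) + lam * cp.sum(xi)),
                  constraints)
prob.solve()
print(beta.value, beta0.value)
```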
How to find the dual function? Recall the Lagrangian dual and KKT
conditions.
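A sketch of the derivation, under this problem's scaling (||β||² rather than the more common ½||β||²): the Lagrangian is

L(β, β0, ξ, α, μ) = ||β||² + λ Σ_{i=1}^n ξi − Σ_{i=1}^n αi [yi(Xi′β + β0) − 1 + ξi] − Σ_{i=1}^n μi ξi,  with αi, μi ≥ 0.

Setting the gradients to zero gives β = (1/2) Σ_{i=1}^n αi yi Xi, Σ_{i=1}^n αi yi = 0, and αi + μi = λ (hence 0 ≤ αi ≤ λ). Substituting these back, the dual problem is

max_α Σ_{i=1}^n αi − (1/4) Σ_{i=1}^n Σ_{j=1}^n αi αj yi yj Xi′Xj
s.t. Σ_{i=1}^n αi yi = 0,  0 ≤ αi ≤ λ, ∀i = 1, . . . , n.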