IEOR165 Discussion Week 7
Sheng Liu
University of California, Berkeley
Mar 3, 2016

Outline
1 Nadaraya-Watson Estimator
2 Partially Linear Model
3 Support Vector Machine

Nadaraya-Watson Estimator: Motivation

Given the data \{(X_1, Y_1), \dots, (X_n, Y_n)\}, we want to find the relationship between Y_i (response) and X_i (predictor):

    Y_i = g(X_i) + \epsilon_i

Basically, we want to find g(x), which is

    g(x) = E(Y_i \mid X_i = x)

where we assume E(\epsilon_i \mid X_i) = 0.

- Parametric methods: assume the form of g(x), e.g. linear, polynomial, exponential. Then we only need to estimate the parameters that characterize g(x).
- Nonparametric methods: make no assumption about the form of g(x), which implies the number of parameters is infinite.

An Intuitive Nonparametric Method

k-nearest-neighbor average, where N_k(x) denotes the set of the k observations X_i closest to x:

    \hat{g}(x) = \frac{1}{k} \sum_{X_i \in N_k(x)} Y_i
               = \sum_{i=1}^n \frac{I(X_i \in N_k(x))}{k} \cdot Y_i
               = \sum_{i=1}^n \frac{I(X_i \in N_k(x))}{\sum_{j=1}^n I(X_j \in N_k(x))} \cdot Y_i
               = \frac{\sum_{i=1}^n I(X_i \in N_k(x)) \cdot Y_i}{\sum_{i=1}^n I(X_i \in N_k(x))}

(the third equality uses \sum_{i=1}^n I(X_i \in N_k(x)) = k).

- You can compare this with our intuitive method for estimating a pdf.
- The resulting estimate is neither continuous nor differentiable.

Nadaraya-Watson Estimator

To achieve smoothness, replace the indicator function with a kernel function:

    \hat{g}(x) = \frac{\sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) \cdot Y_i}{\sum_{i=1}^n K\left(\frac{X_i - x}{h}\right)}

Then we get the Nadaraya-Watson estimator.
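The estimator is simple to compute directly. Below is a minimal sketch, assuming a Gaussian kernel and a hand-picked bandwidth h; the function name `nadaraya_watson` and the simulated sine data are illustrative, not from the slides.

```python
import numpy as np

def nadaraya_watson(x, X, Y, h):
    """Nadaraya-Watson estimate of g(x) with a Gaussian kernel.

    x : evaluation point(s); X, Y : observed data; h : bandwidth.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Kernel weights K((X_i - x)/h) for every (evaluation point, data point) pair
    u = (X[None, :] - x[:, None]) / h
    w = np.exp(-0.5 * u ** 2)        # Gaussian kernel; constants cancel in the ratio
    return (w @ Y) / w.sum(axis=1)   # weighted average of the Y_i

# Illustrative data: noisy observations of g(x) = sin(x)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0 * np.pi, 200)
Y = np.sin(X) + 0.1 * rng.standard_normal(200)
print(nadaraya_watson(np.pi / 2, X, Y, h=0.3))  # should be near sin(pi/2) = 1
```

Note that the estimate at each x is just a kernel-weighted average of the responses, so the bandwidth h plays the same bias-variance role as in kernel density estimation.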
- Weighted average: the estimate is a weighted average of the Y_i, with kernel weights.
- Relation to the kernel density estimates

    \hat{f}(x) = \frac{1}{nh} \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right)

    \hat{f}(x, y) = \frac{1}{nh^2} \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) K\left(\frac{Y_i - y}{h}\right)

  and the formula

    g(x) = E(Y \mid X = x) = \int y f(y \mid x) \, dy = \int y \frac{f(x, y)}{f(x)} \, dy

  Substituting \hat{f}(x, y) and \hat{f}(x) into the last integral recovers the Nadaraya-Watson estimator.

[Figure slide: Nadaraya-Watson estimator]

Partially Linear Model

A compromise between a parametric model and a nonparametric model:

    Y_i = X_i \beta + g(Z_i) + \epsilon_i

- Predictor: (X_i, Z_i)
- A parametric part: the linear term X_i \beta
- A nonparametric part: g(Z_i)

Estimation: taking conditional expectations given Z_i,

    E(Y_i \mid Z_i) = E(X_i \mid Z_i) \beta + g(Z_i)
    \Rightarrow \; Y_i - E(Y_i \mid Z_i) = (X_i - E(X_i \mid Z_i)) \beta + \epsilon_i

For E(Y_i \mid Z_i) and E(X_i \mid Z_i), use Nadaraya-Watson estimators; \beta can then be estimated by least squares on the centered variables.

Classification

When our response is qualitative, or categorical, predicting the response is a classification problem.

- Predictor: Income and Balance (X)
- Response: Default or not (Y)

How to classify? Draw a hyperplane

    X' \beta + \beta_0 = 0

Support Vector Machine

How to find the optimal hyperplane X' \beta + \beta_0 = 0 when the data are perfectly separable?

- Objective: maximize the margin. The margin width is 2 / \|\beta\|, so maximizing the margin is equivalent to minimizing \|\beta\|.
- Constraints: separate the data correctly.

If X_i' \beta + \beta_0 \ge 1, then y_i = 1; if X_i' \beta + \beta_0 \le -1, then y_i = -1. So the constraint is

    y_i (X_i' \beta + \beta_0) \ge 1 \quad \forall i = 1, \dots, n

The optimization problem is

    \min_{\beta, \beta_0} \|\beta\| \quad \text{s.t.} \quad y_i (X_i' \beta + \beta_0) \ge 1 \quad \forall i = 1, \dots, n

or equivalently

    \min_{\beta, \beta_0} \|\beta\|^2 \quad \text{s.t.} \quad y_i (X_i' \beta + \beta_0) \ge 1 \quad \forall i = 1, \dots, n
Support Vector Machine: the Non-Separable Case

When the data are not perfectly separable:

- Objective: minimize \|\beta\| together with the errors.
- Constraints: separate the data correctly, allowing errors through slack variables \xi_i.

The optimization problem becomes

    \min_{\beta, \beta_0, \xi} \|\beta\|^2 + \lambda \sum_{i=1}^n \xi_i
    \quad \text{s.t.} \quad y_i (X_i' \beta + \beta_0) \ge 1 - \xi_i \quad \forall i = 1, \dots, n
    \qquad\qquad\;\; \xi_i \ge 0 \quad \forall i = 1, \dots, n

How to find the dual function? Recall the Lagrangian dual and KKT conditions.
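Besides the Lagrangian-dual route, the soft-margin problem can be attacked directly: at the optimum, \xi_i = \max(0, 1 - y_i(X_i' \beta + \beta_0)), so eliminating the slacks gives the unconstrained hinge-loss objective \|\beta\|^2 + \lambda \sum_i \max(0, 1 - y_i(X_i' \beta + \beta_0)). The sketch below minimizes that reformulation by subgradient descent; the function name, simulated data, and step sizes are illustrative assumptions, not from the slides.

```python
import numpy as np

def soft_margin_svm(X, y, lam=1.0, lr=0.01, epochs=500):
    """Soft-margin SVM via subgradient descent on the hinge-loss form.

    Eliminating xi_i = max(0, 1 - y_i (X_i' beta + beta0)) turns the
    constrained problem into: min ||beta||^2 + lam * sum_i hinge_i.
    """
    n, p = X.shape
    beta = np.zeros(p)
    beta0 = 0.0
    for _ in range(epochs):
        margins = y * (X @ beta + beta0)
        active = margins < 1                      # points with xi_i > 0
        # Subgradient of ||beta||^2 + lam * sum of active hinge terms
        g_beta = 2 * beta - lam * (y[active] @ X[active])
        g_beta0 = -lam * y[active].sum()
        beta -= lr * g_beta
        beta0 -= lr * g_beta0
    return beta, beta0

# Two well-separated clusters in the plane (hypothetical data)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
beta, beta0 = soft_margin_svm(X, y, lam=1.0)
pred = np.sign(X @ beta + beta0)
print((pred == y).mean())  # training accuracy
```

This is a numerical shortcut, not a substitute for the dual: the dual is what exposes the support vectors and enables kernelization.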