Regression versus Classification: An Overview
DNSC 6311
Refik Soyer
The George Washington University, Department of Decision Sciences

◮ If we want to predict a random variable Y without using any other information, the best we can do is to model Y by a probability distribution and obtain a point prediction using E[Y], the median, or the mode of Y's distribution. We can also obtain a q% prediction interval (yL, yH) such that P(yL < Y < yH) = q.

◮ In regression and classification models, by contrast, we predict a random variable Y in the presence of knowledge (or assumed knowledge) of other variable(s) X.

◮ In doing so, our interest is in the conditional distribution of Y given X, and we typically report the conditional mean E(Y | X = x) = µ_{Y|x} as our prediction for Y. Thus, most regression and classification models focus on modeling µ_{Y|x}.

◮ Regression versus classification: if the target variable Y is continuous, the models for µ_{Y|x} are known as regression models. If the target variable Y is categorical, the models for µ_{Y|x} are known as classification models.

◮ We have so far discussed only linear regression models, but there are also nonlinear regression models, as well as nonparametric regression models in which the distribution of Y given X is not specified as a normal (or any other) distribution.

◮ There are classification models other than logistic regression; for example, in the probit model we use the CDF of the standard normal distribution for f in p_i = f(β0 + β1 x_i), that is, p_i = Φ(β0 + β1 x_i).

◮ Other classification models: note that regression models and classifiers such as logit and probit do not model uncertainty about X; they treat it as a fixed (or known) quantity.
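The logit and probit models above differ only in the link function f applied to β0 + β1 x_i. A minimal sketch of the two links, using illustrative coefficient values (β0, β1, and x below are hypothetical, not from the slides):

```python
import math

def logit_p(beta0, beta1, x):
    # Logistic link: p_i = 1 / (1 + exp(-(beta0 + beta1 * x_i)))
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def probit_p(beta0, beta1, x):
    # Probit link: p_i = Phi(beta0 + beta1 * x_i),
    # where Phi is the standard normal CDF, computed here via erf.
    z = beta0 + beta1 * x
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Both links map the linear predictor into (0, 1); at β0 + β1 x_i = 0 each gives p_i = 0.5, but the probit link has slightly lighter tails than the logit.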
◮ Classifiers such as Linear Discriminant Analysis (LDA) and Naive Bayes assume a model for the distribution of X given Y, say f_X(x | Y = y), and a prior distribution for Y, P(Y = y), and obtain the posterior distribution P(Y = y | x) using Bayes' rule:

  P(Y = y | x) = f_X(x | Y = y) P(Y = y) / Σ_{y'} f_X(x | Y = y') P(Y = y')

◮ In LDA, f_X(x | Y = y) is assumed to be (multivariate) normal. In Naive Bayes, f_X(x | Y = y) can take any form, but the components of X are assumed to be conditionally independent given Y.

◮ Graphical model representation.
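The Bayes-rule computation above can be sketched in a few lines. This is a hypothetical illustration assuming a univariate normal class-conditional f_X(x | Y = y), as in one-dimensional LDA; the function names and example parameters are my own, not from the slides:

```python
import math

def normal_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) at x, playing the role of f_X(x | Y = y).
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def posterior(x, priors, params):
    # priors: {y: P(Y = y)}; params: {y: (mu, sigma)} for f_X(x | Y = y).
    # Numerator of Bayes' rule for each class y:
    joint = {y: normal_pdf(x, *params[y]) * priors[y] for y in priors}
    # Denominator: sum of the numerators over all classes y'.
    total = sum(joint.values())
    return {y: joint[y] / total for y in joint}
```

For example, with equal priors and class means at -1 and +1 (shared variance, as LDA assumes), `posterior(0.0, {0: 0.5, 1: 0.5}, {0: (-1.0, 1.0), 1: (1.0, 1.0)})` is 0.5 for each class by symmetry.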