Regression versus Classification: An Overview
DNSC 6311
Refik Soyer
The George Washington University
Department of Decision Sciences
1/4
◮
If we want to predict a random variable Y without using any
other information, then the best we can do is to model Y by a
probability distribution and obtain a point prediction using
E[Y], the median, or the mode of Y's distribution.
We can also obtain a q% prediction interval (yL, yH) such that
P(yL < Y < yH) = q.
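As a minimal sketch of this idea, suppose (hypothetically) that Y follows a normal distribution with known mean and standard deviation; the mean, median, and mode then coincide, and a 95% prediction interval follows from the normal quantiles:

```python
import math

# Hypothetical example: assume Y ~ Normal(mu, sigma). For a normal
# distribution the mean, median, and mode all coincide at mu.
mu, sigma = 50.0, 10.0

point_prediction = mu  # E[Y] (= median = mode here)

# 95% prediction interval (yL, yH) with P(yL < Y < yH) = 0.95:
# for a normal distribution, yL and yH are mu -/+ z * sigma,
# where z is the 97.5% quantile of the standard normal.
z = 1.959963984540054
yL, yH = mu - z * sigma, mu + z * sigma
```

Any other distributional assumption for Y would change only the quantile used, not the logic.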
◮
Instead, in regression and classification models, we predict a
random variable Y in the presence of knowledge (or assumed
knowledge) of other variable(s) X .
◮
In so doing, our interest is to use the conditional distribution
of Y given X and typically report the conditional mean
E(Y | X = x) = µY|x as our prediction for Y.
Thus, most regression and classification models focus
on modeling µY|x.
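The idea above can be sketched with ordinary least squares on hypothetical data: the fitted line b0 + b1·x estimates the conditional mean µY|x.

```python
# Minimal sketch (hypothetical data): estimate the conditional mean
# E(Y | X = x) = mu_{Y|x} with a least-squares line b0 + b1 * x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Slope and intercept from the usual least-squares formulas.
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

def mu_hat(x_new):
    """Predicted conditional mean of Y at X = x_new."""
    return b0 + b1 * x_new
```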
2/4
◮
Regression versus classification
If our target variable Y is continuous then the models for
µY |x are known as regression models.
If our target variable Y is categorical then the models for µY |x
are known as classification models.
◮
We have so far discussed only linear regression models, but there are
nonlinear regression models as well as nonparametric
regression models, where the distribution of Y given X is not
specified as a normal (or any other) distribution.
◮
There are classification models other than logistic
regression; for example, in the probit model we use the CDF
of the standard normal distribution for f in pi = f(β0 + β1 xi),
that is,
pi = Φ(β0 + β1 xi).
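A minimal sketch of the probit link, using math.erf to evaluate the standard normal CDF Φ (the coefficient values below are hypothetical, for illustration only):

```python
import math

def probit(b0, b1, x):
    """p_i = Phi(b0 + b1 * x_i), where Phi is the standard normal CDF,
    computed via the identity Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    z = b0 + b1 * x
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical coefficients: at x = 2 the linear predictor is 0,
# so the predicted probability is Phi(0) = 0.5.
p = probit(-1.0, 0.5, 2.0)
```

Replacing Φ with the logistic function 1 / (1 + exp(-z)) recovers the logit model; both links map the linear predictor into (0, 1).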
3/4
◮
Other classification models.
Note that regression models and classifiers like logit and
probit do not model uncertainty about X; they treat it as a
fixed (or known) quantity.
◮
Classifiers such as Linear Discriminant Analysis (LDA) and
Naive Bayes assume a model for the distribution of X given Y,
say fX(x | Y = y), and for Y as P(Y = y), and obtain the
posterior distribution P(Y = y | x) using Bayes rule:
P(Y = y | x) = fX(x | Y = y) P(Y = y) / Σy′ fX(x | Y = y′) P(Y = y′)
◮
In LDA the distribution fX(x | Y = y) is assumed to be
(multivariate) normal.
In Naive Bayes fX(x | Y = y) can be of any form, but the
components of X are assumed conditionally independent given Y.
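The Bayes-rule computation can be sketched for a one-dimensional, two-class problem with normal class conditionals and a shared variance (the one-dimensional analogue of LDA; all numbers below are hypothetical):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical two-class problem: X | Y = y ~ Normal(mu_y, sigma)
# with a shared sigma (the one-dimensional LDA assumption).
priors = {0: 0.6, 1: 0.4}   # P(Y = y)
means = {0: -1.0, 1: 2.0}   # class-conditional means mu_y
sigma = 1.0

def posterior(x):
    """P(Y = y | x) = f(x | y) P(y) / sum_y' f(x | y') P(y')."""
    joint = {y: normal_pdf(x, means[y], sigma) * priors[y] for y in priors}
    total = sum(joint.values())
    return {y: joint[y] / total for y in joint}

# x = 0.5 is equidistant from the two class means, so the likelihoods
# cancel and the posterior reduces to the prior probabilities.
post = posterior(0.5)
```

Classifying to the class with the largest posterior gives the Bayes classifier under this model.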
◮
Graphical model representation.
4/4