 A Fact About "Binomial Deviance" and Classification
This concerns the figures on page 426 of HTF and page 358 of JWHT, and the "binomial deviance" entry
of HTF Table 12.1 on their page 427.
Consider logistic regression, where y takes values 1 or −1 with probabilities conditional on an
input x of the form
p 1 
exp   0  β ' x 
1
and p1 
1  exp   0  β ' x 
1  exp   0  β ' x 
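As a quick numerical illustration of this parameterization (a minimal sketch in Python, not part of the original note; the particular values of β₀, β, and x below are arbitrary assumptions), one can check that p₁ and p₋₁ sum to 1 and that, under this coding, large positive values of β₀ + β'x favor y = −1:

import numpy as np

# Arbitrary illustrative values (assumptions for this sketch only)
beta0 = 0.5
beta = np.array([1.0, -2.0])
x = np.array([0.3, 0.7])

f = beta0 + beta @ x                        # the linear function beta_0 + beta'x
p1 = np.exp(-f) / (1.0 + np.exp(-f))        # P[y = 1 | x] under the parameterization above
pm1 = 1.0 / (1.0 + np.exp(-f))              # P[y = -1 | x]

print(p1, pm1, p1 + pm1)                    # the two probabilities sum to 1
# As beta_0 + beta'x increases, p1 decreases and pm1 increases under this coding.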
Then (for what it is worth) the contribution to the likelihood for β₀ and β provided by an
observed y and corresponding covariate x is a term (in a product of such things)
$$\left(\frac{\exp\left(-\left(\beta_0+\beta'x\right)\right)}{1+\exp\left(-\left(\beta_0+\beta'x\right)\right)}\right)^{I\left[y=1\right]}\left(\frac{1}{1+\exp\left(-\left(\beta_0+\beta'x\right)\right)}\right)^{I\left[y=-1\right]}$$
from which it follows that the (additive) contribution to the negative log-likelihood for β₀ and β
provided by this observed y and corresponding covariate x is a term
$$\left[\ln\left(1+\exp\left(-\left(\beta_0+\beta'x\right)\right)\right)+\left(\beta_0+\beta'x\right)\right]I\left[y=1\right]+\ln\left(1+\exp\left(-\left(\beta_0+\beta'x\right)\right)\right)I\left[y=-1\right]\qquad(*)$$
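A direct numerical check (again only a sketch, reusing the arbitrary illustrative values from above) that (*) is exactly the negative log of the likelihood contribution for either possible value of y:

import numpy as np

beta0, beta, x = 0.5, np.array([1.0, -2.0]), np.array([0.3, 0.7])   # arbitrary values
f = beta0 + beta @ x

p1 = np.exp(-f) / (1.0 + np.exp(-f))
pm1 = 1.0 / (1.0 + np.exp(-f))

for y in (1, -1):
    contribution = p1 ** (y == 1) * pm1 ** (y == -1)                 # likelihood term
    star = (np.log(1.0 + np.exp(-f)) + f) * (y == 1) + np.log(1.0 + np.exp(-f)) * (y == -1)
    print(y, -np.log(contribution), star)                            # the last two columns agree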
As it turns out
$$\frac{\exp\left(-u\right)+1}{1+\exp\left(u\right)}=\frac{\exp\left(-u\right)\left(1+\exp\left(u\right)\right)}{1+\exp\left(u\right)}=\frac{1}{\exp\left(u\right)}$$
so that
$$\ln\left(1+\exp\left(-u\right)\right)=\ln\left(1+\exp\left(u\right)\right)-u\qquad\text{or, equivalently,}\qquad\ln\left(1+\exp\left(-u\right)\right)+u=\ln\left(1+\exp\left(u\right)\right)$$
Then the additive contribution to the negative log-likelihood is
$$\left[\ln\left(1+\exp\left(-y\left(\beta_0+\beta'x\right)\right)\right)+y\left(\beta_0+\beta'x\right)\right]I\left[y=1\right]+\ln\left(1+\exp\left(y\left(\beta_0+\beta'x\right)\right)\right)I\left[y=-1\right]$$
that is (applying the identity above with u = y(β₀ + β'x) to the first term)
$$\ln\left(1+\exp\left(y\left(\beta_0+\beta'x\right)\right)\right)I\left[y=1\right]+\ln\left(1+\exp\left(y\left(\beta_0+\beta'x\right)\right)\right)I\left[y=-1\right]$$
and ultimately (since I[y = 1] + I[y = −1] = 1)
$$\ln\left(1+\exp\left(y\left(\beta_0+\beta'x\right)\right)\right)\qquad(**)$$
(almost) as in Table 12.1 of HTF (their page 427).¹
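One can also verify numerically that (*) and (**) agree for both y = 1 and y = −1 (same arbitrary illustrative values as in the earlier sketches):

import numpy as np

beta0, beta, x = 0.5, np.array([1.0, -2.0]), np.array([0.3, 0.7])   # arbitrary values
f = beta0 + beta @ x

for y in (1, -1):
    star = (np.log(1.0 + np.exp(-f)) + f) * (y == 1) + np.log(1.0 + np.exp(-f)) * (y == -1)
    double_star = np.log(1.0 + np.exp(y * f))
    print(y, star, double_star)                                      # identical within each row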
The point of the figures in JWHT and HTF is that maximum likelihood fitting (minimum
negative log-likelihood fitting) of parameters in logistic regression (penalized or un-penalized)
involves a sum of such things (the analog of sums of squared residuals for normal error
regression) AND that since the hinge loss
$$\left[1-y\left(\beta_0+\beta'x\right)\right]_+$$
used as a summand in optimality arguments for SV classifiers is fairly similar to (**), one might
expect to get SV classifier decision boundaries derived from voting functions β₀ + β'x that are
not terribly different from ones derived from logistic regression with appropriate cut-offs at a
target value for (fitted) p₁.
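To make the "fairly similar" claim concrete, the following sketch (not part of the note; the grid of margin values is arbitrary) tabulates the hinge summand and the binomial deviance summand as functions of the margin m = y(β₀ + β'x). It is HTF's sign-flipped form ln(1 + exp(−m)) that tracks the hinge [1 − m]₊; the form (**) derived here is its mirror image in m, which is the subject of footnote 1.

import numpy as np

m = np.linspace(-2.0, 2.0, 9)               # margin values y*(beta_0 + beta'x)
hinge = np.maximum(0.0, 1.0 - m)            # hinge loss [1 - m]_+
deviance_htf = np.log(1.0 + np.exp(-m))     # binomial deviance as listed in the HTF table
deviance_here = np.log(1.0 + np.exp(m))     # the form (**) derived above

for row in zip(m, hinge, deviance_htf, deviance_here):
    print("m = %5.2f   hinge = %5.3f   HTF deviance = %5.3f   (**) = %5.3f" % row)
# The hinge and the HTF deviance are both decreasing in m and roughly track one another;
# (**) is their mirror image in m (the "sign change" discussed in footnote 1).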
¹ The HTF table actually lists ln(1 + exp(−y(β₀ + β'x))), so that the plots of binomial deviance
and hinge loss as functions of y(β₀ + β'x) on page 426 are comparable. This sign change is
immaterial, meaning simply that (roughly speaking) one expects sets of SV classifier voting
function coefficients with signs opposite those from logistic regression based on the
coding/parameterization we've used here. As parameterized here, for logistic regression y = −1
is associated with large (positive) values of β₀ + β'x while a SV classifier intends to associate
y = −1 with small (large negative) values of β₀ + β'x.
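The opposite-sign point can be illustrated with a small simulation (purely a sketch under assumed data; nothing below is from the note or from HTF/JWHT): generate y from the logistic model as parameterized above, then minimize the sum of (**) terms and the sum of hinge terms over (β₀, β) and compare the signs of the two fitted coefficient vectors.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 2))
true_b0, true_b = 1.0, np.array([2.0, -3.0])            # assumed "true" values for illustration
f = true_b0 + X @ true_b
p1 = np.exp(-f) / (1.0 + np.exp(-f))                    # coding used in this note
y = np.where(rng.uniform(size=n) < p1, 1.0, -1.0)

def deviance_sum(theta):
    g = theta[0] + X @ theta[1:]
    return np.sum(np.logaddexp(0.0, y * g))             # sum of ln(1 + exp(y*g)), the (**) terms

def hinge_sum(theta):
    g = theta[0] + X @ theta[1:]
    return np.sum(np.maximum(0.0, 1.0 - y * g))         # sum of hinge terms [1 - y*g]_+

logistic_fit = minimize(deviance_sum, np.zeros(3), method="Nelder-Mead").x
hinge_fit = minimize(hinge_sum, np.zeros(3), method="Nelder-Mead").x

print("minimum-deviance (logistic) coefficients:", np.round(logistic_fit, 2))
print("minimum-hinge coefficients:              ", np.round(hinge_fit, 2))
# Up to scale, the two coefficient vectors should point in roughly opposite directions,
# which is the sign flip described in this footnote.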