A Fact About "Binomial Deviance" and Classification

This concerns the figures on page 426 of HTF and page 358 of JWHT, and the "binomial deviance" entry of HTF Table 12.1 on their page 427. Consider logistic regression, where $y$ takes values $-1$ or $1$ with probabilities conditional on an input $x$ of the form

$$p_{-1} = \frac{\exp(\beta_0 + \beta'x)}{1 + \exp(\beta_0 + \beta'x)} \quad \text{and} \quad p_1 = \frac{1}{1 + \exp(\beta_0 + \beta'x)}$$

Then (for what it is worth) the contribution to the likelihood for $\beta_0$ and $\beta$ provided by an observed $y$ and corresponding covariate $x$ is a term (in a product of such things)

$$\left( \frac{\exp(\beta_0 + \beta'x)}{1 + \exp(\beta_0 + \beta'x)} \right)^{I[y=-1]} \left( \frac{1}{1 + \exp(\beta_0 + \beta'x)} \right)^{I[y=1]}$$

from which it follows that the (additive) contribution to the negative log-likelihood for $\beta_0$ and $\beta$ provided by this observed $y$ and corresponding covariate $x$ is a term

$$\left[ \ln\left(1 + \exp(\beta_0 + \beta'x)\right) - (\beta_0 + \beta'x) \right] I[y=-1] + \ln\left(1 + \exp(\beta_0 + \beta'x)\right) I[y=1] \qquad (*)$$

As it turns out,

$$\exp(u)\left(1 + \exp(-u)\right) = \exp(u) + \exp(u)\exp(-u) = \exp(u) + 1 = 1 + \exp(u)$$

so that

$$\ln\left(1 + \exp(u)\right) = \ln\left(1 + \exp(-u)\right) + u$$

Applying this identity with $u = \beta_0 + \beta'x$ (and noting that $\beta_0 + \beta'x = -y(\beta_0 + \beta'x)$ when $y = -1$, while $\beta_0 + \beta'x = y(\beta_0 + \beta'x)$ when $y = 1$), the additive contribution to the negative log-likelihood is

$$\left[ \ln\left(1 + \exp(y(\beta_0 + \beta'x))\right) + (\beta_0 + \beta'x) - (\beta_0 + \beta'x) \right] I[y=-1] + \ln\left(1 + \exp(y(\beta_0 + \beta'x))\right) I[y=1]$$

that is,

$$\ln\left(1 + \exp(y(\beta_0 + \beta'x))\right) \left( I[y=-1] + I[y=1] \right) = \ln\left(1 + \exp(y(\beta_0 + \beta'x))\right) \cdot 1$$

and ultimately

$$\ln\left(1 + \exp(y(\beta_0 + \beta'x))\right) \qquad (**)$$

(almost) as in Table 12.1 of HTF.¹ The point of the figures in JWHT and HTF is that maximum likelihood fitting (minimum negative log-likelihood fitting) of parameters in logistic regression (penalized or un-penalized) involves a sum of such terms (the analog of sums of squared residuals for normal error regression), AND that since the hinge loss

$$\left[ 1 - y(\beta_0 + \beta'x) \right]_+$$

used as a summand in optimality arguments for SV classifiers is fairly similar to (**), one might expect SV classifier decision boundaries derived from voting functions $\beta_0 + \beta'x$ that are not terribly different from those derived from logistic regression with appropriate cut-offs at a target value for (fitted) $p_1$.

¹ The HTF table actually lists $\ln\left(1 + \exp(-y(\beta_0 + \beta'x))\right)$, so that the plots of binomial deviance and hinge loss as functions of $y(\beta_0 + \beta'x)$ on page 426 are comparable. This sign change is immaterial, meaning simply that (roughly speaking) one expects sets of SV classifier voting function coefficients with signs opposite those from logistic regression based on the coding/parameterization we've used here. As parameterized here, for logistic regression $y = -1$ is associated with large (positive) values of $\beta_0 + \beta'x$, while a SV classifier intends to associate $y = -1$ with small (large negative) values of $\beta_0 + \beta'x$.
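As a quick numerical check of the algebra above, here is a minimal Python sketch verifying that the direct negative log-likelihood contribution (*) matches the single-expression form (**) under the parameterization used here. The coefficients `beta0` and `beta` and the inputs `X` are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coefficients and inputs, for illustration only.
beta0, beta = -0.5, np.array([1.2, -0.7])
X = rng.normal(size=(5, 2))

u = beta0 + X @ beta                      # voting function beta_0 + beta'x
p_minus1 = np.exp(u) / (1.0 + np.exp(u))  # P(y = -1 | x) as parameterized here
p_plus1 = 1.0 / (1.0 + np.exp(u))         # P(y = +1 | x)

for y in (-1, 1):
    direct = -np.log(p_plus1 if y == 1 else p_minus1)  # contribution (*)
    single = np.log1p(np.exp(y * u))                   # contribution (**)
    assert np.allclose(direct, single)

print("(*) and (**) agree for y = -1 and y = 1")
```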
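And, to illustrate the comparison the figures are making, a small sketch tabulating binomial deviance and hinge loss as functions of the margin in HTF's orientation (deviance written $\ln(1 + \exp(-v))$ as in their table, hinge $[1 - v]_+$); the grid of margin values is arbitrary.

```python
import numpy as np

v = np.linspace(-3, 3, 7)          # margin y*(beta_0 + beta'x), HTF's orientation
deviance = np.log1p(np.exp(-v))    # binomial deviance as listed in HTF Table 12.1
hinge = np.maximum(0.0, 1.0 - v)   # hinge loss [1 - v]_+

for vi, d, h in zip(v, deviance, hinge):
    print(f"margin {vi:5.1f}: deviance {d:6.3f}, hinge {h:6.3f}")
```

Both losses heavily penalize margins well below 1 and become small for confidently correct classifications, which is the rough similarity the note appeals to in suggesting the two fitting criteria produce comparable decision boundaries.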