ECES 680
Statistical Pattern Recognition
Fall 2003
ROC and Bootstrap Example
We compare the ROC area (Az) values and confidence intervals obtained by Monte Carlo
simulation and by the bootstrap. The Monte Carlo values are the ‘true’ values, apart from
the error due to the finite size of the Monte Carlo sample. The bootstrap values are based
on a single set of observations. We find that the true value of Az, as found from the
Monte Carlo method, is within the bootstrap confidence interval.
Background:
See D. Bamber, “The Area above the Ordinal Dominance Graph and the Area below the
Receiver Operating Characteristic Graph,” Journal of Mathematical Psychology, vol. 12,
pp. 387-415, 1975, for information about the empirical ROC.
We construct the ROC for two exponentially distributed variables: the ‘negative’ has
mean one, the ‘positive’ has mean a. The ROC curve has the equation
y = x^(1/a),
with Az = a/(1+a). On the next page we show a sample plot.
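The curve and its area follow from the two survival functions. The short derivation below is a sketch that assumes the ‘negative’ variable X has mean 1 (as in the Monte Carlo trials) and the ‘positive’ variable Y has mean a; the symbols X, Y and the threshold t are introduced here only for illustration.

```latex
\begin{align*}
x(t) &= P(X > t) = e^{-t}, \qquad y(t) = P(Y > t) = e^{-t/a}, \qquad t \ge 0, \\
y &= \left(e^{-t}\right)^{1/a} = x^{1/a}, \\
A_z &= \int_0^1 x^{1/a}\,dx = \frac{1}{1 + 1/a} = \frac{a}{1+a}.
\end{align*}
```

For a = 5 this gives Az = 5/6, the value used in the trials.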
Monte Carlo Trials
We generate 50 pseudorandom exponential RV’s with mean 1, and 50
pseudorandom exponential RV’s with mean 5. The expected ROC area is Az = 5/6 = 0.833, with a
rule-of-thumb standard deviation equal to sqrt(Az(1-Az)/min(n1,n2)) = sqrt(0.167*0.833/50) =
0.0527. The standard error of the mean Az over 200 trials is 0.0527/sqrt(200) = 0.0037.
Assuming that the distribution of the mean is Gaussian, the 95% confidence interval for the
mean is mean ± 1.96 standard errors and has value [0.8260, 0.8406].
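The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation in the text; the variable names are ours and are not taken from the course code.

```python
import math

a = 5.0          # mean of the 'positive' exponential (the 'negative' has mean 1)
n1 = n2 = 50     # observations per class
n_trials = 200   # number of Monte Carlo repetitions

az = a / (1.0 + a)                                   # expected ROC area, 5/6
sd_rule = math.sqrt(az * (1.0 - az) / min(n1, n2))   # rule-of-thumb sd of one Az estimate
se_mean = sd_rule / math.sqrt(n_trials)              # standard error of the mean over the trials
ci_mean = (az - 1.96 * se_mean, az + 1.96 * se_mean) # Gaussian 95% interval for the mean

print(f"Az = {az:.4f}, sd = {sd_rule:.4f}, se = {se_mean:.4f}")
print(f"95% CI for the mean Az: [{ci_mean[0]:.4f}, {ci_mean[1]:.4f}]")
# prints:
# Az = 0.8333, sd = 0.0527, se = 0.0037
# 95% CI for the mean Az: [0.8260, 0.8406]
```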
We run 200 repetitions of a random trial that generates these variables and the
ROC. The average Az over these trials is 0.8325 and is within the 95% confidence
interval computed above. The standard deviation is 0.0410. The 95% confidence interval,
computed from this mean and standard deviation and a normal assumption, is
0.8325 ± 1.96 × 0.0410 = [0.7521, 0.9129].
The obtained Az values are sorted and plotted below as an estimate of the
cumulative distribution of Az estimates. We estimate the 95% confidence interval by
sorting the Az estimates and finding the 5th smallest (5/200 = 2.5%) and the 195th. These
are [0.7348, 0.9096]. This is the 95% interval for an Az estimate obtained from a single
set of observations. The interval is not symmetric about the mean; this reflects
compression of the estimates on the right, since Az cannot exceed 1.
The 95% confidence interval computed under the Gaussian assumption is [0.7521, 0.9129];
these endpoints equal 0.8325 ± 1.96 × 0.0410, that is, they use the Monte Carlo mean and
standard deviation rather than the rule-of-thumb value 0.0527. We see that there is a
substantial overlap between this and the Monte Carlo percentile interval.
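The Monte Carlo procedure described above can be sketched as follows. This is not the course function mcROC: the function names, the seed, and the use of the Mann-Whitney pair count for the empirical ROC area (equivalent to the area under the empirical ROC, per Bamber) are our assumptions, so the numbers it produces will differ somewhat from those quoted in the text.

```python
import random
import statistics

def empirical_az(neg, pos):
    """Mann-Whitney estimate of the ROC area: P(pos > neg) + 0.5 * P(tie)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def mc_az(n_trials=200, n=50, mean_neg=1.0, mean_pos=5.0, seed=0):
    """Return the sorted Az estimates from n_trials independent trials."""
    rng = random.Random(seed)  # fixed seed (our choice) for repeatability
    estimates = []
    for _ in range(n_trials):
        # expovariate takes the rate, i.e. 1/mean
        neg = [rng.expovariate(1.0 / mean_neg) for _ in range(n)]
        pos = [rng.expovariate(1.0 / mean_pos) for _ in range(n)]
        estimates.append(empirical_az(neg, pos))
    return sorted(estimates)

az_sorted = mc_az()
mean_az = statistics.mean(az_sorted)
sd_az = statistics.stdev(az_sorted)

# Percentile interval as in the text: 5th smallest and 195th of 200 sorted values.
ci_percentile = (az_sorted[4], az_sorted[194])
# Gaussian interval from the Monte Carlo mean and standard deviation.
ci_gauss = (mean_az - 1.96 * sd_az, mean_az + 1.96 * sd_az)

print(f"mean = {mean_az:.4f}, sd = {sd_az:.4f}")
print(f"percentile 95% CI: [{ci_percentile[0]:.4f}, {ci_percentile[1]:.4f}]")
print(f"Gaussian   95% CI: [{ci_gauss[0]:.4f}, {ci_gauss[1]:.4f}]")
```

The sorted list az_sorted is also what one would plot against rank/200 to obtain the estimated cumulative distribution of Az.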
[Figure: Monte Carlo estimates of Az. 200 Monte Carlo trials. Expected values]
Bootstrap Trials
One set of pseudorandom observations was used to estimate the variation of the ROC area
by generating bootstrap samples from it. The mean Az (over the bootstrap trials) is 0.7767,
the standard deviation of the bootstrap trials is 0.051, and the 95% confidence interval is
[0.660, 0.850].
The bootstrap mean differs from the Monte Carlo mean, but it lies within the Monte Carlo
95% interval for a single set of observations, [0.7348, 0.9096]. Furthermore, the true
value Az = 0.833 is within the bootstrap confidence interval.
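A bootstrap estimate of this kind can be sketched as below. This is not the course function bootROC. We assume 200 bootstrap samples (the text does not state the number), resampling with replacement within each class separately, and a percentile interval; all names and seeds are ours.

```python
import random
import statistics

def empirical_az(neg, pos):
    """Mann-Whitney estimate of the ROC area: P(pos > neg) + 0.5 * P(tie)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_az(neg, pos, n_boot=200, seed=1):
    """Return sorted Az values from n_boot bootstrap resamples of one data set."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample each class with replacement, keeping the class sizes fixed.
        neg_b = rng.choices(neg, k=len(neg))
        pos_b = rng.choices(pos, k=len(pos))
        estimates.append(empirical_az(neg_b, pos_b))
    return sorted(estimates)

# One single set of observations: 50 'negatives' (mean 1), 50 'positives' (mean 5).
data_rng = random.Random(0)
neg = [data_rng.expovariate(1.0) for _ in range(50)]
pos = [data_rng.expovariate(1.0 / 5.0) for _ in range(50)]

n_boot = 200
boot = bootstrap_az(neg, pos, n_boot=n_boot)
k = n_boot * 25 // 1000          # 2.5% of the samples: 5 when n_boot = 200
lo, hi = boot[k - 1], boot[n_boot - k - 1]   # 5th smallest and 195th

print(f"observed Az = {empirical_az(neg, pos):.4f}")
print(f"bootstrap mean = {statistics.mean(boot):.4f}, sd = {statistics.stdev(boot):.4f}")
print(f"bootstrap 95% CI: [{lo:.4f}, {hi:.4f}]")
```

Because the interval is built from one data set, its location depends on that set; the check of interest is whether the true Az = 5/6 falls inside it.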
Methods
The Monte Carlo estimates were computed with the function mcROC. The bootstrap
estimates were computed with the function bootROC. Both call the functions empir_roc2 to
obtain the ROC curve, atrap to compute the area, and rande to obtain exponentially
distributed random variables. All of these are in the file boot_trial.zip.
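For readers without the zip file, the three helper steps (exponential random numbers, empirical ROC curve, trapezoidal area) can be sketched in Python. These are hypothetical stand-ins written for illustration, not translations of rande, empir_roc2 or atrap, whose interfaces are not shown here.

```python
import math
import random

def rand_exponential(mean, n, rng):
    """Inverse-transform sampling: if U ~ Uniform[0,1), -mean*log(1-U) is exponential."""
    return [-mean * math.log(1.0 - rng.random()) for _ in range(n)]

def empirical_roc(neg, pos):
    """Return (fpr, tpr) lists for the empirical ROC, sweeping the threshold downward."""
    thresholds = sorted(set(neg) | set(pos), reverse=True)
    fpr, tpr = [0.0], [0.0]          # start at (0, 0): threshold above all data
    for t in thresholds:
        fpr.append(sum(x >= t for x in neg) / len(neg))
        tpr.append(sum(x >= t for x in pos) / len(pos))
    return fpr, tpr                  # ends at (1, 1): threshold at the smallest value

def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i])."""
    return sum((x[i + 1] - x[i]) * (y[i + 1] + y[i]) / 2.0 for i in range(len(x) - 1))

rng = random.Random(0)
neg = rand_exponential(1.0, 50, rng)   # 'negative' class, mean 1
pos = rand_exponential(5.0, 50, rng)   # 'positive' class, mean 5
fpr, tpr = empirical_roc(neg, pos)
area = trapezoid_area(fpr, tpr)
print(f"empirical Az = {area:.4f}")
```

The trapezoidal area of the empirical ROC equals the fraction of (positive, negative) pairs in which the positive value is larger, counting ties as one half, which is the equivalence discussed in the Bamber reference.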