Normal Distribution and Diagnostic Tests - BMI for WNBA and NBA Players 2013-2014

Normal Distribution, Likelihood Ratios, and ROC Curves
Body Mass Indices for WNBA and NBA Players, 2013-2014 Seasons
Data Description
• Body Mass Index: BMI = 703 × Weight (lbs) / (Height (in))^2
• WNBA (Females): n = 139, Mean = 23.135, SD = 2.105
• NBA (Males): n = 505, Mean = 24.741, SD = 1.720
• Distributions are approximately normal
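The BMI formula above can be sketched as a small Python function. The function name and the example player (78 in, 220 lb) are hypothetical and are not taken from the data set.

```python
def bmi(weight_lb: float, height_in: float) -> float:
    """Body Mass Index from weight in pounds and height in inches."""
    return 703.0 * weight_lb / height_in ** 2


# Hypothetical player: 6 ft 6 in (78 in), 220 lb
example_bmi = bmi(220, 78)
print(f"BMI = {example_bmi:.2f}")
```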
WNBA and NBA BMI Distributions
[Figure: normal density curves f(y_F) and f(y_M) plotted against Body Mass Index (horizontal axis, 15 to 30), with Normal Density on the vertical axis (0 to 0.25). Females: μ_F = 23.135, σ_F = 2.105. Males: μ_M = 24.741, σ_M = 1.720.]
Probability and Quantile Calculations
$Y_F \sim N(\mu_F = 23.135,\ \sigma_F^2 = 2.105^2)$
$Y_M \sim N(\mu_M = 24.741,\ \sigma_M^2 = 1.720^2)$

$P(Y_F \ge 24.00) = P\left(Z_F = \dfrac{Y_F - \mu_F}{\sigma_F} \ge \dfrac{24.00 - 23.135}{2.105} = 0.41\right) = .3409$

$P(Y_M \ge 24.00) = P\left(Z_M = \dfrac{Y_M - \mu_M}{\sigma_M} \ge \dfrac{24.00 - 24.741}{1.720} = -0.43\right) = 1 - P(Z_M \le -0.43) = 1 - P(Z_M \ge 0.43) = 1 - .3336 = .6664$
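As a check on the Z-table arithmetic, the same two tail probabilities can be computed with Python's standard-library `statistics.NormalDist`. This is a minimal sketch; the variable names are illustrative. Because it uses the unrounded z-scores, it gives approximately .3406 and .6667 rather than the .3409 and .6664 obtained from z rounded to two decimals.

```python
from statistics import NormalDist

# Normal models for BMI, using the reported means and SDs
females = NormalDist(mu=23.135, sigma=2.105)
males = NormalDist(mu=24.741, sigma=1.720)

cutoff = 24.0
p_f_above = 1 - females.cdf(cutoff)  # P(Y_F >= 24)
p_m_above = 1 - males.cdf(cutoff)    # P(Y_M >= 24)

print(f"P(Y_F >= 24) = {p_f_above:.4f}")
print(f"P(Y_M >= 24) = {p_m_above:.4f}")
```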


95th Percentile (0.95 quantile) for Females:
$P(Y_F \le q_{.95}) = .95 = 1 - P(Y_F \ge q_{.95})$
$P(Z_F \ge 1.645) = .0500$ (from Z-table and interpolation)
$P(Z_F \ge 1.645) = P\left(Z_F = \dfrac{Y_F - 23.135}{2.105} \ge 1.645\right) = P\left(Y_F \ge 23.135 + 1.645(2.105) = 26.60\right) = .05$


10th Percentile (0.10 quantile) for Males:
$P(Y_M \le q_{.10}) = .10 = P(Y_M \ge q_{.90})$
$P(Z_M \le -1.282) = P(Z_M \ge 1.282) = .1000$ (from Z-table and interpolation)
$P(Z_M \le -1.282) = P\left(Z_M = \dfrac{Y_M - 24.741}{1.720} \le -1.282\right) = P\left(Y_M \le 24.741 - 1.282(1.720) = 22.54\right) = .10$
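The two quantiles can also be obtained directly from the inverse CDF instead of a Z-table. The sketch below assumes the same normal models and uses `NormalDist.inv_cdf` from the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

females = NormalDist(mu=23.135, sigma=2.105)
males = NormalDist(mu=24.741, sigma=1.720)

# inv_cdf(p) returns the value q with P(Y <= q) = p
q95_females = females.inv_cdf(0.95)  # 95th percentile of female BMI
q10_males = males.inv_cdf(0.10)      # 10th percentile of male BMI

print(f"q.95 (Females) = {q95_females:.2f}")
print(f"q.10 (Males)   = {q10_males:.2f}")
```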


Normal Probabilities
BMI \ Gender   F        M
>24            0.3409   0.6664
<24            0.6591   0.3336
Total          1        1
Note: If we classified players with BMI > 24 as Male and players with BMI < 24 as Female,
about 2/3 of Males and 2/3 of Females would be classified correctly.
Other Choices of Cut-Off Values
Cut-Off  Z_F      Z_M      P(F<CO)  P(F>CO)  P(M<CO)  P(M>CO)  CorrectF  FalseM  FalseF  CorrectM
20       -1.4893  -2.7564  0.0682   0.9318   0.0029   0.9971   0.0682    0.9318  0.0029  0.9971
21       -1.0143  -2.1750  0.1552   0.8448   0.0148   0.9852   0.1552    0.8448  0.0148  0.9852
22       -0.5392  -1.5936  0.2949   0.7051   0.0555   0.9445   0.2949    0.7051  0.0555  0.9445
23       -0.0641  -1.0122  0.4744   0.5256   0.1557   0.8443   0.4744    0.5256  0.1557  0.8443
24        0.4109  -0.4308  0.6594   0.3406   0.3333   0.6667   0.6594    0.3406  0.3333  0.6667
25        0.8860   0.1506  0.8122   0.1878   0.5598   0.4402   0.8122    0.1878  0.5598  0.4402
26        1.3610   0.7320  0.9133   0.0867   0.7679   0.2321   0.9133    0.0867  0.7679  0.2321
27        1.8361   1.3134  0.9668   0.0332   0.9055   0.0945   0.9668    0.0332  0.9055  0.0945
28        2.3112   1.8948  0.9896   0.0104   0.9709   0.0291   0.9896    0.0104  0.9709  0.0291
In this table: $Z_F = \dfrac{CO - 23.135}{2.105}$ and $Z_M = \dfrac{CO - 24.741}{1.720}$
If we make the cut-off very low (say BMI = 20), we get a very accurate test for Males
(.9971 correct), but a very inaccurate test for Females (.0682 correct).
Similarly, if we make the cut-off very high (say BMI = 28), we get a very accurate test for
Females (.9896 correct), but a very inaccurate test for Males (.0291 correct).
This situation is very similar to that of a diagnostic test used to screen patients for a disease.
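The cut-off table can be regenerated from the two normal models. The following is a minimal sketch, assuming the rule "BMI above the cut-off is classified as Male"; the function name `classification_rates` is hypothetical.

```python
from statistics import NormalDist

females = NormalDist(mu=23.135, sigma=2.105)
males = NormalDist(mu=24.741, sigma=1.720)


def classification_rates(cutoff):
    """Return (CorrectF, FalseM, FalseF, CorrectM) for 'BMI > cutoff => Male'."""
    p_f_below = females.cdf(cutoff)  # Females correctly classified as Female
    p_m_below = males.cdf(cutoff)    # Males wrongly classified as Female
    return p_f_below, 1 - p_f_below, p_m_below, 1 - p_m_below


print("CO  CorrectF FalseM  FalseF  CorrectM")
for co in range(20, 29):
    correct_f, false_m, false_f, correct_m = classification_rates(co)
    print(f"{co:<3} {correct_f:.4f}   {false_m:.4f}  {false_f:.4f}  {correct_m:.4f}")
```

Raising the cut-off moves probability from the "classified Male" columns to the "classified Female" columns for both groups at once, which is why accuracy for one group is gained only at the expense of the other.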
Prior/Posterior Probabilities, Odds, Likelihood Ratios
In this population of professional basketball players, there are:
139 Females and 505 Males (644 Total).
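T+ represents having a BMI above the cut-off value, that is, testing "Positive" as being Male.
Prior Probabilities: $P(F) = \dfrac{139}{644} = .2158 \qquad P(M) = \dfrac{505}{644} = .7842$
Prior Odds: $\text{odds} = \dfrac{p}{1-p} \;\Rightarrow\; \text{odds}(F) = \dfrac{.2158}{.7842} = .2752 \qquad \text{odds}(M) = \dfrac{.7842}{.2158} = 3.6339$
(Using the unrounded ratio 505/139 gives odds(M) = 3.6331, the value that appears in the Computations table.)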
Likelihood Ratio of a Positive Test: $LR(T^+) = \dfrac{P(T^+ \mid M)}{P(T^+ \mid F)}$
Likelihood Ratio of a Negative Test: $LR(T^-) = \dfrac{P(T^- \mid F)}{P(T^- \mid M)}$
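Posterior odds given a Positive Test (similar for a negative test):
$\text{odds}(M \mid T^+) = \text{odds}(M) \cdot LR(T^+) = \text{odds}(M) \cdot \dfrac{P(T^+ \mid M)}{P(T^+ \mid F)}$
$\text{odds}(F \mid T^+) = \dfrac{\text{odds}(F)}{LR(T^+)} = \text{odds}(F) \cdot \dfrac{P(T^+ \mid F)}{P(T^+ \mid M)}$
Posterior Probabilities given a Positive Test (similar for a negative test):
$p = \dfrac{\text{odds}}{1 + \text{odds}} \;\Rightarrow\; P(M \mid T^+) = \dfrac{\text{odds}(M \mid T^+)}{1 + \text{odds}(M \mid T^+)} \qquad P(F \mid T^+) = \dfrac{\text{odds}(F \mid T^+)}{1 + \text{odds}(F \mid T^+)}$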
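The odds-based update above can be sketched in a few lines of Python. This assumes the normal models and group counts given earlier; the function name `posterior_male_given_positive` is hypothetical.

```python
from statistics import NormalDist

N_FEMALES, N_MALES = 139, 505
females = NormalDist(mu=23.135, sigma=2.105)
males = NormalDist(mu=24.741, sigma=1.720)


def posterior_male_given_positive(cutoff):
    """Return (LR(T+), odds(M|T+), P(M|T+)) for 'BMI > cutoff => Male'."""
    prior_odds_m = N_MALES / N_FEMALES      # odds(M) = P(M) / P(F)
    p_pos_given_m = 1 - males.cdf(cutoff)   # P(T+ | M)
    p_pos_given_f = 1 - females.cdf(cutoff) # P(T+ | F)
    lr_pos = p_pos_given_m / p_pos_given_f  # LR(T+)
    post_odds = prior_odds_m * lr_pos       # odds(M | T+)
    return lr_pos, post_odds, post_odds / (1 + post_odds)


lr_pos, post_odds, post_prob = posterior_male_given_positive(24)
print(f"LR(T+) = {lr_pos:.4f}, odds(M|T+) = {post_odds:.4f}, P(M|T+) = {post_prob:.4f}")
```

Working in odds keeps the update to a single multiplication: the prior odds carry the 505-to-139 imbalance, and the likelihood ratio carries everything the BMI cut-off contributes.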
Computations
Cut-Off  P(F)    P(M)    odds(F)  odds(M)  P(T+|F)  P(T+|M)  LR(T+)  odds(M|T+)  P(M|T+)
20       0.2158  0.7842  0.2752   3.6331   0.9318   0.9971   1.0701   3.8876     0.7954
21       0.2158  0.7842  0.2752   3.6331   0.8448   0.9852   1.1662   4.2370     0.8091
22       0.2158  0.7842  0.2752   3.6331   0.7051   0.9445   1.3395   4.8664     0.8295
23       0.2158  0.7842  0.2752   3.6331   0.5256   0.8443   1.6064   5.8363     0.8537
24       0.2158  0.7842  0.2752   3.6331   0.3406   0.6667   1.9576   7.1123     0.8767
25       0.2158  0.7842  0.2752   3.6331   0.1878   0.4402   2.3436   8.5144     0.8949
26       0.2158  0.7842  0.2752   3.6331   0.0867   0.2321   2.6754   9.7200     0.9067
27       0.2158  0.7842  0.2752   3.6331   0.0332   0.0945   2.8497  10.3533     0.9119
28       0.2158  0.7842  0.2752   3.6331   0.0104   0.0291   2.7912  10.1407     0.9102
Alternative Calculation using Law of Total Probability and Bayes' Rule (CO = 24):
$P(F) = .2158 \qquad P(M) = .7842 \qquad P(T^+ \mid F) = .3406 \qquad P(T^+ \mid M) = .6667$
$\Rightarrow\; P(T^+) = P(F)\,P(T^+ \mid F) + P(M)\,P(T^+ \mid M) = .2158(.3406) + .7842(.6667) = .5963$
$\Rightarrow\; P(M \mid T^+) = \dfrac{P(M)\,P(T^+ \mid M)}{P(T^+)} = \dfrac{.7842(.6667)}{.5963} = .8767$
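The same arithmetic can be written as a short Python check. This sketch takes the two conditional probabilities for CO = 24 as given inputs rather than recomputing them; the function name `bayes_posterior` is hypothetical.

```python
def bayes_posterior(prior_m, p_pos_given_m, p_pos_given_f):
    """Return (P(T+), P(M|T+)) via the law of total probability and Bayes' rule."""
    prior_f = 1 - prior_m
    p_pos = prior_f * p_pos_given_f + prior_m * p_pos_given_m  # P(T+)
    return p_pos, prior_m * p_pos_given_m / p_pos              # P(M | T+)


# Inputs for cut-off 24, taken from the calculation above
p_pos, p_m_given_pos = bayes_posterior(prior_m=505 / 644,
                                       p_pos_given_m=0.6667,
                                       p_pos_given_f=0.3406)
print(f"P(T+) = {p_pos:.4f}, P(M|T+) = {p_m_given_pos:.4f}")
```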
Receiver Operating Characteristic (ROC) Curve - BMI Classify as M/F
[Figure: ROC curve. Vertical axis: Sensitivity = P(True +) = P(T+|M), from 0 to 1. Horizontal axis: 1-Specificity = P(False +) = P(T+|F), from 0 to 1. Series shown: True+ and the 45-degree reference line.]
Performance of BMI as Test for M/F
• An excellent test would have a high arc to the Northwest
corner of the graph, allowing for a high sensitivity,
P(T+|M) along with a low 1-specificity, P(T+|F)
• Clearly, this test does not perform particularly well (due
to the large overlap in the Male/Female BMI densities)
• A commonly reported measure is the Area Under the ROC
Curve (AUC), where 0.5 ≤ AUC ≤ 1
• Rule of Thumb: 0.9-1 = Excellent, 0.8-0.9 = Good,
0.7-0.8 = Fair, 0.6-0.7 = Poor, 0.5-0.6 = Fail
• For this Test, AUC = 0.6621 (applying trapezoidal rule)
ba
 f  x  dx  2n  f  x   2 f  x   ...  2 f  x   f  x 
b
a
0
1
n 1
n
with a  0, b  1, n  197,
f  x   P T  | M 
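A trapezoidal AUC can be sketched in Python as follows. This is not the slide's computation: the grid of cut-offs (BMI 10 to 40 in steps of 0.1) is an assumption, and the sketch uses the actual, unequal spacing between successive x = P(T+|F) values rather than a fixed width (b − a)/n. The result depends on those choices, so it need not reproduce the 0.6621 reported above, whose 197-point grid is not given here.

```python
from statistics import NormalDist

females = NormalDist(mu=23.135, sigma=2.105)
males = NormalDist(mu=24.741, sigma=1.720)


def roc_points(cutoffs):
    """(x, y) = (P(T+|F), P(T+|M)) per cut-off, sorted by x, with end points added."""
    pts = [(1 - females.cdf(c), 1 - males.cdf(c)) for c in cutoffs]
    return sorted([(0.0, 0.0), (1.0, 1.0)] + pts)


def trapezoid_auc(points):
    """Trapezoidal rule using the actual (possibly unequal) x spacing."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Assumed grid of cut-offs: BMI 10.0, 10.1, ..., 40.0
cutoffs = [10 + 0.1 * i for i in range(301)]
auc = trapezoid_auc(roc_points(cutoffs))
print(f"AUC = {auc:.4f}")
```

When the cut-offs are evenly spaced in BMI, the corresponding x values are not evenly spaced in [0, 1], which is why the sketch weights each trapezoid by its own width.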