Comparing correlated correlations
Advisor: Rhonda Decook
Client: Vinayak
Consultants: Tianyu Li, Qinbin Fan
Department of Statistics and Actuarial Science
University of Iowa
Outline
Introduction
Data Highlights
Results & Analysis
Conclusion
Introduction
• Gold standard (GS): first take the average score of each image across the 3 graders, then re-rank the images by that average. (We also tried other ways to define the gold standard, but this definition is the one we mainly use.)
• Let $r_{alg.i,GS}$ represent the Pearson correlation coefficient between algorithm i and the gold standard (GS).
• The $r_{alg.i,GS}$ values for the data set with 12 algorithms and 25 images are shown below (from largest correlation to smallest):
Algorithm   Correlation with GS
12*         0.6583
10          0.6259
3           0.5806
9           0.5718
4           0.5385
8           0.5040
6           0.5000
5           0.4630
7           0.4601
1           0.4300
11          0.3104
2           0.2422
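
The gold-standard construction and the correlation with an algorithm can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grades and algorithm scores are made up (not the study data), ties are given average ranks, and the helper names `average_ranks`, `gold_standard` and `pearson` are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their ranks (assumption)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def gold_standard(grades):
    """grades: one row per image, one entry per grader. Average, then re-rank."""
    means = [sum(row) / len(row) for row in grades]
    return average_ranks(means)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up grades for 4 images from 3 graders (illustration only).
grades = [[1, 2, 1], [3, 3, 4], [2, 1, 2], [4, 4, 3]]
gs = gold_standard(grades)
algorithm_scores = [0.2, 0.9, 0.4, 0.7]  # hypothetical algorithm output
r = pearson(algorithm_scores, gs)
print(gs, round(r, 4))
```

With the real data, `grades` would have 25 rows and `pearson` would be called once per algorithm.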
• For each competing algorithm i = 1, 2, …, 11, we wish to test $H_0: r_{alg.12,GS} - r_{alg.i,GS} = 0$.
• We improve our method by using the Fisher transformation of r, which is $Z = \frac{1}{2}\log_e\frac{1+r}{1-r}$. It is a 'normalizing' transformation. Correlations fall between −1 and 1, but after transformation they fall between −∞ and +∞.
• Our method is based on the journal article: Cohen A. (1989) "Comparison of correlated correlations." Statistics in Medicine, 8(12):1485-1495.
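
As a quick numerical check of the transformation, the sketch below applies the standard Fisher form with the factor 1/2 (equivalently `atanh(r)`) to two of the correlations listed earlier. With that factor, the difference for algorithms 12 and 2 agrees with the z difference of 0.542 reported in the results table up to rounding of the inputs. The function name `fisher_z` is a hypothetical helper.

```python
from math import atanh, log

def fisher_z(r):
    """Fisher transformation of a correlation r in (-1, 1)."""
    return 0.5 * log((1 + r) / (1 - r))

# Correlations with the gold standard taken from the table above.
r12, r2 = 0.6583, 0.2422

raw_diff = r12 - r2                    # difference on the correlation scale
z_diff = fisher_z(r12) - fisher_z(r2)  # difference on the transformed scale

print(round(raw_diff, 3), round(z_diff, 3))
```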
• We use the bootstrap to form a confidence interval for the difference between the transformed versions of r.
• The bootstrap takes into account the fact that the algorithms were all applied to the same set of 25 images.
• The bootstrap resamples with replacement from the original set of n = 25 images to create a 'new hypothetical' data set.
• We calculate the difference in correlations in each of 5000 bootstrapped data sets, which provides us with a sampling distribution for the difference.
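
The resampling scheme described above might look roughly like the following sketch. It makes several assumptions that the slides do not state: a percentile interval is used, degenerate resamples (a constant column or |r| = 1) are redrawn, and the data are synthetic. The names `pearson` and `bootstrap_z_diff_ci` are hypothetical.

```python
import random
from math import atanh, sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def bootstrap_z_diff_ci(alg_a, alg_b, gs, level=0.95, n_boot=5000, seed=0):
    """Percentile CI (assumption) for atanh(r_a,GS) - atanh(r_b,GS)."""
    rng = random.Random(seed)
    n = len(gs)
    diffs = []
    while len(diffs) < n_boot:
        # Resample whole images, so both algorithms and the gold standard
        # stay paired within each bootstrap data set.
        idx = [rng.randrange(n) for _ in range(n)]
        a = [alg_a[i] for i in idx]
        b = [alg_b[i] for i in idx]
        g = [gs[i] for i in idx]
        try:
            diffs.append(atanh(pearson(a, g)) - atanh(pearson(b, g)))
        except (ZeroDivisionError, ValueError):
            continue  # degenerate resample: draw again
    diffs.sort()
    tail = (1 - level) / 2
    lo = diffs[round(tail * n_boot)]
    hi = diffs[round((1 - tail) * n_boot) - 1]
    return lo, hi

# Synthetic example with 25 "images" (not the study data).
demo = random.Random(1)
gs = list(range(1, 26))
alg_a = [g + demo.gauss(0, 2) for g in gs]  # tracks the gold standard closely
alg_b = [demo.gauss(0, 1) for _ in gs]      # unrelated to the gold standard
lo, hi = bootstrap_z_diff_ci(alg_a, alg_b, gs, level=1 - 0.05 / 11)
print(lo, hi)
```

Resampling image indices, rather than each column separately, is what preserves the dependence between algorithms scored on the same images.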
If the confidence interval does not include zero, then the two algorithms have significantly different correlations with the gold standard, and the one with the higher correlation is better.
We apply the Bonferroni multiple comparison adjustment to account for the fact that we are doing 11 comparisons (C.I. level is 1 − 0.05/11 ≈ 0.995). The adjustment allows us to maintain the family-wise type I error rate at the α = 0.05 level.
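
The adjustment and the decision rule amount to a small piece of arithmetic, sketched here. The two intervals are the 99.5% intervals reported for A2 and A11 in the results table; the helper `excludes_zero` is a hypothetical name.

```python
alpha = 0.05          # family-wise type I error rate
n_comparisons = 11    # algorithm 12 against each of the other 11

per_comparison_alpha = alpha / n_comparisons
ci_level = 1 - per_comparison_alpha  # about 0.9955, reported as 99.5%

def excludes_zero(ci):
    """True when the interval (lo, hi) lies entirely on one side of zero."""
    lo, hi = ci
    return lo > 0 or hi < 0

# Bonferroni-adjusted intervals for the Fisher-z difference (from the results).
ci_a2 = (0.042, 1.338)
ci_a11 = (-0.109, 1.266)

print(round(ci_level, 4), excludes_zero(ci_a2), excludes_zero(ci_a11))
```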
Data Highlights
Patient ID   A          B          …
1            1.16154    1.203186   …
2            2.114705   2.126865   …
…            …          …          …

1    2    3    GS
16   21   13   16
20   13   23   20
…    …    …    …
Results & Analysis
Comparison   r12-rj (raw)   z12-zj (Fisher)   CI of Diff (99.5%)   CI of Diff (95%)    Significant Diff
A2 vs A12    0.416          0.542             ( 0.042, 1.338)      ( 0.169, 1.048)     YES
A11 vs A12   0.347          0.468             (-0.109, 1.266)      ( 0.087, 0.975)     NO/YES
A1 vs A12    0.228          0.329             (-0.253, 0.977)      (-0.058, 0.761)     NO
A7 vs A12    0.198          0.292             (-0.229, 0.870)      (-0.046, 0.664)     NO
A5 vs A12    0.195          0.288             (-0.255, 0.845)      (-0.066, 0.670)     NO
A6 vs A12    0.158          0.240             (-0.466, 0.835)      (-0.208, 0.620)     NO
A8 vs A12    0.154          0.235             (-0.310, 0.753)      (-0.113, 0.596)     NO
A4 vs A12    0.119          0.187             (-0.431, 0.768)      (-0.211, 0.584)     NO
A9 vs A12    0.086          0.139             (-0.461, 0.689)      (-0.256, 0.504)     NO
A3 vs A12    0.077          0.126             (-0.422, 0.652)      (-0.222, 0.464)     NO
A10 vs A12   0.032          0.055             (-0.411, 0.576)      (-0.255, 0.392)     NO
[Figure: A2 (worst) vs A12]
[Figure: A11 (2nd worst) vs A12]
We also tried three other ways to define the gold standard: using only the 2 most strongly correlated grader columns and then proceeding as before, using the median rank of all 3 graders for each image and then proceeding as before, and using the averages without re-ranking. We found very similar results in all four cases. To save space and time, we do not present the results of the other three.
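
For concreteness, the per-image summaries behind the three alternative definitions could be computed along these lines. This is a sketch on made-up grades: the function names are hypothetical, the choice of which two grader columns to keep is passed in rather than derived, and the re-ranking step used by the first two alternatives is omitted.

```python
from statistics import mean, median

def gs_two_columns(grades, cols):
    """Average only the chosen grader columns (to be re-ranked afterwards)."""
    return [mean(row[c] for c in cols) for row in grades]

def gs_median(grades):
    """Median of the graders for each image (to be re-ranked afterwards)."""
    return [median(row) for row in grades]

def gs_mean_no_rerank(grades):
    """Plain average of the graders, used directly without re-ranking."""
    return [mean(row) for row in grades]

# Made-up grades: 2 images, 3 graders (illustration only).
grades = [[1, 2, 1], [3, 3, 4]]
print(gs_two_columns(grades, (0, 2)), gs_median(grades), gs_mean_no_rerank(grades))
```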
Conclusion
Apart from the worst-ranked algorithm (A2), we did not find a statistically significant difference between our client's method and the other methods at the Bonferroni-adjusted level.
Results may change when we have more data (more image grades).