Comparing correlated correlations

Advisor: Rhonda Decook
Client: Vinayak
Consultants: Tianyu Li, Qinbin Fan
Department of Statistics and Actuarial Science, University of Iowa

Outline
- Introduction
- Data Highlights
- Results & Analysis
- Conclusion

Introduction

- Gold Standard: first take the average score of each image from the 3 graders, then re-rank the images. (We also tried other ways to define the gold standard, but this definition is the one we mainly use.)
- Let r_alg.i,GS denote the Pearson correlation coefficient between algorithm i and the gold standard (GS).
- The r_alg.i,GS values for the data set with 12 algorithms and 25 images are shown below (from largest correlation to smallest):

Algorithm   Correlation with GS
12*         0.6583
10          0.6259
3           0.5806
9           0.5718
4           0.5385
8           0.5040
6           0.5000
5           0.4630
7           0.4601
1           0.4300
11          0.3104
2           0.2422

- For each competing algorithm i = 1, 2, …, 11, we wish to test H0: r_alg.12,GS − r_alg.i,GS = 0.
- We improve our method by using the Fisher transformation of r, which is $Z = \frac{1}{2}\log_e\frac{1+r}{1-r}$. It is a 'normalizing' transformation: correlations fall between −1 and 1, but after transformation they fall between −∞ and +∞.
- Our method is based on the journal article: Cohen A. (1989) "Comparison of correlated correlations." Statistics in Medicine, 8(12):1485-1495.

- We use the bootstrap to form a confidence interval for the difference between the transformed values of r.
- The bootstrap method takes into account the fact that the algorithms were all applied to the same set of 25 images.
- The bootstrap method resamples with replacement from the original set of n = 25 images to create a 'new hypothetical' data set.
We calculate the difference in correlations in each of 5000 bootstrapped data sets, which gives us a sampling distribution for the difference.

If the confidence interval does not include zero, then the algorithms have significantly different correlations, and one is better than the other.

We apply the Bonferroni multiple comparison adjustment to account for the fact that we are doing 11 comparisons (C.I. level is 1 − 0.05/11 ≈ 0.995). The adjustment allows us to maintain the family-wise type I error rate at the α = 0.05 level.

Data Highlights

Patient ID   1          2          …
A            1.16154    2.114705   …
B            1.203186   2.126865   …
…            …          …          …

1    2    3    GS
16   21   13   16
20   13   23   20
…    …    …    …

Results & Analysis

Comparison   r12−rj (raw)   z12−zj (Fisher)   CI of Diff (99.5%)   CI of Diff (95%)    Significant Diff
A2 vs A12    0.416          0.542             ( 0.042, 1.338)      ( 0.169, 1.048)     YES
A11 vs A12   0.347          0.468             (-0.109, 1.266)      ( 0.087, 0.975)     NO/YES
A1 vs A12    0.228          0.329             (-0.253, 0.977)      (-0.058, 0.761)     NO
A7 vs A12    0.198          0.292             (-0.229, 0.870)      (-0.046, 0.664)     NO
A5 vs A12    0.195          0.288             (-0.255, 0.845)      (-0.066, 0.670)     NO
A6 vs A12    0.158          0.240             (-0.466, 0.835)      (-0.208, 0.620)     NO
A8 vs A12    0.154          0.235             (-0.310, 0.753)      (-0.113, 0.596)     NO
A4 vs A12    0.119          0.187             (-0.431, 0.768)      (-0.211, 0.584)     NO
A9 vs A12    0.086          0.139             (-0.461, 0.689)      (-0.256, 0.504)     NO
A3 vs A12    0.077          0.126             (-0.422, 0.652)      (-0.222, 0.464)     NO
A10 vs A12   0.032          0.055             (-0.411, 0.576)      (-0.255, 0.392)     NO

[Figure: A2 (worst) vs A12]

[Figure: A11 (2nd worst) vs A12]

We also tried other ways to define the gold standard: (1) using only the 2 most strongly correlated columns and then proceeding as above; (2) using the median rank of all 3 for each image and then proceeding as above; (3) using the averages without re-ranking. We found very similar results in all four cases. To save space and time, we do not present the results of the other three.
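The gold-standard definitions discussed above can be illustrated with a short sketch. It is not the consultants' code: the function names `rank` and `gold_standard`, the option labels, the ranking direction (1 = smallest value), and the use of average ranks for ties are all assumptions made for illustration. The variant that keeps only the 2 most strongly correlated columns is left out for brevity.

```python
from statistics import mean, median


def rank(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank.

    The ranking direction and the tie rule are assumptions, not taken from the slides.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # average of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks


def gold_standard(grades, how="mean_rerank"):
    """Build a gold standard from `grades`: one row per image, one column per grader.

    The option names are hypothetical labels for the definitions in the text.
    """
    if how == "mean_rerank":    # main definition: average the graders, then re-rank
        return rank([mean(row) for row in grades])
    if how == "median_rerank":  # median across graders, then re-rank
        return rank([median(row) for row in grades])
    if how == "mean_raw":       # averages without re-ranking
        return [mean(row) for row in grades]
    raise ValueError(f"unknown gold-standard definition: {how}")
```

For example, `gold_standard([[16, 21, 13], [20, 13, 23], [1, 2, 3]])` averages each row and then ranks the three averages against each other.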
Conclusion

We did not find a statistically significant difference between our client's method and every other method: at the Bonferroni-adjusted 99.5% level, only the A2 vs A12 interval excludes zero. Results may change when we have more data (more image grades).
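The bootstrap comparison described in the Introduction can be sketched as follows. This is a minimal illustration under stated assumptions, not the consultants' implementation: the names `pearson`, `fisher_z`, and `bootstrap_ci` are hypothetical, the interval is a simple percentile interval (the slides do not say which bootstrap interval was used), and the gold standard is resampled along with the images rather than re-ranked inside each resample.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher transformation: Z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def bootstrap_ci(alg_a, alg_b, gs, n_boot=5000, n_comparisons=11, alpha=0.05, seed=0):
    """Percentile bootstrap CI for z(r_a,GS) - z(r_b,GS) over resampled images.

    Whole images are resampled, so both algorithms and the gold standard keep
    their pairing. The level is Bonferroni-adjusted: 1 - alpha / n_comparisons.
    """
    rng = random.Random(seed)
    n = len(gs)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample images with replacement
        a = [alg_a[i] for i in idx]
        b = [alg_b[i] for i in idx]
        g = [gs[i] for i in idx]
        try:
            diffs.append(fisher_z(pearson(a, g)) - fisher_z(pearson(b, g)))
        except (ZeroDivisionError, ValueError):
            continue  # skip degenerate resamples (zero variance or |r| = 1)
    diffs.sort()
    tail = (alpha / n_comparisons) / 2  # probability left in each tail
    lo = diffs[math.floor(tail * (len(diffs) - 1))]
    hi = diffs[math.ceil((1 - tail) * (len(diffs) - 1))]
    return lo, hi
```

Calling `bootstrap_ci(a12_scores, aj_scores, gs_ranks)` with three lists of 25 values each would return an interval at the adjusted 99.5% level, and `n_comparisons=1` gives the unadjusted 95% interval. The input names here are placeholders. As a check on the transformation, `fisher_z(0.6583) - fisher_z(0.2422)` is approximately 0.54, in line with the z12−zj value reported for A2 vs A12.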
