Statistical methods for assessment of agreement

Professor Dr. Bikas K Sinha

Applied Statistics Division

Indian Statistical Institute

Kolkata, INDIA

Organized by

Department of Statistics, RU

17 April, 2012

Lecture Plan

Agreement for Categorical Data [Part I]

09.00 – 10.15 hrs

Coffee Break: 10.15 – 10.30 hrs

Agreement for Continuous Data [Part II]

10.30 – 11.45 hrs

Discussion : 11.45 – 12.00 hrs

Key References

Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational & Psychological Measurement, 20(1): 37–46.

[Famous for Cohen’s Kappa]

Cohen, J. (1968). Weighted Kappa: Nominal Scale Agreement with Provision for Scaled Disagreement or Partial Credit. Psychological Bulletin, 70(4): 213–220.

References ….contd.

Banerjee, M., Capozzoli, M., McSweeney, L. & Sinha, D. (1999). Beyond Kappa: A Review of Interrater Agreement Measures. Canadian Jour. of Statistics, 27(1): 3–23.

Lin, L. I. (1989). A Concordance Correlation Coefficient to Evaluate Reproducibility. Biometrics, 45: 255–268.

References …contd.

Lin, L. I. (2000). Total Deviation Index for Measuring Individual Agreement: With Application in Lab Performance and Bioequivalence. Statistics in Medicine, 19: 255–270.

Lin, L. I., Hedayat, A. S., Sinha, Bikas & Yang, Min (2002). Statistical Methods in Assessing Agreement: Models, Issues, and Tools. Jour. Amer. Statist. Assoc., 97(457): 257–270.

Measurements : Provided by

Experts / Observers / Raters

• Could be two or more systems, assessors, chemists, psychologists, radiologists, clinicians, nurses, rating systems or raters, diagnoses or treatments, instruments or methods, processes or techniques or formulae……

Diverse Application Areas…

• Cross-checking of data for agreement; acceptability of a new or generic drug, of test instruments against standard instruments, or of a new method against a gold-standard method; statistical process control…

Nature of Agreement Problems…

• Assessment & Recording of Responses …

• Two Assessors for evaluation and recording…

• The Raters examine each “unit” independently of one another and report separately : “+” for “Affected” or “-” for “OK” : Discrete Type.

Summary Statistics

UNIT Assessment Table

Assessor # I \ # II      +       -
        +               40%     3%
        -                3%    54%

Q. What is the extent of agreement of the two assessors ?

Nature of Data

Assessor # I \ # II      +       -
        +               93%     2%
        -                4%     1%

Assessor # I \ # II      +       -
        +                3%    40%
        -               44%    13%

Same Question : Extent of Agreement /

Disagreement ?

Cohen’s Kappa : Nominal Scales

Cohen (1960)

Proposed Kappa statistic for measuring agreement when the responses are nominal

Cohen’s Kappa

• Rater I vs Rater II : 2 x 2 Case

Categories Yes (Y) & No (N) : cell proportions P(i, j)

P(Y,Y) & P(N,N) : Agreement Proportions
P(Y,N) & P(N,Y) : Disagreement Proportions

Theta_0 = P(Y,Y) + P(N,N) = P[Agreement]

Theta_e = P(Y,.) P(.,Y) + P(N,.) P(.,N) = P[Chancy Agreement]

K = [Theta_0 – Theta_e] / [1 – Theta_e]

Chance-corrected Agreement Index

Kappa Computation….

I \ II      Yes     No    Total
Yes        0.40   0.03     0.43
No         0.03   0.54     0.57
Total      0.43   0.57     1.00

Theta_0 : Observed Agreement
= P(Y,Y) + P(N,N) = 0.40 + 0.54 = 0.94 … 94%

Theta_e : Chance Factor towards agreement
= P(Y,.) P(.,Y) + P(N,.) P(.,N)
= 0.43 x 0.43 + 0.57 x 0.57 = 0.5098 … 51%

K = [Theta_0 – Theta_e] / [1 – Theta_e] = 0.4302 / 0.4902
= 0.8776 … 87.76% … Chance-Corrected Agreement
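A minimal Python sketch of this computation (not from the original slides; the function name cohen_kappa and all variable names are illustrative). It accepts a square agreement table of counts or proportions and returns Theta_0, Theta_e and K as defined above.

# Illustrative sketch: Cohen's Kappa for a c x c agreement table.
def cohen_kappa(table):
    """Return (theta_0, theta_e, kappa) for a square agreement table."""
    total = sum(sum(row) for row in table)
    p = [[x / total for x in row] for row in table]           # cell proportions
    c = len(p)
    theta_0 = sum(p[i][i] for i in range(c))                  # observed agreement
    row = [sum(p[i][j] for j in range(c)) for i in range(c)]  # row marginals
    col = [sum(p[i][j] for i in range(c)) for j in range(c)]  # column marginals
    theta_e = sum(row[i] * col[i] for i in range(c))          # chancy agreement
    kappa = (theta_0 - theta_e) / (1 - theta_e)
    return theta_0, theta_e, kappa

# The 2 x 2 table above: Theta_0 = 0.94, Theta_e = 0.5098, K ~ 0.8776
print(cohen_kappa([[0.40, 0.03],
                   [0.03, 0.54]]))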

Kappa Computations..

• Raters I vs II

0.40 0.03

0.03 0.54 K = 0.8776

0.93 0.02

0.04 0.01 K = 0.2208

0.03 0.40

0.54 0.03 K = - 0.8439

0.02 0.93

0.01 0.04 K = - 0.0184
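The same sketch reproduces all four K values above (assuming the cohen_kappa function given earlier):

tables = [[[0.40, 0.03], [0.03, 0.54]],   # K =  0.8776
          [[0.93, 0.02], [0.04, 0.01]],   # K =  0.2208
          [[0.03, 0.40], [0.54, 0.03]],   # K = -0.8439
          [[0.02, 0.93], [0.01, 0.04]]]   # K = -0.0184
for t in tables:
    print(round(cohen_kappa(t)[2], 4))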

Nature of Categorical Data

Illustrative Example

Study on Diabetic Retinopathy Screening

Problem : Interpretation of

Single-Field Digital Fundus Images

Assessment of Agreement

WITHIN / ACROSS 4 EXPERT GROUPS

Retina Specialists / General Ophthalmologists /

Photographers / Nurses : 3 from each Group

Description of Study Material

400 Diabetic Patients

Selected randomly from a community hospital

One Good Single-Field Digital Fundus Image

Taken from each patient with Signed Consent

Approved by Ethical Committee on

Research with Human Subjects

Raters : Allowed to Magnify / Move the Images

NOT TO MODIFY Brightness / Contrasts

THREE Major Features

#1. Diabetic Retinopathy (DR) Severity [6 options]

No Retinopathy / Mild / Moderate NPDR /
Severe NPDR / PDR / Ungradable

#2. Macular Edema (ME) [3 options]

Presence / Absence / Ungradable

#3. Referrals to Ophthalmologists [3 options]

Referrals / Non-Referrals / Uncertain

Retina Specialists’ Ratings [ DR ]

RS1 \ RS2

CODES      0    1    2    3    4    9   Total
  0      247    2    2    1    0    0     252
  1       12   18    7    1    0    0      38
  2       22   10   40    8    0    1      81
  3        0    0    3    2    2    0       7
  4        0    0    0    1    9    0      10
  9        5    0    1    0    0    6      12
Total    286   30   53   13   11    7     400

Retina Specialists’ Ratings [ DR ]

RS1 \ RS3

CODES      0    1    2    3    4    9   Total
  0      249    2    0    1    0    0     252
  1       23    8    7    0    0    0      38
  2       31    4   44    2    0    0      81
  3        0    0    7    0    0    0       7
  4        0    0    0    0   10    0      10
  9        9    1    0    0    0    2      12
Total    312   15   58    3   10    2     400

Retina Specialists’ Ratings [ DR ]

RS2 \ RS3

CODES      0    1    2    3    4    9   Total
  0      274    5    6    1    0    0     286
  1       16    5    8    1    0    0      30
  2       15    2   35    0    0    1      53
  3        2    2    7    1    1    0      13
  4        0    0    2    0    9    0      11
  9        5    1    0    0    0    1       7
Total    312   15   58    3   10    2     400

Retina Specialists’

Consensus Rating [ DR ]

RS1 \ RSCR

CODES      0    1    2    3    4    9   Total
  0      252    0    0    0    0    0     252
  1       17   19    2    0    0    0      38
  2       15   19   43    2    1    1      81
  3        0    0    2    4    1    0       7
  4        0    0    0    0   10    0      10
  9        8    0    0    0    0    4      12
Total    292   38   47    6   12    5     400

Retina Specialists’ Ratings

[ Macular Edema ]

RS1 \ RS2

CODES        Presence  Absence  Subtotal  Ungradable  Total
Presence          326       11       337           1    338
Absence            18       22        40           3     43
Subtotal          344       33       377          --    ---
Ungradable          9        0        --          10     19
Total             353       33        --          14    400

Retina Specialists’ Ratings [ ME ]

RS1 \ RS3

CODES        Presence  Absence  Subtotal  Ungradable  Total
Presence          322       13       335           3    338
Absence             8       32        40           3     43
Subtotal          330       45       375          --    ---
Ungradable         12        0        --           7     19
Total             342       45        --          13    400

Retina Specialists’

Consensus Rating [ ME ]

RS1 \ RSCR

CODES        Presence  Absence  Subtotal  Ungradable  Total
Presence          335        2       337           1    338
Absence            10       33        43           0     43
Subtotal          345       35       380          --    ---
Ungradable         10        0        --           9     19
Total             355       35        --          10    400

Photographers on Diabetic ME

PHOTOGRAPHER 1 \ PHOTOGRAPHER 2

Codes        Presence  Absence  Subtotal  Ungradable  Total
Presence          209        5       214          51    265
Absence            65       41       106           4    110
Subtotal          274       46       320          --    ---
Ungradable          2        2        --          21     25
Total             276       48        --          76    400

Photographers’ Consensus Rating on Diabetic Macular Edema

PHOTOGRAPHER 1 \ PHOTOGRAPHERS’ Consensus Rating

Codes        Presence  Absence  Subtotal  Ungradable  Total
Presence          257        5       262           3    265
Absence            74       30       104           6    110
Subtotal          331       35       366          --    ---
Ungradable         24        0        --           1     25
Total             355       35        --          10    400

Study of RS’s Agreement [ME]

2 x 2 Table : Cohen’s Kappa (K) Coefficient

Retina Specialist 1 \ Retina Specialist 2

             Presence  Absence  Subtotal
Presence          326       11       337
Absence            18       22        40
Subtotal          344       33       377

IGNORED ’Ungradable’ to work with a 2 x 2 table

% Agreement : (326 + 22) / 377 = 0.9231 = Theta_0
% Chancy Agreement : %Yes . %Yes + %No . %No
= (337/377)(344/377) + (40/377)(33/377) = 0.8250 = Theta_e
K = [Theta_0 – Theta_e] / [1 – Theta_e] = 56% only !

Net Agreement, Standardized

Study of Photographers’ Agreement on Macular Edema

2 x 2 Table : Cohen’s Kappa (K) Coefficient

Photographer 1 \ Photographer 2

             Presence  Absence  Subtotal
Presence          209        5       214
Absence            65       41       106
Subtotal          274       46       320

IGNORED ’Ungradable’ to work with a 2 x 2 table

% Agreement : (209 + 41) / 320 = 0.7813 = Theta_0
% Chancy Agreement : %Yes . %Yes + %No . %No
= (214/320)(274/320) + (106/320)(46/320) = 0.6202 = Theta_e
K = [Theta_0 – Theta_e] / [1 – Theta_e] = 42% only !

Net Agreement, Standardized
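Because the cohen_kappa sketch given earlier normalizes its input, the raw counts from these two 2 x 2 tables can be passed directly (illustrative):

# Retina Specialists 1 vs 2 on ME (ungradable ignored): K ~ 0.56
print(cohen_kappa([[326, 11],
                   [ 18, 22]])[2])

# Photographers 1 vs 2 on ME (ungradable ignored): K ~ 0.42
print(cohen_kappa([[209,  5],
                   [ 65, 41]])[2])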

What About Multiple Ratings like

Diabetic Retinopathy [DR] ?

Retina Specialist 1 \ Retina Specialist 2

CODES      0    1    2    3    4    9   Total
  0      247    2    2    1    0    0     252
  1       12   18    7    1    0    0      38
  2       22   10   40    8    0    1      81
  3        0    0    3    2    2    0       7
  4        0    0    0    1    9    0      10
  9        5    0    1    0    0    6      12
Total    286   30   53   13   11    7     400

K Computation……

% Agreement = (247 + 18 + 40 + 2 + 9 + 6)/400 = 322/400 = 0.8050 = Theta_0

% Chance Agreement = (252/400)(286/400) + … + (12/400)(7/400) = 0.4860 = Theta_e

K = [Theta_0 – Theta_e] / [1 – Theta_e] = 62% !

Note : 100% Credit for ’Hit’ & No Credit for ’Miss’.

Criticism : Heavy Penalty for narrowly missed !
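The cohen_kappa sketch from earlier applies unchanged to the 6 x 6 DR table (rows and columns in code order 0, 1, 2, 3, 4, 9; illustrative):

rs1_rs2_dr = [[247,  2,  2,  1,  0, 0],
              [ 12, 18,  7,  1,  0, 0],
              [ 22, 10, 40,  8,  0, 1],
              [  0,  0,  3,  2,  2, 0],
              [  0,  0,  0,  1,  9, 0],
              [  5,  0,  1,  0,  0, 6]]
print(cohen_kappa(rs1_rs2_dr))   # Theta_0 ~ 0.805, Theta_e ~ 0.486, K ~ 0.62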

Concept of Unweighted Versus Weighted Kappa

Table of Weights for 6 x 6 Ratings

Ratings      1       2       3       4       5       6
   1         1    24/25   21/25   16/25    9/25      0
   2      24/25      1    24/25   21/25   16/25    9/25
   3      21/25   24/25      1    24/25   21/25   16/25
   4      16/25   21/25   24/25      1    24/25   21/25
   5       9/25   16/25   21/25   24/25      1    24/25
   6         0     9/25   16/25   21/25   24/25      1

Formula : w_ij = 1 – [(i – j)^2 / (6 – 1)^2]

Formula for Weighted Kappa

• Theta_0(w) = sum_i sum_j w_ij f_ij / n

• Theta_e(w) = sum_i sum_j w_ij (f_i. / n)(f_.j / n)

• These double sums run over ALL cells

• For unweighted Kappa : only the cell frequencies along the main diagonal are counted, with 100% weight
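A hedged sketch of the weighted version, using the weight formula w_ij = 1 – (i – j)^2 / (c – 1)^2 from the table above (the function name weighted_kappa is illustrative):

def weighted_kappa(table):
    """Weighted Kappa with weights w_ij = 1 - (i - j)^2 / (c - 1)^2."""
    total = sum(sum(row) for row in table)
    p = [[x / total for x in row] for row in table]
    c = len(p)
    w = [[1 - (i - j) ** 2 / (c - 1) ** 2 for j in range(c)] for i in range(c)]
    row = [sum(p[i]) for i in range(c)]                       # f_i. / n
    col = [sum(p[i][j] for i in range(c)) for j in range(c)]  # f_.j / n
    theta_0w = sum(w[i][j] * p[i][j] for i in range(c) for j in range(c))
    theta_ew = sum(w[i][j] * row[i] * col[j] for i in range(c) for j in range(c))
    return (theta_0w - theta_ew) / (1 - theta_ew)

# e.g. weighted_kappa(rs1_rs2_dr) reuses the 6 x 6 DR table sketched earlier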

Computations for

Weighted Kappa

• Theta_0(w) = ........

• Theta_e(w) = ........

Weighted Kappa = [Theta_0(w) – Theta_e(w)] / [1 – Theta_e(w)]

• Unweighted Kappa = ........

K works for pairwise evaluation of Raters’ agreement …….

K statistics for Pairs of Raters…

Categories DR ME Referral

Retina Specialists

1 vs 2 0.63 0.58 0.65

1 vs 3 0.55 0.64 0.65

2 vs 3 0.56 0.51 0.59

1 vs CGroup 0.67 0.65 0.66

2 vs CGroup 0.70 0.65 0.66

3 vs CGroup 0.71 0.73 0.72

Unweighted Kappa......

K statistics for Pairs of Raters…

Categories DR ME Referral

General Ophthalmologists

1 vs 2 0.35 0.17 0.23

1 vs 3 0.44 0.27 0.27

2 vs 3 0.33 0.19 0.27

1 vs CGroup 0.33 0.16 0.18

2 vs CGroup 0.58 0.50 0.51

3 vs CGroup 0.38 0.20 0.24

K statistics for Pairs of Raters…

Categories DR ME Referral

Photographers…..

1 vs 2 0.33 0.35 0.23

1 vs 3 0.49 0.38 0.41

2 vs 3 0.34 0.45 0.32

1 vs CGroup 0.33 0.29 0.33

2 vs CGroup 0.26 0.29 0.20

3 vs CGroup 0.39 0.49 0.49

K statistics for Pairs of Raters…

Categories DR ME Referral

Nurses………..

1 vs 2 0.28 0.15 0.20

1 vs 3 0.32 NA NA

2 vs 3 0.23 NA NA

1 vs CGroup 0.29 0.27 0.28

2 vs CGroup 0.19 0.15 0.17

3 vs CGroup 0.50 NA NA

NA : Rater #3 did NOT rate ’ungradable’.

K for Multiple Raters’ Agreement

• Judgement on Simultaneous Agreement of

Multiple Raters with Multiple Classification of Attributes…....

# Raters = n

# Subjects = k

# Mutually Exclusive & Exhaustive

Nominal Categories = c

Example....Retina Specialists (n = 3),

Patients (k = 400) & DR (6 codes)

Formula for Kappa

• Set k_ij = # raters who assign the ith subject to the jth category

P_j = sum_i k_ij / (nk) = Proportion of all assignments to the jth category

Chance-corrected agreement for category j :

K_j = [ sum_i k_ij^2 – k n P_j {1 + (n – 1) P_j} ] / [ k n (n – 1) P_j (1 – P_j) ]

Computation of Kappa

• Chance-corrected measure of over-all agreement :

K = [ Sum_j Numerator of K_j ] / [ Sum_j Denominator of K_j ]

• Interpretation …. Intraclass correlation
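A sketch of this multi-rater K (the statistic usually attributed to Fleiss), in the k_ij / P_j notation above; the input is a k x c matrix in which entry (i, j) counts the raters assigning subject i to category j (names illustrative):

def multi_rater_kappa(k_ij):
    """Overall K for k subjects, n raters, c categories;
    k_ij[i][j] = # raters assigning subject i to category j (each row sums to n)."""
    k = len(k_ij)                      # number of subjects
    n = sum(k_ij[0])                   # number of raters
    c = len(k_ij[0])                   # number of categories
    P = [sum(k_ij[i][j] for i in range(k)) / (n * k) for j in range(c)]
    num = den = 0.0
    for j in range(c):
        S = sum(k_ij[i][j] ** 2 for i in range(k))
        num += S - k * n * P[j] * (1 + (n - 1) * P[j])
        den += k * n * (n - 1) * P[j] * (1 - P[j])
    return num / den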

K statistic for multiple raters…

CATEGORIES            DR     ME   Referral

Retina Specialists   0.58   0.58    0.63
Gen. Ophthalmol.     0.36   0.19    0.24
Photographers        0.37   0.38    0.30
Nurses               0.26   0.20    0.20
All Raters           0.34   0.27    0.28

Other than the Retina Specialists, the Photographers show relatively better agreement for DR & ME…

Conclusion based on K-Study

• Of all 400 cases…..

• 44 warranted Referral to Ophthalmologists due to

Retinopathy Severity

• 5 warranted Referral to Ophthalmologists due to uncertainty in diagnosis

• Fourth Retina Specialist carried out Dilated

Fundus Exam of these 44 patients and substantial agreement [K = 0.68] was noticed for

DR severity……

• Exam confirmed Referral of 38 / 44 cases.

Discussion on the Study

• Retina Specialists : All in active clinical practice :

Most reliable for digital image interpretation

• Individual Rater’s background and experience play roles in digital image interpretation

• Unusually high % of ungradable images among nonphysician raters, though only 5 out of 400 were declared as ’ungradable’ by consensus of the Retina Specialists’ Group.

• Lack of Confidence of Nonphysicians, rather than true image ambiguity !

• For this study, other factors [blood pressure, blood sugar, cholesterol etc] not taken into account……

Cohen’s Kappa : Need for

Further Theoretical Research

• COHEN’S KAPPA STATISTIC: A CRITICAL APPRAISAL AND SOME MODIFICATIONS

• BIKAS K. SINHA^1, PORNPIS YIMPRAYOON^2 AND MONTIP TIENSUWAN^2

• ^1 : ISI, Kolkata

• ^2 : Mahidol Univ., Bangkok, Thailand

• CSA BULLETIN, 2007

CSA Bulletin (2007) Paper…

• ABSTRACT : In this paper we consider the problem of assessing agreement between two raters when the ratings are given independently on a 2-point nominal scale, and we critically examine some features of Cohen’s Kappa Statistic, widely and extensively used in this context. We point out some undesirable features of K and, in the process, propose three modified Kappa Statistics. Properties and features of these statistics are explained with illustrative examples.

Further Theoretical Aspects of

Kappa – Statistics ….

• Recent Study on Standardization of Kappa

• Why standardization ?

• K = [Theta_0 – Theta_e] / [ 1 – Theta_e]

Range : -1 <= K <= 1

• K = 1 iff 100% Perfect Rankings

• = 0 iff 100% Chancy Ranking

• = -1 iff 100% Imperfect BUT Split-Half

Why Split Half ?

Example

             Presence   Absence
Presence        ---        30%
Absence         70%        ---

K_C = – 73% [& not – 100%]

************************************

Only the Split-Half table

             Presence   Absence
Presence        ---        50%
Absence         50%        ---

provides K_C = – 100%
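The earlier cohen_kappa sketch reproduces both claims (tables entered as proportions; illustrative):

print(cohen_kappa([[0.00, 0.30],
                   [0.70, 0.00]])[2])   # negative, but well short of -1
print(cohen_kappa([[0.00, 0.50],
                   [0.50, 0.00]])[2])   # exactly -1.0 for the split-half table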

K Modified….

• K_C(M) = [Theta_0 – Theta_e] / { P_I[Marginal Y] . P_I[Marginal N] + P_II[Marginal Y] . P_II[Marginal N] }

Y : ’Presence’ Category & N : ’Absence’ Category

’I’ & ’II’ represent the Raters I & II
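A sketch of K_C(M) for the 2 x 2 case, following the formula above (the function name kappa_modified is illustrative):

def kappa_modified(table):
    """K_C(M) = (Theta_0 - Theta_e) / (P_I[Y] P_I[N] + P_II[Y] P_II[N]), 2 x 2 only."""
    total = sum(sum(row) for row in table)
    p = [[x / total for x in row] for row in table]
    theta_0 = p[0][0] + p[1][1]
    rI  = [p[0][0] + p[0][1], p[1][0] + p[1][1]]   # Rater I marginals (Y, N)
    rII = [p[0][0] + p[1][0], p[0][1] + p[1][1]]   # Rater II marginals (Y, N)
    theta_e = rI[0] * rII[0] + rI[1] * rII[1]
    return (theta_0 - theta_e) / (rI[0] * rI[1] + rII[0] * rII[1])

print(kappa_modified([[326, 11], [18, 22]]))   # ~ 0.56, as reported below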

K_C(M) Satisfies

K = 1 iff 100% Perfect Rankings..whatever

• = 0 iff 100% Chancy Ranking…whatever

• = – 1 iff 100% Imperfect Ranking…whatever…

Other Formulae..….

• What if it is known that there is 80% Observed Agreement, i.e., Theta_0 = 80% ?

• K_max = 1 ? K_min = – 1 ? ... NOT Really ....

• So we need standardization of K_C as

• K_C(M2) = [K_C – K_C(min)] / [K_C(max) – K_C(min)], where Max. & Min. are to be evaluated under the stipulated value of observed agreement

Standardization yields….

K_C(M2) = [ K_C + (1 – Theta_0)/(1 + Theta_0) ]
          OVER
          [ Theta_0^2 / {1 + (1 – Theta_0)^2} + (1 – Theta_0)/(1 + Theta_0) ]

K_C(M3) = [ K_C(M) + (1 – Theta_0)/(1 + Theta_0) ]
          OVER
          [ Theta_0 / (2 – Theta_0) + (1 – Theta_0)/(1 + Theta_0) ]

Revisiting Cohen’s Kappa…..

2 x 2 Table : Cohen’s Kappa (K) Coefficient

Retina Specialist 1 \ Retina Specialist 2

             Presence  Absence  Subtotal
Presence          326       11       337
Absence            18       22        40
Subtotal          344       33       377

K_C = 56% [computed earlier]

Kappa - Modified

• K_C(M) = 56% [same as K_C]

Given Theta_0 = 92.30%

• K_C(M2) = (0.5600 + 0.0400) / (0.8469 + 0.0400) = 61 %

• K_C(M3) = (0.5600 + 0.0400) / (0.8570 + 0.0400) = 67 %

Beyond Kappa …..

• A Review of Inter-rater Agreement

Measures

• Banerjee et al : Canadian Journal of Statistics : 1999; 3-23

• Modelling Patterns of Agreement :

• Log Linear Models

• Latent Class Models

That’s it for now……

• Thanks for your attention….

• This is the End of Part I of my talk.

Bikas Sinha

UIC, Chicago

April 29, 2011
