Approaches to test evaluation

Evan Sergeant

AusVet Animal Health Services

10 May 2010

Comparing tests

 Kappa – how well tests agree

 McNemar’s chi-sq – are tests significantly different?

Kappa

Test 1 result (columns) vs Test 2 result (rows)

          T1+     T1-    Total
T2+       121      34      155
T2-        36     931      967
Total     157     965     1122

 Expected no. both +ve = (157 x 155)/1122 = 21.7

 Expected no. both -ve = (965 x 967)/1122 = 831.7

 Total Agreement = 1052

 Chance Agreement = 853.4

 K=(1052-853.4)/(1122-853.4) = 0.739
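The kappa arithmetic above can be reproduced in a few lines. This is a minimal sketch; the function name `cohen_kappa` and the cell labels a, b, c, d are illustrative and do not come from the slides.

```python
def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 table of two tests.

    a = both tests positive, d = both tests negative,
    b and c = the two discordant cells.
    """
    n = a + b + c + d
    observed = a + d                          # total agreement
    expected_pos = (a + b) * (a + c) / n      # expected no. both +ve
    expected_neg = (c + d) * (b + d) / n      # expected no. both -ve
    chance = expected_pos + expected_neg      # chance agreement
    return (observed - chance) / (n - chance)

kappa = cohen_kappa(121, 34, 36, 931)
print(round(kappa, 3))
```

Because kappa is symmetric in the two discordant cells, swapping b and c gives the same value.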

McNemar Chi-Squared

Test 1 result (columns) vs Test 2 result (rows)

          T1+     T1-    Total
T2+        58       5       63
T2-        37     196      233
Total      95     201      296

McNemar's Chi-squared test with continuity correction

McNemar's chi-squared = 22.881, df = 1, p-value = 1.724e-06
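The statistic above depends only on the two discordant cells (5 and 37). A minimal sketch of the continuity-corrected calculation follows; the function name `mcnemar_cc` is illustrative, and the p-value uses the fact that the upper tail of a chi-squared distribution with 1 df equals erfc(sqrt(x/2)).

```python
import math

def mcnemar_cc(b, c):
    """McNemar's chi-squared with continuity correction (1 df).

    b and c are the two discordant cells of the 2x2 table.
    """
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail probability of a chi-squared distribution with 1 df
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p_value = mcnemar_cc(5, 37)
print(f"McNemar's chi-squared = {stat:.3f}, df = 1, p-value = {p_value:.3e}")
```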

OJD AGID and ELISA

ELISA result (columns) vs AGID result (rows)

           ELISA +   ELISA –   Total
AGID +         34        15       49
AGID –         21       154      175
Total          55       169      224

 Enter data into EpiTools

• Application of diagnostic tests > compare 2 tests

• see kappa, McNemar’s and level of agreement

Statistic                          Value
Kappa                              0.5496
SE for kappa = 0                   0.0666
Z(kappa)                           8.25
p(kappa) - one-tailed              0 (<0.0001)
Proportion positive agreement      0.6538
Proportion negative agreement      0.8953
Overall proportion agreement       0.8393
McNemar's Chi sq                   0.6944
p(Chi sq)                          0.4
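The agreement statistics for the OJD AGID and ELISA data can be checked by hand. The sketch below uses the usual textbook formulas (proportion of specific agreement, and the standard error of kappa under the null hypothesis kappa = 0); it is an assumption that EpiTools uses the same formulas, and the function name `agreement_summary` is illustrative.

```python
import math

def agreement_summary(a, b, c, d):
    """Agreement statistics for a 2x2 table of two tests.

    a = both positive, d = both negative, b and c = discordant cells.
    """
    n = a + b + c + d
    r1, r2 = (a + b) / n, (c + d) / n    # row marginal proportions
    c1, c2 = (a + c) / n, (b + d) / n    # column marginal proportions
    po = (a + d) / n                     # overall proportion agreement
    pe = r1 * c1 + r2 * c2               # agreement expected by chance
    kappa = (po - pe) / (1 - pe)
    # Standard error of kappa under the null hypothesis kappa = 0
    s = r1 * c1 * (r1 + c1) + r2 * c2 * (r2 + c2)
    se0 = math.sqrt((pe + pe ** 2 - s) / (n * (1 - pe) ** 2))
    return {
        "kappa": kappa,
        "se0": se0,
        "z": kappa / se0,
        "positive_agreement": 2 * a / (2 * a + b + c),
        "negative_agreement": 2 * d / (2 * d + b + c),
        "overall_agreement": po,
    }

ojd = agreement_summary(34, 15, 21, 154)
for name, value in ojd.items():
    print(f"{name}: {value:.4f}")
```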

Gold Standard Tests

 Use tests with perfect sensitivity and/or specificity to identify the true disease status of the individual from which the samples were taken.

 What are the advantages and disadvantages of this approach?

Gold Standard Tests

 Advantages

• Known disease status

• Relatively simple calculations

 Disadvantages

• May not exist, or be prohibitively expensive

• Rare diseases may only allow small sample size

• Disease may not be present in the country

• Difficult to get representative (or even comparable) samples of diseased/non-diseased individuals

Exercises

 Calculate Se and Sp for OJD AGID using data provided in OJD_AGID_Data.xls

• Calculate confidence limits using EpiTools

Non-gold standard methods

 Do not depend on determining true infection status of individual.

 Rely on statistical approaches to calculate best fit values for Se and Sp.

 Tests must satisfy some important assumptions.

Comparison with a known reference test

 Assumptions

• Independence of tests

• Se/Sp of reference test is known.

 For ~100% specific reference test,

• Se(new test) = Number positive to both tests / Total number positive to the reference test

Culture vs Serology

 Estimate sensitivity of culture and serology (as flock tests)

 Serology followed-up by histopathology to confirm flock status

 Both tests 100% specificity (as flock tests)

 How would you estimate sensitivity for these tests?

 Which test has better Se? Is the difference significant?

All Flocks

                 PFC +ve   PFC -ve   Total
Serology +ve        58         5       63
Serology -ve        37       196      233
Total               95       201      296

Example

 Se (PFC) = 58/63 = 92% (83% - 97%)

 Se (Serology) = 58/95 = 61% (51% - 70%)
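Because both tests are taken as 100% specific, each test can serve as the reference for the other: the denominator for Se(PFC) is the number of serology-positive flocks, and vice versa. The sketch below repeats that calculation. The slides do not say which interval method was used; a Wilson score interval is assumed here, and `wilson_ci` is an illustrative name.

```python
import math

def wilson_ci(x, n, z=1.959964):
    """Wilson score interval for a proportion x/n (95% by default)."""
    p = x / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

both_pos = 58        # flocks positive to both tests
serology_pos = 63    # reference total when estimating Se(PFC)
pfc_pos = 95         # reference total when estimating Se(Serology)

estimates = {}
for label, ref_total in (("PFC", serology_pos), ("Serology", pfc_pos)):
    lo, hi = wilson_ci(both_pos, ref_total)
    estimates[label] = (both_pos / ref_total, lo, hi)
    print(f"Se({label}) = {both_pos / ref_total:.0%} ({lo:.0%} - {hi:.0%})")
```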

Statistic                          Value
Kappa                              0.6427
SE for kappa = 0                   0.0559
Z(kappa)                           11.49
p(kappa) - one-tailed              0 (<0.0001)
Proportion positive agreement      0.7342
Proportion negative agreement      0.9032
Overall proportion agreement       0.8581
McNemar's Chi sq                   22.881
p(Chi sq)                          0 (<0.0001)

Estimation from routine testing data

 Test-positives are subject to follow-up, and truly infected animals are identified and removed from the population

 Can be used to estimate specificity when the disease is rare in the population of interest.

 Sp = 1 – (Number of reactors / Total number tested)

Se and Sp of equine influenza ELISA

 During the equine influenza outbreak in Australia, horses were tested by PCR and serology:

• to confirm infection;

• to demonstrate seroconversion and/or absence of infection >30 days later;

• as part of random and targeted surveillance for case detection, to confirm area status and for zone progression in presumed “EI free” areas.

 How could you use the resulting data to estimate sensitivity and specificity of the ELISA?

Equine influenza ELISA

 475 PCR-positive horses, 471 also positive on ELISA

 1323 horses from properties in areas with no infection, 1280 ELISA negative

 Analyse in Epitools

• Application of diagnostic tests > test evaluation against gold standard

 Sergeant, E. S. G., Kirkland, P. D. & Cowled, B. D. 2009. Field evaluation of an equine influenza ELISA used in New South Wales during the 2007 Australian outbreak response. Preventive Veterinary Medicine, 92, 382-385.

               Point Estimate   Lower 95% CL   Upper 95% CL
Sensitivity        0.9916          0.9786         0.9977
Specificity        0.9675          0.9565         0.9764
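The point estimates follow directly from the counts: Se = 471/475 among PCR-positive horses and Sp = 1280/1323 among horses from areas with no infection. The sketch below also computes exact binomial (Clopper-Pearson) confidence limits by bisection. The choice of the exact method is an assumption, since the slides do not name the interval used, and the function names are illustrative.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total

def solve_increasing(f, target, lo=1e-12, hi=1 - 1e-12, steps=60):
    """Bisection for f(p) = target, where f increases with p."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson (exact binomial) confidence limits for x/n."""
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper

se = 471 / 475            # ELISA-positive among PCR-positive horses
sp = 1280 / 1323          # ELISA-negative among horses from uninfected areas
se_ci = exact_ci(471, 475)
sp_ci = exact_ci(1280, 1323)
print(f"Se = {se:.4f} ({se_ci[0]:.4f} - {se_ci[1]:.4f})")
print(f"Sp = {sp:.4f} ({sp_ci[0]:.4f} - {sp_ci[1]:.4f})")
```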

Mixture modelling

 Assumptions

• observed distribution of test results (for a test with a continuous outcome reading such as an ELISA) is actually a mixture of two frequency distributions, one for infected individuals and one for uninfected individuals

 Opsteegh, M., Teunis, P., Mensink, M., Zuchner, L., Titilincu, A., Langelaar, M. & van der Giessen, J. 2010. Evaluation of ELISA test characteristics and estimation of Toxoplasma gondii seroprevalence in Dutch sheep using mixture models. Preventive Veterinary Medicine.

Latent Class Analysis

 What is Latent Class Analysis?

 Maximum Likelihood

 Bayesian

Maximum likelihood estimation

 Assumptions

• The tests are independent conditional on disease status (the sensitivity [specificity] of one test is the same, regardless of the result of the other test);

• The tests are compared in two or more populations with different prevalence between populations;

• Test sensitivity and specificity are constant across populations; and

• There are at least as many populations as there are tests being evaluated.

 TAGS software

• Hui, S. L. & Walter, S. D. 1980. Estimating the error rates of diagnostic tests. Biometrics, 36, 167-171.

TAGS

 Open R – shortcut in root directory of stick

 Open tags.R in a text editor or Word

 Select all and copy/paste into R console

 Type TAGS() and <Enter> to run

 Hui Walter example

• 2 tests for TB

• Test 1 = Mantoux

• Test 2 = Tine test

 Follow the prompts to enter data:

• Data set = new

• Name = test

• Number of tests = 2, Number of populations = 2

• Reference population? = No (0)

• Enter results for each population from table below

• Best guesses use defaults

• Bootstrap CI = Yes (1000 iterations)

Data

Test 1   Test 2   Population 1   Population 2
0        0        528            367
1        0        4              31
0        1        9              37
1        1        14             887

$Estimations
         pre1     pre2     Sp1      Sp2      Se1      Se2
Est      0.0268   0.7168   0.9933   0.9841   0.9661   0.9688
CIinf    0.0159   0.6911   0.9797   0.9684   0.9495   0.9540
CIsup    0.0450   0.7412   0.9978   0.9921   0.9774   0.9790
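The point estimates for the Hui and Walter example can be cross-checked without TAGS. The sketch below fits the same two-test, two-population model, under the conditional independence assumption, with a simple EM algorithm. It is not the TAGS code: the function name, the starting values and the iteration count are illustrative choices, and it does not produce bootstrap confidence limits.

```python
def hui_walter_em(tables, iterations=2000):
    """EM estimates for two conditionally independent tests.

    tables: one dict per population mapping (t1, t2) -> count.
    Returns (prevalences, sensitivities, specificities).
    """
    prev = [0.5] * len(tables)    # illustrative starting values
    se = [0.9, 0.9]
    sp = [0.9, 0.9]
    for _ in range(iterations):
        infected = [0.0] * len(tables)   # expected infected per population
        se_num, sp_num = [0.0, 0.0], [0.0, 0.0]
        tot_inf = tot_uninf = 0.0
        for k, table in enumerate(tables):
            for results, n in table.items():
                # Probability of this result pattern if infected / uninfected
                p_inf = prev[k]
                p_un = 1 - prev[k]
                for j in range(2):
                    p_inf *= se[j] if results[j] else 1 - se[j]
                    p_un *= 1 - sp[j] if results[j] else sp[j]
                w = p_inf / (p_inf + p_un)   # E-step: P(infected | results)
                infected[k] += n * w
                tot_inf += n * w
                tot_uninf += n * (1 - w)
                for j in range(2):
                    if results[j]:
                        se_num[j] += n * w          # expected true positives
                    else:
                        sp_num[j] += n * (1 - w)    # expected true negatives
        # M-step: update parameters from the expected counts
        prev = [infected[k] / sum(t.values()) for k, t in enumerate(tables)]
        se = [se_num[j] / tot_inf for j in range(2)]
        sp = [sp_num[j] / tot_uninf for j in range(2)]
    return prev, se, sp

pop1 = {(0, 0): 528, (1, 0): 4, (0, 1): 9, (1, 1): 14}
pop2 = {(0, 0): 367, (1, 0): 31, (0, 1): 37, (1, 1): 887}
prev, se, sp = hui_walter_em([pop1, pop2])
print("prevalence:", [round(v, 4) for v in prev])
print("Se:", [round(v, 4) for v in se])
print("Sp:", [round(v, 4) for v in sp])
```

Starting both sensitivities and specificities above 0.5 steers the algorithm away from the mirror-image solution in which the roles of infected and uninfected are swapped.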

Bayesian estimation

 What is Bayesian estimation?

• Combines prior knowledge/belief (what you think you know) with data to give best estimate

• Incorporates existing knowledge on parameters (Se, Sp, prevalence)

• “Priors” entered as probability (usually Beta) distributions

• Uses Monte Carlo simulation to solve

• Outputs also as probability distributions

• Can get very complex

 Assumptions

• Independence of the tests

• Appropriate prior distributions chosen.

• Need information on prior probabilities

• Some methods can adjust for correlated tests

• Multiple tests in multiple populations

 Methods

• EpiTools (only allows one population so must have good information on one or more test characteristics)

• WinBUGS models

Bayesian analysis surra data

Test 1 = CATT (columns), Test 2 = ELISA (rows)

             CATT +ve   CATT -ve   Total
ELISA +ve        0          0         0
ELISA -ve       39        251       290
Total           39        251       290

Inputs for Bayesian analysis for revised sensitivity and specificity estimates

Prior distributions for Bayesian analysis

Parameter              n      x     alpha   beta
Prev                   -      -       1       1
Se_CATT (81%)         100     81      82      20
Sp_CATT (99.4%)       160    159     160       2
Se_ELISA_2 (75%)      100     75      76      26
Sp_ELISA_2 (97.5%)    120    117     118       4
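The alpha and beta values in the prior table follow from n and x as alpha = x + 1 and beta = n - x + 1, so that the mode of each Beta distribution equals x/n. A minimal sketch of that conversion (the function name `beta_prior` is illustrative):

```python
def beta_prior(n, x):
    """Beta(alpha, beta) parameters equivalent to x 'successes' in n trials,
    starting from a uniform Beta(1, 1)."""
    return x + 1, n - x + 1

priors = {
    "Se_CATT": beta_prior(100, 81),
    "Sp_CATT": beta_prior(160, 159),
    "Se_ELISA_2": beta_prior(100, 75),
    "Sp_ELISA_2": beta_prior(120, 117),
}
for name, (alpha, beta) in priors.items():
    mode = (alpha - 1) / (alpha + beta - 2)
    print(f"{name}: Beta({alpha}, {beta}), mode = {mode:.3f}")
```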

EpiTools

 Run EpiTools > Estimating true prevalence > Bayesian estimation with two tests

 Enter parameters:

• Data from 2x2 table: 0, 39, 0, 251

• Prevalence = Beta(1,1) (uniform = don’t know)

• Test 1 (CATT): Se = Beta(82, 20), Sp = Beta(160, 2)

• Test 2 (ELISA): Se = Beta(76, 26), Sp = Beta(118, 4)

• Starting values: 0, 38, 0, 245

• Other values as defaults and click submit

Statistic    Prevalence   Sensitivity-1   Specificity-1   Sensitivity-2   Specificity-2
Minimum      <0.0001      0.6219          0.8535          0.5475          0.9554
2.5%         0.0001       0.7210          0.8818          0.6510          0.9789
Median       0.0038       0.8064          0.9109          0.7418          0.9910
97.5%        0.0201       0.8749          0.9354          0.8217          0.9973
Maximum      0.0567       0.9370          0.9517          0.8891          0.9998
Mean         0.0055       0.8044          0.9103          0.7406          0.9903
SD           0.0055       0.0393          0.0136          0.0436          0.0048
Iterations   20000        20000           20000           20000           20000
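To see what this kind of analysis does internally, the sketch below runs a small Gibbs sampler for two conditionally independent tests in one population, using the surra data and the Beta priors listed above. It is an illustration only and not the EpiTools implementation: the function name, seed, iteration count, burn-in and starting values (prior means) are all assumptions, and the output will differ from the EpiTools table by Monte Carlo error.

```python
import random

def gibbs_two_tests(counts, priors, iterations=5000, burn_in=500, seed=1):
    """Gibbs sampler for two conditionally independent tests, one population.

    counts: dict (t1, t2) -> number of animals with that result pattern.
    priors: dict of Beta (alpha, beta) for "prev", "se1", "sp1", "se2", "sp2".
    Returns a dict of posterior sample lists.
    """
    rng = random.Random(seed)
    cur = {k: a / (a + b) for k, (a, b) in priors.items()}  # start at prior means
    samples = {k: [] for k in priors}
    total = sum(counts.values())
    for it in range(iterations + burn_in):
        y = {}  # latent number of truly infected animals in each cell
        for (t1, t2), n in counts.items():
            p_inf = (cur["prev"]
                     * (cur["se1"] if t1 else 1 - cur["se1"])
                     * (cur["se2"] if t2 else 1 - cur["se2"]))
            p_un = ((1 - cur["prev"])
                    * (1 - cur["sp1"] if t1 else cur["sp1"])
                    * (1 - cur["sp2"] if t2 else cur["sp2"]))
            w = p_inf / (p_inf + p_un)
            y[(t1, t2)] = sum(rng.random() < w for _ in range(n))
        inf = sum(y.values())
        # (successes, failures) contributing to each parameter's Beta update
        tallies = {
            "prev": (inf, total - inf),
            "se1": (y[1, 1] + y[1, 0], y[0, 1] + y[0, 0]),
            "se2": (y[1, 1] + y[0, 1], y[1, 0] + y[0, 0]),
            "sp1": (sum(counts[0, t] - y[0, t] for t in (0, 1)),
                    sum(counts[1, t] - y[1, t] for t in (0, 1))),
            "sp2": (sum(counts[t, 0] - y[t, 0] for t in (0, 1)),
                    sum(counts[t, 1] - y[t, 1] for t in (0, 1))),
        }
        for k, (s, f) in tallies.items():
            a, b = priors[k]
            cur[k] = rng.betavariate(a + s, b + f)
            if it >= burn_in:
                samples[k].append(cur[k])
    return samples

counts = {(1, 1): 0, (1, 0): 39, (0, 1): 0, (0, 0): 251}   # (CATT, ELISA)
priors = {"prev": (1, 1), "se1": (82, 20), "sp1": (160, 2),
          "se2": (76, 26), "sp2": (118, 4)}
samples = gibbs_two_tests(counts, priors)
means = {k: sum(v) / len(v) for k, v in samples.items()}
for k, m in means.items():
    print(f"{k}: posterior mean = {m:.4f}")
```

With no ELISA-positive animals, the 39 CATT-positive results are explained mainly as false positives, which is why the posterior for CATT specificity shifts well below its prior while prevalence stays close to zero.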
