An Introduction to Serology for diagnosis of Animal Diseases

Serology: Test Performance Measurement
Author: Dr RW Worthington.
Licensed under a Creative Commons Attribution license.
TABLE OF CONTENTS
Sensitivity
    Analytical sensitivity (A-SN)
    Diagnostic sensitivity (D-SN)
Specificity
    Analytical specificity (A-SP)
    Diagnostic specificity (D-SP)
Determination of D-SN and D-SP
Specificity and sensitivity requirements for a test
Predictive value (PV) of a test
    Predictive value of a positive test (PV+)
    Predictive value of a negative test (PV-)
Comparing two tests
Precision
Repeatability
Reproducibility
Accuracy
Validity and validation of test methods
Serological tests are often described by their users as good, accurate, useful, unreliable, difficult to perform, and so on. These loose descriptions indicate that tests differ in accuracy and usefulness in ways that require precise definition and quantitation. This chapter describes the essential terms used to describe test performance and the measurement of the necessary parameters.
The primary indicators of test performance are sensitivity and specificity.
SENSITIVITY
There are two types of sensitivity, analytical sensitivity (A-SN) and diagnostic sensitivity (D-SN).
Analytical sensitivity (A-SN)
A-SN is a measure of the amount of antibody a test can detect. It can theoretically be defined in
absolute terms, such as "the test can detect X g of antibody". However, there are practical
difficulties that make absolute measurements complicated or even impossible. Polyclonal sera contain
a variety of antibodies of different antibody class, molecular mass, and affinity for the antigen and
specificity for particular epitopes on the antigen. The quantities of the various types of antibodies may
vary from serum to serum. Therefore, the amount of antibody in terms of absolute weight of antibody
that a test can detect will vary from serum to serum. A-SN is more usually used as a comparative
measure, e.g. Test A is three times more sensitive than Test B. In practice, running the two tests on a
panel of test sera and comparing the results will give a comparison of their relative sensitivity.
Alternatively the A-SN can be defined simply by comparing it to a single international or national
reference serum. In this case the A-SN for a test would be defined in terms of the minimum number of
international units of antibody the test can detect. Comparative A-SN estimations are all that are
necessary for the verification of test procedures used for diagnostic testing.
Diagnostic sensitivity (D-SN)
D-SN is the ability of a test to detect infected animals in a population. It is the percentage
of infected animals that the test can detect in a population of known infected animals. Although this is
a simple and useful concept the accurate measurement of D-SN is often difficult.
To measure D-SN it is necessary to have a suitable number of sera from known infected animals. In
practice it may be difficult to find suitable sera because it is not permissible to use another serological
test as the criterion for identifying positive animals. If this is done no test can exceed the sensitivity of
the test used to select the positive samples. To compare the diagnostic sensitivity of two tests a
random sample of positive and negative sera from a population of infected animals must be used. In
this sample there will be some sera in which antibody can only be detected by one test or the other,
so that a comparison of the two tests is possible. However, the true situation with regard to the
infectious status of each animal may never be precisely known.
Where it is possible to assemble a panel of sera from proven infected animals, a good measure of D-SN may be obtained. For example, it is possible to test animals by culturing semen samples to
identify those infected with Brucella ovis, and all animals from which B. ovis can be isolated are
proven infected animals. However, even a panel of sera from semen-culture-positive animals will not
be ideal, because culture of semen is an insensitive test of infection and the panel will not include
sera from infected animals that are not excreting the organism in their semen. If transiently infected
animals that are not excreting organisms in their semen are excluded, the population tested will not
represent the true distribution of reactions in infected animals and will be biased, as transiently
infected animals tend to have low titres. Due to the peculiarities of the pathogenesis and nature of
different infectious diseases and the circumstances that pertain to field populations, difficulties in
assembling an ideal panel of samples can probably be cited for most diseases. Although many
different approaches can be used to assemble the best possible collection of sera, the expense and
difficulties involved often result in sample panels that are smaller than ideal.
Susceptible animals may also be infected experimentally to create a sample of known infected
animals for testing. However, this artificial situation may not yield a panel of sera with a distribution of
titres similar to those found in naturally infected populations. Furthermore, the high cost of this type of
experiment usually leads to the use of small numbers of animals.
The high cost of assembling a reliably verified set of sera results in many estimates being made on
less than ideal numbers of sera. The recommendations of OIE are that there should be 300 sera from
known positives for making a preliminary estimate of D-SN and 1,000 for a precise estimate. These
numbers are difficult to achieve and estimates made on the basis of non-ideal sample numbers often
provide the best available information about a particular test. It is necessary to be aware of the
limitations and possible sources of error in published data and make a judgement about how useful
the information is.
SPECIFICITY
There are two types of specificity, analytical specificity (A-SP) and diagnostic specificity (D-SP).
Analytical specificity (A-SP)
A-SP is a measure of the propensity of a test not to cross-react with antibodies formed against
antigens closely related to the antigen used in the test. In practice it can only be estimated by
measuring the amount of cross-reactivity that a test gives with sera from animals that have been
infected with agents that may stimulate the development of cross-reactive antibodies.
Experiments can be done to measure the comparative A-SP of two tests against a panel of potentially
cross-reacting sera. However, the causes of non-specific reactions are often not known, and suitable
sera for testing this property are not readily available and would vary from one situation to another. A
rigorous definition of how A-SP should be measured is not available, and any such definition would of
necessity vary from test to test. Until these difficulties are resolved laboratories are likely to measure
this characteristic only sporadically, in a variety of different ways, when they have identified a
particular problem that causes cross-reactions.
Diagnostic specificity (D-SP)
D-SP relates to the diagnostic accuracy of negative test results. The D-SP of a test is defined as the
proportion (percentage) of sera from non-infected animals that give negative test results. It is often
easier to measure than D-SN, as populations of non-infected animals, from which suitable test
samples may be drawn, are available for many diseases. Where diseases are absent from a country
or from large areas of a country or from individual properties, suitable sera can be obtained in large
numbers from these sources. However, in some cases where a disease is widespread there may be
difficulties in obtaining sera from verified non-infected animals. Infectious bovine rhinotracheitis (IBR)
is an example of a disease that is endemic and infects a high proportion of animals in many countries,
making it difficult to find sera from known uninfected animals. The OIE recommendations for the
validation of tests are that 1,000 sera should be used to make a preliminary estimation and 5,000 for
a precise estimate.
Problems with the specificity of a test may be due to a technical problem with the test, such as a
complement fixation test in which the amount of complement or some other reagent has been
incorrectly titrated. Correcting the test methodology or technical faults or using more suitable reagents
may solve such problems. However, low D-SP may also be due to problems that occur in the animal
population from which the samples of negative sera were drawn. Poor diagnostic specificity may be
related to the prevalence of non-specifically sensitised animals in the population. For example, if there
is a high prevalence of Yersinia enterocolitica O:9, an organism with a surface antigen similar to that of
Brucella abortus, there may be a number of animals in a brucellosis-free population that react
positively to serological tests for Brucella abortus. Measurement of D-SP should alert the user to the
possibility of both technical problems with the test method and non-specific sensitisation problems in
the animal population.
DETERMINATION OF D-SN AND D-SP
The most difficult part of determining the D-SN and specificity of a test is the gathering of a suitable
panel of sera from known positive and negative animals. These sera should be aliquoted,
randomised, numbered with suitable code numbers and given to two or three operators for testing.
Each operator should be given duplicates of each serum. Operators should be unaware of the
identity or status of any serum aliquots, and their results should be decoded and collated by the
person who aliquoted and distributed the samples. Any sera for which duplicate results do not meet
the standard set for repeatability should be re-tested. The results are presented in a two by two
contingency table and the D-SN and D-SP calculated as shown in Table 1.
Table 1. Calculation of D-SN and D-SP from data presented in a two by two contingency table

                  Infected    Uninfected
                  animals     animals      Totals
  Positive test      945           24         969
  Negative test       68        2,983       3,051
  Totals           1,013        3,007       4,020

D-SN = infected animals with a positive test/ total infected animals
= 945/1,013
= 0.933 (93.3%)
D-SP = uninfected animals with a negative test/ total uninfected animals
= 2,983/3,007
= 0.992 (99.2%)
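The calculation above can be sketched in a few lines of code. This is a minimal illustration using the counts from Table 1; the variable names are my own.

```python
# Counts from the two by two contingency table (Table 1)
tp = 945     # infected animals with a positive test (true positives)
fn = 68      # infected animals with a negative test (false negatives)
fp = 24      # uninfected animals with a positive test (false positives)
tn = 2983    # uninfected animals with a negative test (true negatives)

d_sn = tp / (tp + fn)   # diagnostic sensitivity = 945/1,013
d_sp = tn / (tn + fp)   # diagnostic specificity = 2,983/3,007

print(round(d_sn, 3))   # 0.933
print(round(d_sp, 3))   # 0.992
```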
SPECIFICITY AND SENSITIVITY REQUIREMENTS FOR A TEST
Most tests cannot provide perfect D-SN and D-SP, and the two properties are often inversely related.
The requirements vary in different situations and for different tests. In some cases maximum
sensitivity is required and specificity is of lesser importance; in other situations the requirements may
be reversed. D-SN may be increased by making adjustments to the test procedure that increase the
A-SN, or by altering the test interpretation criteria, i.e. decreasing the cut-off point that divides positive
and negative results. However, increasing A-SN should be done with caution, because it may not
result in a meaningful increase in D-SN and may decrease the specificity of the test. Conversely,
decreasing the A-SN of a test may sometimes make little difference to its D-SN, while increasing
specificity and making the test more robust. The term robust describes the hard-to-define property of
a test that can easily be performed to give repeatable and reliable results in a practical working
situation. It is therefore recommended that the A-SN and the interpretation criteria of a test should not
be adjusted without careful measurement of the effects this has on the diagnostic performance of the
test.
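The effect of moving the cut-off can be illustrated with a small sketch. The absorbance values below are invented for illustration, not taken from any real assay.

```python
# Hypothetical ELISA absorbance readings (invented for illustration)
infected   = [1.10, 0.90, 0.70, 0.55, 0.45, 0.35]
uninfected = [0.40, 0.30, 0.25, 0.20, 0.15, 0.10]

def d_sn_d_sp(cutoff):
    """D-SN and D-SP obtained with a given positive/negative cut-off."""
    d_sn = sum(v >= cutoff for v in infected) / len(infected)
    d_sp = sum(v < cutoff for v in uninfected) / len(uninfected)
    return d_sn, d_sp

# Lowering the cut-off raises D-SN but lowers D-SP
print(d_sn_d_sp(0.50))   # cut-off 0.50: D-SN 4/6, D-SP 6/6
print(d_sn_d_sp(0.30))   # cut-off 0.30: D-SN 6/6, D-SP 4/6
```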
Maximum D-SN is usually required where it is important to discover every possible positive animal
and the cost of wrongly identifying negative animals as positive is of comparatively minor importance.
In a national eradication campaign, such as in a bovine brucellosis eradication programme it is
important to identify the maximum number of infected animals as rapidly as possible. If infected
animals are missed this can lead to further spreading of the disease and the costs for further testing
far outweigh those associated with the unnecessary culling of a few uninfected reactors. Similarly,
purchasers of animals wishing to keep their herds or countries free from a disease require a test of
maximal D-SN to detect all infected animals being purchased or imported, and have little interest in
how many animals are rejected because of false positive reactions. For example, some countries that
have eradicated bovine brucellosis will not import any animal with 30 IU of Brucella abortus
agglutinating antibody in its serum. An agglutination test with this level of sensitivity will identify
increased numbers of non-specific reactors and probably does not meaningfully improve the D-SN.
On the other hand most sellers of animals are primarily concerned with specificity, as they do not wish
to lose sales or have their herd classified as infected because of non-specific reactions, which they
often regard as being caused by testing laboratory incompetence.
Serological tests that have D-SN values greater than 95% and D-SP values of greater than 99% are
good tests for use in most circumstances. It should be remembered that D-SP and D-SN can be
adjusted by altering the interpretation cut-off levels of the test. However, the error rate of the test,
which is dependent on both D-SP and D-SN, is more complex to adjust and this problem has been
discussed in Chapter 2. An example of a well-validated, high performance test is the complement
fixation test for Brucella ovis. However, it is not necessary for a test to meet these high standards to
be useful. If one is trying to determine whether a disease is present in a country, it is important to have
a test of high specificity, to avoid making false positive diagnoses that would compromise the
country's disease-free status. Even if the D-SN is low the test results will often be valid
if large enough numbers of animals are tested. For example, it can be deduced from suitable
statistical tables that, to have 99% confidence that a sample of animals will include at least one
infected animal when the prevalence of the disease is 0.5%, a sample of 919 animals from a large
population is required. By testing ten times this number there would probably be about 45
infected animals in the sample and a 99% probability of having at least 10. A high specificity test,
with a sensitivity as low as 50%, should easily be able to detect positives in this sample. Another
approach would be to use a high sensitivity test with a low specificity as a screening test. A definitive
test with high specificity, such as culturing for the infectious agents, could then be used to re-test the
positives detected by the serological test.
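The figure of 919 animals follows from the probability of missing every infected animal in n independent draws from a large population. A quick sketch, with a function name of my own choosing:

```python
import math

def detection_sample_size(prevalence, confidence=0.99):
    """Smallest n such that a sample of n animals from a large population
    includes at least one infected animal with the given confidence,
    i.e. 1 - (1 - prevalence)**n >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - prevalence))

print(detection_sample_size(0.005))   # 919 animals at 0.5% prevalence
```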
PREDICTIVE VALUE (PV) OF A TEST
A predictive value may be given for a positive or a negative test result. Predictive value is
sometimes referred to as diagnosability, and is sometimes wrongly called validity.
Predictive value of a positive test (PV+)
PV+ is the proportion (percentage) of test-positive animals that are actually infected. The PV+ can
be calculated from the data in Table 1 as follows:
PV+ = infected, test-positive animals / (infected, test-positive animals + uninfected, test-positive animals)
= 945/969
= 0.975 (97.5%)
For any population of animals this parameter is directly related to the specificity of the test. If the
specificity is low, there will be comparatively large numbers of reactors that are not infected leading to
a low PV+.
PV+ will also vary greatly in different populations of animals depending on the disease prevalence in
the population. This can best be demonstrated by means of an example. Consider a test with D-SN =
0.95 (95%) and D-SP = 0.99 (99%) used to test 1,000,000 animals in which the disease prevalence is
0.50 (50%).
Half the animals tested (500,000) will be infected and multiplying this number by D-SN (0.95) gives
475,000 infected animals with positive tests.
Multiplying the 500,000 uninfected animals by (1 - D-SP) = 0.01 gives 0.01 x 500,000 = 5,000 false positive tests.
Therefore the total number of positive reactors is 475,000 + 5,000 and:
PV+ = 475,000/ (475,000+5,000)
= 0.99 (99%)
If the same test is used on a million animals in which the disease prevalence is 0.01 (1%), there will
be 10,000 infected animals of which 10,000 x 0.95 = 9,500 are test positive.
There will be 990,000 uninfected animals in which there will be 0.01 x 990,000 = 9,900 false positive
tests, giving a total of 9,500 + 9,900 reactors and:
PV+ = 9,500/(9,500 + 9,900)
= 0.49 (49%)
It also follows that if the population of animals tested is free from the disease the PV+ is zero,
because every positive test is a false positive.
Predictive value of a negative test (PV-)
PV- is the proportion (percentage) of test-negative animals that are not infected. The PV- can be calculated from the data in Table 1 as follows:
PV- = uninfected, test-negative animals / (infected, test-negative animals + uninfected, test-negative animals)
PV- = 2,983/3,051
= 0.978 (97.8%)
PV- is directly related to the D-SN of the test, because if the D-SN is low there will be large numbers
of test negative infected animals which will result in a low PV-.
PV- is also affected by the disease prevalence, and similar calculations, to those made above for PV+
can be made to demonstrate the relationship. Theoretically when there is 100% prevalence of a
disease the PV- will be nil, but this is unlikely to occur in practice.
Those interested in calculating the risk of importing a disease from another country when importing
animals need to know the PV for the particular circumstances which pertain to the test and the
disease prevalence in the importing country. If disease prevalence, D-SN and D-SP are known, a
formula for PV+ can be derived as follows:
PV+ = pt(pr x D-SN)/[pt(pr x D-SN) + pt(1-pr)(1-D-SP)], which simplifies to:
PV+ = (pr x D-SN)/[(pr x D-SN) + (1-pr)(1-D-SP)]
where pt = the population tested, pr = prevalence, D-SN = diagnostic sensitivity and D-SP =
diagnostic specificity.
A formula derived in a similar manner for PV- is:
PV- = D-SP(1-pr)/[D-SP(1-pr) + pr(1-D-SN)]
These formulae can be used to calculate any required PV value.
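The formulae translate directly into code. The sketch below reproduces the two worked PV+ examples above (prevalence 50% and 1% with D-SN = 0.95 and D-SP = 0.99).

```python
def pv_pos(pr, d_sn, d_sp):
    """Predictive value of a positive test at prevalence pr."""
    return (pr * d_sn) / (pr * d_sn + (1 - pr) * (1 - d_sp))

def pv_neg(pr, d_sn, d_sp):
    """Predictive value of a negative test at prevalence pr."""
    return (d_sp * (1 - pr)) / (d_sp * (1 - pr) + pr * (1 - d_sn))

print(round(pv_pos(0.50, 0.95, 0.99), 2))   # 0.99
print(round(pv_pos(0.01, 0.95, 0.99), 2))   # 0.49
```

Note that pv_pos(0, ...) is zero, matching the observation that in a disease-free population every positive test is a false positive.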
Table 2 shows PV+ values for tests with D-SN = 0.95 and D-SP = 0.99; D-SN = 0.90 and D-SP =
0.95; and D-SN = 0.80 and D-SP = 0.80. It can be seen that at high prevalence PV+ values are high,
and the point at which they begin to fall to unacceptable levels depends on the test sensitivity and
specificity. For example, for a high quality test (D-SN = 0.95 and D-SP = 0.99) PV+ only drops below
0.90 (90%) when the prevalence is less than 0.1 (10%). For a low quality test (D-SN = 0.80 and D-SP =
0.80), PV+ falls below 90% when the prevalence is below 0.60 (60%).

Table 2. Comparison of PV+ values at different levels of prevalence for tests with different D-SN and D-SP values.

  Prevalence   PV+ (D-SN = 0.95,   PV+ (D-SN = 0.90,   PV+ (D-SN = 0.80,
               D-SP = 0.99)        D-SP = 0.95)        D-SP = 0.80)
  0.90         0.9988              0.9937              0.9863
  0.80         0.9970              0.9860              0.9697
  0.70         0.9955              0.9763              0.9492
  0.60         0.9930              0.9636              0.9231
  0.50         0.9896              0.9434              0.8889
  0.40         0.9844              0.9216              0.8421
  0.30         0.9760              0.8832              0.7742
  0.20         0.9596              0.8152              0.6667
  0.10         0.9135              0.6623              0.4706
  0.05         0.8333              0.4815              0.2963
  0.01         0.4897              0.1513              0.0748
  0.001        0.0868              0.0174              0.0079
COMPARING TWO TESTS
Two tests can be compared by testing a panel of sera containing both positive and negative sera. The
overall agreement between the two tests and the relative (comparative) sensitivity and relative
(comparative) specificity of the tests can be calculated as shown for the data given in Table 3.

Table 3. Data from testing a panel of sera with two serological tests (A and B).

                           Test A
                 Positive   Negative   Total
  Test B
   Positive        225          32       257
   Negative         23       1,312     1,335
   Total           248       1,344     1,592
The overall agreement is the number of sera tested that have the same result in both tests (both
positive or both negative), divided by the total number of tests:
Overall agreement = (225 + 1,312)/1,592
= 0.965 (96.5%).
For calculating a relative sensitivity or specificity it must be assumed that one test is the gold standard
or reference point. For the purposes of the calculation this test is assumed to have perfect sensitivity
and specificity. If we take test A as the reference then there were 248 positive sera of which test B
only identified 225 as positive therefore the sensitivity of test B relative to test A is:
Relative sensitivity (B to A) = 225/248
= 0.907 (90.7%).
Similarly, according to test A there are 1,344 negative sera, and the specificity of test B relative to test
A is:
Relative specificity (B to A) = 1,312/1,344
= 0.976 (97.6%).
In the above example the overall agreement between the tests was reasonably high (0.965) and there
is no evidence that the tests are fundamentally different. Therefore, it could just as well be argued that
test B should be used as the reference test. In this case relative sensitivity and specificity become:
Relative sensitivity (A to B) = 225/257
= 0.875 (87.5%)
Relative specificity (A to B) = 1,312/1,335
= 0.983 (98.3%)
There are no defined rules for interpretation of relative test results. Overall agreement and relative
specificities and sensitivities calculated with both tests taken as reference should be considered when
comparing two tests.
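The agreement and relative-value calculations can be sketched as follows, using the counts from the two-test panel above; the variable names are my own.

```python
# Cross-classified results for two tests run on the same panel of sera
pos_pos = 225     # positive in both tests
neg_neg = 1312    # negative in both tests
a_pos_b_neg = 23  # positive in test A only
b_pos_a_neg = 32  # positive in test B only
total = pos_pos + neg_neg + a_pos_b_neg + b_pos_a_neg   # 1,592 sera

agreement  = (pos_pos + neg_neg) / total            # overall agreement
rel_sn_b_a = pos_pos / (pos_pos + a_pos_b_neg)      # sensitivity of B relative to A
rel_sp_b_a = neg_neg / (neg_neg + b_pos_a_neg)      # specificity of B relative to A

print(round(agreement, 3))    # 0.965
print(round(rel_sn_b_a, 3))   # 0.907
print(round(rel_sp_b_a, 3))   # 0.976
```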
PRECISION
The term precision encompasses both repeatability and reproducibility. Precision describes the ability
to repeatedly perform tests in a manner that consistently gives the same results. The characteristic
can be likened to the performance of a marksman. If the marksman places all his shots close together,
his precision is good even if they are not near the bull's-eye. Someone
shooting with good repeatability may score lower than a person who scatters shots around the target
and gets some in the bull's-eye. However, high precision is the preferable characteristic, because by
simply adjusting the rifle sights the shooter with high precision can place all shots near the bull's-eye
and get an excellent score.
REPEATABILITY
Within-run and between-run repeatability. Initially, when a test is set up, intraplate and interplate
repeatability within a single run should be measured by testing four replicates of each sample on each
of five plates. By repeating the same tests on each of five days, between-run repeatability is
established. Standards should be defined for the acceptable variation for the particular test. Generally
a maximum coefficient of variation of 20-30% would be acceptable.
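A coefficient of variation for replicate results can be computed as in this sketch; the replicate absorbance values are invented for illustration.

```python
import statistics

def cv_percent(replicates):
    """Coefficient of variation: sample standard deviation as a % of the mean."""
    return statistics.stdev(replicates) / statistics.mean(replicates) * 100

# Four hypothetical replicate ELISA absorbances for one sample
reps = [0.82, 0.79, 0.88, 0.85]
print(cv_percent(reps) <= 20)   # True: within a 20% acceptance limit
```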
For routine diagnostic testing standards are set for the on-going control of within run repeatability by
measuring the variation between duplicate or triplicate tests on each sample in a batch of tests done
on the same day by the same operator and with the same reagents for all samples. In some
laboratories an acceptable limit is set for test repeatability. Each serum is then tested in duplicate and
if duplicate results vary by more than the specified acceptable limit the serum in question is re-tested.
An acceptable limit may be set at one doubling dilution or a half of a doubling dilution for tests done
on doubling dilutions of sera. A fixed percentage of the test score may be used for tests that give a
quantitative measure of antibody content on a single serum dilution e.g. absorbance measurements in
an ELISA test. Because of the expense and time involved in doing duplicate testing some laboratories
abandon this practice after they have repeatedly shown that they are capable of testing with a high
level of repeatability. In this case a percentage or fixed number of the sera may be tested in duplicate
in each run, thus providing on-going repeatability data.
Between-run repeatability measures the repeatability of different runs of a test and, for routine
testing, is usually measured on the results of tests done on the control sera included with each batch
of tests. Records should be kept of the results of the positive control serum tests on each plate.
The laboratory should strive to improve the percentage of results falling within the defined allowable
deviation for the control serum. When it is able to regularly get a very high percentage of results
within the defined acceptable deviation limits the laboratory may choose to make the defined
deviation limits more stringent.
In addition to testing control sera, some laboratories randomly choose a percentage or a fixed number
of positive and negative sera from each test run and re-test them in the next batch of testing, or once
a week. This allows a good record of between-run repeatability to be built up to satisfy the
requirements for quality assurance.
REPRODUCIBILITY
Reproducibility is the precision of test results obtained in different laboratories on the same samples.
In practice this is usually measured by testing panels of sera in different laboratories in interlaboratory
comparative testing programmes. The results of these tests are usually independently analysed by
the organisation that is responsible for distributing the samples and unacceptable deviations are
discussed and steps taken to correct excessive variations or identified faults.
ACCURACY
Accuracy is the agreement between a test result on a sample of known antibody content or titre and
the expected result. For example, a serum may have been assayed against an international
reference serum and its antibody content precisely determined. This serum can then be used as a
control serum in every batch of sera tested and should be tested on every plate of sera tested. The
test is considered accurate if the results obtained on the control serum or sera fall within defined
acceptable limits of deviation. If the control serum result does not fall within these limits on a particular
plate, all sera on that plate should be re-tested. The records kept for the repeatability of testing the
positive control serum will also serve as a record of the accuracy of the testing. More extensive tests
of accuracy can be performed periodically, on panels of sera of known test titres.
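For titres read on doubling dilutions, an acceptance check such as the one-dilution limit mentioned under Repeatability can be sketched as below; the function name and limit are my own choices for illustration.

```python
import math

def within_limit(observed_titre, expected_titre, max_dilution_steps=1.0):
    """True if the observed titre is within the allowed number of doubling
    dilutions of the expected (control serum) titre."""
    steps = abs(math.log2(observed_titre / expected_titre))
    return steps <= max_dilution_steps

print(within_limit(160, 80))   # True: one doubling dilution out
print(within_limit(320, 80))   # False: two doubling dilutions out
```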
VALIDITY AND VALIDATION OF TEST METHODS
Validation of test methods is specified as a requirement in several frequently used
Quality Standards, but a precise definition of what is meant by validation is difficult to find. In normal
dictionary terms, validation of a system means to prove or confirm that the system is valuable or
worthwhile. In the widely used laboratory Standard ISO 17025, Requirements for the Competence of
Testing and Calibration Laboratories, it is stated that “The validation shall be as extensive as is
necessary to meet the needs of the given application or field of application. The laboratory shall
record the results obtained, and the procedure for the validation and a statement whether the method
is fit for the intended use”. The laboratory is allowed to specify how validation is done, thus allowing
for flexibility. In the case of serological tests validation procedures will be influenced by the great
differences in diseases, test methods and laboratory capabilities and resources. However, it is further
specified that: “The validation should provide information about the representativeness, repeatability
and reproducibility of the test and calibration method as well as the influence of instrumental, human
and environmental factors on the uncertainty of the results”.
In the chapter on Good Laboratory Practice, Quality Control and Quality Assurance in the OIE Manual
of Standards for Diagnostic Tests and Vaccines it is stated that “Validation further evaluates the test
for its fitness for a given use. Validation establishes performance standards for the method, such as
sensitivity, specificity, and isolation rate, and diagnostic parameters such as positive/negative cut-off,
titre of interest or significance, etc.”
A test is therefore validated when it has been investigated and data collected that defines the
performance capabilities of the test so that the suitability of the test for a particular use may be
judged. To validate a serological test, data should be collected to define the following parameters:
Analytical sensitivity. Comparative A-SN should be defined in relation to other standard
tests and national and international reference sera. Measurements of A-SN must be
periodically reviewed and repeated and quality control measures such as testing of reference
sera must be on-going.
Diagnostic sensitivity.
D-SN should be defined as well as possible with the available
material. When the material is insufficient to meet the OIE standards (see Section 1.2 of this
chapter), the data should be defined as well as possible and continuously upgraded. Where
good measurement of the parameters has been obtained periodic reviews and updating of
data should be undertaken.
Analytical specificity. Measurement of A-SP are not possible and should not form a part of
the validation procedure. However, known causes of non-specific reactions and any available
data on the incidence of non-specific reactions should be documented.
Diagnostic specificity. D-SP should be defined by testing the numbers of sera, from known
negative animals, defined by OIE (see Section 2,2 of this chapter).
•	Diagnostic cut-off point. The diagnostic threshold point for the test must be established. Factors affecting test performance must be defined. In the case of animal diseases this must include the factors relating to the pathogenesis of the disease, such as the incubation period, persistence of the disease after recovery, development of a carrier state, immune-tolerant carriers, etc.
•	Repeatability. Repeatability and reproducibility should be measured using the various accepted measures. However, these measurements are influenced as much by the performance of the laboratory staff as by the inherent characteristics of the test. For this reason there must be procedures for the continual measurement of these characteristics and of staff performance, and for the investigation of any abnormal deviations from accepted standards.
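As a minimal sketch of how the diagnostic parameters and repeatability described above might be computed, assuming entirely hypothetical counts and readings:

```python
# Sketch: D-SN, D-SP and a repeatability measure from hypothetical data.
# All counts and readings below are invented for illustration.

def d_sn(true_pos, false_neg):
    """Diagnostic sensitivity: proportion of known-infected sera testing positive."""
    return true_pos / (true_pos + false_neg)

def d_sp(true_neg, false_pos):
    """Diagnostic specificity: proportion of known-negative sera testing negative."""
    return true_neg / (true_neg + false_pos)

def repeatability_cv(replicates):
    """Coefficient of variation (%) of replicate readings on one serum."""
    n = len(replicates)
    mean = sum(replicates) / n
    var = sum((x - mean) ** 2 for x in replicates) / (n - 1)  # sample variance
    return 100 * var ** 0.5 / mean

# 95 of 100 known-infected sera test positive; 196 of 200 known-negative test negative
print(f"D-SN = {d_sn(95, 5):.2f}")    # 0.95
print(f"D-SP = {d_sp(196, 4):.2f}")   # 0.98
# Five replicate ELISA OD readings of the same control serum
print(f"repeatability CV = {repeatability_cv([0.82, 0.85, 0.80, 0.84, 0.83]):.1f}%")
```

The coefficient of variation is only one possible repeatability measure; as the text notes, any such figure reflects laboratory and staff performance as much as the test itself.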
Predictive value measurements are not part of the validation process: they depend not only on the primary characteristics of sensitivity and specificity but also on the prevalence of disease in the tested population, and taken in isolation they are meaningless.
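The prevalence dependence that excludes predictive values from validation can be illustrated numerically. In this sketch (figures hypothetical) the test's D-SN and D-SP are held fixed while only the prevalence changes:

```python
# Sketch: PV+ of one fixed test (D-SN = 0.95, D-SP = 0.98) at different
# disease prevalences. The test does not change; only the population does.

def pv_positive(sn, sp, prevalence):
    """PV+ = true positives / all test positives, by Bayes' rule."""
    tp = sn * prevalence              # fraction of population: infected and positive
    fp = (1 - sp) * (1 - prevalence)  # fraction: uninfected but positive
    return tp / (tp + fp)

for prev in (0.50, 0.10, 0.01):
    print(f"prevalence {prev:.0%}: PV+ = {pv_positive(0.95, 0.98, prev):.2f}")
```

At 50% prevalence PV+ is about 0.98, but at 1% prevalence it falls to about 0.32, showing why a predictive value says nothing about the test without knowledge of the prevalence.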
Validation data may be collected in a variety of ways, including:
•	Testing reference standards and panels of sera.
•	Comparison with other test methods.
•	Accumulated data from quality control procedures.
•	Interlaboratory comparative tests.
•	Epidemiological studies in the field, including surveys of disease-free and infected herds and flocks.
•	Data collected from experimental transmission and challenge studies.
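One of the methods above, accumulating quality control data, can be sketched as a simple control-limit check on a reference serum. The accepted mean, standard deviation and daily readings below are hypothetical:

```python
# Sketch: flagging abnormal deviations of a reference serum from its
# accepted mean +/- 2 SD control limits (a Levey-Jennings-style check).
# The mean, SD and readings are invented for illustration.

ACCEPTED_MEAN = 1.20  # established reading for the reference serum
ACCEPTED_SD = 0.05

def in_control(reading, mean=ACCEPTED_MEAN, sd=ACCEPTED_SD, k=2):
    """True if the reading lies within mean +/- k*SD."""
    return abs(reading - mean) <= k * sd

daily_readings = [1.18, 1.22, 1.25, 1.35, 1.19]
for day, reading in enumerate(daily_readings, start=1):
    status = "ok" if in_control(reading) else "INVESTIGATE"
    print(f"day {day}: {reading:.2f} {status}")
```

A reading outside the limits (day 4 above) would trigger the investigation of abnormal deviations that the validation requirements call for.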
Whatever methods are used, they must be adequately documented. The data must be analysed, continuously updated, and retained for future reference and auditing. Comprehensive validation of tests is not always possible, particularly with respect to D-SN. Where this is the case, the minimum requirement should be the establishment of traceability to a standard method.
In ISO 17025 it is stated that “The range and accuracy of the values obtainable from validated
methods (e.g. the uncertainty of the results, detection limit, selectivity of the method, linearity, limit of
repeatability and/or reproducibility, robustness against external influences and/or cross sensitivity
against interference from the matrix of the sample/test object) as assessed for the intended use shall
be relevant to the customer needs” and “The laboratory shall use test and/or calibration methods,
including methods for sampling, which meet the needs of the client and which are appropriate for the
tests and/or calibrations it undertakes; preferably those published as regional or national standards”.
There is no specific requirement for an internationally or regionally standardised test to be validated, but it must be suitable for the client’s requirements. The assumption is presumably that any test that is a nationally or internationally recognised standard has already been validated. This is contrary to the widely held belief that all tests must be validated by the laboratory using them. A laboratory should nevertheless demonstrate that it regularly performs the tests accurately, precisely and with appropriate quality controls showing that the test is performing as specified in the standard. In the case of serological tests it can be assumed that all tests that the OIE has nominated as prescribed or alternative tests for international trade are international standards and therefore do not need to be fully re-validated. The OIE Manual of Standards for Diagnostic Tests and Vaccines is the appropriate international standard for a large part of the work done in serology laboratories, particularly those dealing with import and export testing.
In ISO 17025 it is clearly specified that non-standardised methods (meaning methods that are not
defined in appropriate standards) must be validated.
Validation does not necessarily imply suitability for a particular purpose, but should provide information from which suitability for use can be judged. The emphasis is laid on choosing a test that suits the client’s needs. Where maximum specificity is required, even at the expense of sensitivity, a test with high specificity can be selected. Non-validated or partly validated tests can also be used and may in some instances be the tests of choice for a particular purpose. However, when a non-validated test is used, the client should be informed of this and given appropriate information about the test and why it was chosen. Where the client chooses the test, the laboratory is not obliged to offer any information about the test, but should still do so if relevant information is available.