Serology: Test Performance Measurement

Author: Dr RW Worthington. Licensed under a Creative Commons Attribution license.

TABLE OF CONTENTS

Sensitivity
  Analytical sensitivity (A-SN)
  Diagnostic sensitivity (D-SN)
Specificity
  Analytical specificity (A-SP)
  Diagnostic specificity (D-SP)
Determination of D-SN and D-SP
Specificity and sensitivity requirements for a test
Predictive value (PV) of a test
  Predictive value of a positive test (PV+)
  Predictive value of a negative test (PV-)
Comparing two tests
Precision
Repeatability
Reproducibility
Accuracy
Validity and validation of test methods

Serological tests are often described by those using them as good, accurate, useful, unreliable, difficult to perform, etc. These loose descriptions indicate that there are differences in the accuracy and usefulness of different tests that require precise definition and quantitation. In this chapter the essential terms used to describe test performance, and the measurement of the necessary parameters, are described. The primary indicators of test performance are sensitivity and specificity.

SENSITIVITY

There are two types of sensitivity: analytical sensitivity (A-SN) and diagnostic sensitivity (D-SN).

Analytical sensitivity (A-SN)

A-SN is a measurement of the amount of antibody a test can detect. It can theoretically be defined in absolute terms, such as: the test can detect X g of antibody. However, there are practical difficulties that make absolute measurements complicated or even impossible. Polyclonal sera contain a variety of antibodies that differ in antibody class, molecular mass, affinity for the antigen and specificity for particular epitopes on the antigen. The quantities of the various types of antibodies may vary from serum to serum. Therefore, the amount of antibody, in terms of absolute weight, that a test can detect will vary from serum to serum.
A-SN is more usually used as a comparative measure, e.g. test A is three times more sensitive than test B. In practice, running the two tests on a panel of test sera and comparing the results will give a comparison of their relative sensitivity. Alternatively, the A-SN can be defined simply by comparing it to a single international or national reference serum. In this case the A-SN for a test would be defined in terms of the minimum number of international units of antibody the test can detect. Comparative A-SN estimations are all that are necessary for the verification of test procedures used for diagnostic testing.

Diagnostic sensitivity (D-SN)

D-SN is the ability of a test to detect infected animals in a population. It is the percentage of infected animals that the test detects in a population of known infected animals. Although this is a simple and useful concept, the accurate measurement of D-SN is often difficult. To measure D-SN it is necessary to have a suitable number of sera from known infected animals. In practice it may be difficult to find suitable sera, because it is not permissible to use another serological test as the criterion for identifying positive animals. If this is done, no test can exceed the sensitivity of the test used to select the positive samples. To compare the diagnostic sensitivity of two tests, a random sample of positive and negative sera from a population of infected animals must be used. In this sample there will be some sera in which antibody can be detected by only one test or the other, so that a comparison of the two tests is possible. However, the true infectious status of each animal may never be precisely known.

Where it is possible to assemble a panel of sera from proven infected animals, a good measure of D-SN may be obtained.
For example, it is possible to test animals by culturing semen samples to identify those infected with Brucella ovis, and all animals from which B. ovis can be isolated are proven infected animals. However, even a panel of sera from semen-culture-positive animals will not be ideal, because culture of semen is an insensitive test of infection and the panel will not include sera from infected animals that are not excreting the organism in their semen. If transiently infected animals that are not excreting organisms in their semen are excluded, the population tested will not represent the true distribution of reactions in infected animals and will be biased, as transiently infected animals tend to have low titres. Due to the peculiarities of the pathogenesis and nature of different infectious diseases, and the circumstances that pertain to field populations, difficulties in assembling an ideal panel of samples can probably be cited for most diseases. Although many different approaches can be used to assemble the best possible collection of sera, the expense and difficulties involved often result in the use of sample panels that are smaller than ideal. Susceptible animals may also be infected experimentally to create a sample of known infected animals for testing. However, this artificial situation may not yield a panel of sera with a distribution of titres similar to those found in naturally infected populations. Furthermore, the high cost of this type of experiment usually leads to the use of small numbers of animals. The high cost of assembling a reliably verified set of sera results in many estimates being made on less than ideal numbers of sera. The OIE recommends that there should be 300 sera from known positives for making a preliminary estimate of D-SN and 1,000 for a precise estimate.
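Why the larger panel gives a more precise estimate can be illustrated with the normal approximation to the binomial confidence interval. This is only a sketch; the D-SN value of 0.95 is a hypothetical estimate, and the two panel sizes are the OIE figures quoted above:

```python
import math

def ci_half_width(p_hat, n, z=1.96):
    """Approximate 95% confidence half-width for a proportion (e.g. an
    estimated D-SN) based on n samples, using the normal approximation."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# A hypothetical D-SN estimate of 0.95 from panels of the two OIE sizes:
preliminary = ci_half_width(0.95, 300)   # roughly +/- 0.025
precise = ci_half_width(0.95, 1000)      # roughly +/- 0.014
```

The larger panel roughly halves the uncertainty of the estimate, which is why the recommendation for a precise estimate is so much larger than that for a preliminary one.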
These numbers are difficult to achieve, and estimates made on the basis of non-ideal sample numbers often provide the best available information about a particular test. It is necessary to be aware of the limitations and possible sources of error in published data and to make a judgement about how useful the information is.

SPECIFICITY

There are two types of specificity: analytical specificity (A-SP) and diagnostic specificity (D-SP).

Analytical specificity (A-SP)

A-SP is a measure of the ability of a test to avoid cross-reacting with antibodies formed against antigens closely related to the antigen used in the test. In practice it can only be estimated by measuring the amount of cross-reactivity that a test gives with sera from animals that have been infected with diseases which may stimulate the development of cross-reactive antibodies. Experiments can be done to measure the comparative A-SP of two tests against a panel of potentially cross-reacting sera. However, the causes of non-specific reactions are often not known, and suitable sera for testing this property are not readily available and would vary from one situation to another. A rigorous definition of how A-SP should be measured is not available, and any such definition would of necessity vary from test to test. Until these difficulties are resolved, laboratories are likely to measure this characteristic only sporadically and in a variety of different ways, and only when they have identified a particular problem that causes cross-reactions.

Diagnostic specificity (D-SP)

D-SP relates to the diagnostic accuracy of negative test results. The D-SP of a test is defined as the proportion (percentage) of sera from non-infected animals that give negative test results. It is often easier to measure than D-SN, as populations of non-infected animals, from which suitable test samples may be drawn, are available for many diseases.
Where diseases are absent from a country, from large areas of a country, or from individual properties, suitable sera can be obtained in large numbers from these sources. However, in some cases where a disease is widespread there may be difficulties in obtaining sera from verified non-infected animals. Infectious bovine rhinotracheitis (IBR) is an example of a disease that is endemic and infects a high proportion of animals in many countries, making it difficult to find sera from known uninfected animals. The OIE recommendations for the validation of tests are that 1,000 sera should be used to make a preliminary estimate of D-SP and 5,000 for a precise estimate. Problems with the specificity of a test may be due to a technical problem with the test, such as a complement fixation test in which the complement or some other reagent has been incorrectly titrated. Correcting the test methodology or technical faults, or using more suitable reagents, may solve such problems. However, low D-SP may also be due to problems in the animal population from which the samples of negative sera were drawn. Poor diagnostic specificity may be related to the prevalence of non-specifically sensitised animals in the population. For example, if there is a high prevalence of Yersinia enterocolitica O9, an organism with a surface antigen similar to that of Brucella abortus, there may be a number of animals in a brucellosis-free population that react positively to serological tests for Brucella abortus. Measurement of D-SP should alert the user to the possibility of both technical problems with the test method and non-specific sensitisation problems in the animal population.

DETERMINATION OF D-SN AND D-SP

The most difficult part of determining the D-SN and D-SP of a test is the gathering of a suitable panel of sera from known positive and negative animals.
These sera should be aliquoted, randomised, numbered with suitable code numbers and given to two or three operators for testing. Each operator should be given duplicates of each serum. Operators should be unaware of the identity or status of any serum aliquots, and their results should be decoded and collated by the person who aliquoted and distributed the samples. Any sera for which duplicate results do not meet the standard set for repeatability should be re-tested. The results are presented in a two-by-two contingency table and the D-SN and D-SP calculated as shown in Table 1.

Table 1. Calculation of D-SN and D-SP from data presented in a two-by-two contingency table

                 Infected animals   Uninfected animals   Totals
Positive test          945                   24             969
Negative test           68                2,983           3,051
Totals               1,013                3,007           4,020

D-SN = infected animals with a positive test / total infected animals = 945/1,013 = 0.933 (93.3%)

D-SP = uninfected animals with a negative test / total uninfected animals = 2,983/3,007 = 0.992 (99.2%)

SPECIFICITY AND SENSITIVITY REQUIREMENTS FOR A TEST

Most tests cannot provide perfect D-SN and D-SP, and the two properties are often inversely related. The requirements vary in different situations and for different tests. In some cases maximum sensitivity is required and specificity is of lesser importance; in other situations the requirements may be reversed. D-SN may be increased by making adjustments to the test procedure that increase the A-SN, or by altering the interpretation criteria, decreasing the cut-off point that divides positive and negative results. However, increasing A-SN should be done with caution, because it may not result in a meaningful increase in D-SN and may decrease the specificity of the test. Conversely, decreasing the A-SN of a test may sometimes make little difference to the D-SN, while increasing the specificity and the robustness of the test.
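The D-SN and D-SP calculations shown in Table 1 reduce to two simple ratios; a minimal Python sketch using the counts from that table:

```python
def diagnostic_performance(tp, fn, fp, tn):
    """D-SN and D-SP from a two-by-two contingency table.

    tp: infected, test positive      fn: infected, test negative
    fp: uninfected, test positive    tn: uninfected, test negative
    """
    d_sn = tp / (tp + fn)  # proportion of infected animals detected
    d_sp = tn / (tn + fp)  # proportion of uninfected animals testing negative
    return d_sn, d_sp

# Counts from Table 1:
d_sn, d_sp = diagnostic_performance(tp=945, fn=68, fp=24, tn=2983)
# d_sn is about 0.933 (93.3%) and d_sp about 0.992 (99.2%)
```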
The term robust describes the hard-to-define property relating to the ease with which a test can be performed while giving repeatable and reliable results in a practical working situation. It is therefore recommended that the A-SN and the interpretation criteria of a test should not be tampered with without carefully measuring the effects this has on the diagnostic performance of the test. Maximum D-SN is usually required where it is important to discover every possible positive animal and the cost of wrongly identifying negative animals as positive is of comparatively minor importance. In a national eradication campaign, such as a bovine brucellosis eradication programme, it is important to identify the maximum number of infected animals as rapidly as possible. If infected animals are missed this can lead to further spread of the disease, and the costs of further testing far outweigh those associated with the unnecessary culling of a few uninfected reactors. Similarly, purchasers of animals wishing to keep their herds or countries free from a disease require a test of maximal D-SN to detect all infected animals being purchased or imported, and have little interest in how many animals are rejected because of false positive reactions. For example, some countries, having eradicated bovine brucellosis, will not import any animal having 30 IU of Brucella abortus agglutinating antibody in its serum. An agglutination test with this level of sensitivity will identify increased numbers of non-specific reactors and probably does not meaningfully improve the D-SN. On the other hand, most sellers of animals are primarily concerned with specificity, as they do not wish to lose sales or have their herds classified as infected because of non-specific reactions, which they often regard as being caused by testing laboratory incompetence.
Serological tests that have D-SN values greater than 95% and D-SP values greater than 99% are good tests for use in most circumstances. It should be remembered that D-SP and D-SN can be adjusted by altering the interpretation cut-off levels of the test. However, the error rate of the test, which depends on both D-SP and D-SN, is more complex to adjust; this problem is discussed in Chapter 2. An example of a well-validated, high performance test is the complement fixation test for Brucella ovis. However, it is not necessary for a test to meet these high standards to be useful. If one is trying to determine whether a disease is present in a country it is important to have a test of high specificity, to avoid making false positive diagnoses that would compromise the country's disease-free status. Even if the D-SN is low, the test results will often be valid if large enough numbers of animals are tested. For example, it can be deduced from suitable statistical tables that, to have 99% confidence that a sample will include at least one infected animal when the prevalence of the disease is 0.5%, a sample of 919 animals from a large population is required. By testing ten times this number there would probably be about 45 infected animals in the sample, and a 99% probability of having at least 10. A high specificity test, even with a sensitivity as low as 50%, should easily detect positives in this sample. Another approach would be to use a high sensitivity test with low specificity as a screening test. A definitive test with high specificity, such as culture of the infectious agent, could then be used to re-test the positives detected by the serological test.

PREDICTIVE VALUE (PV) OF A TEST

A predictive value may be given for a positive or a negative test result. Predictive value is sometimes referred to as diagnosability, and is sometimes wrongly called validity.
Predictive value of a positive test (PV+)

PV+ is the proportion (percentage) of animals reacting positively to the test that are infected. The PV+ can be calculated from the data in Table 1 as follows:

PV+ = infected positive reactors / (non-infected test positives + infected positive reactors) = 945/969 = 0.975 (97.5%)

For any population of animals this parameter is directly related to the specificity of the test. If the specificity is low, there will be comparatively large numbers of reactors that are not infected, leading to a low PV+. PV+ will also vary greatly in different populations of animals depending on the disease prevalence in the population. This can best be demonstrated by means of an example. Consider a test with D-SN = 0.95 (95%) and D-SP = 0.99 (99%) used to test 1,000,000 animals in which the disease prevalence is 0.5 (50%). Half the animals tested (500,000) will be infected, and multiplying this number by D-SN (0.95) gives 475,000 infected animals with positive tests. Multiplying the 500,000 uninfected animals by (1 - D-SP) = 0.01 gives 5,000 false positive tests. Therefore the total number of positive reactors is 475,000 + 5,000 and:

PV+ = 475,000/(475,000 + 5,000) = 0.99 (99%)

If the same test is used on a million animals in which the disease prevalence is 0.01 (1%), there will be 10,000 infected animals, of which 10,000 x 0.95 = 9,500 are test positive. There will be 990,000 uninfected animals, among which there will be 0.01 x 990,000 = 9,900 false positive tests, giving a total of 9,500 + 9,900 reactors and:

PV+ = 9,500/(9,500 + 9,900) = 0.49 (49%)

It also follows that if the population of animals tested is free from the disease the PV+ is zero, because every positive test is a false positive.

Predictive value of a negative test (PV-)

PV- is the proportion (percentage) of the animals that react negatively to the test that are not infected.
The PV- can be calculated from the data in Table 1 as follows:

PV- = uninfected negative animals / (test-negative infected animals + uninfected negative animals)

PV- = 2,983/3,051

PV- = 0.978 (97.8%)

PV- is directly related to the D-SN of the test, because if the D-SN is low there will be large numbers of test-negative infected animals, which will result in a low PV-. PV- is also affected by the disease prevalence, and calculations similar to those made above for PV+ can be made to demonstrate the relationship. Theoretically, when there is 100% prevalence of a disease the PV- will be zero, but this is unlikely to occur in practice. Those interested in calculating the risk of importing a disease when importing animals need to know the PV for the particular circumstances that pertain to the test and to the disease prevalence in the population from which the animals are drawn. If disease prevalence, D-SN and D-SP are known, a formula for PV+ can be derived as follows:

PV+ = (pt x pr x D-SN) / [(pt x pr x D-SN) + pt(1 - pr)(1 - D-SP)]

which simplifies to:

PV+ = (pr x D-SN) / [(pr x D-SN) + (1 - pr)(1 - D-SP)]

where pt = the population tested, pr = prevalence, D-SN = diagnostic sensitivity and D-SP = diagnostic specificity. A formula derived in a similar manner for PV- is:

PV- = D-SP(1 - pr) / [D-SP(1 - pr) + pr(1 - D-SN)]

These formulae can be used to calculate any required PV value. The table below shows PV+ values for tests with D-SN = 0.95 and D-SP = 0.99; D-SN = 0.90 and D-SP = 0.95; and D-SN = 0.80 and D-SP = 0.90. It can be seen that at high prevalence PV+ values are high, and the point at which they begin to fall to unacceptable levels depends on the test sensitivity and specificity. For example, for a high quality test (D-SN = 0.95 and D-SP = 0.99) PV+ only drops below 0.90 (90%) when the prevalence is less than 0.1 (10%).
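The PV+ and PV- formulae translate directly into code; a minimal sketch that reproduces the worked examples above (PV+ of about 99% at 50% prevalence and about 49% at 1% prevalence, for a test with D-SN = 0.95 and D-SP = 0.99):

```python
def pv_pos(pr, d_sn, d_sp):
    """Predictive value of a positive test at prevalence pr."""
    return (pr * d_sn) / (pr * d_sn + (1 - pr) * (1 - d_sp))

def pv_neg(pr, d_sn, d_sp):
    """Predictive value of a negative test at prevalence pr."""
    return (d_sp * (1 - pr)) / (d_sp * (1 - pr) + pr * (1 - d_sn))

high = pv_pos(0.50, 0.95, 0.99)  # about 0.99 at 50% prevalence
low = pv_pos(0.01, 0.95, 0.99)   # about 0.49 at 1% prevalence
```

Sweeping pr over a range of prevalences reproduces columns of PV+ values like those tabulated below; note that pv_pos(0.0, ...) is zero, since every positive in a disease-free population is a false positive.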
For a low quality test (D-SN = 0.80 and D-SP = 0.90), PV+ falls below 90% when the prevalence is below 0.60 (60%).

Table 2. Comparison of PV+ values at different levels of prevalence for tests with different D-SN and D-SP values

Prevalence   D-SN = 0.95   D-SN = 0.90   D-SN = 0.80
             D-SP = 0.99   D-SP = 0.95   D-SP = 0.90
             PV+           PV+           PV+
0.90         0.9988        0.9937        0.9863
0.80         0.9970        0.9860        0.9697
0.70         0.9955        0.9763        0.9492
0.60         0.9930        0.9636        0.9231
0.50         0.9896        0.9434        0.8889
0.40         0.9844        0.9216        0.8421
0.30         0.9760        0.8832        0.7742
0.20         0.9596        0.8152        0.6667
0.10         0.9135        0.6623        0.4706
0.05         0.8333        0.4815        0.2963
0.01         0.4897        0.1513        0.0748
0.001        0.0868        0.0174        0.0079

COMPARING TWO TESTS

Two tests can be compared by testing a panel containing both positive and negative sera. The overall agreement between two tests, and the relative (comparative) sensitivity and relative (comparative) specificity of the tests, can be calculated as shown for the data given in Table 3.

Table 3. Data from testing a panel of sera with two serological tests (A and B)

                     Test A
Test B       Positive   Negative   Total
Positive       225          32       257
Negative        23       1,312     1,335
Total          248       1,344     1,592

The overall agreement is the number of sera that have the same result in both tests (both positive or both negative), divided by the total number of sera tested:

Overall agreement = (225 + 1,312)/1,592 = 0.965 (96.5%)

For calculating a relative sensitivity or specificity it must be assumed that one test is the gold standard or reference. For the purposes of the calculation this test is assumed to have perfect sensitivity and specificity. If we take test A as the reference, then there were 248 positive sera, of which test B identified only 225 as positive; therefore the sensitivity of test B relative to test A is:

Relative sensitivity (B to A) = 225/248 = 0.907 (90.7%)
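The agreement and relative sensitivity/specificity calculations for two tests can be sketched as follows, using the counts from the worked example and taking test A as the reference:

```python
def compare_tests(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Overall agreement, plus the sensitivity and specificity of test B
    relative to test A (test A taken as the reference test)."""
    total = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    agreement = (both_pos + both_neg) / total
    rel_sn = both_pos / (both_pos + a_pos_b_neg)  # of A's positives, B also positive
    rel_sp = both_neg / (both_neg + a_neg_b_pos)  # of A's negatives, B also negative
    return agreement, rel_sn, rel_sp

# Counts from the worked example:
agreement, rel_sn, rel_sp = compare_tests(
    both_pos=225, a_pos_b_neg=23, a_neg_b_pos=32, both_neg=1312)
# agreement ~ 0.965, rel_sn ~ 0.907, rel_sp ~ 0.976
```

Taking test B as the reference instead simply means swapping the two discordant counts in the call.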
Similarly, according to test A there are 1,344 negative sera, and the specificity of test B relative to test A is:

Relative specificity (B to A) = 1,312/1,344 = 0.976 (97.6%)

In the above example the overall agreement between the tests was reasonably high (0.965) and there is no evidence that the tests are fundamentally different. Therefore it could just as well be argued that test B should be used as the reference test. In this case the relative sensitivity and specificity become:

Relative sensitivity (A to B) = 225/257 = 0.875 (87.5%)

Relative specificity (A to B) = 1,312/1,335 = 0.983 (98.3%)

There are no defined rules for the interpretation of relative test results. Overall agreement, and the relative sensitivities and specificities calculated with each test in turn taken as reference, should all be considered when comparing two tests.

PRECISION

The term precision encompasses both repeatability and reproducibility. Precision describes the ability to repeatedly perform tests in a manner that consistently gives the same results. The characteristic can be likened to the performance of a marksman. If the marksman places all his shots in close proximity to each other his precision is good, even if they are not near the bull's-eye. Someone shooting with good repeatability may score lower than a person who scatters shots around the target and gets some in the bull's-eye. However, high precision is the preferable characteristic, because by simply adjusting the rifle sights the shooter with high precision can place all shots near the bull's-eye and get an excellent score.

REPEATABILITY

Initially, when a test is set up, intraplate and interplate repeatability within a single run should be measured by testing four replicates of each sample on each of five plates. Between-run repeatability is established by repeating the same tests on each of five days.
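Within-run variation of the kind described above is commonly summarised as a percentage coefficient of variation (CV); a minimal sketch, using hypothetical ELISA absorbance replicates:

```python
import statistics

def coefficient_of_variation(replicates):
    """Percentage coefficient of variation of replicate test results."""
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)  # sample standard deviation
    return 100 * sd / mean

# Hypothetical absorbance readings for one serum tested four times on a plate:
cv = coefficient_of_variation([0.81, 0.78, 0.85, 0.80])
# cv is about 3.6% here
```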
Standards should be defined for the acceptable variation for the particular test. Generally a maximum coefficient of variation of 20-30% would be acceptable. For routine diagnostic testing, standards are set for the ongoing control of within-run repeatability by measuring the variation between duplicate or triplicate tests on each sample in a batch of tests done on the same day, by the same operator and with the same reagents for all samples. In some laboratories an acceptable limit is set for test repeatability. Each serum is then tested in duplicate, and if the duplicate results vary by more than the specified acceptable limit the serum in question is re-tested. An acceptable limit may be set at one doubling dilution, or half a doubling dilution, for tests done on doubling dilutions of sera. A fixed percentage of the test score may be used for tests that give a quantitative measure of antibody content on a single serum dilution, e.g. absorbance measurements in an ELISA. Because of the expense and time involved in duplicate testing, some laboratories abandon this practice after they have repeatedly shown that they are capable of testing with a high level of repeatability. In this case a percentage or a fixed number of the sera may be tested in duplicate in each run, thus providing ongoing repeatability data.

Between-run repeatability measures the repeatability of different runs of a test and, for routine testing, is usually measured on the results of tests done on the control sera included with each batch of tests. Records should be kept of the results of the positive control serum tests on each plate. The laboratory should strive to improve the percentage of results falling within the defined allowable deviation for the control serum.
When it is able to regularly achieve a very high percentage of results within the defined acceptable deviation limits, the laboratory may choose to make the deviation limits more stringent. In addition to testing control sera, some laboratories randomly choose a percentage or a fixed number of positive and negative sera from each test run and re-test them in the next batch of testing, or once a week. This allows a good record of between-run repeatability to be built up to satisfy the requirements for quality assurance.

REPRODUCIBILITY

Reproducibility is the precision of test results obtained in different laboratories on the same samples. In practice this is usually measured by testing panels of sera in different laboratories in interlaboratory comparative testing programmes. The results of these tests are usually independently analysed by the organisation responsible for distributing the samples; unacceptable deviations are discussed and steps taken to correct excessive variations or identified faults.

ACCURACY

Accuracy is the agreement between the test result obtained on a sample of known antibody content or titre and the expected result. For example, a serum may have been assayed against an international reference serum and its antibody content precisely determined. This serum can then be used as a control serum in every batch of sera tested, and should be tested on every plate. The test is considered accurate if the results obtained on the control serum or sera fall within defined acceptable limits of deviation. If the control serum result does not fall within these limits on a particular plate, all sera on that plate should be re-tested. The records kept for the repeatability of testing the positive control serum will also serve as a record of the accuracy of the testing. More extensive tests of accuracy can be performed periodically on panels of sera of known test titres.
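The control-serum acceptance rule described above can be sketched as a small check. This is illustrative only: the function name is hypothetical, and the tolerance of one doubling dilution is an assumed limit, not a prescribed standard:

```python
import math

def plate_accepted(control_titre, expected_titre, max_doubling_dilutions=1.0):
    """Accept a plate if the control serum titre (expressed as the reciprocal
    of the dilution, e.g. 160 for a 1:160 titre) is within the allowed number
    of doubling dilutions of its expected value."""
    deviation = abs(math.log2(control_titre / expected_titre))
    return deviation <= max_doubling_dilutions

plate_accepted(160, 320)  # one doubling dilution out: plate accepted
plate_accepted(40, 320)   # three doubling dilutions out: re-test the plate
```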
VALIDITY AND VALIDATION OF TEST METHODS

Validation of test methods is specified as a requirement in several frequently used quality standards, but a precise definition of what is meant by validation is difficult to find. In normal dictionary terms, to validate a system means to prove or confirm that it is valuable or worthwhile. The widely used laboratory standard ISO 17025, General Requirements for the Competence of Testing and Calibration Laboratories, states that "The validation shall be as extensive as is necessary to meet the needs of the given application or field of application. The laboratory shall record the results obtained, the procedure used for the validation, and a statement as to whether the method is fit for the intended use". The laboratory is allowed to specify how validation is done, thus allowing for flexibility. In the case of serological tests, validation procedures will be influenced by the great differences in diseases, test methods, and laboratory capabilities and resources. However, it is further specified that: "The validation should provide information about the representativeness, repeatability and reproducibility of the test and calibration method as well as the influence of instrumental, human and environmental factors on the uncertainty of the results". In the chapter on Good Laboratory Practice, Quality Control and Quality Assurance in the OIE Manual of Standards for Diagnostic Tests and Vaccines it is stated that "Validation further evaluates the test for its fitness for a given use.
Validation establishes performance standards for the method, such as sensitivity, specificity, and isolation rate, and diagnostic parameters such as positive/negative cut-off, titre of interest or significance, etc." A test is therefore validated when it has been investigated and data collected that define the performance capabilities of the test, so that the suitability of the test for a particular use may be judged. To validate a serological test, data should be collected to define the following parameters:

Analytical sensitivity. Comparative A-SN should be defined in relation to other standard tests and to national and international reference sera. Measurements of A-SN must be periodically reviewed and repeated, and quality control measures such as the testing of reference sera must be ongoing.

Diagnostic sensitivity. D-SN should be defined as well as possible with the available material. When the material is insufficient to meet the OIE standards (see Section 1.2 of this chapter), the data should be defined as well as possible and continuously upgraded. Where good measurements of the parameters have been obtained, periodic reviews and updating of data should be undertaken.

Analytical specificity. Rigorous measurement of A-SP is not possible and should not form part of the validation procedure. However, known causes of non-specific reactions, and any available data on the incidence of non-specific reactions, should be documented.

Diagnostic specificity. D-SP should be defined by testing the numbers of sera from known negative animals defined by OIE (see Section 2.2 of this chapter).

Diagnostic cut-off point. The diagnostic threshold points for the test must be established.

Factors affecting the test performance must be defined.
In the case of animal diseases this definition must include the factors relating to the pathogenesis of the disease, such as the incubation period, persistence of the disease after recovery, development of a carrier state, immune-tolerant carriers, etc.

Repeatability. Repeatability should be assessed in terms of the various measures of repeatability and reproducibility. However, these measurements are influenced as much by the performance of the laboratory staff as by the inherent characteristics of the test. For this reason there must be procedures for the continual measurement of these characteristics and of staff performance, and for the investigation of any abnormal deviations from accepted standards.

Predictive value measurements are not part of the validation process, as they depend on the primary characteristics of sensitivity and specificity together with the prevalence of disease, and are in themselves meaningless.

Validation data may be collected in a variety of ways, including:

- Testing reference standards and panels of sera.
- Comparison with other test methods.
- Accumulated data from quality control procedures.
- Interlaboratory comparative tests.
- Epidemiological studies in the field, including surveys of disease-free and infected herds and flocks.
- Data collected from experimental transmission and challenge studies.

Whatever methods are used must be adequately documented. The data must be analysed, continuously updated, and retained for future reference and auditing.

Comprehensive validation of tests is not always possible, particularly with respect to D-SN. Where this is the case, the minimum requirement should be the establishment of traceability to a standard method. In ISO 17025 it is stated that “The range and accuracy of the values obtainable from validated methods (e.g.
the uncertainty of the results, detection limit, selectivity of the method, linearity, limit of repeatability and/or reproducibility, robustness against external influences and/or cross-sensitivity against interference from the matrix of the sample/test object) as assessed for the intended use shall be relevant to the customers’ needs” and “The laboratory shall use test and/or calibration methods, including methods for sampling, which meet the needs of the client and which are appropriate for the tests and/or calibrations it undertakes; preferably those published as international, regional or national standards”.

There is no specific requirement for an international or regional standard test to be validated, but it must be suitable for the client’s requirements. The assumption is presumably that any test that is a nationally or internationally recognised standard has already been validated. This is contrary to the widely held belief that a laboratory must itself validate every test it uses. However, a laboratory should demonstrate that it regularly performs its tests accurately and precisely, with appropriate quality controls showing that each test is performing as specified in the standard.

In the case of serological tests it can be assumed that all tests that the OIE has nominated as prescribed or alternative tests for international trade are international standards and therefore do not need to be fully re-validated. The OIE Manual of Standards for Diagnostic Tests and Vaccines is the appropriate international standard for a large part of the work done in serology laboratories, particularly those dealing with import and export testing.

In ISO 17025 it is clearly specified that non-standardised methods (methods that are not defined in appropriate standards) must be validated. Validation does not necessarily imply suitability for a particular purpose, but should provide information from which suitability for use can be judged.
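As a minimal sketch of the kind of data from which such a judgement is made, the D-SN and D-SP of a test can be estimated from the results obtained on panels of reference sera. The counts and function name below are invented for illustration only, not taken from any real validation study:

```python
# Sketch: estimating D-SN and D-SP from a hypothetical validation panel.
# All counts below are illustrative assumptions, not real data.

def diagnostic_performance(true_pos, false_neg, true_neg, false_pos):
    """Return (D-SN, D-SP) from test results on reference sera."""
    d_sn = true_pos / (true_pos + false_neg)  # fraction of known-infected sera detected
    d_sp = true_neg / (true_neg + false_pos)  # fraction of known-uninfected sera cleared
    return d_sn, d_sp

# Example: 300 known-infected sera of which 285 test positive, and
# 1000 known-uninfected sera of which 980 test negative.
d_sn, d_sp = diagnostic_performance(285, 15, 980, 20)
print(f"D-SN = {d_sn:.3f}, D-SP = {d_sp:.3f}")  # D-SN = 0.950, D-SP = 0.980
```

Whether such figures make the test fit for a given purpose then depends on the intended use, as the standard requires.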
The emphasis is laid on choosing a test that suits the client’s needs. Where maximum specificity is required, even at the expense of sensitivity, a test with high specificity can be selected. Non-validated or partly validated tests can also be used, and may in some instances be the tests of choice for a particular purpose. However, when a non-validated test is used the client should be informed that the test is non-validated and given appropriate information about the test and the reasons why it was chosen. Where the client chooses the test, the laboratory is not obliged to offer information about the test, but should still do so if relevant information is available.
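The continual measurement of repeatability described earlier in this section can likewise be sketched numerically. One common approach is to express the scatter of replicate readings of a control serum as a coefficient of variation (CV); the OD values below are invented for illustration:

```python
# Sketch: within-run repeatability of a control serum, expressed as the
# coefficient of variation (CV) of replicate readings. The OD values are
# hypothetical.
import statistics

def coefficient_of_variation(readings):
    """CV (%) = 100 * sample standard deviation / mean of replicate results."""
    return 100 * statistics.stdev(readings) / statistics.mean(readings)

replicate_od = [0.52, 0.55, 0.53, 0.54, 0.51, 0.55]
cv = coefficient_of_variation(replicate_od)
print(f"Within-run CV = {cv:.1f}%")  # Within-run CV = 3.1%
```

Tracking such a CV run by run gives the laboratory an objective basis for detecting abnormal deviations in test or staff performance.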