A Statistical Approach to Method Validation and Out of Specification Data

Outline of talk
• Basic statistics
– Averaging, confidence intervals
• Fitness-for-purpose and analytical capability.
• Quantifying variability and producing a capable method.
• Out-of-specification results.
• Conclusions.

Repeat measurements
994.765    996.8626   1000.665   1017.53    981.7084   998.3029
1003.802   998.3409   1002.779   1007.732   1008.048   1008.842
995.1794   1004.904   1002.433   1013.802   1008.136   998.0636
1004.67    1006.48    992.7641   988.0834   1002.151   1011.441
1005.991   993.7479   996.3199   997.8086   1005.854   997.1728
999.4718   1004.641   1002.325   996.136    1000.387   1005.081
[Figure: histogram of the repeat measurements; occurrences against value (980 to 1020)]

Distribution of measurements
[Figure: probability against measurement (960 to 1040), marking the average, the standard deviation, and the 95% confidence interval with 2.5% in each tail]
The 95% confidence interval is the range of values around the mean in which 95% of the measurements are expected to lie.

Relative standard deviation, RSD
$$\mathrm{RSD}(\%) = 100 \times \frac{s}{\bar{x}}$$
For a strength of ~100%, a 0.7% RSD equates to a standard deviation of ~0.7%. The range of values encompassing about 99% of all possible measurements is therefore approximately +/- 2.1% (three standard deviations).
0.7% RSD at 100% strength has a 99% confidence interval of 97.9% to 102.1%.
• The standard deviation is a measure of variability.
• The effect of variability can be reduced by taking the average of a number of repeat measures.
• The standard deviation associated with the mean of n measures is the standard error of the mean:
$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

Effect of averaging
[Figure: effect of averaging, plotted against n (0 to 8)]

Distribution of the mean
[Figure: probability distributions of the mean for n = 1, 2, 3 and 4, centred on the true value]
The confidence in the mean improves as the number of measurements increases.

How many measurements should I average?
• Depends upon:
– The amount of variability present in the measurements.
– The degree of confidence I wish to achieve.

WHAT IS FITNESS FOR PURPOSE?

Capability of an analytical method
[Figure: two panels, "Incapable method" and "Capable method"; probability against concentration (90 to 110), with the lower and upper specification limits marked]

How to measure capability?
Use measures from statistical process control:
$$c_p = \frac{\mathrm{USL} - \mathrm{LSL}}{\mathrm{C.I.}}$$
e.g., for a specification between 97 mg/l and 103 mg/l and a confidence interval width of 12 mg/l:
$$c_p = \frac{103 - 97}{12} = 0.5$$
[Figure: probability against concentration (90 to 110), with the lower and upper specification limits and the confidence interval marked]

Interpreting cp
Batch failure rate purely due to variability in analytical method.
[Figure: batch failure rate due to analysis / % (0 to 50) against cp (0.2 to 2)]

One-sided specifications
$$c_{p,l} = \frac{\bar{x} - \mathrm{LSL}}{\tfrac{1}{2}\,\mathrm{C.I.}} \qquad c_{p,u} = \frac{\mathrm{USL} - \bar{x}}{\tfrac{1}{2}\,\mathrm{C.I.}}$$
where $\bar{x}$ is the expected average value of the parameter.
[Figure: probability against strength (90 to 110), with the lower specification limit, the expected value and the half confidence interval marked]

Method development/validation
• Determine the number of repeat measurements needed to ensure that the analytical capability is acceptable, for example cp > 1.
• Acceptance criteria are then product dependent, rather than technique specific.
• How do I determine the amount of variability?
• How do I determine the number of repeat measurements required?

Quantifying variability (e.g. HPLC)
• Need to assess two sources of variability (repeatability):
– Between “weighings”
– Instrumental.
• Between weighings quantifies variability due to sample inhomogeneity and the sample preparation process.
• Instrumental quantifies the variability associated with the instrumental measurement.
[Figure: experimental design; sample, weighings, measures]
Quantify a source of variability by determining its standard deviation.
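One way to separate the two sources of variability in a balanced design is a one-way random-effects ANOVA. The sketch below is an illustration rather than the speaker's own calculation: the function name `variance_components` is hypothetical, and the readings are the duplicate measures on six weighings from the example in this talk.

```python
import math

# Duplicate readings (measure 1, measure 2) for six weighings,
# taken from the example data in this talk.
readings = [
    (975.20, 971.41),
    (928.77, 934.27),
    (992.30, 1035.73),
    (1047.96, 1069.98),
    (1036.10, 1064.50),
    (1109.29, 1074.81),
]


def variance_components(groups):
    """Balanced one-way random-effects ANOVA (hypothetical helper).

    Returns (s_instrument, s_weighing): the within-weighing and
    between-weighing standard deviations.
    """
    k = len(groups)        # number of weighings
    n = len(groups[0])     # measures per weighing
    if any(len(g) != n for g in groups):
        raise ValueError("design must be balanced")

    means = [sum(g) / n for g in groups]
    grand_mean = sum(means) / k

    # Within-weighing mean square estimates the instrumental variance.
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    ) / (k * (n - 1))

    # Between-weighing mean square estimates n * var_weighing + var_instrument.
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)

    # Clamp at zero in case sampling noise makes the estimate negative.
    var_weighing = max((ms_between - ms_within) / n, 0.0)
    return math.sqrt(ms_within), math.sqrt(var_weighing)


s, s_w = variance_components(readings)
print(f"instrument s = {s:.1f}, weighing s_w = {s_w:.1f}")
```

Clamping the between-weighing variance at zero is a design choice for cases where the instrumental noise dominates; other treatments of negative estimates are possible.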
Example
Weighing       1         2         3         4         5         6
Measure 1    975.20    928.77    992.30   1047.96   1036.10   1109.29
Measure 2    971.41    934.27   1035.73   1069.98   1064.50   1074.81
Can use Analysis of Variance (ANOVA) to determine:
Standard deviation for “weighing”, s_w = 57.9
Standard deviation for instrument, s = 19.2
These values refer to the measured response (e.g. weight-corrected area).

Confidence interval for analysis
The confidence interval for a future number of weighings (n_1) and measurements per weighing (n) is given by:
$$\delta = t_{\alpha/2,\,N}\,\sqrt{\frac{1}{n_1}\left(s_w^2 + \frac{s^2}{n}\right)}$$
$\alpha$: degree of confidence (usually 0.05 for 95% confidence)
N: number of degrees of freedom used to determine s_w and s.
t: Student's t-value for $\alpha$ and N.
$\delta$: confidence interval for measurement (area)

Analytical Capability
                             Number of weighings
Measures per weighing      1       2       3       4       5
1                        0.574   0.812   0.994   1.148   1.283
2                        0.589   0.833   1.020   1.177   1.316
3                        0.594   0.840   1.029   1.188   1.328
4                        0.596   0.844   1.033   1.193   1.334
5                        0.598   0.846   1.036   1.196   1.337
The analytical capability, cp, changes with n_1 and n.

External Standards
$$\hat{S} = \frac{\bar{x}}{\bar{x}_s}\,S_s$$
$S_s$: strength of external standard
$\bar{x}_s$: average measure for external standard
$\bar{x}$: average measure for sample
$\hat{S}$: estimated strength for sample
If $\delta$ is the confidence interval for $\bar{x}$ and $\bar{x}_s$, then the confidence interval for $\hat{S}$ is approximately $\sqrt{2}\,\delta$, i.e. if $\bar{x}$ has an RSD of 0.7%, the RSD for the estimated strength is ~1.0%.

Practical consequences: finding a result Out-of-Specification
[Figure: probability of OOS result (0 to 1) against true strength (95 to 100) for 1 to 5 measures, with the consumer's risk and producer's risk regions marked. Lower spec: 97%; RSD of injection: 0.7%; no weighing variability]

Dealing with OOS results
• Can re-test samples.
• On re-testing, FDA guidelines for industry state “if no …errors are identified in the first test, there is no scientific basis for invalidating OOS results in favour of passing re-test results.”
• Scientifically, the issue of whether the re-test results “pass” or “fail” is of little consequence.
The issue is whether the re-test results are statistically the same as the original OOS result.
• Can use the t-test to assess the similarity between OOS and re-test.

Example 1
• Specification >97.0%
• OOS result 96.5% with confidence interval +/- 2.1%.
• Re-test 97.7% with confidence interval +/- 2.1%.
• The t-test gives no evidence that the OOS and re-test results are different.
• Averaging the OOS and re-test results gives 97.1% with confidence interval +/- 1.5%.
[Figure: probability distributions of the OOS and re-test results against strength (90 to 100), with the LSL marked]

Example 2
• Specification >97.0%
• OOS 96.0% with confidence interval +/- 0.9%.
• Re-test 98.0% with confidence interval +/- 0.9%.
• No evidence that the OOS and re-test are the same.
• Cannot average the OOS and re-test result.
• Consequently must doubt both results.
[Figure: probability distributions of the OOS and re-test results against strength (90 to 100), with the LSL marked]

Conclusions
• Understanding and determining the confidence interval associated with an analytical result is an important part of method development/validation.
• The relationship between the confidence interval and the product specification is an important aspect of defining method fitness-for-purpose.
• The analytical capability is a quantifiable measure of fitness-for-purpose for precision.
• Understanding the confidence interval is important during out-of-specification investigations.
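The comparison of an OOS result with its re-test in Examples 1 and 2 can be sketched as follows. This is a simplified stand-in for the t-test mentioned in the talk, not the speaker's own procedure. It assumes the two results are independent and that both confidence intervals are half-widths quoted at the same confidence level, so the half-width for the difference is the root sum of squares. The function names `results_differ` and `average_results` are hypothetical.

```python
import math


def results_differ(first, second, ci_first, ci_second):
    """Return True if two results are statistically distinguishable.

    Assumption: ci_first and ci_second are confidence-interval
    half-widths at the same confidence level, and the results are
    independent, so the half-width for their difference is
    sqrt(ci_first**2 + ci_second**2).
    """
    ci_difference = math.sqrt(ci_first ** 2 + ci_second ** 2)
    return abs(first - second) > ci_difference


def average_results(first, second, ci):
    """Average two results that share the same half-width ci.

    Averaging two independent results shrinks the half-width
    by a factor of sqrt(2).
    """
    return (first + second) / 2, ci / math.sqrt(2)


# Example 1 from the talk: OOS 96.5%, re-test 97.7%, both +/- 2.1%.
if not results_differ(96.5, 97.7, 2.1, 2.1):
    mean, ci = average_results(96.5, 97.7, 2.1)
    print(f"Example 1: average {mean:.1f}% +/- {ci:.1f}%")

# Example 2 from the talk: OOS 96.0%, re-test 98.0%, both +/- 0.9%.
if results_differ(96.0, 98.0, 0.9, 0.9):
    print("Example 2: results differ, so they should not be averaged")
```

Under these assumptions the sketch reaches the same decisions as the two examples: the results in Example 1 can be averaged, and those in Example 2 cannot.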