Efficient source data verification in randomized trials
Marc Buyse
IDDI, Louvain-la-Neuve, and I-BioStat, Hasselt University, Belgium
University of Pennsylvania Annual Conference on Statistical Issues in Clinical Trials, April 13, 2011

Outline
1. Trials as a cost-effective, sustainable activity
2. Scientific vs. regulatory requirements
3. The continuum from errors to fraud
4. Monitoring strategies
   – Extensive monitoring
   – Reduced monitoring
   – Targeted monitoring
5. The SMART project
6. Conclusions

Potential reductions in clinical trial costs
Assumptions:
• Treatment of a chronic disease
• 20,000 patients
• 1,000 sites
• 48 months: enrollment (24) + follow-up (24)
• 24 visits per site (every other month)
• 60-page CRF
• Site payment of $10,000 per patient
Total budget, in millions of $: 421
• Coordinating center: 170 (40%)
• Site payments: 200 (48%)
• Other costs (travel, meetings, etc.): 51 (12%)
Ref: Eisenstein et al, Clinical Trials 2008;5:75.

Potential reductions in clinical trial costs (Ref: Eisenstein et al, Clinical Trials 2008;5:75)
Scenario 1 (current practice):
• 4 months planning
• 24 months accrual
• 1,000 sites
• 24 site visits
• 60-page CRF
• $10,000 site payment per patient
Scenario 2 (reduced):
• 4 months planning
• 18 months accrual
• 750 sites
• 4 site visits
• 20-page CRF + EDC
• $5,000 site payment per patient
Scenario 3 (minimal):
• 4 months planning
• 18 months accrual
• 100 sites
• no site visits
• 5-page CRF + EDC
• $650 site payment per patient

Scientific vs. regulatory requirements for a clinical trial
From a scientific point of view, a trial must estimate the effect of a treatment without bias. Randomized trials enable such unbiased inference even in the presence of massive random errors, which only cause conservatism (in tests for superiority).
From a regulatory point of view, a trial must provide verifiable evidence that it was carried out according to specifications. The absence of errors must be demonstrated regardless of their consequences.
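As a minimal simulation sketch of that scientific point (not in the original slides; all parameters are illustrative and unrelated to the cost model above): purely random, non-differential measurement error leaves the randomized treatment comparison unbiased, but it dilutes power.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, true_effect, n_sims = 200, 0.5, 2000  # illustrative trial size and effect

def simulate(noise_sd):
    """Simulate randomized trials whose outcomes carry random measurement error."""
    estimates, rejections = [], []
    for _ in range(n_sims):
        arm = rng.integers(0, 2, n)                        # 1:1 randomization
        outcome = true_effect * arm + rng.normal(0, 1, n)  # true outcomes
        measured = outcome + rng.normal(0, noise_sd, n)    # random recording error
        estimates.append(measured[arm == 1].mean() - measured[arm == 0].mean())
        rejections.append(
            stats.ttest_ind(measured[arm == 1], measured[arm == 0]).pvalue < 0.05
        )
    return np.mean(estimates), np.mean(rejections)

for sd in (0.0, 1.0, 2.0):
    est, power = simulate(sd)
    # The mean estimate stays near 0.5 at every noise level; only power drops.
    print(f"noise sd={sd}: mean estimate={est:.3f}, power={power:.2f}")
```

The estimate remains centered on the true effect regardless of the noise level; the only casualty is power, which is exactly the conservatism referred to above.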
The continuum from errors to fraud

Errors
• Typical example: poorly calibrated equipment
• Intent: wholly unintentional
• Impact: potential (small) loss in power / no bias
• Ease of detection: difficult to detect

Sloppiness
• Typical example: data missing or incorrectly copied from source documents
• Intent: limited awareness
• Impact: potential (small) loss in power / no bias
• Ease of detection: may be hard to detect

Fraud
• Typical example: data fabricated to avoid missing data or to create patients
• Intent: deliberate
• Impact: unknown effect on power / no bias
• Ease of detection: detectable through center comparisons

Treatment-related fraud
• Typical example: data fabricated or falsified to favor a treatment
• Intent: definite "intention to cheat"
• Impact: definite bias
• Ease of detection: detectable through treatment-by-center comparisons

Monitoring strategies
Extensive monitoring
• 100% SDV for primary and key secondary outcomes
Reduced monitoring
• Random sampling of centers / patients / outcomes to ensure a rate of errors < x%
• Risk-adapted monitoring
Targeted monitoring
• Monitoring based on Key Risk Indicators
• Statistical monitoring

Extensive monitoring
"(...) trial management procedures ensuring validity and reliability of the results are vastly more important than absence of clerical errors. Yet, it is clerical inconsistencies referred to as 'errors' that are chased by the growing GCP-departments."
Ref: Lörstad, ISCB-27, Geneva, August 28-31, 2006.

"Monitoring confirms consistency between data collection forms and source documents; if the source documents are wrong because of laboratory, clinical, or clerical errors, then monitoring adds expense without benefit. A common misinterpretation of sponsors is that GCP requires audits of 100% of data; by contrast, random audits might suffice."
Ref: Glickman et al, NEJM 2009;360:816.
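To make "random audits might suffice" concrete, here is a back-of-the-envelope sketch (not from the deck) of how small such an audit can be under simple binomial sampling. If zero errors are found in n randomly sampled records, the true error rate is below x with the stated confidence; at 95% this is roughly the classical "rule of three", n ≈ 3/x. The function name is illustrative.

```python
import math

def audit_sample_size(max_error_rate, confidence=0.95):
    """Smallest n such that observing zero errors in n randomly sampled
    records implies, at the given confidence, that the true error rate
    is below max_error_rate (exact binomial calculation)."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_error_rate))

for x in (0.05, 0.02, 0.01):
    print(f"to show error rate < {x:.0%}: audit {audit_sample_size(x)} records")
# → 59, 149, and 299 records, respectively
```

Auditing a few hundred randomly chosen records thus bounds the error rate at the 1% level, which is orders of magnitude cheaper than 100% source data verification in a 20,000-patient trial.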
Reduced monitoring
[Diagram: random sampling across the hierarchy of countries, centers, patients, visits, and items]

Risk-adapted monitoring
• Risk A – negligible risk (non-invasive procedures)
• Risk B – risk similar to that of usual care (trials involving approved drugs)
• Risk C – high risk (phase III trials of new agents, new indications, or at-risk populations)
• Risk D – very high risk (phase I or II trials of new agents)

OPTIMON: OPTimisation of MONitoring for clinical research studies
• Centers accruing > 5 patients in several trials
• Trials stratified by risk group: A, B, C
• Randomized comparison: control arm ("pharma" standards) vs. experimental arm (fewer visits / checks)
• Goal: non-inferiority in the proportion of patients with at least one severe error in informed consent, suspected unexpected serious adverse event reports, major eligibility criteria, or the primary endpoint (expected proportion of error-free patients: 95%, with a non-inferiority margin of 5%)
Source: Geneviève Chêne, University Teaching Hospital, Bordeaux, France
https://ssl2.isped.u-bordeaux2.fr/optimon/Documents.aspx

Targeted monitoring based on Key Risk Indicators
[Diagram: Data Management and the Monitoring Team use Key Risk Indicators to target monitoring within the hierarchy of countries, centers, patients, visits, and items]

Examples of "Key Risk Indicators"
Study conduct:
• Actual accrual vs. target
• % patients with protocol violations
• % dropouts
• …
Treatment compliance:
• % dose reductions
• % dose delays
• Reasons for treatment stops
• …
Safety:
• AE rate
• Grade 3/4 AE rate
• SAE rate
• …
Data management:
• Overdue forms
• Query rate
• Query resolution time
• …

Targeted monitoring based on statistical monitoring
[Diagram: SMART screens the accumulating data centrally so that Data Management and the Monitoring Team can target monitoring within the hierarchy of countries, centers, patients, visits, and items]

Targeted monitoring (…)
Ref: Baigent et al, Clinical Trials 2008;5:49.

Principles behind statistical checks
• Plausible data are hard to fabricate → check plausibility (e.g., mean, variance, correlation structure, outliers, inliers, dates)
• Humans are poor random number generators → check randomness (e.g., Benford's law for the first digit, digit preference)
• Clinical trial data are highly structured → check comparability (e.g., between centers, between treatment arms)
Ref: Encyclopaedic Companion to Medical Statistics (Everitt B, Palmer C, Eds.), Arnold Publishers Ltd, London, 2010.
Two illustrative sketches of such checks follow.
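First, a randomness check (a minimal sketch, not from the deck): observed first digits are tested against Benford's law with a chi-squared goodness-of-fit test. The function name and simulated data are illustrative.

```python
import numpy as np
from scipy import stats

def benford_first_digit_test(values):
    """Chi-squared test of observed first significant digits against
    Benford's law; fabricated numbers often deviate from it."""
    digits = [int(f"{abs(v):.6e}"[0]) for v in values if v != 0]
    observed = np.bincount(digits, minlength=10)[1:]              # digits 1..9
    expected = np.log10(1 + 1 / np.arange(1, 10)) * len(digits)   # Benford proportions
    return stats.chisquare(observed, expected).pvalue

rng = np.random.default_rng(1)
genuine = rng.lognormal(mean=3, sigma=1, size=500)  # spans orders of magnitude
fabricated = rng.uniform(40, 90, size=500)          # a human-like narrow range
print(f"genuine data:    p = {benford_first_digit_test(genuine):.3f}")
print(f"fabricated data: p = {benford_first_digit_test(fabricated):.3f}")
```

Benford's law only applies to data spanning several orders of magnitude, so for narrow-range physiological measurements a terminal-digit preference check along the same lines is usually the more appropriate randomness test.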
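Second, a comparability check (also a sketch, not from the deck) of the center-versus-rest kind that the "brute-force approach" below describes: each center is compared with all other centers on each variable, and a meta-statistic over the resulting p-values flags outlying centers. The simulated data, the choice of a Kolmogorov-Smirnov test, and the median-p meta-statistic are all illustrative assumptions.

```python
import numpy as np
import pandas as pd
from scipy import stats

def center_scan(df, variables, center_col="center"):
    """Compare each center with all other centers on each variable
    (two-sample Kolmogorov-Smirnov test); return one p-value per pair."""
    rows = []
    for var in variables:
        for ctr, grp in df.groupby(center_col):
            rest = df.loc[df[center_col] != ctr, var].dropna()
            p = stats.ks_2samp(grp[var].dropna(), rest).pvalue
            rows.append({"center": ctr, "variable": var, "p": p})
    return pd.DataFrame(rows)

# Simulated trial: 20 centers, one of which reports suspiciously regular data
rng = np.random.default_rng(7)
df = pd.DataFrame({
    "center": np.repeat(range(20), 40),
    "sysbp": rng.normal(130, 15, 800),
    "hr": rng.normal(72, 10, 800),
})
mask = df["center"] == 13
df.loc[mask, ["sysbp", "hr"]] = rng.normal([130, 72], 1.0, (40, 2))  # variance too low

scan = center_scan(df, ["sysbp", "hr"])
# Meta-statistic: rank centers by their median p-value across all variables;
# under comparability p-values are roughly uniform, so a center whose
# p-values cluster near zero stands out for on-site review.
print(scan.groupby("center")["p"].median().sort_values().head())
```

Note that the test needs no clinical knowledge of the variables involved, which is what makes the brute-force, fully automated application described below possible.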
SMART: Statistical Monitoring Applied to Randomized Trials
Software that systematically performs a large battery of statistical tests on the values of all variables collected in a clinical trial. These tests generate a large number of p-values, ranks, and other statistics that are kept in a database for checks of randomness, plausibility, and comparability.

Brute-force approach
• In multicentric trials, the distribution of every variable can be compared between each center and all other centers (as in the scan sketched above)
• These tests can be applied automatically, without regard to meaning or plausibility
• They yield a very large number of center-specific statistics
• Meta-statistics can be applied to these statistics to identify outlying centers

An example
• Trial in depression
• Two stages:
  – an open-label run-in treatment stage
  – a double-blind randomized treatment stage
• 800 patients from 70 centers

Exemplary findings: heart rate / blood pressure
• To be taken at each visit, in two positions (supine / standing)
• Variability suspiciously low for several centers
• A "strange" patient:

VISIT  POS  HR  SYSBP  DIABP
  1     1   72   115    75
  1     2   70   110    70
  2     1   72   115    75
  2     2   70   110    70
  3     1   70   110    75
  3     2   70   110    70
  4     1   72   110    75
  4     2   70   105    70
  5     1   74   115    75
  5     2   72   110    70
 ...   ...  ...   ...    ...

• Is it worth asking for inessential, tedious measurements?

Exemplary findings: baseline MADRS score
• A MADRS score (the sum of scores on 10 questions) < 12 was needed to enter the randomized stage
• Half of the patients were expected to have a score < 12 after the run-in period
• In reality, 67% had a score < 12
• "Strange" centers:
  – Center A: 5 8 5 4 7 8 9 4 6 5 7 5 4 3
  – Center B: 11 11 11 11 10 11 11 11 11

Conclusions
• Current clinical research practices (such as intensive monitoring and 100% source data verification) are not useful, effective, or sustainable
• A statistical approach to quality assurance could yield huge cost savings and yet increase the reliability of trial results
• Regulatory requirements should evolve accordingly