J. Heyse - MCP Conference 2015

Use of the False Discovery Rate
for Evaluating
Clinical Safety Data
Joseph F. Heyse
Devan V. Mehrotra
Clinical Biostatistics – Vaccines
Merck Research Laboratories
Blue Bell, PA
Third International Conference on Multiple Comparisons
Bethesda, MD
August 6, 2002
Acknowledgment
This research was in collaboration with
the late Professor John Tukey
(Princeton University).
Heyse/MCP2002 bl 2
Outline
• Motivating example
• Multiplicity issues
• FWER and FDR
• Proposal for flagging AEs
• Summary of three examples
• Concluding remarks
Introduction
• Evaluation of safety is an important part of clinical trials of pharmaceutical and biological products.
• Adverse experiences (AEs) can be categorized into three types:
  – Tier 1: Associated with specific hypotheses
  – Tier 2: Set encountered as part of trial safety evaluation
  – Tier 3: Rare spontaneous reports of serious events that require clinical evaluation
• Our interest is primarily in Tier 2.
ICH Recommendations
• ICH-E9 recommends descriptive statistical methods supplemented by confidence intervals.
• p-values are useful to evaluate a specific difference of interest.
• If hypothesis tests are used, statistical adjustments for multiplicity to quantify the Type I error are appropriate, but the Type II error is usually of more concern.
• p-values are sometimes useful as a “flagging” device applied to a large number of safety variables to highlight differences worthy of further attention.
Illustration
Multiplicity in Safety Assessment
• A clinical trial compared the safety and immunogenicity of the combination vaccine COMVAX™* to its monovalent components.
• One of 92 safety comparisons revealed a higher rate of unusual high-pitched crying (UHPC) following the second dose of a three-dose series (6.7% vs. 2.3%, p=0.016).
• No medical rationale for this finding was discovered, and a larger hypothesis-driven study was designed.
• Comparable rates were observed following vaccination in this larger trial.
*COMVAX™ is a combination of HIB and HB vaccines.
Motivating Example
(MMRV* Vaccine)
• Safety and immunogenicity vaccine trial.
• Study population: healthy toddlers, 12-18 months of age
• Group 1 = MMRV + PedvaxHIB on Day 0
• Group 2 = MMR + PedvaxHIB on Day 0, followed by (optional) varicella vaccine on Day 42
*MMRV is a combination measles, mumps, rubella, varicella vaccine
Motivating Example (cont’d)
• Safety follow-up (local and systemic reactions)
  Group 1: Day 0-42 (N=148)
  Group 2: Day 0-42 (N=148) and Day 42-84 (N=132)
• Question: Is the safety profile different if the varicella component is given as part of a combination vaccine on Day 0 compared with giving it 6 weeks later as a monovalent vaccine?
• AEs: Group 1 (Day 0-42) vs. Group 2 (Day 42-84)
Clinical AE Counts (“Tier 2” AEs)
X1 = count in Grp 1 (N1=148); X2 = count in Grp 2 (N2=132)

 #  BS  ADVERSE EXPERIENCE               X1      X2  DIFF (%)   p-value
 1  01  ASTHENIA / FATIGUE               57      40       8.2     .1673
 2  01  FEVER                            34      26       3.3     .5606
 3  01  INFECTION, FUNGAL                 2       0       1.4     .4998
 4  01  INFECTION, VIRAL                  3       1       1.3     .6248
 5  01  MALAISE                          27      20       3.1     .5248
 6  03  ANOREXIA                          7       2       3.2     .1791
 7  03  CANDIDIASIS, ORAL                 2       0       1.4     .4998
 8  03  CONSTIPATION                      2       0       1.4     .4998
 9  03  DIARRHEA                         24      10       8.6     .0289*
10  03  GASTROENTERITIS, INFECTIOUS       3       1       1.3     .6248
11  03  NAUSEA                            2       7      -4.0     .0889
12  03  VOMITING                         19      19      -1.6     .7295
13  05  LYMPHADENOPATHY                   3       2       0.5    1.0000
Clinical AE Counts (“Tier 2” AEs) - cont’d
X1 = count in Grp 1 (N1=148); X2 = count in Grp 2 (N2=132)

 #  BS  ADVERSE EXPERIENCE               X1      X2  DIFF (%)   p-value
14  06  DEHYDRATION                       0       2      -1.5     .2214
15  08  CRYING                            2       0       1.4     .4998
16  08  INSOMNIA                          2       2      -0.2    1.0000
17  08  IRRITABILITY                     75      43      18.1     .0025*
18  09  BRONCHITIS                        4       1       1.9     .3746
19  09  CONGESTION, NASAL                 4       2       1.2     .6872
20  09  CONGESTION, RESPIRATORY           1       2      -0.8     .6033
21  09  COUGH                            13       8       2.7     .4969
22  09  INFECTION, RESPIRATORY, UPPER    28      20       3.8     .4308
23  09  LARYNGOTRACHEOBRONCHITIS          2       1       0.6    1.0000
24  09  PHARYNGITIS                      13       8       2.7     .4969
25  09  RHINORRHEA                       15      14      -0.5    1.0000
26  09  SINUSITIS                         3       1       1.3     .6248
Clinical AE Counts (“Tier 2” AEs) - cont’d
X1 = count in Grp 1 (N1=148); X2 = count in Grp 2 (N2=132)

 #  BS  ADVERSE EXPERIENCE               X1      X2  DIFF (%)   p-value
27  09  TONSILLITIS                       2       1       0.6    1.0000
28  09  WHEEZING                          3       1       1.3     .6248
29  10  BITE/STING, NON-VENOMOUS          4       0       2.7     .1248
30  10  ECZEMA                            2       0       1.4     .4998
31  10  PRURITUS                          2       1       0.6    1.0000
32  10  RASH                             13       3       6.5     .0209*
33  10  RASH, DIAPER                      6       2       2.5     .2885
34  10  RASH, MEASLES/RUBELLA-LIKE        8       1       4.6     .0388*
35  10  RASH, VARICELLA-LIKE              4       2       1.2     .6872
36  10  URTICARIA                         0       2      -1.5     .2214
37  10  VIRAL EXANTHEMA                   1       2      -0.8     .6033
38  11  CONJUNCTIVITIS                    0       2      -1.5     .2214
39  11  OTITIS MEDIA                     18      14       1.6     .7109
40  11  OTORRHEA                          2       1       0.6    1.0000
Multiplicity Issues - The Problem
• Potential for too many false positive safety findings if the multiplicity problem is ignored (for “Tier 2” AEs).
• This can muddy the interpretation of the safety profile of the vaccine/drug.
Multiplicity Issues - The Challenge
To develop a procedure for tackling multiplicity that:
• Provides a proper balance between “no adjustment” and “too much adjustment”.
• Is easy to automate/implement.
Familywise Error Rate (FWER)
• Let $F = \{H_1, H_2, \ldots, H_m\}$ denote a family of m hypotheses.
• FWER = Pr(any true $H_i \in F$ is rejected).
• We usually seek methods for which FWER $\le \alpha$.
• Benjamini & Hochberg (1995) argue that, in certain settings, requiring control of the FWER is too conservative. They suggest controlling the “false discovery rate” instead, as a more powerful alternative.
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289-300.
False Discovery Rate (FDR)
(Benjamini & Hochberg)

                 Declared        Declared
                 Insignificant   Significant   Total
# of true Hi     U               V             m0
# of false Hi    T               S             m - m0
Total            m - R           R             m

$\mathrm{FDR} = E\left(\frac{V}{R}\right)$ = expected proportion of rejected null hypotheses which are incorrectly rejected. Define 0/0 as 0.
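False Discovery Rate (FDR) (cont’d)
(Benjamini & Hochberg)
Let $p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}$ be the ordered p-values and $H_{(j)}$ the hypothesis corresponding to $p_{(j)}$.
Reject $H_{(1)}, H_{(2)}, \ldots, H_{(j)}$ if $p_{(j)} \le \frac{j}{m}\alpha$ (taking the largest such j)
{This controls FDR at $\frac{m_0}{m}\alpha$}
Adjusted p-values:
$\tilde{p}_{(m)} = p_{(m)}$
$\tilde{p}_{(j)} = \min\left\{\tilde{p}_{(j+1)},\ \frac{m}{j}\,p_{(j)}\right\}, \quad j \le m - 1$
Example
  Unadjusted p-values      .0193   .0280   .2038   .4941
  FDR-adjusted p-values    .0560   .0560   .2718   .4941
• FDR $\le$ FWER {equality holds if m = m0}.
• Effect of correlations on FDR is an area of research.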
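The step-down recursion for the adjusted p-values can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' software; the function name `bh_adjust` is an assumption made for this example. Run on the four rounded p-values from the example, it gives .0560, .0560, .2717, .4941; the third value differs from the slide's .2718 in the last digit, presumably because the slide was computed from unrounded p-values.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, applying
    # p~(j) = min(p~(j+1), (m / j) * p(j)); for j = m this gives p(m).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


unadjusted = [0.0193, 0.0280, 0.2038, 0.4941]
adjusted = bh_adjust(unadjusted)
print([round(p, 4) for p in adjusted])
```

A hypothesis is rejected at level $\alpha$ exactly when its adjusted p-value is at most $\alpha$, so the adjusted values can be compared directly with the chosen FDR level.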
Proposal for Flagging AEs
• We routinely summarize AEs by body system (BS).
  s body systems (i = 1, 2, …, s)
  $k_i$ AEs associated with body system i
  $p_{ij}$ = between-group p-value for the jth AE within the ith BS (e.g., based on two-tailed Fisher’s exact test)
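As an illustration of how a single $p_{ij}$ might be obtained, the sketch below computes a two-tailed Fisher exact p-value from the two counts and group sizes, using only the Python standard library. The function names are assumptions made for this example, and the two-sided convention used (summing all tables no more likely than the observed one) is one common choice. For the counts 2 vs. 0 and 0 vs. 2 with N1=148 and N2=132 it gives about .4998 and .2214, matching the tabulated p-values for those count patterns.

```python
from math import comb


def fisher_two_sided(x1, n1, x2, n2):
    """Two-sided Fisher exact p-value for x1/n1 versus x2/n2."""
    total = x1 + x2
    denom = comb(n1 + n2, total)

    def prob(k):
        # Hypergeometric probability of k events in group 1,
        # conditional on `total` events overall.
        return comb(n1, k) * comb(n2, total - k) / denom

    lo, hi = max(0, total - n2), min(n1, total)
    p_obs = prob(x1)
    # Sum the probabilities of all tables no more likely than the observed
    # one (small relative tolerance to absorb floating-point ties).
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def diff_percent(x1, n1, x2, n2):
    """Incidence difference, Group 1 minus Group 2, in percentage points."""
    return 100.0 * (x1 / n1 - x2 / n2)


print(round(fisher_two_sided(2, 148, 0, 132), 4))  # e.g. INFECTION, FUNGAL
print(round(fisher_two_sided(0, 148, 2, 132), 4))  # e.g. DEHYDRATION
print(round(diff_percent(24, 148, 10, 132), 1))    # e.g. DIARRHEA
```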
Proposal for Flagging AEs (cont’d)
• Step 1
  Ignore AEs for which the total incidence is so low that a rejection even at the unadjusted 0.05 level is impossible.
• Step 2
  Among the remaining AEs, flag those for which the p-value achieves statistical significance after adjusting for multiplicity using a “Double FDR” approach.
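One way to make Step 1 concrete, assuming the between-group test is the two-tailed Fisher exact test, is to compute the smallest p-value that a given total event count could ever produce. The helper names below are hypothetical. With group sizes of 148 and 132, this sketch indicates that an AE needs at least 4 events in total before an unadjusted p < .05 is attainable.

```python
from math import comb


def min_attainable_p(total, n1, n2):
    """Smallest two-sided Fisher exact p-value possible with `total` events."""
    denom = comb(n1 + n2, total)
    lo, hi = max(0, total - n2), min(n1, total)
    # The hypergeometric distribution is unimodal, so the least likely
    # table is one of the two extreme splits of the events.
    p_lo = comb(n1, lo) * comb(n2, total - lo) / denom
    p_hi = comb(n1, hi) * comb(n2, total - hi) / denom
    if abs(p_lo - p_hi) <= 1e-12:
        # Equally unlikely extremes both count toward the two-sided p-value.
        return min(1.0, p_lo + p_hi)
    return min(p_lo, p_hi)


def is_testable(total, n1, n2, level=0.05):
    """True if some split of `total` events could give p < level."""
    return min_attainable_p(total, n1, n2) < level


for total in (2, 3, 4):
    print(total, round(min_attainable_p(total, 148, 132), 4),
          is_testable(total, 148, 132))
```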
Double FDR Approach
• Define $p_i^* = \min\{p_{i1}, p_{i2}, \ldots, p_{ik_i}\}$. This represents the strongest safety “signal” for body system i.
• 1st level FDR adjustment
  – Apply FDR adjustment to $p_1^*, p_2^*, \ldots, p_s^*$
  – Let $\tilde{p}_i^*$ = FDR-adjusted $p_i^*$
• 2nd level FDR adjustment
  – Within body system i, apply FDR adjustment to $p_{i1}, p_{i2}, \ldots, p_{ik_i}$, $1 \le i \le s$
  – Let $\tilde{p}_{ij}$ = FDR-adjusted $p_{ij}$
Double FDR Approach (cont’d)
Proposed Flagging Rule
Flag AE(i,j) if $\tilde{p}_i^* \le \alpha_1$ and $\tilde{p}_{ij} \le \alpha_2$
• What values of $\alpha_1$ and $\alpha_2$ should we use?
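The two-level adjustment and the flagging rule can be sketched as follows. This is an illustrative reading of the proposal, not the authors' code: the function names are assumptions, the p-values are the 40 unadjusted values from the MMRV AE tables keyed by body system code, and the Step 1 low-incidence screen is deliberately not applied. With $\alpha_1$ = .05 and $\alpha_2$ = .10, the sketch flags only irritability.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):  # largest p-value first
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def double_fdr_flags(pvalues_by_bs, alpha1, alpha2):
    """Return (flagged (BS, AE) pairs, first-level adjusted p-value per BS)."""
    systems = list(pvalues_by_bs)
    # First level: strongest signal per body system, adjusted across systems.
    p_star = [min(pvalues_by_bs[bs].values()) for bs in systems]
    p_star_adj = dict(zip(systems, bh_adjust(p_star)))
    flagged = []
    for bs in systems:
        names = list(pvalues_by_bs[bs])
        # Second level: adjust the p-values within this body system.
        p_adj = bh_adjust([pvalues_by_bs[bs][ae] for ae in names])
        for ae, p in zip(names, p_adj):
            if p_star_adj[bs] <= alpha1 and p <= alpha2:
                flagged.append((bs, ae))
    return flagged, p_star_adj


# Unadjusted p-values from the MMRV AE tables, grouped by body system code.
MMRV_PVALUES = {
    "01": {"ASTHENIA / FATIGUE": .1673, "FEVER": .5606,
           "INFECTION, FUNGAL": .4998, "INFECTION, VIRAL": .6248,
           "MALAISE": .5248},
    "03": {"ANOREXIA": .1791, "CANDIDIASIS, ORAL": .4998,
           "CONSTIPATION": .4998, "DIARRHEA": .0289,
           "GASTROENTERITIS, INFECTIOUS": .6248, "NAUSEA": .0889,
           "VOMITING": .7295},
    "05": {"LYMPHADENOPATHY": 1.0},
    "06": {"DEHYDRATION": .2214},
    "08": {"CRYING": .4998, "INSOMNIA": 1.0, "IRRITABILITY": .0025},
    "09": {"BRONCHITIS": .3746, "CONGESTION, NASAL": .6872,
           "CONGESTION, RESPIRATORY": .6033, "COUGH": .4969,
           "INFECTION, RESPIRATORY, UPPER": .4308,
           "LARYNGOTRACHEOBRONCHITIS": 1.0, "PHARYNGITIS": .4969,
           "RHINORRHEA": 1.0, "SINUSITIS": .6248, "TONSILLITIS": 1.0,
           "WHEEZING": .6248},
    "10": {"BITE/STING, NON-VENOMOUS": .1248, "ECZEMA": .4998,
           "PRURITUS": 1.0, "RASH": .0209, "RASH, DIAPER": .2885,
           "RASH, MEASLES/RUBELLA-LIKE": .0388,
           "RASH, VARICELLA-LIKE": .6872, "URTICARIA": .2214,
           "VIRAL EXANTHEMA": .6033},
    "11": {"CONJUNCTIVITIS": .2214, "OTITIS MEDIA": .7109, "OTORRHEA": 1.0},
}

flagged, first_level = double_fdr_flags(MMRV_PVALUES, alpha1=0.05, alpha2=0.10)
print(flagged)
print({bs: round(p, 4) for bs, p in first_level.items()})
```

Requiring both conditions means an AE is flagged only when its body system carries a signal that survives adjustment across systems and the AE itself survives adjustment within its system.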
Choosing $\alpha_1$ and $\alpha_2$
• Set $\alpha_2 = \alpha$ and use either (a) or (b) below for $\alpha_1$.
  (a) Use resampling (non-parametric bootstrap) to determine the largest data-dependent $\alpha_1$ ($\le \alpha_2$) that ensures FDR $\le \alpha$.
  OR
  (b) Choose $\alpha_1$ ($\le \alpha_2$) independent of the data. For example, let $\alpha_1 = \alpha_2$ or $\alpha_2/2$, and estimate the resulting FDR using resampling.
Resampling Procedure
• Purpose
  – To estimate the false discovery rates of the following:
      NOADJ: No multiplicity adjustment; flag AE if unadjusted p < .05
      FULLFDR($\alpha$): Full FDR adjustment (ignore BS grouping)
      DFDR($\alpha_1$, $\alpha_2$): Double FDR adjustment for selected ($\alpha_1$, $\alpha_2$)
  – To determine the largest $\alpha_1$ ($\le \alpha_2$) that guarantees FDR $\le \alpha$ when using DFDR($\alpha_1$, $\alpha_2$).
Resampling Procedure (cont’d)
• Details
  1. Pool data from both treatment groups into a common population. Sample with replacement from this common population to simulate many repetitions of the original trial.
     This procedure:
     a) simulates a true null situation (Group 1 = Group 2).
     b) preserves the correlation structure of the original data.
  2. Implement our proposal for flagging AEs using the NOADJ, FULLFDR($\alpha$), and DFDR($\alpha_1$, $\alpha_2$) approaches, and calculate the corresponding FDRs.
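The two resampling steps can be sketched as below. Everything here is an assumption for illustration: the subject-level data are synthetic (the slides do not include them), the function names are hypothetical, and whole subject records are resampled so that correlations between AEs are carried along. Because every resampled trial is a true null, any flagged AE is a false discovery, so the estimated FDR is simply the fraction of resamples with at least one flag. A DFDR rule could be passed as `flag_rule` in the same way.

```python
import random
from math import comb


def fisher_two_sided(x1, n1, x2, n2):
    """Two-sided Fisher exact p-value for x1/n1 versus x2/n2."""
    total, lo = x1 + x2, max(0, x1 + x2 - n2)
    denom = comb(n1 + n2, total)
    probs = [comb(n1, k) * comb(n2, total - k) / denom
             for k in range(lo, min(n1, total) + 1)]
    p_obs = probs[x1 - lo]
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def estimate_null_fdr(subjects, n1, n2, flag_rule, n_sims=200, seed=0):
    """Fraction of pooled-null resamples in which any AE is flagged."""
    rng = random.Random(seed)
    n_aes = len(subjects[0])
    hits = 0
    for _ in range(n_sims):
        # Draw both groups from the same pooled population (true null).
        g1 = rng.choices(subjects, k=n1)
        g2 = rng.choices(subjects, k=n2)
        pvals = [fisher_two_sided(sum(s[j] for s in g1), n1,
                                  sum(s[j] for s in g2), n2)
                 for j in range(n_aes)]
        hits += any(flag_rule(pvals))
    return hits / n_sims


def noadj(pvals):
    """NOADJ: flag an AE if its unadjusted p-value is below .05."""
    return [p < 0.05 for p in pvals]


def fullfdr(alpha):
    """FULLFDR(alpha): flag if the BH-adjusted p-value is at most alpha."""
    return lambda pvals: [p <= alpha for p in bh_adjust(pvals)]


# Synthetic pooled data: 80 subjects, 6 AEs with assumed incidence rates.
data_rng = random.Random(1)
rates = [0.30, 0.20, 0.10, 0.10, 0.05, 0.05]
pooled = [tuple(int(data_rng.random() < r) for r in rates) for _ in range(80)]

fdr_noadj = estimate_null_fdr(pooled, 40, 40, noadj)
fdr_full = estimate_null_fdr(pooled, 40, 40, fullfdr(0.05))
print(fdr_noadj, fdr_full)
```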
MMRV Example - Resampling Results
Y = # of incorrectly flagged AEs*

                    Distribution of Y (%)
Method              0       1       2       3     FDR (%)
NOADJ              48.8    33.0    12.9     5.3     51.2
FULLFDR(.10)       95.2     4.0     0.6     0.2      4.8
DFDR(.02, .05)     97.0     2.5     0.4     0.1      3.0
DFDR(.05, .05)     91.2     7.3     1.1     0.4      8.8
DFDR(.05, .10)     90.9     6.4     1.9     0.8      9.1
DFDR(.10, .10)     79.8    13.0     5.2     2.0     20.2
* out of 40; 2000 simulations
MMRV Example - Resampling Results
DFDR($\alpha_1$, $\alpha_2$): Estimated FDR (%)

                $\alpha_2$
$\alpha_1$    0.05     0.10     0.15
0.01          1.45     1.45     1.45
0.02          3.00     3.00     3.00
0.03          4.70     4.70     4.70
0.04          7.10     7.15     7.15
0.05          8.80     9.15     9.15
0.06                  11.70    11.70
0.07                  13.65    13.70
0.08                  16.35    16.50
0.09                  18.85    19.25
0.10                  20.25    21.30
0.11                           24.25
0.12                           25.60
0.13                           27.75
0.14                           29.90
0.15                           31.25

Max. Acceptable FDR ($\alpha$)           5%          10%         15%
($\alpha_1$, $\alpha_2 = \alpha$)        (.03,.05)   (.05,.10)   (.07,.15)
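The bottom row of the grid can be reproduced mechanically: for each maximum acceptable FDR $\alpha$, take the column $\alpha_2 = \alpha$ and pick the largest $\alpha_1$ whose estimated FDR does not exceed $\alpha$. The sketch below does this with the values from the grid; the dictionary layout and the function name are assumptions made for this example.

```python
# Estimated FDR (%) from the grid, keyed by alpha2 and then by alpha1.
ESTIMATED_FDR = {
    0.05: {0.01: 1.45, 0.02: 3.00, 0.03: 4.70, 0.04: 7.10, 0.05: 8.80},
    0.10: {0.01: 1.45, 0.02: 3.00, 0.03: 4.70, 0.04: 7.15, 0.05: 9.15,
           0.06: 11.70, 0.07: 13.65, 0.08: 16.35, 0.09: 18.85, 0.10: 20.25},
    0.15: {0.01: 1.45, 0.02: 3.00, 0.03: 4.70, 0.04: 7.15, 0.05: 9.15,
           0.06: 11.70, 0.07: 13.70, 0.08: 16.50, 0.09: 19.25, 0.10: 21.30,
           0.11: 24.25, 0.12: 25.60, 0.13: 27.75, 0.14: 29.90, 0.15: 31.25},
}


def largest_alpha1(alpha):
    """Largest alpha1 <= alpha2 = alpha with estimated FDR at most alpha."""
    column = ESTIMATED_FDR[alpha]
    acceptable = [a1 for a1, fdr in column.items()
                  if a1 <= alpha and fdr <= 100 * alpha]
    return max(acceptable) if acceptable else None


for alpha in (0.05, 0.10, 0.15):
    print(alpha, largest_alpha1(alpha))
```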
First Level FDR Adjustment

                             Unadjusted   Number of   FDR Adjusted
Body System ID               p-value      AE Types    p-value
Nervous system               0.0025        3          0.0200
Skin                         0.0209        9          0.0771
Digestive system             0.0289        7          0.0771
Body site unspecified        0.1673        5          0.2952
Special senses               0.2214        3          0.2952
Metabolic / immune           0.2214        1          0.2952
Respiratory                  0.3746       11          0.4281
Hematologic and lymphatic    1.0000        1          1.0000
Second Level FDR Adjustment
Body System 08: Nervous System and Psychiatric

                     Unadjusted   FDR Adjusted
Adverse Experience   p-value      p-value
Irritability         0.0025       0.0075
Crying               0.4998       0.7497
Insomnia             1.0000       1.0000
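Summary of Three Examples
Flagged AEs under no multiplicity adjustment and under DFDR adjustment with maximum FDR (%) of 15%, 10%, and 5%.
Counts in parentheses give the number of these four columns in which each AE appears; the column assignment itself is not recoverable.

PedvaxHIB (N=681 subjects; 15 AEs)
  No multiplicity adjustment: FDR ~ 43%
  DFDR settings: 15%: $\alpha_1$=.07, $\alpha$=.15; 10%: $\alpha_1$=.05, $\alpha$=.10; 5%: $\alpha_1$=.02, $\alpha$=.05
  Flagged AEs: Irritability (2), Upper Resp. Inf. (2), Rash (1)

MMRV (N=280 subjects; 40 AEs)
  No multiplicity adjustment: FDR ~ 51%
  DFDR settings: 15%: $\alpha_1$=.07, $\alpha$=.15; 10%: $\alpha_1$=.05, $\alpha$=.10; 5%: values illegible
  Flagged AEs: Irritability (4), Rash (2), M/R-like rash (2), Diarrhea (2)

COMVAX (N=811 subjects; 58 AEs)
  No multiplicity adjustment: FDR ~ 87%
  DFDR settings: values illegible
  Flagged AEs: Erythema (1), Rash (1), Rhinorrhea (1)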
Concluding Remarks
• The current approach of flagging AEs based on unadjusted p-values (or C.I.s) can result in excessive false positive safety findings. These can cause undue concern for approval/labeling, and can affect post-marketing commitments.
• Under our proposal, the unadjusted p-values (or C.I.s) would still be reported. The Double FDR multiplicity adjustment is a method to facilitate the interpretation of the unadjusted p-values.
Concluding Remarks (cont’d)
• Our proposal for tackling multiplicity will:
  – substantially reduce the percentage of incorrectly flagged AEs.
  – be better accepted if described a priori in the protocol/DAP rather than on a post-hoc basis.
  – facilitate comparable interpretation of safety results across studies, with respect to Type I error.