power_MScTransMed_for2014 - KEATS


Statistical Power and Sample Size Calculations

Drug Development Statistics & Data Management

July 2014

Cathryn Lewis

Professor of Genetic Epidemiology & Statistics

Department of Medical & Molecular Genetics

King’s College London

With thanks to Irene Rebollo Mesa and Frühling Rijsdijk

Outline

1. Concepts of power

2. Power and types of error

3. Software to calculate power

4. Power for continuous outcome

5. Power for proportion, success/failure

6. Quiz!

Power and Sample size 2

Planning a Study

Question: What are the study endpoints?

Types of Endpoints:

• Binary clinical outcome: Death from disease.

• Quantitative: Creatinine, cholesterol levels, QOL.

• Time to Event: Time to graft failure, time to death, time to recovery

Good Qualities:

Clinically meaningful

Practical and feasible to measure

Occur frequently enough throughout the duration of the trial


Planning a Study

Question: What is the expected prevalence of the outcome (discrete) or variability of the outcome (continuous)?

• Based on previous studies, pilot study or hospital/NHS report.

• Variability and prevalence are vital for power.

• Both are best at intermediate levels.

Question: What is the expected difference between groups?

• in proportion of events (if discrete), or

• in mean measure (if continuous)

• Based on previous studies or pilot study

• Alternatively, minimum difference clinically relevant

• The larger the difference the higher the power


Design: What is your Hypothesis?

1. Superiority

Objective: To determine whether there is evidence of a statistical difference in the comparison of interest between two Tx regimes:

A: Tx of interest

B: Placebo or active control Tx

H0: The two Txs have equal effect with respect to the mean response: $\mu_A = \mu_B$

H1: The two Txs are different with respect to the mean response: $\mu_A \neq \mu_B$






Statistical Power

Power

• Definition: The probability of correctly rejecting the null hypothesis when it is false, i.e. the expected proportion of samples in which we correctly decide against the null hypothesis

• It depends on:

1. Size of the (treatment) effect in the population (d)

2. The significance level at which we reject the null (0.05)

3. Sample size (N)

4. Design of the study: parallel or crossover etc.

5. Endpoint measurement (categorical, ordinal, continuous)

6. The expected dropout rate


Power primer

• We summarise results of a trial in a statistical analysis with a test statistic (e.g. chi-squared, Z score)

• The test statistic provides a measure of support for a certain hypothesis

• We pre-determine a threshold on the test statistic for rejecting the null hypothesis

[Figure: test statistic axis with a threshold separating the NO and YES decisions]

YES or NO decision-making: significance testing

Inevitably leads to two types of mistake:

• false positive (YES instead of NO): Type I error

• false negative (NO instead of YES): Type II error


Standard Case

[Figure: sampling distributions of the test statistic T if H0 were true (α = 0.05) and if HA were true; power = 1 - β]

            Rejection of H0                            Non-rejection of H0
H0 true     Type I error = α (significance level)
HA true     Power = 1 - Type II error = 1 - β          Type II error = β


Hypothesis testing

• Null hypothesis : no effect

• A ‘significant’ result means that we can reject the null hypothesis

• A ‘non-significant’ result means that we cannot reject the null hypothesis


Statistical significance

• The ‘p-value’

• The probability of obtaining a result at least as extreme as the one observed if the null were in fact true

• Typically, we are willing to incorrectly reject the null 5% or 1% of the time (Type I error)


            Rejection of H0                  Non-rejection of H0
H0 true     Type I error at rate α           Nonsignificant result (1 - α)
HA true     Significant result (1 - β)       Type II error at rate β
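The four outcomes above can be illustrated with a small simulation. This is only a sketch and is not taken from the slides: it assumes a two-arm trial with a normally distributed endpoint, a known standard deviation and a two-sided z-test. The function name and all numbers (standard deviation 10, 50 patients per group, true difference 5) are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_rejection_rate(true_diff, sd=10.0, n_per_group=50,
                            alpha=0.05, n_trials=2000, seed=1):
    """Fraction of simulated trials in which H0 is rejected (two-sided z-test, known sd)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # rejection threshold on |z|
    se = sd * (2 / n_per_group) ** 0.5             # standard error of the difference in means
    rejections = 0
    for _ in range(n_trials):
        mean_a = sum(rng.gauss(true_diff, sd) for _ in range(n_per_group)) / n_per_group
        mean_b = sum(rng.gauss(0.0, sd) for _ in range(n_per_group)) / n_per_group
        if abs((mean_a - mean_b) / se) > z_crit:
            rejections += 1
    return rejections / n_trials


# H0 true (no difference): every rejection is a Type I error, so the rate should be near alpha.
type_1_rate = simulate_rejection_rate(true_diff=0.0)
# HA true (assumed difference of 5): the rejection rate estimates power, 1 - beta.
estimated_power = simulate_rejection_rate(true_diff=5.0)

print(f"Type I error rate ~ {type_1_rate:.3f}")
print(f"Power ~ {estimated_power:.3f}, Type II error rate ~ {1 - estimated_power:.3f}")
```

Under these assumptions the first rate should sit close to the chosen α, and the second shows how often a real difference of that size would be detected.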


Standard Case

[Figure: sampling distributions of the test statistic T if H0 were true (α = 0.05) and if HA were true; power = 1 - β]

Increased effect size

[Figure: same sampling distributions, α = 0.05; power (1 - β) ↑]

More conservative α

[Figure: same sampling distributions, α = 0.01; power (1 - β) ↓]

Less conservative α

[Figure: same sampling distributions, α = 0.1; power (1 - β) ↑]

Reduced variation

[Figure: same sampling distributions, α = 0.05; power (1 - β) ↑]
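The direction of each change can be checked numerically. The sketch below is an illustration rather than the method used to draw the slides: it assumes a two-sided, two-sample z-test with equal group sizes and a known common standard deviation. The function name and the baseline values (difference 5, standard deviation 10, 50 per group) are hypothetical.

```python
from statistics import NormalDist


def power_two_means(diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with equal group sizes."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Standardised shift of the sampling distribution under HA.
    shift = abs(diff) / (sd * (2 / n_per_group) ** 0.5)
    # Probability that the test statistic lands in either rejection region under HA.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


baseline = power_two_means(diff=5, sd=10, n_per_group=50)
bigger_effect = power_two_means(diff=7, sd=10, n_per_group=50)
stricter_alpha = power_two_means(diff=5, sd=10, n_per_group=50, alpha=0.01)
looser_alpha = power_two_means(diff=5, sd=10, n_per_group=50, alpha=0.10)
less_variation = power_two_means(diff=5, sd=8, n_per_group=50)

for label, value in [("standard case", baseline),
                     ("increased effect size", bigger_effect),
                     ("alpha = 0.01", stricter_alpha),
                     ("alpha = 0.1", looser_alpha),
                     ("reduced variation", less_variation)]:
    print(f"{label:>22}: power = {value:.3f}")
```

Relative to the standard case, power rises with a larger effect, a less conservative α or reduced variation, and falls with a more conservative α.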


Determining Sample Size

We need:

– Acceptable type I error rate (α),

• usually 0.05, or 0.025 if one sided

– A meaningful difference d in the response: the smallest Tx effect clinically worth detecting / that we wish to detect

– The desirable power (1 - β) to detect this difference, min. 80%

– Ratio of allocation to the groups (equal sample sizes?)

– Whether to use one-sided or two-sided test

In addition,

– The variability common to the two populations for continuous endpoint

– The response (event) rate of the control group for the binary endpoint


Calculating power using software or Web

PRISM StatMate ($50)

G*Power 3 (Free)

Statistical software: SPSS, SAS, Stata, R

PS Power and Sample size Calculation (free) (Windows)

Web: Google “Statistical Power Calculation”

Russell V. Lenth

http://www.stat.uiowa.edu/~rlenth/Power/

David Schoenfeld

http://hedwig.mgh.harvard.edu/sample_size/size.html

Perform the calculation with two methods and check that they give similar answers


Determining Sample Size: Continuous outcome

• Two Anti-Hypertensives:

– Testing for superiority

• Endpoint: Difference in Diastolic BP

– Continuous variable

• Relevant parameters

– Difference in Diastolic BP between drugs: d = 2 mm Hg

– Standard deviation of Diastolic BP in each group: σ = 10 mm Hg

– Significance level: 0.05

– Required power: 0.8

– Assume equal sized groups

• Calculate sample size required

393 patients in each group
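The figure of 393 per group can be reproduced with the usual normal-approximation formula, n = 2(z₁₋α/₂ + z₁₋β)² σ² / d². The sketch below assumes that formula and a two-sided test; the slides do not say which formula the software applies, and calculators based on the t distribution may give a slightly larger number. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(diff, sd, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two means (normal approximation, two-sided)."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = norm.inv_cdf(power)            # about 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) * sd / diff) ** 2)


# Slide values: d = 2 mm Hg, sigma = 10 mm Hg, alpha = 0.05, power = 0.8.
n_bp = n_per_group_means(diff=2, sd=10)
print(n_bp)  # 393
```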


[Figure: Power, by difference between two groups]

Continuous outcome: quantities linked in the power calculation

• Power

• Standard deviation (equal in each group)

• Significance level

• Difference in means

• Sample size (equal in each group, or fixed ratio?)

Determining Sample Size: Discrete Example

• APT070 perfusion vs. cold storage of kidney

• Testing for superiority

• Endpoint: Delayed Graft Function after transplantation

• Proportion of patients experiencing delayed graft function

• Relevant parameters

• Baseline prevalence: 35%

• Minimum clinically significant difference: 10%

• p1 = 0.35, p2 = 0.25 [proportion with delayed graft function in each group]

• Significance level α = 0.05

• Power = 80%

• Calculate sample size required

349 patients in each group
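A sketch of one way to arrive at 349 per group is given below. It assumes the normal-approximation formula for two proportions followed by the Fleiss continuity correction; the slides do not state which method the calculators use, so this is an assumption that happens to reproduce the number. Without the correction the same inputs give 329. The function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80, continuity=True):
    """Per-group sample size for comparing two proportions (two-sided, equal groups)."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    z_beta = norm.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    delta = abs(p1 - p2)
    # Uncorrected normal-approximation sample size.
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    if continuity:
        # Fleiss continuity correction (assumed, not stated in the slides).
        n = n / 4 * (1 + sqrt(1 + 4 / (n * delta))) ** 2
    return ceil(n)


# Slide values: p1 = 0.35, p2 = 0.25, alpha = 0.05, power = 80%.
n_corrected = n_per_group_proportions(0.35, 0.25)
n_uncorrected = n_per_group_proportions(0.35, 0.25, continuity=False)
print(n_corrected, n_uncorrected)  # 349 329
```

The gap between the two results is one reason to run the calculation in more than one tool and check which options each applies.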

[Calculator output: 349 patients on treatment A and 349 patients on treatment B, at a 0.05 significance level. This assumes that the response rate of treatment A is 0.35 and the response rate of treatment B is 0.25.]

Discrete outcome: quantities linked in the power calculation

• Power

• Proportion responding in Group 1

• Proportion responding in Group 2

• Significance level

• Sample size (equal in each group, or fixed ratio?)

How to use power calculations

• Use power prospectively for planning future studies

– Determine an appropriate sample size

– Evaluate a planned study: will it yield useful information?

• Put science before statistics.

– Use effect sizes that are clinically relevant

– Don’t get distracted by statistical considerations

• Perform a pilot study

– Helps establish procedures, understand and protect against the unexpected

– Gives variance estimates needed in determining sample size




Design: What is your Hypothesis?

1. Superiority

2. Equivalence:

Objective: To demonstrate that two treatments have no clinically meaningful difference

d = largest difference clinically acceptable

H0: The two Txs effects are different with respect to the mean response: $\mu_A - \mu_B \le -d$ or $\mu_A - \mu_B \ge d$

H1: The two Txs are equal with respect to the mean response: $-d < \mu_A - \mu_B < d$

Design: What is your Hypothesis?

3. Non-Inferiority:

Objective: To demonstrate that a given treatment is not clinically inferior to another

H0: A given Tx is inferior with respect to the mean response: $\mu_A - \mu_B \le -d$

H1: A given Tx is non-inferior with respect to the mean response: $\mu_A - \mu_B > -d$

QUIZ

Assume 80% Power, α = 0.05, two-sided

How many subjects? For each question: (x) more with A, (y) more with B, (z) the same

                    Study A                       Study B
1. Mortality        20% vs 10%                    20% vs 15%
2. Mortality        20% vs 10%                    40% vs 30%
3. Diastolic BP     80 vs 85 mmHg, St. dev 10     90 vs 95 mmHg, St. dev 10
4. Diastolic BP     St. dev 10                    St. dev 8

ANSWERS

1. B: a smaller difference needs more subjects

2. B: bigger effect size in A (a doubling of mortality, 20% vs 10%); the smaller effect in B needs a larger sample size to detect

3. Same: the difference and the standard deviation are the same in both studies; the mean level does not matter

4. A: a bigger standard deviation needs more subjects
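The quiz answers can be checked with the same normal-approximation formulas. The sketch below is an illustration under stated assumptions: no continuity correction is applied to the proportions, and question 4 uses a 5 mmHg difference in both studies, which is an assumption because the text gives only the standard deviations for that question. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z_ALPHA = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-sided alpha = 0.05
Z_BETA = NormalDist().inv_cdf(0.80)           # 80% power


def n_props(p1, p2):
    """Per-group n for two proportions (normal approximation, no continuity correction)."""
    p_bar = (p1 + p2) / 2
    numerator = (Z_ALPHA * sqrt(2 * p_bar * (1 - p_bar))
                 + Z_BETA * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil((numerator / (p1 - p2)) ** 2)


def n_means(diff, sd):
    """Per-group n for two means with a common standard deviation."""
    return ceil(2 * ((Z_ALPHA + Z_BETA) * sd / diff) ** 2)


# Each entry is (n for Study A, n for Study B).
quiz = {
    1: (n_props(0.20, 0.10), n_props(0.20, 0.15)),
    2: (n_props(0.20, 0.10), n_props(0.40, 0.30)),
    3: (n_means(85 - 80, 10), n_means(95 - 90, 10)),
    4: (n_means(5, 10), n_means(5, 8)),  # 5 mmHg difference is an assumption
}

for question, (n_a, n_b) in quiz.items():
    if n_a == n_b:
        verdict = "the same"
    else:
        verdict = "more with A" if n_a > n_b else "more with B"
    print(f"Q{question}: A needs {n_a}, B needs {n_b} per group -> {verdict}")
```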
