power_MScTransMed_for2014 - KEATS

Statistical Power and Sample Size Calculations

Drug Development Statistics & Data Management

July 2014

Cathryn Lewis

Professor of Genetic Epidemiology & Statistics

Department of Medical & Molecular Genetics

King’s College London

With thanks to Irene Rebollo Mesa and Frühling Rijsdijk

Outline

1. Concepts of power

2. Power and types of error

3. Software to calculate power

4. Power for continuous outcome

5. Power for proportion, success/failure

6. Quiz!


Planning a Study

Question: What are the study endpoints?

Types of Endpoints:

• Binary clinical outcome: Death from disease.

• Quantitative : Creatinine, cholesterol levels, QOL.

• Time to Event: Time to graft failure, time to death, time to recovery

Good Qualities:

Clinically meaningful

Practical and feasible to measure

Occur frequently enough throughout the duration of the trial


Planning a Study

Question: What is the expected prevalence of the outcome (discrete) or variability of the outcome (continuous)?

• Based on previous studies, pilot study or hospital/NHS report.

• Variability and prevalence are vital for power.

• Both are best at intermediate levels.

Question: What is the expected difference between groups

• in proportion of events (if discrete), or

• in mean measure (if continuous)?

• Based on previous studies or a pilot study

• Alternatively, the minimum clinically relevant difference

• The larger the difference, the higher the power


Design: What is your Hypothesis?

1. Superiority

Objective: To determine whether there is evidence of a statistical difference in the comparison of interest between two Tx regimes:

A: Tx of interest

B: Placebo or active control Tx

H0: The two Txs have equal effect with respect to the mean response: $\mu_A = \mu_B$

H1: The two Txs are different with respect to the mean response: $\mu_A \neq \mu_B$







Statistical Power


Power

• Definition: The probability of correctly rejecting the null hypothesis when it is false, i.e. the expected proportion of samples in which we decide correctly against the null hypothesis

• It depends on:

1. Size of the (treatment) effect in the population (d)

2. The significance level at which we reject the null (0.05)

3. Sample size (N)

4. Design of the study: parallel or crossover etc.

5. Endpoint measurement (categorical, ordinal, continuous)

6. The expected dropout rate
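One common way to allow for dropout is to inflate the number recruited so that enough patients complete the study. The sketch below assumes dropout is unrelated to outcome; the function name and the numbers (200 completers, 15% dropout) are hypothetical and are not taken from the slides.

```python
import math

def inflate_for_dropout(n_required: int, dropout_rate: float) -> int:
    """Patients to recruit per group so that about n_required complete the study."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    # Divide by the expected completion fraction and round up to a whole patient.
    return math.ceil(n_required / (1 - dropout_rate))

# Hypothetical example: 200 completers needed per group, 15% expected dropout.
n_recruit = inflate_for_dropout(200, 0.15)
print(n_recruit)
```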


Power primer

• We summarise results of a trial in a statistical analysis with a test statistic (e.g. chi-squared, Z score)

• Provide a measure of support for a certain hypothesis

• Pre-determine threshold on test statistic to reject null hypothesis

YES or NO decision-making: significance testing

[Figure: test statistic axis with NO and YES regions on either side of the threshold]

This inevitably leads to two types of mistake:

• false positive (YES instead of NO) (Type I)

• false negative (NO instead of YES) (Type II)


Standard Case

[Figure: sampling distributions of the test statistic T if H0 were true (α = 0.05) and if HA were true. POWER: 1 − β]

             Rejection of H0                           Non-rejection of H0
H0 true      Significance level: Type I error = α
HA true      Power: 1 − Type II error = 1 − β          Type II error = β


Hypothesis testing

• Null hypothesis : no effect

• A ‘significant’ result means that we can reject the null hypothesis

• A ‘non-significant’ result means that we cannot reject the null hypothesis


Statistical significance

• The ‘p-value’

• The probability of obtaining a result at least as extreme as the one observed if the null were in fact true

• Typically, we are willing to incorrectly reject the null 5% or 1% of the time (Type I error)




             Rejection of H0                 Non-rejection of H0
H0 true      Type I error at rate α          Nonsignificant result (1 − α)
HA true      Significant result (1 − β)      Type II error at rate β
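The four outcomes of a significance test can be checked by simulation. The following is a minimal sketch, assuming a two-sided z-test with known standard deviation; the function name and all numbers (50 patients per group, SD 10, true difference 5) are illustrative assumptions and are not taken from the slides.

```python
import random
from statistics import NormalDist

def reject_rate(true_diff, sigma=10.0, n=50, alpha=0.05, n_trials=2000, seed=1):
    """Fraction of simulated two-group trials in which a two-sided z-test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n) ** 0.5  # standard error of the difference in means
    rejections = 0
    for _ in range(n_trials):
        group_a = [rng.gauss(true_diff, sigma) for _ in range(n)]
        group_b = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = (sum(group_a) / n - sum(group_b) / n) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials

type_i = reject_rate(true_diff=0.0)  # H0 true: every rejection is a false positive
power = reject_rate(true_diff=5.0)   # HA true: every rejection is a correct decision
print(f"Type I error rate ~ {type_i:.3f}")
print(f"Power ~ {power:.3f}, Type II error rate ~ {1 - power:.3f}")
```

With H0 true the rejection rate should sit close to α; with HA true the rejection rate is the power, and the remainder is the Type II error rate β.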




Standard Case


[Figure: sampling distributions of the test statistic T if H0 were true and if HA were true. POWER: 1 − β]


Increased effect size


[Figure: sampling distributions of the test statistic T under H0 and HA with an increased effect size. POWER: 1 − β ↑]


More conservative α

[Figure: sampling distributions of the test statistic T under H0 and HA with a more conservative α. POWER: 1 − β ↓]


Less conservative α

[Figure: sampling distributions of the test statistic under H0 and HA with a less conservative α. POWER: 1 − β ↑]


Reduced variation


[Figure: sampling distributions of the test statistic T under H0 and HA with reduced variation. POWER: 1 − β ↑]
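The direction of each change in power can be reproduced numerically. This is a sketch under the normal approximation for a two-sided comparison of two means; the function name and the scenario values (difference 5, SD 10, 50 per group) are assumptions chosen for illustration only.

```python
from statistics import NormalDist

def power_two_means(diff, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under HA.
    shift = abs(diff) / (sigma * (2 / n_per_group) ** 0.5)
    # Probability that the statistic lands in either rejection region.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

scenarios = {
    "standard case": power_two_means(5, 10, 50),
    "increased effect size (diff 7)": power_two_means(7, 10, 50),
    "more conservative alpha (0.01)": power_two_means(5, 10, 50, alpha=0.01),
    "less conservative alpha (0.10)": power_two_means(5, 10, 50, alpha=0.10),
    "reduced variation (SD 8)": power_two_means(5, 8, 50),
}
for name, value in scenarios.items():
    print(f"{name}: power = {value:.2f}")
```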


Determining Sample Size

We need:

– Acceptable type I error rate (α), usually 0.05, or 0.025 if one-sided

– A meaningful difference d in the response: the smallest Tx effect that is clinically worth detecting

– The desired power (1 − β) to detect this difference, min. 80%

– Ratio of allocation to the groups (equal sample sizes?)

– Whether to use a one-sided or two-sided test

In addition,

– The variability common to the two populations, for a continuous endpoint

– The response (event) rate of the control group, for a binary endpoint


Calculating power using software or Web

PRISM StatMate ($50)

G*Power 3 (Free)

Statistical software: SPSS, SAS, Stata, R

PS Power and Sample size Calculation (free) (Windows)

Web: Google “Statistical Power Calculation”

• Russell V. Lenth: http://www.stat.uiowa.edu/~rlenth/Power/

• David Schoenfeld: http://hedwig.mgh.harvard.edu/sample_size/size.html

Perform the calculation with two different tools and check that they give similar answers


[Screenshots: Russ Lenth’s Power and Sample size page; David Schoenfeld’s sample size page]


Determining Sample Size: Continuous outcome

• Two Anti-Hypertensives:

– Testing for superiority

• Endpoint: Difference in Diastolic BP

– Continuous variable

• Relevant parameters

– Difference in Diastolic BP between drugs: d = 2 mm Hg

– Standard deviation of Diastolic BP in each group: σ = 10 mm Hg

– Significance level: 0.05

– Required power: 0.8

– Assume equal sized groups

• Calculate sample size required

393 patients in each group
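The figure of 393 per group can be reproduced with the usual normal-approximation formula n = 2σ²(z₁₋α/₂ + z₁₋β)² / d². The sketch below assumes a two-sided test and equal group sizes; the function name is hypothetical, and software based on the t distribution may return a slightly larger number.

```python
import math
from statistics import NormalDist

def n_per_group_means(diff, sigma, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided comparison of two means (normal approximation)."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    z_beta = norm.inv_cdf(power)
    return math.ceil(2 * (sigma * (z_alpha + z_beta) / diff) ** 2)

# Parameters from the anti-hypertensive example: d = 2 mm Hg, SD = 10 mm Hg.
n_bp = n_per_group_means(diff=2, sigma=10)
print(n_bp)  # 393
```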





[Figure: Power, by difference between two groups]

Continuous outcome: quantities in the calculation

• Power

• Standard deviation (equal in each group)

• Significance level

• Difference in means

• Sample size (equal in each group, fixed ratio?)


Determining Sample Size: Discrete Example

• APT070 perfusion vs. cold storage of kidney

• Testing for superiority

• Endpoint: Delayed Graft Function after transplantation

• Proportion of patients experiencing delayed graft function

• Relevant parameters

• Baseline prevalence: 35%

• Minimum clinically significant difference: 10%

• p1 = 0.35, p2 = 0.25 [proportion with delayed graft function in each group]

• Significance level: α = 0.05

• Power = 80%

• Calculate sample size required

349 patients in each group
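The slides do not say which formula the web calculator uses. As a sketch, the standard normal-approximation formula for two proportions with a continuity correction reproduces 349 per group for these inputs; without the correction it gives 329. The function name and the `continuity` flag are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80, continuity=True):
    """Per-group sample size for a two-sided comparison of two proportions."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    z_beta = norm.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    delta = abs(p1 - p2)
    # Pooled variance under H0, separate variances under HA.
    n = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    if continuity:
        # Continuity correction (assumed here to match the calculator's answer).
        n = n / 4 * (1 + math.sqrt(1 + 4 / (n * delta))) ** 2
    return math.ceil(n)

n_dgf = n_per_group_proportions(0.35, 0.25)
n_dgf_uncorrected = n_per_group_proportions(0.35, 0.25, continuity=False)
print(n_dgf, n_dgf_uncorrected)  # 349 329
```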




[Calculator output, http://hedwig.mgh.harvard.edu/sample_size/size.html: 349 patients on treatment A and 349 patients on treatment B, at a 0.05 significance level. This assumes that the response rate of treatment A is 0.35 and the response rate of treatment B is 0.25.]

Discrete outcome: quantities in the calculation

• Power

• Proportion responding in Group 1

• Proportion responding in Group 2

• Significance level

• Sample size (equal in each group? fixed ratio?)


How to use power calculations

• Use power prospectively for planning future studies

– Determine an appropriate sample size

– Evaluate a planned study: will it yield useful information?

• Put science before statistics.

– Use effect sizes that are clinically relevant

– Don’t get distracted by statistical considerations

• Perform a pilot study

– Helps establish procedures, understand and protect against the unexpected

– Gives variance estimates needed in determining sample size




Design: What is your Hypothesis?

1. Superiority

2. Equivalence

Objective: To demonstrate that two treatments have no clinically meaningful difference

d = largest difference clinically acceptable

H0: The two Txs' effects are different with respect to the mean response: $\mu_A - \mu_B \le -d$ or $\mu_A - \mu_B \ge d$

H1: The two Txs are equal with respect to the mean response: $-d < \mu_A - \mu_B < d$

 

Design: What is your Hypothesis?

3. Non-Inferiority

Objective: To demonstrate that a given treatment is not clinically inferior to another

H0: A given Tx is inferior with respect to the mean response: $\mu_A - \mu_B \le -d$

H1: A given Tx is non-inferior with respect to the mean response: $\mu_A - \mu_B > -d$
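Sample sizes for these designs can be sketched with the same normal approximation. The code below assumes a continuous endpoint, equal groups, a one-sided α of 0.025, and the sign convention that A is non-inferior when μ_A − μ_B > −d. The equivalence formula additionally assumes the true difference is zero. The function names and the margin of 2 mm Hg with SD 10 are hypothetical illustrations.

```python
import math
from statistics import NormalDist

NORM = NormalDist()

def n_noninferiority(margin, sigma, alpha=0.025, power=0.80, true_diff=0.0):
    """Per-group n to show mu_A - mu_B > -margin with a one-sided test."""
    z_alpha = NORM.inv_cdf(1 - alpha)
    z_beta = NORM.inv_cdf(power)
    # Distance between the assumed true difference and the H0 boundary (-margin).
    distance = true_diff + margin
    return math.ceil(2 * (sigma * (z_alpha + z_beta) / distance) ** 2)

def n_equivalence(margin, sigma, alpha=0.025, power=0.80):
    """Per-group n for two one-sided tests, assuming the true difference is zero."""
    z_alpha = NORM.inv_cdf(1 - alpha)
    # Both one-sided tests must reject, so beta is split between the two tails.
    z_beta = NORM.inv_cdf(1 - (1 - power) / 2)
    return math.ceil(2 * (sigma * (z_alpha + z_beta) / margin) ** 2)

n_ni = n_noninferiority(margin=2, sigma=10)
n_eq = n_equivalence(margin=2, sigma=10)
print(n_ni, n_eq)
```

Under these assumptions an equivalence trial needs more patients than a non-inferiority trial with the same margin, because both sides of the margin must be excluded.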

QUIZ

Assume 80% power, α = 0.05, two-sided. How many subjects?

                   Study A                      Study B
1. Mortality       20% vs 10%                   20% vs 15%
2. Mortality       20% vs 10%                   40% vs 30%
3. Diastolic BP    80 vs 85 mmHg, St. dev 10    90 vs 95 mmHg, St. dev 10
4. Diastolic BP    St. dev 10                   St. dev 8

For each question: (x) more with A, (y) more with B, (z) the same

ANSWERS

1. B: a smaller difference needs more subjects

2. B: bigger effect size in A (halving of mortality); a smaller effect needs a larger sample size to detect

3. Same: only the difference in means and the standard deviation matter

4. A: a bigger standard deviation needs more subjects
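The quiz answers can be checked with normal-approximation formulas. This is a sketch without continuity correction, so absolute numbers may differ from a given calculator, but the ordering between studies is what matters here. The means for question 4 are not legible in the source, so a 5 mmHg difference is assumed for both studies; the function names are hypothetical.

```python
import math
from statistics import NormalDist

Z_ALPHA = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-sided alpha = 0.05
Z_BETA = NormalDist().inv_cdf(0.80)           # 80% power

def n_props(p1, p2):
    """Per-group n for two proportions (no continuity correction)."""
    p_bar = (p1 + p2) / 2
    numerator = (Z_ALPHA * math.sqrt(2 * p_bar * (1 - p_bar))
                 + Z_BETA * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

def n_means(diff, sd):
    """Per-group n for two means."""
    return math.ceil(2 * (sd * (Z_ALPHA + Z_BETA) / diff) ** 2)

quiz = {
    "1. Mortality": (n_props(0.20, 0.10), n_props(0.20, 0.15)),
    "2. Mortality": (n_props(0.20, 0.10), n_props(0.40, 0.30)),
    "3. Diastolic BP": (n_means(85 - 80, 10), n_means(95 - 90, 10)),
    # Assumed 5 mmHg difference in both studies; only the SD differs.
    "4. Diastolic BP": (n_means(5, 10), n_means(5, 8)),
}
for question, (n_a, n_b) in quiz.items():
    verdict = "more with A" if n_a > n_b else "more with B" if n_b > n_a else "the same"
    print(f"{question}: A needs {n_a}, B needs {n_b} per group -> {verdict}")
```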
