R simulations

advertisement
Computing for Research I
Spring 2013
R: Random number generation
& Simulations
April 7
Presented by:
Liqiong Fan
Outline
How to sample from common distribution:
Uniform distribution
Binomial distribution
Normal distribution
Pre-specified vector
Examples:
• Randomization code generation
• Simulation 1 (explore the relationship between power and effect size)
• Simulation 2 (explore the relationship between power and sample size)
Syntax for random number generation in R
1. Sample from a known distribution: “r” + name of distribution:
e.g., runif()
rbinom()
rnorm()
…
Uniform
Binomial
normal
2. Sample from a vector: sample()
e.g., extract two numbers from {1,2,3,4,5,6} with replacement
Uniform distribution (continuous)
PDF
Mean:
Variance:
[R] Uniform distribution
runif(n, min=0, max=1)
See R code …
[R] Uniform distribution
Use UNIFORM distribution to generate BERNOULLI distribution
Bernoulli
distribution
1
0.0
0.2
0.4
aa1
Uniform
distribution
0
0.6
0.8
1.0
Basic idea:
See R code …
[R] Binomial distribution
rbinom(n, size, prob)
e.g. generate 10 Binomial random number with Binom(100, 0.6)
n = 10
size = 100
prob = 0.6
rbinom(10, 100, 0.6)
e.g. generate 100 Bernoulli random number with p=0.6
n = 100
size = 1
prob = 0.6
rbinom(100, 1, 0.6)
See R code …
[R] Normal distribution
rnorm(n,
dnorm(x,
pnorm(q,
qnorm(p,
mean,
mean,
mean,
mean,
sd)
sd)
sd)
sd)
#random number
#density
#P(X<=q) cdf
#quantile
See R code …
[R] Normal distribution
dnorm(x, mean, sd) #density
e.g. plot a standard normal curve
pnorm(q, mean, sd) #probability P(X<=x)
e.g. calculate the p-value for a one sides test with standardized test statistic
H0: X<=0
H1: X>0
Reject H0 if “Z” is very large
If from the one-sided test, we got the Z value = 3.0, what’s the p-value?
P-value = P(Z>=z) = 1 - P(Z<=z)
1
- pnorm(3, 0, 1)
[R] Normal distribution
qnorm(p, mean, sd) #quantile
See R code …
rnorm(n, mean, sd) #random number
See R code …
[R] Another useful command for sampling from a
vector – “sample()”
e.g. randomly choose two number from {2,4,6,8,10} with/without replacement
sample(x, size, replace = FALSE, prob = NULL)
4
2
sample(c(2,4,6,8,10), 2, replace = F)
8
6
10
[R] Another useful command for sampling from a
vector – “sample()”
e.g. A question from our THEORY I CLASS:
“Draw a histogram of all possible average of 6 numbers
selected from {1,2,7,8,14,20} with replacement”
14
20
8
1
2
7
Answer:
A quick way to solve this question is to do a simulation:
That is: we assume we repeat selection of 6 balls with
replacement from left urn for many many times, and plot
their averages. The R code is looked like:
a <- NULL
for (i in 1:10000){
a[i] <- mean(sample(c(1,2,7,8,14,20),6, replace = T))
}
hist(a)
[R] Another useful command for sampling from a
vector – “sample()”
e.g. Generate 1000 Bernoulli random number with P = 0.6
sample(x, size, replace = T, prob =)
Answer:
Let x = (0, 1),
Let size = 1,
Let replace = T/F,
Let prob = (0.4, 0.6).
Repeat 1000 times
0
1
Example 1
Generate randomization sequence
Goal: randomize 100 patients to TRT A and B
1. Simple randomization (like flipping a coin) – Bernoulli distribution
0 0 1 0 0 1 0 1 0 0 …. 1 0 1 0
runif(), rbinom(), sample().
See R code …
Example 1
Generate randomization sequence
Goal: randomize 100 patients to TRT A and B
2. Random allocation rule (RAL)
Unlike simple randomization, number of allocation for each treatment
need to be fixed in advance
Again, think about the urn model!
50
Draw the balls without replacement
50
RAL can only guarantee treatment allocation is
balanced toward the end.
Example 1
Generate randomization sequence
Goal: randomize 100 patients to TRT A and B
3. Permuted block randomization
AABB BABA BBAA BABA BAAB … BBAA
Block size = 4
sample()
Think about multi urns model!
50
50
25
…
Example 2
Investigate the relationship between effect size and
power – drug increases SBP
Linear model: Y = b0 + b1X + e
Y: Systolic Blood Pressure (response)
X: intervention (1 = drug vs. 0 = control)
e: random error = var(Y)
When X=0, E(Y) = b0, effect of control;
When X=1, E(Y) = b0 + b1, effect of drug;
Between group difference is represented by
b1
b1 represents the effect size of new drug relative to the control.
For instance, assuming that the SBP in control population is distributed as N(120, 49),
what is the power if the new drug can truly increase SBP by 0, 1, 2, 3, 4 and 5 units in a
study with a sample size of 100 (50 in drug, 50 in placebo)
b0 = 120
Important information: Y (placebo) ~ N(120, 49)
e ~ N(0, 49)
Example 2
Investigate the relationship between effect size and
power - drug increases BP
Linear model: Y = b0 + b1X + e
Y: Blood Pressure (response)
X: intervention (1 = drug vs. 0 = control)
e: random error = var(Y)
Important information: Y (placebo) ~ N(120, 49)
b0 = 120
e ~ N(0, 49)
We try to answer:
What’s the power given b1 (the real effect size of the treatment) is 0, 1, 2, 3, 4 or 5
Definition of Power:
Probability of rejecting NULL when ALTERNATIVE IS TRUE (i.e., b1 = some non-zero value).
If we run simulation for N times, power means the probability that b1 (treatment
effect) shows significant (P<0.05) from linear regression tests out of N simulations
Example 2
Investigate the relationship between effect size and
power - drug increases BP
Linear model: Y = b0 + b1X + e
Y: Systolic Blood Pressure (response)
X: intervention (1 = drug vs. 0 = control)
e: random error = var(Y)
Simulation steps (E.g. sample size = 50/ per group, 1000 simulations):
1. Generate X according to study design (50 “1”s and 50 “0”s);
2. Generate 100 “e” from N(0, 49);
3. Given b0 and b1, generate Y using Y = b0 + b1X + e;
4. Use 100 pairs of (Y, X) to refit a new linear model, and get the new b0 and b1 and
their p-value;
5. Repeat these steps for 1000 times.
6. If type I error is 0.05, for a two-sided test
Power 
# p  value for b1  0.05 in 1000 simulations
1000
Example 3
Investigate the relationship between sample size and
power
Linear model: Y = b0 + b1X + e
We try to answer:
What’s the power given b1 = 2 and sample size = 25, 50, 75, 100, 125, and 150 per group
Some recommendation
1. Try not “fix” the parameters in your simulation
2. Always test your code with small number of iterations before
you actually start your simulation
3. Use append / write.table (… append = T …) to save the result or
simulated data
4. Print the number of interations / senarios
Code:
print(c)
flush.console()
Download