Computing for Research I, Spring 2012
R: Random number generation & Simulations
Jan 31
Presented by: Yanqiu Weng

Outline
How to sample from common distributions:
• Uniform distribution
• Binomial distribution
• Normal distribution
• A pre-specified vector
Examples:
• Randomization code generation
• Simulation 1 (explore the relationship between power and effect size)
• Simulation 2 (explore the relationship between power and sample size)

Syntax for random number generation in R
1. Sample from a known distribution: "r" + name of the distribution, e.g., runif() (uniform), rbinom() (binomial), rnorm() (normal), …
2. Sample from a vector: sample(), e.g., draw two numbers from {1,2,3,4,5,6} with replacement

Uniform distribution (continuous), on the interval [a, b]
PDF: $f(x) = \frac{1}{b-a}$ for $a \le x \le b$, and 0 otherwise
Mean: $\frac{a+b}{2}$
Variance: $\frac{(b-a)^2}{12}$

[R] Uniform distribution
runif(n, min = 0, max = 1)   # a = min, b = max
See R code …
Use the UNIFORM distribution to generate a BERNOULLI distribution
[Figure: values of a Uniform(0, 1) draw on the interval 0.0 to 1.0, split into two pieces that map to the Bernoulli outcomes 1 and 0]
Basic idea: draw U ~ Uniform(0, 1) and set X = 1 if U < p, otherwise X = 0; then P(X = 1) = P(U < p) = p.
See R code …

[R] Binomial distribution
rbinom(n, size, prob)
e.g. generate 10 binomial random numbers from Binom(100, 0.6): n = 10, size = 100, prob = 0.6
rbinom(10, 100, 0.6)
e.g. generate 100 Bernoulli random numbers with p = 0.6: n = 100, size = 1, prob = 0.6
rbinom(100, 1, 0.6)
See R code …

[R] Normal distribution
rnorm(n, mean, sd)   #random number
dnorm(x, mean, sd)   #density
pnorm(q, mean, sd)   #P(X <= q), the cdf
qnorm(p, mean, sd)   #quantile
See R code …
dnorm(x, mean, sd) #density
e.g. plot a standard normal curve
pnorm(q, mean, sd) #probability P(X <= q)
e.g. calculate the p-value for a one-sided test with a standardized test statistic Z
H0: X <= 0  vs.  H1: X > 0
Reject H0 if Z is very large.
If the one-sided test gives Z = 3.0, what is the p-value?
P-value = P(Z >= z) = 1 - P(Z <= z)
1 - pnorm(3, 0, 1)   # about 0.00135
qnorm(p, mean, sd) #quantile
See R code …
rnorm(n, mean, sd) #random number
See R code …

[R] Another useful command for sampling from a vector: sample()
e.g.
randomly choose two numbers from {2,4,6,8,10} with or without replacement
sample(x, size, replace = FALSE, prob = NULL)
sample(c(2, 4, 6, 8, 10), 2, replace = FALSE)   # without replacement
sample(c(2, 4, 6, 8, 10), 2, replace = TRUE)    # with replacement

e.g. A question from our Theory I class:
"Draw a histogram of all possible averages of 6 numbers selected from {1,2,7,8,14,20} with replacement"
Answer: A quick way to solve this question is by simulation: we repeat the selection of 6 balls, with replacement, from the urn many, many times, and plot their averages. The R code looks like:
a <- numeric(10000)   # preallocate storage for 10000 averages
for (i in 1:10000) {
  a[i] <- mean(sample(c(1, 2, 7, 8, 14, 20), 6, replace = TRUE))
}
hist(a)

e.g. Generate 1000 Bernoulli random numbers with P = 0.6
sample(x, size, replace = TRUE, prob = )
Answer: let x = c(0, 1), size = 1 and prob = c(0.4, 0.6); replace can be TRUE or FALSE because only one value is drawn. Repeat 1000 times.
Equivalently, in a single call: sample(c(0, 1), 1000, replace = TRUE, prob = c(0.4, 0.6))

Example 1: Generate a randomization sequence
Goal: randomize 100 patients to TRT A and B
1. Simple randomization (like flipping a coin): Bernoulli distribution
0 0 1 0 0 1 0 1 0 0 … 1 0 1 0
Tools: runif(), rbinom(), sample()
2. Random allocation rule (RAL)
Unlike simple randomization, the number of patients allocated to each treatment is fixed in advance (50 to A, 50 to B).
Again, think about the urn model: put 50 A balls and 50 B balls in an urn and draw the balls without replacement.
RAL guarantees that the treatment allocation is balanced only at the end of the sequence; partway through enrollment it can be imbalanced.
3. Permuted block randomization
Block size = 4: AABB BABA BBAA BABA BAAB … BBAA
Think about a multi-urn model: 25 urns (blocks), each holding 2 A and 2 B balls, each emptied in random order with sample().

Example 2: Investigate the relationship between effect size and power (the drug increases BP)
Linear model: Y = b0 + b1X + e
Y: blood pressure (response)
X: intervention (1 = drug vs.
0 = control)
e: random error, with var(e) = var(Y) within a treatment group
When X = 0, E(Y) = b0, the mean BP under control;
When X = 1, E(Y) = b0 + b1, the mean BP under the drug;
The between-group difference is b1, so b1 represents the effect size of the drug.
For instance, suppose a previous study shows that BP in the control population is distributed as N(100, 49). What is the power when the real effect of the drug (the increase in BP) is 0, 1, 2, 3, 4 or 5, for a study with sample size = 100 (50 on drug, 50 on placebo)?
Important information:
Y (placebo) ~ N(100, 49), so b0 = 100
e ~ N(0, 49), i.e. variance 49 and sd = 7 (rnorm() takes the sd, not the variance)
We are trying to answer: what is the power given b1 (the real effect size of the treatment) is 0, 1, 2, 3, 4 or 5?
Definition of power: the probability of rejecting the NULL when the ALTERNATIVE IS TRUE (i.e., b1 = some non-zero value). If we run the simulation N times, the power is estimated by the proportion of the N simulations in which the linear regression test of b1 (the treatment effect) is significant (P < 0.05). When b1 = 0 the null is true, so this proportion estimates the type I error rate rather than power.
Simulation steps (e.g. sample size = 50 per group, 1000 simulations):
1. Generate X according to the study design (50 "1"s and 50 "0"s);
2. Generate 100 values of e from N(0, 49);
3. Given b0 and b1, generate Y using Y = b0 + b1X + e;
4. Use the 100 pairs of (Y, X) to refit a linear model, and record the new estimates of b0 and b1 and their p-values;
5. Repeat these steps 1000 times.
6.
If the type I error rate is 0.05, then for a two-sided test
$\text{Power} \approx \frac{\#\{\text{simulations in which the p-value for b1} < 0.05\}}{1000}$

Example 3: Investigate the relationship between sample size and power
Linear model: Y = b0 + b1X + e
We try to answer: what is the power given b1 = 2 and sample size = 25, 50, 75, 100, 125, or 150 per group?
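The sampling slides (uniform to Bernoulli, rbinom(), sample(), and the one-sided p-value from pnorm()) can be collected into one short R sketch. It is not the accompanying course R code: the seed and all variable names are arbitrary choices made here for illustration. It shows that the three ways of drawing Bernoulli(0.6) values agree, and that `1 - pnorm(3)` matches the upper-tail form `pnorm(3, lower.tail = FALSE)`.

```r
set.seed(2012)  # arbitrary seed so the draws are reproducible

p <- 0.6
n <- 1000

# Bernoulli(p) from Uniform(0, 1): X = 1 when U < p, so P(X = 1) = p
u <- runif(n, min = 0, max = 1)
x_unif <- as.integer(u < p)

# The same distribution with rbinom(): size = 1 gives Bernoulli draws
x_binom <- rbinom(n, size = 1, prob = p)

# ... and with sample(): draw 0 or 1 with probabilities 0.4 and 0.6
x_sample <- sample(c(0, 1), n, replace = TRUE, prob = c(1 - p, p))

# Each sample proportion should be close to p = 0.6
props <- c(unif = mean(x_unif), binom = mean(x_binom), sample = mean(x_sample))
print(props)

# Ten Binomial(100, 0.6) counts; each lies between 0 and 100
y <- rbinom(10, size = 100, prob = 0.6)
print(y)

# One-sided p-value for an observed standardized statistic z = 3
p_value <- 1 - pnorm(3, mean = 0, sd = 1)
p_upper <- pnorm(3, mean = 0, sd = 1, lower.tail = FALSE)  # same tail, computed directly
print(p_value)

# Quantile function: the two-sided 5% critical value of N(0, 1)
z_crit <- qnorm(0.975, mean = 0, sd = 1)
print(z_crit)
```

The `lower.tail = FALSE` form gives the same number here; it is simply more accurate than `1 - pnorm()` for very large z, where the subtraction loses precision.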
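The three randomization schemes of Example 1 (simple randomization, the random allocation rule, and permuted blocks of size 4) can be generated with rbinom() and sample(), following the urn pictures. This is a minimal sketch assuming 100 patients and a 1:1 allocation; the seed and names such as `block_size` and `imbalance_blocks` are hypothetical choices. The running imbalance (#A minus #B) makes the difference visible: RAL is exactly balanced only at the end, while permuted blocks return to balance after every 4th patient.

```r
set.seed(31)  # arbitrary seed for reproducibility

n_patients <- 100

# 1. Simple randomization: a fair "coin flip" for each patient
simple <- ifelse(rbinom(n_patients, size = 1, prob = 0.5) == 1, "A", "B")

# 2. Random allocation rule: one urn with exactly 50 A and 50 B balls,
#    drawn without replacement (sample() on the whole vector permutes it)
ral <- sample(rep(c("A", "B"), each = n_patients / 2))

# 3. Permuted blocks of size 4: 25 urns, each holding 2 A and 2 B balls,
#    each emptied in random order and then concatenated
block_size <- 4
n_blocks <- n_patients / block_size
blocks <- unlist(lapply(seq_len(n_blocks), function(i) {
  sample(rep(c("A", "B"), each = block_size / 2))
}))

print(table(simple))  # the split varies from run to run
print(table(ral))     # always 50 / 50

# Running imbalance (#A - #B) after each patient
imbalance_ral    <- cumsum(ifelse(ral == "A", 1, -1))
imbalance_blocks <- cumsum(ifelse(blocks == "A", 1, -1))
print(max(abs(imbalance_ral)))     # can be large partway through
print(max(abs(imbalance_blocks)))  # never more than block_size / 2
```

Replacing `"A"`/`"B"` with 1/0 gives the 0/1 sequences shown on the slides.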
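Simulation steps 1 to 6 of Example 2 translate almost line by line into R, and the same function answers Example 3 by varying the group size instead of b1. This is a sketch under stated assumptions, not the course's R code: the function name `sim_power`, its arguments, and the seed are hypothetical, and Example 3 is assumed to keep b0 = 100 and e ~ N(0, 49) (sd = 7) from Example 2, since its slide does not restate them. The p-value for b1 is read from the coefficient table returned by `summary()` on the `lm()` fit.

```r
set.seed(131)  # arbitrary seed for reproducibility

# Hypothetical helper: estimate power for the test of b1 in Y = b0 + b1*X + e
# by simulation. Assumes e ~ N(0, sd_e^2) with sd_e = 7 (variance 49).
sim_power <- function(b1, n_per_group, b0 = 100, sd_e = 7,
                      n_sim = 1000, alpha = 0.05) {
  x <- rep(c(1, 0), each = n_per_group)               # step 1: study design
  p_values <- numeric(n_sim)
  for (s in seq_len(n_sim)) {                          # step 5: repeat n_sim times
    e <- rnorm(2 * n_per_group, mean = 0, sd = sd_e)   # step 2: random errors
    y <- b0 + b1 * x + e                               # step 3: responses
    fit <- lm(y ~ x)                                   # step 4: refit the model
    p_values[s] <- summary(fit)$coefficients["x", "Pr(>|t|)"]
  }
  mean(p_values < alpha)                               # step 6: share significant
}

# Example 2: power vs. effect size, 50 patients per group
effect_sizes <- 0:5
power_effect <- sapply(effect_sizes, sim_power, n_per_group = 50)
print(data.frame(b1 = effect_sizes, power = power_effect))

# Example 3: power vs. sample size per group, b1 = 2
group_sizes <- c(25, 50, 75, 100, 125, 150)
power_n <- sapply(group_sizes, function(n) sim_power(b1 = 2, n_per_group = n))
print(data.frame(n_per_group = group_sizes, power = power_n))
```

The first row of the Example 2 table (b1 = 0) should sit near 0.05, since it estimates the type I error rate; the remaining rows rise toward 1 as b1 grows, and the Example 3 table rises with the group size.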