Chapter 10 Simulation: An Introduction ST2137 Computer Aided Data Analysis CYM 10.1 1. Introduction 1.1 Definition • A simulation is an imitation of some real things, state of affairs, or processes. • The act of simulating something generally entails certain key characteristics or behaviours of a selected physical or abstract system. ST2137 Computer Aided Data Analysis CYM 10.2 1.2 Use of Simulation • Simulation is used in many contexts, including the modeling of natural systems or human systems in order to gain insight into their functioning. • Simulation can be used to show the eventual real effects of alternative conditions and courses of action. ST2137 Computer Aided Data Analysis CYM 10.3 Example 1: Checking distribution theory Theory: • A sample of size 4 is taken from a normal distribution with mean 0. • Consider the statistic ๐เดค ๐= ๐/ 4 where ๐เดค and S are the sample mean and s.d. respectively • The statistic T follows a t-distribution with 3 degrees of freedom. ST2137 Computer Aided Data Analysis CYM 10.4 Example 1 (Continued) Simulation: • Generate 1000 random samples each of size 4 from a normal distribution. • For each sample, compute the value of the statistic T. • Construct a histogram for these 1000 realizations of the statistic T. ST2137 Computer Aided Data Analysis CYM 10.5 Example 1 (Continued) Result: • The histogram matches well with the t(3) distribution. ST2137 Computer Aided Data Analysis CYM 10.6 Example 2: Comparing estimators We have learnt several robust estimators of location Question: Which estimator is the best, the trimmed mean or the Winsorized mean? Secondary questions: • Under what conditions ( in terms of the underlying distributions ) is the estimator better? We may consider various underlying distributions ST2137 Computer Aided Data Analysis CYM 10.7 Example 2: Comparing estimators • How to measure the performance? We may use the Mean Squared Error (MSE) ๐๐๐ธ = ๐ต๐๐๐ 2 + ๐๐๐๐๐๐๐๐ • It may be intractable to compute the MSE for each of the estimators under different underlying distributions. • Simulation is an alternative solution • How to design such a simulation study? ST2137 Computer Aided Data Analysis CYM 10.8 Example 3: Buffon’s needle experiment A needle of length ๐ is thrown randomly onto a grid of parallel lines with distance ๐ ( > ๐ ). ST2137 Computer Aided Data Analysis CYM 10.9 Example 3 (Continued) • What is the probability that the needle intersects a line? Answer: 2/๐ ๐๐ • • Can we get the answer through simulations? A website gives the visualization of the experiment http://www.metablake.com/pi.swf ST2137 Computer Aided Data Analysis CYM 10.10 Example 3 (Continued) The experiment • Simulate the throwing of a needle into a grid of parallel lines, say N times • Count the number of times the needle intersects a line, say n times • Then n/N gives estimate of the probability that the needle intersects a line. ST2137 Computer Aided Data Analysis CYM 10.11 2. How to do the simulation • Step 1: Generate the position and the inclination of the needle. – Generate a random number, ๐ฅ, from Uniform (0, ๐/2). This number represents the location of the centre of the needle. – Generate a random number, ๐, from Uniform (0, ๐). This number represents the angle of between the needle and the parallel lines. ST2137 Computer Aided Data Analysis CYM 10.12 How to do the simulation (Continued) • Step 2: Check if the needle cuts a line. – The needle cuts a line if ๐ฅ < (๐/2) sin(๐). – Create a variable ๐ค = 1 if ๐ฅ < (๐/2) sin(๐) and 0 otherwise. • Step 3: Repeat Steps 1 and 2 N times. – Count the number of times that ๐ค = 1, let say n times. – Then an estimate of the probability of the needle intersects a line is given by ๐/๐. ST2137 Computer Aided Data Analysis CYM 10.13 Example 3: R Code > # Buffon's needle > # X ~ U(0,d/2), t ~ U(0,pi) where d is the distance between 2 parallel lines > # A needle of length L cut one of the lines if x < L/2*sin(t) > # Theoretical Probability = 2*L/(pi*d) > ns <- 50000 > d <- 2 > L <- 1 > d2 <- d/2 ST2137 Computer Aided Data Analysis CYM 10.14 Example 3: R Code > # Buffon's needle > # Theoretical answer > 2*L/(pi*d) [1] 0.3183099 > x <- runif(ns,0,d2) # Generate a sample of size ns from uniform(0, d2) > t <- runif(ns,0,pi) # Generate a sample of size ns from uniform (0, pi) > length(x[x < L/2*sin(t)])/ns [1] 0.3186 ST2137 Computer Aided Data Analysis CYM 10.15 3. Random number generator 3.1 Definition • A sequence of pseudo-random numbers {Ui} is a deterministic sequence of number in [0, 1] having the same relevant statistical properties as a sequence of random numbers ST2137 Computer Aided Data Analysis CYM 10.16 3.2 Congruential generators • Congruential generators are defined by ๐๐ = ๐๐๐−1 + ๐ ๐๐๐ ๐ for a multiplier a, shift c, and modulus M. • ๐, ๐ and ๐ are all integers • For example: Let ๐ = 10, ๐ = 2, ๐ = 0, ๐ฅ0 = 1 • Then the sequence generated is 1, 2, 4, 8, 6, 2, 4, 8, 6, 2, 4, 8, โฏ • Are these numbers random? There is a pattern, 2, 4, 8 ,6. • Uniform random numbers are obtained by ๐๐ = ๐๐ /๐ ST2137 Computer Aided Data Analysis CYM 10.17 3.2 Congruential generators • To initialize, we have to provide a seed, ๐0 . • If ๐ = 0, generators having the form ๐๐ = ๐๐๐−1 ๐๐๐ ๐ are called multiplicative congruential generator. • If ๐ > 0, they are called linear congruential generator. ST2137 Computer Aided Data Analysis CYM 10.18 Remarks • M + 1 values {๐0 , ๐1 , โฏ , ๐๐ } cannot be distinct and at least one value must occur twice, as ๐๐ and ๐๐+๐ , say. • ๐๐ , ๐๐+1 , โฏ , ๐๐+๐−1 is repeated as ๐๐+๐ , ๐๐+๐+1 , โฏ , ๐๐+2๐−1 • The sequence {๐๐ } is periodic with period ๐ ≤ ๐ • For multiplicative generators, the maximal period is ๐ − 1 If 0 ever occurs, it is repeated indefinitely. • One of our primary objectives is to use a generator with as large period as possible. • However, a large period does not guarantee a good generator. ST2137 Computer Aided Data Analysis CYM 10.19 3.3 R function for congruential generators > # Linear congruential generators > lcg <- function(n,a,m,c,x0){ + ran <- NULL + for (i in 1:n) { + x1 <- (a*x0+c)%%m + x0 <- x1 + ran <- c(ran,x1/m) } + return(ran) } > lcg(10,397204094,2^31-1,0,1234) [1] 0.24381116 0.08947511 0.38319371 0.72792771 0.17503827 [7] 0.40680994 0.90923700 0.34271140 ST2137 Computer Aided Data Analysis CYM 0.72800325 0.55960111 10.20 4. Generate uniform random numbers 4.1 SAS code: *Generate Uniform Random Numbers; data try1; seed = 1234; do i = 1 to 10; x = ranuni(seed); output; end; keep x; run; proc print data=try1; var x; run; ST2137 Computer Aided Data Analysis CYM 10.21 4. Generate uniform random numbers 4.1 SAS code (Continued): * Alternative way to generate Uniform Random Numbers; data try1; seed = 1234; call streaminit(seed); do i = 1 to 10; x = rand(‘uniform’); output; end; keep x; run; • Without call streaminit(seed), the sets of random numbers generated will be different. • If seed value is not specified, then the time at which the command is executed is used for the seed. ST2137 Computer Aided Data Analysis CYM 10.22 Generate uniform random numbers (Continued) 4.2 R code > # Generate 1000 random numbers from U(0, 1) distribution > set.seed(1234) > x <- runif(1000) In general, “runif(n, a, b)” generates a vector of n random numbers from a uniform distribution between a and b. ST2137 Computer Aided Data Analysis CYM 10.23 5 Generate non-uniform random numbers 5.1 Inversion method • If X has a continuous distribution function ๐น(๐ฅ) (Recall ๐น ๐ฅ = Pr(๐ ≤ ๐ฅ)), then ๐น ๐ฅ ~ ๐๐๐๐๐๐๐(0, 1) Algorithm: • Generate U from Uniform(0, 1) • Set ๐ = ๐น −1 (๐) provided that the inverse function exists. ST2137 Computer Aided Data Analysis CYM 10.24 5.2 Exponential distribution • If X follows an exponential distribution with parameter λ (i.e. ๐ธ ๐ = ๐), then ๐ฅ 1 −๐ก ๐น ๐ฅ = Pr ๐ ≤ ๐ฅ = เถฑ ๐ ๐ ๐๐ก 0 ๐ • Note : The p.d.f. of ๐ is given by ๐ ๐ก = 1 −๐ก ๐ ๐ ๐ for 0 ≤ ๐ก ≤ ∞ • Let ๐ฆ = ๐ก/๐. Then ๐๐ฆ = ๐๐ก/๐. ๐ก = 0 ⇒ ๐ฆ = 0 and ๐ก = ๐ฅ ⇒ ๐ฆ = ๐ฅ/๐. Hence ๐ฅ/๐ ๐น(๐ฅ) = เถฑ ๐ −๐ฆ ๐๐ฆ = 1 − ๐ −๐ฅ/๐ ST2137 Computer Aided Data Analysis 0 CYM 10.25 5.2 Exponential distribution • Let ๐ = ๐น ๐ = 1 − ๐ −๐/๐ . • Then ๐ = ๐น −1 ๐ = −๐ log(1 − ๐) Suppose ๐ = 2. Step 1: Generate U from Uniform(0, 1) Step 2: Set ๐ = −2 log(1 − ๐). ST2137 Computer Aided Data Analysis CYM 10.26 5.3 Weibull distribution If X follows a Weibull distribution with parameter ๐ฝ, then it can be shown that ๐น ๐ฅ = 1 − exp −๐ฅ ๐ฝ , on (0, ∞) • Note: The p.d.f. for ๐ is given by ๐ ๐ฅ = ๐ฝ๐ฅ ๐ฝ−1 exp −๐ฅ ๐ฝ , for ๐ฅ in 0, ∞ 1. Generate U from Uniform(0, 1) 2. Set ๐ = − log 1 − ๐ 1/๐ฝ . (It is obtained by solving for ๐ in the equation ๐ = 1 − exp(−๐ฅ ๐ฝ )) ST2137 Computer Aided Data Analysis CYM 10.27 5.4 Cauchy distribution • If ๐ follows a Cauchy distribution with parameter ๐ and ๐, then it can be shown that 1 1 ๐ฅ−๐ −1 ๐น ๐ฅ = + tan , for ๐ฅ in (−∞, ∞) 2 ๐ ๐ • Note: 1 ๐ ๐ฅ = ๐ฅ−๐ 2 ๐๐ 1 + ๐ 1. Generate U from Uniform(0, 1). 2. Set ๐ = ๐ tan ๐ ๐ − 0.5 + ๐ ST2137 Computer Aided Data Analysis CYM 10.28 6. Algorithm to generate a normal random variable 6.1 Box-Muller Algorithm • Generate ๐1 and ๐2 from Uniform(0, 1). • Set ๐ = 2๐๐1 . • Set ๐ = −2 log ๐ 1/2 2 • Set ๐ = ๐ cos(๐) and ๐ = ๐ sin ๐ . • X and Y are independent standard normal variables. • For simplicity, only X or Y is used. ST2137 Computer Aided Data Analysis CYM 10.29 7. Generate a random variable from other random variables 7.1 Cauchy distribution • If Y and Z are independent and follow ๐(0, 1), then ๐ = ๐/๐ follows a Cauchy(0, 1) distribution. • If ๐~๐(๐, ๐ 2 ) and ๐~๐(0, 1), and are independent, then ๐ = ๐/๐ follows a Cauchy(๐, ๐๐) distribution ST2137 Computer Aided Data Analysis CYM 10.30 Generate a random variable from other random variables (Continued) 7.2 Chi-square distribution • If ๐~๐(0, 1) and ๐ = ๐ 2 , then ๐ ~๐ 2 (1) • If ๐1 , ๐2 , โฏ , ๐๐ are independent and identically distributed standard normal variables and ๐ = ๐12 + ๐22 + โฏ + ๐๐2 , then ๐~๐ 2 (๐) ST2137 Computer Aided Data Analysis CYM 10.31 Generate a random variable from other random variables (Continued) 7.3 Student’s t-distribution • If ๐ ~ ๐(0, 1) and ๐ ~ ๐ 2 (๐), then ๐ ๐= ๐/๐ follows a Student’s t distribution with p degrees of freedom ST2137 Computer Aided Data Analysis CYM 10.32 Generate a random variable from other random variables (Continued) 7.4 F distribution • If ๐ ~ ๐ 2 ๐ , and ๐ ~ ๐ 2 ๐ , then ๐/๐ ๐= ๐/๐ follows an F distribution with degrees of freedom m and n. ST2137 Computer Aided Data Analysis CYM 10.33 8. Functions to generate random numbers from specific distributions References SAS http://support.sas.com/documentation/cdl/en/lrdict/6431 6/HTML/default/viewer.htm#a001466748.htm R https://stat.ethz.ch/R-manual/Rdevel/library/stats/html/Distributions.html ST2137 Computer Aided Data Analysis CYM 10.34 8.1 Function to generate random numbers from Uniform (a, b) To generate random numbers from Uniform (a, b) 1 ๐ ๐ฅ = , for ๐ < ๐ฅ < ๐ ๐−๐ R program > > > > > > > # Generate uniform random variables n <- 100 a <- 0 b <- 100 set.seed(1234) # set the seed number x <- runif(n,a,b) x ST2137 Computer Aided Data Analysis CYM 10.35 8.1 Function to generate uniform distribution random variables (Continued) SAS program /* Generate uniform random variables */ data unif; call streaminit(1234); /* 1234 is used as the seed */ n = 100; a = 0; b = 10; do i = 1 to n; x = a + (b - a)*rand('uniform'); output; end; keep x; proc print data=unif; run; ST2137 Computer Aided Data Analysis CYM 10.36 8.2 Function to generate Normal distribution random variables To generate random numbers from Normal(๐, ๐2) 1 ๐ฅ−๐ 2 ๐ ๐ฅ = exp − , for − ∞ < ๐ฅ < ∞ 2 2๐ 2πσ R program > > > > > > # Generate normal random variables n <- 100 mu <- 0 sigma <- 1 x <- rnorm(n, mean=mu, sd=sigma) x ST2137 Computer Aided Data Analysis CYM 10.37 8.2 Function to generate Normal distribution random variables (Continued) SAS program /* Generate normal random variables */ data norm; seed = 1234; call streaminit(seed); n = 100; mu = 0; sigma = 1; do i = 1 to n; x = rand('normal',mu,sigma); output; end; keep x; run; ST2137 Computer Aided Data Analysis CYM 10.38 8.3 Function to generate Exponential distribution random variables To generate random numbers from Exponential(λ) 1 ๐ฅ ๐ ๐ฅ = exp − , ๐ ๐ for ๐ฅ > 0 R program > > > > > # Generate exponential random variables n <- 100 lambda <- 5 x <- rexp(n, mean=lambda) x ST2137 Computer Aided Data Analysis CYM 10.39 8.3 Function to generate Exponential distribution random variables (Continued) SAS program /* Generate exponential random variables */ data expno; seed = 1234; call streaminit(seed); n = 100; lambda = 5; do i = 1 to n; x = lambda*rand('exponential'); output; end; keep x; run; ST2137 Computer Aided Data Analysis CYM 10.40 8.4 Function to generate Gamma distribution random variables To generate random numbers from Gamma(๐ผ, ๐ฝ) 1 ๐ฅ ๐ผ−1 ๐ ๐ฅ = ๐ผ ๐ฅ exp − , ๐ฝ Γ ๐ผ ๐ฝ for ๐ฅ > 0 R program > > > > > > # Generate Gamma random varaibles n <- 100 alpha <- 1 beta <- 2 x <- rgamma(n, shape = alpha, scale = beta) x ST2137 Computer Aided Data Analysis CYM 10.41 8.4 Function to generate Gamma distribution random variables (Continued) SAS program /* Generate Gamma random variables */ data gammano; seed = 1234; call streaminit(seed); n = 100; alpha = 1; beta = 2; do i = 1 to n; x = beta*rand('gamma',alpha); output; end; keep x; run; ST2137 Computer Aided Data Analysis CYM 10.42 8.5 Function to generate Chi-square distribution random variables To generate random numbers from Chi-square(p) ๐ ๐ฅ = 1 ๐ 22 Γ ๐ 2 ๐ ๐ฅ 2−1 exp ๐ฅ − , 2 for ๐ฅ > 0 R program > > > > > # n p x x Generate Chisquare random variables <- 100 <- 10 <- rchisq(n, df = p) ST2137 Computer Aided Data Analysis CYM 10.43 8.5 Function to generate Chi-square distribution random variables (Continued) SAS program /* Generate chisquare random variables */ data chisqno; seed = 1234; call streaminit(seed); n = 100; df = 10; do i = 1 to n; x = rand('chisquare',df); output; end; keep x; run; ST2137 Computer Aided Data Analysis CYM 10.44 8.6 Function to generate Beta distribution random variables To generate random numbers from Beta(α, β) Γ ๐ผ + ๐ฝ ๐ผ−1 ๐ ๐ฅ = ๐ฅ 1−๐ฅ Γ ๐ผ Γ ๐ฝ ๐ฝ−1 , for 0 < ๐ฅ < 1 R program > > > > > > # n a b x x Generate Beta random variables <- 100 <- 2 <- 3 <- rbeta(n, shape1 = a, shape2 = b) ST2137 Computer Aided Data Analysis CYM 10.45 8.6 Function to generate Beta distribution random variables (Continued) SAS program /* Generate Beta random variables */ data betano; seed = 1234; call streaminit(seed); n = 100; alpha = 2; beta = 3; do i = 1 to n; x = rand('beta',alpha,beta); output; end; keep x; run; ST2137 Computer Aided Data Analysis CYM 10.46 8.7 Function to generate t-distribution random variables To generate random numbers from t(k) ๐+1 Γ 2 ๐ ๐ฅ = ๐ Γ 2 1 1 ๐๐ 1+ ๐+1 , ๐ฅ2 2 for − ∞ < ๐ฅ < ∞ ๐ R program > > > > > # n k x x Generate t random variables <- 100 <- 5 <- rt(n, df = k) ST2137 Computer Aided Data Analysis CYM 10.47 8.7 Function to generate t-distribution random variables (Continued) SAS program /* Generate t random variables */ data tno; seed = 1234; call streaminit(seed); n = 100; df = 5; do i = 1 to n; x = rand('t',df); output; end; keep x; run; ST2137 Computer Aided Data Analysis CYM 10.48 8.8 Function to generate F-distribution random variables To generate random numbers from F(m, n) ๐1 + ๐2 Γ ๐ ๐ฅ = ๐ 2 ๐ Γ 1 Γ 2 2 2 ๐1 ๐2 ๐1 2 ๐1 ๐ฅ 2 −1 ๐1 1+ ๐ฅ ๐2 ๐1 +๐2 2 , for 0 < ๐ฅ < ∞ R program > > > > > > # Generate F random variables n <- 100 n1 <- 5 n2 <- 10 x <- rf(n, df1 = n1, df2 = n2) x ST2137 Computer Aided Data Analysis CYM 10.49 8.8 Function to generate F-distribution random variables (Continued) SAS program /* Generate F random variables */ data fno; seed = 1234; call streaminit(seed); n = 100; df1 = 5; df2 = 10; do i = 1 to n; x = rand('f',df1,df2); output; end; keep x; run; ST2137 Computer Aided Data Analysis CYM 10.50 8.9 Function to generate Binomial distribution random variables To generate random numbers from Binomial(n, p) ๐ ๐ฅ ๐ ๐ฅ = ๐ 1−๐ ๐ฅ ๐−๐ฅ , for ๐ฅ = 0, 1, โฏ , ๐ R program > > > > > > # Generate Binomial random variables nn <- 100 n <- 10 p <- 0.3 x <- rbinom(100, size = n, prob = p) x ST2137 Computer Aided Data Analysis CYM 10.51 8.9 Function to generate Binomial distribution random variables (Continued) SAS program /* Generate Binomial random variables */ data binomno; seed = 1234; call streaminit(seed); ns = 100; n = 10; p = 0.3; do i = 1 to ns; x = rand('binomial',p,n); output; end; keep x; run; ST2137 Computer Aided Data Analysis CYM 10.52 8.10 Function to generate Poisson distribution random variables To generate random numbers from Poisson(λ) ๐ −๐ ๐๐ฅ ๐ ๐ฅ = , ๐ฅ! for ๐ฅ = 0, 1, 2, โฏ R porgram > > > > > # Generate Poisson random variables n <- 100 lambda <- 3 x <- rpois(100, lambda) x ST2137 Computer Aided Data Analysis CYM 10.53 8.10 Function to generate Poisson distribution random variables (Continued) SAS program /* Generate Poisson random variables */ data poisno; seed = 1234; call streaminit(seed); n = 100; lambda = 3; do i = 1 to n; x = rand('poisson',lambda); output; end; keep x; run; ST2137 Computer Aided Data Analysis CYM 10.54 8.11 Function to generate Hypergeometric distribution random variables To generate random numbers from Hypergeo(n, N, S) ๐ ๐ ๐ฅ = ๐ฅ R program > > > > > > > ๐−๐ ๐−๐ฅ , ๐ ๐ for ๐ฅ = 0, 1, 2, โฏ , min(๐, ๐) # Generate Hypergeometric random variables ns <- 100 n <- 10 S <- 20 N <- 50 x <- rhyper(ns,S,N,n) x ST2137 Computer Aided Data Analysis CYM 10.55 8.11 Function to generate Hypergeometric distribution random variables (Continued) SAS porgram /* Generate Hypergeometric random variables */ data hypergeono; seed = 1234; call streaminit(seed); ns = 100; popnsize = 10; numbsucc = 5; samplsize =3; do i = 1 to ns; x=rand('hyper',popnsize,numbsucc,samplsize); output; end; keep x; run; ST2137 Computer Aided Data Analysis CYM 10.56 8.12 Function to generate Negative Binomial distribution random variables To generate random numbers from NBinom(r, p) ๐+๐ฅ−1 ๐ ๐ ๐ฅ = ๐ 1 − ๐ ๐ฅ, ๐ฅ for ๐ฅ = 0, 1, 2, โฏ R program > > > > > > # n r p x x Generate Negative Binomial random var <- 100 <- 10 <- 0.3 <- rnbinom(n, size=r, prob=p) ST2137 Computer Aided Data Analysis CYM 10.57 8.12 Function to generate Negative Binomial random variables (Continued) SAS program /* Generate Negative Binomial random variables */ data negbinomno; seed = 1234; call streaminit(seed); n = 100; p=0.3; k=5; do i = 1 to n; x = rand('negbinomial',p,k); output; end; keep x; run; ST2137 Computer Aided Data Analysis CYM 10.58