2G1503 Simulation and Modeling
Exercise #4
Ali Ghodsi
aligh@imit.kth.se

Random Variates: normal 1/5
• Generating random variates for the standard normal distribution N(0,1):
  $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$
• However, the inverse transform technique (ITT) cannot be applied, since the CDF cannot be expressed in closed form!
  $F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt$

Random Variates: normal 2/5
• Generating random variates for the standard normal distribution N(0,1):
  – Assume two independent random variables X and Y from N(0,1)
  – We know: $X = Z\cos\theta$, $Y = Z\sin\theta$, $Z^2 = X^2 + Y^2$
[Figure: the point (X, Y) in the plane, at distance Z from the origin and at angle θ from the X-axis]

Random Variates: normal 3/5
• With $X = Z\cos\theta$, $Y = Z\sin\theta$ and $Z^2 = X^2 + Y^2$ as before, it turns out that:
  – $Z^2$ is exponentially distributed with λ = 1/2
  – θ is uniformly distributed on [0, 2π)

Random Variates: normal 4/5
• To generate two values x and y that are normally distributed N(0,1), generate an exponentially distributed $z^2$ with λ = 1/2 and a uniformly distributed θ on [0, 2π):
  – $z^2 = -2\ln(R_1)$, where $R_1$ is uniformly distributed on [0,1)
  – $\theta = 2\pi R_2$, where $R_2$ is uniformly distributed on [0,1)
• Formula:
  $x = \sqrt{-2\ln(R_1)} \, \cos(2\pi R_2)$
  $y = \sqrt{-2\ln(R_1)} \, \sin(2\pi R_2)$

Random Variates: normal 5/5
• To transform x and y from N(0,1) into v1 and v2 from N(µ, σ²):
  – $v_1 = \mu + \sigma x$, $v_2 = \mu + \sigma y$
  – Both v1 and v2 will be normally distributed as N(µ, σ²)

Develop an input model
• Develop an input model for the following historical data, n = 50:
   79.919    3.081    0.062    1.961    5.845
    3.027    6.505    0.021    0.013    0.123
    6.769   59.899    1.192   34.760    5.009
   18.387    0.141   43.565   24.420    0.433
  144.695    2.663   17.967    0.091    9.003
    0.941    0.878    3.371    2.157    7.579
    0.624    5.380    3.148    7.078   23.960
    0.590    1.928    0.300    0.002    0.543
    7.004   31.764    1.005    1.147    0.219
    3.217   14.382    1.008    2.336    4.562

Input Modeling
• Making an input model:
1. Collect data (already given)
2. Identify PDF
3. Estimate parameters
4. Perform goodness-of-fit test

Step 2: Identifying a PDF
• Determine the number of intervals:
  – √n = √50 ≈ 7.1 ≈ 7 intervals
• Determine interval widths:
  – The data seems to have 2 extremely high values: 144.695 and 79.919
  – Disregarding those, the data varies within [0.002, 59.899]
  – (59.899 − 0.002) / 7 ≈ 8.6, so use an interval width of 8

Histogram try #1 (coarse)
[Figure: frequency histogram with interval width 8; bins 8, 16, ..., 64 and "More"; frequency axis 0 to 40]
• Many small values! Too coarse!
• Try smaller intervals

Histogram try #2
[Figure: frequency histogram with interval width 3; axis labelled 3 to 57 and "More"; frequency axis 0 to 30]

Step 2: PDF is exponential
• Looks exponential!
[Figure: the same frequency histogram with interval width 3]

Step 3: Estimate parameter(s)
• The exponential distribution has only one parameter, the inverse mean λ = 1/E[X]
• E[X] = (3.027 + 6.505 + ... + 4.562) / 50 = 11.894
• λ = 1/11.894 = 0.084

Step 4: Goodness-of-fit Test
• Test the null hypothesis H0: the sample has an exponential distribution with λ = 0.084, at significance level α = 0.05
• We choose to use the Chi-Square Test

Chi Square Test (1/2)
• The intervals of the Chi Square test can be either equi-probable or of equal width.
  – Use equi-probable intervals for continuous distributions
    • Exception: Normal Distribution
  – Use equal-width intervals for discrete distributions

Chi Square Test Intervals (2/2)
• The number of intervals should be:
  Sample size, n    Number of intervals, k
  20                Don't use Chi Square!
  50                5 to 10
  100               10 to 20
  >100              √n to n/5
• If equi-probable intervals are chosen, then the number of intervals, k, should be ≤ n/5

Chi Square Test: exponential, equi.prob.
• Null hypothesis H0: the sample has an exponential distribution with λ = 0.084, with α = 0.05
• Continuous distribution, so use equi-probable intervals!
• Remember:
  – $F^{-1}(r) = -\frac{\ln(1-r)}{\lambda}$, for $0.0 \le r < 1.0$
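The inverse CDF above is all that is needed to draw exponential variates, and with λ = 1/2 it also yields the z² used for the N(0,1) variates earlier in this exercise. A minimal Python sketch of both follows. The function names `exp_variate` and `normal_pair` are our own, and the logarithm is taken of 1 − R rather than R (equally uniform, but it avoids ln(0)); treat it as an illustration, not as the course's reference code.

```python
import math
import random


def exp_variate(lam, r):
    """Inverse transform: map a uniform r in [0, 1) to an exponential variate."""
    return -math.log(1.0 - r) / lam


def normal_pair(mu=0.0, sigma=1.0, rng=random):
    """Return two N(mu, sigma^2) variates built from two uniform numbers."""
    # z^2 is exponential with lambda = 1/2, i.e. z^2 = -2 ln(1 - R1).
    z = math.sqrt(exp_variate(0.5, rng.random()))
    # theta is uniform on [0, 2*pi).
    theta = 2.0 * math.pi * rng.random()
    x = z * math.cos(theta)
    y = z * math.sin(theta)
    # Shift and scale from N(0, 1) to N(mu, sigma^2).
    return mu + sigma * x, mu + sigma * y


if __name__ == "__main__":
    print(exp_variate(0.084, 0.5))          # median of the fitted exponential
    print(normal_pair(mu=10.0, sigma=2.0))  # one pair of normal variates
```

Passing `r` explicitly to `exp_variate` keeps the function deterministic, so the same helper can later compute interval endpoints from fixed probabilities.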
Chi Square Test: exponential, equi.prob.
• With n = 50, the number of classes, k, has to be ≤ n/5 = 10
• Assume k = 8; then each interval should have probability 1.0 / 8 = 0.125
• The interval endpoints in probability space, a0, ..., a8, are:
  a0 = 0.0, a1 = 0.125, a2 = 0.25, a3 = 0.375, a4 = 0.5, a5 = 0.625, a6 = 0.75, a7 = 0.875, a8 = 1.0

Chi Square Test: exponential, equi.prob.
• To get the real interval endpoints i0, ..., i8:
  – i0 = 0, i1 = F⁻¹(a1), ..., i7 = F⁻¹(a7), i8 = ∞
  – i0 = 0, i1 = 1.590, i2 = 3.425, i3 = 5.595, i4 = 8.252
  – i5 = 11.677, i6 = 16.503, i7 = 24.755, i8 = ∞

Chi Square Test: exponential, equi.prob.
  Interval      Observed, Oi   Expected, Ei   (Oi−Ei)² / Ei
  0, 1.59       19             6.25           26.01
  1.59, 3.4     10             6.25            2.25
  3.4, 5.6       3             6.25            1.69
  5.6, 8.3       6             6.25            0.01
  8.3, 11.7      1             6.25            4.41
  11.7, 16.5     1             6.25            4.41
  16.5, 24.8     4             6.25            0.81
  24.8, ∞        6             6.25            0.01
  Sum           50             50             39.6

Chi Square Test: exponential, equi.prob.
• The χ² value is 39.6
• With α = 0.05 (a 5% chance that the hypothesis is rejected even though the data is exponentially distributed) and k−s−1 = 8−1−1 = 6 degrees of freedom (s = 1 estimated parameter), $\chi^2_{0.05,6}$ is 12.6.
• χ² > $\chi^2_{0.05,6}$, so the null hypothesis is rejected!

Chi Square Test: exponential, equi.prob.
• Even $\chi^2_{0.01,6}$ = 16.8 would reject the null hypothesis (only 1% of exponentially distributed samples would give a χ² value with 6 degrees of freedom greater than 16.8)
• I.e. the null hypothesis is rejected even at the 1% significance level: the data is very unlikely to be exponentially distributed with λ = 0.084!
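The test can be reproduced numerically from the 50 data points. The sketch below is one possible Python version, not part of the original exercise: the names `exp_inverse_cdf` and `chi_square_equiprobable` are hypothetical, λ is rounded to three decimals to match the slides, and the critical value χ²(0.05, 6) ≈ 12.59 is hard-coded from a standard table rather than computed.

```python
import math

# Historical data from the exercise (n = 50).
DATA = [
    79.919, 3.081, 0.062, 1.961, 5.845, 3.027, 6.505, 0.021, 0.013, 0.123,
    6.769, 59.899, 1.192, 34.760, 5.009, 18.387, 0.141, 43.565, 24.420, 0.433,
    144.695, 2.663, 17.967, 0.091, 9.003, 0.941, 0.878, 3.371, 2.157, 7.579,
    0.624, 5.380, 3.148, 7.078, 23.960, 0.590, 1.928, 0.300, 0.002, 0.543,
    7.004, 31.764, 1.005, 1.147, 0.219, 3.217, 14.382, 1.008, 2.336, 4.562,
]


def exp_inverse_cdf(r, lam):
    """Inverse CDF of the exponential distribution; r = 1 maps to infinity."""
    return math.inf if r >= 1.0 else -math.log(1.0 - r) / lam


def chi_square_equiprobable(data, lam, k):
    """Chi-square statistic for an exponential fit with k equi-probable intervals."""
    # Endpoints i_0, ..., i_k are the inverse CDF at 0, 1/k, ..., 1.
    edges = [exp_inverse_cdf(j / k, lam) for j in range(k + 1)]
    # Count observations in each half-open interval [lo, hi).
    observed = [sum(1 for x in data if lo <= x < hi)
                for lo, hi in zip(edges, edges[1:])]
    expected = len(data) / k  # the same for every interval
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return edges, observed, chi2


lam = round(len(DATA) / sum(DATA), 3)  # 1 / sample mean, rounded as on the slides
edges, observed, chi2 = chi_square_equiprobable(DATA, lam, k=8)
CRITICAL_005_6 = 12.59  # tabulated chi-square value, alpha = 0.05, 6 d.o.f.

print("lambda   =", lam)
print("observed =", observed)
print("chi2     =", round(chi2, 2))
print("reject H0 at alpha = 0.05:", chi2 > CRITICAL_005_6)
```

Half-open intervals [lo, hi) ensure that every observation is counted exactly once; none of the data points lies close enough to an endpoint for this choice to change the counts.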