SA421 – Simulation Modeling
Asst. Prof. Nelson Uhan
Spring 2013

Lesson 12. Chi-Square Goodness-of-Fit Test

1 Using Excel's functions

● Last time: Inverse transform method
  ○ X is a continuous random variable with cdf F
  ○ U is a Uniform[0, 1] random variable
  ○ F⁻¹(U) is a random variate generator for X
  ○ Generate observations of U ⇒ generate observations of X
● Normal distribution with mean µ and standard deviation σ
  ○ F⁻¹(u) = NORMINV(u, µ, σ)
● Gamma distribution with parameters α, β
  ○ Mean µ = αβ, variance σ² = αβ²
  ○ α = 1 ⇒ exponential distribution with rate parameter λ = 1/β (mean β)
  ○ F⁻¹(u) = GAMMAINV(u, α, β)
● Let's use Excel to generate
  ○ 10 normal random variates with mean 2 and standard deviation 0.5
  ○ 10 gamma random variates with parameters α = 2 and β = 3

2 Chi-square goodness-of-fit test

● How do we know if a set of observations is in fact generated from a particular random variable?
● The Chi-square goodness-of-fit test: compares the observed number of observations in each of a family of subintervals with the number expected under a particular distribution, to determine how closely the observations fit that distribution
● Works essentially the same way as the Chi-square test for uniformity
● Let F be a cdf of a random variable X
● Let Y_1, ..., Y_n be n independent random variables
● Do Y_1, ..., Y_n share a common cdf F?
● Divide real line into m subintervals: [a_1, b_1], [a_2, b_2], ..., [a_m, b_m]
● The null hypothesis for this test: if Y represents any of the Y_j's,
  H_0: Pr{Y ≤ y} = F(y) for all y (that is, Y has cdf F)
● Let y_1, ..., y_n be observations of Y_1, ..., Y_n
● Let e_i = expected number of observations in interval [a_i, b_i]
  ○ Under H_0, e_i = n [F(b_i) − F(a_i)]
  ○ Rule of thumb: set up intervals so e_i ≥ 5 for i = 1, ..., m
● Let o_i = observed number of observations in interval [a_i, b_i]
● The observed test statistic is
  χ² = Σ_{i=1}^{m} (o_i − e_i)² / e_i
● The p-value is
  p = Pr{C ≥ χ²}, where C is a chi-square random variable with m − 1 degrees of freedom
  ○ Small p-values (< α, where the significance level α is typically 0.05, or even 0.01) ⇒ reject H_0

3 Conducting the Chi-square goodness-of-fit test in Excel

● In the chi-square sheet in the Excel workbook for today's lesson, there are 100 numbers
● Are they from a Normal distribution with mean 2 and standard deviation 0.5?
● Use the MIN and MAX functions to determine the interval of observation values
● Determine subintervals (don't forget −∞ and +∞ if applicable)
● Use the FREQUENCY function to determine the number of observations in subinterval i
● Compute the expected number of observations in subinterval i
  ○ cdf F of a Normal random variable with mean µ and standard deviation σ: F(x) = NORMDIST(x, µ, σ, TRUE)
● Merge subintervals so that the expected number of observations ≥ 5 for each resulting subinterval
● Compute the observed test statistic and p-value

4 On your own

● Generate 100 exponentially distributed random variates with mean 1/2 (Remember that the gamma distribution with α = 1 and β is the exponential distribution with mean β.)
● Use a Chi-square goodness-of-fit test to test how well your random variates fit an exponential distribution with mean 1/2.
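
For readers who want to check the Excel procedure of Section 3 outside a spreadsheet, here is a minimal Python sketch of the same steps: count observations per subinterval, compute expected counts from the cdf, merge subintervals with expected count below 5, then form the test statistic and p-value. It is an illustration under stated assumptions, not the workbook's method. The function names (`chi2_sf`, `merge_small_bins`, `chi_square_gof`, `uniform_open`), the seed, and the breakpoints are all hypothetical choices. The statistic is taken to be the usual sum of (o_i − e_i)²/e_i with m − 1 degrees of freedom, and the data are generated here by the inverse transform method rather than read from the lesson's workbook.

```python
import bisect
import math
import random
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper-tail probability Pr{chi-square(df) >= x} for integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y^a exp(-y) / Gamma(a + 1)
    for _ in range((df - 1) // 2):
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def merge_small_bins(observed, expected, min_expected=5.0):
    """Greedily merge adjacent bins, left to right, until each has e_i >= min_expected."""
    merged_o, merged_e = [], []
    acc_o, acc_e = 0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
            acc_o, acc_e = 0, 0.0
    if merged_e:
        # Any leftover on the right is folded into the last merged bin.
        merged_o[-1] += acc_o
        merged_e[-1] += acc_e
    else:
        merged_o.append(acc_o)
        merged_e.append(acc_e)
    return merged_o, merged_e


def chi_square_gof(observations, cdf, breakpoints):
    """Test H0: observations share the given cdf.

    Subintervals are (-inf, b_1], (b_1, b_2], ..., (b_k, +inf) for sorted breakpoints,
    the same counting convention as Excel's FREQUENCY function.
    """
    n = len(observations)
    cum = [0.0] + [cdf(b) for b in breakpoints] + [1.0]
    expected = [n * (hi - lo) for lo, hi in zip(cum, cum[1:])]
    observed = [0] * len(expected)
    for y in observations:
        observed[bisect.bisect_left(breakpoints, y)] += 1
    observed, expected = merge_small_bins(observed, expected)
    if len(expected) < 2:
        raise ValueError("need at least two subintervals after merging")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(expected) - 1
    return statistic, df, chi2_sf(statistic, df), observed, expected


def uniform_open():
    """Uniform variate strictly inside (0, 1), so the inverse cdf is always defined."""
    u = random.random()
    while u == 0.0:
        u = random.random()
    return u


random.seed(421)  # arbitrary seed, only for reproducibility
target = NormalDist(mu=2.0, sigma=0.5)
# Inverse transform method: F^{-1}(U) with U ~ Uniform(0, 1)
sample = [target.inv_cdf(uniform_open()) for _ in range(100)]
breakpoints = [1.0 + 0.25 * k for k in range(9)]  # 1.0, 1.25, ..., 3.0 (illustrative)

statistic, df, p_value, observed, expected = chi_square_gof(
    sample, target.cdf, breakpoints
)
print(f"chi-square = {statistic:.3f}, df = {df}, p-value = {p_value:.3f}")
```

To adapt the sketch to the exercise in Section 4, one could replace `target.cdf` with the exponential cdf `lambda x: 1 - math.exp(-2 * x) if x > 0 else 0.0` (mean 1/2) and choose nonnegative breakpoints. If the parameters were estimated from the data instead of specified in advance, the degrees of freedom would need to be reduced accordingly, which this sketch does not do.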