SA421 – Simulation Modeling
Asst. Prof. Nelson Uhan
Spring 2013
Lesson 12. Chi-Square Goodness-of-Fit Test
1 Using Excel’s functions
● Last time: Inverse transform method
○ X is a continuous random variable with cdf F
○ U is a Uniform[0, 1] random variable
○ F⁻¹(U) is a random variate generator for X
○ Generate observations of U ⇒ generate observations of X
● Normal distribution with mean µ and standard deviation σ
○ F⁻¹(u) = NORMINV(u, µ, σ)
● Gamma distribution with parameters α, β
○ Mean µ = αβ, variance σ² = αβ²
○ α = 1 ⇒ exponential distribution with rate parameter λ = 1/β (mean β)
○ F⁻¹(u) = GAMMAINV(u, α, β)
● Let’s use Excel to generate
○ 10 normal random variates with mean 2 and standard deviation 0.5
○ 10 gamma random variates with parameters α = 2 and β = 3
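The same inverse transform idea can be sketched outside Excel. The Python fragment below is only an illustration: it assumes that the standard library's statistics.NormalDist.inv_cdf plays the role of NORMINV, and the helper names (uniform_open, normal_variate, exponential_variate) and the seed are made up for this example. The standard library has no counterpart to GAMMAINV, so the sketch covers only the α = 1 case, where the gamma distribution is exponential with mean β and F⁻¹(u) = −β ln(1 − u) in closed form.

```python
import math
import random
from statistics import NormalDist


def uniform_open(rng):
    """Draw U from Uniform(0, 1), redrawing in the rare case U == 0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def normal_variate(rng, mu, sigma):
    """Inverse transform F^{-1}(U); the analogue of NORMINV(u, mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(uniform_open(rng))


def exponential_variate(rng, beta):
    """Gamma with alpha = 1 (exponential, mean beta): F^{-1}(u) = -beta * ln(1 - u)."""
    return -beta * math.log(1.0 - uniform_open(rng))


rng = random.Random(421)  # arbitrary seed so the run is repeatable
normals = [normal_variate(rng, 2.0, 0.5) for _ in range(10)]
exponentials = [exponential_variate(rng, 3.0) for _ in range(10)]
print(normals)
print(exponentials)
```

A general gamma variate with α ≠ 1 needs a numerical inverse of the gamma cdf, which this sketch does not attempt.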
2 Chi-square goodness-of-fit test
● How do we know whether a set of observations was in fact generated from a particular distribution?
● The Chi-square goodness-of-fit test: compares the observed and expected numbers of observations in a family of subintervals to determine how closely the observations fit a particular distribution
● Works essentially the same way as the Chi-square test for uniformity
● Let F be the cdf of a random variable X
● Let Y_1, ..., Y_n be n independent random variables
● Do Y_1, ..., Y_n share the common cdf F?
● Divide the real line into m subintervals: [a_1, b_1], [a_2, b_2], ..., [a_m, b_m]
● The null hypothesis for this test: if Y represents any of the Y_j’s,
H0: Pr{a_i ≤ Y ≤ b_i} = F(b_i) − F(a_i) for i = 1, ..., m
● Let y_1, ..., y_n be observations of Y_1, ..., Y_n
● Let e_i = expected number of observations in interval [a_i, b_i]
○ Rule of thumb: set up intervals so e_i ≥ 5 for i = 1, ..., m
● Let o_i = observed number of observations in interval [a_i, b_i]
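● The observed test statistic is
χ² = Σ_{i=1}^{m} (o_i − e_i)² / e_i
● The p-value is
Pr{χ²_{m−1} ≥ observed test statistic}, where χ²_{m−1} is a chi-square random variable with m − 1 degrees of freedom (no parameters of F estimated from the data)
○ Small p-values (< α, where α is the significance level, typically 0.05 or 0.01) ⇒ reject H0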
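As a rough illustration of the test statistic and p-value, the Python sketch below computes both from lists of observed and expected counts. It is not part of the Excel workflow: the function names chi2_sf and chi_square_gof are hypothetical, the counts are invented, and the degrees of freedom are taken to be m − 1 on the assumption that the hypothesized distribution is fully specified. The chi-square tail probability is evaluated with a textbook series for the regularized incomplete gamma function, which is adequate for moderate values of the statistic but loses precision far out in the tail.

```python
import math


def chi2_sf(x, df):
    """Pr{chi-square with df degrees of freedom >= x}.

    Uses the series for the regularized lower incomplete gamma function
    P(a, z) with a = df / 2 and z = x / 2, then returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_gof(observed, expected):
    """Return (test statistic, p-value) using m - 1 degrees of freedom."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf(stat, len(observed) - 1)


# Invented counts for n = 100 observations in m = 5 equally likely subintervals
observed = [18, 22, 27, 19, 14]
expected = [20.0] * 5
stat, p_value = chi_square_gof(observed, expected)
print(stat, p_value)
```

H0 would be rejected at significance level α only if the printed p-value falls below α.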
3 Conducting the Chi-square goodness-of-fit test in Excel
● In the chi-square sheet in the Excel workbook for today’s lesson, there are 100 numbers
● Are they from a Normal distribution with mean 2 and standard deviation 0.5?
● Use the MIN and MAX functions to determine the range of the observed values
● Determine subintervals (don’t forget −∞ and +∞ if applicable)
● Use the FREQUENCY function to determine the number of observations in each subinterval i
● Compute the expected number of observations in subinterval i: e_i = n (F(b_i) − F(a_i))
○ cdf F of Normal random variable with mean µ and standard deviation σ
F(x) = NORMDIST(x, mean, stdev, TRUE)
● Merge subintervals so that the expected number of observations ≥ 5 for each resulting subinterval
● Compute the observed test statistic and p-value
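The steps above can be mirrored in a short Python sketch. Everything specific here is an assumption made for illustration: the 100 numbers are simulated stand-ins for the worksheet data, the breakpoints 1.0, 1.25, ..., 3.0 are an arbitrary choice, and bin_counts, expected_counts, and merge_small_bins are hypothetical helper names. bin_counts follows the FREQUENCY convention (a value equal to a breakpoint is counted in the subinterval to its left), and merging is done greedily from left to right, which is one of several reasonable ways to satisfy the rule of thumb.

```python
import random
from bisect import bisect_left
from statistics import NormalDist


def bin_counts(data, cuts):
    """Count observations in (-inf, c1], (c1, c2], ..., (ck, +inf)."""
    counts = [0] * (len(cuts) + 1)
    for y in data:
        counts[bisect_left(cuts, y)] += 1
    return counts


def expected_counts(n, cuts, dist):
    """e_i = n * (F(b_i) - F(a_i)), with a_1 = -inf and b_m = +inf."""
    cdf = [0.0] + [dist.cdf(c) for c in cuts] + [1.0]
    return [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


def merge_small_bins(observed, expected, minimum=5.0):
    """Merge adjacent subintervals, left to right, until each e_i >= minimum."""
    obs, exp = [], []
    acc_o, acc_e = 0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:
            obs.append(acc_o)
            exp.append(acc_e)
            acc_o, acc_e = 0, 0.0
    if acc_e > 0:
        if obs:  # fold a too-small right tail into the last subinterval
            obs[-1] += acc_o
            exp[-1] += acc_e
        else:    # everything together is still below the minimum
            obs.append(acc_o)
            exp.append(acc_e)
    return obs, exp


rng = random.Random(12)                            # arbitrary seed
dist = NormalDist(2.0, 0.5)                        # hypothesized distribution
data = [rng.gauss(2.0, 0.5) for _ in range(100)]   # stand-in for the worksheet
cuts = [1.0 + 0.25 * k for k in range(9)]          # breakpoints 1.0, ..., 3.0

observed = bin_counts(data, cuts)
expected = expected_counts(len(data), cuts, dist)
observed, expected = merge_small_bins(observed, expected)
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(observed)
print(expected)
print(statistic)
```

The p-value is then the probability that a chi-square random variable, with degrees of freedom equal to the number of merged subintervals minus 1, is at least as large as the printed statistic.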
4 On your own
● Generate 100 exponentially distributed random variates with mean 1/2. (Remember that the gamma distribution with parameters α = 1 and β is the exponential distribution with mean β.)
● Use a Chi-square goodness-of-fit test to test how well your random variates fit an exponential distribution with mean 1/2.