Minitab Demonstration of Some Statistical Concepts

By
Dr Kanchan Mukherjee
Dept. of Statistics and Applied Probability, NUS.
Presented in the seminar The Art and Science of Teaching on
November 6, 2002.
Minitab is a user-friendly, menu-driven statistics package. Using Minitab,
many basic concepts of data analysis, probability distributions (uniform, sum
of iid uniform r.v.s), approximations (normal approximation to the binomial
and Poisson distributions), laws of large numbers, the CLT, etc. can be
demonstrated to students in the classroom. I found it very useful
while taking tutorials for ST 1131 (Introduction to Statistics) and while
teaching ST 5201 (Basic Statistical Theory) for part-time M.Sc. students
and USSP02 (Uncertainty) for the University Scholar’s Program. In this talk,
I am going to share with you some of my experiences of using Minitab
extensively in these courses.
Minitab has two windows: Session and Worksheet. First, let us enlarge
the font so that the commands can be seen better. Go to the appropriate
window (Session/Worksheet) and click ‘Editor > Select font’. A size of 24
usually works well.
1. First, you can obtain descriptive statistics via Stat > Basic Stat > Display
Descriptive Statistics. You can also use the ‘Graphs’ option to compare the
data with a normal distribution. Instead of real data, let us work with a data
set of size n = 50 simulated from an exponential distribution with mean 5.
Let us calculate the descriptive statistics and draw a histogram. Let us also
look at the ‘density histogram’ obtained by clicking ‘Graphs > histograms >
Options’. A density histogram has total area equal to 1 and gives an
approximate picture of the population density.
Draw a boxplot to check for outliers.
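For readers without Minitab at hand, the steps in (1) can be sketched in Python using only the standard library. This is an illustration and not Minitab output: the seed, the bin width of 2.5 and the helper name `density_histogram` are assumptions made for this example, and the quartile rule used by `statistics.quantiles` may differ slightly from Minitab's.

```python
import random
import statistics

random.seed(1)  # assumed seed, only so that the run is reproducible

MEAN = 5.0
N = 50
WIDTH = 2.5  # assumed bin width for the histogram

# random.expovariate takes the rate (1/mean), not the mean
sample = [random.expovariate(1.0 / MEAN) for _ in range(N)]


def density_histogram(data, width):
    """Return (left_edge, height) pairs whose bar areas sum to 1."""
    n_bins = int(max(data) // width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int(x // width)] += 1
    # height = relative frequency / bin width, so that area = relative frequency
    return [(i * width, c / (len(data) * width)) for i, c in enumerate(counts)]


bars = density_histogram(sample, WIDTH)
total_area = sum(height * WIDTH for _, height in bars)

q1, median, q3 = statistics.quantiles(sample, n=4)
iqr = q3 - q1
# Usual boxplot rule: flag points more than 1.5 * IQR beyond the quartiles
outliers = [x for x in sample if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print("mean:", round(statistics.mean(sample), 3))
print("st. dev.:", round(statistics.stdev(sample), 3))
print("quartiles:", round(q1, 3), round(median, 3), round(q3, 3))
print("histogram area:", round(total_area, 6))
print("flagged outliers:", [round(x, 2) for x in outliers])
```

Because the exponential distribution is skewed to the right, a few flagged points in the upper tail are to be expected even though nothing is wrong with the data.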
2. Poisson approximation to the binomial when n = 100 is large and p = 0.05
is small: First draw a plot of the pmf of a Poisson(λ = 5) as follows. Click
‘Calc > Make patterned data > Simple set of numbers’ and fill C1 with the
integers from ‘First value=0’ to ‘Last value=100’. Then click ‘Calc >
Probability distribution’ and choose the Poisson distribution with ‘Input
column’ C1, ‘Optional storage’ C2 and mean 5. Then click ‘Graph > Plot’ to
plot C2 vs C1. Next, generate a
sample of size 1000 from a Binomial(n = 100, p = 0.05) distribution and
plot its density histogram to compare with the Poisson pmf with λ = 5. One
can also demonstrate the normal approximation to the binomial (when n is large
and p is not extremely close to 0 or 1).
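The comparison in (2) can also be made exactly, without simulation, by tabulating the two pmfs side by side. The following is a minimal Python sketch; the function names `binomial_pmf` and `poisson_pmf` are assumptions of this example, not Minitab commands.

```python
from math import comb, exp, factorial

N, P = 100, 0.05
LAM = N * P  # Poisson mean chosen to match the binomial mean np


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


# Tabulate the region where almost all of the probability lies
for k in range(13):
    print(k, round(binomial_pmf(k, N, P), 4), round(poisson_pmf(k, LAM), 4))

# Largest pointwise disagreement between the two pmfs over 0..100
max_gap = max(
    abs(binomial_pmf(k, N, P) - poisson_pmf(k, LAM)) for k in range(N + 1)
)
print("largest pmf difference:", round(max_gap, 4))
```

Repeating the run with a larger p (say 0.4, with LAM adjusted accordingly) shows the Poisson approximation deteriorating, which is a useful contrast in class.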
3. If U has the uniform distribution on (0, 1), then $Y = U^2$ has pdf $g(y) =
I(0 < y < 1)/(2y^{1/2})$. We can demonstrate this through Minitab before giving
the derivation. Also, if $U_1$ and $U_2$ are iid U[0, 1], then $U_1 + U_2$ has a
symmetric triangular-shaped density with base [0, 2]. We can demonstrate this
with Minitab by taking random samples in C1 and C2, writing MTB > let C3=C1+C2
and then drawing the density histogram of C3.
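Both claims in (3) can be checked numerically by comparing empirical cdfs with the theoretical ones. This is a rough Python sketch under assumed settings (seed, sample size 100,000, and the helper names `empirical_cdf` and `triangular_cdf`); it uses cdfs rather than histograms only because they are easier to compare in text output.

```python
import random

random.seed(2)  # assumed seed
N = 100_000  # assumed sample size

squares = [random.random() ** 2 for _ in range(N)]
sums = [random.random() + random.random() for _ in range(N)]


def empirical_cdf(data, t):
    """Fraction of observations that are <= t."""
    return sum(x <= t for x in data) / len(data)


def triangular_cdf(t):
    """Cdf of U1 + U2 for iid U[0, 1]: the density rises on [0, 1], falls on [1, 2]."""
    return t * t / 2 if t <= 1 else 1 - (2 - t) ** 2 / 2


# Integrating g(y) = 1 / (2 sqrt(y)) gives the cdf sqrt(y) on (0, 1)
for y in (0.04, 0.25, 0.64):
    print("Y = U^2:", y, round(empirical_cdf(squares, y), 3), round(y**0.5, 3))

for t in (0.5, 1.0, 1.5):
    print("U1 + U2:", t, round(empirical_cdf(sums, t), 3), round(triangular_cdf(t), 3))
```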
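4. We can demonstrate the LLN by taking samples and calculating their means. A
very useful application is to compute an integral, say $\int_0^1 e^x x^{1/2}\,dx$.
Here no elementary closed-form expression exists for the integral. Since the
integral equals $E[e^U U^{1/2}]$ for U uniform on (0, 1), the LLN says that
the sample mean of $e^U U^{1/2}$ approximates it. To get an approximate numerical
value, first generate a uniform sample in C1. Then go to ‘Calc > Calculator’ and
enter the expression ‘Exponentiate (C1) * SQRT (C1)’. Then calculate
the mean of the result.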
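The Monte Carlo calculation in (4) is sketched below in Python. The seed and sample size are assumptions of this example. As a cross-check that does not rely on randomness, the sketch also sums the series obtained by expanding $e^x = \sum_k x^k/k!$ and integrating term by term, which gives $\sum_k 1/(k!\,(k + 3/2))$.

```python
import math
import random

random.seed(3)  # assumed seed
N = 100_000  # assumed sample size


def g(u):
    """Integrand e^u * sqrt(u)."""
    return math.exp(u) * math.sqrt(u)


# LLN: the average of g(U) over uniform draws estimates E[g(U)], i.e. the integral
estimate = sum(g(random.random()) for _ in range(N)) / N

# Deterministic cross-check: integral of x^(k + 1/2) / k! over (0, 1) is 1 / (k! (k + 3/2))
series = sum(1.0 / (math.factorial(k) * (k + 1.5)) for k in range(30))

print("Monte Carlo estimate:", round(estimate, 4))
print("series value:        ", round(series, 4))
```

Rerunning with N = 100, 1000 and 10,000 lets students watch the estimate settle down, which is the LLN at work.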
5. Finally, the CLT can be demonstrated as follows. First we motivate how
the density moves toward a bell shape when $U_1 + U_2 + U_3$ is taken as in (3).
Next, we compute 1000 x-bars, where each x-bar is the mean of a sample
of size n = 30 from an exponential distribution with mean 5. Computing 1000
separate column means is inconvenient in Minitab, so we store each sample in
a row and compute row means instead, as follows.
MTB > random 1000 C1-C30;
SUBC > exponential 5.
MTB > rmean C1-C30 C31
Then draw the density histogram of C31. It should look like
$N[\mu = 5, \sigma^2 = 5^2/30]$.
One can also investigate what happens for the Cauchy distribution. Here the
CLT does not hold, because the Cauchy distribution has no finite mean or variance.
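The same experiment, including the Cauchy contrast, can be sketched in Python. The seed and the helper names `standard_cauchy` and `iqr` are assumptions of this example; each entry of `exp_means` plays the role of one row mean stored in C31.

```python
import math
import random
import statistics

random.seed(4)  # assumed seed
ROWS, N, MEAN = 1000, 30, 5.0

# 1000 x-bars, each from an exponential sample of size 30 with mean 5
exp_means = [
    statistics.mean([random.expovariate(1.0 / MEAN) for _ in range(N)])
    for _ in range(ROWS)
]


def standard_cauchy():
    """Inverse-cdf method: tan(pi * (U - 1/2)) is standard Cauchy for U uniform."""
    return math.tan(math.pi * (random.random() - 0.5))


cauchy_means = [
    statistics.mean([standard_cauchy() for _ in range(N)]) for _ in range(ROWS)
]


def iqr(data):
    """Interquartile range, a spread measure that exists even without a variance."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


print("exponential: mean of x-bars    ", round(statistics.mean(exp_means), 3))
print("exponential: variance of x-bars", round(statistics.variance(exp_means), 3))
print("theoretical variance 5^2/30    ", round(MEAN**2 / N, 3))
print("Cauchy: IQR of x-bars          ", round(iqr(cauchy_means), 3))
```

A single standard Cauchy observation has interquartile range 2. The x-bars of Cauchy samples show a spread of about the same size, so averaging 30 observations does not concentrate them at all, in contrast with the exponential case.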
Here are some more exercises for you to work on. First, browse through
Minitab, which is available from NUSNET.
For our purpose, the Calc, Stat and Graph menus will be most useful.
Under Calc, Random Data generates random numbers from a specified
distribution. Probability distribution calculates probabilities (pmf or pdf)
and the cdf (cumulative distribution function) at specified points. One can
also Standardize observations.
Under Stat, there are Basic Stat (descriptive statistics, correlation, test
of hypotheses), Regression (equation of regression line, residual plots),
ANOVA, Time Series (plots, trend analysis, autocorrelation function,
moving averages), Nonparametric tests and estimation and EDA or
exploratory data analysis (boxplot, stem and leaf plot), among others.
Under Graph, there are histograms, boxplots etc. Under Manip, one
can Rank and Sort observations. All procedures are self-explanatory.
Apart from these, you can do certain algebraic manipulations with columns.
For example, let C3 = C1 + C2 will add the entries of the column vectors C1
and C2 and store the result in C3.
Exercise 1. Generate say 200 observations from a N[0, 1] distribution.
Draw the histogram. Does it look like a N[0, 1] density?
Exercise 2. Generate, say, 1500 observations from a binomial distribution
with n = 200 and p = 0.45. Draw the histogram. You should expect to see
approximately a
$N[\mu = 200 \times 0.45, \sigma^2 = 200 \times 0.45 \times 0.55]$ distribution.
Exercise 3. Can you take a random sample of 20 students from 600 USP
students? Use Calc, Random Data and Sample from Integers.
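The idea behind Exercise 3 is to label the students 1 to 600 and draw labels at random. A minimal Python sketch follows; it draws without replacement so that no student is chosen twice, which is an assumption about what the exercise intends, and the seed is likewise assumed.

```python
import random

random.seed(6)  # assumed seed
POPULATION = 600
SAMPLE_SIZE = 20

# Label the students 1..600 and draw 20 distinct labels
chosen = sorted(random.sample(range(1, POPULATION + 1), SAMPLE_SIZE))
print(chosen)
```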
Exercise 4. Choose 10 numbers. Get the descriptive statistics and a boxplot of
that set of numbers.
Exercise 5 (Law of large numbers). Generate 100, 200 and 500 observations from a
Binomial distribution with n = 1 and p = 0.40 (so 1 will denote a head
and 0 will denote a tail). For each sample, calculate the mean using
Calc > Column Stat > Mean. Does it get closer to 0.40 as the sample size increases?
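Exercise 5 can be tried out in Python as follows. The seed and the dictionary name `means` are assumptions of this sketch. The printed standard error $\sqrt{p(1-p)/n}$ indicates how far from 0.40 each mean can reasonably be, which matters because with only three sample sizes the means need not get closer at every step.

```python
import math
import random

random.seed(5)  # assumed seed
P = 0.40

means = {}
for n in (100, 200, 500):
    # 1 denotes a head (probability 0.40), 0 a tail
    tosses = [1 if random.random() < P else 0 for _ in range(n)]
    means[n] = sum(tosses) / n
    std_error = math.sqrt(P * (1 - P) / n)
    print(n, "mean:", round(means[n], 3), "standard error:", round(std_error, 3))
```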
Exercise 6 (CLT and sampling distribution). Generate 5 samples of size
n = 40 each from any distribution; for this demonstration, let us take
$N[\mu = 5, \sigma^2 = 3]$ (everyone in the class should use this same
distribution). Then calculate the mean of each sample, so that each student
has 5 sample means. Suppose we have 10 students in this tutorial, giving 50
sample means $\bar{X}$ in all. Draw the
histogram of these 50 numbers. By the CLT, the histogram should look like
a normal distribution with mean 5 and variance 3/40 = 0.075.
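The pooled class experiment of Exercise 6 can be imitated in Python as below. The seed and variable names are assumptions of this sketch. One practical point it illustrates: the exercise specifies the variance 3, whereas `random.gauss` (like many software routines) expects the standard deviation, so the square root must be taken first.

```python
import math
import random
import statistics

random.seed(7)  # assumed seed
MU, SIGMA2, N = 5.0, 3.0, 40
STUDENTS, SAMPLES_EACH = 10, 5

sigma = math.sqrt(SIGMA2)  # random.gauss takes the standard deviation, not the variance

class_means = []
for _ in range(STUDENTS):
    for _ in range(SAMPLES_EACH):
        sample = [random.gauss(MU, sigma) for _ in range(N)]
        class_means.append(statistics.mean(sample))

print("number of sample means:", len(class_means))
print("mean of sample means:  ", round(statistics.mean(class_means), 3))
print("variance of the means: ", round(statistics.variance(class_means), 4))
print("theoretical variance:  ", SIGMA2 / N)
```

With only 50 means, the observed variance can differ noticeably from 0.075; pooling more students brings it closer.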