Lab 1 – The Sampling Distribution Using Excel

advertisement
Econ 488 – Applied Managerial Econometrics
Cameron Kaplan, Fall 2010
9/3/10
Lab 1 – The Sampling Distribution Using Excel
(Adapted from Hunt and Mashhoundy, 1997)
Directions: Complete this assignment using MS Excel. Your write-up should be typed
and be no longer than 3 pages (including graphs). Due in class next week.
Part 1: Sampling from a normal population
Suppose we are concerned about the speeds of cars on a local highway. It is reasonable
to assume that the speeds follow a normal distribution with mean 50 mph and standard
deviation 16 mph. These can be simulated using the NORMINV function in Excel. We
shall take 100 samples, in rows 16 to 115 of the worksheet.
Part 1 Procedure:
1. In B15 enter the value 1 to indicate the first car in the sample.
2. Suppose we only observe the speed of one car on the highway, drawn at random.
In B16 enter the formula =NORMINV(RAND( ), 50, 16). This generates a
random normal variable with a mean of 50 and a standard deviation of 16.
3. In Column A we are going to calculate the sample means so in A15 enter the
label "Means"
4. In A16 enter the formula =AVERAGE(B16:Z16), allowing for sample sizes of
up to 25. This relies on the spreadsheet ignoring empty cells when computing the
average. For now, our sample size is just 1, so the mean will have the same value
as the observation.
Now we want to repeat this 100 times, each time drawing a sample of size 1:
5. Select A16:B16 and copy down to B115. Format cells to 1 decimal place.
6. Select A16:A115 and name it Means (using Insert Name Define). Initially the
sample size is 1, so the means equal the sample values and give a picture of the
population itself.
The spreadsheet should now look like Figure 1.
7. Enter the label "Limits" in A1 and the limits 10, 20, ... , 100 in A2:A11.
8. Select A2:A11 and name it Limits.
9. Enter the midpoints 5, 15, ..., 95 in B2:B11. These are required because Excel
labels the midpoints of bars on a column chart, rather than the boundaries.
10. To set up the frequency tally, highlight C2:C11 and enter the array formula
=FREQUENCY(Means, Limits) by pressing CTRL and SHIFT and ENTER.
11. Use the Chart Wizard to construct the histogram of B1:C11.
12. The formulae for the summary statistics in Row 13 are =MIN(Means),
=MAX(Means), =AVERAGE(Means), =STDEV(Means).
The completed spreadsheet is shown in Figure 2 where, to facilitate increasing the sample
size, rows 19 to 113 have been hidden (using Format Rows Hide). To increase the
sample size, simply select B15:B115 and use the drag-handle (or Copy and Paste) to
copy the formulae and heading across to column C and beyond. To reduce the sample
size, simply drag the selection back again (or Delete).
Figure 1: Initial Spreadsheet
A
B
C
14
15
Means
1
16
53.1
53.1
17
62.6
62.6
18
55.2
55.2
19
47.1
47.1
20
63.3
63.3
Figure 2: Competed Spreadsheet.
Part 1 Write-up:
1. Create a table showing the mean and standard deviation (or standard error, as it is
commonly referred to in this context) of the distribution of sample means for
several different sample sizes. Include at least 4 different sample sizes. Note that
if you want to create a sample size larger than 25, you will have to revise the code
from step 4.
2. What happens to the mean of the sample means as the sample size is increased?
What happens to the standard error of the sample means as the sample size is
increased? Can you find the implied standard deviation of the sample? (Hint: see
Studenmund pg. 555) Put this into your chart.
3. Based on what you just found, why might a larger sample size be better than a
small sample size?
Part 2: Sampling from a Uniform Population
Now suppose that each motorist surveyed is asked to estimate what percentage of their
journey distance has been completed. It is reasonable to assume that this is uniformly
distributed between 0% and 100%. To simulate data from this population requires the
simple formula =100*RAND( ).
Procedure:
1. Clear out the sample values in columns C to Z.
2. Change the formula in B16 to =100*RAND( ) and copy down to B115.
Write-Up Part 2:
1.
Create a table similar to the table in part 1, showing the mean and standard error
of the distribution of sample means for several different sample sizes. Also
include the implied standard deviation of the sample. Include at least 4 different
sample sizes.
2. What happens to the sampling distribution as the sample size increases (Hint: look
at your histogram)? How does this relate to the Central Limit Theorem?
Part 3: Sampling from a Skewed Population
Suppose we now turn our attention to the time-gap between each car passing a certain
point. These inter-arrival times could follow an Exponential distribution, which is skewed
towards smaller values. We shall assume that one car passes every 30 seconds on
average, so that the population mean and s.d. are both 30. These times can be simulated
using a transformation of RAND( ).
Part 3 Procedure:
1. Clear out the sample values in columns C to Z.
2. Change the formula in B16 to = - 30*LN( 1 - RAND( ) ) and copy down to B115.
Note that one or two means may exceed 100 and are therefore missing from the
histogram.
Part 3 Write-Up:
1.
Create another table, showing the mean and standard error of the distribution of
sample means for several different sample sizes. Also include the implied
standard deviation of the sample. Include at least 4 different sample sizes.
2. What happens to the sampling distribution as the sample size increases?
Approximately how large must the sample be for the sampling distribution to look
normal?
Part 4: Sampling from a Binary Population
Finally, consider the percentage of drivers who are female. Let 100 represent a female
and 0 represent a male. Assuming an equal number of male and female drivers, we are
thus sampling from a population consisting of an equal number of 0’s and 100’s, with
mean 50 and s.d. 50.
Part 4 Procedure:
1. Clear out the sample values in columns C to Z.
2. Change the formula in B16 to =IF(RAND( )<0.5, 0, 100) and copy down to
B115.
Part 4 Write-Up:
1. Create another table, showing the mean and standard error of the distribution of
sample means for several different sample sizes. Also include the implied
standard deviation of the sample. Include at least 4 different sample sizes.
2. Copy the histogram generated from a sample size of 1 into a word document, and
label it. Copy the histogram from a sample size of 16 into the word document.
Be sure to label both graphs. Comment on the differences between the two
graphs and what this demonstrates.
Download