
FRIDAY, APRIL 18, 2008
MATH 311
LAB 3 – SAMPLING DISTRIBUTIONS
DUE: TUESDAY, MARCH 22ND AT 3:00 P.M.
Part I: Insurance gumshoes – from the files of “real life.”
Download the dataset Firefraud.mtw from the class webpage.
Read the following carefully and then read it again before you start your analysis.
Here is the (true-life!) story behind the data:
A wholesale furniture retailer stores in-stock items at a large warehouse located in Florida. Several
years ago, a fire destroyed the warehouse and all the furniture in it. After determining the fire was
an accident, the retailer sought to recover costs by submitting a claim to its insurance company.
As is typical in a fire insurance policy of this type, the furniture retailer must provide the insurance
company with an estimate of “lost” profit for the destroyed items. Retailers calculate profit margin
in percentage form using the Gross Profit Factor (GPF). By definition, the GPF for a single sold
item is the ratio of the profit to the item’s selling price measured as a percentage. That is,
Item GPF = (Profit / Sales Price) * 100%.
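As a quick illustration of the definition above, the GPF of one item can be computed as follows. This is only a sketch: the function name item_gpf and the dollar amounts are made up for the example and do not come from Firefraud.mtw.

```python
def item_gpf(profit: float, sales_price: float) -> float:
    """Return the Gross Profit Factor of one sold item, as a percentage."""
    if sales_price <= 0:
        raise ValueError("sales_price must be positive")
    return (profit / sales_price) * 100.0


# Hypothetical item (not from the dataset): sold for $400 with $180 profit.
print(item_gpf(180.0, 400.0))
```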
Of interest to both the retailer and the insurance company is the average GPF for all of the items in
the warehouse. Since these furniture pieces were all destroyed, their eventual selling prices and
profit values are obviously unknown. Consequently, the exact value of the average GPF for all of
the warehouse items lost in the fire is unknown. The value can, however, be estimated.
One way to estimate the mean GPF of the destroyed items is to use the mean GPF of similar,
recently sold items. The retailer had sold 3,005 furniture items in the year prior to the fire and kept
paper invoices on these sales. Rather than calculate the mean GPF for all 3,005 items (the data were
not computerized), the retailer sampled a total of 253 of the invoices and computed the mean GPF
for these items as 50.8%. The retailer applied this average GPF to the costs of the furniture items
destroyed in the fire to obtain an estimate of the “lost” profit.
According to experienced claims adjusters at the insurance company, the GPF for sale items of the
type destroyed in the fire rarely exceeds 48%. Consequently, the estimate of 50.8% appeared to be
unusually high. (A 1% increase in GPF for items of this type equates to, approximately, an
additional $16,000 in profit.) As a result, a dispute arose between the furniture retailer and the
insurance company, and a lawsuit was filed. In one portion of the suit, the insurance company
accused the retailer of fraudulently representing its sampling methodology. Rather than selecting a
sample randomly, the retailer was accused of selecting an unusual number of “high profit” items
from the population in order to increase the average GPF of the sample.
To support its claim of fraud, the insurance company hired a CPA firm to independently assess the
retailer’s true GPF. Through the discovery process, the CPA firm legally obtained the paper
invoices for the entire population of 3,005 items sold the year before the fire and entered the
information into a computer. The selling price, the profit, profit margin, and month sold for these
3,005 furniture items are available in the file Firefraud.
In the following, you’ll re-create the CPA’s statistical analysis:
Suppose we want to know how likely it is to obtain a GPF value that exceeds the estimated mean
GPF of 50.8%. Since the data for all 3,005 items are available in the file, we can find the actual
mean and standard deviation for the entire 3,005 gross profit margins (you should do so now –
Stat>Basic Statistics>Display Descriptive Statistics). Assume that these data are normally
distributed and represent the ENTIRE population (i.e. you’re finding μ and σ, not just x̄ and s).
Note: finding μ and σ is the only step in this part of the lab that requires a computer.
QUESTION 1: Find the probability that a single randomly selected item will have a GPF that exceeds
50.8%.
QUESTION 2: Now find the probability that a sample of size 253 would have a mean GPF that
exceeds 50.8%.
QUESTION 3: If you were the statistician retained by the CPA firm that the insurance company
hired, what would you recommend that the insurance company do next? Think carefully about this.
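For anyone who wants to check their hand calculations for Questions 1 and 2, here is a minimal Python sketch of the two normal-probability computations. The values of MU and SIGMA are hypothetical placeholders, not the lab's answers; replace them with the μ and σ that Minitab reports. The function names are also just illustrative.

```python
from math import sqrt
from statistics import NormalDist

# HYPOTHETICAL placeholders: substitute the mu and sigma Minitab reports
# for the 3,005 gross profit margins. These are NOT the lab's values.
MU = 48.0
SIGMA = 13.0
N = 253          # number of invoices the retailer sampled
CLAIMED = 50.8   # mean GPF the retailer reported


def prob_single_exceeds(x, mu, sigma):
    """P(X > x) for one randomly selected item, X ~ Normal(mu, sigma)."""
    return 1.0 - NormalDist(mu, sigma).cdf(x)


def prob_mean_exceeds(x, mu, sigma, n):
    """P(sample mean > x); the sample mean has standard deviation sigma/sqrt(n)."""
    return 1.0 - NormalDist(mu, sigma / sqrt(n)).cdf(x)


print(prob_single_exceeds(CLAIMED, MU, SIGMA))
print(prob_mean_exceeds(CLAIMED, MU, SIGMA, N))
```

The only difference between the two functions is the spread: a single item varies with σ, while a mean of n items varies with σ/√n.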
Part II: the amazing confidence interval
In this part of the lab, we’re going to have Minitab compute many, many 95% confidence intervals
and see what percentage of these confidence intervals capture the true mean of the distribution
(hopefully you can guess).
The population we will be pulling our data from will be uniformly distributed between 9 and 11.
Thus the mean of the population will be μ = 10 (obvious) and the standard deviation will be σ =
0.57735 (not so obvious… yet!).
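The two values quoted above can be checked with the standard formulas for a continuous uniform distribution on [a, b]: mean (a + b)/2 and standard deviation (b − a)/√12. A small sketch, where the helper name uniform_mean_sd is just for illustration:

```python
from math import sqrt


def uniform_mean_sd(a, b):
    """Mean and standard deviation of a continuous Uniform(a, b) distribution."""
    return (a + b) / 2, (b - a) / sqrt(12)


mu, sigma = uniform_mean_sd(9.0, 11.0)
print(mu, round(sigma, 5))  # 10.0 0.57735
```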
Step 1: Start Minitab (if you closed it already).
Step 2: Have Minitab compute 10,000 rows of 36 columns (c1-c36) of uniformly distributed data
with a lower endpoint of 9.0 and an upper endpoint of 11.0. Recall that the menu sequence is Calc>Random
Data>Uniform…
This gives us 10,000 samples of size n = 36.
If 10,000 takes too long, try 5,000.
Step 3: Have Minitab compute the mean of each row and store that mean in column c38. Title this
column “Sample Means.”
To do this, select Calc>Row statistics
Click Mean
In the Input variables: box enter C1-C36
In the Store result in: box enter C38
Select OK
The mean of each of the 10,000 rows should appear in C38. We now have 10,000 means for 10,000
samples of size 36.
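Steps 2 and 3 have a rough equivalent outside Minitab. The sketch below uses only Python's standard library; the seed and the variable names are arbitrary choices for the example, not part of the lab.

```python
import random
from statistics import fmean

random.seed(311)  # arbitrary seed so the run is repeatable (an assumption)

NUM_SAMPLES = 10_000  # rows in the worksheet
SAMPLE_SIZE = 36      # columns c1-c36

# Each row is one sample of 36 draws from Uniform(9, 11).
rows = [
    [random.uniform(9.0, 11.0) for _ in range(SAMPLE_SIZE)]
    for _ in range(NUM_SAMPLES)
]

# The analogue of column c38: one mean per row.
sample_means = [fmean(row) for row in rows]

print(len(sample_means), min(sample_means), max(sample_means))
```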
Step 4: Now we’re going to have Minitab compute a confidence interval for each of these 10,000
means.
First we compute the lower end of each interval and store it in c40:
Do this as follows: select Calc>Calculator and then type c40 in the “Store result in variable” box.
Then type c38 – 1.96*0.09622504 in the “Expression:” box.
Note: 0.09622504 = 0.57735/√36 = σ/√n, and 1.96*0.09622504 is about 2 standard deviations of the sample mean.
Title column c40 “Lower bound.”
Next, compute the upper bound for each of our 10,000 confidence intervals and store it in column c41.
Do this as follows: select Calc>Calculator and then type c41 in the “Store result in variable” box.
Then type c38 + 1.96*0.09622504 in the “Expression:” box.
Title column c41 “Upper bound.”
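The constants typed into the Calculator in Step 4 can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic only; the helper interval is a made-up name, and σ is taken as (11 − 9)/√12 for the uniform population used in this part of the lab.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = (11.0 - 9.0) / sqrt(12)   # about 0.57735
N = 36

z = NormalDist().inv_cdf(0.975)   # two-sided 95% critical value, about 1.96
standard_error = SIGMA / sqrt(N)  # about 0.09622504
margin = 1.96 * standard_error    # amount subtracted and added in Step 4


def interval(sample_mean):
    """Return (lower bound, upper bound) for one sample mean."""
    return sample_mean - margin, sample_mean + margin


print(round(z, 2), round(standard_error, 8), interval(10.0))
```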
Step 5: Now we need to figure out which, if any, of the confidence intervals we just computed
captured the true mean (recall μ = 10) of the population. So, select Calc>Calculator and then type
c43 in the “Store result in variable” box. Then type 10 >= c40 And (it’s a button) 10 <= c41 in the
“Expression:” box.
This command will return a 1 in column c43 if μ = 10 is in the confidence interval. It will return a 0
if μ = 10 is not in the confidence interval.
Step 6: Now let’s see how many times μ = 10 was captured by selecting Calc>Column Statistics…
then select Sum and an input variable of c43. Leave the “Store result in:” box empty. This
command will return the number of times the true mean of μ = 10 was captured.
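For comparison with the Minitab result, the capture check and the count from Steps 5 and 6 can be sketched in Python as below. The seed and function names are arbitrary assumptions, and because the data are random, the count will not match your Minitab count exactly.

```python
import random
from math import sqrt
from statistics import fmean

random.seed(2008)  # arbitrary seed so the run is repeatable (an assumption)

MU = 10.0
MARGIN = 1.96 * (2.0 / sqrt(12)) / sqrt(36)  # 1.96 * sigma / sqrt(n)


def captures_mu(sample_mean):
    """Return 1 if the interval around sample_mean contains MU, else 0 (like c43)."""
    lower, upper = sample_mean - MARGIN, sample_mean + MARGIN
    return int(lower <= MU <= upper)


def coverage_count(num_samples=10_000, n=36):
    """Count how many of num_samples intervals capture MU (the Sum of c43)."""
    return sum(
        captures_mu(fmean(random.uniform(9.0, 11.0) for _ in range(n)))
        for _ in range(num_samples)
    )


captured = coverage_count()
print(captured, captured / 10_000)
```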
QUESTION 4: What percentage of the 10,000 confidence intervals you computed captured μ = 10?
How does this compare to the 95% confidence you were trying to establish?