Inferential Statistics: Introduction to
Confidence Intervals and Hypothesis Tests
We will now look at two basic tools of inferential statistics:
1) the confidence interval (or parameter estimation)
2) the hypothesis test
In this unit we will look at both tools under very simplified conditions.
Once we have covered the basic ideas, we will expand our methods to
more realistic situations in the course’s final units.
Again, our general goal in inferential statistics is to make
predictions/decisions on a population based on limited sample data.
A basic example:
Suppose a law firm specializes in class action lawsuits with large payouts
for its clients. We will take the role of an outsider to the firm (either an
interested client, a Revenue Canada agent, or even a criminal investigator).
While we may not have access to the company’s internal files (with all of
their payout records), we certainly should be able to collect a sample of
recent claims and the size of the payouts. Suppose we have collected a
random sample of 50 individual claims.
The average payout for this sample is x̄ = $116,000.
This is, of course, just a sample average, and may or may not reflect the
true value of the population average µ, i.e. the average payout across all
clients in this firm.
We are interested in the value of µ (perhaps we are interested in hiring this
firm to represent us, or we are investigating them for potential tax
evasion).
We will address the question of “What is the average payout for claims by
this law firm?” in two ways:
1. We may simply want to estimate the average payout to all clients. We have
collected data and calculated a sample average x̄; now we want to take this
statistic and use it to make our best possible estimate of the value of µ.
This process is called parameter estimation.
2. We may question or challenge a claim made by the law firm. Perhaps their
advertising states that their average payout is at least $120,000 per client.
Is the sample statistic that we collected (which, at x̄ = $116,000, is less than
the claimed value of µ ≥ $120,000) enough to reject the company’s claim?
This process is called a hypothesis test.
To begin with, we will look at both of these processes under very
simplified conditions. Specifically, we will assume that:
i) The sample we collected was a simple random sample with no
difficulties (such as non-response errors).
ii) The variable under investigation (here, the client payout amount) is
normally distributed.
iii) We know the population standard deviation, σ.
For this example, we will assume that σ = $18,000.
In rare cases these assumptions are reasonable. We certainly can take an
SRS from some populations. Many variables are at least approximately
normal. And we may know the value of σ from previous studies (e.g. a
census) and can reasonably assume that the variation has not changed
much.
In this case,
We will deal with more realistic situations (in particular, where (ii) and
(iii) do not hold), in later units.
Estimation of Parameters using Confidence Intervals
How do we estimate a parameter (e.g. µ) from sample data (e.g. x̄)?
The simplest option would be to use our found value of x̄ as our
“best” estimate of µ. Here we call x̄ a point estimator of µ. In other words,
we estimate that µ = $116,000.
Advantages:
Disadvantages:
Typical unbiased point estimators are
Parameter                        Statistic (Point Estimator)
Mean µ                           x̄
Standard Deviation σ             s
Proportion p (i.e. binomial)     p̂ (p hat)
A second option in parameter estimation is to assign a range of values to a
parameter. This is called an interval estimator.
e.g. we may say that the value of µ is somewhere between $115,000 and
$117,000.
We can then make a probabilistic statement about the nature of this
interval estimate.
e.g. we may say that the above interval was produced using a method that
produces correct results 95% of the time.
We would combine the above interval and probability to say that a 95%
confidence interval for the value of µ is
$115,000 < µ < $117,000
Now, these particular two numbers were just “pulled out of thin air”. Let’s
look at the process to determine the actual interval range.
Example 1:
A sample of 50 claim payments by a law firm has mean $116,000. We will
assume (as per our simple conditions) that claim levels are normally
distributed and that the standard deviation for claims across all clients is
σ = $18,000.
The sampling distribution for the sample averages is normally distributed
with
mean µ_x̄ = µ
and standard deviation σ_x̄ = σ/√n = $18,000/√50 ≈ $2,546.
Recall that, by the empirical rule, about 95% of all cases lie within two standard
deviations (i.e. $5091) of the mean.
That is, the interval
$116,000 - $5091 = $110,909
to $116,000 + $5091 = $121,091
is obtained from a process that includes the population
mean in 95% of all samples.
We say that the 95% confidence interval for µ is
$110,909 < µ < $121,091
Example 2:
A sample of 40 randomly selected Canadian males has a mean height of
182.3 cm. We will assume that height is normally distributed, and we
know that σ = 9.6 cm.
Under these assumptions, construct a 95% confidence interval for the
mean height of Canadian males.
Once again, be sure to understand the interpretation of the above
statement. Our particular interval estimate may or may not actually “catch”
the true value of µ. However, 95% of all possible samples of size n=40
would have a sample average within 3.04 cm of µ.
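The arithmetic in Examples 1 and 2 can be checked with a short sketch. It follows the notes in using the empirical rule (two standard deviations of the sampling distribution) as the 95% margin; the function and variable names are our own and not part of the course material.

```python
from math import sqrt

def two_sigma_interval(xbar, sigma, n):
    """Return (lower, upper, margin) for the interval xbar ± 2·sigma/√n."""
    margin = 2 * sigma / sqrt(n)   # two standard deviations of the sampling distribution
    return xbar - margin, xbar + margin, margin

# Example 1: law firm payouts (n = 50, sample mean $116,000, sigma = $18,000)
low1, high1, margin1 = two_sigma_interval(116_000, 18_000, 50)
print(f"Example 1: ${low1:,.0f} < mu < ${high1:,.0f} (margin ${margin1:,.0f})")

# Example 2: heights (n = 40, sample mean 182.3 cm, sigma = 9.6 cm)
low2, high2, margin2 = two_sigma_interval(182.3, 9.6, 40)
print(f"Example 2: {low2:.2f} cm < mu < {high2:.2f} cm (margin {margin2:.2f} cm)")
```

Using the more precise value z = 1.96 instead of 2 would give slightly narrower intervals.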
Example 3:
Given the same conditions as in Example 2, how would we construct a
90% confidence interval for µ?
Terminology:
The interval estimate that we construct (e.g. $110,909 to $121,091) is
called a confidence interval, and the two numbers are the confidence
limits.
Our probability is called the confidence level (if expressed as a percentage,
95%) or the confidence coefficient (if expressed as a proportion, 0.95), and
denoted 1-α (here 1-α = 0.95, so α = 0.05, i.e. α is the area left out in the two
“tails” of our distribution).
We sometimes use the notation z_x to indicate the z-value that has a
proportion “x” of the normal curve beyond it,
e.g. in a 95% confidence interval, we are looking for z_0.025,
as we want 2.5% of the graph to be at each tail end.
Note that z_0.025 = z_(α/2).
Finally, the confidence interval for our population mean is constructed as
x̄ - z_(α/2) σ_x̄ < µ < x̄ + z_(α/2) σ_x̄
The value z_(α/2) σ_x̄ is also called the maximum error of estimate (and
sometimes denoted E). It measures the largest possible distance the true
population mean could have from your sample mean at the given
confidence level, i.e. our “margin of error”.
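As a sketch of how the general formula works in practice, the snippet below computes z_(α/2) and the maximum error of estimate for Example 3. We assume Python's standard-library `statistics.NormalDist` in place of the z-table; the function name `z_confidence_interval` is our own.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence):
    """Return (lower, upper, E) for a mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_(alpha/2): area alpha/2 lies beyond it
    error = z * sigma / sqrt(n)               # maximum error of estimate E
    return xbar - error, xbar + error, error

# Example 3: same data as Example 2, but at the 90% confidence level
low, high, error = z_confidence_interval(182.3, 9.6, 40, 0.90)
print(f"z_(alpha/2) = {NormalDist().inv_cdf(0.95):.3f}")
print(f"E = {error:.2f} cm, interval: {low:.2f} cm < mu < {high:.2f} cm")
```

The same function reproduces a 95% interval with z of about 1.96 if `confidence=0.95` is passed.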
Some Notes:
1. Don’t think that a higher confidence level is always better. As the
confidence level increases, z_(α/2) increases, so the interval gets wider and
the estimate becomes less precise.
2. You must always decide on the confidence level before your
experiment. If you try out different levels and pick the one that you
think fits best, you are introducing your own bias into the results.
3. A common misconception: The statement “Poll results are accurate to
±3% 19 out of 20 times” means that a 95% confidence interval has
been constructed, and the maximum error of estimate was 3%.
This does not mean that there is a 95% chance that the population
mean will be in the given interval!
You can only say that 95% of all confidence intervals constructed
in this manner will contain the population mean. This particular
confidence interval may be way off (and you will never know)!
This is a subtle, but important point.
Example
Speed limit on Highway 11 is 110 km/h.
A sample of 32 cars on Highway 11 has an average speed of x̄ = 104 km/h.
You may assume that the population is normal with σ = 34.1 km/h. With
confidence level 99%,
a) find the maximum error of estimate
b) construct the 99% confidence interval for µ
Note: this interval contains the cut-off value 110 km/h! What does this
imply?
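One possible way to work parts a) and b) of the Highway 11 example is sketched below. The numbers come from the example; the variable names and the use of `statistics.NormalDist` instead of a z-table are our own choices.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, sigma = 32, 104.0, 34.1     # sample size, sample mean, population sd (km/h)
confidence = 0.99
alpha = 1 - confidence

z = NormalDist().inv_cdf(1 - alpha / 2)             # z_(alpha/2), about 2.576
max_error = z * sigma / sqrt(n)                     # part a)
lower, upper = xbar - max_error, xbar + max_error   # part b)

print(f"a) E = {max_error:.2f} km/h")
print(f"b) {lower:.2f} km/h < mu < {upper:.2f} km/h")
print("interval contains 110 km/h:", lower < 110 < upper)
```

The interval comes out at roughly 88.5 to 119.5 km/h, so the speed limit of 110 km/h lies inside it.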
Determining a reasonable sample size
Depending on our confidence level (90%, 95%, 99%) and our desired
maximum error of estimate (e.g. up to 2 units away from the mean) we can
determine what a good sample size would be.
Since E = z_(α/2) σ_x̄,
i.e.
E = z_(α/2) (σ/√n),
we can solve this for n = (z_(α/2) σ / E)²
Example:
Suppose you wish to pick a sample of lightbulbs to estimate their mean
lifetime (in hours). You wish to be 95% confident that the estimate is
accurate within 20 hours. How many lightbulbs should you sample? (You
can assume that the population is normal with σ = 74 hours.)
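The sample-size formula can be sketched as a small function. We round up, since a fractional lightbulb cannot be sampled and rounding down would miss the target error; the function name `required_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, max_error, confidence):
    """Smallest n with z_(alpha/2) * sigma / sqrt(n) <= max_error."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / max_error) ** 2)   # always round up to a whole unit

# Lightbulb example: sigma = 74 hours, E = 20 hours, 95% confidence
n_bulbs = required_sample_size(sigma=74, max_error=20, confidence=0.95)
print(n_bulbs)
```

For the lightbulb example this gives n = 53, since (1.96 · 74 / 20)² is about 52.6.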
Hypothesis Testing - The Basics
Let us now return to our original example (the law firm client payouts).
Recall that the law firm claimed that their average payout was at least
µ = $120,000. We took a sample of 50 claims and found that the sample
average was x̄ = $116,000.
Using an assumed value of σ = $18,000, we previously showed that a 95%
confidence interval for the population average payout is $110,909 < µ <
$121,091.
We will now consider the same situation from the point of view of the law
firm’s claim.
A claim is being made on a population parameter, in this case the claim is
that the population average payout is at least µ = $120,000. Our sample shows a
lower average payout of x̄ = $116,000.
The basic question: given that our sample average is lower, should we
doubt (or reject) the company’s claim?
We can’t say for sure; there are two possibilities:
1) The population average is in fact not $120,000 as claimed, and our
sample simply reflects this fact.
In this case we would reject the claim that the average payout (for
all clients) is $120,000.
2) The population mean payout is in fact $120,000, but we happened (by
chance) to pick a sample of 50 cases with lower payouts from that
population.
In this case we wouldn’t have enough evidence to reject the claim
that the average payout is $120,000.
How likely is Scenario #2?
Let’s assume the population average payout really is $120,000. If we take
a sample of 50 cases from this population, the resulting sampling
distribution would be approximately normal with
µ_x̄ = $120,000
and
σ_x̄ = σ/√n = $18,000/√50 ≈ $2,546.
We can use these values to calculate how likely it is that we draw a sample
of 50 cases with a sample average of $116,000 or less.
If we find this probability is very (!) low, then we can conclude that this
couldn’t reasonably have happened by chance, hence our actual mean
couldn’t reasonably be $120,000 (but is likely lower).
If, on the other hand, this probability is reasonable, then it is certainly
possible that, even with a population mean of $120,000, we still picked a
random sample with sample mean $116,000.
What is our cut-off probability, i.e. how small a probability do we allow?
This is entirely our choice, and is called the level of significance of our
test. If we want to reject the claim if our probability drops below 5%, then
our level of significance would be α = 0.05.
In other words, if we use α = 0.05, we will reject the company’s claim of
µ = $120,000 if the probability of picking a random sample with
x̄ = $116,000 or less is below 5%. In that case, we would say that our sample
average is significantly different from the claimed population
average, enough to reject that claim.
Now calculate the probability and make your decision: ...
Terminology
The hypothesis test is a very structured process with several steps that need
to be followed precisely. We will use the following definitions:
Null Hypothesis H0:
The hypothesis that the value of µ remains unchanged from some value k.
This could be written in one of two ways:
1) µ = k. In this case we would reject this hypothesis if we think that µ is
either larger or smaller than k. This is called a two-tailed test.
2) µ ≤ k or µ ≥ k. In this case we’d only reject the hypothesis if we think that
µ is larger than k (for µ ≤ k) or smaller than k (for µ ≥ k). This is called a
one-tailed test.
In our example, the company’s claim was that payouts were at least
$120,000. Hence the Null Hypothesis is
H0: µ ≥ $120,000
and we will only reject the claim if we feel that µ is in fact less than
$120,000.
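For the law firm example, the probability of drawing a sample mean of $116,000 or less when the population mean really is $120,000 can be computed as in the following sketch. It assumes the simplified conditions of this unit (normal population, known σ) and uses `statistics.NormalDist` for the sampling distribution; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 120_000, 18_000, 50   # claimed mean, known population sd, sample size
xbar = 116_000                        # observed sample mean
alpha = 0.05                          # level of significance

# Sampling distribution of the sample mean if the claimed mean is true
sampling_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))

probability = sampling_dist.cdf(xbar)   # P(sample mean <= $116,000)
reject = probability < alpha

print(f"P(xbar <= ${xbar:,}) = {probability:.4f}")
print("reject the claim" if reject else "not enough evidence to reject the claim")
```

The probability comes out at about 0.058, slightly above 5%, so at α = 0.05 the sample does not give enough evidence to reject the firm's claim.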
Alternate Hypothesis H1:
The hypothesis that the value of µ has changed, i.e. the “opposite” of H0.
This is written as µ≠k, µ>k, or µ<k, depending on the Null
Hypothesis.
In our example H1: µ < $120,000
(the opposite of H0: µ ≥ $120,000)
Statistical Test:
The process of calculating a test value from a sample statistic (such as a
mean) to decide whether or not there are sufficient grounds to reject a Null
Hypothesis.
Important: We can only reject H0 or not reject H0. We can never accept H0!
(i.e. we can only answer the claim saying “we don’t believe the average
payout is at least $120,000” or “we have no reason not to believe your
claim”. We would never answer “we believe your claimed value”. The
reason for this will become apparent later.)
Errors:
Note that, in reality, H0 may or may not be true.
* If H0 is indeed true (the population average is indeed $120,000)
but based on our sample we reject H0 (i.e. we happened to pick 50
cases with lower payouts) then we have made a Type I Error.
“Defendant was innocent, but jury convicted them.”
* If H0 is false (the average payout is not $120,000) but based on our
sample we do not reject H0 (i.e. we happened to pick 50 cases close
to $120,000) then we have made a Type II Error.
“Defendant was guilty, but jury let them go.”
Note that, using the law-and-order example, a very “trigger-happy” jury
would be more likely to make a Type I error, while a very “laid-back” jury
would be more likely to make a Type II error.
On which side would you rather fall?
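To get a feel for Type I errors, the following simulation sketch repeatedly draws samples from a population in which H0 is exactly true (mean $120,000) and counts how often a left-tailed test at α = 0.05 rejects anyway. The seed, the number of trials and all names are our own illustrative choices, not part of the notes.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

mu0, sigma, n, alpha = 120_000, 18_000, 50, 0.05
critical_z = NormalDist().inv_cdf(alpha)   # left-tail critical value, about -1.645
rng = random.Random(0)                     # fixed seed so the run is repeatable

trials = 2_000
rejections = 0
for _ in range(trials):
    # H0 is true here: every sample really comes from a population with mean mu0
    sample = [rng.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    if z < critical_z:
        rejections += 1                    # rejecting a true H0 is a Type I error

type_i_rate = rejections / trials
print(f"Type I error rate: {type_i_rate:.3f}")
```

The observed rate should land close to the chosen α, which is why the level of significance can be read as the probability of a Type I error we are willing to tolerate.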
Critical Values:
Usually, we wish to avoid making a Type I error, hence we set a low
level of probability of making such an error.
This cut-off level for our probability is again called our level of
significance and denoted α.
The associated cut-off test statistic is called the critical value (for example,
if we use z-scores, the z-value that would result in a probability of 5%
would be our critical value). Any z-value beyond the critical value is said
to be in the rejection region.
The Five Step Approach:
To summarize all of these ideas, we tend to go through a hypothesis test in
five fairly rigid steps.
Step 1:
Step 2:
Step 3:
Step 4:
Step 5:
Example
An insurance company states that the mean payout for automobile
accidents is at least $18,500. A researcher wants to test the accuracy of this
value by sampling 36 random cases. Their mean is found to be $17,585.
Assuming that the population standard deviation for payouts is $2,600, test
the insurance company’s claim at a significance level of α = 0.05.
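The insurance company example can be worked with a critical value as in the sketch below. We take H0: µ ≥ $18,500 and H1: µ < $18,500 (a left-tailed test), which is our reading of the claim “at least $18,500”; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

claimed_mean = 18_500                        # H0: mu >= 18,500; H1: mu < 18,500
n, xbar, sigma, alpha = 36, 17_585, 2_600, 0.05

critical_value = NormalDist().inv_cdf(alpha)             # left tail, about -1.645
test_value = (xbar - claimed_mean) / (sigma / sqrt(n))   # z-score of the sample mean

reject_h0 = test_value < critical_value      # is the test value in the rejection region?

print(f"critical value = {critical_value:.3f}, test value = {test_value:.3f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Here the test value is about -2.11, which lies beyond the critical value of about -1.645, so the claim is rejected at the 0.05 level.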
Example
A researcher stated in a 2004 report that the average child watches 520
minutes of TV per week. To test whether or not this value has changed
since then, the TV viewing habits of 40 children are monitored. The
sample watches an average of 500 minutes of TV per week. Assuming that
σ = 80 minutes is given, is the 2004 claim still reasonable? Perform a
hypothesis test at the 0.1 level of significance.
Example
A snack food manufacturer claims that its chip bags contain at most
80 g of fat. A random sample of 45 chip bags has mean fat content 80.5 g
(assume σ = 1.6 g). Perform a hypothesis test at a 0.025 level of
significance.
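Both examples follow the same pattern and differ only in the number of tails, so a single helper is enough to sketch them. The function `z_test` and its `tail` argument are our own construction; we read the TV example as two-tailed (H0: µ = 520) and the chips example as right-tailed (H0: µ ≤ 80).

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha, tail):
    """Return (z, reject) for a z-test with known sigma; tail is 'two', 'left' or 'right'."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "two":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif tail == "left":
        reject = z < std_normal.inv_cdf(alpha)
    elif tail == "right":
        reject = z > std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, reject

# TV viewing: H0: mu = 520, H1: mu != 520, alpha = 0.1
tv_z, tv_reject = z_test(500, 520, 80, 40, 0.10, "two")
# Chip bags: H0: mu <= 80, H1: mu > 80, alpha = 0.025
chips_z, chips_reject = z_test(80.5, 80, 1.6, 45, 0.025, "right")

print(f"TV:    z = {tv_z:.2f}, reject H0: {tv_reject}")
print(f"Chips: z = {chips_z:.2f}, reject H0: {chips_reject}")
```

With these readings the TV claim is not rejected (z is about -1.58, inside ±1.645), while the chips claim is rejected (z is about 2.10, beyond 1.96).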
Using P-values in Hypothesis Testing
So far, we have found a critical point (and hence a rejection region) and
have based our conclusion on whether or not a test-statistic falls into the
rejection region.
No indication is made how close or how far a statistic is from the rejection
region.
To indicate this, we can also use a P-value: the probability of getting a
sample statistic at least as far away from the population parameter as we did.
For example: If the population of chip bags had mean fat content of 80 g
with σ = 1.6 g, what is the probability that a random sample of size 45 has
a sample average of 80.5 g or more? That probability is called the P-value.
In our case we found the test-statistic to be z=2.10. From the z-table we
find an area of A=.9821, so .0179 of the curve lies beyond this z-value.
This is our P-value: the probability of getting a sample that is as far (or
further) from the population mean as we got is 0.0179.
We know the rejection region contains 0.025 of the curve’s area (a one-tailed
test at the 0.025 level of significance). Hence, as 0.0179<0.025, our
test-statistic lies beyond the critical value, and we (again) reject the null
hypothesis. Since 1.79% is not that far below 2.5%, it is however not a
very “strong” rejection.
In our Five Step Approach, we would calculate the P-value in Step 3.
Example
A healthy individual has an MCV (average volume of a red blood cell) of
90 femtoliters (fl). (Note: a femtoliter is 10^-15 litres.) A particular
chemotherapy drug is believed to cause a change in MCV levels. A sample
of 88 patients has a mean volume of 92 fl. Assume a population standard
deviation of 8.3 fl.
Using α = 0.01 and calculating the P-value, test the claim that the drug
causes a change in MCV levels.
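A P-value calculation can be sketched as follows, covering both the chip bag case (one-tailed) and the MCV example (two-tailed, since the claim is only that the level changes). The helper `p_value` is our own; it uses the unrounded z-score, so results can differ in the last digit from a z-table lookup.

```python
from math import sqrt
from statistics import NormalDist

def p_value(xbar, mu0, sigma, n, tail):
    """P-value of a z-test with known sigma; tail is 'left', 'right' or 'two'."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf
    if tail == "right":
        return 1 - cdf(z)
    if tail == "left":
        return cdf(z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))   # both tails count as "at least as far away"
    raise ValueError("tail must be 'left', 'right' or 'two'")

# Chip bags: H1: mu > 80, alpha = 0.025
chips_p = p_value(80.5, 80, 1.6, 45, "right")
# MCV levels: H0: mu = 90, H1: mu != 90, alpha = 0.01
mcv_p = p_value(92, 90, 8.3, 88, "two")

print(f"chips: P-value = {chips_p:.4f}, reject H0: {chips_p < 0.025}")
print(f"MCV:   P-value = {mcv_p:.4f}, reject H0: {mcv_p < 0.01}")
```

The chip bag P-value comes out at about 0.018, in line with the table-based 0.0179. For the MCV example the P-value is about 0.024, which is larger than α = 0.01, so under these assumptions H0 is not rejected.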