STA 1020 Exam 3 of 3

** STA 1020 - Part 3 (08/Dec/13) **
STA 1020
Fall 2013 Section 09 MWF 10:40-11:35 0035 State
Instructor: Dr. J.L. Menaldi
Textbook - Statistics: Concepts and Controversies,
by David S. Moore and William I. Notz, 2013, W.H. Freeman & Company [8th ed]
Class Link: http://www.math.wayne.edu/~menaldi/teach/13f1020.htm
“Statistics” is the Science of collecting, describing and interpreting data...
It is said that “Probability” is the vehicle of Statistics, i.e., if it were not for the laws
of probability, the theory of statistics would not be possible
MATERIAL FOR EXAM #3
Contents
Exam 3 of 3: Chance and Inference
Quizzes every chapter and then Third Partial Exam
Chapter 17 - Thinking about Chance
Chapter 18 - Probability Models
Chapter 19 - Simulation – mostly skipped!
Chapter 20 - The House Edge: Expected Values
Chapter 21 - What is a Confidence Interval?
Chapter 22 - What is a Test of Significance?
Chapter 23 - Use and Abuse of Statistical Inference – skipped!
Chapter 24 - Two-way Tables and Chi-Square Test
Ch17 - Thinking about Chance
Part 3: Probability. The Theory of Statistics
Chapter 17
Thought Questions. . .
Here are two very different probability questions:
If you roll a 6-sided die and do it fairly, what is the probability that it
will land with “3” showing?
What is the probability that in your lifetime you will travel to a
foreign country other than one you have already visited?
For which question was it easier to provide a precise answer? Why? For
which one could we all agree?
What is wrong with the following partial answer: The probability that I
will eventually travel to another foreign country (or of any other particular
event happening) is 1/2, because either it will happen or it won’t
Two Concepts of Probability
Personal-Probability Interpretation
The degree to which a given individual believes the event in question
will happen
Personal belief (or personal ignorance about something?)
Relative-Frequency Interpretation
The proportion of time the event in question occurs over the long run
Long-run relative frequency
Two ways to determine the Relative-Frequency Probabilities
Physical assumptions (theoretical mathematical model)
Repeated observations (empirical results), i.e., by experience with
many samples or by simulation
Ex1 Coin tossing
Figure 17.1 Toss a coin many times. The proportion of heads changes as we make more tosses
but eventually gets very close to 0.5. This is what we mean when we say, “The probability of a
head is one-half.”
Ex2 Some coin tossers
The French naturalist Count Buffon (1707-88) tossed a coin 4040
times. Result: 2048 heads, or a proportion 2048/4040 = 0.5069 for
heads
Around 1900, the English statistician Karl Pearson heroically tossed a
coin 24,000 times. Results: 12,012 heads, a proportion of 0.5005
While imprisoned (WW2), the South African mathematician John
Kerrich tossed a coin 10,000 times. Result: 5067 heads, a proportion
of 0.5067
What is called a random phenomenon?
The probability of any outcome of a random phenomenon is a number
between 0 and 1 that describes the proportion of times the outcome would
occur in a very long series of repetitions
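The proportions quoted for the three coin tossers can be recomputed directly, and a simulated fair coin shows the same settling-down behaviour. This is only an illustrative sketch: the names `tossers` and `proportion`, the seed, and the use of Python's pseudo-random generator in place of a real coin are our assumptions, not part of the course material.

```python
import random

# Heads and tosses reported for each historical experiment (from the slide).
tossers = {
    "Buffon": (2048, 4040),
    "Pearson": (12012, 24000),
    "Kerrich": (5067, 10000),
}

def proportion(heads, tosses):
    """Relative frequency of heads."""
    return heads / tosses

for name, (heads, tosses) in tossers.items():
    print(f"{name}: {proportion(heads, tosses):.4f}")

# A simulated fair coin: the running proportion of heads settles near 0.5.
rng = random.Random(2013)  # arbitrary seed so the run is repeatable
heads = 0
for n in range(1, 100_001):
    heads += rng.random() < 0.5  # True counts as 1
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(n, heads / n)
```

The early proportions can wander well away from 0.5; only the long run is regular.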
Ex3 Cannot predict
The National Center for Health Statistics says that the proportion of men
aged 20 to 24 years who die in any one year is 0.0014. This is taken as
the probability that a young man will die next year. For women that age,
the probability of death is about 0.0005.
If an insurance company sells many policies to people aged 20 to 24, it
knows (or believes?) that it will have to pay off next year on about 0.14%
(0.05%) of the policies sold on men’s (women’s) lives. Logically, it will
charge more to insure a man because the probability of having to pay is
higher. However, we cannot predict whether a particular person will die
in the next year. . .
Probability answers the question “What would happen if we did this
many times?”
The idea of probability is that randomness is regular in the long run
Ex5, 6, 7 and 8
If a basketball player makes several consecutive shots, both the fans
and his teammates believe that he has a “hot hand” and is more
likely to make the next shot. . .
If a person wins the lotto today, that same person has less chance of
winning again next week. . . , winning the lottery twice?
Cancer is a common disease, accounting for more than 23% of all
deaths in the US. That cancer cases sometimes occur in clusters in the
same neighborhood is not surprising: there are bound to be clusters
somewhere simply by chance (or not?)
When a shooter in the dice game craps rolls several winners in a row,
some gamblers think she/he has a “hot hand” and bet that she/he
will keep on winning. Others say that “the law of averages” means
that she/he must now lose so that wins and losses will balance out . . .
Ex8: We want a boy, the law of averages affirms that . . .
If we toss a coin 6 times, which of these outcomes is more probable (or
looks random) “HTHTTH”, “HTHTHT” or “TTTHHH” (pattern?)
Law of averages
Law of large numbers: in a large number of “independent” repetitions of a
random phenomenon (such as coin tossing), averages or proportions are likely to
become stable as the number of trials increases, contrary to sums or counts . . .
Again. . .
Relative-Frequency Probabilities
Can be applied when the situation can be repeated numerous times
(conceptually) and the outcome can be observed each time
Relative frequency (proportion of occurrences) of an outcome settles
down to one value over the long run. That one value (between 0 and
1) is then defined to be the probability of that outcome
The probability cannot be used to determine whether or not the
outcome will occur on a single occasion, or in a single sample (it is a
long-run phenomenon)
A Personal Probability of an outcome is always a number between 0 and
1 that expresses an individual’s judgment of how likely the outcome is.
(the outcome may not be repeated!)
Two ways: “personal judgment of how likely” and “what happens in many
repetitions”
Figure 17.3 Toss a coin many times. The difference between the observed number of heads and
exactly one-half the number of tosses becomes more variable as the number of tosses increases.
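The contrast between proportions (which settle down) and counts (which do not) can be seen in a short simulation. The sketch below is ours, not the textbook's: the function name `toss_summary` and the seed are arbitrary, and a pseudo-random generator stands in for the coin.

```python
import random

def toss_summary(n, rng):
    """Toss a fair coin n times; return (proportion of heads, heads minus n/2)."""
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n, heads - n / 2

rng = random.Random(17)  # arbitrary seed
for n in (100, 10_000, 1_000_000):
    prop, gap = toss_summary(n, rng)
    print(f"n={n:>9}: proportion={prop:.4f}, heads - n/2 = {gap:+.0f}")
```

Under the fair-coin model the standard deviation of the count of heads is sqrt(n)/2, so the gap between the count and n/2 typically grows with n even while the proportion approaches 0.5.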
Ex9 Risk
High exposure to asbestos is dangerous. Low exposure, such as that
experienced by teachers and students in schools where asbestos is present
in the insulation around pipes, is not very risky. The probability that a
teacher who works for 30 years in a school with typical asbestos levels will
get cancer from the asbestos is around 15/1,000,000. The risk of dying in
a car accident during a lifetime is about 15,000/1,000,000, i.e., 1000 times
more risky, but . . .
Risk and Relative Risk (Case Study)
The following table gives results for
whether or not subjects were still smoking when given a nicotine patch or
a placebo:
Still smoking?   No           Yes          Total
Nicotine         56 (46.7%)   64 (53.3%)   120 (100%)
Placebo          24 (20%)     96 (80%)     120 (100%)
Relative Risk
Risk of continuing to smoke:
Nicotine: 0.533 (just the proportion from the table)
Placebo: 0.800
Relative risk of continuing to smoke when using the placebo patch
compared with when using the nicotine patch is 1.5 (0.800/0.533 =
1.5)
The risk of continuing to smoke when using the placebo patch is 1.5
times the risk when using the nicotine patch
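The risk and relative-risk figures follow directly from the counts in the table. A minimal sketch of the arithmetic (the variable names are ours):

```python
# Counts from the table: subjects still smoking ("Yes") out of 120 per group.
still_smoking = {"nicotine": 64, "placebo": 96}
group_size = 120

# Risk = proportion of each group still smoking.
risk = {group: count / group_size for group, count in still_smoking.items()}
# Relative risk of the placebo group compared with the nicotine group.
relative_risk = risk["placebo"] / risk["nicotine"]

print(f"risk (nicotine) = {risk['nicotine']:.3f}")
print(f"risk (placebo)  = {risk['placebo']:.3f}")
print(f"relative risk   = {relative_risk:.2f}")
```

Swapping numerator and denominator gives the reciprocal, 0.533/0.800 = 0.67, which is why a relative risk should always say which group is the baseline.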
Cautions about Risk
What if the baseline risk is missing? The relative risk means
“relative” to what?
The reported risk is not necessarily your risk. Are the subjects and the
setting of the study representative of you and your situation?
Exercise Ch17
17.12 Marital status. The probability that a randomly chosen 50-year
old woman is divorced is about 0.18. This probability is a long-run
proportion based on all the millions of women aged 50. Let’s suppose that
the proportion stays at 0.18 for the next 30 years. Bridget is now 20 years
old and is not married.
(a) Bridget thinks her own chances of being divorced at age 50 are about
5%. Explain why this is a personal probability.
(b) Give some good reasons why Bridget’s personal probability might
differ from the proportion of all women aged 50 who are divorced.
(c) You are a government official charged with looking into the impact of
the Social Security system on middle-aged divorced women. You care only
about the probability 0.18, not about anyone’s personal probability. Why?
Exercise (answer) Ch17
**Answers
(a) This is based on a personal judgment of her likelihood to get divorced;
it is not based on data on repeated trials of an experiment.
(b) For example, Bridget might have strong religious or moral beliefs that
make her less inclined to consider divorce.
(c) For the overall impact of divorce, we are concerned with the
percentage of all 50-year-old women who are divorced. The probability
0.18 is supported by data, and is known to apply to the whole group.
Multiple choice Ch17
If I toss a fair coin 5,000 times
(a) the number of heads will be close to 2,500. (b) the proportion of
heads will be close to 0.5. (c) the proportion of heads in these tosses is a
parameter. (d) the proportion of heads will be exactly 50%.
Answer: (b)
.......................................................................
There are 2,598,960 possible 5-card hands that can be dealt from an
ordinary 52-card deck. Of these, 5,148 have all five cards of the same suit
(in poker such hands are called flushes). The probability of being dealt
such a hand (assuming randomness) is closest to
(a) 1/4. (b) 1/100. (c) 1/500. (d) 1/1000.
Answer: (c)
Ch18 - Probability Models
Chapter 18
Ex1 Marital Status
Choose a woman aged 25 to 29 years old at random and record her marital
status, i.e., an SRS of size n=1. The probability of any marital status is just
the proportion of all women aged 25 to 29 who have that status; if we
choose many women, we get
Marital status:   Never married   Married   Widowed   Divorced
Probability:      0.503           0.452     0.003     0.042
A probability model for a random phenomenon describes all the possible
outcomes and says how to assign probabilities to any collection of
outcomes. We sometimes call a collection of outcomes an event
Avoid Being Inconsistent
Sketching. . . For instance, the probability of married with children must
not be greater than the probability that the couple is married
Because of the proportions
To find out P(not married), we add P(never married), P(widowed)
and P(divorced), i.e., 0.503 + 0.003 + 0.042 = 0.548
Adding P(not married) and P(married) should give 1, so
P(not married) is also equal to 1 − 0.452 = 0.548.
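The consistency checks on the marital-status model can be written out as a few lines of arithmetic. This is a sketch under our own naming (`model`, `is_legitimate`); it only restates the numbers given above.

```python
import math

# Marital-status probabilities for women aged 25 to 29 (from the slide).
model = {
    "never married": 0.503,
    "married": 0.452,
    "widowed": 0.003,
    "divorced": 0.042,
}

def is_legitimate(probabilities):
    """Each value must lie between 0 and 1, and all values must sum to 1."""
    values = list(probabilities.values())
    return all(0 <= p <= 1 for p in values) and math.isclose(sum(values), 1.0)

# Route 1: add the probabilities of the three "not married" outcomes.
p_not_married_sum = model["never married"] + model["widowed"] + model["divorced"]
# Route 2: subtract P(married) from 1.
p_not_married_complement = 1 - model["married"]

print(is_legitimate(model))
print(round(p_not_married_sum, 3), round(p_not_married_complement, 3))
```

Both routes give 0.548; if they disagreed, the assignment of probabilities would be inconsistent.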
Probability Rules A-B
“These rules tell us only what probability models make sense!”
*A* Any probability is a number between 0 and 1
A probability can be interpreted as the proportion of times that a
certain event can be expected to occur
If the probability of an event is more than 1, then it will occur more
than 100% of the time (Impossible!)
*B* All possible outcomes together must have probability 1
Because some outcome must occur on every trial, the sum of the
probabilities for all possible outcomes must be exactly one
If the sum of all of the probabilities is less than one or greater than
one, then the resulting probability model will be incoherent
Probability Rules C-D
*C* The probability that an event does not occur is 1 minus the
probability that the event does occur
As a jury member, you assess the probability that the defendant is
guilty to be 0.80. Thus you must also believe the probability the
defendant is not guilty is 0.20 in order to be coherent (consistent with
yourself).
If the probability that a flight will be on time is 0.70, then the
probability it will be late is 0.30
*D* If two events have no outcomes in common, they are said to be
“mutually exclusive”. The probability that one or the other of two
mutually exclusive events occurs is the sum of their individual probabilities
Example: Age of woman at first child birth. Given (a) under 20: 25% and
(b) 20-24: 33%, find (1) 24 or younger: ?, Rule D says
25% + 33% = 58%, and (2) 25+: ?, Rule C says 100% − 58% = 42%
Ex2 Rolling two dice
Figure 18.1 There are 6 possible outcomes for each die, so 36 for two dice.
Assume carefully made dice, resulting in fair dice (i.e., each outcome is
equally possible). The event “roll a 5” contains four outcomes, “1+4”,
“2+3”, “3+2”, “4+1”, so that
P(roll a 5) = 1/36 + 1/36 + 1/36 + 1/36 = 4/36 = 0.111.
Now it’s your turn: How about the events “roll a 7” and “roll a 11”?
Ex3 A Sampling distribution
Figure 18.2 The sampling distribution of a sample proportion p̂ from SRSs of size 2527 drawn
from a population in which 50% of the members would give positive answers. The histogram
shows the distribution from 1000 samples.
The Normal curve is the ideal pattern that describes the results of a very large number of
samples, in this case, with x̄ = 0.5 and s = 0.010.
• • • So, the ‘95’ part of the 68-95-99.7 rule says that 95% of all samples will give a p̂ between
0.48 = 0.50 − 0.02 and 0.52 = 0.50 + 0.02
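The 68-95-99.7 statement for the sampling distribution with x̄ = 0.5 and s = 0.010 can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) rather than Table B; the helper name `area_between` is ours.

```python
from statistics import NormalDist

# Normal curve for the sample proportion, using the slide's mean and
# standard deviation.
sampling = NormalDist(mu=0.5, sigma=0.010)

def area_between(dist, low, high):
    """Area under the Normal curve between low and high."""
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    low, high = 0.5 - k * 0.010, 0.5 + k * 0.010
    area = area_between(sampling, low, high)
    print(f"within {k} sd: {low:.2f} to {high:.2f} -> {area:.4f}")
```

The three areas come out close to 0.68, 0.95 and 0.997, which is all the rule claims; the "95" is a rounded value.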
Ex4 & Ex5 Gambling
An opinion poll asks an SRS of 501 teens, “Generally speaking, do you approve or
disapprove of legal gambling or betting?” Suppose exactly 50% of all teens would say
‘yes’ (i.e., the parameter p = 0.5), and that the sampling distribution follows
approximately a normal curve with x̄ = 0.5 and s = 0.022.
Figure 18.3 The Normal sampling distribution.
Because 0.478 is one standard deviation below
the mean, the area under the curve to the left
of 0.478 is 0.16.
Figure 18.4 The Normal sampling distribution.
The outcome 0.52 has standard score 0.9, so
Table B tells us that the area under the curve
to the left of 0.52 is 0.8159
Sampling distribution
The sampling distribution of a statistic tells us what values the statistic takes in
repeated samples from the same population and how often it takes those values
We think of a sampling distribution as assigning probabilities to the values the statistic
can take. Because there are usually many possible values, sampling distributions are
often described by a density curve such as a normal curve.
A sampling distribution
Tells what values a statistic (calculated sample value) takes and how
often it takes those values in repeated sampling
Assigns probabilities to the values a statistic can take. These
probabilities must satisfy Rules A-D
Probabilities are often assigned to intervals of outcomes by using
areas under density curves
Often this density curve is a normal curve
Can use “68-95-99.7 rule” or get probabilities from Table B
Sample proportions (i.e., p̂) follow a normal curve
Check Case Study Evaluated
Who Voted?
–[ World Almanac and Book of Facts (1995), Famighetti, R. (editor), Mahwah, N.J.: Funk and Wagnalls ]–
• 56% of registered voters actually voted in the 1992 presidential election.
• In a random sample of 1600 voters, the proportion who claimed to have voted
was 0.58.
• Such sample proportions (p̂) from repeated sampling would have a normal
distribution with a mean of 0.56 and a standard deviation of 0.012.
• What is the probability of observing a sample proportion (p̂) as large or larger
than 58%?
If we convert the observed value of 0.58 to a standardized score, we get
standardized score z = (x − x̄)/s, i.e., (0.58 − 0.56)/0.012 = 1.67
From Table B (nearest entry, standard score 1.7), this is the 95.54 percentile, so the
probability of observing a value as small as or smaller than 0.58 is 0.9554
By Rule C (or B), the probability of observing a value as large or larger than
0.58 is 1 − 0.9554 = 0.0446
Independent Events (Ch19!)
If two events do not influence each other, and if knowledge about one does
not help with the knowledge of the probability of the other, the events are
said to be independent of each other. If two events are independent, the
probability that they both happen is found by multiplying their individual
probabilities.
Example: Suppose that about 20% of incoming male freshmen smoke.
Suppose that these freshmen are randomly assigned in pairs to dorm
rooms. Then . . .
the probability of a match (both smokers or both non-smokers):
Solution:
both are smokers: 0.04 = (0.20)(0.20)
neither is a smoker: 0.64 = (0.80)(0.80)
both are or neither is a smoker: 0.04 + 0.64 = 0.68
only one is a smoker: Rule C, (1 − 0.68), i.e., 32%
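The roommate calculation combines three ideas: multiply for independent events, add for events that cannot happen together, and subtract from 1 for the complement. A minimal sketch of that arithmetic (variable names are ours):

```python
p_smoker = 0.20  # from the example: about 20% of incoming male freshmen smoke

# Independence: multiply the individual probabilities.
p_both = p_smoker * p_smoker
p_neither = (1 - p_smoker) * (1 - p_smoker)

# "Both smoke" and "neither smokes" cannot happen together, so they add.
p_match = p_both + p_neither
# Complement: exactly one of the two roommates smokes.
p_one = 1 - p_match

print(f"{p_both:.2f} {p_neither:.2f} {p_match:.2f} {p_one:.2f}")
```

The multiplication step is only valid because rooms are assigned at random; if smokers could request each other as roommates, the two events would no longer be independent.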
Exercise Ch18
18.8 High school academic rank. Select a first-year college student at
random and ask what his or her academic rank was in high school. Here
are the probabilities, based on proportions from a large sample survey of
first-year students:
Rank          Top 20%   Second 20%   Third 20%   Fourth 20%   Lowest 20%
Probability   0.44      0.26         0.23        0.06         0.01
(a) What is the sum of these probabilities? Why do you expect the sum to
have this value?
(b) What is the probability that a randomly chosen first-year college
student was not in the top 20% of his or her high school class?
(c) What is the probability that a first-year student was in the top 40% in
high school?
Exercise (answer) Ch18
**Answers
(a) The sum is 1, as we expect, because all possible outcomes are listed.
(b) 1 − 0.44 = 0.56.
(c) 0.44 + 0.26 = 0.70.
Multiple choice Ch18
Choose an American household at random and ask how many computers
that household owns. Here are the probabilities as of 2003:
Number of computers   0      1      2
Probability           0.38   0.44   0.18
1. This is a legitimate assignment of probabilities because it satisfies
these rules:
(a) all the probabilities are between 0 and 1. (b) all the probabilities
are between 0% and 100%. (c) the sum of all the probabilities is
exactly 1. (d) both (a) and (c).
Answer: (d)
2. What is the probability that a randomly chosen household owns more
than one computer?
(a) 0.56. (b) 0.18. (c) 0.44. (d) 0.62.
Answer: (b)
Ch20 - The House Edge: Expected Values
Thought Questions. . .
Chapter 20
Long-Term Gains, Losses and Expectations
Expected Value is what you logically expect in the long run . . .
(expected value) = a1 p1 + a2 p2 + · · · + an pn ,
where ai is the value (e.g., amount of money) that you expect if the
outcome i happens, and pi is the probability (chance) that outcome i
occurs, for i = 1, 2, . . . , n.
.......................................................................
** Suppose that a sorority pledge class is selling raffle tickets to raise
money. The grand prize is a $200 gift certificate to the campus bookstore,
and the pledges must sell all 1000 raffle tickets that were printed. How
much would you be willing to pay for a single ticket? Explain your answer.
.......................................................................
The Main Point... While we cannot predict individual outcomes, we can
“estimate” what happens (on average, i.e., repeating this over and over)
in the long run.
Raffle tickets
Tickets to a sorority fund-raiser sell for $1.
One ticket will be randomly chosen, the ticket owner receives a $200
gift card.
They expect to sell 1000 tickets. Your ticket has a 1/1000 = 0.001
probability of winning (and a 0.999 probability of losing).
Two outcomes: (a) You win $200, net gain is $199 (chance: 0.001)
or (b) You do not win, net ‘gain’ is -$1 (chance: 0.999)
Your expected gain (expected value) is
($199)(0.001) + (−$1)(0.999) = −$0.80.
Long term, you lose an average of $0.80 each time (conceptually) you
enter such a contest (Hey, the sorority needs to make a profit!).
Daily Numbers
A simple lottery wager, the ‘Straight’ from Pick 3 game of the Tri-State Daily Numbers.
You pay $0.50 and choose a three-digit number, and the state chooses a three-digit
winning number at random and pays you $250 if your number is chosen.
Outcomes n = 2
Outcome            lose    win
Value (ai)         $0      $250
Probability (pi)   0.999   0.001
The average or expected value is ($0)(0.999) + ($250)(0.001) = $0.25
You may choose to make a $1 ‘Straight-Box’ (6-way) wager. You again choose a
three-digit number, but now you have two ways to win. You win $292 if you
exactly match the winning number, and you win $42 if your number has the same
digits as the winning number, but in any order.
Outcome            lose    order   exact
Value (ai)         $0      $42     $292
Probability (pi)   0.994   0.005   0.001
The expected value is ($0)(0.994) + ($42)(0.005) + ($292)(0.001) = $0.502
Which one is better in the long run? Ans: If you keep playing
* ‘Straight’ you will lose on average $0.50 − $0.25 = $0.25 per play, i.e., 50% of the wager, and
* ‘Straight-Box’ you will lose on average $1.000 − $0.502 = $0.498 per play, i.e., 49.8% of the wager
Vehicles
What is the average number of motor vehicles in American households?
The Census Bureau tells us that the distribution of vehicles per household
(year 2000 census) is as follows:
Number of vehicles   0      1      2      3      4      5
Proportion           0.10   0.34   0.39   0.13   0.03   0.01
The expected value is (0)(0.10) + (1)(0.34) + · · · + (5)(0.01) = 1.68
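Every expected value in this chapter so far is the same computation, a1 p1 + a2 p2 + · · · + an pn, applied to a different list of outcomes. The sketch below is one way to write that once and reuse it; the function name `expected_value` and the list-of-pairs layout are our choices, and the numbers are those given for the raffle, the two lottery wagers and the vehicle counts.

```python
def expected_value(outcomes):
    """Sum of value * probability over (value, probability) pairs."""
    total_probability = sum(p for _, p in outcomes)
    # A legitimate model must account for every outcome.
    assert abs(total_probability - 1) < 1e-9, "probabilities must sum to 1"
    return sum(value * p for value, p in outcomes)

raffle = [(199, 0.001), (-1, 0.999)]                     # net gain, $1 ticket
straight = [(0, 0.999), (250, 0.001)]                    # payout, $0.50 bet
straight_box = [(0, 0.994), (42, 0.005), (292, 0.001)]   # payout, $1 bet
vehicles = [(0, 0.10), (1, 0.34), (2, 0.39), (3, 0.13), (4, 0.03), (5, 0.01)]

for name, model in [("raffle", raffle), ("straight", straight),
                    ("straight-box", straight_box), ("vehicles", vehicles)]:
    print(f"{name}: {expected_value(model):.3f}")
```

Note that the raffle values are net gains (the $1 ticket price is already subtracted), while the lottery values are payouts, so the cost of the wager still has to be subtracted to see the average loss.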
. ...............................................................
Deal or No Deal? (1) You choose one of four sealed cases; one contains
$1,000, and the others are empty. If you open your case, you have a 25%
chance to win $1,000 and a 75% chance of getting nothing (winning $0).
Or, (2) you can sell your unopened case for $240, giving you a 100%
chance of winning $240.
* First option (open your case): EV = ($1000)(0.25) + ($0)(0.75) = $250
* Second option (sell your case): EV = $240, no variation.
** Make a Decision: Will you open or sell your case?
Deal or No Deal? (cont)
Summary:
Option 1 - a 25% chance to win $1,000 and a 75% chance of getting
nothing, EV=$250
Option 2 - a gift of $240, guaranteed, EV=$240
Analysis
If choosing for ONE trial
Option (1) will maximize potential gain ($1000) but may also give the
minimum gain ($0)
Option (2) guarantees a gain ($240)
If choosing for MANY trials
Option (1) will maximize expected gain (will make more money in the
long run)
How many trials are necessary for ‘long run’ ?, 500?
Deal or No Deal? (variation)
Now, a variation: (1) You have a case containing $740 of your money. If
you give away your case, you have a 100% chance of losing $740. Or, (2)
you can keep your case and play a game in which you have a 75% chance
to lose $1,000 and a 25% chance to lose nothing ($0)
(1) Give away your case: expected loss = $740, no variation, a sure loss of $740
(2) Play the game: expected loss = ($1000)(0.75) + ($0)(0.25) = $750, a 75%
chance to lose $1,000 and a 25% chance to lose nothing
Make a Decision . . . Will you play the game or not?
If choosing for ONE trial
Option (1) guarantees a loss ($740)
Option (2) will minimize potential loss ($0) but may also give the
maximum loss ($1000)
If choosing for MANY trials
Option (1) will minimize expected loss (will lose less money in the long
run)
How many trials are necessary for ‘long run’ ?, 500?
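The question “how many trials are necessary for the long run?” can be explored by simulation for the first Deal or No Deal choice (open the case: 25% chance of $1,000, otherwise $0). This is an illustrative sketch only; the function name `average_winnings`, the seed and the chosen numbers of plays are our assumptions.

```python
import random

def average_winnings(n_trials, rng):
    """Average payout per play when the case is opened n_trials times."""
    total = sum(1000 if rng.random() < 0.25 else 0 for _ in range(n_trials))
    return total / n_trials

rng = random.Random(20)  # arbitrary seed
for n in (1, 10, 500, 100_000):
    avg = average_winnings(n, rng)
    print(f"{n:>7} plays: average ${avg:.2f} (sure thing: $240)")
```

Because the gap between the two expected values is only $10 while a single play varies between $0 and $1,000, a few hundred plays can still leave the average on the wrong side of $240; the outcome is highly variable, so many trials are needed.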
The Law of Large Numbers
The actual average (mean) outcome of many independent trials gets closer
to the expected value as more trials are made.
the higher the variability of the trials, the larger the number of trials needed
expected values can be calculated by simulating many repetitions and
finding the average of all of the outcomes
The “house” in a gambling operation is not gambling at all
the games are defined so that the gambler has a negative expected
gain per play
each play is independent of previous plays, so the law of large
numbers guarantees that the average winnings of a large number of
players will be close to the (negative) expected value
State lottos have extremely variable outcomes; also use pari-mutuel
system for (fixed) payoffs, too many trials are necessary . . .
Ex4 We want a girl (Ch19!)
A couple plan to have children until they have a girl or until they have three children.
What is the probability that they will have a girl among their children?
1. The probability model is like that for coin tossing: (a) Each child has probability
0.49 of being a girl and 0.51 of being a boy (yes, more boys than girls are born;
boys have higher infant mortality, so the sexes even out soon) (b) The sexes of
successive children are independent.
2. Assigning digits is also easy. Two digits simulate the sex of one child. We assign
49 of the 100 pairs to “girl” and the remaining 51 to “boy”, i.e., 00, 01, 02, . . .,
48 means girl, and 49, 50, 51, . . ., 99 means boy.
3. To simulate one repetition of this childbearing strategy, read pairs of digits from
Table A until the couple have either a girl or three children. The number of pairs
needed to simulate one repetition depends on how quickly the couple get a girl.
Here are 10 repetitions, simulated using line 130 of Table A. To interpret the pairs
of digits, we have written G for girl and B for boy under them, and have added space
to separate repetitions.
6905  16  48  17  8717  40  9517  845340  648987  20
BG    G   G   G   BG    G   BG    BBG     BBB     G
4. In these 10 repetitions, a girl was born 9 times. Our estimate of the probability
that this strategy will produce a girl is therefore 9/10 = 0.9.
Some mathematics shows that, if our probability model is correct, the true
probability of having a girl is 0.867. Our simulated answer came quite close.
Unless the couple are unlucky, they will succeed in having a girl.
Ex3 We want a girl
Sometimes, expected values may be too difficult to compute and
“simulation” is used.
A couple plan to have children until they have a girl or until they have three
children, whichever comes first. We find the expected value by simulation, using
the table of random digits. The probability model says that the sexes of
successive children are independent and that each child has probability 0.49 of
being a girl. Thus, a pair of digits simulates one child, with 00 to 48 standing for
a girl (e.g., begin at line 130)
Digits:     6905  16  48  17  8717  40  9517  845340  648987  20
Sexes:      BG    G   G   G   BG    G   BG    BBG     BBB     G
Children:   2     1   1   1   2     1   2     3       3       1
Mean of number of children x̄ = (2 + 1 + · · · + 3 + 1)/10 = 1.7
This simulation is too short to be trustworthy (only 10 repetitions or trials).
A deeper math analysis shows that the actual expected value is 1.77
Controversies
Gambling?
Voluntary tax?
Arguments for & against
C H E C K: “Exploring the Web” box at the end of this chapter.
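Returning to the “We want a girl” examples: both quoted results (probability 0.867 of a girl, expected 1.77 children) can be reproduced from the stated model, and a longer simulation shows how the estimates tighten. The sketch below replaces Table A with Python's pseudo-random generator, which is our substitution; the names and the seed are also ours.

```python
import random

P_GIRL = 0.49      # from the probability model on the slide
MAX_CHILDREN = 3

def one_family(rng):
    """Have children until a girl arrives or there are three children.

    Returns (has_girl, number_of_children).
    """
    for child in range(1, MAX_CHILDREN + 1):
        if rng.random() < P_GIRL:
            return True, child
    return False, MAX_CHILDREN

# Exact values under the model, for comparison with the simulation.
p_boy = 1 - P_GIRL
exact_p_girl = 1 - p_boy ** 3                      # all three are boys otherwise
exact_children = 1 * P_GIRL + 2 * p_boy * P_GIRL + 3 * p_boy ** 2

rng = random.Random(130)  # arbitrary seed
families = [one_family(rng) for _ in range(100_000)]
sim_p_girl = sum(girl for girl, _ in families) / len(families)
sim_children = sum(n for _, n in families) / len(families)

print(f"P(girl): exact {exact_p_girl:.3f}, simulated {sim_p_girl:.3f}")
print(f"children: exact {exact_children:.2f}, simulated {sim_children:.2f}")
```

With 100,000 repetitions instead of 10, the simulated values should sit very close to the exact ones, which is the law of large numbers at work.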
Exercise Ch20
20.10 Keno. Keno is a popular game in casinos. Balls numbered 1 to 80
are tumbled in a machine as the bets are placed, then 20 of the balls are
chosen at random. Players select numbers by marking a card. Here are
two of the simpler Keno bets. Give the expected winnings for each.
(a) A $1 bet on “Mark 1 number” pays $3 if the single number you mark
is one of the 20 chosen; otherwise, you lose your dollar.
(b) A $1 bet on “Mark 2 numbers” pays $12 if both your numbers are
among the 20 chosen. The probability of this is about 0.06. Is Mark 2 a
more or a less favorable bet than Mark 1?
Exercise (answer) Ch20
**Answers
(a) The expected payoff for a Mark 1 bet is
($3)(20/80) + ($0)(60/80) = $0.75.
(b) The expected payoff for a Mark 2 bet is approximately
($12)(0.06) = $0.72,
slightly less favorable than a Mark 1 bet.
Note: The exact probability of winning a Mark 2 bet is
(20/80)(19/79) = 0.06013;
with this value, the expected payoff is about $0.7215.
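The exact Keno figures in the note can be reproduced with exact fractions. This is a small sketch of the arithmetic using `fractions.Fraction` from the Python standard library; the variable names are ours.

```python
from fractions import Fraction

# Mark 1: the single marked number is among the 20 of 80 balls drawn.
p_mark1 = Fraction(20, 80)
# Mark 2: both marked numbers are among the 20 drawn (no replacement).
p_mark2 = Fraction(20, 80) * Fraction(19, 79)

payoff_mark1 = 3 * p_mark1     # expected payoff in dollars on a $1 bet
payoff_mark2 = 12 * p_mark2

print(f"Mark 1: p = {float(p_mark1):.5f}, payoff = ${float(payoff_mark1):.4f}")
print(f"Mark 2: p = {float(p_mark2):.5f}, payoff = ${float(payoff_mark2):.4f}")
```

Either way the expected payoff is below the $1 wagered, so both bets favor the house; Mark 2 is slightly worse.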
Multiple choice Ch20
Ch21 - What is a Confidence Interval?
A basketball player makes 65% of her shots from the field during the
season. You want to estimate the expected number of shots made in 10
shots. You simulate 10 shots 25 times and get the following numbers of
hits:
7976375656756685639677879
Your estimate is: (a) 6 out of 10 shots. (b) 6.5 out of 10 shots. (c) 5.6
out of 10 shots. (d) 5.2 out of 10 shots.
Answer: (d)
.......................................................................
In government data, a family consists of two or more persons who live
together and are related by blood or marriage. Choose an American family
at random and count the number of people it contains. Here is the
assignment of probabilities for your outcome:
Number of persons
Probability
2
0.42
3
0.23
4
0.21
5
0.09
6
0.03
7
0.02
STA 1020
Ch21 - What is a Confidence Interval?
Instructor: Dr. J.L. Menaldi
Textbook - Statistics: Concepts and Controversies,
by David S. Moore and William I. Notz, 2013, W.H. Freeman & Company [8th ed]
Class Link: http://www.math.wayne.edu/˜menaldi/teach/13f1020.htm
“Statistics” is the Science of collecting, describing and interpreting data...
Using the probabilities above, what is the expected size of the family you
draw? (a) 2 people. (b) 3 people. (c) 3.14 people. (d) 3.50 people.
Answer: (c)
Part 4: Inference - To draw a conclusion from evidence
Chapter 21
Ch21 - What is a Confidence Interval?
Thought Questions. . .
Suppose that 40% of a certain population favor the use of nuclear power
for energy
(a) If you randomly sample 10 people from this population, will exactly
four (40%) of them be in favor of the use of nuclear power? Would
you be surprised if only two (20%) of them are in favor?
(b) Now suppose you randomly sample 1000 people from this population.
Will exactly 400 (40%) of them be in favor of the use of nuclear
power? Would you be surprised if only 200 (20%) of them are in
favor?
(c) In both cases (a) and (b), how about if none of the sample are in
favor?
Estimating
Statistical inference draws conclusions about a population on the basis
of data from a sample. Typical questions are “what is the opinion of people
about a particular issue”, or “what is the mean survival time for patients
with this type of cancer”, or “how people are going to vote in the coming
election”. These questions are about a number (the mean or, in particular,
a percentage) that describes the population and is to be determined on the
basis of a sample. That is, we estimate a parameter on the basis of a
statistic, as defined in earlier chapters.
A level C confidence interval (e.g., C = 95%) for a parameter has two
parts
An interval calculated from the data
A confidence level (or coefficient) C, which gives the probability that
the method produces an interval containing the true parameter value
A 95% confidence interval is an interval calculated from sample data by
a process that is guaranteed to capture the true population parameter in
95% of all samples.
Thought Questions. . . (cont)
* What does it mean to say that the interval from 0.07 to 0.11 represents
a 95% confidence interval for the proportion of adults in the US who
have diabetes?
* Would a 99% confidence interval for the above proportion be wider or
narrower than the 95% interval given? What does common sense tell you?
Explain.
* In a May 2006 Zogby America poll of 1000 adults, 70% said that past
efforts to enforce immigration laws have been inadequate. Based on this
poll, a 95% confidence interval for the proportion in the population who
feel this way is about 67% to 73%. If this poll had been based on 5000
adults instead, would the 95% confidence interval be wider or narrower
than the interval given? Explain.
Recall
Recall from previous chapters:
Parameter: fixed, unknown number that describes the population
Statistic: known value calculated from a sample; a statistic is used to
estimate a parameter
Sampling Variability: different samples from the same population may
yield different values of the sample statistic; estimates from samples
will be closer to the true values in the population if the samples are
larger
Margin of Error: in Chapter 3, a quick estimate was given by 1/√n,
where n is the sample size
More key words
The amount by which the proportion obtained from the sample (p̂)
will differ from the true population proportion (p) rarely exceeds the
margin of error
Sampling Distribution tells what values a statistic takes and how
often it takes those values in repeated sampling
Sample proportions (p̂) from repeated sampling would have a normal
distribution with a certain mean and standard deviation
Rule Conditions and Illustration
Take an SRS of size n from a large population that contains
proportion p of successes. Let p̂ be the sample proportion of
successes, [ i.e., p̂ = (count of successes in the sample)/n ]. If the
sample size n is large enough then the
sampling distribution of p̂ is approximately normal
mean of the sampling distribution is p
standard deviation of the sampling distribution is √(p(1 − p)/n)
Figure 21.1 Repeat many times the process of selecting an SRS of size n from a population in
which the proportion p are successes. The values of the sample proportion of successes p̂ have
this Normal sampling distribution.
Ex1 & Ex2 Binge drinking: We calculate that the sample proportion is p̂ = 279/2166 = 0.129. If
we assume that p = 0.13 then sd = √((0.13)(0.87)/2166) = 0.0072.
The 68-95-99.7 rule says that 95% of all samples of that size will yield a p̂ within the
interval from p − 2 sd = 0.13 − 0.0144 = 0.1156 to p + 2 sd = 0.13 + 0.0144 = 0.1444.
* Problem: We do not actually know the true proportion p . . .
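The binge-drinking numbers follow directly from the rule for sample proportions. The sketch below recomputes them; the function name `sampling_sd` is an illustrative choice, and the standard deviation is rounded to four decimals before doubling only to mirror the arithmetic shown on the slide.

```python
import math

def sampling_sd(p, n):
    """Standard deviation of the sample proportion for an SRS of size n."""
    return math.sqrt(p * (1 - p) / n)

p, n = 0.13, 2166
sd = sampling_sd(p, n)

# The slide rounds sd to 0.0072 before applying the 68-95-99.7 rule.
sd_rounded = round(sd, 4)
low, high = p - 2 * sd_rounded, p + 2 * sd_rounded

print(sd_rounded)                      # 0.0072
print(round(low, 4), round(high, 4))   # 0.1156 0.1444
```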
Empirical Rule
Figure 21.2 Repeat many times the process of selecting an SRS of size 2166 from a population
in which the proportion p = 0.13 are successes. The middle 95% of the values of the sample
proportion p̂ will lie between 0.1156 and 0.1444.
Figure 21.3 Repeated samples from the same population give different 95% confidence
intervals, but 95% of these intervals capture the true population proportion p.
For n sufficiently large p̂ is close to p
Figure 21.4 Twenty-five samples from the same population give these 95% confidence
intervals. In the long run, 95% of all such intervals cover the true population proportion,
marked by the vertical line.
Formula for a 95% Confidence Interval for the Population Proportion
Sample proportion plus or minus two standard deviations of the
sample proportion, p̂ ± 2 √(p(1 − p)/n)
Since we do not know the population proportion p (needed to
calculate the standard deviation) we will use the sample proportion p̂
in its place, p̂ ± 2 √(p̂(1 − p̂)/n)
Margin of Error
The margin of error is 2 √(p̂(1 − p̂)/n) ≤ 1/√n, the quick method of
Chapter 3
Confidence Interval
The formula for a “C-level (%) Confidence Interval” for the
population proportion is p̂ ± z* √(p̂(1 − p̂)/n), where z* is the critical
value of the standard normal distribution for confidence level C
Figure 21.5 Critical values z* of the Normal distributions. In any Normal distribution, there is
area (probability) C under the curve between -z* and z* standard deviations away from the
mean.
Confidence Level C    Critical Value z*
50%                   0.67
60%                   0.84
68.3%*                1*
70%                   1.04
80%                   1.28
90%                   1.64
95%*                  1.96
95.4%                 2*
99%                   2.58
99.7%*                3
99.9%                 3.29
Check table z-score
Binge drinking
Ex5 A 99% confidence interval: The SRS of size n = 2166 yields
p̂ = 279/2166 = 0.129, z* = 2.58 and √(p̂(1 − p̂)/n) = 0.0072, so
0.129 ± (2.58)(0.0072) = 0.129 ± 0.0186, i.e., from 11.04% to 14.76%
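The same interval can be computed in a few lines. This is a sketch, not the textbook's procedure verbatim: the function `proportion_ci` and the dictionary `Z_STAR` are names introduced here, and the critical values are copied from the table of critical values in this chapter.

```python
import math

# Critical values z* taken from the chapter's table (a small subset).
Z_STAR = {0.90: 1.64, 0.95: 1.96, 0.99: 2.58}

def proportion_ci(successes, n, level):
    """Interval p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = Z_STAR[level] * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(279, 2166, 0.99)
print(f"{low:.3f} to {high:.3f}")  # 0.110 to 0.147
```

The third decimal differs slightly from the slide only because the slide rounds p̂ and the standard deviation before multiplying.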
The Rule for Sample Means
The proportion is a particular “mean”, i.e., if a positive answer is valued 1
and a negative answer is valued 0 then the average value (or mean) is
indeed the proportion, p for the whole population and p̂ for the SRS. For
instance, we may phrase the question as “How strongly do you feel about this
particular issue?” and then take the answer in percent, say from 0% to 100%.
Analogously, we may ask about “something” that is measured in some
natural unit and then we have the answers as numerical values, which
are called numeric random variables.
As n becomes large, the law of large numbers says that the average x̄ of an SRS
approximates the mean µ of the whole population.
The central limit theorem says that the sampling distribution of x̄
is approximately (if n is large) a normal distribution, with mean µ
and standard deviation sd = σ/√n, where σ is the standard deviation of
the whole population.
Distribution of the mean
Figure 21.6 The sampling distribution of the sample mean x̄ of 10 observations compared with
the distribution of individual observations.
Margin of error for the mean
The C-level (%) confidence interval for the population mean µ is given by
either x̄ ± z* σ/√n or x̄ ± z* s/√n
where z* is the critical value of the standard normal distribution for confidence
level C. If the population standard deviation σ is unknown then the sample
standard deviation s is used
* “We are 95% confident that the mean resting pulse rate for the
population of all exercisers is between 62.8 and 69.2 bpm” (We feel that
plausible values for the population of exercisers’ mean resting pulse rate are
between 62.8 and 69.2.)
“This does not mean that 95% of all people who exercise regularly will
have resting pulse rates between 62.8 and 69.2 bpm”
Some Simulations
Figure 21.7 The distribution of a sample mean x̄ becomes more Normal as the size of the
sample increases. The distribution of individual observations (n = 1) is far from Normal. The
distributions of means of 2, 10, and finally 25 observations move closer to the Normal shape.
What is the meaning of Confidence?
* First, calculate the C-level (%) confidence interval from (sample) data with the formula
either p ± z* √(p(1 − p)/n) or µ ± z* σ/√n
where either the population proportion p could be replaced by the sample proportion p̂, or the
population mean µ and standard deviation σ could be replaced by the sample mean x̄ and
standard deviation s (if necessary), and z* is the critical value of the standard normal
distribution for confidence level C
* Next, a C-level (%) confidence means that the interval (calculated as
above) is guaranteed to capture the true (population) parameter (either
the proportion p or the mean µ) in C% of all samples.
* In other words, e.g., take C = 68%: if you take 100 samples and with
each of them you use the above formula to get a “confidence interval”
then approximately 68 of those samples will give confidence intervals
containing the (true) parameter (either proportion or mean, of the whole
population), i.e., 68% of the “confidence intervals” contain the (true)
population proportion or mean.
* In short, C is the chance (probability) that the one sample (we took!)
yields a confidence interval (calculated as above) containing the parameter.
* Statistically: 95% of all samples of size n = 29 from the population of
exercisers should yield a sample mean within two standard errors of the
population mean; i.e., in repeated samples, 95% of the confidence intervals
should contain the true population mean.
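The "C% of all samples" reading of confidence can be illustrated by simulation. The sketch below is only an illustration under assumed values (a population proportion p = 0.4, samples of size 400, 1000 repetitions, and a fixed random seed, none of which come from the slides): it draws many samples, builds a 95% interval from each, and counts how often the interval contains p.

```python
import math
import random

def coverage(p, n, z_star, repetitions, seed=0):
    """Fraction of simulated samples whose interval contains the true p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(repetitions):
        # One simulated SRS: count successes among n independent draws.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / repetitions

rate = coverage(p=0.4, n=400, z_star=1.96, repetitions=1000)
print(rate)  # expected to be close to 0.95; the exact value depends on the seed
```

Any single interval either contains p or it does not; the 95% describes the long-run behavior of the method over many samples.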
Inference (Ch23!)
The design of the data production matters. “Where do the data come
from?” remains the first question to ask in any statistical study. Any inference
method is intended for use in a specific setting. For our confidence interval and
test for a proportion p:
1. The data must be a simple random sample (SRS) from the population of
interest. When you use these methods, you are acting as if the data are
an SRS. In practice, it is often not possible to actually choose an SRS from the
population. Your conclusions may then be open to challenge.
2. These methods are not correct for sample designs more complex than an
SRS, such as stratified samples. There are other methods that fit these
settings.
3. There is no correct method for inference from data haphazardly collected
with bias of unknown size. Fancy formulas cannot rescue badly produced
data.
4. Other sources of error, such as dropouts and nonresponse, are important.
Remember that confidence intervals and tests use the data you give them
and ignore these practical difficulties.
Inference (Ch23!)(cont)
Know how confidence intervals behave. A confidence interval estimates the unknown
value of a parameter and also tells us how uncertain the estimate is. All confidence
intervals share these behaviors:
1. The confidence level says how often the method catches the true parameter in very
many uses. We never know whether this specific data set gives us an interval that
contains the true parameter. All we can say is that “we got this result from a
method that works 95% of the time.” This data set might be one of the 5% that
produce an interval that misses the parameter. If that risk bothers you, use a 99%
confidence interval.
2. High confidence is not free. A 99% confidence interval will be wider than a 95%
confidence interval based on the same data. There is a trade-off between how
closely we can pin down the parameter and how confident we are that we have
caught the parameter.
3. Larger samples give narrower intervals. If we want high confidence and a narrow
interval, we must take a larger sample. The length of our confidence interval for p
goes down in proportion to the square root of the sample size. To cut the interval
in half, we must take four times as many observations. This is typical of many
types of confidence interval.
Extra. . .
INFO: If the population standard deviation σ is unknown and the sample
size n is small (e.g., n ≤ 30) then the critical value should be obtained
from the “Student t distribution” instead of the normal distribution; it
is then called the critical value t*, and the number df = n − 1 is the
number of degrees of freedom.
This is generally ignored when estimating population proportions (as in
this course).
The following Table may be needed. . .
(http://www.math.wayne.edu/˜menaldi/teach/others/Sta1020/Table 21 1.pdf)
Extra. . . (cont)
Comment:
As mentioned earlier, it is better to take an SRS with size n as large as
possible. Now, which seems better to do with the data of an SRS of size
n = 10,000: (a) consider it as what it is, a simple random sample of size
n = 10,000, and calculate a 95%-level confidence interval, or (b)
re-evaluate and consider your data as 10 SRS of size n = 1000, calculate
95%-level confidence intervals for each of your 10 SRS and then average
those confidence intervals to get a final answer?
Discussion:
The difference between (a) and (b) is not in collecting a different kind of
data: the data are the same, simply arranged in two alternative
ways, and comparable calculations are performed.
Questions:
What basic argument (theory) is behind each procedure (a) and (b)?
When could “(b) be better than (a)” or “(a) be better than (b)”?
How about 100 SRS of size n = 1,000 or 1000 SRS of size n = 10?
Exercise Ch21
21.18 The quick method. The quick method of Chapter 3 (pages
42–43) uses p̂ ± 1/√n as a rough recipe for a 95% confidence interval for
a population proportion. The margin of error from the quick method is a
bit larger than needed. It differs most from the more accurate method of
this chapter when p̂ is close to 0 or 1. An SRS of 500 motorcycle
registrations finds that 68 of the motorcycles are Harley-Davidsons. Give a
95% confidence interval for the proportion of all motorcycles that are
Harleys by the quick method and then by the method of this chapter. How
much larger is the quick-method margin of error?
Exercise (answer) Ch21
**Answers
The quick method. By the quick method, the margin of error is 1/√n,
i.e., 1/√500 = 0.0447. Because p̂ = 68/500 = 0.136, the margin of error
from the method of this chapter is z* √(p̂(1 − p̂)/n), i.e.,
2 √((0.136)(0.864)/500) = 0.0307
or
1.96 √((0.136)(0.864)/500) = 0.0300
The quick method margin of error is nearly 1.5 times larger than necessary.
Multiple choice Ch21
A recent Gallup Poll asked, “Do you consider the amount of federal
income tax you have to pay as too high, about right, or too low?” 52% of
the sample answered “Too high.” Gallup says that: “For results based on
the sample of national adults (n=1,021) surveyed April 6-9, 2008, the
margin of sampling error is 3 percentage points.”
1. The poll was carried out by telephone, so people without phones are
always excluded from the sample. Any errors in the final result due to
excluding people without phones (a) are included in the announced
margin of error. (b) are in addition to the announced margin of error.
(c) can be ignored, because these people are not part of the
population. (d) can be ignored, because this is a nonsampling error.
Answer: (b)
2. If Gallup had used an SRS of size n=1021 and obtained the sample
proportion p̂ = 0.52, you can calculate that the margin of error for
95% confidence would be (a) ±1.6 percentage points. (b) ±0.05
percentage points. (c) ±3.0 percentage points. (d) ±3.1 percentage
points.
Answer: (d)
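The margins of error in Exercise 21.18 and in the second Gallup question can be compared numerically. This is a minimal sketch; the function names `quick_margin` and `margin` are introduced here for illustration only.

```python
import math

def quick_margin(n):
    """Quick method of Chapter 3: margin of error 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

def margin(p_hat, n, z_star=1.96):
    """Margin of error z* sqrt(p_hat (1 - p_hat) / n) from this chapter."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

# Exercise 21.18: 68 Harley-Davidsons among 500 registrations.
p_hat = 68 / 500
quick = quick_margin(500)
with_z2 = margin(p_hat, 500, z_star=2)
with_z196 = margin(p_hat, 500)

print(f"{quick:.4f}")            # 0.0447
print(f"{with_z2:.4f}")          # 0.0307
print(f"{with_z196:.4f}")        # 0.0300
print(f"{quick / with_z2:.2f}")  # 1.46

# Gallup question 2: SRS of size 1021 with p_hat = 0.52, 95% confidence.
gallup = margin(0.52, 1021)
print(f"{100 * gallup:.1f} percentage points")  # 3.1 percentage points
```

The quick method is most wasteful here because p̂ = 0.136 is far from 0.5; for the Gallup poll, with p̂ near 0.5, the two methods nearly agree.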
Chapter 22
Ch22 - What is a Test of Significance?
Previously . . . Ex1 Is the coffee fresh?
Matched pairs Experiment: . . . Each of the 50 subjects tastes two
unmarked cups of coffee and says which he or she prefers. One cup in each
pair contains instant coffee and the other, fresh-brewed coffee. We find
that 36 of our 50 subjects choose the fresh coffee, i.e., p̂ = 36/50 = 0.72.
The formula for a “C-level (%) Confidence Interval” for the population
proportion is p̂ ± z* √(p̂(1 − p̂)/n), where z* is the critical value of the
standard normal distribution for confidence level C, see Table 21.1.
At the 99%-level we find z* = 2.58 and the Margin of Error is
±z* √(p̂(1 − p̂)/n) = ±(2.58) √((0.72)(1 − 0.72)/50) = ±0.164 and the
Confidence Interval is from 0.72 − 0.16 = 0.56 to 0.72 + 0.16 = 0.88.
(at 95% we find z* = 1.96, so MoE = ±0.124 and CI = [0.60, 0.84]).
• What is the rational argument for accepting or rejecting the claim that
the population proportion is p = 0.5?
• What is the probability that the confidence interval [0.56, 0.88] captures the
true population proportion p?
Ex1 Is the coffee fresh?
Matched pairs Experiment: . . . Each of the 50 subjects tastes two
unmarked cups of coffee and says which he or she prefers. One cup in each
pair contains instant coffee and the other, fresh-brewed coffee. We find
that 36 of our 50 subjects choose the fresh coffee, i.e., p̂ = 36/50 = 0.72.
The claim. The skeptic claims that coffee drinkers cannot tell fresh from
instant, so that only half will choose fresh-brewed coffee, i.e., the
population proportion p is only 0.5.
If this claim is true, the sampling distribution of p̂ is approx. normal with
mean p = 0.5 and sd = √(p(1 − p)/n) = √((0.5)(0.5)/50) = 0.0707.
The data. In our SRS we got p̂ = 0.72, i.e., 72%, but in another SRS we
could find p̂ = 0.56, i.e., 56%, or any other value! Do we have evidence
against the claim?
The Probability. We can measure the strength of the evidence against the claim
by a probability, i.e., “What is the probability that a sample gives p̂ this large or
larger if the truth about the population is that p = 0.5?”
Ex1 Sampling distribution
Figure 22.2 The sampling distribution of the proportion of 50 coffee drinkers who prefer
fresh-brewed coffee if the truth about all coffee drinkers is that 50% prefer fresh coffee.
The shaded area is the probability that the sample proportion is 56% or greater.
Ex1 Is the coffee fresh? (cont)
The Probability. We can measure the strength of the evidence against the claim
by a probability, i.e., “What is the probability that a sample gives p̂ this large or
larger if the truth about the population is that p = 0.5?”.
Our sample actually gave p̂ = 0.72, and the probability of getting a
sample outcome this large (or larger) is only 0.001, i.e., this may happen
just by chance 1 out of 1000 times. We may declare this good
evidence (that the claim is false). If our p̂ were equal to 0.56 then our
probability would be 0.2, i.e., 2 out of 10 times, not really evidence to
reject the claim.
Be sure to understand why this is convincing evidence. There are two
possible explanations of the fact that 72% of our subjects prefer fresh to
instant coffee:
The skeptic is correct (p = 0.5), and by bad luck a very unlikely
outcome occurred
In fact, the population proportion is greater than 0.5 (p > 0.5), so
that the outcome is about what would be expected
Thought Questions. . .
The defendant in a court case is either guilty or innocent. Which of these
is assumed to be true when the case begins? The jury looks at the
evidence presented and makes a decision about which of these two options
appears more plausible. Depending on this decision, what are the two
types of errors that could be made by the jury? Which is more serious?
.......................................................................
Suppose 60% (0.60) of the population are in favor of new tax legislation.
A random sample of 265 people results in 175, or 66%, who are in favor.
From the Rule for Sample Proportions, we know the potential sample
proportions in this situation follow an approximately normal distribution,
with a mean of 0.60 and a standard deviation of 0.03. Find the standard
score for the observed value of 0.66; then find the probability of observing
a standard score at least that large or larger.
Thought Questions. . . (cont)
Sampling Distribution of p̂: mean = 0.60, standard deviation = 0.03
(p = 0.60, p̂ = 0.66)
z = (0.66 − 0.60)/0.03 = 2.0
P-value = 1 − 0.9773 = 0.0227
Suppose that in the previous question we do not know for sure that the proportion of
the population who favor the new tax legislation is 60%. Instead, this is just the claim
of a politician. From the data collected, we have discovered that if the claim is true,
then the sample proportion observed falls at the 97.73 percentile (about the 98th
percentile) of possible sample proportions for that sample size. Should we believe the
claim and conclude that we just observed strange data, or should we reject the claim?
What if the result fell at the 85th percentile? At the 99.99th percentile?
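The standard score and the upper-tail probability for the tax-legislation question can be reproduced without a table. This sketch computes the standard normal percentile from the error function in Python's `math` module; the helper name `normal_cdf` is an assumption of this example, not course notation.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Observed proportion 0.66, claimed mean 0.60, standard deviation 0.03.
z = (0.66 - 0.60) / 0.03
p_value = 1 - normal_cdf(z)

print(round(z, 2), round(p_value, 4))  # 2.0 0.0228
```

The slide's 0.0227 comes from rounding the table percentile to 0.9773 before subtracting; the two values agree to within table accuracy.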
Hypotheses and P-values
A test of significance begins by supposing that the effect we seek is not
present. Then we look for “statistical” evidence against this supposition
and in favor of the effect we hope to find
The claim being tested in a statistical test is called the null hypothesis
H0. The test is designed to assess the strength of the evidence
against the null hypothesis. Usually, the null hypothesis is a
statement of “no effect” or “no difference”, which is translated into
something relative to the proportion p (or the mean µ, or standard
deviation σ) of an entire population. What we hope or suspect is true
instead of H0 is called the alternative hypothesis Ha.
The probability, computed assuming that H0 is true, that the SRS
outcome would be as extreme or more extreme than the actual
observed outcome is called the P-value of the test. The smaller the
P-value is, the stronger is the evidence against H0.
Typical examples are (H0: p = p0) and either (Ha: p ≠ p0) or
(Ha: p > p0) or (Ha: p < p0) for the alternative hypothesis.
For instance, in Ex1, we used (H0: p = 0.5) with (Ha: p > 0.5), because
we have discarded the possibility (Ha: p < 0.5) a priori
Ex2 Count Buffon’s coin
In Ex2, the French naturalist Count Buffon tossed a coin 4040 times and
got 2048 heads, i.e., the sample proportion is p̂ = 2048/4040 = 0.507. We
ask: “Is this evidence that Buffon’s coin was not balanced?”.
Figure 22.3 The sampling distribution of the proportion of heads in 4040 tosses of a balanced
coin. Count Buffon’s result, proportion 0.507 heads, is marked.
Count Buffon’s coin (cont)
We translate this into a null hypothesis (H0: p = 0.5) and the alternative
hypothesis (Ha: p ≠ 0.5).
If the null hypothesis is true then p = 0.5 and the sampling
sd = √(p(1 − p)/n) = √((0.5)(0.5)/4040) = 0.00787.
** Now, for p̂ = 0.507 we get a P-value 0.37, i.e., a truly balanced coin
would give a result this far or farther from 0.5 in 37% of all repetitions of
Buffon’s trial. This test gives no reason to think that his coin was not
balanced.
Figure 22.4 The P-value for testing whether Count Buffon’s coin was balanced. This is the
probability, calculated assuming a balanced coin, of a sample proportion as far or farther from
0.5 as Buffon’s result of 0.507.
The P-value or observed significance level of a test of hypotheses is the
smallest value of α (the significance level) for which H0 (null hypothesis) can
be rejected. The P-value measures the strength of evidence against H0.
If the P-value is as small as or smaller than α, we say that the data are
statistically significant at the level α.
“Significant” in the statistical sense does not mean “important”, it means
not likely to happen just by chance. Use a table (and your logic) to find
the P-value.
Ex3 Testing Coffee
For Ex3 (Testing Coffee) the null hypothesis is (H0: p = 0.5) and the
alternative hypothesis is (Ha: p > 0.5).
If the null hypothesis is true then p̂ follows (approx.) a Normal distribution
with mean 0.5 and standard deviation 0.0707.
The data yield p̂ = 0.72, which gives a standard score
z = (0.72 − 0.5)/0.0707 = 3.1, and the table (check table here!) gives a
P-value 0.001.
Since the P-value is small, these data provide very strong evidence that a
majority of the population prefers fresh coffee
P-values
When the alternative hypothesis includes a greater than symbol
(Ha: p > p0), the P-value is the probability of getting a value as large
or larger than the observed test statistic (z) value: Look up the
percentile for the value of z in the standard normal table (Table B),
the P-value is 1 minus this probability
When the alternative hypothesis includes a less than symbol
(Ha: p < p0), the P-value is the probability of getting a value as
small or smaller than the observed test statistic (z) value: Look up
the percentile for the value of z in the standard normal table (Table
B), the P-value is this probability
When the alternative hypothesis includes a not equal to symbol
(Ha: p ≠ p0), the P-value is found as follows: Make the value of the
observed test statistic (z) positive (absolute value), look up the
percentile for this positive value of z in the standard normal table
(Table B), find 1 minus this probability, and double the answer to get
the P-value
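The three table-lookup rules can be collected into one small function. This is a sketch under the assumption that the test statistic z is already a standard score; the names `normal_cdf` and `p_value`, and the labels "greater", "less" and "two-sided", are choices made for this example. The percentile is computed with `math.erfc` in place of Table B.

```python
import math

def normal_cdf(z):
    """Percentile (area to the left of z) of the standard normal curve."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def p_value(z, alternative):
    """P-value of a z test for the given form of the alternative Ha."""
    if alternative == "greater":      # Ha: p > p0
        return 1 - normal_cdf(z)
    if alternative == "less":         # Ha: p < p0
        return normal_cdf(z)
    if alternative == "two-sided":    # Ha: p != p0
        return 2 * (1 - normal_cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

# Coffee tasting: p_hat = 0.72, p0 = 0.5, sd = 0.0707, one-sided.
coffee = p_value((0.72 - 0.5) / 0.0707, "greater")
# Buffon's coin: p_hat = 0.507, p0 = 0.5, sd = 0.00787, two-sided.
buffon = p_value((0.507 - 0.5) / 0.00787, "two-sided")

print(round(coffee, 3))  # 0.001
print(round(buffon, 2))  # 0.37
```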
P-values (alt)
Alternative Method for P-value
1. Make the value of the observed test statistic (z) negative
2. Look up the percentile for this negative value of z in the standard
normal table (Table B)
Now
If the alternative hypothesis includes a greater than (Ha: p > p0) or
less than (Ha: p < p0) symbol, the P-value is this probability found
as “percentile” in step 2
If the alternative hypothesis includes a not equal to (Ha: p ≠ p0)
symbol, double this probability found as “percentile” in step 2 to get
the P-value
There are other ways, use your logic and the fact that the total area under
any distribution must be equal to 1
Ex4 Checkbook
The National Assessment of Adult Literacy (NAAL) survey indicates that a score
of 289 or higher on its quantitative test reflects skills that include those needed to
balance a checkbook. An SRS of size n = 2001 of young men (aged 19 to 24) had
mean score x̄ = 279, with a standard deviation s = 103.
The pessimist’s claim is that the mean NAAL score is less than 289. That is
our alternative hypothesis (why not the H0?), the statement we seek
evidence for. Thus (H0: µ = 289) and (Ha: µ < 289).
If the null hypothesis is true, µ = 289, then the sample mean x̄ follows
(approx.) a Normal distribution with mean µ = 289 and standard deviation
σ/√n; approximating the unknown σ with s, we find
s/√n = 103/√2001 = 2.3.
The data gave x̄ = 279, which yields a standard score
z = (279 − 289)/(2.3) = −4.35 and so, the P-value is equal to 0.0000068,
very small.
Hence, our conclusion is to reject the null hypothesis, i.e., these data give
strong evidence that the mean score for all young men (aged 19 to 24) is
below the level that includes the skills necessary to balance a checkbook.
Checkbook (cont)
Figure 22.5 The P-value is 0.0000068, for a one-sided test when the standard score for the sample mean is −4.35.
Ex5: Executives’ blood pressures:
n = 72, x̄ = 126.1, s = 15.2
(H0: µ = 128), with (Ha: µ ≠ 128)
s/√n = 15.2/√72 = 1.79
The P-value is 0.289, for a two-sided test when
the standard score for the sample mean is
(126.1 − 128)/(1.79) = −1.06
Now it’s your turn . . . Read Case Study evaluated
Procedure
The Five Steps of Hypothesis Testing
1. Determining the Two Hypotheses
2. Computing the Sampling Distribution
3. Collecting and Summarizing the Data (calculating the observed test
statistic)
4. Determining How Unlikely the Test Statistic is if the Null Hypothesis
is True (calculating the P-value)
5. Making a Decision/Conclusion (based on the P-value, is the result
statistically significant?)
Possible Null Hypothesis H0: population parameter equals some value,
status quo, no relationship, no change, no difference in two groups, etc.
The logical Alternative Hypothesis Ha is “NOT H0”
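The five steps can be traced with the executives' blood pressure data of Ex5. This is a sketch only: the cut-off α = 0.05 is an assumed significance level (the example itself does not fix one), and `normal_cdf` is a helper written for this illustration.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Step 1: hypotheses H0: mu = 128 versus Ha: mu != 128 (two-sided).
mu0 = 128

# Step 2: under H0 the sample mean is approx. Normal with mean mu0 and
# standard deviation s / sqrt(n) (s stands in for the unknown sigma).
n, x_bar, s = 72, 126.1, 15.2
se = s / math.sqrt(n)

# Step 3: summarize the data as a standard score.
z = (x_bar - mu0) / se

# Step 4: two-sided P-value, doubling the percentile of -|z|.
p_value = 2 * normal_cdf(-abs(z))

# Step 5: decision at the assumed level alpha = 0.05.
alpha = 0.05
significant = p_value <= alpha

print(round(se, 2), round(z, 2), round(p_value, 3), significant)
# 1.79 -1.06 0.289 False
```

Replacing the numbers with n = 2001, x̄ = 279, s = 103 and µ0 = 289, and using the one-sided percentile `normal_cdf(z)` in step 4, gives the checkbook result of Ex4.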
Decision
Null: (H0 : p = p0 )
Alternative, one-sided: (Ha : p > p0 ) or (Ha : p < p0 ), where one of
these possibilities is ruled out in advance as a known fact
Alternative, two-sided: (Ha : p ≠ p0 )
Sampling Distribution for Proportions: If numerous simple random
samples of size n are taken, the sample proportions p̂ from the various
samples will have an approximately normal distribution with mean equal to
p (the population proportion) and standard deviation equal to
sd = √(p(1 − p)/n). Since we assume the null hypothesis is true, we replace p
with p0 to complete the test.
To determine if the observed proportion is unlikely to have occurred under
the assumption that H0 is true, we must first convert the observed value
to a standard score z = (p̂ − p0 )/sd
We find the P-value associated with the standard score obtained from the
data.
Procedure (cont)
If we think the P-value is too low to believe the observed test statistic
is obtained by chance only, then we would reject chance (reject the
null hypothesis) and conclude that a statistically significant
relationship exists (accept the alternative hypothesis)
Otherwise, we fail to reject chance and do not reject the null
hypothesis of no relationship (result not statistically significant)
Commonly, P-values less than 0.05 are considered to be small enough to
reject chance (reject the null hypothesis). However, some researchers use
0.10 or 0.01 as the cut-off instead of 0.05. This “cut-off” value is typically
referred to as the significance level α of the test
The P-value is the probability, computed assuming the null hypothesis is true,
of obtaining data as or more extreme than those observed; it is not the
probability that the null hypothesis is true. Because our objective is to reject
the null hypothesis (i.e., to disprove the null hypothesis), small P-values are desired.
A Survey
Parental Discipline: Nationwide random telephone survey of 1,250 adults, where
474 respondents had children under 18 living at home. The results on behavior are
based on the smaller sample. The reported margin of error was “3% for the full
sample” and “5% for the smaller sample”.
“The 1994 survey marks the first time a majority of parents reported not having
physically disciplined their children in the previous year. Figures over the past six
years show a steady decline in physical punishment, from a peak of 64 percent in
1988”.
The 1994 sample proportion who did not spank or hit was 51%.
Question: Is this evidence that a majority of the population did not spank or hit?
Null: The proportion of parents who physically disciplined their children in
the previous year is the same as the proportion p of parents who did not
physically discipline their children, i.e., (H0 : p = 0.5)
Alt: A majority of parents did not physically discipline their children in the
previous year, i.e., (Ha : p > 0.5)
A Survey (cont)
Based on the sample:
Sample size n = 474 (large, so proportions follow normal distribution)
No physical discipline: 51%, i.e., p̂ = 0.51
s.d. of p̂ is √((0.50)(1 − 0.50)/474) = 0.023 (recall we assume
H0 : p = 0.5 true)
Standard score z = (0.51 − 0.50)/0.023 = 0.43
Table B lists standard scores to one decimal place, so take the nearest entry
(0.4) ↦ (65.54%); the P-value is about 1 − 0.6554 = 0.3446
Since the P-value (0.3446) is not small, we cannot reject chance as
the reason for the difference between the observed proportion (0.51)
and the (null) hypothesized proportion (0.50).
We do not find the result to be statistically significant at α = 0.01 (or
even 0.05 or 0.10)
We fail to reject the null hypothesis. It is plausible that there was not a
majority (over 50%) of parents who refrained from using physical
discipline.
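The survey calculation can be checked without a printed table. This is a minimal sketch, assuming the one-sided alternative (Ha : p > 0.5) and the normal approximation from the slides; `z_test_proportion` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(p_hat, p0, n):
    """One-sided (Ha: p > p0) z test for a proportion, normal approximation."""
    sd = sqrt(p0 * (1 - p0) / n)          # s.d. of p-hat, computed under H0
    z = (p_hat - p0) / sd
    p_value = 1 - NormalDist().cdf(z)     # area to the right of z
    return sd, z, p_value


# Parental discipline survey: n = 474, p-hat = 0.51, H0: p = 0.5
sd, z, p_value = z_test_proportion(p_hat=0.51, p0=0.50, n=474)
print(f"sd = {sd:.3f}, z = {z:.2f}, P-value = {p_value:.2f}")
```

The exact normal curve gives a P-value of roughly 0.33, slightly below the table-based 0.3446, because the printed table only resolves standard scores to the nearest tenth. The conclusion (fail to reject H0) is the same either way.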
Errors
Hypothesis Testing: Significance level (α) and Power (1 − β)
Decisions          Reject H0            Accept H0
H0 is correct      Type I Error (α)     Correct (1 − α)
H0 is incorrect    Correct (1 − β)      Type II Error (β)
Type I: If we decide there is a relationship in the population (reject null
hypothesis)
This is an incorrect decision only if the null hypothesis is true
The probability of this incorrect decision is equal to the cut-off α for
the P-value
Type II: If we decide not to reject chance and thus allow for the
plausibility of the null hypothesis (complicated to estimate!)
This is an incorrect decision only if the alternative hypothesis is true
The probability of this incorrect decision depends on (a) the
magnitude of the true relationship, (b) the sample size, (c) the cut-off
for the P-value.
Mean
The population proportion p could be replaced by a population mean µ
when setting up the two hypotheses
Null: (H0 : µ = µ0 )
Alternative, one-sided: (Ha : µ > µ0 ) or (Ha : µ < µ0 ), where one of
these possibilities is ruled out in advance as a known fact
Alternative, two-sided: (Ha : µ ≠ µ0 )
As before, if numerous simple random samples of size n are taken, the
sample means from the various samples will have an approximately normal
distribution with mean equal to µ (the population mean) and standard
deviation equal to sd = σ/√n. Here we approximate the population
standard deviation σ with the sample standard deviation s (i.e., note the
factor 1/√n between s and the standard deviation sd of the sampling
distribution of the sample means)
Tomato plants
A study showed that the difference in sample means for the heights of
tomato plants when using a nutrient rich potting soil versus using ordinary
top soil was 6.82 inches. The corresponding standard deviation (of the
sampling distribution of the mean difference) was 3.10 inches. Suppose the
means are actually equal, so that the mean difference in heights for the
populations is actually zero.
* What is the standard score (z) corresponding to the observed difference
of 6.82 inches?
* How often would you expect to see a standardized score that large or
larger?
.......................................................................
[standard score]: [(sample mean diff.) - (population mean diff.)] divided by
[standard deviation of the mean difference], i.e., z = (6.82 − 0)/3.10 = 2.2
This is the 98.61 percentile for a standard normal curve, so the probability
of seeing a z-value this large or larger is 1.39% (i.e., 0.0139).
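The tomato-plant answer can be reproduced in a few lines. A small sketch, assuming a standard normal sampling distribution as on the slide; the variable names are illustrative.

```python
from statistics import NormalDist

observed_diff = 6.82   # difference in sample mean heights (inches)
null_diff = 0.0        # mean difference if H0 is true
sd_diff = 3.10         # s.d. of the sampling distribution of the difference

z = (observed_diff - null_diff) / sd_diff
percentile = NormalDist().cdf(z)   # area to the left of z
upper_tail = 1 - percentile        # chance of a z-value this large or larger

print(f"z = {z:.1f}, percentile = {percentile:.2%}, P(Z >= z) = {upper_tail:.4f}")
```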
Bacteria
One of the conclusions made by researchers from a study comparing the
amount of bacteria in carpeted and uncarpeted rooms was, “The average
difference [in mean bacteria colonies per cubic foot] was 3.48 colonies [95%
Confidence Interval: between (−2.72) and (9.68), and P-value: (0.29)].”
* What are the null and alternative hypotheses being tested here?
* Is there a statistically significant difference between the means of the
two groups?
......................................................................
H0 : The mean number of bacteria for carpeted rooms is equal to the
mean number of bacteria for uncarpeted rooms.
Ha : The mean number of bacteria for carpeted rooms is different
from the mean number of bacteria for uncarpeted rooms.
P-value is large (> .05), so there is not a significant difference (fail to
reject the Null hypothesis) (Note that the confidence interval for the
difference contains 0)
Inference (Ch23!)
Know what statistical significance says. Many statistical studies hope to show that some claim
is true. A clinical trial compares a new drug with a standard drug because the doctors hope that
patients given the new drug will do better. A psychologist studying gender differences suspects
that women will do better than men (on the average) on a test that measures social-networking
skills. The purpose of significance tests is to weigh the evidence that the data give in favor of
such claims. That is, a test helps us know if we found what we were looking for.
To do this, we ask what would happen if the claim were not true. That’s the null hypothesis (no
difference between the two drugs, no difference between women and men). A significance test
answers only one question: “How strong is the evidence that the null hypothesis is not true?” A
test answers this question by giving a P-value. The P-value tells us how unlikely data as or more
extreme than ours (in the sense of providing evidence against the null hypothesis) would be if
the null hypothesis were true. Data that are very unlikely are good evidence that the null
hypothesis is not true. We usually don’t know whether the hypothesis is true for this specific
population. All we can say is that “data as or more extreme than these would occur only 5% of
the time if the hypothesis were true.”
This kind of indirect evidence against the null hypothesis (and for the effect we hope to find) is
less straightforward than a confidence interval.
Know what your methods require. Significance tests and confidence intervals for a proportion p
require that the population be much larger than the sample. They also require that the sample
itself be reasonably large so that the sampling distribution of the sample proportion p̂ is close to
Normal. We have said little about the specifics of these requirements because the reasoning of
inference is more important. Just as there are inference methods that fit stratified samples,
there are methods that fit small samples and small populations.
INFO: Sometimes, the alternative hypothesis Ha is denoted by H1 .
Example: Finding Sample Size Required to Achieve 80% Power. Here is a
statement similar to the one in an article from the Journal of the American
Medical Association: “The trial design assumed that with a 0.05
significance level, 153 randomly selected subjects would be needed to
achieve 80% power to detect a reduction in the coronary heart disease rate
from 0.5 to 0.4.” Before conducting the experiment, the researchers
selected a significance level of 0.05 and a power at least 80%. They also
decided that a reduction in the proportion of coronary heart disease from
0.5 to 0.4 is an important difference that they want to detect (by correctly
rejecting the false null hypothesis). Using a significance level of 0.05,
power 0.80, and the alternative proportion of 0.4, we deduce that the
required minimum sample size is 153.
For more on the power of a test, check Wikipedia.
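The slide does not say how the figure of 153 is obtained. One common textbook approach, shown here only as a sketch, is the normal-approximation sample-size formula for a one-sided test of a single proportion; both the one-sided choice and the formula itself are assumptions, not something stated in the quoted article.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_one_proportion(p0, p1, alpha=0.05, power=0.80):
    """Smallest n for a one-sided z test of H0: p = p0 to detect p = p1.

    Assumes the normal approximation and a one-sided alternative.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # critical value for the test
    z_beta = NormalDist().inv_cdf(power)        # quantile matching the power
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / abs(p1 - p0)) ** 2)


# Reduction in coronary heart disease rate from 0.5 to 0.4
n = sample_size_one_proportion(p0=0.5, p1=0.4, alpha=0.05, power=0.80)
print(n)
```

Under these assumptions the formula returns 153, in agreement with the statement on the slide. A two-sided test or a different approximation would give a different n.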
Exercise Ch22
22.26 Do chemists have more girls? Some people think that chemists
are more likely than other parents to have female children. (Perhaps
chemists are exposed to something in their laboratories that affects the sex
of their children.) The Washington State Department of Health lists the
parents’ occupations on birth certificates. Between 1980 and 1990, 555
children were born to fathers who were chemists. Of these births, 273 were
girls. During this period, 48.8% of all births in Washington State were
girls. Is there evidence that the proportion of girls born to chemists is
higher than the state proportion?
Extra. . .
Exercise (answer) Ch22
**Answers
Do chemists have more girls? Our hypotheses are H0 : p = 0.488 and
Ha : p > 0.488, where p is the proportion of girls among children born to
chemists. If the null hypothesis is true, then the proportion of girls in an
SRS of n = 555 chemists’ children would have (approximately) a normal
distribution with mean p = p0 = 0.488 and standard deviation
√(p(1 − p)/n) = √((0.488)(0.512)/555) = 0.02122
Our sample had p̂ = 273/555 = 0.4919, for which the standard score is
z = (p̂ − p0 )/0.02122 = (0.4919 − 0.488)/0.02122 = 0.18
From Table B [> find (0.2) ↦ (57.93%) and (0.1) ↦ (53.98%) so take
percentile 57.93% and 1 − 0.5793 = 0.4207 <], we estimate the P-value
to be about 0.42 (calculator/better table output gives P = 0.4272). Thus,
we cannot reject the null hypothesis (not enough evidence!)
Multiple choice Ch22
If the value of the standard test statistic z is 2.5 then (a) we should use a
different null hypothesis. (b) we reject the null hypothesis at the 5%
significance level. (c) we fail to reject the null hypothesis at the 5%
significance level. (d) we reject the alternative hypothesis at the 5%
significance level.
Answer: (b)
.......................................................................
If a significance test gives a P-value of 0.50 then (a) the margin of error is
0.50. (b) the null hypothesis is very likely to be true. (c) we do not have
good evidence against the null hypothesis. (d) we do have good evidence
against the null hypothesis.
Answer: (c)
Ch24 - Two-way Tables and the Chi-Square Test
Chapter 24
Thought Questions. . .
A random sample of registered voters were asked whether they preferred
balancing the budget or cutting taxes. Each was then categorized as being
either a Democrat or a Republican. Of the 30 Democrats, 12 preferred
cutting taxes, while of the 40 Republicans, 24 preferred cutting taxes.
How would you display the data in a table?
....................................................................................
                      Democrats   Republican   Total
Prefer Tax Cutting       12           24         36
Do not Prefer TxC        18           16         34
Total                    30           40         70
To describe relationships among categorical variables, calculate appropriate
percentages from the counts given.
Ex1 & Ex2 Two-way Tables
When there are two categorical variables, the data are summarized in a
two-way table
each row represents a value of the row variable
each column represents a value of the column variable
* The number of observations falling into each combination of categories is entered into
each cell of the table.
* Relationships between categorical variables are described by calculating appropriate
percents from the counts given in the table (prevents misleading comparisons due to
unequal sample sizes for different groups)
A university offers only two degree programs, one in electrical engineering
and one in English
*Admission Status is the row variable
*Gender is the column variable
          Male   Female   Total
Admit      35      20       55
Deny       45      40       85
Total      80      60      140
* (% male)=80/140 = 0.57, i.e., 57%
* (% female)=60/140 = 0.43, i.e., 43%
Discrimination in admission? Because there are only two categories of
admission status, we can see the relation between gender and admission
status by comparing the
(percentage male applicants admitted) = 35/80 = 0.44, i.e., 44%
(percentage female applicants admitted) = 20/60 = 0.33, i.e., 33%
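The percentages for both two-way tables follow the same recipe: divide each count by its column total. A minimal sketch, where the dictionary layout and the name `column_percents` are illustrative choices, not anything prescribed by the textbook:

```python
def column_percents(table):
    """Given {row label: {column label: count}}, return each count as a
    fraction of its column total."""
    columns = next(iter(table.values())).keys()
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {r: {c: row[c] / totals[c] for c in columns} for r, row in table.items()}


# Ex1: admission status (rows) by gender (columns)
admissions = {
    "Admit": {"Male": 35, "Female": 20},
    "Deny": {"Male": 45, "Female": 40},
}
# Thought question: tax preference (rows) by party (columns)
voters = {
    "Prefer Tax Cutting": {"Democrats": 12, "Republican": 24},
    "Do not Prefer TxC": {"Democrats": 18, "Republican": 16},
}

admit_rates = column_percents(admissions)["Admit"]
tax_cut_rates = column_percents(voters)["Prefer Tax Cutting"]
print({k: f"{v:.0%}" for k, v in admit_rates.items()})
print({k: f"{v:.0%}" for k, v in tax_cut_rates.items()})
```

Comparing percentages within each column, rather than raw counts, is what prevents the unequal group sizes (80 men versus 60 women, 30 Democrats versus 40 Republicans) from distorting the comparison.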
Ex3 Treating cocaine addiction
. . . A three-year study compared an antidepressant (desipramine) with
lithium, and a placebo. 72 subjects were randomly divided into 3 groups
(each having 24 subjects) and assigned to each treatment
Group   Treatment     Subjects   Successes   Percent
1       Desipramine      24         14        58.3%
2       Lithium          24          6        25.0%
3       Placebo          24          4        16.7%
Are these data good evidence that there is a relationship between
treatment and outcome in the population of all cocaine addicts?
To answer this question we begin with a two-way table
               Success   Failure   Total
Desipramine      14        10       24
Lithium           6        18       24
Placebo           4        20       24
Ex3 Bar graph
Figure 24.1 Bar graph comparing the success rates of three treatments for cocaine addiction
Chi-square test
Our null hypothesis takes the form
H0 : There is no association between treatment and success in the
population of all cocaine addicts
In a two-way table when H0 is true we compute
(expected count) = (row total)×(column total)/(table total)
e.g., the expected count of successes in the desipramine group is
(24)(24)/72 = 8, namely, if the null hypothesis of no treatment differences
is true then we expect 8 of the 24 desipramine subjects to succeed
The chi-square statistic, denoted by χ², is a measure of how far the
observed counts in a two-way table are from the expected counts
χ² = Σ [(observed count) − (expected count)]² / (expected count)
where Σ means “sum over all cells in the table”
Ex4 Cocaine addiction (cont)
Here are the observed and expected counts
               Observed              Expected
               Success   Failure     Success   Failure
Desipramine      14        10           8        16
Lithium           6        18           8        16
Placebo           4        20           8        16
Finding the chi-square statistic, adding 6 terms for the 6 cells in the
two-way table (note that all “failure” values are obtainable from the “success”
values)
χ² = (14 − 8)²/8 + (10 − 16)²/16 + · · · + (4 − 8)²/8 + (20 − 16)²/16
= 4.50 + 2.25 + · · · + 2.00 + 1.00 = 10.50
Now it’s your turn: Smoking and survival . . .
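The expected counts and the chi-square statistic for the cocaine study can be computed directly from the observed table. A minimal sketch in plain Python; the function names `expected_counts` and `chi_square` are illustrative, and the table is stored as a list of rows (treatments) with success and failure counts.

```python
def expected_counts(observed):
    """Expected count for each cell: (row total)(column total)/(table total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: desipramine, lithium, placebo; columns: success, failure
cocaine = [[14, 10], [6, 18], [4, 20]]
print(expected_counts(cocaine))
print(chi_square(cocaine))
```

For this table every row has expected counts of 8 successes and 16 failures, and the six terms add up to 10.5, as on the slide.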
Chi-square distribution
Figure 24.2 The density curves for three members of the chi-square family of distributions. The sampling distributions of chi-square statistics belong to this family
The chi-square statistic is a measure of the distance of the observed
counts from the expected counts
is always zero or positive, and its distribution is skewed to the right
is only zero when the observed counts are exactly equal to the
expected counts
large values of χ² are evidence against H0 because these would show
that the observed counts are far from what would be expected if H0
were true
the chi-square test is one-sided (any violation of H0 produces a large
value of χ²)
A specific χ² distribution requires knowing the degrees of freedom (df for
short), computed as (r − 1)(c − 1) for a two-way table with r rows
and c columns
** In a two-way table
df = (r − 1)(c − 1)
** There are r × c terms in the Σ, where
r = (number of rows),
c = (number of columns)
Chi-square table
Ex5 Using chi-square test
Back to Ex3, the two-way table has 3 treatments and 2 outcomes, i.e., it
has r = 3 rows and c = 2 columns.
Thus, the chi-square statistic has (3 − 1)(2 − 1) = 2 degrees of freedom.
From the Σ we found χ² = 10.5, so we look in Table 24.1 for df = 2
to find the critical value 9.21 required for significance at the α = 0.01
level, and 13.82 for α = 0.001.
Hence, the cocaine study shows a significant relationship (P < 0.01)
between treatment and success.
Conclusion: We found strong evidence of some association between
treatment and success, and by looking at the two-way table, we see that
desipramine works better than the other treatments.
NOTE: You can safely use the chi-square test when no more than 20% of
the expected counts are less than 5 and all individual expected counts are
1 or greater
Ex6 . . . heart disease?
People who get angry easily tend to have more heart disease.
. . . 8474 people . . . coronary heart disease (CHD)
Anger Score    Low     Moderate   High
Sample size    3110    4731       633
CHD count      53      110        27
CHD percent    1.7%    2.3%       4.3%
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
First step is to write the data as a two-way table, by adding
the count of subjects who did not suffer from heart disease
             Low     Moderate   High   Total
CHD count    53      110        27     190
No CHD       3057    4621       606    8284
Total        3110    4731       633    8474
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The chi-square method tests these hypotheses:
H0 : no relationship between anger and CHD
Ha : some relationship between anger and CHD
There are r = 2 rows and c = 3 columns, so df = (2 − 1)(3 − 1) = 2
Ex6 . . . heart disease? (cont)
Find the expected cell count, e.g., of high-anger people with CHD is
(expected count) = (row 1 total) × (column 3 total)/(table total) = (190)(633)/8474 = 14.19
             Observed                     Expected
             Low     Moderate   High      Low        Moderate   High
CHD count    53      110        27        69.73      106.08     14.19
No CHD       3057    4621       606       3040.27    4624.92    618.81
** It is safe to apply the chi-square test since all expected cell counts are greater
than 5, so
χ² = (53 − 69.73)²/69.73 + (110 − 106.08)²/106.08 + · · · + (4621 − 4624.92)²/4624.92 + (606 − 618.81)²/618.81
= 4.014 + 0.145 + · · · + 0.003 + 0.265 = 16.083
** For df = 2 in Table 24.1 the χ² = 16.083 is larger than the critical value 13.82
for α = 0.001. We have highly significant evidence (P < 0.001) that anger and
heart disease are related. Statistical software can give the actual P-value of P = 0.0003.
Ex7 Discrimination in admissions?
** The effects of lurking variables can change and even reverse the
relationship between two variables
Ex7: Discrimination in admissions? Go back to Ex1. Suspect discrimination
against women. From the two-way table we found
(percentage male applicants admitted) = 35/80 = 0.44, i.e., 44%
(percentage female applicants admitted) = 20/60 = 0.33, i.e., 33%
In its defense, the University produces a three-way table
            Engineering        English            Combined
            Male   Female      Male   Female      Male   Female
Admit        30      10          5      10         35      20
Deny         30      10         15      30         45      40
Total        60      20         20      40         80      60
% Admit      50%     50%        25%     25%        44%     33%
Simpson’s paradox
** Simpson’s paradox: An association or comparison that holds for all of
several groups can disappear or even reverse direction when the data are
combined to form a single group.
This is just an extreme form of the fact that observed associations can be
misleading when there are lurking variables . . .
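The admissions data of Ex7 show the mechanism in miniature. The sketch below recomputes the admission rates per program and for the combined data; the counts come from the three-way table, while the data layout is only an illustrative choice.

```python
# (admitted, applicants) for each gender within each program (Ex7)
programs = {
    "Engineering": {"Male": (30, 60), "Female": (10, 20)},
    "English": {"Male": (5, 20), "Female": (10, 40)},
}

# Admission rate within each program
by_program = {
    program: {gender: admitted / applied for gender, (admitted, applied) in groups.items()}
    for program, groups in programs.items()
}

# Admission rate after combining the two programs
combined = {}
for gender in ("Male", "Female"):
    admitted = sum(programs[p][gender][0] for p in programs)
    applied = sum(programs[p][gender][1] for p in programs)
    combined[gender] = admitted / applied

print(by_program)
print(combined)
```

Within each program men and women are admitted at the same rate (50% in engineering, 25% in English), yet the combined rates are about 44% for men and 33% for women. The gap comes from the lurking variable: most men applied to engineering, which admits half its applicants, while most women applied to English, which admits a quarter.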
** Summary:
Make a two-way table to display the relationship between two
categorical variables
Exercise Ch24
24.5 Smoking by students and their families. How are the smoking
habits of students related to the smoking habits of their close family
members? Here is a two-way table from a survey of male students in six
secondary schools in Malaysia:
Student           At least one close family member smokes   No close family member smokes
smokes                            115                                     25
does not smoke                    207                                     75
Write a brief answer to the question posed, including a comparison of
selected percentages.
Conclude by using the P-value (critical) of the chi-square statistic
*Read Ex9: Discrimination in mortgage lending?
*Case Study Evaluated
Chi-square Table and Tables 21.1 & 24.1
Exercise (answer) Ch24
**Answers
The table below shows the percent of male students who smoke within
each family smoking status.
At least one close family member smokes: 115/322 = 35.7%
No close family member smokes: 25/100 = 25%
In our sample, male students with at least one close family member who
smokes are more likely to smoke than are male students with no close
family member who smokes.
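The slide answer stops at the percentages. The chi-square part of the exercise could be carried out as in the sketch below. This computation is not taken from the slides; it assumes the usual chi-square test for a 2 × 2 table (df = 1), for which the upper-tail probability can be written with the complementary error function.

```python
from math import erfc, sqrt

# Rows: student smokes / does not smoke
# Columns: at least one close family member smokes / none smokes
observed = [[115, 25], [207, 75]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / table_total
        chi2 += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1 only: P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
p_value = erfc(sqrt(chi2 / 2))

print(f"chi-square = {chi2:.2f}, df = {df}, P-value = {p_value:.3f}")
```

Under these assumptions the statistic is a little under 4 with df = 1, which puts it just above the usual 0.05 critical value of 3.84, so the association would be judged significant at the 5% level but not at the 1% level.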
Multiple choice Ch24
Which of these is an example of Simpson’s paradox? (a) Teachers’ salaries and
sales of alcoholic beverages have risen together over time, but paying teachers
more does not cause higher alcohol sales. (b) Alaska Air has a lower percent of
late flights than America West at every airport, but America West has a lower
percent when we combine all airports. (c) The percent of surgery patients given
Anesthetic A who die is higher than the percent for Anesthetic B, but this is
because A is used in more serious surgeries. (d) States in which a smaller percent
of students take the SAT exam have higher median scores on the SAT.
Answer: (b)
.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
If surgical procedure A has a higher success rate than surgical procedure B in
every hospital where they are used and yet procedure B has a higher overall
success rate, then we suspect that: (a) this is an example of Simpson’s paradox.
(b) it must be easier to achieve success at some hospitals than at others,
whatever procedure is used. (c) procedure B must be used predominantly in
hospitals where it is easier to achieve success, while procedure A must be used
predominantly where it is harder to achieve success. (d) All of (a), (b), and (c)
are true.
Answer: (d)