Notes 5-Slides: Hypothesis tests

Statistical Inference and Regression
Analysis:
Stat-GB.3302.30, Stat-UB.0015.01
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics
Part 5 – Hypothesis Testing
Objectives of Statistical Analysis
[Figure: collage of example plots for Listing (Mean 369687, StDev 156865, N 51) and IncomePC: Scatterplot of Listing vs IncomePC, Marginal Plot of Listing vs IncomePC, Histogram of Listing (Normal), Probability Plot of Listing (Normal - 95% CI; AD 0.994, P-Value 0.012), Empirical CDF of Listing, Boxplot of Listing, and Pie Chart of Percent vs Type (Plain 32.5%, Pepperoni 21.8%, Mushroom 16.2%, Mushroom and Onion 9.2%, Pepper and Onion 7.3%, Sausage 5.8%, Garlic 5.0%, Meatball 2.3%)]
Estimation
• How long do hard drives last?
• What is the median income among the 99%ers?
Inference – hypothesis testing
• Did minorities pay higher mortgage rates during the housing boom?
• Is there a link between environmental factors and breast cancer on eastern Long Island?
General Frameworks
• Parametric Tests: features of specific distributions such as the mean of a Bernoulli or normal distribution.
• Specification Tests (Semiparametric):
  Do the data arrive from a Poisson process?
  Are the data normally distributed?
• Nonparametric Tests: Are two discrete processes independent?
Hypotheses
Hypotheses - labels
• State 0 of Nature – Null Hypothesis
• State 1 – Alternative Hypothesis
Exclusive: Prob(H0 ∩ H1) = 0
Exhaustive: Prob(H0) + Prob(H1) = 1
Symmetric: Neither is intrinsically “preferred” – the objective of the study is only to support one or the other. (Rare?)
Testing Strategy
Before the investigation begins
Prior beliefs: Prob(H0), Prob(H1)
Prior odds:
\[ \frac{\mathrm{Prob}(H_0)}{\mathrm{Prob}(H_1)} \]
Results of the investigation:
Likelihood of the observed data assuming H0 or H1
Likelihood ratio:
\[ \frac{\mathrm{Prob}(\text{data} \mid H_0)}{\mathrm{Prob}(\text{data} \mid H_1)} \]
Posterior (to the Evidence) Odds
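\[
\mathrm{Prob}(H_0 \mid \text{data}) = \frac{\mathrm{Prob}(\text{data} \mid H_0)\,\mathrm{Prob}(H_0)}{\mathrm{Prob}(\text{data})}
\]
\[
\mathrm{Prob}(H_1 \mid \text{data}) = \frac{\mathrm{Prob}(\text{data} \mid H_1)\,\mathrm{Prob}(H_1)}{\mathrm{Prob}(\text{data})}
\]
\[
\text{Posterior odds}
= \frac{\mathrm{Prob}(H_0 \mid \text{data})}{\mathrm{Prob}(H_1 \mid \text{data})}
= \frac{\mathrm{Prob}(\text{data} \mid H_0)\,\mathrm{Prob}(H_0)}{\mathrm{Prob}(\text{data} \mid H_1)\,\mathrm{Prob}(H_1)}
= \left[\frac{\mathrm{Prob}(H_0)}{\mathrm{Prob}(H_1)}\right] \times \left[\frac{\mathrm{Prob}(\text{data} \mid H_0)}{\mathrm{Prob}(\text{data} \mid H_1)}\right]
= \text{Prior odds} \times \text{Likelihood ratio}
\]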
Does the New Drug Work?
• Hypotheses: H0: θ = .50, H1: θ = .75
• Priors: P0 = .40, P1 = .60
• Clinical Trial: N = 50, 31 patients “respond”, p = .62
Likelihoods:
L0(31 | θ = .50) = Binomial(50, 31, .50) = .0270059
L1(31 | θ = .75) = Binomial(50, 31, .75) = .0148156
Posterior odds in favor of H0 = (.4/.6)(.0270059/.0148156) = 1.2152 > 1
Priors favored H1 1.5 to 1, but the posterior odds favor H0, 1.2152 to 1. The evidence discredits H1 even though the ‘data’ seem more consistent with prior P1.
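The arithmetic in this example can be checked with a short script. This is a minimal sketch, assuming the likelihoods are ordinary binomial probabilities of 31 responders in 50 trials; the function and variable names are illustrative and not from the slides.

```python
from math import comb


def binomial_pmf(n, k, theta):
    """Probability of exactly k successes in n independent trials with success rate theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


# Numbers taken from the clinical trial example
n_patients, n_respond = 50, 31
theta_h0, theta_h1 = 0.50, 0.75  # response rates under H0 and H1
prior_h0, prior_h1 = 0.40, 0.60  # prior probabilities P0 and P1

like_h0 = binomial_pmf(n_patients, n_respond, theta_h0)
like_h1 = binomial_pmf(n_patients, n_respond, theta_h1)

prior_odds = prior_h0 / prior_h1
likelihood_ratio = like_h0 / like_h1
posterior_odds = prior_odds * likelihood_ratio  # odds in favor of H0

print(f"L0 = {like_h0:.7f}, L1 = {like_h1:.7f}")
print(f"Posterior odds in favor of H0 = {posterior_odds:.4f}")
```

The prior odds (about 0.667) are below 1, but the likelihood ratio (about 1.82) is large enough to push the posterior odds above 1.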
Decision Strategy
Prefer the hypothesis with the higher posterior odds.
A gap in the theory: How does the investigator do the cost-benefit test?
• Starting a new business venture or entering a new market: priors and market research
• FDA approving a new drug or medical device: priors and clinical trials
Statistical decision theory adds the costs and benefits of decisions and errors.
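One way to picture what decision theory adds is an expected-loss comparison. The sketch below is only illustrative: it reuses the posterior odds of 1.2152 from the drug example, and the two loss values are hypothetical numbers chosen for the illustration, not anything given in these notes.

```python
def posterior_prob_h0(posterior_odds):
    """Convert posterior odds in favor of H0 into Prob(H0 | data)."""
    return posterior_odds / (1.0 + posterior_odds)


def expected_losses(p_h0, loss_reject_true_h0, loss_retain_false_h0):
    """Expected loss of each action; correct decisions are assumed to cost nothing."""
    loss_if_reject = p_h0 * loss_reject_true_h0           # rejecting is wrong only if H0 is true
    loss_if_retain = (1.0 - p_h0) * loss_retain_false_h0  # retaining is wrong only if H0 is false
    return loss_if_reject, loss_if_retain


p_h0 = posterior_prob_h0(1.2152)  # posterior odds from the drug example

# Hypothetical costs: wrongly retaining H0 is assumed three times as costly
loss_if_reject, loss_if_retain = expected_losses(
    p_h0, loss_reject_true_h0=1.0, loss_retain_false_h0=3.0
)
decision = "reject H0" if loss_if_reject < loss_if_retain else "retain H0"
print(f"Prob(H0 | data) = {p_h0:.4f}, decision = {decision}")
```

With these assumed costs the preferred action is to reject H0 even though the posterior odds favor it, which is the kind of reversal the cost-benefit step can produce.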
An Alternative Strategy
• Recognize the asymmetry of null and alternative hypotheses.
• Eliminate the prior odds (which are rarely formed or available).
http://query.nytimes.com/gst/fullpage.html?res=9C00E4DF113BF935A3575BC0A9649C8B63
Classical Hypothesis Testing
The scientific method applied to statistical hypothesis testing
Hypothesis: The world works according to my hypothesis
Testing or supporting the hypothesis
• Data gathering
• Rejection of the hypothesis if the data are inconsistent with it
• Retention and exposure to further investigation if the data are consistent with the hypothesis
Failure to reject is not equivalent to acceptance.
Asymmetric Hypotheses
• Null hypothesis: The proposed state of nature
• Alternative hypothesis: The state of nature that is believed to prevail if the null is rejected.
Hypothesis Testing Strategy
• Formulate the null hypothesis
• Gather the evidence
• Question: If my null hypothesis were true, how likely is it that I would have observed this evidence?
  - Very unlikely: Reject the hypothesis
  - Not unlikely: Do not reject. (Retain the hypothesis for continued scrutiny.)
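The question "how likely is this evidence if the null were true?" can be made concrete with a tail probability. The sketch below borrows the drug trial numbers (31 responders out of 50, null response rate .50) and assumes evidence "at least this extreme" means 31 or more responders; the 0.05 cutoff for "very unlikely" is a hypothetical choice, not one made in these notes.

```python
from math import comb


def upper_tail_prob(n, k_observed, theta):
    """P(X >= k_observed) when X ~ Binomial(n, theta)."""
    return sum(
        comb(n, k) * theta**k * (1 - theta) ** (n - k)
        for k in range(k_observed, n + 1)
    )


n_trials, n_success = 50, 31  # evidence borrowed from the drug trial example
theta_null = 0.50             # value of the parameter under the null
threshold = 0.05              # hypothetical cutoff for "very unlikely"

p_value = upper_tail_prob(n_trials, n_success, theta_null)
decision = "reject H0" if p_value < threshold else "do not reject H0"
print(f"P(X >= {n_success} | H0) = {p_value:.4f} -> {decision}")
```

Under these assumptions the evidence is not unlikely enough to reject, so the null would be retained for continued scrutiny.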
Some Terms of Art
Type I error: Incorrectly rejecting a true null
Type II error: Failure to reject a false null
Power of a test: Probability that a test will correctly reject a false null
Alpha level: Probability that a test will incorrectly reject a true null. This is sometimes called the size of the test.
Significance level: Defined here as the probability that a test will retain a true null = 1 – alpha. (Many texts instead call alpha itself the significance level.)
Rejection region: Evidence that will lead to rejection of the null
Test statistic: Specific sample evidence used to test the hypothesis
Distribution of the test statistic under the null hypothesis: Probability model used to compute the probability of rejecting the null. (Crucial to the testing strategy – how does the analyst assess the evidence?)
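These terms can be illustrated numerically for a simple binomial test. The sketch below assumes the drug trial setting (N = 50, null rate .50, alternative rate .75) and a hypothetical rejection region "reject H0 if 32 or more patients respond"; the region and all names are chosen for illustration only.

```python
from math import comb


def upper_tail_prob(n, c, theta):
    """P(X >= c) when X ~ Binomial(n, theta)."""
    return sum(
        comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(c, n + 1)
    )


n_trials = 50
theta_null, theta_alt = 0.50, 0.75
critical_value = 32  # hypothetical rejection region: reject H0 if X >= 32

alpha = upper_tail_prob(n_trials, critical_value, theta_null)  # P(Type I error), the size
power = upper_tail_prob(n_trials, critical_value, theta_alt)   # P(reject H0 | H1 true)
beta = 1.0 - power                                             # P(Type II error)

print(f"alpha = {alpha:.4f}, power = {power:.4f}, beta = {beta:.4f}")
```

The test statistic here is the count of responders, and its binomial distribution under the null is what makes alpha computable before any data are seen.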
Possible Errors in Testing
                       I Do Not Reject the Hypothesis    I Reject the Hypothesis
Hypothesis is True     Correct Decision                  Type I Error
Hypothesis is False    Type II Error                     Correct Decision
A Legal Analogy:
The Null Hypothesis is INNOCENT

                               Null Hypothesis: Not Guilty                       Alternative Hypothesis: Guilty
Finding: Verdict Not Guilty    Correct Decision                                  Type II Error: Guilty defendant goes free
Finding: Verdict Guilty        Type I Error: Innocent defendant is convicted     Correct Decision

The errors are not symmetric. Most thinkers consider Type I errors to be more serious than Type II in this setting.
(Jerzy) Neyman – (Egon) Pearson Methodology
• “Statistical” testing
• Methodology
  - Formulate the “null” hypothesis
  - Decide (in advance) what kinds of “evidence” (data) will lead to rejection of the null hypothesis, i.e., define the rejection region
  - Gather the data
  - Mechanically carry out the test.
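The four steps can be sketched for a one-sided binomial test. This is an illustration under assumed inputs (N = 50, null rate .50, a target alpha of 0.05, and 31 observed responders borrowed from the drug example); the helper names are hypothetical.

```python
from math import comb


def upper_tail_prob(n, c, theta):
    """P(X >= c) when X ~ Binomial(n, theta)."""
    return sum(
        comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(c, n + 1)
    )


def critical_value(n, theta_null, alpha):
    """Smallest c such that rejecting when X >= c has size at most alpha."""
    for c in range(n + 2):  # c = n + 1 gives an empty region with size 0
        if upper_tail_prob(n, c, theta_null) <= alpha:
            return c


# Steps 1 and 2: fix the null and the rejection region before seeing data
n_trials, theta_null, alpha = 50, 0.50, 0.05
c = critical_value(n_trials, theta_null, alpha)

# Step 3: gather the data (31 responders, as in the drug example)
observed = 31

# Step 4: carry out the test mechanically
reject = observed >= c
print(f"Reject H0 if X >= {c}; observed X = {observed}; reject = {reject}")
```

The point of fixing the region in advance is that the decision in the last step involves no further judgment once the data arrive.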
Formulating the Null Hypothesis
Stating the hypothesis: A belief about the “state of nature”
• A parameter takes a particular value
• There is a relationship between variables
• And so on…
The null vs. the alternative
• By induction: If we wish to find evidence of something, first assume it is not true.
• Look for evidence that leads to rejection of the assumed hypothesis.
• Evidence that rejects the null hypothesis is significant.
Example: Credit Scoring Rule
Investigation: I believe that Fair Isaac relies on home ownership in deciding whether to “accept” an application.
• Null hypothesis: There is no relationship
• Alternative hypothesis: They do use homeownership data.
What decision rule should I use?
Some Evidence
              Accepted    Rejected
Renters         5469        1845
Homeowners      5030        1100
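The acceptance rates implied by these counts can be computed directly. In the sketch below, the assignment of the four counts to groups and outcomes is an inference from the rates quoted on the next slide (5030/(5030+1100) for homeowners, .74774 for renters), not something the extracted figure states explicitly.

```python
# Counts as read from the slide; the group and outcome labels are inferred
counts = {
    "renters": {"accepted": 5469, "rejected": 1845},
    "homeowners": {"accepted": 5030, "rejected": 1100},
}


def acceptance_rate(group):
    """Share of applications in a group that were accepted."""
    accepted, rejected = group["accepted"], group["rejected"]
    return accepted / (accepted + rejected)


rates = {name: acceptance_rate(group) for name, group in counts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.5f}")
```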
Hypothesis Test
• Acceptance rate for homeowners = 5030/(5030+1100) = .82055
• Acceptance rate for renters = 5469/(5469+1845) = .74774
• H0: Acceptance rate for renters is not less than for owners.
• H0: p(renters) ≥ .82055
• H1: p(renters) < .82055
The Rejection Region
What is the “rejection region”?
Data (evidence) that are inconsistent with my hypothesis.
Evidence is divided into two types:
  Data that are inconsistent with my hypothesis (the rejection region)
  Everything else
My Testing Procedure
I will reject H0 if p(renters) < .815 (chosen arbitrarily).
The rejection region is sample values of p(renters) < .815.
Distribution of the Test Statistic
Under the Null Hypothesis

Test statistic:
p(renters) = (1/N) Σi Accept_i, where Accept_i = 1 or 0
Use the central limit theorem:
Assumed mean = .82055
Implied standard deviation = sqrt(.82055*.17945/7314) = .00459
Using the CLT, p(renters) is approximately normally distributed (N is very large).
Use z = (p(renters) - .82055) / .00459
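The test statistic above can be sketched in Python roughly as follows. This is only an illustration: the function names and the toy indicator list are hypothetical, and the null mean .82055 and standard deviation .00459 are taken from the slide as given (evaluating the square-root formula directly gives a value close to .0045, so the slide's figure is kept to stay consistent with its later numbers).

```python
from math import sqrt

P0 = 0.82055        # assumed mean under H0 (homeowner acceptance rate, from the slide)
SE_SLIDE = 0.00459  # standard deviation of p(renters) as stated on the slide


def sample_proportion(accepts):
    """Mean of 0/1 acceptance indicators: (1/N) * sum of Accept_i."""
    return sum(accepts) / len(accepts)


def null_standard_deviation(p0, n):
    """CLT standard deviation of a sample proportion when the true rate is p0."""
    return sqrt(p0 * (1.0 - p0) / n)


def z_statistic(p_hat, p0=P0, se=SE_SLIDE):
    """Standardize the sample proportion under the null hypothesis."""
    return (p_hat - p0) / se


# Hypothetical toy sample: 3 acceptances out of 4 applications.
toy_p = sample_proportion([1, 1, 0, 1])
# z value at the cutoff .815 used in the testing procedure.
z_cutoff = z_statistic(0.815)
```

The cutoff .815 maps to a z value of about -1.209, which is the number used on the next slide.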
Alpha Level and Rejection Region
Prob(Reject H0 | H0 true)
= Prob(p < .815 | H0 is true)
= Prob[(p - .82055)/.00459 < (.815 - .82055)/.00459]
= Prob[z < -1.209]
= .11333
Probability of a Type I error
Alpha level for this test
Distribution of the Test Statistic
and the Rejection Region
Area=.11333
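The area of the rejection region can be checked numerically. The sketch below assumes the slide values (null value .82055, standard deviation .00459, cutoff .815); the function name is hypothetical and only the Python standard library is used.

```python
from statistics import NormalDist

P0 = 0.82055        # null value from the slides
SE_SLIDE = 0.00459  # standard deviation stated on the slides
CUTOFF = 0.815      # rejection region: p(renters) < .815


def alpha_level(cutoff=CUTOFF, p0=P0, se=SE_SLIDE):
    """Prob(reject H0 | H0 true) = Prob[z < (cutoff - p0) / se]."""
    z = (cutoff - p0) / se
    return NormalDist().cdf(z)


alpha = alpha_level()
```

With these inputs `alpha` comes out near .1133, the shaded area reported on the slide.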
The Test
The observed proportion is 5469/(5469+1845) = 5469/7314 = .74774
Since .74774 < .815, the null hypothesis is rejected at the 11.333% significance level (by the design of the test).
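Applying the decision rule to the observed counts is a one-line comparison. A minimal sketch, using the counts and cutoff from the slides and hypothetical function names:

```python
ACCEPTED = 5469   # renter applications accepted (from the slide)
REJECTED = 1845   # renter applications rejected (from the slide)
CUTOFF = 0.815    # reject H0 if the sample proportion is below this value


def observed_proportion(accepted, rejected):
    """Share of applications that were accepted."""
    return accepted / (accepted + rejected)


def reject_null(p_hat, cutoff=CUTOFF):
    """True when the sample proportion falls in the rejection region."""
    return p_hat < cutoff


p_renters = observed_proportion(ACCEPTED, REJECTED)
decision = reject_null(p_renters)
```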
Power of the test
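Power = probability that the test will reject a false null.
Power depends on the alternative hypothesis.
For this test, for a specific value of $\pi_{\text{renter}}$,
$$
\begin{aligned}
\text{Power} &= \text{Prob}\left[\,p_{\text{renter}} < .815 \mid \pi_{\text{renter}} = \text{the value}\,\right] \\
&= \text{Prob}\left[\frac{p_{\text{renter}} - \pi_{\text{renter}}}{\sqrt{\pi_{\text{renter}}(1-\pi_{\text{renter}})/7314}} < \frac{.815 - \pi_{\text{renter}}}{\sqrt{\pi_{\text{renter}}(1-\pi_{\text{renter}})/7314}}\right] \\
&= \text{Prob}\left[z < \frac{.815 - \pi_{\text{renter}}}{\sqrt{\pi_{\text{renter}}(1-\pi_{\text{renter}})/7314}}\right] \quad \text{using the normal distribution} \\
&= 1 - \beta(\pi_{\text{renter}})
\end{aligned}
$$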
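The power calculation can be traced numerically. The sketch below assumes the cutoff .815 and N = 7314 from the slides and evaluates the normal probability at a few hypothetical true acceptance rates; the function name and the chosen rates are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

CUTOFF = 0.815     # rejection region boundary from the slides
N_RENTERS = 7314   # number of renter applications from the slides


def power(pi_renter, cutoff=CUTOFF, n=N_RENTERS):
    """Prob[p_renter < cutoff] when the true acceptance rate is pi_renter."""
    se = sqrt(pi_renter * (1.0 - pi_renter) / n)
    return NormalDist().cdf((cutoff - pi_renter) / se)


# Power at a few hypothetical true rates, from well below to just above the cutoff.
power_curve = {pi: power(pi) for pi in (0.79, 0.80, 0.81, 0.815, 0.82)}
```

Power is close to 1 when the true rate is far below the cutoff, equals one half exactly at the cutoff, and falls toward the size of the test as the true rate approaches the null value.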
Power Function for the Test
(Power = size when alternative = the null.)
Application: Breast Cancer On
Long Island
Null Hypothesis: There is no link between the high
cancer rate on LI and the use of pesticides and toxic
chemicals in dry cleaning, farming, etc.
Neyman-Pearson Procedure
 Examine the physical and statistical evidence
 If there is convincing covariation, reject the null
hypothesis
 What is the rejection region?
The NCI study:
 Working null hypothesis: There is a link: We will
find the evidence.
 How do you reject this hypothesis?
Formulating the Testing Procedure
Usually: What kind of data will lead me to reject the hypothesis?
Thinking scientifically: If you want to “prove” a hypothesis is true (or you want to support one), begin by assuming your hypothesis is not true, and look for evidence that contradicts the assumption.
Hypothesis About a Mean
I believe that the average income of
individuals in a population is $30,000.


H0: μ = $30,000 (The null)
H1: μ ≠ $30,000 (The alternative)
I will draw the sample and examine the data.
The rejection region is data for which the sample mean is far from $30,000.
How far is far? That is the test.
Application
The mean of a population takes a
specific value:
 Null hypothesis:
H0: μ = $30,000
H1: μ ≠ $30,000
 Test: Sample mean close to
hypothesized population mean?
 Rejection region: Sample means that
are far from $30,000

Deciding on the
Rejection Region


If the sample mean is far from $30,000, reject the hypothesis.
Choose the region, for example:
Rejection below 29,500 | 30,000 | Rejection above 30,500
The probability that the sample mean falls in the rejection region even though the hypothesis is true (and should not be rejected) is the probability of a Type I error. Even if the true mean really is $30,000, the sample mean could fall in the rejection region.
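To make the Type I error probability concrete, the sketch below computes it for a given rejection region under a normal sampling distribution for the sample mean. The slides do not state σ or N, so σ = 2,500 and N = 100 are hypothetical values chosen only for illustration, and the function name is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist


def type_one_error(lower, upper, mu0, sigma, n):
    """Prob(sample mean < lower or > upper) when the true mean is mu0."""
    sampling_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))
    return sampling_dist.cdf(lower) + (1.0 - sampling_dist.cdf(upper))


# sigma and n are hypothetical; the slides do not state them.
SIGMA, N = 2500.0, 100
alpha_narrow = type_one_error(29_500, 30_500, 30_000, SIGMA, N)
alpha_wide = type_one_error(28_500, 31_500, 30_000, SIGMA, N)
```

Under these assumed values the standard error is 250, so the 29,500 to 30,500 boundaries sit two standard errors from the null and give a Type I error probability of roughly 4.6%. Moving the boundaries out to 28,500 and 31,500 makes it vanishingly small.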
Reduce the Probability of a Type I Error by
Making the (non)Rejection Region Wider
Reduce the probability of a Type I error by moving the boundaries of the rejection region farther out.
Axis: 28,500 | 29,500 | 30,000 | 30,500 | 31,500
Probability outside the interval 29,500 to 30,500 is large.
Probability outside the interval 28,500 to 31,500 is much smaller.
You can make a Type I error impossible by making the rejection region very far from the null. Then you would never make a Type I error because you would never reject H0.
Setting the α Level
“α” is the probability of a type I error
Choose the width of the interval by
choosing the desired probability of a type
I error, based on the t or normal
distribution. (How confident do I want to
be?)
Multiply the z or t value by the standard
error of the mean.
Testing Procedure
The rejection region will be the range of values
greater than μ0 + zσ/√N or
less than μ0 - zσ/√N
Use z = 1.96 for 1 - α = 95%
Use z = 2.576 for 1 - α = 99%
Use the t table if the sample is small, the variance is estimated, and sampling is from a normal distribution.
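The critical values and the resulting boundaries can be reproduced with the standard normal quantile function. In this sketch the function names are hypothetical, and σ = 2,500 and N = 100 are assumed values since the slides give none. It covers only the normal (z) case, not the t table.

```python
from math import sqrt
from statistics import NormalDist


def critical_z(alpha):
    """Two-sided critical value: z such that Prob(|Z| > z) = alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def rejection_bounds(mu0, sigma, n, alpha):
    """Return (lower, upper); reject H0 if the sample mean falls outside them."""
    half_width = critical_z(alpha) * sigma / sqrt(n)
    return mu0 - half_width, mu0 + half_width


z95 = critical_z(0.05)   # about 1.96
z99 = critical_z(0.01)   # about 2.576
# sigma = 2500 and n = 100 are hypothetical values chosen for illustration.
lower, upper = rejection_bounds(30_000, 2500.0, 100, 0.05)
```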
Deciding on the
Rejection Region

If the sample mean is far from $30,000, reject the hypothesis.
Choose the region, say,
Rejection below $30,000 - 1.96σ/√N | Rejection above $30,000 + 1.96σ/√N
I am 95% certain that I will not commit a Type I error (reject the hypothesis in error). (I cannot be 100% certain.)
The Testing Procedure (For a Mean)
Reject if x̄ < μ0 - 1.96σ/√N or x̄ > μ0 + 1.96σ/√N
or
Reject if x̄ - μ0 < -1.96σ/√N or x̄ - μ0 > 1.96σ/√N
or
Reject if (x̄ - μ0)/(σ/√N) < -1.96 or (x̄ - μ0)/(σ/√N) > 1.96
or
Reject if z < -1.96 or z > 1.96, where z = (x̄ - 30,000)/(σ/√N)
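The equivalent forms of the rule can be compared side by side in code. This is a sketch under assumed inputs (σ = 2,500, N = 100, and two made-up sample means); none of these numbers or function names come from the slides.

```python
from math import sqrt


def z_for_mean(x_bar, mu0, sigma, n):
    """z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / sqrt(n))


def reject_two_sided(x_bar, mu0, sigma, n, z_crit=1.96):
    """Reject H0 if z < -z_crit or z > z_crit."""
    return abs(z_for_mean(x_bar, mu0, sigma, n)) > z_crit


def reject_by_bounds(x_bar, mu0, sigma, n, z_crit=1.96):
    """Equivalent form: compare x_bar with mu0 -/+ z_crit * sigma / sqrt(n)."""
    half_width = z_crit * sigma / sqrt(n)
    return x_bar < mu0 - half_width or x_bar > mu0 + half_width


# Hypothetical inputs: sigma = 2500, n = 100, two candidate sample means.
checks = [
    (x, reject_two_sided(x, 30_000, 2500.0, 100), reject_by_bounds(x, 30_000, 2500.0, 100))
    for x in (30_300.0, 30_650.0)
]
```

Both forms give the same decision for each sample mean, which is the point of the chain of "or" restatements.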

The Test Procedure
Choosing z = 1.96 makes the
probability of a Type I error 0.05.
 Choosing z = 2.576 would reduce the
probability of a Type I error to 0.01.
 Reducing the probability of a Type I
error reduces the power of the test
because it reduces the probability that
the null hypothesis will be rejected.
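The trade-off between the Type I error probability and power can be illustrated numerically. The sketch below assumes a hypothetical alternative (true mean 30,500) along with assumed σ = 2,500 and N = 100; only the critical values 1.96 and 2.576 come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided(mu_true, mu0, sigma, n, z_crit):
    """Prob(|z| > z_crit) when the true mean is mu_true rather than mu0."""
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    return std_normal.cdf(-z_crit - shift) + (1.0 - std_normal.cdf(z_crit - shift))


# Hypothetical alternative: true mean 30,500 with sigma = 2500 and n = 100.
power_05 = power_two_sided(30_500, 30_000, 2500.0, 100, 1.96)
power_01 = power_two_sided(30_500, 30_000, 2500.0, 100, 2.576)
# When the "alternative" equals the null, the power is just the size of the test.
size_05 = power_two_sided(30_000, 30_000, 2500.0, 100, 1.96)
```

Under these assumptions the power drops from roughly one half to under 30% when the critical value is raised from 1.96 to 2.576.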

P Value
The P value is the probability of observing sample evidence at least as extreme as that actually observed, assuming the null hypothesis is true.
The null hypothesis is rejected if P value < α.
P value < α
Prob[p(renter) < .74774] = Prob[z < (.74774 - .82055)/.00459]
= Φ(-15.86) = .59946942854362260 × 10^-56 (Impossible)
α = .11333
Confidence Intervals
For a two sided test about a
parameter, a confidence interval is the
complement of the rejection region.
(Proof in text, p. 338)
Part 5 – Hypothesis Testing
45/100
Confidence Interval

If the sample mean is far from $30,000, reject the hypothesis.

Choose the region, say,
Rejection: sample mean below $30,000 - 1.96σ/√N
Confidence: sample mean between $30,000 - 1.96σ/√N and $30,000 + 1.96σ/√N
Rejection: sample mean above $30,000 + 1.96σ/√N
I am 95% certain that the confidence interval contains the true
mean of the distribution of incomes. (I cannot be 100% certain.)
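A small Python sketch of the two sided region around $30,000. The standard deviation (8,000) and the sample size (100) are hypothetical values chosen only to make the arithmetic concrete; they do not come from the slides.

```python
import math

mu0 = 30_000.0   # hypothesized mean (from the slide)
sigma = 8_000.0  # hypothetical standard deviation
n = 100          # hypothetical sample size

half_width = 1.96 * sigma / math.sqrt(n)
lower, upper = mu0 - half_width, mu0 + half_width

def reject_two_sided(sample_mean: float) -> bool:
    """Reject H0 when the sample mean falls outside [lower, upper]."""
    return sample_mean < lower or sample_mean > upper

print(f"confidence region: [{lower:.0f}, {upper:.0f}]")
print(reject_two_sided(31_000.0), reject_two_sided(32_000.0))
```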
Part 5 – Hypothesis Testing
46/100
One Sided Tests



H0: μ = μ0, H1: μ ≠ μ0
Rejection region is sample mean far from μ0 in either direction.
H0: μ = μ0, H1: μ > μ0. Sample means less than μ0 cannot be in the rejection region.
Entire rejection region is above μ0.
Reformulate: H0: μ ≤ μ0, H1: μ > μ0.

Rejection region is sample mean above $30,000 + 1.645σ/√N.
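A Python sketch of the one sided rule, again with a hypothetical standard deviation (8,000) and sample size (100). Only the upper tail counts as evidence against H0, so the 5% critical value is 1.645 rather than 1.96.

```python
import math

mu0 = 30_000.0   # hypothesized mean (from the slide)
sigma = 8_000.0  # hypothetical standard deviation
n = 100          # hypothetical sample size

threshold = mu0 + 1.645 * sigma / math.sqrt(n)

def reject_one_sided(sample_mean: float) -> bool:
    """Reject H0 only when the sample mean is far above mu0."""
    return sample_mean > threshold

print(f"reject when sample mean > {threshold:.0f}")
print(reject_one_sided(31_400.0), reject_one_sided(25_000.0))
```

With these assumed inputs, a sample mean of 31,400 is rejected by the one sided test but would fall inside a two sided 1.96 region, and a sample mean far below $30,000 is never rejected.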
Part 5 – Hypothesis Testing
47/100
Likelihood Ratio Tests
General Format: Likelihood Ratio = Likelihood(θ0 | data) / Likelihood(θ1 | data)
Resembles posterior odds with equal priors for a simple null and a simple alternative. (E.g., Poisson mean = 2 or 1.1)
Practical format: Simple null vs. composite alternative
λ = Likelihood Ratio = Likelihood(θ0 | Information in H0) / Likelihood(θ̂ | All sample information)
Small value of λ weighs against H0.
In standard format, reject H0 if λ < c* for a given α.
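A Python sketch of the general format for a simple null against a simple alternative, using the Poisson means 1.1 and 2 mentioned above. The ten counts are illustrative, and treating 1.1 as the null is an assumption made for the example. With equal priors, the posterior probability of the null is ratio / (1 + ratio).

```python
import math

def poisson_log_likelihood(lam: float, counts: list[int]) -> float:
    """Sum of log Poisson probabilities at mean lam."""
    return sum(-lam + y * math.log(lam) - math.lgamma(y + 1) for y in counts)

counts = [5, 0, 1, 1, 0, 3, 2, 3, 4, 1]   # illustrative counts

# Likelihood(null mean 1.1 | data) / Likelihood(alternative mean 2 | data)
log_ratio = poisson_log_likelihood(1.1, counts) - poisson_log_likelihood(2.0, counts)
ratio = math.exp(log_ratio)

# Posterior probability of the null when both hypotheses have prior 1/2.
posterior_h0 = ratio / (1.0 + ratio)
print(f"likelihood ratio = {ratio:.4f}, posterior P(H0) = {posterior_h0:.4f}")
```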
Part 5 – Hypothesis Testing
48/100
Carrying Out the LR Test
• In most cases, exact distribution of the statistic is unknown.
• Use -2log λ → Chi squared [1]
• For a test about 1 parameter, threshold value is 3.84 (5%) or 6.63 (1%)
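The chi squared [1] thresholds can be recovered from the standard normal distribution, because a chi squared variable with 1 degree of freedom is the square of a standard normal. A minimal Python sketch (the function name is illustrative):

```python
from statistics import NormalDist

def chi2_1_critical(alpha: float) -> float:
    """Upper alpha critical value of chi squared [1], via a squared N(0,1) quantile."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z

for alpha in (0.05, 0.01):
    print(alpha, round(chi2_1_critical(alpha), 2))
```

This gives about 3.84 at the 5% level and about 6.63 at the 1% level.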

Part 5 – Hypothesis Testing
49/100
Poisson Likelihood Ratio Test
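y = 5, 0, 1, 1, 0, 3, 2, 3, 4, 1

\[ f(y \mid \lambda) = \frac{\exp(-\lambda)\,\lambda^{y}}{y!} \quad \text{(Poisson)} \]

\[ \text{Likelihood} = \frac{\exp(-\lambda)\lambda^{5}}{5!} \cdot \frac{\exp(-\lambda)\lambda^{0}}{0!} \cdot \frac{\exp(-\lambda)\lambda^{1}}{1!} \cdots = \frac{\exp(-10\lambda)\,\lambda^{20}}{207{,}360} \]

Log likelihood = -10λ + 20 log λ - 12.242
Maximum occurs at λ = 2, LogL = -18.379
Null Hypothesis: λ = 1.10, LogL = -21.335
Chi squared = -2(-21.335 - (-18.379)) = 5.912 (Reject H0)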
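The calculation on this slide can be reproduced with a short Python sketch. The data and the null value 1.10 are from the slide; the function and variable names are illustrative. Small differences in the last digit come from rounding.

```python
import math

y = [5, 0, 1, 1, 0, 3, 2, 3, 4, 1]   # data from the slide

def log_likelihood(lam: float) -> float:
    """Poisson log likelihood: -N*lam + sum(y)*log(lam) - sum(log(y!))."""
    return (-len(y) * lam + sum(y) * math.log(lam)
            - sum(math.lgamma(v + 1) for v in y))

lam_hat = sum(y) / len(y)                   # maximum likelihood estimate: the sample mean
logl_unrestricted = log_likelihood(lam_hat)
logl_restricted = log_likelihood(1.10)      # null hypothesis value

chi_squared = -2.0 * (logl_restricted - logl_unrestricted)
reject = chi_squared > 3.84                 # 5% critical value of chi squared [1]
print(f"{logl_unrestricted:.3f} {logl_restricted:.3f} {chi_squared:.3f} {reject}")
```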
Part 5 – Hypothesis Testing
50/100
Generalities About LR Test
log L | less sample information ≤ log L | more sample information
More sample information will mean estimating more parameters.
Test cannot be used for a simple null vs. a simple alternative.
-2 log λ = chi squared with degrees of freedom equal to the difference in the number of parameters.
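For 1 or 2 degrees of freedom the chi squared upper tail probability has a closed form, so a P value for the likelihood ratio statistic can be sketched with the Python standard library alone. The function name is illustrative, and larger degrees of freedom would need a statistics library.

```python
import math

def chi2_upper_tail(stat: float, df: int) -> float:
    """Prob[chi squared(df) > stat] for df = 1 or 2, where closed forms exist."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only covers df = 1 or 2")

print(round(chi2_upper_tail(3.84, 1), 3))   # one parameter restricted
print(round(chi2_upper_tail(5.99, 2), 3))   # two parameters restricted
```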
Part 5 – Hypothesis Testing
51/100
Gamma Application
Income data: N = 27,326

\[ \text{Gamma Model: } f(\text{income}) = \frac{\lambda^{P}\, e^{-\lambda\,\text{Income}}\, \text{Income}^{P-1}}{\Gamma(P)} \]

\[ \text{LogL} = NP\log\lambda - N\log\Gamma(P) - \lambda\sum_{i=1}^{N}\text{Income}_i + (P-1)\sum_{i=1}^{N}\log\text{Income}_i \]

Maximum Likelihood Estimates are λ = 2.55971, P = 4.55320, LogL = 12574.88
Fix P at 3.5. Estimate of λ = 2.29664, LogL = 12106.56
Chi squared = 2(12574.88 - 12106.56) = 936.64, much larger than 3.84. Reject H0.
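A Python sketch of the gamma log likelihood and of the likelihood ratio statistic built from the two log likelihood values reported on the slide. The income data themselves are not reproduced here, so the function is shown without being applied to them; its name and argument names are illustrative.

```python
import math

def gamma_log_likelihood(lam: float, p: float, incomes: list[float]) -> float:
    """N*P*log(lam) - N*log Gamma(P) - lam*sum(x) + (P-1)*sum(log x)."""
    n = len(incomes)
    return (n * p * math.log(lam) - n * math.lgamma(p)
            - lam * sum(incomes)
            + (p - 1.0) * sum(math.log(v) for v in incomes))

# Likelihood ratio statistic from the values reported on the slide.
logl_unrestricted = 12574.88   # lambda and P both estimated
logl_restricted = 12106.56     # P fixed at 3.5, lambda re-estimated
chi_squared = 2.0 * (logl_unrestricted - logl_restricted)
print(f"chi squared = {chi_squared:.2f}, reject H0: {chi_squared > 3.84}")
```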
Part 5 – Hypothesis Testing
52/100
Specification Tests
• Generally a test about a distribution where the alternative is “some other distribution.”
• Test is generally based on a feature of the distribution that is true under the null but not true under the alternative.
Part 5 – Hypothesis Testing
53/100
Poisson Specification Tests
• 3820 observations on doctor visits
• Poisson distribution?
Part 5 – Hypothesis Testing
54/100
Deviance Test
Poisson Distribution p(x) = exp(-λ)λ^x/x!
H0: Everyone has the same Poisson Distribution
H1: Everyone has their own Poisson distribution
Under H0, observations will tend to be near the mean.
Under H1, there will be much more variation.
Likelihood ratio statistic (Text, p. 348)
Part 5 – Hypothesis Testing
55/100
Deviance Test

\[ -2\log L = 2\sum_{i=1}^{N} x_i \log\left(\frac{x_i}{\bar{x}}\right), \qquad (0 \times \log 0 = 0) \]

where L is the ratio of the restricted to the unrestricted likelihood.
Chi squared has N-1 degrees of freedom.
Sample value 17,862 with 3,820 degrees of freedom.
Treat as normal: (17,862 - 3820)/√(2(3820)) ≈ 160.65
REJECT Null Hypothesis of Poisson
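A Python sketch of the deviance statistic and of the normal approximation used above. The ten counts are illustrative (the doctor visit data are not reproduced here); the values 17,862 and 3,820 are the ones reported on the slide.

```python
import math

def poisson_deviance(x: list[int]) -> float:
    """2 * sum of x_i * log(x_i / mean), with 0 * log 0 taken as 0."""
    mean = sum(x) / len(x)
    return 2.0 * sum(v * math.log(v / mean) for v in x if v > 0)

def normal_approximation(stat: float, df: int) -> float:
    """Standardize a chi squared statistic: (stat - df) / sqrt(2 * df)."""
    return (stat - df) / math.sqrt(2.0 * df)

counts = [5, 0, 1, 1, 0, 3, 2, 3, 4, 1]            # illustrative counts
print(f"deviance = {poisson_deviance(counts):.3f}")
print(f"z = {normal_approximation(17_862, 3_820):.2f}")   # values from the slide
```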
Part 5 – Hypothesis Testing
56/100
Dispersion Test

Poisson Distribution p(x) = exp(-λ)λ^x/x!
H0: The distribution is Poisson
H1: The distribution is something else
Under H0, the mean will be (almost) the same as
the variance
Approximate Likelihood ratio statistic (Text, p. 348)
= N * Variance / Mean
For the doctor visit data, this is 22,348.6 vs. chi squared with N-1 degrees of freedom (under H0 the statistic is close to N, since the variance is close to the mean). H0 is rejected.
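A Python sketch of the dispersion statistic N * Variance / Mean. The counts are illustrative, and computing the variance with divisor N is an assumption, since the slide does not say which divisor is used.

```python
def dispersion_statistic(x: list[int]) -> float:
    """N * variance / mean, with the variance computed with divisor N."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n
    return n * variance / mean

counts = [5, 0, 1, 1, 0, 3, 2, 3, 4, 1]   # illustrative counts
stat = dispersion_statistic(counts)
print(f"dispersion statistic = {stat:.2f} with N = {len(counts)}")
```

When the variance is close to the mean, as the Poisson model implies, the statistic is close to N; values far above N point to overdispersion.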
Part 5 – Hypothesis Testing
57/100
Part 5 – Hypothesis Testing
58/100
Specification Test - Normality
Compare observed 3rd and 4th
moments to what would be expected
from a normal distribution.
Normal Distribution is symmetric and
has kurtosis = 3.
Part 5 – Hypothesis Testing
59/100
Symmetric and Skewed Distributions
Part 5 – Hypothesis Testing
60/100
Kurtosis: t[5] vs. Normal
Kurtosis of normal(0,1) = 3
Kurtosis of t[k] = 3 + 6/(k-4); for t[5] = 3+6/(5-4) = 9.
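A small Python sketch of the t[k] kurtosis formula, showing how it approaches the normal value 3 as k grows. The function name is illustrative; the formula is finite only for k > 4.

```python
def t_kurtosis(k: float) -> float:
    """Kurtosis of Student's t with k degrees of freedom (finite only for k > 4)."""
    if k <= 4:
        raise ValueError("kurtosis of t[k] is not finite for k <= 4")
    return 3.0 + 6.0 / (k - 4.0)

for k in (5, 10, 30, 100):
    print(k, round(t_kurtosis(k), 4))
```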
Part 5 – Hypothesis Testing
61/100
Bowman and Shenton Test for Normality

\[ \text{Chi Squared}[2] = N\left[\frac{(m_3/s^3)^2}{6} + \frac{\left((m_4/s^4) - 3\right)^2}{24}\right] \]

\[ m_j = \frac{\sum_{i=1}^{N}(x_i - \bar{x})^{j}}{N-1}, \qquad s = \sqrt{m_2} \]

For the income data, chi squared = 1709.62 vs. 5.99.
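A Python sketch of the statistic, assuming the central moments use divisor N - 1, which is how the formula on this slide appears to read (some references use N instead). The five data points are hypothetical and serve only to exercise the function.

```python
import math

def bowman_shenton(x: list[float]) -> float:
    """N * [skewness^2 / 6 + (kurtosis - 3)^2 / 24]."""
    n = len(x)
    mean = sum(x) / n

    def m(j: int) -> float:
        # j-th central moment with divisor N - 1 (assumed)
        return sum((v - mean) ** j for v in x) / (n - 1)

    s = math.sqrt(m(2))
    skewness = m(3) / s ** 3
    kurtosis = m(4) / s ** 4
    return n * (skewness ** 2 / 6.0 + (kurtosis - 3.0) ** 2 / 24.0)

sample = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
stat = bowman_shenton(sample)
reject = stat > 5.99                 # 5% critical value of chi squared [2]
print(f"chi squared = {stat:.4f}, reject normality: {reject}")
```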
Part 5 – Hypothesis Testing
62/100
Testing for a Distribution
H0: The distribution is the assumed distribution
H1: The assumed distribution is incorrect
Strategy: Do the features of the sample resemble what we would observe if H0 were correct?
• Continuous: CDF of data resembles CDF of the assumed distribution
• Discrete: Sample cell probabilities resemble predictions from the assumed distribution
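For the discrete case, a Python sketch that sets sample cell proportions beside the probabilities predicted by an assumed Poisson distribution. The counts are illustrative, and using the sample mean as the Poisson mean is an assumption made for the example.

```python
import math
from collections import Counter

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of the value k at mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

counts = [5, 0, 1, 1, 0, 3, 2, 3, 4, 1]   # illustrative data
lam_hat = sum(counts) / len(counts)       # assumed Poisson mean: the sample mean
observed = Counter(counts)

table = []
for k in range(max(counts) + 1):
    sample_share = observed[k] / len(counts)
    predicted = poisson_pmf(k, lam_hat)
    table.append((k, sample_share, predicted))
    print(f"x = {k}: sample {sample_share:.3f}  Poisson {predicted:.3f}")
```

Large gaps between the two columns are evidence against the assumed distribution; a formal test would combine the gaps into a single statistic.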
Part 5 – Hypothesis Testing
63/100
Probability Plot for Normality
x_1, x_2, ..., x_N hypothesized to be a sample from N[μ, σ²].
Sample mean = x̄, sample standard deviation = s
Sort the data into x_(1) ≤ x_(2) ≤ ... ≤ x_(N).
Theoretical quantiles of normal with mean x̄ and standard deviation s are

\[ \hat{x}_{(k)} = \bar{x} + s\,\Phi^{-1}\left(\frac{k}{N+1}\right). \]

Plot of x_(k) vs. x̂_(k) should lie on a diagonal line.
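The steps above can be sketched in Python using the standard library's normal quantile function. The seven data points are hypothetical; the function returns the plotting pairs rather than drawing the plot.

```python
from statistics import NormalDist, fmean, stdev

def normal_probability_plot_points(x: list[float]) -> list[tuple[float, float]]:
    """Pairs (sorted x, theoretical quantile mean + s * Phi^{-1}(k / (N + 1)))."""
    n = len(x)
    mean, s = fmean(x), stdev(x)
    ordered = sorted(x)
    std_normal = NormalDist()
    return [(ordered[k - 1], mean + s * std_normal.inv_cdf(k / (n + 1)))
            for k in range(1, n + 1)]

data = [3.1, 2.4, 4.8, 3.9, 2.9, 3.5, 4.1]   # hypothetical sample
points = normal_probability_plot_points(data)
for observed, theoretical in points:
    print(f"{observed:.2f}  {theoretical:.2f}")
```

If the data are close to normal, the two columns track each other closely, which is the diagonal line in the plot.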
Part 5 – Hypothesis Testing
64/100
Normal (log)Income?
Random Sample from Normal
Normality Tests
Bowman and Shenton: based on 3rd and 4th moments
Kolmogorov-Smirnov: based on CDF
$$D_N = \sup_x \,\lvert \text{Theoretical cdf} - \text{Sample cdf} \rvert$$
Test statistic is $\sqrt{N}\,D_N$.
Rough critical values for $D_N$ (95%):
N=20, .294
N=25, .27
N=30, .24
N=35, .23
N>35, use $1.36/\sqrt{N}$ for 95%
use $1.63/\sqrt{N}$ for 99%
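Kolmogorov - Smirnov Test
$$\begin{array}{lcccccc}
\text{Sorted data} & \min = x_{(1)} & x_{(2)} & \cdots & x_{(k)} & \cdots & \max = x_{(N)} \\[4pt]
\text{Empirical CDF} & \dfrac{1}{N+1} & \dfrac{2}{N+1} & \cdots & \dfrac{k}{N+1} & \cdots & \dfrac{N}{N+1} \\[8pt]
\text{Theoretical CDF} & \Phi\!\left(\dfrac{x_{(1)}-\bar{x}}{s}\right) & \Phi\!\left(\dfrac{x_{(2)}-\bar{x}}{s}\right) & \cdots & \Phi\!\left(\dfrac{x_{(k)}-\bar{x}}{s}\right) & \cdots & \Phi\!\left(\dfrac{x_{(N)}-\bar{x}}{s}\right)
\end{array}$$
For log income data, K-S = 0.1181.
Critical values with N = 3820 are 0.0219 and 0.0262. The hypothesis of normality would be rejected.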
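As a rough sketch of the calculation above, the following Python computes the largest gap between the fitted normal CDF and the empirical CDF k/(N+1), then compares it with the large-sample critical values 1.36/√N and 1.63/√N. The function names and the simulated sample are assumptions for illustration; the log income data are not reproduced here.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def ks_statistic(data):
    """Largest gap between the fitted normal CDF and the empirical CDF k/(N+1)."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    return max(abs(fitted.cdf(x) - k / (n + 1)) for k, x in enumerate(xs, start=1))


def ks_critical(n, level=0.95):
    """Large-sample (N > 35) critical value for D_N."""
    coefficient = {0.95: 1.36, 0.99: 1.63}[level]
    return coefficient / sqrt(n)


# Hypothetical sample drawn from a normal distribution with a fixed seed.
rng = random.Random(0)
sample = [rng.gauss(10.0, 2.0) for _ in range(40)]
d_n = ks_statistic(sample)
print(f"D_N = {d_n:.4f}, 95% critical value = {ks_critical(len(sample)):.4f}")

# With N = 3820 the rule gives values close to those quoted for log income.
print(f"{ks_critical(3820):.4f}  {ks_critical(3820, 0.99):.4f}")
```

The reported K-S value of 0.1181 is far above either critical value for N = 3820, which is why normality is rejected.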
Chi Squared Test for a Discrete Distribution
• Outcomes = $A_1, A_2, \ldots, A_M$
• Predicted probabilities based on a theoretical distribution = $E_1(\theta), E_2(\theta), \ldots, E_M(\theta)$
• Sample cell frequencies = $O_1, \ldots, O_M$
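Test Statistics
$$\text{Deviance} = 2N \sum_{m=1}^{M} O_m \log\!\left(\frac{O_m}{E_m(\theta)}\right), \quad (0 \log 0 = 0)$$
$$\text{Pearson chi squared} = N \sum_{m=1}^{M} \frac{[O_m - E_m(\theta)]^2}{E_m(\theta)}$$
Here $O_m$ is the sample proportion in cell $m$, $E_m(\theta)$ is the predicted probability, and $N$ is the sample size.
Both are distributed chi squared with degrees of freedom
$M - 1 -$ number of parameters in $\theta$ under the null hypothesis.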
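A minimal sketch of the two statistics, written in terms of cell counts (count = N × proportion, expected count = N × probability), which is algebraically the same as the proportion form multiplied by N. The function name and the three-cell example are hypothetical.

```python
from math import log


def goodness_of_fit(counts, probs):
    """Return (Pearson chi squared, deviance) for observed counts vs. predicted probabilities."""
    n = sum(counts)
    pearson = 0.0
    deviance = 0.0
    for count, prob in zip(counts, probs):
        expected = n * prob
        pearson += (count - expected) ** 2 / expected
        if count > 0:  # convention: 0 log 0 = 0
            deviance += 2.0 * count * log(count / expected)
    return pearson, deviance


# Hypothetical example: 60 observations, three equally likely outcomes.
counts = [18, 22, 20]
probs = [1 / 3, 1 / 3, 1 / 3]
pearson, deviance = goodness_of_fit(counts, probs)
print(f"Pearson = {pearson:.4f}, deviance = {deviance:.4f}")
# Degrees of freedom here would be 3 - 1 - 0 = 2, since no parameter is estimated.
```

The two statistics are usually close to each other when the observed and expected counts are similar, as in this example.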
V2 Rocket Hits
Adapted from Richard Isaac, The Pleasures of Probability, Springer-Verlag, 1995, pp. 99-101.
576 areas of 0.25 km² each in South London, in a grid (24 by 24)
535 rockets were fired randomly into the grid = N
P(a rocket hits a particular grid area) = 1/576 = 0.001736 = θ
Expected number of rocket hits in a particular area = 535/576 = 0.92882
How many rockets will hit any particular area? 0, 1, 2, … could be anything up to 535.
The 0.9288 is the λ for a Poisson distribution:
$$P(\#\text{hits}) = \frac{\exp(-\lambda)\,\lambda^{\#\text{hits}}}{\#\text{hits}!}, \quad \#\text{hits} = 0, 1, 2, \ldots$$
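To see why the Poisson distribution is a reasonable stand-in here, the sketch below compares the exact binomial probabilities (535 independent rockets, each hitting a given area with probability 1/576) with the Poisson probabilities for λ = 535/576. The independence assumption and the function names are illustrative, not taken from the source.

```python
from math import comb, exp, factorial

N_ROCKETS = 535
N_AREAS = 576
theta = 1 / N_AREAS      # chance that one rocket hits a given area
lam = N_ROCKETS * theta  # expected hits per area


def binomial_pmf(k):
    """Exact probability of k hits in one area under independent rockets."""
    return comb(N_ROCKETS, k) * theta**k * (1 - theta) ** (N_ROCKETS - k)


def poisson_pmf(k):
    """Poisson approximation with mean lam."""
    return exp(-lam) * lam**k / factorial(k)


print(f"lambda = {lam:.5f}")
for hits in range(5):
    print(f"{hits}  binomial {binomial_pmf(hits):.4f}  Poisson {poisson_pmf(hits):.4f}")
```

With many trials and a small hit probability, the two columns differ only in the third or fourth decimal place.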
Poisson Process
θ = 1/169
N = 144
λ = 144 * 1/169 = 0.852
Probabilities:
P(X=0) = .4266
P(X=1) = .3634
P(X=2) = .1548
P(X=3) = .0437
P(X=4) = .0094
P(X>4) = .0021
Interpreting The Process
There are 169 squares.
There are 144 "trials".
Expect .4266*169 = 72.1 to have 0 hits/square.
Expect .3634*169 = 61.4 to have 1 hit/square.
Etc.
Expect the average number of hits/square to be .852.
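The expected cell counts can be reproduced with a short calculation. This sketch recomputes the Poisson probabilities from λ = 144/169 rather than copying the rounded figures, so the printed values may differ from the slide's in the third or fourth decimal place. Variable names are illustrative.

```python
from math import exp, factorial

N_SQUARES = 169
N_TRIALS = 144
lam = N_TRIALS / N_SQUARES  # about 0.852 hits per square

# P(X = 0), ..., P(X = 4), then P(X > 4) as the remainder.
probs = [exp(-lam) * lam**k / factorial(k) for k in range(5)]
probs.append(1.0 - sum(probs))
labels = ["0", "1", "2", "3", "4", ">4"]

# Expected number of squares with each hit count.
expected = [N_SQUARES * p for p in probs]
for label, p, e in zip(labels, probs, expected):
    print(f"{label:>2}  P = {p:.4f}  expected squares = {e:5.1f}")
```

The expected counts sum to 169 by construction, since the six probabilities sum to one.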
Does the Theory Work?
            Theoretical Outcomes                 Sample Outcomes
Outcome  Probability  Number of Cells        Sample Proportion  Number of Cells
                      (169*Prob(Outcome))                       (Observed frequencies)
0        .4266        72                     .4733              80
1        .3634        61                     .2899              49
2        .1548        26                     .1539              26
3        .0437         7                     .0769              13
4        .0094         2                     .0059               1
>4       .0021         1                     .0000               0
Chi Squared for the Bombing Run
$$
\chi^2 = 144 \left[
\frac{(.4733 - .4266)^2}{.4266} + \frac{(.2899 - .3634)^2}{.3634}
+ \frac{(.1539 - .1548)^2}{.1548} + \frac{(.0769 - .0437)^2}{.0437}
+ \frac{(.0059 - .0094)^2}{.0094} + \frac{(.0000 - .0021)^2}{.0021}
\right] = 6.99976
$$
Degrees of freedom = 6 - 1 - 1 = 4.
Critical chi squared = 9.49
Poisson is not rejected
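The arithmetic can be checked with the sketch below, which uses the theoretical probabilities and sample proportions from the table. The slide multiplies by 144; because the sample proportions are shares of the 169 squares, one could argue for 169 as the multiplier instead, so the code shows both. Either way the statistic stays below the critical value of 9.49. The function name is hypothetical.

```python
theoretical = [0.4266, 0.3634, 0.1548, 0.0437, 0.0094, 0.0021]
sample = [0.4733, 0.2899, 0.1539, 0.0769, 0.0059, 0.0000]


def pearson_from_proportions(n, observed, expected):
    """Pearson chi squared from sample proportions and predicted probabilities."""
    return n * sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_95_DF4 = 9.49  # 95% critical value, 4 degrees of freedom

chi2_slide = pearson_from_proportions(144, sample, theoretical)  # multiplier used on the slide
chi2_cells = pearson_from_proportions(169, sample, theoretical)  # alternative: number of squares

print(f"multiplier 144: {chi2_slide:.5f}, reject = {chi2_slide > CRITICAL_95_DF4}")
print(f"multiplier 169: {chi2_cells:.5f}, reject = {chi2_cells > CRITICAL_95_DF4}")
```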
Difference in Means of Two Populations
• Two Independent Normal Populations
   - Common known variance
   - Common unknown variance
   - Different Variances
   - One and two sided tests
• Paired Samples
   - Means of paired observations
   - Treatments and Controls – Diff-in-Diff SAT
• Nonparametric – Mann/Whitney
• Two Bernoulli Populations
Comparing Two Normal Populations
$X \sim N[\mu_x, \sigma^2]$, $Y \sim N[\mu_y, \sigma^2]$, independent
$H_0: \mu_x = \mu_y$, $H_1: \mu_x \ne \mu_y$
Equivalent: $H_0: \mu_x - \mu_y = 0$
Common known $\sigma^2$
Samples $N_x$ and $N_y$
Base test on $\bar{x} - \bar{y}$
Given independence and normality,
$$\bar{x} - \bar{y} \sim N\!\left[\mu_x - \mu_y,\ \frac{\sigma^2}{N_x} + \frac{\sigma^2}{N_y}\right] \text{ or } N\!\left[0,\ \sigma^2\!\left(\frac{1}{N_x} + \frac{1}{N_y}\right)\right] \text{ assuming } H_0$$
$$\text{Confidence interval: } (\bar{x} - \bar{y}) \pm z_{\alpha/2} \sqrt{\sigma^2\!\left(\frac{1}{N_x} + \frac{1}{N_y}\right)}$$
The region outside the confidence interval is the rejection region
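A minimal sketch of this test with a known common variance follows. The function name, the two small samples, and the value σ² = 0.09 are all hypothetical; the code rejects H0 when the interval around the difference in sample means excludes zero, which is equivalent to |z| exceeding the critical value.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_sample_z(x, y, sigma2, alpha=0.05):
    """z test of equal means for two independent samples with known common variance."""
    diff = mean(x) - mean(y)
    se = sqrt(sigma2 * (1 / len(x) + 1 / len(y)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    interval = (diff - z_crit * se, diff + z_crit * se)
    z = diff / se
    reject = abs(z) > z_crit  # same as: 0 lies outside the interval
    return z, interval, reject


# Hypothetical samples and variance, for illustration only.
x = [5.1, 4.9, 5.6, 5.2, 4.8, 5.4]
y = [4.6, 4.4, 5.0, 4.7, 4.5, 4.3]
z, interval, reject = two_sample_z(x, y, sigma2=0.09)
print(f"z = {z:.3f}, interval = ({interval[0]:.3f}, {interval[1]:.3f}), reject = {reject}")
```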
Unknown Common Variance
$$s^2 = \frac{\sum_{i=1}^{N_x} (x_i - \bar{x})^2 + \sum_{i=1}^{N_y} (y_i - \bar{y})^2}{N_x + N_y - 2}
= \frac{(N_x - 1)s_x^2 + (N_y - 1)s_y^2}{(N_x - 1) + (N_y - 1)}
= w s_x^2 + (1 - w)s_y^2, \quad w = \frac{(N_x - 1)}{(N_x - 1) + (N_y - 1)}$$
Test is based on the t distribution with $N_x + N_y - 2$ degrees of freedom
$$\text{Confidence interval: } (\bar{x} - \bar{y}) \pm t_{\alpha/2} \sqrt{s^2\!\left(\frac{1}{N_x} + \frac{1}{N_y}\right)}$$
The region outside the confidence interval is the rejection region
If $N_x + N_y > 50$, the t distribution will be indistinguishable from the normal distribution.
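The pooled variance and the resulting t statistic can be sketched as below, using the weighted-average form of s². The function name and the two small samples are hypothetical. Only the statistic and its degrees of freedom are computed; a critical value would come from a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Return (t statistic, pooled variance, degrees of freedom)."""
    nx, ny = len(x), len(y)
    w = (nx - 1) / ((nx - 1) + (ny - 1))
    s2 = w * variance(x) + (1 - w) * variance(y)  # weighted average of sample variances
    t = (mean(x) - mean(y)) / sqrt(s2 * (1 / nx + 1 / ny))
    return t, s2, nx + ny - 2


# Hypothetical samples, for illustration only.
x = [12.0, 15.0, 11.0, 14.0]
y = [10.0, 9.0, 13.0, 8.0, 10.0]
t, s2, df = pooled_t(x, y)
print(f"t = {t:.3f}, pooled variance = {s2:.4f}, df = {df}")
```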
Household Incomes, Equal Variances
------------------------------------------------------
t test of equal means INCOME by MARRIED
------------------------------------------------------
MARRIED = 0   Nx =  817
MARRIED = 1   Ny = 3057
t [ 3872] = 3.7238   P value = .0002
------------------------------------------------------
              Mean      Std.Dev.  Std.Error
INCOME -----------------------------------------------
MARRIED = 0   .27982    .12939    .00453
MARRIED = 1   .30145    .15194    .00275
------------------------------------------------------
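As a check on the output above, the pooled t statistic can be recomputed from the reported group sizes, means, and standard deviations. Because the inputs are rounded to five decimals, the result should match the reported 3.7238 only approximately. Variable names are illustrative.

```python
from math import sqrt

# Summary statistics as reported in the output.
nx, mean_x, sd_x = 817, 0.27982, 0.12939   # MARRIED = 0
ny, mean_y, sd_y = 3057, 0.30145, 0.15194  # MARRIED = 1

df = nx + ny - 2
s2 = ((nx - 1) * sd_x**2 + (ny - 1) * sd_y**2) / df  # pooled variance
se = sqrt(s2 * (1 / nx + 1 / ny))
t = (mean_y - mean_x) / se

print(f"df = {df}, t = {t:.4f}")
```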
Unknown Different Variances
$$s_x^2 = \frac{\sum_{i=1}^{N_x} (x_i - \bar{x})^2}{N_x - 1}, \quad s_y^2 = \frac{\sum_{i=1}^{N_y} (y_i - \bar{y})^2}{N_y - 1}$$
Small samples, use t distribution with
$$\text{degrees of freedom} = \operatorname{int}\left\{ \frac{\left[ (s_x^2 / N_x) + (s_y^2 / N_y) \right]^2}{\dfrac{(s_x^2 / N_x)^2}{N_x - 1} + \dfrac{(s_y^2 / N_y)^2}{N_y - 1}} \right\}$$
Test is usually based on the (asymptotic) normal distribution with
$$\text{Confidence interval: } (\bar{x} - \bar{y}) \pm z_{\alpha/2} \sqrt{\frac{s_x^2}{N_x} + \frac{s_y^2}{N_y}}$$
The region outside the confidence interval is the rejection region
If $N_x + N_y > 50$, the t distribution will be indistinguishable from the normal distribution.
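A short sketch of the unequal-variance calculation follows: the standardized difference in means and the approximate degrees of freedom from the formula above. The function name is hypothetical, and the inputs are the household income summary figures reported elsewhere in these slides (group sizes 817 and 3057), reused purely as an illustration.

```python
from math import sqrt


def unequal_variance_test(mean_x, s2_x, nx, mean_y, s2_y, ny):
    """Return (test statistic, approximate degrees of freedom)."""
    vx, vy = s2_x / nx, s2_y / ny  # estimated variances of the two sample means
    stat = (mean_x - mean_y) / sqrt(vx + vy)
    df = int((vx + vy) ** 2 / (vx**2 / (nx - 1) + vy**2 / (ny - 1)))
    return stat, df


stat, df = unequal_variance_test(
    0.27982, 0.12939**2, 817,
    0.30145, 0.15194**2, 3057,
)
print(f"statistic = {stat:.3f}, approximate df = {df}")
```

With degrees of freedom this large, the t and normal critical values are practically identical, so the normal-based interval is the natural choice.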
2 Proportions
Two Bernoulli Populations:
$X_i \sim$ Bernoulli with $\text{Prob}(x_i = 1) = \theta_x$
$Y_i \sim$ Bernoulli with $\text{Prob}(y_i = 1) = \theta_y$
$H_0: \theta_x = \theta_y$
The sample proportions are
$p_x = (1/N_x)\sum_i x_i$ and $p_y = (1/N_y)\sum_i y_i$
Sample variances are $p_x(1 - p_x)$ and $p_y(1 - p_y)$.
Use the Central Limit Theorem to form the test statistic.
z Test for Equality of Proportions
p x -p y
z=
p x (1-p x ) p y (1-p y )
+
Nx
Ny
Application: Take up of public health insurance.
-----------------------------------------------------
t test of equal means   PUBLIC by FEMALE
-----------------------------------------------------
FEMALE = 0   Nx = 1812
FEMALE = 1   Ny = 1565
t [3375] = 5.8627     P value = .0000
-----------------------------------------------------
                   Mean    Std.Dev.   Std.Error
PUBLIC ----------------------------------------------
FEMALE = 0       .84713     .35996      .00846
FEMALE = 1       .91310     .28178      .00712
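As a rough check, the z formula for two proportions can be applied to the sample means and group sizes printed in the output. The sketch below is only an illustration: the function names are invented here, and the two-sided p-value uses the standard normal approximation. It gives a z of roughly -5.97 (x is the FEMALE = 0 group). The software's t = 5.8627 is slightly smaller in magnitude, which appears consistent with a pooled-variance t statistic rather than the unpooled z formula.

```python
import math


def two_proportion_z(p_x, n_x, p_y, n_y):
    """z statistic for H0: equal proportions, using unpooled sample variances."""
    std_error = math.sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)
    return (p_x - p_y) / std_error


def two_sided_p_value(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Sample proportions and group sizes copied from the PUBLIC by FEMALE output.
z = two_proportion_z(0.84713, 1812, 0.91310, 1565)
p_value = two_sided_p_value(z)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.2e}")
```

The sign of z only reflects which group is labeled x; the test of equality depends on its absolute value.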
Paired Sample t and z Test
Observations are pairs (Xi, Yi), i = 1,…,N
Hypothesis: $\mu_x = \mu_y$.
Both are normally distributed. They may be correlated.
- Medical trials: smoking vs. nonsmoking
  (separate individuals, probably independent)
- SAT repeat tests, before and after
  (definitely correlated)
Test is based on Di = Xi – Yi. Same as the earlier test of a single mean, with H0: $\mu_D = 0$.
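A minimal sketch of the paired test, assuming the usual one-sample t statistic applied to the differences Di. The scores below are made up for illustration and do not come from the slides; `paired_t` is a hypothetical helper name.

```python
import math
import statistics


def paired_t(x, y):
    """t statistic and degrees of freedom for H0: mean of (x - y) is zero."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical first and second SAT scores for six students.
first = [1200, 1150, 1300, 1250, 1100, 1400]
second = [1230, 1160, 1290, 1300, 1150, 1420]

t_stat, df = paired_t(first, second)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

Because the test works on the differences, any correlation between Xi and Yi is already absorbed into the standard deviation of Di.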
Treatment Effects
Placebo: In medical trials, N1 subjects receive a drug (treatment), N2 receive a placebo.
Hypothesis: Effect is greater in the treatment group than in the control (placebo) group.
SAT Do Overs
Experiment: X1, X2, …, XN = first SAT score, Y1, Y2, …, YN = second SAT score
Treatment: T1,…,TN = whether or not the student took a Kaplan (or similar) prep course
Hypothesis: $\mu_y > \mu_x$.
Measuring Treatment Effects
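Measuring SAT test scores: Difference in Differences
Use $D = \overline{(y - x)}\,|\,T=1 \;-\; \overline{(y - x)}\,|\,T=0$, the mean score change among students who took the course minus the mean change among those who did not.
Hypothesis is $\mu_D = 0$.
Major complication: Nonrandom treatment assignment. Individuals choose the test prep course themselves. Choosers believe the difference will be positive.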
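The difference in differences can be sketched as follows. This is an illustration under stated assumptions: the scores and treatment indicators are invented, `diff_in_diff` is a hypothetical helper, and the standard error treats the two groups as independent random samples, which is exactly what self-selection into the prep course puts in doubt.

```python
import math
import statistics


def diff_in_diff(x, y, treated):
    """Mean gain of the treated minus mean gain of the untreated, with a z ratio."""
    gains_t = [yi - xi for xi, yi, ti in zip(x, y, treated) if ti == 1]
    gains_c = [yi - xi for xi, yi, ti in zip(x, y, treated) if ti == 0]
    d = statistics.mean(gains_t) - statistics.mean(gains_c)
    # Assumes independent groups; not valid if students self-select.
    std_error = math.sqrt(
        statistics.variance(gains_t) / len(gains_t)
        + statistics.variance(gains_c) / len(gains_c)
    )
    return d, d / std_error


# Hypothetical data: first score, second score, took the prep course (1) or not (0).
x = [1000, 1100, 1200, 1300, 1050, 1150, 1250, 1350]
y = [1060, 1150, 1240, 1370, 1070, 1160, 1280, 1360]
t = [1, 1, 1, 1, 0, 0, 0, 0]

d, z = diff_in_diff(x, y, t)
print(f"D = {d:.1f}, z = {z:.2f}")
```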
Treatment Effects in Clinical Trials
N+0 = “The placebo effect”
N+T – N+0 = “The treatment effect”
The hypothesis is that the difference in
differences has mean zero.



                   Drug Treatment    Placebo
No Effect               N0T            N00
Positive Effect         N+T            N+0

Does Phenogyrabluthefentanoel (Zorgrab)
work?
Investigate: Carry out a clinical trial.
A Test of Independence
In the credit card example, are Own/Rent and Accept/Reject independent?
Hypothesis: Ownership and Acceptance are independent.
Formal hypothesis, based only on the laws of probability:
Prob(Own,Accept) = Prob(Own)Prob(Accept)
(and likewise for the other three possibilities).
Rejection region: Joint frequencies that do not look like the products of the marginal frequencies.
Contingency Table Analysis
The Data: Frequencies
           Reject     Accept      Total
Rent        1,845      5,469      7,314
Own         1,100      5,030      6,130
Total       2,945     10,499     13,444
Step 1: Convert to Actual Proportions
           Reject     Accept      Total
Rent      0.13724    0.40680    0.54404
Own       0.08182    0.37414    0.45596
Total     0.21906    0.78094    1.00000
Independence Test
Step 2: Expected proportions assuming independence: If the
factors are independent, then the joint proportions should equal
the product of the marginal proportions.
[Rent,Reject] = 0.54404 x 0.21906 = 0.11918
[Rent,Accept] = 0.54404 x 0.78094 = 0.42486
[Own,Reject]  = 0.45596 x 0.21906 = 0.09988
[Own,Accept]  = 0.45596 x 0.78094 = 0.35608
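Steps 1 and 2 can be reproduced from the four cell counts. The sketch below is illustrative only; the dictionary layout and variable names are choices made here. It divides the counts by N to get the marginal proportions and multiplies them to get the expected joint proportions under independence.

```python
# Cell counts from the credit card frequency table.
counts = {
    "Rent": {"Reject": 1845, "Accept": 5469},
    "Own": {"Reject": 1100, "Accept": 5030},
}

n = sum(sum(row.values()) for row in counts.values())

# Marginal proportions for rows (Own/Rent) and columns (Accept/Reject).
row_share = {r: sum(cells.values()) / n for r, cells in counts.items()}
col_share = {c: sum(counts[r][c] for r in counts) / n for c in ("Reject", "Accept")}

# Under independence, each joint proportion is the product of its marginals.
expected = {(r, c): row_share[r] * col_share[c] for r in row_share for c in col_share}

for (r, c), share in expected.items():
    actual = counts[r][c] / n
    print(f"[{r},{c}] actual = {actual:.5f}, expected = {share:.5f}")
```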
Comparing Actual to Expected
The statistic is N times the sum over the four cells:
$$\chi^2 = N \times \sum_{\text{Rows}} \sum_{\text{Columns}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
If this is large (because the observed proportions don't look like the expected ones) then reject the hypothesis.
$$\chi^2 = 13{,}444 \times \left[ \frac{(0.13724 - 0.11918)^2}{0.11918} + \frac{(0.40680 - 0.42486)^2}{0.42486} + \frac{(0.08182 - 0.09988)^2}{0.09988} + \frac{(0.37414 - 0.35608)^2}{0.35608} \right] = 103.33013$$
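The arithmetic can be checked directly from the rounded proportions shown on the slides. This is a small sketch with variable names chosen here; because the inputs are rounded to five decimals, the result agrees with 103.33 only to about two decimals.

```python
n = 13444

# Observed and expected proportions, in the order
# [Rent,Reject], [Rent,Accept], [Own,Reject], [Own,Accept].
observed = [0.13724, 0.40680, 0.08182, 0.37414]
expected = [0.11918, 0.42486, 0.09988, 0.35608]

# N times the sum over cells of (Observed - Expected)^2 / Expected.
chi_squared = n * sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"chi squared = {chi_squared:.2f}")
```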
When is the Chi Squared Large?
Critical chi squared
Critical values from chi squared table
D.F.     .05      .01
  1     3.84     6.63
  2     5.99     9.21
  3     7.81    11.34
  4     9.49    13.28
  5    11.07    15.09
  6    12.59    16.81
  7    14.07    18.48
  8    15.51    20.09
  9    16.92    21.67
 10    18.31    23.21
Degrees of freedom = (R-1)(C-1).
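The decision rule can be written as a table lookup. The sketch below hard-codes the critical values listed above; the function names are hypothetical, and only the .05 and .01 levels and 1 to 10 degrees of freedom are supported because that is all the table covers.

```python
# Critical values copied from the table: level -> {degrees of freedom: value}.
CRITICAL = {
    0.05: {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07,
           6: 12.59, 7: 14.07, 8: 15.51, 9: 16.92, 10: 18.31},
    0.01: {1: 6.63, 2: 9.21, 3: 11.34, 4: 13.28, 5: 15.09,
           6: 16.81, 7: 18.48, 8: 20.09, 9: 21.67, 10: 23.21},
}


def degrees_of_freedom(rows, cols):
    """Degrees of freedom for an R x C contingency table."""
    return (rows - 1) * (cols - 1)


def reject_independence(chi_squared, rows, cols, level=0.05):
    """True if the statistic exceeds the tabulated critical value."""
    if level not in CRITICAL:
        raise ValueError("only the .05 and .01 levels are tabulated")
    df = degrees_of_freedom(rows, cols)
    if df not in CRITICAL[level]:
        raise ValueError("degrees of freedom outside the tabulated range 1-10")
    return chi_squared > CRITICAL[level][df]


# The 2 x 2 credit card table has 1 degree of freedom.
print(reject_independence(103.33013, rows=2, cols=2))
```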

Analyzing Default
Do renters default more often (at a different rate) than owners?
To investigate, we study the cardholders (only).

Rows: OWNRENT   Columns: DEFAULT
Each cell shows the count, then the percent of all cardholders.
OWNRENT          0          1        All
0             4854        615       5469
             46.23       5.86      52.09
1             4649        381       5030
             44.28       3.63      47.91
All           9503        996      10499
             90.51       9.49     100.00
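For a 2 x 2 table the chi squared statistic has a well-known shortcut form, N(ad - bc)^2 divided by the product of the four marginal totals. The sketch below applies it to the default counts; `chi_squared_2x2` is a name chosen here. The row totals 5,469 and 5,030 match the accepted Rent and Own counts in the credit card table, so OWNRENT = 0 is read here as renters; that reading is an inference, not something the output states.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi squared for the 2 x 2 table [[a, b], [c, d]] (shortcut form)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: OWNRENT = 0, 1. Columns: DEFAULT = 0, 1.
default_rate_0 = 615 / 5469   # default rate when OWNRENT = 0
default_rate_1 = 381 / 5030   # default rate when OWNRENT = 1

chi_squared = chi_squared_2x2(4854, 615, 4649, 381)
print(f"default rates: {default_rate_0:.4f} vs {default_rate_1:.4f}")
print(f"chi squared = {chi_squared:.2f} with 1 degree of freedom")
```

With one degree of freedom the .05 critical value is 3.84, so a statistic of this size would lead to rejecting equal default rates.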
Hypothesis Test
Multiple Choices: Travel Mode
210 travelers between Sydney and Melbourne
4 available modes: air, train, bus, car
Among the observed variables is income.
Does income help to explain mode choice?
Hypothesis: Mode choice and income are independent.
Travel Mode Choices
Travel Mode Choices and Income
+----------------------------------------------------------+
|                     Travel MODE Data                     |
+--------+-------------------------------------++----------+
|INCOME  |      AIR    TRAIN      BUS      CAR ||    Total |
+--------+-------------------------------------++----------+
|LOW     |       10       36        9        8 ||       63 |
|        |  0.04761  0.17143  0.04286  0.03810 ||  0.30000 |
+--------+-------------------------------------++----------+
|MEDIUM  |       19       20       13       24 ||       76 |
|        |  0.09048  0.09524  0.06190  0.11429 ||  0.36190 |
+--------+-------------------------------------++----------+
|HIGH    |       29        7        8       27 ||       71 |
|        |  0.13810  0.03333  0.03810  0.12857 ||  0.33810 |
+========+=====================================++==========+
|Total   |       58       63       30       59 ||      210 |
|        |  0.27619  0.30000  0.14286  0.28095 ||  1.00000 |
+--------+-------------------------------------++----------+
Contingency Table
Each cell shows the count, the actual proportion, and the expected proportion under independence.
+----------------------------------------------------------+
|                     Travel MODE Data                     |
+--------+-------------------------------------++----------+
|INCOME  |      AIR    TRAIN      BUS      CAR ||    Total |
+--------+-------------------------------------++----------+
|LOW     |       10       36        9        8 ||       63 |
|        |  0.04761  0.17143  0.04286  0.03810 ||  0.30000 |
|        |  0.08286  0.09000  0.04286  0.08429 ||          |
+--------+-------------------------------------++----------+
|MEDIUM  |       19       20       13       24 ||       76 |
|        |  0.09048  0.09524  0.06190  0.11429 ||  0.36190 |
|        |  0.09995  0.10857  0.05170  0.10168 ||          |
+--------+-------------------------------------++----------+
|HIGH    |       29        7        8       27 ||       71 |
|        |  0.13810  0.03333  0.03810  0.12857 ||  0.33810 |
|        |  0.09338  0.10143  0.04830  0.09499 ||          |
+========+=====================================++==========+
|Total   |       58       63       30       59 ||      210 |
|        |  0.27619  0.30000  0.14286  0.28095 ||  1.00000 |
+--------+-------------------------------------++----------+
Assuming independence, P(Income,Mode) = P(Income) x P(Mode).
Computing Chi Squared
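$$\chi^2 = N \times \sum_{\text{Rows}} \sum_{\text{Columns}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = 42.26158$$
For our transport mode problem, R = 3, C = 4, so DF = 2x3 = 6. The critical value at the .05 level is 12.59. Since 42.26 exceeds 12.59, the hypothesis of independence is rejected.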
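The same statistic can be computed for any R x C table of counts. The sketch below is one way to do it, with a function name chosen here. It works with expected counts (row total x column total / N), which is algebraically the same as N times the sum over proportions, and it reproduces the value reported for the travel mode data to within rounding.

```python
def chi_squared_independence(table):
    """Pearson chi squared and degrees of freedom for an R x C table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi_squared = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            chi_squared += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_squared, df


# Rows: LOW, MEDIUM, HIGH income. Columns: AIR, TRAIN, BUS, CAR.
travel = [
    [10, 36, 9, 8],
    [19, 20, 13, 24],
    [29, 7, 8, 27],
]

chi_squared, df = chi_squared_independence(travel)
print(f"chi squared = {chi_squared:.5f}, degrees of freedom = {df}")
```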