Marginal Probability
Probability of a single event occurring.
Event A = “price of IBM stock rises by at least $1 in one day”
Pr(A) = 0.04 = 4%
Joint Probability
Probability of all of multiple events occurring.
Event A = “price of IBM stock rises by at least $1 in one day”
Event B = “price of GE stock rises by at least $1 in one day”
Pr(A) = 0.04 = 4%
Pr(B) = 0.03 = 3%
“Probability of both IBM and GE rising by at least $1 in one day”
= Pr(A and B) = 0.02 = 2%
Joint Probability
Two events are independent if the occurrence of one is not contingent on the
occurrence of the other.
A = “Price of IBM rises by at least $1 in one day.”
B = “Price of IBM rises by at least $1 in one week.”
The events are not independent because the occurrence of A increases the probability of B.
For independent events:
Pr(A and B) = Pr(A) Pr(B)
Disjoint Probability
Probability of any of multiple events occurring.
Event A = “price of IBM stock rises by at least $1 in one day”
Event B = “price of GE stock rises by at least $1 in one day”
Pr(A) = 0.04 = 4%
Pr(B) = 0.03 = 3%
Pr(A or B) = Pr(A) + Pr(B) – Pr(A and B)
“Probability of either IBM or GE rising by at least $1 in one day”
= Pr(A or B) = 0.04 + 0.03 – 0.02 = 0.05
Venn Diagram
[Venn diagram: circles A and B overlap inside a rectangle. Region 1 = outside both circles, region 2 = A only, region 3 = the overlap of A and B, region 4 = B only.]

Area       Meaning
1          A' and B'
2          A and B'
3          A and B
4          A' and B
2,3        A
3,4        B
1,2,4      A' or B'
1,2,3      A or B'
2,3,4      A or B
1,3,4      A' or B
1,2,3,4    A or A'
1,2,3,4    B or B'
Empty      A and A'
Empty      B and B'
Venn Diagram
A = “Price of IBM stock rises by at least $1 in one day.”
B = “Price of GE stock rises by at least $1 in one day.”

[Venn diagram: A only = 0.02, A and B = 0.02, B only = 0.01; outside both = 1 – 0.02 – 0.02 – 0.01 = 0.95.]

Pr(A and B) = 0.02
Pr(A) = 0.04
Pr(B) = 0.03
Venn Diagram
A = “Price of IBM stock rises by at least $1 in one day.”
B = “Price of GE stock rises by at least $1 in one day.”
[Venn diagram: A only = 0.02, A and B = 0.02, B only = 0.01; outside both = 0.95.]

What is the probability of the price of GE rising by at least $1 and the price of IBM not rising by at least $1?
Pr(B and A') = Pr(A' and B) = 0.01

What is the probability of neither the price of IBM rising by at least $1 nor the price of GE rising by at least $1?
Pr(A' and B') = 0.95
Conditional Probability
Probability of an event occurring given that another event has already
occurred.
Event A = “price of IBM stock rises by at least $1 in one day”
Event B = “price of IBM stock rises by at least $1 in the same week”
Pr(B|A) = Pr(A and B) / Pr(A)
Pr(A|B) = Pr(A and B) / Pr(B)
Pr(A) = 0.04
Pr(B) = 0.02
Pr(A and B) = 0.01
Pr(B|A) = 0.01 / 0.04 = 0.25
Pr(A|B) = 0.01 / 0.02 = 0.50
Conditional Probability
A = “Price of IBM stock rises by at least $1 in one day.”
B = “Price of IBM stock rises by at least $1 in the same week.”

[Venn diagram: A only = 0.03, A and B = 0.01, B only = 0.01.]

Pr(A and B) = 0.01
Pr(A) = 0.04
Pr(B) = 0.02
Pr(A|B) = 0.01 / (0.01 + 0.01) = 0.50
Pr(B|A) = 0.01 / (0.01 + 0.03) = 0.25
Conditional Probability
Table shows number of NYC police officers promoted and not promoted.
Question: Did the force exhibit gender discrimination in promoting?
          Promoted    Not Promoted
Male      288         672
Female    36          204
Define the events. There are two events…
1. An officer can be male.
2. An officer can be promoted.
Being female is not a separate event; it is “not being a male.”
Events
M = Being a male
M' = Being a female
P = Being promoted
P' = Not being promoted
Conditional Probability
          Promoted    Not Promoted
Male      288         672
Female    36          204

[Venn diagram: M only = 672 (56%), M and P = 288 (24%), P only = 36 (3%), neither = 204 (17%).]

Divide all areas by 1,200 to find the probability associated with each area.
Conditional Probability
          Promoted    Not Promoted
Male      288         672
Female    36          204

[Venn diagram: M only = 56%, M and P = 24%, P only = 3%, neither = 17%.]

What is the probability of being male and being promoted?
Pr(M and P) = 0.24

What is the probability of being female and being promoted?
Pr(M' and P) = 0.03

→ Males appear to be promoted at 8 times the frequency of females.
Conditional Probability
          Promoted    Not Promoted
Male      288         672
Female    36          204

[Venn diagram: M only = 56%, M and P = 24%, P only = 3%, neither = 17%.]

But, perhaps Pr(M and P) is greater than Pr(M' and P) simply because there are more males on the force.

The comparison we want to make is Pr(P|M) vs. Pr(P|M').

Pr(P|M) = Pr(P and M) / Pr(M) = 0.24 / (0.56 + 0.24) = 0.30
Pr(P|M') = Pr(P and M') / Pr(M') = 0.03 / (0.03 + 0.17) = 0.15

→ Males are promoted at 2 times the frequency of females.
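The same numbers are easy to reproduce in code. Below is a minimal Python sketch (not part of the original worksheet) that recovers Pr(P|M) and Pr(P|M') directly from the counts in the table.

```python
# Promotion counts from the table (1,200 officers total).
promoted = {"M": 288, "F": 36}
not_promoted = {"M": 672, "F": 204}

total = sum(promoted.values()) + sum(not_promoted.values())  # 1,200

# Joint probabilities: divide each cell by the total.
pr_m_and_p = promoted["M"] / total   # Pr(M and P)  = 0.24
pr_f_and_p = promoted["F"] / total   # Pr(M' and P) = 0.03

# Marginal probabilities of being male / female.
pr_m = (promoted["M"] + not_promoted["M"]) / total   # Pr(M)  = 0.80
pr_f = (promoted["F"] + not_promoted["F"]) / total   # Pr(M') = 0.20

# Conditional probabilities: Pr(P|M) = Pr(P and M) / Pr(M).
print(pr_m_and_p / pr_m)   # 0.30
print(pr_f_and_p / pr_f)   # 0.15
```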
Mutually Exclusive and Jointly Exhaustive Events
A set of events is mutually exclusive if no more than one of the events can occur.
  Example: A = IBM stock rises by at least $1, B = IBM stock falls by at least $1.
  A and B are mutually exclusive but not jointly exhaustive.

A set of events is jointly exhaustive if at least one of the events must occur.
  Example: A = IBM stock rises by at least $1, B = IBM stock rises by at least $2, C = IBM stock rises by less than $1 (or falls).
  A, B, and C are jointly exhaustive but not mutually exclusive.

A set of events is mutually exclusive and jointly exhaustive if exactly one of the events must occur.
  Example: A = IBM stock rises, B = IBM stock falls, C = IBM stock does not change.
  A, B, and C are mutually exclusive and jointly exhaustive.
Bayes’ Theorem
Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)
For N mutually exclusive and jointly exhaustive events,
Pr(B) = Pr(B|A1) Pr(A1) + Pr(B|A2) Pr(A2) + … + Pr(B|AN) Pr(AN)
Bayes’ Theorem
Your firm purchases steel bolts from two suppliers: #1 and #2.
65% of the units come from supplier #1; the remaining 35% come from supplier
#2.
Inspecting the bolts for quality is costly, so your firm only inspects periodically.
Historical data indicate that 2% of supplier #1’s units fail, and 5% of supplier #2’s
units fail.
During production, a bolt fails, causing a production line shutdown. What is the probability that the defective bolt came from supplier #1?
The naïve answer is that there is a 65% chance that the bolt came from supplier
#1 since 65% of the bolts come from supplier #1.
The naïve answer ignores the fact that the bolt failed. We want to know not
Pr(bolt came from #1), but Pr(bolt came from #1 | bolt failed).
Bayes’ Theorem
Define the following events:
S1 = bolt came from supplier #1, S2 = bolt came from supplier #2
F = bolt fails
Solution:
We know:
Pr(F | S1) = 2%, Pr(S1) = 65%, Pr(F | S2) = 5%, Pr(S2) = 35%
We want to know: Pr(S1 | F)
Bayes’ Theorem:
Pr(S1 | F) = Pr(F | S1) Pr(S1) / Pr(F)
Because S1 and S2 are mutually exclusive and jointly exhaustive:
Pr(F) = Pr(F | S1) Pr(S1) + Pr(F | S2) Pr(S2)
= (2%)(65%) + (5%)(35%) = 3.1%
Therefore:
Pr(S1 | F) = (2%)(65%) / (3.1%) = 42%
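The arithmetic generalizes to any number of suppliers. A minimal Python sketch of the same computation (not part of the original deck):

```python
# Prior probabilities of each supplier and conditional failure rates.
pr_s = {"S1": 0.65, "S2": 0.35}          # Pr(S1), Pr(S2)
pr_f_given_s = {"S1": 0.02, "S2": 0.05}  # Pr(F|S1), Pr(F|S2)

# Total probability of failure: suppliers are mutually exclusive
# and jointly exhaustive, so Pr(F) = sum of Pr(F|Si) Pr(Si).
pr_f = sum(pr_f_given_s[s] * pr_s[s] for s in pr_s)   # 0.031

# Bayes' Theorem: Pr(S1|F) = Pr(F|S1) Pr(S1) / Pr(F).
pr_s1_given_f = pr_f_given_s["S1"] * pr_s["S1"] / pr_f
print(round(pr_s1_given_f, 2))   # 0.42
```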
Probability Measures: Summary
Pr(A and B) = Pr(A) Pr(B)
where A and B are independent events
Pr(A or B)
= Pr(A) + Pr(B) – Pr(A and B)
Pr(A|B)
= Pr(A and B) / Pr(B)
= Pr(B|A) Pr(A) / Pr(B)
Pr(A)
= Pr(A|B1) Pr(B1) + Pr(A|B2) Pr(B2) + … + Pr(A|Bn) Pr(Bn)
where B1 through Bn are mutually exclusive and jointly exhaustive
Probability Measures: Where We’re Going
Random events
  Probabilities given:
    Simple probability
    Joint probability
    Disjoint probability
    Conditional probability
  Probabilities not given:
    Random event is discrete: binomial, hypergeometric, Poisson, negative binomial
    Random event is continuous: exponential, normal, t, log-normal, chi-square, F
Probability Distributions
So far, we have seen examples in which the probabilities of events are known
(e.g. probability of a bolt failing, probability of being male and promoted).
The behavior of a random event (or “random variable”) is summarized by the
variable’s probability distribution.
A probability distribution is a set of probabilities, each associated with a
different event for all possible events.
Example: The roll of a die is a random variable. There are 6 possible events that can occur. The probability of each event occurring is the same (1/6) for all the events. We call this distribution a uniform distribution.
Probability Distributions
A random variable is a mechanism that selects one event out of all possible events.

Example:
Let X be the random variable defined as the roll of a die. There are six possible events: X = {1, 2, 3, 4, 5, 6}.

Pr(X = 1) = 1/6 = 16.7%
Pr(X = 2) = 1/6 = 16.7%
Pr(X = 3) = 1/6 = 16.7%
Pr(X = 4) = 1/6 = 16.7%
Pr(X = 5) = 1/6 = 16.7%
Pr(X = 6) = 1/6 = 16.7%

The probability distribution function gives the probability of each event occurring. In general, we say that the probability distribution function for X is:

Pr(X = k) = 0.167

The cumulative distribution function gives the probability of any one of a set of events occurring. The cumulative distribution function for X is:

Pr(X ≤ k) = 0.167 k
Discrete vs. Continuous Distributions
In discrete distributions, the random variable takes on specific values.
For example:
If X can take on the values {1, 2, 3, 4, 5, …}, then X is a discrete random variable.
  – Number of profitable quarters is a discrete random variable.
If X can take on any value between 0 and 10, then X is a continuous random variable.
  – P/E ratio is a continuous random variable.
Discrete Distributions
Terminology
Trial: An opportunity for an event to occur or not occur.
Success: The occurrence of an event.
Binomial Distribution
The binomial distribution gives the probability of an event occurring multiple times.

N = Number of trials
x = Number of successes
p = Probability of a single success

Pr(x successes out of N trials) = C(N, x) p^x (1 – p)^(N – x)

where C(N, x) = N! / (x! (N – x)!)

mean = Np
variance = Np(1 – p)
Binomial Distribution
Pr(x successes out of N trials) = C(N, x) p^x (1 – p)^(N – x), where C(N, x) = N! / (x! (N – x)!)

Example
A CD manufacturer produces CD's in batches of 10,000. On average, 2% of the CD's are defective.
A retailer purchases CD's in batches of 1,000. The retailer will return any shipment if 3 or more CD's are found to be defective. For each batch received, the retailer inspects thirty CD's. What is the probability that the retailer will return the batch?

N = 30 trials
x = 3 successes
p = 0.02

Pr(3 successes out of 30 trials) = C(30, 3) (0.02)^3 (1 – 0.02)^(30 – 3) = 0.019 = 1.9%
Binomial Distribution
Example
A CD manufacturer produces CD’s in batches of 10,000. On average, 2% of the
CD’s are defective.
A retailer purchases CD’s in batches of 1,000. The retailer will return any
shipment if 3 or more CD’s are found to be defective. For each batch received,
the retailer inspects thirty CD’s. What is the probability that the retailer will return
the batch?
N = 30 trials
x = 3 successes
p = 0.02

Pr(3 successes out of 30 trials) = C(30, 3) (0.02)^3 (1 – 0.02)^(30 – 3) = 0.019 = 1.9%
Error
The formula gives us the probability of exactly 3 successes out of 30 trials. But,
the retailer will return the shipment if it finds at least 3 defective CD’s. What we
want is
Pr(3 out of 30) + Pr(4 out of 30) + … + Pr(30 out of 30)
Binomial Distribution
N = 30 trials, x = 3 successes, p = 0.02:
Pr(3 successes out of 30 trials) = C(30, 3) (0.02)^3 (1 – 0.02)^27 = 0.019 = 1.9%

N = 30 trials, x = 4 successes, p = 0.02:
Pr(4 successes out of 30 trials) = C(30, 4) (0.02)^4 (1 – 0.02)^26 = 0.003 = 0.3%

N = 30 trials, x = 5 successes, p = 0.02:
Pr(5 successes out of 30 trials) = C(30, 5) (0.02)^5 (1 – 0.02)^25 = 0.0003 = 0.03%
Etc. out to x = 30 successes.
Alternatively
Because Pr(0 or more successes) = 1, we have an easier path to the answer:
Pr(3 or more successes) = 1 – Pr(2 or fewer successes)
Binomial Distribution
N = 30 trials, x = 0 successes, p = 0.02:
Pr(0 successes out of 30 trials) = C(30, 0) (0.02)^0 (1 – 0.02)^30 = 0.545

N = 30 trials, x = 1 success, p = 0.02:
Pr(1 success out of 30 trials) = C(30, 1) (0.02)^1 (1 – 0.02)^29 = 0.334

N = 30 trials, x = 2 successes, p = 0.02:
Pr(2 successes out of 30 trials) = C(30, 2) (0.02)^2 (1 – 0.02)^28 = 0.099

→ Pr(2 or fewer successes) = 0.545 + 0.334 + 0.099 = 0.978
→ Pr(3 or more successes) = 1 – 0.978 = 0.022 = 2.2%
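The complement shortcut is how statistics libraries usually expose this calculation. A sketch using Python's scipy.stats (an assumption; the deck itself uses an Excel worksheet):

```python
from scipy.stats import binom

n, p = 30, 0.02   # trials, probability of a single success

# Pr(2 or fewer successes) via the cumulative distribution function.
pr_2_or_fewer = binom.cdf(2, n, p)   # ≈ 0.978

# Pr(3 or more successes) = 1 - Pr(2 or fewer successes).
print(1 - pr_2_or_fewer)   # ≈ 0.022
print(binom.sf(2, n, p))   # same result via the survival function
```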
Binomial Distribution
Using the Probabilities worksheet:
1. Find the section of the worksheet titled “Binomial Distribution.”
2. Enter the probability of a single success.
3. Enter the number of trials.
4. Enter the number of successes.
5. For “Cumulative?” enter FALSE to obtain Pr(x successes out of N trials); enter TRUE to obtain Pr(x or fewer successes out of N trials).

Example:

Binomial Distribution
Prob of a Single Success    0.02
Number of Trials            30
Number of Successes         2
Cumulative?                 TRUE     (TRUE yields Pr(x ≤ 2) instead of Pr(x = 2))
P(# of successes)           0.978    Pr(x ≤ 2)
1 - P(# of successes)       0.022    1 – Pr(x ≤ 2) = Pr(x ≥ 3)
Binomial Distribution
Application:
Management proposes tightening quality control so as to reduce the defect rate from 2% to 1%. QA estimates that the resources required to implement the additional quality controls will cost the firm an additional $70,000 per year.
Suppose the firm ships 10,000 batches of CD's annually. It costs the firm $1,000 every time a batch is returned. Is it worth it for the firm to implement the additional quality controls?

Low QA:
Defect rate = 2%
Pr(batch will be returned) = Pr(3 or more defects out of 30) = 2.2%
Expected annual cost of product returns = (2.2%)($1,000 per batch)(10,000 batches shipped annually) = $220,000

High QA:
Defect rate = 1%
Pr(batch will be returned) = Pr(3 or more defects out of 30) = 0.3%
Expected annual cost of product returns = (0.3%)($1,000 per batch)(10,000 batches shipped annually) = $30,000

→ Going with improved QA results in cost savings of $190,000 at a cost of $70,000, for a net gain of $120,000.
Binomial Distribution
Application:
Ford suspects that the tread on Explorer tires will separate from the tire causing a fatal
accident. Tests indicate that this will happen on one set of (four) tires out of 5 million. As of
2000, Ford had sold 875,000 Explorers. Ford estimated the cost of a general recall to be
$30 million. Ford also estimated that every accident involving separated treads would cost
Ford $3 million to settle.
Should Ford recall the tires?
What we know:
Success = tread separation
Pr(a single success) = 1 / 5 million = 0.0000002
Number of trials = 875,000
Employing the pdf for the binomial distribution, we have:
Pr(0 successes) = 83.9%
Pr(1 success) = 14.7%
Pr(2 successes) = 1.3%
Pr(3 successes) = 0.1%
Binomial Distribution
Expectation:
An expectation is the sum of the probabilities of all possible events multiplied by the
outcome of each event.
Suppose there are three mutually exclusive and jointly exhaustive events: A, B, and C.
The costs to a firm of events A, B, and C occurring are, respectively, TC_A, TC_B, and TC_C.
The probabilities of events A, B, and C occurring are, respectively, p_A, p_B, and p_C.
→ The expected cost to the firm is:
E(cost) = (TC_A)(p_A) + (TC_B)(p_B) + (TC_C)(p_C)
Should Ford issue a recall?
Issue recall:
Cost = $30 million
Do not issue recall:
E(cost) = Pr(0 incidents)(Cost of 0 incidents) + Pr(1 incident)(Cost of 1 incident) + …
≈ (83.9%)($0 m) + (14.7%)($3 m) + (1.3%)($6 m) + (0.1%)($9 m)
≈ $528,000
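A sketch of the expected-cost comparison in Python (scipy.stats assumed; not part of the original deck). The exact sum is ≈$525,000; the slide's ≈$528,000 reflects the rounded probabilities shown above. Either way, the expected cost of not recalling is far below the $30 million recall cost.

```python
from scipy.stats import binom

n, p = 875_000, 1 / 5_000_000    # trials, Pr(tread separation per set of tires)
cost_per_incident = 3_000_000    # settlement cost per accident
recall_cost = 30_000_000

# E(cost of not recalling) = sum over k of Pr(k incidents) * k * $3m.
expected_cost = sum(binom.pmf(k, n, p) * k * cost_per_incident
                    for k in range(10))   # terms beyond k = 9 are negligible
print(round(expected_cost))         # ≈ 525,000
print(expected_cost < recall_cost)  # True: the expected cost favors not recalling
```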
Hypergeometric Distribution
The hypergeometric distribution gives the probability of an event occurring multiple times when the number of possible successes is fixed.

N = Number of possible trials
n = Number of actual trials
X = Number of possible successes
x = Number of actual successes

Pr(x successes out of n trials) = C(X, x) C(N – X, n – x) / C(N, n)

where C(N, x) = N! / (x! (N – x)!)
Hypergeometric Distribution
Pr(x successes out of n trials) = C(X, x) C(N – X, n – x) / C(N, n)

Example
A CD manufacturer ships a batch of 1,000 CD's to a retailer. The manufacturer knows that 20 of the CD's are defective. The retailer will return any shipment if 3 or more CD's are found to be defective. For each batch received, the retailer inspects thirty CD's. What is the probability that the retailer will return the batch?

N = 1,000 possible trials
n = 30 actual trials
X = 20 possible successes
x = 3 actual successes

Pr(3 successes out of 30 trials) = C(20, 3) C(980, 27) / C(1000, 30) = 0.017
Hypergeometric Distribution
Example
A CD manufacturer ships a batch of 1,000 CD’s to a retailer. The manufacturer
knows that 20 of the CD’s are defective. The retailer will return any shipment if 3
or more CD’s are found to be defective. For each batch received, the retailer
inspects thirty CD’s. What is the probability that the retailer will return the batch?
N = 1,000 possible trials
n = 30 actual trials
X = 20 possible successes
x = 3 actual successes

Pr(3 successes out of 30 trials) = C(20, 3) C(980, 27) / C(1000, 30) = 0.017
Error
The formula gives us the probability of exactly 3 successes. The retailer will
return the shipment if there are 3 or more defects. Therefore, we want
Pr(return shipment) = Pr(3 defects) + Pr(4 defects) + … + Pr(20 defects)
Note: There are a maximum of 20 defects.
Hypergeometric Distribution
N = 1,000 possible trials, n = 30 actual trials, X = 20 possible successes, x = 0 actual successes:
Pr(0 successes out of 30 trials) = C(20, 0) C(980, 30) / C(1000, 30) = 0.541

N = 1,000 possible trials, n = 30 actual trials, X = 20 possible successes, x = 1 actual success:
Pr(1 success out of 30 trials) = C(20, 1) C(980, 29) / C(1000, 30) = 0.341

N = 1,000 possible trials, n = 30 actual trials, X = 20 possible successes, x = 2 actual successes:
Pr(2 successes out of 30 trials) = C(20, 2) C(980, 28) / C(1000, 30) = 0.099

Pr(return shipment) = 1 – (0.541 + 0.341 + 0.099) = 0.019 = 1.9%
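scipy.stats exposes the same distribution; the sketch below (an assumption, not part of the original worksheet) verifies the 1.9% figure. scipy's parameter order, noted in the comments, is worth checking against its documentation.

```python
from scipy.stats import hypergeom

# hypergeom.pmf(k, M, n, N): M = population size, n = successes in the
# population, N = sample size drawn without replacement.
M, n_success, N_draws = 1000, 20, 30   # batch size, defects in batch, CD's inspected

# Pr(0, 1, or 2 defects found in the sample of 30).
pr_2_or_fewer = sum(hypergeom.pmf(k, M, n_success, N_draws) for k in range(3))

# Pr(return shipment) = Pr(3 or more defects found).
print(round(1 - pr_2_or_fewer, 3))   # ≈ 0.019
```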
Hypergeometric Distribution
Using the Probabilities worksheet:
1. Find the section of the worksheet titled “Hypergeometric Distribution.”
2. Enter the number of possible trials.
3. Enter the number of possible successes.
4. Enter the number of actual trials.
5. Enter the number of actual successes.

Note: Excel does not offer the option of calculating the cumulative distribution function. You must do this manually.

Example:

Hypergeometric Distribution
Number of Possible Trials          1,000
Number of Possible Successes       20
Number of Actual Trials            30
Number of Actual Successes         3
P(# of successes in sample)        0.017    Pr(x = 3)
1 - P(# of successes in sample)    0.983    1 – Pr(x = 3) = Pr(x ≠ 3)
Hypergeometric Distribution
If we erroneously use the binomial distribution, what is our estimate of the
probability that the retailer will return the batch?
Results using hypergeometric distribution
Possible Trials = 1,000
Actual Trials = 30
Possible Successes = 20
Actual Successes = 0, 1, 2
Pr(return shipment) = 1 – (0.541 + 0.341 + 0.099) = 0.019 = 1.9%
Results using binomial distribution
Trials = 30
Successes = 0, 1, 2
Probability of a single success = 20 / 1000 = 0.02
Pr(return shipment) = 2.2%
Hypergeometric Distribution
Using the incorrect distribution overestimates the probability of return by only 0.3 percentage points (2.2% vs. 1.9%); who cares?
Suppose each return costs us $1,000 and we ship 10,000 cases per year.

Estimated cost of returns using hypergeometric distribution:
($1,000)(10,000)(1.9%) = $190,000

Estimated cost of returns using binomial distribution:
($1,000)(10,000)(2.2%) = $220,000

→ Using the incorrect distribution resulted in a $30,000 overestimation of costs.
Hypergeometric Distribution
How does hypergeometric distribution differ from binomial distribution?
With binomial distribution, the probability of a success does not change as trials are
realized.
With hypergeometric distribution, the probabilities of subsequent successes change as trials
are realized.
Binomial Example:
Suppose the probability of a given CD being defective is 50%. You have a shipment of 2
CD’s.
You inspect one of the CD’s. There is a 50% chance that it is defective.
You inspect the other CD. There is a 50% chance that it is defective.
On average, you expect 1 defective CD. However, it is possible that there are no defective
CD’s. It is also possible that both CD’s are defective.
Because the probability of defect is constant, this process is binomial.
Hypergeometric Distribution
How does hypergeometric distribution differ from binomial distribution?
With binomial distribution, the probability of a success does not change as trials are
realized.
With hypergeometric distribution, the probabilities of subsequent successes change as trials
are realized.
Hypergeometric Example:
Suppose there is one defective CD in a shipment of two CD’s.
You inspect one of the CD’s. There is a 50% chance that it is defective. You inspect the
second CD. Even without inspecting, you know for certain whether the second CD will be
defective or not.
 Because you know that one of the CD’s is defective, if the first one is not defective, then
the second one must be defective.
 If the first one is defective, then the second one cannot be defective.
Because the probability of the second CD being defective depends on whether or not the
first CD was defective, the process is hypergeometric.
Hypergeometric Distribution
Example
Andrew Fastow, former CFO of Enron, was tried for securities fraud. As is usual in these
cases, if the prosecution requests documents, then the defense is obligated to surrender
those documents – even if the documents contain information that is damaging to the
defense. One tactic is for the defense to submit the requested documents along with many
other documents (called “decoys”) that are not damaging to the defense. The point is to
bury the prosecution under a blizzard of paperwork so that it becomes difficult for the
prosecution to find the few incriminating documents among the many decoys.
Suppose that the prosecutor requests all documents related to Enron’s financial status.
Fastow’s lawyers know that there are 10 incriminating documents among the set requested.
Fastow’s lawyers also know that the prosecution will be able to examine only 50 documents
between now and the trial date.
If the prosecution finds no incriminating documents, it is likely that Fastow will be found not
guilty. Assuming that each document requires the same amount of time to examine, and
assuming that the prosecution will randomly select 50 documents out of the total for
examination, how many documents (decoys plus the 10 incriminating documents) should
Fastow’s lawyers submit so that the probability of the prosecution finding no incriminating
documents is 90%?
Hypergeometric Distribution
Example
Success = an incriminating document
N = unknown
n = 50
X = 10
x = 0

N = 4,775 → Pr(0 successes out of 50 trials) = 0.900

Hypergeometric Distribution
Number of Possible Trials          4,775
Number of Possible Successes       10
Number of Actual Trials            50
Number of Actual Successes         0
P(# of successes in sample)        0.900
1 - P(# of successes in sample)    0.100
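The 4,775 figure can be found by searching over N. A brute-force Python sketch (scipy assumed; not part of the original deck):

```python
from scipy.stats import hypergeom

X, n = 10, 50   # incriminating documents, documents the prosecution can examine

# Find the smallest total number of documents N such that the probability of
# the prosecution finding zero incriminating documents is at least 90%.
N = n + X   # smallest population that can contain both groups
while hypergeom.pmf(0, N, X, n) < 0.90:
    N += 1
print(N)   # ≈ 4,775
```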
Poisson Distribution
The Poisson distribution gives the probability of an event occurring multiple times within a given time interval.

δ = Average number of successes per unit time
x = Number of successes
e = 2.71828…

Pr(x successes per unit time) = e^(–δ) δ^x / x!
Poisson Distribution
Pr(x successes per unit time) = e^(–δ) δ^x / x!
Example
Over the course of a typical eight hour day, 100 customers come into a store.
Each customer remains in the store for 10 minutes (on average). One
salesperson can handle no more than three customers in 10 minutes. If it is likely
that more than three customers will show up in a single 10-minute interval, then
the store will have to hire another salesperson.
What is the probability that more than 3 customers will arrive in a single 10-minute interval?
Time interval = 10 minutes
There are 48 ten-minute intervals during an 8 hour work day.
100 customers per day / 48 ten-minute intervals = 2.08 customers per interval.
δ = 2.08 successes per interval (on average)
x = 4, 5, 6, … successes
Poisson Distribution
Time interval = 10 minutes
δ = 2.08 successes per interval
x = 4, 5, 6, … successes
Pr(x ≥ 4) = 1 – Pr(x = 0) – Pr(x = 1) – Pr(x = 2) – Pr(x = 3)

Pr(0 successes) = e^(–2.08) (2.08)^0 / 0! = 0.125
Pr(1 success) = e^(–2.08) (2.08)^1 / 1! = 0.260
Pr(2 successes) = e^(–2.08) (2.08)^2 / 2! = 0.270
Pr(3 successes) = e^(–2.08) (2.08)^3 / 3! = 0.187

Pr(x ≥ 4) = 1 – (0.125 + 0.260 + 0.270 + 0.187) = 0.158 = 15.8%
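A sketch of the same calculation with scipy.stats (assumed; not part of the original worksheet):

```python
from scipy.stats import poisson

delta = 100 / 48   # average customers per 10-minute interval (≈ 2.08)

# Pr(more than 3 customers) = 1 - Pr(3 or fewer customers).
print(1 - poisson.cdf(3, delta))   # ≈ 0.158
print(poisson.sf(3, delta))        # same result via the survival function
```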
Poisson Distribution
Using the Probabilities worksheet:
1. Find the section of the worksheet titled “Poisson Distribution.”
2. Enter the average number of successes per time interval.
3. Enter the number of successes per time interval.
4. For “Cumulative?” enter FALSE to obtain Pr(x successes); enter TRUE to obtain Pr(x or fewer successes).

Example:

Poisson Distribution
E(Successes / time interval)              2.08
Successes / time interval                 3
Cumulative?                               TRUE     (TRUE yields Pr(x ≤ 3) instead of Pr(x = 3))
P(# successes in a given interval)        0.842    Pr(x ≤ 3)
1 - P(# successes in a given interval)    0.158    1 – Pr(x ≤ 3) = Pr(x ≥ 4)
Poisson Distribution
Suppose you want to hire an additional salesperson on a part-time basis. On average, for how many hours per week will you need this person? (Assume a 40-hour work week.)
There is a 15.8% probability that, in any given 10-minute interval, more than 3
customers will arrive. During these intervals, you will need another salesperson.
In one work day, there are 48 ten-minute intervals. In a 5-day work week, there
are (48)(5) = 240 ten-minute intervals.
On average, you need a part-time worker for 15.8% of these, or (0.158)(240) =
37.92 intervals.
37.92 ten-minute intervals = 379 minutes = 6.3 hours, or 6 hours 20 minutes.
Note: An easier way to arrive at the same answer is: (40 hours)(0.158) = 6.3
hours.
Negative Binomial Distribution
The negative binomial distribution gives the probability of the xth occurrence of an event happening on the Nth trial.

N = Number of trials
x = Number of successes
p = Probability of a single success occurring

Pr(xth success occurring on the Nth trial) = C(N – 1, x – 1) p^x (1 – p)^(N – x)

where C(N – 1, x – 1) = (N – 1)! / ((x – 1)! (N – x)!)
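The deck gives no worked example for the negative binomial, so here is a small hypothetical sketch: borrowing the 2% defect rate from the CD example, the probability that the retailer's 3rd defective CD turns up on exactly the 30th disc inspected.

```python
from math import comb

N, x, p = 30, 3, 0.02   # trial of the xth success, number of successes, Pr(success)

# Pr(xth success on the Nth trial) = C(N-1, x-1) p^x (1-p)^(N-x)
pr = comb(N - 1, x - 1) * p**x * (1 - p)**(N - x)
print(pr)   # ≈ 0.0019
```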
Discrete Distributions: Summary
Distribution      Pertinent Information
Binomial          Probability of a single success; number of trials; number of successes
Hypergeometric    Number of possible trials; number of actual trials; number of possible successes; number of actual successes
Poisson           Average successes per time interval; number of successes per time interval
Continuous Distributions
While the discrete distributions are useful for describing phenomena in which the
random variable takes on discrete (e.g. integer) values, many random variables
are continuous and so are not adequately described by discrete distributions.
Example:
Income, Financial Ratios, Sales.
Technically, financial variables are discrete because they are measured in discrete units (cents). However, the size of the discrete units is so small relative to the typical values of the random variable that these variables behave like continuous random variables.
E.g.
A firm that typically earns $10 million has an income level that is 1 billion
times the size of the discrete units in which the income is measured.
Continuous Distributions
The continuous uniform distribution is a distribution in which the probability of
the random variable taking on a given range of values is equal for all ranges of
the same size.
Example:
X is a uniformly distributed random variable that can take on any value in the
range [1, 5].
Pr(1 < X < 2) = 1/4 = 0.25
Pr(2 < X < 3) = 1/4 = 0.25
Pr(3 < X < 4) = 1/4 = 0.25
Pr(4 < X < 5) = 1/4 = 0.25

Note: The probability of X taking on a specific value is zero.
Continuous Uniform Distribution
Example:
Pr(1 < X < 2) = 1/4 = 0.25
Pr(2 < X < 3) = 1/4 = 0.25
Pr(3 < X < 4) = 1/4 = 0.25
Pr(4 < X < 5) = 1/4 = 0.25

In general, we say that the probability density function for X is:
pdf(X) = 0.25 for all k in [1, 5]
(note: Pr(X = k) = 0 for all k)

and the cumulative distribution function for X is:
Pr(X ≤ k) = (k – 1) / 4

mean = (a + b) / 2
variance = (b – a)(b – a) / 12

a = minimum value of the random variable
b = maximum value of the random variable
Exponential Distribution
The exponential distribution gives the probability that the next occurrence of an event will happen within a given amount of time.

λ = Average number of successes per time interval
x = Maximum number of time intervals until the next success occurs

Pr(the next success occurring in x or fewer time intervals) = 1 – e^(–λx)

mean = λ^(–1)
variance = λ^(–2)
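A small illustrative sketch (the numbers are borrowed from the Poisson example above, not from this slide): with customers arriving at λ ≈ 2.08 per 10-minute interval, the probability that the next customer arrives within half an interval (5 minutes).

```python
from math import exp

lam = 100 / 48   # average successes (arrivals) per 10-minute interval
x = 0.5          # half an interval = 5 minutes

# Pr(next success within x intervals) = 1 - e^(-lambda * x)
print(1 - exp(-lam * x))   # ≈ 0.65
```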
Normal Distribution
Many continuous random processes are normally distributed. Among them are:
1. Proportions (provided that the proportion is not close to the extremes of 0 or 1).
2. Sample means (provided that the means are computed based on a large enough sample size).
3. Differences in sample means (provided that the means are computed based on a large enough sample size).
4. Mean differences (provided that the means are computed based on a large enough sample size).
5. Most natural processes (including many economic and financial processes).
Normal Distribution
There are an infinite number of normal distributions, each with a different mean
and variance.
We describe a normal distribution by its mean and variance:
µ = Population mean
σ² = Population variance

The normal distribution with a mean of zero and a variance of one is called the standard normal distribution.
µ = 0
σ² = 1
Normal Distribution
The pdf (probability density function) for a normal distribution is bell-shaped. This means that the random variable can take on any value over the range –∞ to +∞, but the probability of the random variable straying from its mean decreases as the distance from the mean increases.
Normal Distribution
For all normal distributions, approximately:
50% of the observations lie within µ ± (2/3)σ
68% of the observations lie within µ ± σ
95% of the observations lie within µ ± 2σ
99% of the observations lie within µ ± 3σ
Example:
Suppose the return on a firm’s stock price is normally distributed with a mean of 10%
and a standard deviation of 6%. We would expect that, at any given point in time:
1. There is a 50% probability that the return on the stock is between 6% and 14%
2. There is a 68% probability that the return on the stock is between 4% and 16%.
3. There is a 95% probability that the return on the stock is between –2% and 22%.
4. There is a 99% probability that the return on the stock is between –8% and 28%.
Normal Distribution
Population Measures (calculated using all possible observations):
Population mean = µ
Population variance = σ²

Sample Measures (estimates of the population measures, calculated using a subset of all possible observations):
Sample mean = x̄
Sample variance = s²

Variance measures the square of the average dispersion of observations around a mean.

Sample variance: s² = [1 / (N – 1)] Σ (x_i – x̄)²
Population variance: σ² = (1 / N) Σ (x_i – µ)²
Problem of Unknown Population Parameters
If we do not have all possible observations, then we cannot compute the population mean and variance. What to do?
→ Take a sample of observations and use the sample mean and sample variance as estimates of the population parameters.

Problem: If we use the sample mean and sample variance instead of the population mean and population variance, then we can no longer say that “50% of observations lie within µ ± (2/3)σ, etc.”
In fact, the normal distribution no longer describes the distribution of
observations. We must use the t-distribution.
The t-distribution accounts for the fact that (1) the observations are normally distributed, and (2) we aren't sure what the mean and variance of the distribution are.
t-Distribution
There are an infinite number of t-distributions, each with different degrees of freedom. Degrees of freedom is a function of the number of observations in a data set.
For most purposes, degrees of freedom = N – 1, where N is the number of observations in the sample.
The more degrees of freedom (i.e. observations) that exist, the closer the t-distribution is to the standard normal distribution.
t-Distribution
The standard normal distribution is the same as the t-distribution
with an infinite number of degrees of freedom.
t-Distribution
Percentage of observations lying within the given number of standard deviations of the mean, by degrees of freedom:

                       Degrees of Freedom
Standard Deviations    5      10     20     30     ∞
2/3                    47%    48%    49%    49%    50%
1                      64%    66%    67%    68%    68%
2                      90%    93%    94%    95%    95%
3                      97%    98%    99%    99%    99%
t-Distribution
Example:
Consumer Reports tests the gas mileage of seven SUV's. They find that the sample of SUV's has a mean mileage of 15 mpg with a standard deviation of 3 mpg. Assuming that the population of gas mileages is normally distributed, based on this sample, what percentage of SUV's get more than 20 mpg?

[Diagram: a distribution centered at 15 mpg with s = 3 mpg; the unknown area lies to the right of 20 mpg.]

We don't know the area indicated because we don't know the properties of a t-distribution with a mean of 15 and a standard deviation of 3. However, we can convert this distribution to a distribution whose properties we do know. The formula for conversion is:

Test statistic = (Test value – mean) / standard deviation

“Test value” is the value we are examining (in this case, 20 mpg), “mean” is the mean of the sample observations (in this case, 15 mpg), and “standard deviation” is the standard deviation of the sample observations (in this case, 3 mpg).
t-Distribution
Example:
Consumer Reports tests the gas mileage of seven SUV's. They find that the sample of SUV's has a mean mileage of 15 mpg with a standard deviation of 3 mpg. Assuming that the population of gas mileages is normally distributed, based on this sample, what percentage of SUV's get more than 20 mpg?

[Diagram: the original distribution (mean 15 mpg, s = 3 mpg) is converted to a t6 distribution (mean 0, s = 1); 20 mpg maps to 1.67.]

(Test value – mean) / standard deviation = Test statistic
(20 – 15) / 3 = 1.67

We can look up the area to the right of 1.67 on a t6 distribution.
t-Distribution
[Diagram: the area to the right of 20 mpg on the original distribution (mean 15 mpg, s = 3 mpg) equals the area to the right of 1.67 on the t6 distribution: 0.073.]

(20 – 15) / 3 = 1.67

t Distribution
Test statistic            1.670
Degrees of Freedom        6
Pr(t > Test statistic)    7.30%
Pr(t < Test statistic)    92.70%

→ Based on this sample, about 7.3% of SUV's get more than 20 mpg.
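A sketch of the lookup with scipy.stats (assumed; the deck uses its spreadsheet for this step):

```python
from scipy.stats import t

mean, s, n = 15, 3, 7          # sample mean, sample stdev, sample size
test_stat = (20 - mean) / s    # ≈ 1.67

# Area to the right of the test statistic on a t distribution with n-1 = 6 df.
print(t.sf(test_stat, df=n - 1))   # ≈ 0.073
```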
t-Distribution
Example:
A light bulb manufacturer wants to monitor the quality of the bulbs it produces. To monitor
product quality, inspectors test one bulb out of every thousand to find its burn-life. Since the
production machinery was installed, inspectors have tested 30 bulbs and found an average
burn-life of 1,500 hours with a standard deviation of 200. Management wants to recalibrate its
machines anytime a particularly short-lived bulb is discovered. Management defines “short-lived”
as a burn-life so short that 999 out of 1,000 bulbs burn longer. What is the minimum number
of hours a test bulb must burn for production not to be recalibrated?
[Diagram: t29 distribution; the critical value –3.3963 cuts off area 0.001 in the lower tail. On the original distribution (mean 1,500 hrs, s = 200 hrs) this corresponds to X hrs, with area 1 – 0.001 = 0.999 above it.]

t Distribution
Degrees of Freedom        29
Pr(t > Critical value)    99.90%
Critical Value            -3.3963
t-Distribution
Example:
A light bulb manufacturer wants to monitor the quality of the bulbs it produces. To monitor
product quality, inspectors test one bulb out of every thousand to find its burn-life. Since the
production machinery was installed, inspectors have tested 30 bulbs and found an average
burn-life of 1,500 hours with a standard deviation of 200. Management wants to recalibrate its
machines anytime a particularly short-lived bulb is discovered. Management defines “short-lived”
as a burn-life so short that 999 out of 1,000 bulbs burn longer. What is the minimum number
of hours a test bulb must burn for production not to be recalibrated?
[Diagram: the lower-tail area of 0.001 corresponds to 821 hrs on the original distribution (mean 1,500 hrs, s = 200 hrs) and to –3.3963 on the t29 distribution.]

(Test value – mean) / standard deviation = Test statistic
(X – 1500) / 200 = –3.3963 → X ≈ 821
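The critical value and the cutoff in hours can be recovered with scipy.stats (assumed):

```python
from scipy.stats import t

mean, s, n = 1500, 200, 30   # sample mean, sample stdev, sample size

# Critical value: the t29 value with 0.1% of the area below it.
crit = t.ppf(0.001, df=n - 1)   # ≈ -3.3963

# Convert back to hours: Test value = mean + crit * s.
print(mean + crit * s)          # ≈ 821 hours
```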
t-Distribution
Example:
Continuing with the previous example, suppose we had used the normal distribution
instead of the t-distribution to answer the question.
The probabilities spreadsheet gives us the following results.

Standard Normal Distribution (Z)
Pr(Z > Critical value)    0.10%
Critical Value            3.0902

Test statistic = (Test value – mean) / standard deviation
–3.09 = (Test value – 1,500) / 200
Test value = 1,500 – (3.09)(200) = 882
t-Distribution vs. Normal Distribution
Correct distribution
Using the t-distribution, we recalibrate production whenever we observe a light bulb with a life of 821 or fewer hours.

Incorrect distribution
Using the standard normal distribution, we recalibrate production whenever we observe a light bulb with a life of 882 or fewer hours.

→ By incorrectly using the standard normal distribution, we would recalibrate production too frequently.

When can we use the normal distribution?
→ As an approximation, when the number of observations is large enough that the difference in results is negligible. The difference starts to become negligible at 30 or more degrees of freedom. For more accurate results, use the t-distribution.
Test Statistic vs. Critical Value
Terminology
We have been using the terms “test statistic” and “critical value” somewhat
interchangeably. Which term is appropriate depends on whether the number
described is being used to find an implied probability (test statistic), or represents a
known probability (critical value).
When we wanted to know the probability of an SUV getting more than 20 mpg, we
constructed the test statistic and asked “what is probability of observing the test
statistic?”
When we wanted to know what cut-off to impose for recalibrating production of light
bulbs, we found the critical value that gave us the probability we wanted, and then
asked “what test value has the probability implied by the critical value?”
Test Statistic vs. Critical Value
Example
The return on IBM stock has averaged 19.3% over the past 10 years with a standard
deviation of 4.5%. Assuming that past performance is indicative of future results and
assuming that the population of rates of return is normally distributed, what is the
probability that the return on IBM next year will be between 10% and 20%?
1. Picture the problem with respect to the
appropriate distribution.
2. Determine what area(s) represents the
answer to the problem.
3. Determine what area(s) you must find
(this depends on how the probability
table or function is defined).
4. Perform computations to find desired
area based on known areas.
[Diagram: the area between the two test values answers the question; the two tail areas are the ones we look up.]
t-Distribution
Example
The return on IBM stock has averaged 19.3% over the past 10 years with a standard
deviation of 4.5%. Assuming that past performance is indicative of future results and
assuming that the population of rates of return is normally distributed, what is the
probability that the return on IBM next year will be between 10% and 20%?
Convert the question to a form that can be analyzed.
t-Distribution
Example
The return on IBM stock has averaged 19.3% over the past 10 years with a standard
deviation of 4.5%. Assuming that past performance is indicative of future results and
assuming that the population of rates of return is normally distributed, what is the
probability that the return on IBM next year will be between 10% and 20%?
Test statistic = (Test value – mean) / standard deviation

Left test statistic = (10% – 19.3%) / 4.5% = –2.07
Right test statistic = (20% – 19.3%) / 4.5% = 0.16
t-Distribution
Example
The return on IBM stock has averaged 19.3% over the past 10 years with a standard
deviation of 4.5%. Assuming that past performance is indicative of future results and
assuming that the population of rates of return is normally distributed, what is the
probability that the return on IBM next year will be between 10% and 20%?
t Distribution (left tail)
Test statistic            (2.070)
Degrees of Freedom        9.0
Pr(t > Test statistic)    96.58%

t Distribution (right tail)
Test statistic            0.160
Degrees of Freedom        9.0
Pr(t > Test statistic)    43.82%

Area to the left of –2.07: 100% – 96.58% = 3.42%
t-Distribution
Example
The return on IBM stock has averaged 19.3% over the past 10 years with a standard
deviation of 4.5%. Assuming that past performance is indicative of future results and
assuming that the population of rates of return is normally distributed, what is the
probability that the return on IBM next year will be between 10% and 20%?
Area to the left of –2.07: 3.42%
Area to the right of 0.16: 43.82%
Area outside the interval: 3.42% + 43.82% = 47.24%
Area between –2.07 and 0.16: 100% – 47.24% = 52.76%

There is a 53% chance that IBM will yield a return between 10% and 20% next year.
t-Distribution
Example
Your firm has negotiated a labor contract that requires that the firm provide annual raises no
less than the rate of inflation. This year, the total cost of labor covered under the contract will
be $38 million. Your CFO has indicated that the firm’s current financing can support up to a
$2 million increase in labor costs. Based on the historical inflation numbers below, calculate
the probability of labor costs increasing by at least $2 million next year.
Year    Inflation Rate        Year    Inflation Rate
1982    6.2%                  1993    3.0%
1983    3.2%                  1994    2.6%
1984    4.3%                  1995    2.8%
1985    3.6%                  1996    3.0%
1986    1.9%                  1997    2.3%
1987    3.6%                  1998    1.6%
1988    4.1%                  1999    2.2%
1989    4.8%                  2000    3.4%
1990    5.4%                  2001    2.8%
1991    4.2%                  2002    1.6%
1992    3.0%                  2003    1.8%

Calculate the mean and standard deviation for inflation:
Sample mean = 3.2%
Sample stdev = 1.2%
t-Distribution
Example
Your firm has negotiated a labor contract that requires that the firm provide annual raises no
less than the rate of inflation. This year, the total cost of labor covered under the contract will
be $38 million. Your CFO has indicated that the firm’s current financing can support up to a
$2 million increase in labor costs. Based on the historical inflation numbers below, calculate
the probability of labor costs increasing by at least $2 million next year.
A $2 million increase on a $38 million base is a 2/38 = 5.26% increase.

Sample mean = 3.2%
Sample stdev = 1.2%
N = 22

Test statistic = (5.26% – 3.2%) / 1.2% = 1.717

t Distribution
Test statistic            1.717
Degrees of Freedom        21
Pr(t > Test statistic)    5.03%
Pr(t < Test statistic)    94.97%

→ There is about a 5% probability that labor costs will increase by at least $2 million next year.
t-Distribution
Example
Your firm has negotiated a labor contract that requires that the firm provide annual raises no
less than the rate of inflation. This year, the total cost of labor covered under the contract will
be $38 million. Your CFO has indicated that the firm’s current financing can support up to a
$2 million increase in labor costs. The CFO wants to know what the magnitude of a possible
“worst-case” scenario. Answer the following: “There is a 90% chance that the increase in
labor costs will be no more than what amount?”
t Distribution
Degrees of Freedom        21
Pr(t > Test statistic)    50.00%
Pr(t < Test statistic)    50.00%
Pr(t > Critical value)    10.00%
Critical Value            1.3232

1.3232 = (Test value – 3.2%) / 1.2% → Test value = 4.79%

A 4.79% increase on a $38 million base is (4.79%)($38 million) = $1.82 million.
t-Distribution
The government has contracted a private firm to produce hand grenades. The specifications
call for the grenades to have 10 second fuses. The government has received a shipment of
100,000 grenades and will test a sample of 20 grenades. If, based on the sample, the
government determines that the probability of a grenade going off in less than 8 seconds
exceeds 1%, then the government will reject the entire shipment.
The test results are as follows.
Time to Detonation    Number of Grenades
8 seconds             2
9 seconds             3
10 seconds            10
11 seconds            3
12 seconds            1
13 seconds            1
In general, one would not expect time measures to be normally distributed (because time
cannot be negative). However, if the ratio of the mean to the standard deviation is large
enough, we can use the normal distribution as an approximation.
Should the government reject the shipment?
t-Distribution
Time to Detonation    Number of Grenades
8 seconds             2
9 seconds             3
10 seconds            10
11 seconds            3
12 seconds            1
13 seconds            1
First: What is the ratio of the mean to the standard deviation?
Mean = 10.05 seconds
Standard deviation = 1.20 seconds
→ Ratio is 8.375.

A ratio of greater than 8 is a decent heuristic. This is not a rigorous test for the appropriateness of the normal distribution. But, it is not too bad for a “quick and dirty” assessment.
t-Distribution
Should the government reject the shipment?
Naïve answer: Don’t reject the shipment because none of the grenades detonated in
less than 8 seconds  Pr(detonation in less than 8 seconds) = 0.
12
No grenades
detonated in less
than 8 seconds.
Number of Grenades
10
8
6
4
2
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Seconds to Detonation
Histogram: Shows number of observations according to type
© Copyright 2003. Do not distribute or copy without permission.
83
t-Distribution
Should the government reject the shipment?
Correct answer:
We use the sample data to infer the shape of the population distribution.
[Diagram: a smooth population distribution inferred from the sample histogram.]

The inferred population distribution shows a positive probability of finding detonation times of less than 8 seconds.
t-Distribution
Should the government reject the shipment?
Correct answer:
1. Find the test statistic that corresponds to 8 seconds.

   Test statistic = (8 – 10.05) / 1.2 = –1.71

2. Find the area to the left of the test statistic.

   t Distribution
   Test statistic            (1.710)
   Degrees of Freedom        19
   Pr(t > Test statistic)    94.82%
   Pr(t < Test statistic)    5.18%

   Pr(detonation < 8 seconds) = 5.2%

3. Reject the shipment because the probability of early detonation is too high.
Lognormal Distribution
In the previous example, we noted that the normal distribution may not properly
describe the behavior of random variables that are bounded.
A normally distributed random variable can take on any value from negative infinity to
positive infinity. If the random variable you are analyzing is bounded (i.e. it cannot
cover the full range from negative to positive infinity), then using the normal
distribution to predict the behavior of the random variable can lead to erroneous
results.
Example:
Using the data from the hand grenade example, the probability of a single grenade
detonating in less than zero seconds is 0.0001. That means that, on average, we can
expect one grenade out of every 10,000 to explode after a negative time interval.
Since this is logically impossible, we must conclude that the normal distribution is not
the appropriate distribution for describing time-to-detonation.
Lognormal Distribution
In instances in which a random variable must take on a positive value, it is often the
case that the random variable has a lognormal distribution.
A random variable is lognormally distributed when the natural logarithm of the
random variable is normally distributed.
Example: Return to the hand grenade example.
Time to Detonation    Log of Time to Detonation    Number of Grenades
8 seconds             2.0794                       2
9 seconds             2.1972                       3
10 seconds            2.3026                       10
11 seconds            2.3979                       3
12 seconds            2.4849                       1
13 seconds            2.5649                       1
As time approaches positive infinity, ln(time) approaches positive infinity.
As time approaches zero, ln(time) approaches negative infinity.
Lognormal Distribution
Assuming that the times-to-detonation were normally distributed, we found a 5.2% probability of detonation occurring in under 8 seconds.

Assuming that the times-to-detonation are lognormally distributed, what is the probability of detonation occurring in under 8 seconds?

Log of Time to Detonation    Number of Grenades
2.0794                       2
2.1972                       3
2.3026                       10
2.3979                       3
2.4849                       1
2.5649                       1

Mean of the logs = 2.3010
Standard deviation of the logs = 0.1175

Test statistic = (ln(8) – 2.3010) / 0.1175 = –1.8856

t Distribution
Test statistic            (1.886)
Degrees of Freedom        19
Pr(t > Test statistic)    96.27%
Pr(t < Test statistic)    3.73%

Pr(detonation < 8 seconds) = 3.7%
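A sketch of the lognormal version of the test, computed from the raw detonation times (numpy and scipy assumed; not part of the original worksheet):

```python
import numpy as np
from scipy.stats import t

# Raw detonation times for the sample of 20 grenades.
times = [8]*2 + [9]*3 + [10]*10 + [11]*3 + [12] + [13]
logs = np.log(times)

mean, s = logs.mean(), logs.std(ddof=1)   # ≈ 2.3010 and ≈ 0.1175

test_stat = (np.log(8) - mean) / s        # ≈ -1.886

# Area to the left of the test statistic on a t19 distribution.
print(t.cdf(test_stat, df=len(times) - 1))   # ≈ 0.037
```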
Lognormal Distribution
Example
You are considering buying stock in a small cap firm. The firm’s sales over the past
nine quarters are shown below. You expect your investment to appreciate in value
next quarter provided that the firm’s sales next quarter exceed $27 million. Based on
this assumption, what is the probability that your investment will appreciate in value?
Quarter    Sales (millions)
1          $25.2
2          $12.1
3          $27.9
4          $28.9
5          $32.0
6          $29.9
7          $34.4
8          $29.8
9          $23.2

Because sales cannot be negative, it may be more appropriate to model the firm's sales as lognormal rather than normal.
Lognormal Distribution
Example
What is the probability that the firm’s sales will exceed $27 million next quarter?
Quarter    Sales (millions)    ln(Sales)
1          $25.2               3.227
2          $12.1               2.493
3          $27.9               3.329
4          $28.9               3.364
5          $32.0               3.466
6          $29.9               3.398
7          $34.4               3.538
8          $29.8               3.395
9          $23.2               3.144

Mean of the logs = 3.261
Standard deviation of the logs = 0.311

Test statistic = (ln(27) – 3.261) / 0.311 = 0.1106

t Distribution
Test statistic            0.1106
Degrees of Freedom        8.0
Pr(t > Test statistic)    45.73%

Pr(sales exceeding $27 million next quarter) = 46%
→ Odds are that the investment will decline in value.
Lognormal Distribution
Example
Suppose we, incorrectly, assumed that sales were normally distributed.
Quarter    Sales (millions)
1          $25.2
2          $12.1
3          $27.9
4          $28.9
5          $32.0
6          $29.9
7          $34.4
8          $29.8
9          $23.2

Mean = 27.044
Standard deviation = 6.520

Test statistic = (27 – 27.044) / 6.520 = –0.007

t Distribution
Test statistic            (0.007)
Degrees of Freedom        8.0
Pr(t > Test statistic)    50.27%

Pr(sales exceeding $27 million next quarter) > 50%
→ Odds are that the investment will increase in value.
→ The incorrect distribution yields the opposite conclusion.
Lognormal Distribution
Warning:
The “mean of the logs” is not the same as the “log of the mean.”
→ Mean sales = 27.044, and ln(27.044) = 3.298
But:
→ Mean of log sales = 3.261

The same is true for standard deviation: the “standard deviation of the logs” is not the same as the “log of the standard deviations.”

In using the lognormal distribution, we need the “mean of the logs” and the “standard deviation of the logs.”
Lognormal Distribution
When should one use the lognormal distribution?
You should use the lognormal distribution if the random variable is either non-negative or non-positive
Can one use the normal distribution as an approximation of the lognormal distribution?
Yes, but only when the ratio of the mean to the standard deviation is large (e.g. greater than 8).
Note: If the random variable is only positive (or only negative), then you are always better off using the
lognormal distribution vs. the normal or t-distributions. The rules above give guidance for using the
normal or t-distributions as approximations.
Hand grenade example:
• Mean / Standard deviation = 10.05 / 1.20 = 8.38
• The normal distribution underestimated the probability of early detonation by 1.1%
  (3.7% for the lognormal vs. 2.6% for the t-distribution).
Quarterly sales example:
• Mean / Standard deviation = 27.04 / 6.52 = 4.15
• The normal distribution overestimated the probability of appreciation by 4.6%
  (45.7% for the lognormal vs. 50.3% for the t-distribution).
Distribution of Sample Means
So far, we have looked at the distribution of individual observations:
• Gas mileage for a single SUV.
• Burn life for a single light bulb.
• Return on IBM stock next quarter.
• Inflation rate next year.
• Time to detonation for a single hand grenade.
• A firm’s sales next quarter.
In each case, we had sample means and sample standard deviations and asked,
“What is the probability of the next observation lying within some range?”
Note: Although we drew on information contained in a sample of many observations,
the probability questions we asked always concerned a single observation.
→ In these cases, the random variable we analyzed was a “single draw” from the
population.
Distribution of Sample Means
We now want to ask probability questions about sample means.
Example:
EPA standards require that the mean gas mileage for a manufacturer’s cars be at least
20 mpg. Every year, the EPA takes a sampling of the gas mileages of a manufacturer’s
cars. If the mean of the sample is below 20 mpg, the manufacturer is fined.
In 2001, GM produced 145,000 cars. Suppose five EPA analysts each select 10 cars
and measure their mileages. The analysts obtain the following results.
Analyst #1: 17, 16, 19, 21, 19, 21, 16, 16, 19, 22
Analyst #2: 22, 22, 19, 22, 25, 18, 16, 24, 18, 15
Analyst #3: 16, 20, 17, 17, 23, 23, 19, 22, 20, 15
Analyst #4: 21, 20, 22, 20, 18, 22, 19, 23, 17, 21
Analyst #5: 24, 24, 20, 22, 17, 23, 22, 15, 19, 15
Distribution of Sample Means
Analyst #1: 17, 16, 19, 21, 19, 21, 16, 16, 19, 22
Analyst #2: 22, 22, 19, 22, 25, 18, 16, 24, 18, 15
Analyst #3: 16, 20, 17, 17, 23, 23, 19, 22, 20, 15
Analyst #4: 21, 20, 22, 20, 18, 22, 19, 23, 17, 21
Analyst #5: 24, 24, 20, 22, 17, 23, 22, 15, 19, 15
Notice that each analyst obtained a different sample mean. The sample means are:
Analyst #1: 18.6
Analyst #2: 20.1
Analyst #3: 19.2
Analyst #4: 20.3
Analyst #5: 20.1
→ The analysts obtain different sample means because their samples consist of
different observations. Which is correct?
→ Each sample mean is an estimate of the population mean. The sample means vary
depending on the observations picked.
→ The sample means are, themselves, random variables.
Distribution of Sample Means
Notice that we have identified two distinct random variables:
1. The process that generates the observations is one random variable (e.g. the
   mechanism that determines each car’s mpg).
2. The mean of a sample of observations is another random variable (e.g. the average
   mpg of a sample of cars).
The distribution of sample means is governed by the central limit theorem.
Central Limit Theorem
Regardless of the distribution of the random variable generating the observations, the
sample means of the observations are t-distributed.
Example:
It doesn’t matter whether mileage is distributed normally, lognormally, or according to
any other distribution, the sample means of gas mileages are t-distributed.
Distribution of Sample Means
Example:
The following slides show sample means taken from a uniformly distributed random
variable.
The random variable can take on any number over the range 0 through 1 with equal
probability.
For each slide, we see the mean of a sample of observations of this uniformly
distributed random variable.
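The histograms on the following slides can be reproduced with a short simulation. A minimal sketch, assuming NumPy is available (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

for n in [1, 2, 5, 20, 200]:
    # 1,000 sample means, each the average of n draws from Uniform(0, 1)
    means = rng.uniform(0, 1, size=(1000, n)).mean(axis=1)
    print(f"n = {n:3d}: variance of sample means = {means.var():.4f} "
          f"(theory: (1/12)/n = {1 / 12 / n:.4f})")
```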
Distribution of Sample Means
[Histogram: one thousand sample means, each derived from 1 observation.
X-axis: Value of Sample Mean (0.00 to 0.92); Y-axis: Number of Sample Means Observed.]
Distribution of Sample Means
[Histogram: one thousand sample means, each derived from 2 observations.
X-axis: Value of Sample Mean (0.00 to 0.92); Y-axis: Number of Sample Means Observed.]
Distribution of Sample Means
[Histogram: one thousand sample means, each derived from 5 observations.
X-axis: Value of Sample Mean (0.00 to 0.92); Y-axis: Number of Sample Means Observed.]
Distribution of Sample Means
[Histogram: one thousand sample means, each derived from 20 observations.
X-axis: Value of Sample Mean (0.00 to 0.92); Y-axis: Number of Sample Means Observed.]
Distribution of Sample Means
[Histogram: one thousand sample means, each derived from 200 observations.
X-axis: Value of Sample Mean (0.00 to 0.92); Y-axis: Number of Sample Means Observed.]
Distribution of Sample Means
Notice two things that occur as we increase the number of observations that feed into
each sample.
1. The distribution of sample means very quickly becomes “bell shaped.” This is the
result of the central limit theorem – basing a sample mean on more observations
causes the sample mean’s distribution to approach the normal distribution.
2. The variance of the distribution decreases. This is the result of our next topic – the
variance of a sample mean.
The variance of a sample mean decreases as the number of observations comprising the
sample increases.
Standard deviation of sample means = (standard deviation of the observations) /
√(number of observations comprising the sample mean)
This quantity is called the “standard error.”
Distribution of Sample Means
Example:
In the previous slides, we saw sample means of observations drawn from a uniformly
distributed random variable.
The variance of a uniformly distributed random variable that ranges from 0 to 1 is 1/12.
Therefore:
Variance of sample means based on 1 observation    = (1/12) / 1   = 0.0833
Variance of sample means based on 2 observations   = (1/12) / 2   = 0.0417
Variance of sample means based on 5 observations   = (1/12) / 5   = 0.0167
Variance of sample means based on 20 observations  = (1/12) / 20  = 0.0042
Variance of sample means based on 200 observations = (1/12) / 200 = 0.0004
Distribution of Sample Means
Example:
Let us return to the EPA analysts.
Analyst #1 looked at 10 cars and found an average mileage of 18.6 and a standard
deviation of 2.271. Analyst #1’s data was: 17, 16, 19, 21, 19, 21, 16, 16, 19, 22
Based on this sample, GM can expect 95% of cars to have mileages between what two
extremes?
t Distribution:  Degrees of freedom = 9,  Pr(t > Critical value) = 2.50%,
Critical values = ±2.262

Left  = 18.6 − (2.262)(2.271) ≈ 13.5
Right = 18.6 + (2.262)(2.271) ≈ 23.7
Distribution of Sample Means
Example:
Let us return to the EPA analysts.
Analyst #1 looked at 10 cars and found an average mileage of 18.6 and a standard
deviation of 2.271. Analyst #1’s data was: 17, 16, 19, 21, 19, 21, 16, 16, 19, 22
Based on this sample, GM can expect that 95% of analysts who look at 10 cars each
will find average mileages between what two extremes?
t Distribution:  Degrees of freedom = 9,  Pr(t > Critical value) = 2.50%,
Critical values = ±2.262

Standard error = 2.271 / √10 = 0.718
Left  = 18.6 − (2.262)(0.718) ≈ 17.0
Right = 18.6 + (2.262)(0.718) ≈ 20.2
Distribution of Sample Means
Example:
Let us return to the EPA analysts.
Analyst #1 looked at 10 cars and found an average mileage of 18.6 and a standard
deviation of 2.271. Analyst #1’s data was: 17, 16, 19, 21, 19, 21, 16, 16, 19, 22
Based on this sample, GM can expect that 95% of analysts who look at 20 cars each
will find average mileages between what two extremes?
t Distribution:  Degrees of freedom = 9,  Pr(t > Critical value) = 2.50%,
Critical values = ±2.262

Standard error = 2.271 / √20 = 0.508
Left  = 18.6 − (2.262)(0.508) ≈ 17.5
Right = 18.6 + (2.262)(0.508) ≈ 19.7
Distribution of Sample Means
Example:
Let us return to the EPA analysts.
Analyst #1 looked at 10 cars and found an average mileage of 18.6 and a standard
deviation of 2.271. Analyst #1’s data was: 17, 16, 19, 21, 19, 21, 16, 16, 19, 22
95% of cars will have mileages between 13.5 mpg and 23.7 mpg.
95% of analysts who look at 10 cars each should find average mileages between 17.0 mpg
and 20.2 mpg.
95% of analysts who look at 20 cars each should find average mileages between 17.5 mpg
and 19.7 mpg.
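These three ranges can be reproduced as follows; a minimal sketch, assuming SciPy is available. Note that, following the slides, the df = 9 critical value is reused even for the 20-car scenario:

```python
import numpy as np
from scipy import stats

mpg = np.array([17, 16, 19, 21, 19, 21, 16, 16, 19, 22])   # Analyst #1's data
xbar, s = mpg.mean(), mpg.std(ddof=1)                      # 18.6 and 2.271
t_crit = stats.t.ppf(0.975, df=9)                          # 2.262

for se in [s, s / np.sqrt(10), s / np.sqrt(20)]:           # cars, 10-car means, 20-car means
    print(xbar - t_crit * se, xbar + t_crit * se)
# Prints roughly [13.5, 23.7], [17.0, 20.2], and [17.5, 19.7].
```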
Confidence Intervals
While we can’t know the values of the population parameters (unless we have the
entire population of data), we can make statements about how likely it is to find the
population parameters within certain ranges.
We construct confidence intervals to describe ranges over which population parameters
are likely to exist.
Example:
Suppose EPA analyst #1 found the following data:
Sample mean = 18.6 mpg
Sample standard deviation = 2.271 mpg
Sample size = 20
Standard error = 0.508
From the t9 distribution, we know that:
50% of the sample means lie within 0.7027 standard deviations of the population mean
75% of the sample means lie within 1.2297 standard deviations of the population mean
95% of the sample means lie within 2.2622 standard deviations of the population mean
Confidence Intervals
We can use this information to construct confidence intervals around the population
mean, where a confidence interval is:
Measure ± (critical value)(standard deviation of the measure)
The measure and the standard deviation are found in the data. What critical value we
select depends on the level of confidence we desire.
There is a 50% chance that the population mean is
found within the range: 18.6 ± (0.7027)(0.508) = [18.2, 19.0]
There is a 75% chance that the population mean is
found within the range: 18.6 ± (1.2297)(0.508) = [18.0, 19.2]
There is a 95% chance that the population mean is
found within the range: 18.6 ± (2.2622)(0.508) = [17.5, 19.7]
Increasing the level of confidence widens the range of focus.
Confidence Intervals
At the extremes, we can say…
1. There is a 100% chance that the population mean is between negative infinity and
   positive infinity.
2. There is a 0% chance that the population mean is exactly 18.60000000000000…
The first statement gives perfect certainty about an infinitely unfocused range.
The second statement gives zero certainty about an infinitely focused range.
Usually, when statisticians mention “error,” they are referring to the range on a 95%
confidence interval.
Confidence Intervals
Example:
You take a sample of 40 technology companies. The average P/E for the
sample is 71.8. The standard deviation of the P/E’s of the 40 companies is
22.4. What is the measurement error associated with this average (at the
95% confidence level)?
The confidence interval is:
Measure ± (critical value)(standard deviation of the measure)
Sample mean = 71.8
Standard error = 22.4 / √40 = 3.54
Critical value (from t39) = 2.0227
Measurement error = (2.0227)(3.54) = 7.16
→ The average P/E ratio for all tech companies is 71.8 ± 7.16.
Confidence Intervals
Example:
Your firm solicits estimates for constructing a new building. You receive the
following seven estimates:
$10 million, $12 million, $15 million, $13 million,
$11 million, $14 million, $12 million
Based on this information, construct a 90% confidence interval for the
estimated cost of the building.
Measure ± (critical value)(standard deviation of the measure)
Sample mean = $12.4 million
Standard deviation = $1.7 million
Critical value (from t6) = 1.9432
$12.4 million ± (1.9432)($1.7 million) = [$9.1 million, $15.7 million]
Confidence Intervals
Recall, from the previous slide:
Sample mean = $12.4 million
Standard deviation = $1.7 million
Critical value (from t6) = 1.9432
$12.4 million ± (1.9432)($1.7 million) = [$9.1 million, $15.7 million]
This is a 90% confidence interval for the cost of the building.

What if we had used the standard deviation of the sample mean (the “standard error”)
instead of the standard deviation of the observations?
Measure ± (critical value)(standard deviation of the measure)
Sample mean = $12.4 million
Standard deviation of the sample mean = $1.7 million / √7 = $643,000
Critical value (from t6) = 1.9432
$12.4 million ± (1.9432)($643,000) = [$11.2 million, $13.6 million]
This is not a 90% confidence interval for the cost of the building, but a 90% confidence
interval for the average cost of seven buildings. The difference lies in the choice of
standard deviations.
Confidence Intervals
Confidence interval for the cost of the building
There is a 90% probability that the cost of a single building will be between $9.1 million
and $15.7 million.
Confidence interval for the average cost of the buildings
There is a 90% probability that, when constructing seven buildings, the average cost per
building will be between $11.2 million and $13.6 million.
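A minimal sketch of both intervals, assuming SciPy is available:

```python
import numpy as np
from scipy import stats

est = np.array([10, 12, 15, 13, 11, 14, 12])    # estimates in $ millions
xbar, s, n = est.mean(), est.std(ddof=1), len(est)
t_crit = stats.t.ppf(0.95, df=n - 1)            # 1.9432 (5% in each tail)

# 90% interval for the cost of a single building (uses s itself):
print(xbar - t_crit * s, xbar + t_crit * s)                            # roughly [9.1, 15.7]
# 90% interval for the average cost of seven buildings (uses s / sqrt(n)):
print(xbar - t_crit * s / np.sqrt(n), xbar + t_crit * s / np.sqrt(n))  # roughly [11.2, 13.6]
```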
Distribution of Proportions
Proportions are means of categorical data. Categorical data is usually non-numeric and
represents a state or condition rather than a value.
Example:
In a vote between George Bush and Al Gore, the data are categorical. E.g. “Bush, Gore,
Gore, Bush, Gore, Bush, Bush, Bush, Gore, etc.”
A proportion measures the frequency of a single category relative to all observations. For
example, if the data set includes 10 “Bush” votes and 12 “Gore” votes, then the category
“Gore” represents 12 / (10 + 12) = 55% of all observations.
A population proportion (usually denoted π) is calculated from the entire population of
data. A sample proportion (usually denoted p) is calculated from a sample of the data.
The properties of the sample proportion are:
Population mean = π
Sample standard deviation = √( p(1 − p) / N )
Distribution = normal (provided Np > 5 and N(1 − p) > 5)
Distribution of Proportions
Example:
There are 8.3 million registered voters in Florida. Within the first few hours after the
polls closed in the 2000 election, the count showed 50.5% of the vote going to George
Bush. This estimate was based on only 200,000 votes. Build a 99% confidence interval
for the population proportion of votes for Bush.
Measure = p = 0.505
N = 200,000
Standard deviation of the measure = √( 0.505 × (1 − 0.505) / 200,000 ) ≈ 0.0011
Distribution of Proportions
Example:
There are 8.3 million registered voters in Florida. Within the first few hours after the
polls closed in the 2000 election, the count showed 50.5% of the vote going to George
Bush. This estimate was based on only 200,000 votes. Build a 99% confidence interval
for the population proportion of votes for Bush.
Measure ± (critical value)(standard deviation of the measure)
Measure = p = 0.505
Standard deviation of the measure = √( 0.505(1 − 0.505) / 200,000 ) = 0.00112
Critical value (standard normal): half of 1% is 0.5% in each tail, so
Pr(Z > Critical value) = 0.50% → Critical value = 2.5758.

There is a 99% probability that the population proportion of votes for Bush is between
50.2% and 50.8%.
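A minimal sketch of this interval, assuming SciPy is available:

```python
import numpy as np
from scipy import stats

p, n = 0.505, 200_000
se = np.sqrt(p * (1 - p) / n)      # about 0.00112
z = stats.norm.ppf(0.995)          # 2.5758 (0.5% in each tail)
print(p - z * se, p + z * se)      # roughly [0.502, 0.508]
```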
Distribution of Proportions
Example:
Given that a sample of voters shows one candidate with a 1% lead (50.5% vs. 49.5%),
what is the minimal number of votes that can be cast such that a 99.99% confidence
interval for the candidate’s population proportion is greater than 50%?
Measure ± (critical value)(standard deviation of the measure)
Measure = p = 0.505
Standard deviation of the measure = √( 0.505(1 − 0.505) / N ) = √( 0.249975 / N )
Critical value (standard normal) = 3.8906
Left end of confidence interval = 0.505 − (3.8906)√( 0.249975 / N )
Setting the left end equal to 0.50:
0.505 − (3.8906)√( 0.249975 / N ) = 0.50  →  N = 151,353
For elections in which the winner wins by at least 1%, one can poll (approximately) 150,000
voters and get, with a margin of error of 0.01%, the same result as that obtained by polling all
voters. This margin of error implies 1 miscalled election out of every 10,000 elections.
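Solving the left-end equation for N directly; a minimal sketch, assuming SciPy is available:

```python
from scipy import stats

p = 0.505
z = stats.norm.ppf(1 - 0.0001 / 2)            # about 3.89 for 99.99% confidence
# Require p - z * sqrt(p(1 - p)/N) = 0.50, so N = z^2 p(1 - p) / (p - 0.50)^2.
N = z**2 * p * (1 - p) / (p - 0.50) ** 2
print(round(N))                               # close to the slide's 151,353
```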
Sampling Bias
Given these results, why were the political parties so concerned with counting “every
vote” in Florida?
Polling 150,000 people works only if the people are selected randomly.
In Florida, political parties were advocating recounts only for subsets of voters (i.e.
states and counties) that were predominantly aligned with one or the other party.
The argument in the Florida election ultimately revolved around attempts to introduce
and block sampling biases.
Sampling Bias
Sampling bias is a systematic tendency for samples to misrepresent the population from which they
are drawn.
A sample is not biased merely because it fails to represent the population; there is always
a measurable probability that a given sample will fail to represent the population. Rather,
the data selection process is biased if repeated samples consistently misrepresent the
population.
Types of sampling biases:

Selection bias: The researcher excludes atypical subsets of the population from the data.
  E.g.      Estimate the average rate of return on low P/B stocks.
  Problem:  Firms with low P/B fail at a higher rate than firms with high P/B. Failed
            firms do not appear in the data set.
  Result:   Sample mean return is greater than population mean return.

Non-response bias: Atypical subsets of subjects exclude themselves from the data.
  E.g.      Estimate the standard deviation of household incomes.
  Problem:  Individuals at the high and low extremes will be less likely to respond.
  Result:   Sample standard deviation is less than the population standard deviation.

Measurement bias: The measurement applied to the sample atypically approximates the
population.
  E.g.      Estimate average purchasing power by measuring income over time.
  Problem:  As prices rise, incomes rise, but purchasing power does not.
  Result:   Sample mean of income exceeds population mean of purchasing power.
Hypothesis Testing
Thus far, we have
1. Estimated the probability of finding single observations that are certain distances
   away from the population mean.
2. Estimated the probability of finding sample means that are certain distances away
   from the population mean.
3. Estimated left and right boundaries that contain the population mean at varying
   degrees of confidence.
We now want to test statements about the population mean.
Procedure for testing a hypothesis:
• State a null hypothesis concerning the population parameter. The null hypothesis is
  what we will assume is true.
• State an alternative hypothesis concerning the population parameter. The alternative
  hypothesis is what we will assume to be true if the null hypothesis is false.
• Calculate the probability of observing a sample that disagrees with the null at least
  as much as the sample you observed.
Hypothesis Testing
Example:
Suppose we want to test the hypothesis that Bush obtained more than 50% of the vote
in Florida.
1. Our null hypothesis is π ≥ 0.5.
2. Our alternative hypothesis is π < 0.5.
3. Based on a sample of 200,000 votes, we observed p = 0.505. Calculate the probability
   of observing p = 0.505 (or less) when, in fact, π ≥ 0.5.

H0: π ≥ 0.5
Ha: π < 0.5

Since we are assuming that π ≥ 0.5, or (in the most conservative case) π = 0.5, we are
also assuming that the standard deviation of p is √( (0.5)(1 − 0.5) / 200,000 ) = 0.001118.
We now ask the question: “Assuming that the null hypothesis is true, what is the probability of
observing a sample that disagrees with the null at least as much as the sample we observed?”
Hypothesis Testing
We now ask the question: “Assuming that the null hypothesis is true, what is the probability of
observing a sample that disagrees with the null at least as much as the sample we observed?”
[Figure: sampling distribution of the sample proportion, centered at 0.5.]
According to the null hypothesis, we assume that the center of the distribution is 0.5.
The sample proportion we found was 0.505.
The area to the right of 0.505 is the probability of finding a sample proportion of at
least 0.505 when, in fact, the population proportion is 0.5.
The area to the left of 0.505 is the probability of finding a sample proportion of at
most 0.505 when, in fact, the population proportion is 0.5.
Hypothesis Testing
We now ask the question: “Assuming that the null hypothesis is true, what is the probability of
observing a sample that disagrees with the null at least as much as the sample we observed?”
[Figure: the same distribution, with the alternative (left) tail highlighted.]
Because the setup of the distribution assumes that the population proportion is at least
0.5, we are more concerned with the alternative tail: the area to the left of 0.505.
The area of the alternative tail tells us the probability of observing a sample “as good
or worse” than the one we observed when, in fact, the null hypothesis is true.
Using the formula for the test statistic, we find that the area of the alternative tail
is 0.9996.
We say: “Assuming that Bush would gain at least 50% of the vote, there is a 99.96%
chance that a sample of 200,000 votes would show at most 50.5% for Bush.”
Hypothesis Testing
“Assuming that Bush would gain at least 50% of the vote, there is a 99.96% chance
that a sample of 200,000 votes would show at most 50.5% for Bush.”
Notice that this statement is not very enlightening. What it says (in effect) is: “If
you assume that Bush wins, then the sample results we see are reasonable.” This
sounds like a circular argument.
Example:
1. You buy a new house and, although you have seen no termites in the house, you
   assume that the house is in danger of termite infestation.
2. You spend $5,000 on a new treatment that is supposed to guarantee that termites
   will never infest your house.
3. Following the treatment, you see no termites.
4. You conclude that the treatment was worth the $5,000.
The problem with this line of reasoning is that your belief that the expensive
treatment works is based on the (possibly false) assumption that you had termites.
Hypothesis Testing
Example:
1. You buy a new house and, although you have seen no termites in the house, you
   assume that the house is in danger of termite infestation.
2. You spend $5,000 on a new treatment that is supposed to guarantee that termites
   will never infest your house.
3. Following the treatment, you see no termites.
4. You conclude that the treatment was worth the $5,000.
Following the termite treatment, two things can happen:
• You don’t see termites in the house.
• You do see termites in the house.
If you don’t see termites, you can conclude nothing. It could be the case that the
treatment works, or it could be the case that the treatment doesn’t work but you’ll
never know because you don’t have termites.
If you do see termites, you can conclude that the treatment doesn’t work.
Hypothesis Testing
Returning to the election example, finding a sample proportion of 0.505 does not tell
us that the population proportion is greater than 0.5 because we began the analysis
assuming that the population proportion was greater than 0.5.
However, if we found a sample proportion of (for example) 49.8%, this may tell us
something.
H0: π ≥ 0.5
Ha: π < 0.5
Assuming that (in the most conservative case) π = 0.5,
stdev(p) = √( (0.5)(1 − 0.5) / 200,000 ) = 0.001118.

Test statistic = (test value − mean) / standard deviation
               = (0.498 − 0.5) / 0.001118 = −1.7889

The area of the alternative tail is 3.7%.
We conclude:
→ If, in fact, the population proportion of votes for Bush is at least 50%, then there is
only a 3.7% chance of observing a sample proportion of, at most, 49.8%.
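A minimal sketch of this one-sided test, assuming SciPy is available:

```python
import numpy as np
from scipy import stats

p_obs, n = 0.498, 200_000
se0 = np.sqrt(0.5 * 0.5 / n)       # stdev of p under the null (pi = 0.5)
z = (p_obs - 0.5) / se0            # about -1.789
print(stats.norm.cdf(z))           # alternative-tail area, about 3.7%
```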
Hypothesis Testing
The area corresponding to the alternative hypothesis is called the “p-value” (“p”
stands for “probability”).
In words, the p-value is the probability of rejecting the null hypothesis when, in fact,
the null hypothesis is true.
For example, suppose that the sample of 200,000 voters had a sample proportion of
49.8% voting for Bush.
The null hypothesis is that the population proportion exceeds 0.5 – i.e. “Bush wins the
election.”
So, if Bush were to concede the election before the entire population of votes were
tallied (i.e. if Bush were to reject the null hypothesis), then there is a 3.7% chance
that he would be conceding when, in fact, the population of votes is in his favor.
Hypothesis Testing
In making decisions on the basis of samples, you can make either of two types of errors.
Type I Error
Reject the null hypothesis when, in fact, the null hypothesis is true.
Example: Conclude that the termite treatment does work when, in fact, it does not work.
Type II Error
Fail to reject the null hypothesis when, in fact, the null hypothesis is false.
Example: Conclude that the termite treatment does not work when, in fact, it does work.
Because all of our analyses begin with an assumption about the population, our p-values
will always refer to Type I errors. This does not mean that we are immune from Type II
errors. Rather, it means that the calculation of Type II errors is beyond the scope of this
course.
Hypothesis Testing
Returning to the EPA example, there are two ways the EPA analyst could construct
hypotheses.
H0: μ ≥ 20
Ha: μ < 20
Presumption: GM is in compliance unless the data indicate otherwise.
Implications of results:
  Reject the null: GM is not in compliance.
  Fail to reject the null: No conclusion.

H0: μ ≤ 20
Ha: μ > 20
Presumption: GM is not in compliance unless the data indicate otherwise.
Implications of results:
  Reject the null: GM is in compliance.
  Fail to reject the null: No conclusion.
Hypothesis Testing
H0: μ ≥ 20
Ha: μ < 20
Sample mean = 18.6
Standard deviation of the sample mean (standard error) = 0.508
Test statistic = (18.6 − 20) / 0.508 = −2.756

t Distribution:  Test statistic = −2.756,  Degrees of freedom = 9,
Pr(t > Test statistic) = 98.89%,  Pr(t < Test statistic) = 1.11%
Conclusion: If the fleet’s average mileage did exceed 20 mpg, then the probability of
finding a sample with (at most) an average mileage of 18.6 would be 1.1%.
Alternatively: The null hypothesis is that GM’s fleet meets or exceeds EPA requirements.
Based on the sample data, were the EPA to declare GM in violation of EPA requirements
(i.e. reject the null hypothesis), there would be a 1.1% chance that the EPA’s ruling
would be incorrect.
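A minimal sketch of this test from the summary numbers, assuming SciPy is available:

```python
from scipy import stats

xbar, se, df = 18.6, 0.508, 9
t_stat = (xbar - 20) / se          # about -2.756
print(stats.t.cdf(t_stat, df))     # about 1.1%: Pr of a sample mean this low under H0
```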
Hypothesis Testing
Two approaches to hypothesis testing
Procedure for hypothesis testing using significance level approach:
1. State the null and alternative hypotheses.
2. Picture the distribution and identify the null and alternative areas.
3. Using the significance level, identify the critical value(s) that separate the null
and alternative areas.
4. Calculate the test statistic.
5. Place the test statistic on the distribution. If it falls in the alternative area, reject
the null hypothesis. If it falls in the null area, fail to reject the null hypothesis.
Procedure for hypothesis testing using p-value approach:
1. State the null and alternative hypotheses.
2. Picture the distribution and identify the null and alternative areas.
3. Calculate the test statistic.
4. Find the area from the test statistic toward the alternative area(s). This area is
the p-value.
5. Interpretation: p-value is the probability of rejecting the null when, in fact, the
null is true.
Hypothesis Testing
Example (significance level approach):
Using the EPA data from analyst #1, test the hypothesis that the (population) average
mileage of GM’s car fleet exceeds 20 mpg. Test the hypothesis at the 5% significance
level.
Area in alternative tail = 5%
Critical value = -1.833
Test statistic = -1.949
Test statistic falls in the alternative tail
→ Reject the null hypothesis.
Hypothesis Testing
Example (p-value approach):
Using the EPA data from analyst #1, test the hypothesis that the (population) average
mileage of GM’s car fleet exceeds 20 mpg.
Test statistic = -1.949
Area from test statistic toward alternative
area = 4.16%
Interpretation: If we were to reject the
null, there would be a 4.16% chance that
we would be incorrect.
Hypothesis Testing
Example:
Test the hypothesis that the average real rate of return on 12 month municipal bonds
exceeds 3% at a 5% significance level.
Hypotheses:
H0: μ ≥ 3%
Ha: μ < 3%  (Ha is to the left.)

Sample data:
N = 50
x̄ = 4.2%
s = 9.9%
Standard error = 9.9% / √50 = 1.4%

Test statistic = (x̄ − μ) / standard error = (4.2% − 3%) / 1.4% = 0.8571

Critical value: the value that causes the alternative tail area to equal the significance
level. From the t49 distribution, the critical value is −1.677.

The test statistic (0.8571) falls in the null area → fail to reject H0.
Hypothesis Testing
Example:
A paint manufacturer advertises that, when applied correctly, its paint will resist peeling
for 5 years.
A consumer watchdog group has filed a class action suit against the manufacturer for
false advertisement. Based on the following data (numbers reflect years prior to peeling)
test the manufacturer’s claim at the 1% level of significance.
Sample data:
4.9, 5.2, 3.7, 5.3, 4.8, 4.5, 5.1, 5.8, 4.1, 4.7
N = 10
x̄ = 4.81
s = 0.6064
Standard error = 0.6064 / √10 = 0.1917

Presumption of innocence:
H0: μ ≥ 5
Ha: μ < 5  (Ha is on the left.)

Critical value (t9, 1%) = −2.8214
Test statistic = (x̄ − μ) / standard error = (4.81 − 5) / 0.1917 = −0.991

The test statistic falls in the null tail → fail to reject the null hypothesis.
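A minimal sketch of the paint test, assuming SciPy is available:

```python
import numpy as np
from scipy import stats

years = np.array([4.9, 5.2, 3.7, 5.3, 4.8, 4.5, 5.1, 5.8, 4.1, 4.7])
n = len(years)
t_stat = (years.mean() - 5) / (years.std(ddof=1) / np.sqrt(n))  # about -0.991
print(stats.t.ppf(0.01, df=n - 1))    # critical value, about -2.8214
print(stats.t.cdf(t_stat, df=n - 1))  # alternative-tail area, about 0.174 (next slide)
```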
Hypothesis Testing
Example:
A paint manufacturer advertises that, when applied correctly, its paint will resist peeling
for 5 years.
A consumer watchdog group has filed a class action suit against the manufacturer for
false advertisement. Based on the following data (numbers reflect years prior to peeling)
calculate the p-value for the manufacturer’s claim.
Sample data:
4.9, 5.2, 3.7, 5.3, 4.8, 4.5, 5.1, 5.8, 4.1, 4.7
Test statistic = (x̄ − μ) / standard error = (4.81 − 5) / 0.1917 = −0.991

Using the p-value approach, we find the area of the alternative tail of the t9
distribution, starting at the test statistic: area = 0.174.
Conclusion: Assuming that the null hypothesis is true, there is a 17.4% chance that we
would find a sample mean (based on 10 observations) of 4.81 or less.
Alternatively: We can reject the null hypothesis, but there would be a 17.4% chance
that we would be wrong in doing so.
Distribution of a Difference in Sample Means
Frequently, we are interested in comparing the means of two populations.
Statistically, this is a more complicated problem than simply testing a single sample
mean.
In the means tests we have seen thus far, we have always compared a sample mean to
some fixed number.
Example: In testing the hypothesis that the mean return on bonds exceeds 3%, we
compared a random variable (the sample mean) to a fixed number (3%).
When we perform a test on a single sample mean, we are comparing a single random
variable to a fixed number.
When we perform a test comparing two sample means, we are comparing two random
variables to each other.
Distribution of a Difference in Sample Means
Let x̄a − x̄b be a difference in sample means.
The properties of the difference in sample means are:

Population mean = μa − μb

Sample standard deviation:  s(x̄a − x̄b) = √( s²a/Na + s²b/Nb )

Distribution = t with degrees of freedom
df = ( s²a/Na + s²b/Nb )² / [ (s²a/Na)²/(Na − 1) + (s²b/Nb)²/(Nb − 1) ]
Difference in Means Test
Example:
Test the hypothesis (at a 1% significance level) that the average rate of return on 12
month Aaa bonds is less than the average rate of return on 12 month municipal bonds.
We draw two samples from two different populations (Aaa bonds and municipal bonds).
We now have two random variables (the sample means from each population).
Our hypotheses are:
H0: μAaa − μmuni ≤ 0%
Ha: μAaa − μmuni > 0%
We obtain the following sample data:
NAaa = 43, Nmuni = 50
x̄Aaa = 5.1%, x̄muni = 4.2%
sAaa = 1.4%, smuni = 1.1%

s(x̄Aaa − x̄muni) = √( s²Aaa/NAaa + s²muni/Nmuni )
               = √( (0.014)²/43 + (0.011)²/50 ) = 0.003
Difference in Means Test
Example:
Test the hypothesis (at a 1% significance level) that the average rate of return on 12
month Aaa bonds is less than the average rate of return on 12 month municipal bonds.
Our hypotheses are:
H0: μAaa − μmuni ≤ 0%
Ha: μAaa − μmuni > 0%
We obtain the following sample data:
NAaa = 43, Nmuni = 50
x̄Aaa = 5.1%, x̄muni = 4.2%
sAaa = 1.4%, smuni = 1.1%
s(x̄Aaa − x̄muni) = 0.003

Test statistic = (0.051 − 0.042 − 0) / 0.003 = 3.407

The degrees of freedom are:
df = ( 0.014²/43 + 0.011²/50 )² / [ (0.014²/43)²/(43 − 1) + (0.011²/50)²/(50 − 1) ] ≈ 79
Difference in Means Test
Example:
Test the hypothesis (at a 1% significance level) that the average rate of return on 12
month Aaa bonds is less than the average rate of return on 12 month municipal bonds.
Our hypotheses are:
H0: μAaa − μmuni ≤ 0%
Ha: μAaa − μmuni > 0%

t Distribution:  Degrees of freedom = 79,  Pr(t > Critical value) = 1.00%,
Critical value = 2.3745

Test statistic = (0.051 − 0.042 − 0) / 0.003 = 3.407
The test statistic falls in the alternative tail → reject the null hypothesis.
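A minimal sketch of this unequal-variance (Welch-style) test from the summary statistics, assuming SciPy is available:

```python
import numpy as np
from scipy import stats

na, xa, sa = 43, 0.051, 0.014      # Aaa bonds
nb, xb, sb = 50, 0.042, 0.011      # municipal bonds

va, vb = sa**2 / na, sb**2 / nb
se = np.sqrt(va + vb)                                        # about 0.0026
t_stat = (xa - xb) / se                                      # about 3.41
df = (va + vb) ** 2 / (va**2 / (na - 1) + vb**2 / (nb - 1))  # about 79
print(t_stat, df, stats.t.sf(t_stat, df))                    # p-value about 0.05%
```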
Difference in Means Test
Example:
Find the p-value for the hypothesis that the average rate of return on 12 month Aaa
bonds is less than the average rate of return on 12 month municipal bonds.
Our hypotheses are:
H0: μAaa − μmuni ≤ 0%
Ha: μAaa − μmuni > 0%
Test statistic = (0.051 − 0.042 − 0) / 0.003 = 3.407

t Distribution:  Test statistic = 3.407,  Degrees of freedom = 79,
Pr(t > Test statistic) = 0.05%,  Pr(t < Test statistic) = 99.95%

Probability of finding a sample that disagrees with the null by at least as much as the
sample we observed when, in fact, the null hypothesis is true = 0.05%.
We can reject the null hypothesis, but there is a 0.05% chance that we would be wrong
in doing so.
Difference in Means Test
Example:
Find the p-value for the hypothesis that the average rate of return on 12 month Aaa
bonds is less than the average rate of return on 12 month municipal bonds.
NAaa = 43, Nmuni = 50
x̄Aaa = 5.1%, x̄muni = 4.2%
sAaa = 1.4%, smuni = 1.1%

Difference in Means Test (spreadsheet)
X1 bar = 0.051    Sx1 = 0.014    N1 = 43
X2 bar = 0.042    Sx2 = 0.011    N2 = 50
Stdev(X1 bar − X2 bar) = 0.003
Test statistic (distributed t) = 3.407
df = 79.28
Difference in Means Test
Using Data Set #1, test the hypothesis (at a 5% level of significance) that the average
cost of unleaded gas is more expensive today than in the past.
Suggestion: The data set ranges from January 1976 through April 2003. Split the data set
into three parts (1/76 through 12/84, 1/85 through 12/93, and 1/94 through 4/03) and
test for a difference in population means between the first and third parts.
1. State the hypotheses
H0: μ1 ≤ μ3  (H0: μ1 − μ3 ≤ 0)
Ha: μ1 > μ3  (Ha: μ1 − μ3 > 0)

2. Calculate sample statistics and test statistic
N1 = 108, N3 = 112
x̄1 = 80.58, x̄3 = 106.92
s1 = 24.49, s3 = 15.37

Difference in Means Test (spreadsheet)
X1 bar = 80.580    Sx1 = 24.490    N1 = 108
X2 bar = 106.920   Sx2 = 15.370    N2 = 112
Stdev(X1 bar − X2 bar) = 2.768
Test statistic (distributed t) = −9.515
df = 178.85

3. Find the appropriate critical value
t Distribution:  Degrees of freedom = 178,  Pr(t > Critical value) = 5.00%,
Critical value = 1.6535
Difference in Means Test
Using Data Set #1, test the hypothesis (at a 5% level of significance) that the average
cost of unleaded gas is more expensive today than in the past.
Suggestion: The data set ranges from January 1976 through April 2003. Split the data set
into three parts (1/76 through 12/84, 1/85 through 12/93, and 1/94 through 4/03) and
test for a difference in population means between the first and third parts.
4. Compare test statistic to critical value
H0: μ1 − μ3 ≤ 0
Ha: μ1 − μ3 > 0
The test statistic (−9.515) falls in the null area → fail to reject the null hypothesis.
Difference in Means Test
Using Data Set #1, test the hypothesis (at a 5% level of significance) that the average
cost of unleaded gas is more expensive today than in the past.
Note: The question asks if the average cost of unleaded gas is more expensive today
than in the past. One way to interpret this is in terms of price (which we have done).
Another way to interpret this is in terms of purchasing power. If the researcher intended
this latter interpretation, then we may have introduced measurement bias: the price of
gas in dollars may not reflect the cost of gas in purchasing power.
Using Data Set #2, test the hypothesis (at a 5% level of significance) that the average
cost of unleaded gas (in terms of purchasing power) is more expensive today than in the
past.
Suggestion: Again, split the data set in three parts and compare the sample means of
parts 1 and 3. This data set includes average hourly earnings of private sector
employees. Use the ratio of the price of gas to average hourly earnings as a
measurement of the purchasing power cost of gas.
Note: The cost of gas (in terms of purchasing power) is the price of gas divided by the
wage rate: ($/gal) / ($/hr) = hr/gal = how many hours a person must work to be able
to afford 1 gallon of gas.
Difference in Means Test
Using Data Set #2, test the hypothesis (at a 5% level of significance) that the average
cost of unleaded gas (in terms of purchasing power) is more expensive today than in the
past.
1. State the hypotheses
H0: μ1 ≤ μ3  (H0: μ1 − μ3 ≤ 0)
Ha: μ1 > μ3  (Ha: μ1 − μ3 > 0)

2. Calculate sample statistics and test statistic
N1 = 108, N3 = 112
x̄1 = 0.116, x̄3 = 0.082
s1 = 0.021, s3 = 0.009

Difference in Means Test (spreadsheet)
X1 bar = 0.116    Sx1 = 0.021    N1 = 108
X2 bar = 0.082    Sx2 = 0.009    N2 = 112
Stdev(X1 bar − X2 bar) = 0.002
Test statistic (distributed t) = 15.508
df = 143.91

3. Find the appropriate critical value
t Distribution:  Degrees of freedom = 144,  Pr(t > Critical value) = 5.00%,
Critical value = 1.6555
Difference in Means Test
Using Data Set #2, test the hypothesis (at a 5% level of significance) that the average
cost of unleaded gas (in terms of purchasing power) is more expensive today than in the
past.
4. Compare test statistic to critical value
H0: μ1 − μ3 ≤ 0
Ha: μ1 − μ3 > 0
The test statistic (15.508) falls in the alternative tail → reject the null hypothesis.
Difference in Means Test
You work for a printing firm. In the past, the firm employed people to maintain the high
speed copier. Six months ago, in an effort to reduce costs, management laid off the
maintenance crew and contracted out service of the machine. You are looking at
maintenance logs for the copier and note the following times between copier
breakdowns. Test the hypothesis (at the 5% level) that the contracted maintenance does
as good of a job as the in-house maintenance did.
In-House Maintenance
26, 27, 22, 13, 8, 10, 28, 7, 16, 23, 26, 25
Contracted Maintenance
17, 13, 21, 17, 8, 6, 27, 6, 2, 20, 8, 9
The data represent time between breakdowns, so we can consider using the lognormal
distribution. Note: This is an analysis of sample means. The Central Limit Theorem tells
us that sample means are (asymptotically) t-distributed regardless of the distribution of
the underlying data. So, while taking logs will improve accuracy, it is not necessary
(and becomes less necessary the larger the data set).
Converting the data to logs:
ln(In-House Maintenance)
3.26, 3.30, 3.09, 2.56, 2.08, 2.30, 3.33, 1.95, 2.77, 3.14, 3.26, 3.22
ln(Contracted Maintenance)
2.83, 2.56, 3.04, 2.83, 2.08, 1.79, 3.30, 1.79, 0.69, 3.00, 2.08, 2.20
Difference in Means Test
You work for a printing firm. In the past, the firm employed people to maintain the high
speed copier. Six months ago, in an effort to reduce costs, management laid off the
maintenance crew and contracted out service of the machine. You are looking at
maintenance logs for the copier and note the following times between copier
breakdowns. Test the hypothesis (at the 5% level) that the contracted maintenance does
as good of a job as the in-house maintenance did.
ln(In-House Maintenance)
3.26, 3.30, 3.09, 2.56, 2.08, 2.30, 3.33, 1.95, 2.77, 3.14, 3.26, 3.22
ln(Contracted Maintenance)
2.83, 2.56, 3.04, 2.83, 2.08, 1.79, 3.30, 1.79, 0.69, 3.00, 2.08, 2.20

1. State the hypotheses
H0: μin-house = μcontracted
Ha: μin-house ≠ μcontracted

2. Calculate sample statistics and test statistic
Nln(in-house) = 12, Nln(contracted) = 12
x̄ln(in-house) = 2.85, x̄ln(contracted) = 2.35
sln(in-house) = 0.508, sln(contracted) = 0.729

Difference in Means Test (spreadsheet)
X1 bar = 2.855    Sx1 = 0.508    N1 = 12
X2 bar = 2.350    Sx2 = 0.729    N2 = 12
Stdev(X1 bar − X2 bar) = 0.256
Test statistic (distributed t) = 1.967
df = 19.64
Difference in Means Test
You work for a printing firm. In the past, the firm employed people to maintain the high
speed copier. Six months ago, in an effort to reduce costs, management laid off the
maintenance crew and contracted out service of the machine. You are looking at
maintenance logs for the copier and note the following times between copier
breakdowns. Test the hypothesis (at the 5% level) that the contracted maintenance does
as good of a job as the in-house maintenance did.
H0: μin-house = μcontracted  (μin-house − μcontracted = 0)
Ha: μin-house ≠ μcontracted  (μin-house − μcontracted ≠ 0)

Nln(in-house) = 12, Nln(contracted) = 12
x̄ln(in-house) = 2.85, x̄ln(contracted) = 2.35
sln(in-house) = 0.508, sln(contracted) = 0.729
Test statistic = 1.967, df = 19

3. Find the appropriate critical values
This is a two-tailed test, so 2.5% of the alternative area lies in each tail.
t Distribution (df = 19):  Pr(t > Critical value) = 2.50%  →  Critical values = ±2.0930
Difference in Means Test
You work for a printing firm. In the past, the firm employed people to maintain the high
speed copier. Six months ago, in an effort to reduce costs, management laid off the
maintenance crew and contracted out service of the machine. You are looking at
maintenance logs for the copier and note the following times between copier
breakdowns. Test the hypothesis (at the 5% level) that the contracted maintenance does
as good of a job as the in-house maintenance did.
H0: μin-house − μcontracted = 0
Ha: μin-house − μcontracted ≠ 0
Nln(in-house) = 12, Nln(contracted) = 12
x̄ln(in-house) = 2.85, x̄ln(contracted) = 2.35
sln(in-house) = 0.508, sln(contracted) = 0.729
Test statistic = 1.967, df = 19

4. Compare test statistic to critical values
The test statistic (1.967) falls between the critical values (±2.0930), in the null area
→ fail to reject the null hypothesis.
Difference in Means Test
You work for a printing firm. In the past, the firm employed people to maintain the high
speed copier. Six months ago, in an effort to reduce costs, management laid off the
maintenance crew and contracted out service of the machine. You are looking at
maintenance logs for the copier and note the following times between copier
breakdowns. Test the hypothesis (at the 10% level) that the contracted maintenance
does as good of a job as the in-house maintenance did.
H0: μin-house − μcontracted = 0
Ha: μin-house − μcontracted ≠ 0
Nln(in-house) = 12, Nln(contracted) = 12
x̄ln(in-house) = 2.85, x̄ln(contracted) = 2.35
sln(in-house) = 0.508, sln(contracted) = 0.729
Test statistic = 1.967, df = 19

At the 10% significance level, 5% of the alternative area lies in each tail.
t Distribution (df = 19):  Pr(t > Critical value) = 5.00%  →  Critical values = ±1.7291
The test statistic (1.967) falls in the alternative area → reject the null hypothesis.
Difference in Means Test
You work for a printing firm. In the past, the firm employed people to maintain the high
speed copier. Six months ago, in an effort to reduce costs, management laid off the
maintenance crew and contracted out service of the machine. You are looking at
maintenance logs for the copier and note the following times between copier
breakdowns. Test the hypothesis that the contracted maintenance does as good of a job
as the in-house maintenance did.
Conclusion:
1. We reject the hypothesis that the contracted maintenance does as good a job as the
   in-house maintenance at a 10% level of significance.
2. We fail to reject the hypothesis that the contracted maintenance does as good a job
   as the in-house maintenance at a 5% level of significance.
3. p-value = (3.20%)(2) = 6.4% = probability of rejecting the null when the null is true.

t Distribution:  Test statistic = 1.967,  Degrees of freedom = 19,
Pr(t > Test statistic) = 3.20%,  Pr(t < Test statistic) = 96.80%
We multiply the tail area by two because this is a two-tailed test: an equal portion of
the alternative area exists on the opposite side of the distribution.
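The whole copier analysis collapses to a few lines with SciPy's unequal-variance (Welch) test on the logged data; a minimal sketch (equal_var=False gives the unequal-variance test used on these slides):

```python
import numpy as np
from scipy import stats

in_house = np.log([26, 27, 22, 13, 8, 10, 28, 7, 16, 23, 26, 25])
contracted = np.log([17, 13, 21, 17, 8, 6, 27, 6, 2, 20, 8, 9])

t_stat, p_val = stats.ttest_ind(in_house, contracted, equal_var=False)
print(t_stat, p_val)   # t about 1.97; two-tailed p-value about 6%, as on the slide
```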
Difference in Means Test
A plaintiff claims that an on-the-job injury has reduced his ability to earn tips. He is suing
for lost future income. His tips for twelve weeks before and after the injury are shown
below. Test the hypothesis that his injury reduced his earning power.
Before injury: 200, 210, 250, 180, 220, 200, 210, 230, 240, 190, 220, 250
After injury:  200, 230, 190, 180, 200, 190, 210, 200, 220, 200, 180, 220

Presumption of innocence:
H0: μbefore − μafter ≤ 0
Ha: μbefore − μafter > 0

Nbefore = 12, Nafter = 12
x̄before = 216.67, x̄after = 201.67
sbefore = 22.697, safter = 15.859
Test statistic = 1.877, df = 19

t Distribution:  Test statistic = 1.877,  Degrees of freedom = 19,
Pr(t > Test statistic) = 3.80%,  Pr(t < Test statistic) = 96.20%
→ Rejecting the null (concluding that the injury reduced his earning power) carries a
3.8% chance of error.
Distribution of a Difference in Proportions
A difference in proportions test examines samples from two populations in an attempt to
compare the two population proportions.
Let pa − pb be a difference in sample proportions. Its properties:

Population mean = πa − πb

Sample standard deviation:  s(pa − pb) = √( pa(1 − pa)/Na + pb(1 − pb)/Nb )

Distribution = standard normal, provided
Na pa > 5, Na(1 − pa) > 5, Nb pb > 5, and Nb(1 − pb) > 5
Difference in Proportions Test
An ABC News poll (summer 2003) of 551 women and 478 men shows that 31% of men
and 36% of women would rather see Hillary Clinton as President in 2004 than George
Bush.
Test the hypothesis that the two proportions are equal.
H0: πmen − πwomen = 0
Ha: πmen − πwomen ≠ 0

Nmen = 478, Nwomen = 551
pmen = 0.31, pwomen = 0.36

s(pmen − pwomen) = √( (0.31)(1 − 0.31)/478 + (0.36)(1 − 0.36)/551 ) = 0.029

Nmen pmen = 148.2 > 5
Nmen (1 − pmen) = 329.8 > 5
Nwomen pwomen = 198.4 > 5
Nwomen (1 − pwomen) = 352.6 > 5
→ The difference in sample proportions is normally distributed.
Difference in Proportions Test
An ABC News poll (summer 2003) of 551 women and 478 men shows that 31% of men
and 36% of women would rather see Hillary Clinton as President in 2004 than George
Bush.
H0: πmen − πwomen = 0
Ha: πmen − πwomen ≠ 0

Difference in Proportions Test (spreadsheet)
p1 = 0.310    N1 = 478
p2 = 0.360    N2 = 551
Stdev(p1 − p2) = 0.029
Test statistic (distributed standard normal) = −1.699

Standard Normal Distribution (Z):  Test statistic = −1.699,
Pr(Z > Test statistic) = 95.53%,  Pr(Z < Test statistic) = 4.47%

p-value = (4.47%)(2) = 8.94%
→ Probability of rejecting the null hypothesis when the null is true = 9%.
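A minimal sketch of the difference-in-proportions test, assuming SciPy is available:

```python
import numpy as np
from scipy import stats

p1, n1 = 0.31, 478     # men
p2, n2 = 0.36, 551     # women

se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # about 0.029
z = (p1 - p2) / se                                      # about -1.70
print(2 * stats.norm.cdf(-abs(z)))                      # two-tailed p-value, about 8.9%
```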
Finite Population Correction Factor
So far, we have assumed that the population of data is infinite. For example, in the case
of bond yield data, the population of data representing the return on IBM bonds is all the
returns that ever were, ever will be, or ever could have been.
There are some instances in which the population data is not only finite, but small in
comparison to the sample size.
In these instances, the sample data reflects more information than normal because it
represents, not a sample from an infinitely sized population, but a significant portion of
the entire population.
Finite Population Correction Factor
For example, suppose we want to construct a 95% confidence interval for the average
price of retail gas in Pittsburgh. There are 500 gas stations and we have the following
sample data:
$1.05, $1.12, $1.15, $1.17, $1.08, $0.99, $1.15, $1.22, $1.14, $1.17, $1.05, $1.10.
The mean for the sample is $1.12
The standard deviation is $0.06
The question is about the mean of the price of gas. According to the Central Limit
Theorem, sample means are t-distributed regardless of the distribution of the underlying
data, so we can skip the lognormal transformation. The critical value for a 95%
confidence interval on a t11 distribution is 2.201.
The 95% confidence interval is:
$1.12 ± (2.201)($0.06 / √12) = [$1.08, $1.16]
Finite Population Correction Factor
For example, suppose we want to construct a 95% confidence interval for the average
price of retail gas in Pittsburgh. There are 500 gas stations and we have the following
sample data:
$1.05, $1.12, $1.15, $1.17, $1.08, $0.99, $1.15, $1.22, $1.14, $1.17, $1.05, $1.10.
$1.12 ± (2.201)($0.06 / √12) = [$1.08, $1.16]
Now, suppose that we have the same sample, but that there are only 25 gas stations in
Pittsburgh. The 12 observations in our sample now constitute a large portion of the total
population. As such, the information we obtain from the sample should more clearly
reflect the population than it did when there were 500 gas stations in the population.
To account for this additional information, we adjust the standard deviation of the mean
by the finite population correction factor. The fpcf reduces the size of the standard
deviation of the mean to reflect the fact that the sample represents a large portion of the
total population.
Finite Population Correction Factor
For example, suppose we want to construct a 95% confidence interval for the average
price of retail gas in Pittsburgh. There are 500 gas stations and we have the following
sample data:
$1.05, $1.12, $1.15, $1.17, $1.08, $0.99, $1.15, $1.22, $1.14, $1.17, $1.05, $1.10.
$1.12 ± (2.201)($0.06 / √12) = [$1.08, $1.16]
Correcting the standard deviation of the sample mean by the finite population correction
factor, we have:
N n 
Corrected s x  s x 

 N 1 
N  Population size
n  Sample size
Finite Population Correction Factor
For example, suppose we want to construct a 95% confidence interval for the average
price of retail gas in Pittsburgh. There are 25 gas stations and we have the following
sample data:
$1.05, $1.12, $1.15, $1.17, $1.08, $0.99, $1.15, $1.22, $1.14, $1.17, $1.05, $1.10.
$1.12 ± (2.201) × √( (25 − 12) / (25 − 1) ) × ($0.06 / √12) = [$1.10, $1.14]
Notes on the finite population correction factor:
1. The correction does not apply to standard deviations of observations. The fpcf only
   applies to standard deviations covered by the central limit theorem (including
   standard deviations of means, of proportions, of differences in means, and of
   differences in proportions).
2. The correction becomes necessary only when the sample size exceeds 5% of the
   population size.
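A minimal sketch of the corrected interval, assuming SciPy is available (the slide's rounded inputs give [$1.10, $1.14]; the raw data give nearly the same range):

```python
import numpy as np
from scipy import stats

price = np.array([1.05, 1.12, 1.15, 1.17, 1.08, 0.99,
                  1.15, 1.22, 1.14, 1.17, 1.05, 1.10])
n, N = len(price), 25
se = price.std(ddof=1) / np.sqrt(n)       # ordinary standard error
fpcf = np.sqrt((N - n) / (N - 1))         # finite population correction factor
half = stats.t.ppf(0.975, df=n - 1) * se * fpcf
print(price.mean() - half, price.mean() + half)
```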
Distribution of Sample Variances
The analyses we have seen thus far all involve single observations or sample means.
Often, we will also want to conduct tests on variances.
Example:
Two paint companies both claim that their paints will resist peeling for an average of 10
years. You collect relevant durability data on both brands of paint.
Brand A
10, 12, 10, 9, 10, 11, 8, 12, 9, 9
Brand B
12, 6, 6, 1, 6, 17, 5, 17, 17, 13
Both samples have means of 10, but the sample from brand A exhibits a standard
deviation of 1.3 compared to 5.9 for brand B.
→ While both brands appear to have the same average performance, brand A has more
uniform product quality (i.e. lower variance).
Distribution of Sample Variances
Let s be a sample standard deviation.
The properties of a sample standard deviation:
Population standard deviation = σ
(N − 1)s² / σ² is distributed χ² with N − 1 degrees of freedom (χ²N−1).
Variance Test
Example:
A tire manufacturer wants to keep the standard deviation of useful miles below 20,000. A
sample of 12 tires has a standard deviation of 18,000 miles. At a 5% level of significance,
test the hypothesis that the production process does not require adjustment.
H0: σ ≤ 20,000
Ha: σ > 20,000

Test statistic = (12 − 1)(18,000²) / 20,000² = 8.91, distributed χ²_{11}

Chi-Square Distribution (χ²)
Degrees of freedom: 11
Critical value (Pr(χ² > critical value) = 5.00%): 19.675

Test statistic falls in null area → fail to reject null hypothesis
169
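Here is a minimal Python sketch of the same chi-square variance test (scipy supplies the critical value; the variable names are ours):

```python
from scipy.stats import chi2

n, s, sigma0 = 12, 18_000, 20_000
test_stat = (n - 1) * s**2 / sigma0**2      # (12 - 1)(18,000^2) / 20,000^2 = 8.91

# Upper-tail test (Ha: sigma > 20,000) at the 5% significance level
critical = chi2.ppf(0.95, df=n - 1)         # 19.675
print(test_stat, critical)
print(test_stat > critical)                 # False -> fail to reject the null
```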
Variance Test
Example:
A tire manufacturer wants to keep the standard deviation of useful miles below 20,000. A
sample of 12 tires has a standard deviation of 18,000 miles. At a 5% level of significance,
test the hypothesis that the production process does require adjustment.
H0: σ ≥ 20,000
Ha: σ < 20,000

Test statistic = (12 − 1)(18,000²) / 20,000² = 8.91, distributed χ²_{11}

Chi-Square Distribution (χ²)
Degrees of freedom: 11
Critical value (Pr(χ² > critical value) = 95.00%): 4.575

Test statistic falls in null area → fail to reject null hypothesis
170
Variance Test
Example:
A tire manufacturer wants to keep the standard deviation of useful miles below 20,000. A
sample of 12 tires has a standard deviation of 18,000 miles. At a 5% level of significance,
test the hypothesis that the production process does require adjustment.
H0: σ ≤ 20,000          H0: σ ≥ 20,000
Ha: σ > 20,000          Ha: σ < 20,000

We have tested both sets of hypotheses and, in each case, failed to reject the null hypothesis.

Isn't this contradictory because the two nulls are opposites?

No.

Remember: failing to reject the null (technically) leaves us with no conclusion. Therefore, what happened is that we ran two tests and neither resulted in a conclusion.
© Copyright 2003. Do not distribute or copy without permission.
171
Variance Test
Example:
A tire manufacturer wants to keep the standard deviation of useful miles below 20,000. A
sample of 12 tires has a standard deviation of 18,000 miles. What is the p-value for the
hypothesis that the production process does not require adjustment?
H0: σ ≤ 20,000
Ha: σ > 20,000

Test statistic = (12 − 1)(18,000²) / 20,000² = 8.91, distributed χ²_{11}

Chi-Square Distribution (χ²)
Test statistic: 8.910
Degrees of freedom: 11
Pr(χ² > test statistic) = 63.02%
Pr(χ² < test statistic) = 36.98%

The p-value is the area from the test statistic toward the alternative area: here, p-value = Pr(χ² > 8.91) = 63.02%. The p-value is "the probability of erroneously rejecting the null hypothesis."
© Copyright 2003. Do not distribute or copy without permission.
172
Variance Test
Example:
A tire manufacturer wants to keep the standard deviation of useful miles below 20,000. A
sample of 12 tires has a standard deviation of 18,000 miles. What is the p-value for the
hypothesis that the production process does require adjustment?
H0: σ ≥ 20,000
Ha: σ < 20,000

Test statistic = (12 − 1)(18,000²) / 20,000² = 8.91, distributed χ²_{11}

Chi-Square Distribution (χ²)
Test statistic: 8.910
Degrees of freedom: 11
Pr(χ² > test statistic) = 63.02%
Pr(χ² < test statistic) = 36.98%

The p-value is the area from the test statistic toward the alternative area: here, p-value = Pr(χ² < 8.91) = 36.98%. The p-value is "the probability of erroneously rejecting the null hypothesis."
© Copyright 2003. Do not distribute or copy without permission.
173
Variance Test
Example:
A tire manufacturer wants to keep the standard deviation of useful miles below 20,000. A
sample of 12 tires has a standard deviation of 18,000 miles. The production process is faulty if the population standard deviation exceeds 20,000; the production process is OK if the population standard deviation is less than 20,000.

H0: σ ≤ 20,000          H0: σ ≥ 20,000
Ha: σ > 20,000          Ha: σ < 20,000

There is a 63% chance that we would be wrong in believing that the production process requires adjustment. There is a 37% chance that we would be wrong in believing that the production process is OK.

Under most circumstances, we only regard probabilities below 5% as "unusual." Therefore, the sample data does not clearly refute either null hypothesis.
→ The data tell us nothing.
© Copyright 2003. Do not distribute or copy without permission.
174
Variance Test
Example:
A tire manufacturer wants to keep the standard deviation of useful miles below 20,000. A
sample of 12 tires has a standard deviation of 18,000 miles. At a 5% level of significance,
test the hypothesis that the population standard deviation equals 20,000.
H0: σ = 20,000
Ha: σ ≠ 20,000

Test statistic = (12 − 1)(18,000²) / 20,000² = 8.91, distributed χ²_{11}

Chi-Square Distribution (χ²)
Degrees of freedom: 11
Lower critical value (Pr(χ² > critical value) = 97.50%): 3.816
Upper critical value (Pr(χ² > critical value) = 2.50%): 21.920

Test statistic falls in null area → Fail to reject the null hypothesis.
© Copyright 2003. Do not distribute or copy without permission.
175
Variance Test
Example:
Inspectors check chlorine levels in water at a processing facility several times each day.
The city has two goals: (1) to maintain an average chlorine level of 3 ppm, and (2) to
maintain a standard deviation of no more than 0.4 ppm (too little chlorine and people die
of disease; too much chlorine and people die of poisoning). Over a two day period,
inspectors take the following readings. Test the hypothesis that the water is adequately
treated at the 1% significance level.
Chlorine samples (ppm)
3.7, 2.5, 3.8, 3.4, 2.9, 2.6, 2.4, 3.3, 4.3, 3.4
Note: Although the data is non-negative, for the analysis of the sample mean, it is not
necessary to perform the log-normal transformation. According to the Central
Limit Theorem, sample means are t-distributed regardless of the distribution of
the underlying data. Having said this, performing the log-transformation will not
hurt and may improve the accuracy of the results somewhat.
© Copyright 2003. Do not distribute or copy without permission.
176
Variance Test
Example:
Inspectors check chlorine levels in water at a processing facility several times each day.
The city has two goals: (1) to maintain an average chlorine level of 3 ppm, and (2) to
maintain a standard deviation of no more than 0.4 ppm (too little chlorine and people die
of disease; too much chlorine and people die of poisoning). Over a two day period,
inspectors take the following readings. Test the hypothesis that the water is adequately
treated at the 1% significance level.
Chlorine samples (ppm)
3.7, 2.5, 3.8, 3.4, 2.9, 2.6, 2.4, 3.3, 4.3, 3.4
H0: μ = 3
Ha: μ ≠ 3

Sample mean of logs = 1.156
Sample stdev of logs = 0.195

Test statistic = (1.156 − ln(3)) / (0.195 / √10) = 0.923
© Copyright 2003. Do not distribute or copy without permission.
Test statistic falls in null area → Fail to reject null hypothesis.
177
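A minimal Python sketch of the mean test on the logged readings (names ours; only the standard library is used):

```python
from math import log, sqrt
from statistics import mean, stdev

readings = [3.7, 2.5, 3.8, 3.4, 2.9, 2.6, 2.4, 3.3, 4.3, 3.4]
logs = [log(x) for x in readings]
n = len(logs)

# t-type test statistic for H0: mu = 3, computed on the log scale
test_stat = (mean(logs) - log(3)) / (stdev(logs) / sqrt(n))
print(mean(logs), stdev(logs), test_stat)   # ~1.156, ~0.195, ~0.92
```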
Variance Test
Example:
Inspectors check chlorine levels in water at a processing facility several times each day.
The city has two goals: (1) to maintain an average chlorine level of 3 ppm, and (2) to
maintain a standard deviation of no more than 0.4 ppm (too little chlorine and people die
of disease; too much chlorine and people die of poisoning). Over a two day period,
inspectors take the following readings. Test the hypothesis that the water is adequately
treated at the 1% significance level.
Chlorine samples (ppm)
3.7, 2.5, 3.8, 3.4, 2.9, 2.6, 2.4, 3.3, 4.3, 3.4
Note: Although the data is non-negative, for the analysis of the sample variance, it is not necessary to perform the log-normal transformation. This is because the distributions we use for analyzing variances and standard deviations (the chi-square and F-distributions) account for the fact that sample variance is non-negative.
© Copyright 2003. Do not distribute or copy without permission.
178
Variance Test
Example:
Test the hypothesis that the water is adequately treated at the 1% significance level.
Chlorine samples (ppm)
3.7, 2.5, 3.8, 3.4, 2.9, 2.6, 2.4, 3.3, 4.3, 3.4
H0: σ ≤ 0.4
Ha: σ > 0.4

Sample standard deviation = 0.622

Test statistic = (10 − 1)(0.622²) / 0.4² = 21.76, distributed χ²_{9}

Chi-Square Distribution (χ²)
Degrees of freedom: 9
Critical value (Pr(χ² > critical value) = 1.00%): 21.666
© Copyright 2003. Do not distribute or copy without permission.
Test statistic falls in alternative area → Reject null hypothesis.
179
Variance Test
Example:
Inspectors check chlorine levels in water at a processing facility several times each day.
The city has two goals: (1) to maintain an average chlorine level of 3 ppm, and (2) to
maintain a standard deviation of no more than 0.4 ppm (too little chlorine and people die
of disease; too much chlorine and people die of poisoning). Over a two day period,
inspectors take the following readings. Test the hypothesis that the water is adequately
treated at the 1% significance level.
Chlorine samples (ppm)
3.7, 2.5, 3.8, 3.4, 2.9, 2.6, 2.4, 3.3, 4.3, 3.4
The tests we conducted were at the 1% significance level. This means that there is a 1%
probability that we might draw a sample that caused us to erroneously reject the null
hypothesis.
Suppose we want to err on the side of caution: we would rather risk finding that the water is not adequately treated when, in fact, it is, than risk finding that the water is adequately treated when, in fact, it is not. How should we adjust our significance level?

Increasing the significance level of the test → increases the probability of rejecting the null when, in fact, the null is true.
© Copyright 2003. Do not distribute or copy without permission.
180
Variance Test
Example:
Inspectors check chlorine levels in water at a processing facility several times each day.
The city has two goals: (1) to maintain an average chlorine level of 3 ppm, and (2) to
maintain a standard deviation of no more than 0.4 ppm (too little chlorine and people die
of disease; too much chlorine and people die of poisoning). Over a two day period,
inspectors take the following readings. Test the hypothesis that the water is adequately
treated at the 10% significance level.
Chlorine samples (ppm)
3.7, 2.5, 3.8, 3.4, 2.9, 2.6, 2.4, 3.3, 4.3, 3.4
H0: μ = 3
Ha: μ ≠ 3

Sample mean of logs = 1.156
Sample stdev of logs = 0.195

Test statistic = (1.156 − ln(3)) / (0.195 / √10) = 0.923
© Copyright 2003. Do not distribute or copy without permission.
Test statistic falls in null area → Fail to reject null hypothesis.
181
Variance Test
Example:
Test the hypothesis that the water is adequately treated at the 10% significance level.
Chlorine samples (ppm)
3.7, 2.5, 3.8, 3.4, 2.9, 2.6, 2.4, 3.3, 4.3, 3.4
H0: σ ≤ 0.4
Ha: σ > 0.4

Sample standard deviation = 0.622

Test statistic = (10 − 1)(0.622²) / 0.4² = 21.76, distributed χ²_{9}

Chi-Square Distribution (χ²)
Degrees of freedom: 9
Critical value (Pr(χ² > critical value) = 10.00%): 14.684
© Copyright 2003. Do not distribute or copy without permission.
Test statistic falls in alternative area → Reject null hypothesis.
182
Confidence Interval for a Variance
When we constructed confidence intervals for sample means and for observations, we
used the formula:
measure ± (critical value)(stdev of measure)

This formula comes from the test statistic for normally (and t-) distributed random variables. Note:

measure − (cv)(stdev) = lower limit  ⇒  (measure − lower limit) / stdev = +cv
measure + (cv)(stdev) = upper limit  ⇒  (measure − upper limit) / stdev = −cv

The formula for the critical value shown above (cv) is the same as the formula for the test statistic:

test statistic = (estimate − parameter) / (stdev of estimate)
Therefore, when we find a confidence interval, what we are really doing is:
1. Setting the test statistic equal to the critical value that gives us the desired level of
confidence, and
2. Solving for parameter.
© Copyright 2003. Do not distribute or copy without permission.
183
Confidence Interval for a Variance
Because the formula for the test statistic for a sample variance is different than the
formula for the test statistic for a sample mean, we would expect the formula for the
confidence interval to be different also.
Test statistic = (N − 1) × estimate² / parameter²

Setting the test statistic equal to the critical value that gives us the desired level of confidence, and solving for parameter, we get:

parameter² = (N − 1) × estimate² / critical value
⇒ parameter = √[(N − 1) × estimate² / critical value]
Note that we use only the positive root because standard deviations are non-negative.
© Copyright 2003. Do not distribute or copy without permission.
184
Confidence Interval for a Variance
Example:
A sample of 10 observations has a standard deviation of 3. Find the 95% confidence
interval for the population standard deviation.
To find a 95% confidence interval, we need the two critical values that give 2.5% in the
upper and lower tails.
Chi-Square Distribution (χ²)
Degrees of freedom: 9
Lower-tail critical value (Pr(χ² > critical value) = 97.50%): 2.700
Upper-tail critical value (Pr(χ² > critical value) = 2.50%): 19.023

Upper limit = √[(10 − 1)(3²) / 2.700] = 5.48
Lower limit = √[(10 − 1)(3²) / 19.023] = 2.06
© Copyright 2003. Do not distribute or copy without permission.
We find that there is a 95% probability that the
population standard deviation lies between 2.06
and 5.48.
185
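The interval can be reproduced with a short Python sketch (scipy supplies the chi-square critical values; names ours):

```python
from math import sqrt
from scipy.stats import chi2

n, s = 10, 3
df = n - 1
upper_cv = chi2.ppf(0.975, df)   # 19.023 (2.5% in the upper tail)
lower_cv = chi2.ppf(0.025, df)   # 2.700  (97.5% in the upper tail)

lower = sqrt(df * s**2 / upper_cv)   # ~2.06
upper = sqrt(df * s**2 / lower_cv)   # ~5.48
print(lower, upper)
```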
Distribution of a Difference in Sample Variances
In the same way that we had different procedures for testing a sample mean versus
testing a difference in two sample means, we similarly have different procedures for
testing a sample variance versus testing a difference in two sample variances.
Where variances are concerned, however, we look not at the difference in the sample
variances, but at the ratio of the sample variances.
Let s_a / s_b be a ratio of two sample standard deviations.

If s_a > s_b, then the ratio s_a / s_b will be greater than 1.
If s_a < s_b, then the ratio s_a / s_b will be less than 1.
If s_a = s_b, then the ratio s_a / s_b will equal 1.
© Copyright 2003. Do not distribute or copy without permission.
186
Distribution of a Difference in Sample Variances
Let s_a and s_b be standard deviations of samples taken from different populations.

The properties of a ratio of sample standard deviations:

Population ratio = σ_a / σ_b
s_a² / s_b² is distributed F_{N_a−1, N_b−1}
© Copyright 2003. Do not distribute or copy without permission.
187
Difference in Variances Test
Example:
A recent consumer behavior study was designed to test the “beer goggles” effect. A
group of volunteers was shown pictures (head shots) of members of the opposite sex
and asked to rate the people in the pictures according to attractiveness. Another group of
volunteers was given two units of alcohol, shown the same pictures, and also asked to
rate the people in the pictures according to attractiveness.
Test the hypothesis that, when subjects consume alcohol, they (on average) find pictures
of the opposite sex more attractive.
The straightforward hypothesis test is a difference of means test where:
H0 : drunk  sober  0
Ha : drunk  sober  0
© Copyright 2003. Do not distribute or copy without permission.
188
Difference in Variances Test
Example:
A recent consumer behavior study was designed to test the “beer goggles” effect. A
group of volunteers was shown pictures (head shots) of members of the opposite sex
and asked to rate the people in the pictures according to attractiveness. Another group of
volunteers was given two units of alcohol, shown the same pictures, and also asked to
rate the people in the pictures according to attractiveness.
H0 : drunk  sober  0
Ha : drunk  sober  0
Suppose we collect data, run the appropriate tests and fail to reject the null hypothesis.
Can we conclude (roughly speaking) that, on average, drinking alcohol causes one to find
the opposite sex more attractive?
Yes. However, it may be the case that the alcohol only affects a subset of the population.
For example, perhaps only men are affected; or, perhaps only those who rarely drink are
affected.
The difference in means test does not detect these cases – it only detects differences in
the average of all subjects in the samples.
© Copyright 2003. Do not distribute or copy without permission.
189
Difference in Variances Test
Example:
A recent consumer behavior study was designed to test the “beer goggles” effect. A
group of volunteers was shown pictures (head shots) of members of the opposite sex
and asked to rate the people in the pictures according to attractiveness. Another group of
volunteers was given two units of alcohol, shown the same pictures, and also asked to
rate the people in the pictures according to attractiveness.
Consider the following two scenarios (calculate the means and stdevs for the data sets):
Scenario #1
Sober: 3, 2, 3, 1, 1, 3, 4, 2, 3, 4
Drunk: 4, 3, 4, 2, 2, 4, 5, 3, 4, 5
Everyone is affected. Average rating for sober is 2.6 compared to an average rating for drunk of 3.6. Standard deviations for both sober and drunk are 1.07 because all 10 subjects were affected by the alcohol.

Scenario #2
Sober: 3, 2, 3, 1, 1, 3, 4, 2, 3, 4
Drunk: 3, 2, 3, 1, 1, 5, 6, 4, 5, 6
Only males are affected. Average rating for sober is 2.6 compared to an average rating for drunk of 3.6. Standard deviation for sober is 1.07, but for drunk is 1.90 because only males (the last 5 observations) were affected by the alcohol.
© Copyright 2003. Do not distribute or copy without permission.
190
Difference in Variances Test
Example:
A recent consumer behavior study was designed to test the “beer goggles” effect. A
group of volunteers was shown pictures (head shots) of members of the opposite sex
and asked to rate the people in the pictures according to attractiveness. Another group of
volunteers was given two units of alcohol, shown the same pictures, and also asked to
rate the people in the pictures according to attractiveness.
Consider the following two scenarios (calculate the means and stdevs for the data sets):
Implication:
A difference in means test would report the same result for scenarios #1 and #2
(population mean for drunk is greater than population mean for sober).
But, a difference in variances test would show that all of the subjects were affected by
the alcohol in scenario #1, while only some of the subjects were affected by the alcohol
in scenario #2.
© Copyright 2003. Do not distribute or copy without permission.
191
Difference in Variances Test
Example:
A recent consumer behavior study was designed to test the “beer goggles” effect. A
group of volunteers was shown pictures (head shots) of members of the opposite sex
and asked to rate the people in the pictures according to attractiveness. Another group of
volunteers was given two units of alcohol, shown the same pictures, and also asked to
rate the people in the pictures according to attractiveness.
Using the scenario #2 data, test the hypotheses (at the 10% significance level):
H0: σ_drunk = σ_sober
Ha: σ_drunk ≠ σ_sober

F Distribution
df in numerator: 9
df in denominator: 9
Lower critical value (Pr(F > critical value) = 95.00%): 0.315
Upper critical value (Pr(F > critical value) = 5.00%): 3.179
© Copyright 2003. Do not distribute or copy without permission.
192
Difference in Variances Test
Example:
A recent consumer behavior study was designed to test the “beer goggles” effect. A
group of volunteers was shown pictures (head shots) of members of the opposite sex
and asked to rate the people in the pictures according to attractiveness. Another group of
volunteers was given two units of alcohol, shown the same pictures, and also asked to
rate the people in the pictures according to attractiveness.
Using the scenario #2 data, test the hypotheses (at the 10% significance level):
H0: σ_drunk = σ_sober
Ha: σ_drunk ≠ σ_sober

N_drunk = 10, N_sober = 10
s_drunk = 1.90, s_sober = 1.07

Test statistic = s_drunk² / s_sober² = 1.90² / 1.07² = 3.12

Test statistic falls in the null area → Fail to reject the null hypothesis.
© Copyright 2003. Do not distribute or copy without permission.
193
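A minimal Python sketch of this F test (scipy supplies the critical values; names ours, and the slides' 3.12 reflects unrounded standard deviations):

```python
from scipy.stats import f

s_drunk, s_sober = 1.90, 1.07
n_drunk, n_sober = 10, 10

test_stat = s_drunk**2 / s_sober**2                 # ~3.1 (slides report 3.12)
# Two-tailed test at the 10% level -> 5% in each tail
upper_cv = f.ppf(0.95, n_drunk - 1, n_sober - 1)    # ~3.18
lower_cv = f.ppf(0.05, n_drunk - 1, n_sober - 1)    # ~0.31
print(lower_cv < test_stat < upper_cv)              # True -> fail to reject the null
```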
Difference in Variances Test
Using the scenario #2 data, test the hypotheses (at the 10% significance level):
In scenario #2, we know for certain that only
males were affected, so we should expect to
see a difference in the standard deviations
across the two samples (sober vs. drunk).
Why did we end up failing to reject the null
hypothesis?
→ The result may be due to the small number of observations in the samples.
What if we had only one more observation in
each sample, but our sample standard
deviations remained the same?
Sample stdevs don’t change, so the
test statistic doesn’t change.
F Distribution
df in numerator: 10
df in denominator: 10
Critical value (Pr(F > critical value) = 5.00%): 2.978
© Copyright 2003. Do not distribute or copy without permission.
One more observation in each sample → df = 10
Critical value changes → now we reject the null hypothesis.
194
Hypothesis Testing: Summary
Procedure for Hypothesis Testing
1. State hypotheses
2. Picture distribution*
3. Identify null and alternative regions
4. Calculate test statistic*
Significance level approach:
5. Find critical value(s) that define alternative area(s) equal to the significance level.
6. If test statistic falls in alternative area, reject null hypothesis. If test statistic falls in null area, fail to reject null hypothesis.

p-value approach:
5. p-value = area from test statistic toward alternative tail(s). p-value is "prob of being wrong in rejecting the null," or "prob of results being due to random chance rather than due to null."
*procedure varies depending on the type of test being performed
© Copyright 2003. Do not distribute or copy without permission.
195
Hypothesis Testing: Summary
Hypothesis Test | Test Statistic | Distribution

Mean | (x̄ − μ) / s_x̄ | t_{N−1}

Difference in means | [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / s_(x̄₁−x̄₂) | t_N, where N = (s_x̄₁² + s_x̄₂²)² / [s_x̄₁⁴/(N₁ − 1) + s_x̄₂⁴/(N₂ − 1)]

Proportion | (p̂ − p) / s_p̂ | Standard Normal provided Np ≥ 5, N(1 − p) ≥ 5

Difference in proportions | [(p̂₁ − p̂₂) − (π₁ − π₂)] / s_(p̂₁−p̂₂) | Standard Normal provided N₁p₁ ≥ 5, N₂p₂ ≥ 5, N₁(1 − p₁) ≥ 5, N₂(1 − p₂) ≥ 5

Variance | (N − 1)s² / σ² | χ²_{N−1}

Difference in variances | s₁² / s₂² | F_{N₁−1, N₂−1}

© Copyright 2003. Do not distribute or copy without permission.
196
Causal vs. Exploratory Analysis
The goal of exploratory analysis is to obtain a measure of a phenomenon.
Example:
Subjects are given a new breakfast cereal to taste and asked to rate the cereal.
The measured phenomenon is taste. Although taste is subjective, by taking the average
of the measures from a large number of subjects, we can measure the underlying
objective components that give rise to the subjective feeling of taste.
© Copyright 2003. Do not distribute or copy without permission.
197
Causal vs. Exploratory Analysis
The goal of causal analysis is to obtain the change in measure of a phenomenon due to the
presence vs. absence of a control variable.
Example:
Two groups of subjects are given the same breakfast cereal to taste and are asked to rate the
cereal. One group is given the cereal in a black and white box. The other in a multi-colored
box.
The two groups of subjects exist under identical conditions (same cereal, same testing
environment, etc.), with the exception of the color of the cereal box. Because the color of the
cereal box is the only difference between the two groups, we call the color of the box the
control variable. If we find a difference in subjects’ reported tastes, then we know that the
difference in perceived taste is due to the color (or lack of color) of the cereal box.
It is possible that, apart from random chance, one group of subjects reports liking the cereal
and the other does not (e.g. one group was tested in the morning and the other in the
evening). We would call this a confound. A confound is the presence of an additional (and
unwanted) difference in the two groups. When a confound is present, it makes it difficult
(perhaps impossible) to determine how much of the difference in reported taste between the
two groups is due to the control and how much is due to the confound.
© Copyright 2003. Do not distribute or copy without permission.
198
Causal vs. Exploratory Analysis
Because the techniques for causal and exploratory analysis are identical (with the
exception that causal analysis includes the use of a control variable whereas exploratory
analysis does not), we will limit our discussion to causal analysis.
© Copyright 2003. Do not distribute or copy without permission.
199
Designing Survey Instruments
The Likert Scale
We use the Likert scale to rate responses to qualitative questions.
Example:
“Which of the following best describes your opinion of the taste of Coke?”
Too Sweet (1) · Very Sweet (2) · Just Right (3) · Slightly Sweet (4) · Not Sweet (5)

The Likert scale elicits more information than a simple "Yes/No" response → the analyst can gauge the degree rather than simply the direction of opinion.
© Copyright 2003. Do not distribute or copy without permission.
200
Designing Survey Instruments
Rules for Using the Likert Scale
1. Use 5 or 7 gradations of response.
→ fewer than 5 yields too little information
→ more than 7 creates too much difficulty for respondents in distinguishing one response from another
2. Always include a mid-point (or neutral) response.
3. When appropriate, include a separate response for "Not applicable," or "Don't know."
4. When possible, include a descriptor with each response rather than simply a single descriptor on each end of the scale.

Example:
Yes: Very Bad (1) · Bad (2) · Neutral (3) · Good (4) · Very Good (5)
No: Very Bad (1) · 2 · 3 · 4 · Good (5)
The presence of the lone words at the ends of the scale will introduce a bias by
causing subjects to shun the center of the scale.
© Copyright 2003. Do not distribute or copy without permission.
201
Designing Survey Instruments
Rules for Using the Likert Scale
5. Use the same words and (where possible) the same number of words for each descriptor.

Example:
Yes: Very Bad (1) · Bad (2) · Neutral (3) · Good (4) · Very Good (5)
No: Bad (1) · Poor (2) · OK (3) · Better (4) · Best (5)
When using different words for different descriptors, subjects may perceive varying
quantities of difference between points on the scale.
For example, subjects may perceive that the difference between “Bad” and “Poor”
is less than the difference between “Poor” and “OK.”
© Copyright 2003. Do not distribute or copy without permission.
202
Designing Survey Instruments
Rules for Using the Likert Scale
6. Avoid using zero as an endpoint on the scale.

Example:
Yes: Very Bad (1) · Bad (2) · Neutral (3) · Good (4) · Very Good (5)
No: Very Bad (0) · Bad (1) · Neutral (2) · Good (3) · Very Good (4)
On average, subjects will associate the number zero with “bad.” Thus, using zero at
the endpoint of the scale can bias subjects away from the side of the scale with the
zero.
© Copyright 2003. Do not distribute or copy without permission.
203
Designing Survey Instruments
Rules for Using the Likert Scale
7. Avoid using unbalanced negative numbers.

Example:
Yes: Very Bad (−2) · Bad (−1) · Neutral (0) · Good (1) · Very Good (2)
No: Very Bad (−3) · Bad (−2) · Neutral (−1) · Good (0) · Very Good (1)
Subjects associate negative numbers with “bad.” If you have more negative
numbers on one side of the scale than the other, subjects will be biased away from
that side of the scale.
© Copyright 2003. Do not distribute or copy without permission.
204
Designing Survey Instruments
Rules for Using the Likert Scale
8. Keep the descriptors balanced.

Example:
Yes: Very Bad (1) · Bad (2) · Neutral (3) · Good (4) · Very Good (5)
No: Very Bad (1) · Bad (2) · Slightly Good (3) · Good (4) · Very Good (5)
Subjects will be biased toward the side with more descriptors.
© Copyright 2003. Do not distribute or copy without permission.
205
Designing Survey Instruments
Rules for Using the Likert Scale
9. Arrange the scale so as to maintain (1) symmetry around the neutral point, and (2) consistency in the intervals between points.

Example:
Yes: Very Bad (1) · Bad (2) · Neutral (3) · Good (4) · Very Good (5), evenly spaced
No (second example): the same five responses, but with "Bad" through "Very Good" crowded together far from "Very Bad"
No (third example): the same five responses, but with a larger gap between "Very Bad" and "Bad" than between "Bad" and "Neutral"
In the second example, subjects perceive the difference between “Neutral” and
“Very Bad” to be greater than the difference between “Neutral” and “Very Good.”
Responses will be biased toward the right side of the scale.
In the third example, subjects perceive the difference between “Very Bad” and
“Bad” to be greater than the difference between “Bad” and “Neutral.”
Responses will be biased toward the center of the scale.
© Copyright 2003. Do not distribute or copy without permission.
206
Designing Survey Instruments
Rules for Using the Likert Scale
10. Use multi-item scales for ill-defined constructs.

Example:
Yes:
"I liked the product.": Strongly Agree (1) · Agree (2) · Neutral (3) · Disagree (4) · Strongly Disagree (5)
"I am satisfied with the product.": Strongly Agree (1) · Agree (2) · Neutral (3) · Disagree (4) · Strongly Disagree (5)
"I believe that this is a good product.": Strongly Agree (1) · Agree (2) · Neutral (3) · Disagree (4) · Strongly Disagree (5)

No:
"I liked the product.": Strongly Agree (1) · Agree (2) · Neutral (3) · Disagree (4) · Strongly Disagree (5)
© Copyright 2003. Do not distribute or copy without permission.
207
Designing Survey Instruments
Rules for Using the Likert Scale
10. Use multi-item scales for ill-defined constructs.
Ill-defined constructs may be interpreted differently by different people. Use the
multi-item scale (usually three items) and then average the items to obtain a single
response for the ill-defined construct.
Example:
The ill-defined construct is Product satisfaction
We construct three questions, each of which touches on the idea of product satisfaction. A subject gives the following responses:

"I liked the product." → 4
"I am satisfied with the product." → 4
"I believe that this is a good product." → 3

Average response for Product satisfaction is (4 + 4 + 3) / 3 = 3.67
© Copyright 2003. Do not distribute or copy without permission.
208
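A trivial Python sketch of the multi-item averaging (the dictionary keys are our shorthand for the three statements):

```python
# Average the three item responses into one score for the construct
responses = {"liked": 4, "satisfied": 4, "good product": 3}
construct_score = sum(responses.values()) / len(responses)
print(round(construct_score, 2))   # 3.67
```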
Designing Survey Instruments
Rules for Using the Likert Scale
10. Use multi-item scales for ill-defined constructs.
Be careful that the multi-item scales all measure the same ill-defined construct.
Yes
“I liked the product.”
“I am satisfied with the product.”
“I believe that this is a good product.”
No
“I liked the product.”
“I am satisfied with the product.”
“I will purchase the product.”
The statement “I will purchase the product” includes the consideration of “price”
which the other two questions do not.
© Copyright 2003. Do not distribute or copy without permission.
209
Designing Survey Instruments
Rules for Using the Likert Scale
11. Occasionally, it is useful to verify that the subjects are giving considered (as
opposed to random) answers. To do this, ask the same question more than once at
different points in the survey. Look at the variance of the responses across the
multiple instances of the question. If the subject is giving considered answers, the
variance should be small.
© Copyright 2003. Do not distribute or copy without permission.
210
Designing Survey Instruments
Rules for Using the Likert Scale
12. Avoid self-referential questions.
Yes
“How do you perceive that others around you feel right now?”
No
“How do you feel right now?”
Self-referential questions elicit bias because they encourage the respondent to
answer subsequent questions consistently with the self-referential question.
Example:
If we ask the subject how he feels and he responds positively, then his subsequent
answers will be biased in a positive direction. The subject will, unconsciously,
attempt to behave consistently with his reported feelings.
Exception:
You can ask a self-referential question if it is the last question in the survey. As long
as the subject does not go back and change previous answers, there is no
opportunity for the self-reference to bias the subject’s responses.
© Copyright 2003. Do not distribute or copy without permission.
211
Designing Survey Instruments
Example:
We want to test the effect of relevant news on purchase decisions. Specifically, we
want to know if the presence of positive news about a low-cost product increases the
probability of consumers purchasing that product.
Causal Design:
We will expose two groups of subjects to news announcements about aspirin. The control group will see a neutral announcement that says nothing about the performance of aspirin.
The experimental group will see a positive announcement that says that aspirin has
positive health benefits.
After exposure to the announcements, we will ask each group to rate their attitudes
toward aspirin. Our hypothesis is that there is no difference in the average attitudes
toward aspirin between the two groups.
To account for possible preconceptions about aspirin, before we show the subjects
the news announcements, we will ask how frequently they take aspirin. To account
for possible gender effects, we will also ask subjects to report their genders.
© Copyright 2003. Do not distribute or copy without permission.
212
Designing Survey Instruments
How often do you take aspirin?
Infrequently (1) · 2 · 3 · Occasionally (4) · 5 · 6 · Frequently (7)
Please identify your gender (M/F).
All subjects are first asked to respond to
these questions.
© Copyright 2003. Do not distribute or copy without permission.
213
Designing Survey Instruments
[News announcement shown to the control group; the analyst reads the headline and the introductory paragraph.]

Subjects in the control group are then asked to answer this question:

Please rate your attitude toward aspirin.
Unfavorable (1) · 2 · 3 · Neutral (4) · 5 · 6 · Favorable (7)

© Copyright 2003. Do not distribute or copy without permission.
214
Designing Survey Instruments
[News announcement shown to the experimental group; the analyst reads the headline and the introductory paragraph.]

Subjects in the experimental group are then asked to answer this question:

Please rate your attitude toward aspirin.
Unfavorable (1) · 2 · 3 · Neutral (4) · 5 · 6 · Favorable (7)

© Copyright 2003. Do not distribute or copy without permission.
215
Designing Survey Instruments
Results:
Results for an actual experiment are shown below.
Test the following hypotheses:
The data is in Data Set #3.
H0: μ_control = μ_baseline          H0: σ_control = σ_baseline
Ha: μ_control ≠ μ_baseline          Ha: σ_control ≠ σ_baseline

Rejecting the null in the first set of hypotheses would indicate that the news did have an impact on subjects' attitudes toward aspirin. Rejecting the null in the second set of hypotheses would indicate that the news had an impact on the degree of disparity in subjects' attitudes toward aspirin.

Attitude | Use | Gender (1=male, 0=female) | Group (1=control, 0=baseline)
7 | 1 | 1 | 0
4 | 2 | 1 | 0
5 | 3 | 1 | 0
5 | 3 | 0 | 0
4 | 4 | 0 | 0
6 | 4 | 0 | 0
7 | 1 | 1 | 0
4 | 4 | 1 | 0
5 | 4 | 0 | 0
1 | 2 | 0 | 0
5 | 4 | 1 | 0
5 | 2 | 0 | 1
5 | 3 | 1 | 1
3 | 1 | 1 | 1
2 | 1 | 1 | 1
5 | 2 | 0 | 1
5 | 2 | 0 | 1
4 | 1 | 1 | 1
6 | 3 | 1 | 1
4 | 1 | 0 | 1
6 | 2 | 1 | 1
4 | 1 | 0 | 1
5 | 4 | 1 | 1

© Copyright 2003. Do not distribute or copy without permission.
216
Designing Survey Instruments
Note: The survey responses are non-negative (the lowest possible response is 1). This may suggest that a log-normal transformation is appropriate. However, we are testing the mean of the observations; therefore, by the Central Limit Theorem, we do not need to perform the log-normal transformation.
© Copyright 2003. Do not distribute or copy without permission.
217
Designing Survey Instruments
Test the following hypotheses:
H0: μ_experimental = μ_control
Ha: μ_experimental ≠ μ_control

x̄_control = 4.82, s_control = 1.66, N_control = 11
x̄_experimental = 4.50, s_experimental = 1.17, N_experimental = 12

Difference in Means Test
Stdev(x̄₁ − x̄₂) = 0.604
Test statistic (distributed t) = 0.530
df = 17.82 → 18

t Distribution
Pr(t > 0.530) = 30.13%; Pr(t < 0.530) = 69.87%
p-value = (30.13%)(2) = 60.26%

There is a 60% chance that we would be incorrect in believing that the news altered the subjects' average attitude toward aspirin.
© Copyright 2003. Do not distribute or copy without permission.
218
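The difference-in-means computation can be checked with a short Python sketch (scipy for the t tail area; names ours):

```python
from math import sqrt
from scipy.stats import t

x1, s1, n1 = 4.82, 1.66, 11   # control group
x2, s2, n2 = 4.50, 1.17, 12   # experimental group

se = sqrt(s1**2 / n1 + s2**2 / n2)          # ~0.604
test_stat = (x1 - x2) / se                  # ~0.53

# Welch-Satterthwaite degrees of freedom (~17.8)
df = se**4 / ((s1**2 / n1)**2 / (n1 - 1) + (s2**2 / n2)**2 / (n2 - 1))

p_value = 2 * t.sf(abs(test_stat), df)      # ~0.60
print(test_stat, df, p_value)
```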
Designing Survey Instruments
Test the following hypotheses:
H0: σ_control = σ_baseline
Ha: σ_control ≠ σ_baseline

s_control = 1.66, N_control = 11
s_experimental = 1.17, N_experimental = 12

Test statistic = s_control² / s_baseline² = 1.66² / 1.17² = 2.01

F Distribution
Test statistic: 2.010
df in numerator: 10
df in denominator: 11
Pr(F > test statistic) = 13.38%; Pr(F < test statistic) = 86.62%

p-value = (13.38%)(2) = 26.76%
There is a 27% chance that we would be incorrect in believing that the news altered the disparity in subjects' attitudes toward aspirin.
Conclusion
In market research, we typically use 10%
as the cut-off for determining “significance”
of results.
Advertising had no significant effect on the
average attitude toward aspirin nor on the
disparity of attitudes toward aspirin.
© Copyright 2003. Do not distribute or copy without permission.
219
Designing Survey Instruments
The results appear to indicate that the news announcement had no effect at all on the
subjects. It is possible that the news announcement does not affect people who do not
take aspirin.
Let us filter the data set, removing all subjects who report that they infrequently use
aspirin. Our filtered data set will include only those subjects who responded with at least 2
to the question regarding frequency of use.
Filtered data set
© Copyright 2003. Do not distribute or copy without permission.
Attitude | Use | Gender (1=male, 0=female) | Group (1=control, 0=baseline)
4 | 2 | 1 | 0
5 | 3 | 1 | 0
5 | 3 | 0 | 0
4 | 4 | 0 | 0
6 | 4 | 0 | 0
4 | 4 | 1 | 0
5 | 4 | 0 | 0
1 | 2 | 0 | 0
5 | 4 | 1 | 0
5 | 2 | 0 | 1
5 | 3 | 1 | 1
5 | 2 | 0 | 1
5 | 2 | 0 | 1
6 | 3 | 1 | 1
6 | 2 | 1 | 1
5 | 4 | 1 | 1
220
Designing Survey Instruments
Test the following hypotheses:
H0: μ_experimental = μ_control
Ha: μ_experimental ≠ μ_control

x̄_control = 4.33, s_control = 1.41, N_control = 9
x̄_experimental = 5.33, s_experimental = 0.52, N_experimental = 7

Difference in Means Test
Stdev(x̄₁ − x̄₂) = 0.509
Test statistic (distributed t) = −1.963
df = 10.61 → 10

t Distribution
Pr(t > −1.963) = 96.10%; Pr(t < −1.963) = 3.90%
p-value = (3.90%)(2) = 7.8%

There is an 8% chance that we would be incorrect in believing that the news altered the subjects' average attitude toward aspirin.
© Copyright 2003. Do not distribute or copy without permission.
221
Designing Survey Instruments
Test the following hypotheses:
H0: σ_experimental = σ_control
Ha: σ_experimental ≠ σ_control

s_control = 1.41, N_control = 9
s_experimental = 0.52, N_experimental = 7

Test statistic = s_control² / s_experimental² = 1.41² / 0.52² = 7.35

F Distribution
Test statistic: 7.350
df in numerator: 8
df in denominator: 6
Pr(F > test statistic) = 1.28%; Pr(F < test statistic) = 98.72%

p-value = (1.28%)(2) = 2.56%
There is roughly a 3% chance that we would be incorrect in believing that the news altered the disparity in subjects' attitudes toward aspirin.

© Copyright 2003. Do not distribute or copy without permission.
222
Designing Survey Instruments
The results using the filtered data appear to indicate that, for subjects who report using
aspirin more than “infrequently”:
1. The news announcement significantly changed (increased) subjects' average attitude toward aspirin.
2. The news announcement significantly changed (decreased) the disparity in subjects' attitudes toward aspirin.
The increase in subjects’ attitudes toward aspirin is what the aspirin manufacturer would
hope for.
The decrease in disparity of attitudes is an added bonus. This can be interpreted as a
reduction in the “uncertainty” of the benefit of aspirin.
© Copyright 2003. Do not distribute or copy without permission.
223
A Look Back…
Thus far, we have learned the following statistical techniques…
Calculating probabilities using
Marginal probability
Joint probability
Disjoint probability
Conditional probability
Bayes’ theorem
Estimating probabilities for
Binomial processes
Hypergeometric processes
Poisson processes
Constructing confidence intervals for
Single observations
Population means
Population proportions
Population variances
Conducting hypothesis tests for
Population mean
Population proportion
Population variance
Difference in two population means
Difference in two population proportions
Difference in two population variances
© Copyright 2003. Do not distribute or copy without permission.
224
Regression Analysis
In regression analysis, we look at how one variable (or a group of variables) can affect
another variable.
We use a technique called “ordinary least squares” or OLS. The OLS technique looks at a
sample of two (or more) variables and filters out random noise so as to find the underlying
deterministic relationship among the variables.
Example:
A retailer suspects that monthly sales follow unemployment rate announcements with a one-month lag. When the Bureau of Labor Statistics announces that the unemployment rate is up, one month later, sales appear to fall. When the BLS announces that the unemployment rate is down, one month later, sales appear to rise.
The retailer wants to know if this relationship actually exists. If so, the retailer can use BLS
announcements to help predict future sales.
In linear regression analysis, we assume that the relationship between the two variables (in
this example, sales and unemployment rate) is linear and that any deviation from the linear
relationship must be due to noise (i.e. unaccounted randomness in the data).
© Copyright 2003. Do not distribute or copy without permission.
225
Regression Analysis
Example:
The chart below shows data (see Data Set #4) on sales and the unemployment rate collected
over a 10 month period.
Date | Monthly Sales (current month) | Unemployment Rate (previous month)
January | $257,151 | 4.5%
February | $219,202 | 4.7%
March | $222,187 | 4.6%
April | $267,041 | 4.4%
May | $265,577 | 4.8%
June | $192,566 | 4.9%
July | $197,655 | 5.0%
August | $200,370 | 4.9%
September | $203,730 | 4.7%
October | $181,303 | 4.8%
Notice that the relationship (if there is one)
between the unemployment rate and sales is
subject to some randomness.
Over some months (e.g. May to June), an increase
in the previous month’s unemployment rate
corresponds to a decrease in the current month’s
sales.
But, over other months (e.g. June to July), an
increase in the previous month’s unemployment
rate corresponds to an increase in the current
month’s sales.
© Copyright 2003. Do not distribute or copy without permission.
226
Regression Analysis
Example:
It is easier to picture the relationship between unemployment and sales if we graph the data.
Since we are hypothesizing that changes in the unemployment rate cause changes in sales,
we put unemployment on the horizontal axis and sales on the vertical axis.
[Scatter plot: Unemployment Rate (previous month), 4.3%-5.1%, on the horizontal axis; Sales (current month), $160,000-$280,000, on the vertical axis.]
© Copyright 2003. Do not distribute or copy without permission.
227
Regression Analysis
Example:
OLS finds the line that most closely fits the data. Because we have assumed that the
relationship is linear, two numbers describe the relationship: (1) the slope, and (2) the
vertical intercept.
[Scatter plot with fitted line: ŷ = −11,648,868x + 771,670]

Sales-hat = 771,670 − 11,648,868 × (unemp rate)

Slope = −11,648,868
Vertical intercept = 771,670
© Copyright 2003. Do not distribute or copy without permission.
228
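The fitted line can be reproduced with a short Python sketch (numpy's polyfit performs the same least-squares fit; names ours):

```python
import numpy as np

unemp = np.array([4.5, 4.7, 4.6, 4.4, 4.8, 4.9, 5.0, 4.9, 4.7, 4.8]) / 100
sales = np.array([257151, 219202, 222187, 267041, 265577,
                  192566, 197655, 200370, 203730, 181303])

slope, intercept = np.polyfit(unemp, sales, deg=1)   # least-squares line
print(slope, intercept)   # ~ -11,648,868 and ~ 771,670
```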
Regression Analysis
The graph below shows two relationships:
1. The regression model is the scattering of dots and represents the actual data.
2. The estimated (or fitted) regression model is the line and represents the regression model after random noise has been removed.

Regression model (true intercept and slope, plus noise, also called the "error term"):
Sales_t = α + β(unemp rate_{t−1}) + u_t

Estimated regression model (estimated intercept and slope, after estimating and removing noise):
Sales-hat_t = α̂ + β̂(unemp rate_{t−1})

For example, an unemployment rate of 4.5% is observed with sales of $257,151. After eliminating noise, we estimate that sales should have been 771,670 − (11,648,868)(0.045) = $247,471. The estimated noise associated with this observation is:

û_t = Sales − Sales-hat = $257,151 − $247,471 = $9,680

[Scatter plot with fitted line: y = −11,648,868x + 771,670]
© Copyright 2003. Do not distribute or copy without permission.
229
Regression Analysis
Terminology:
Variables on the right hand side of the regression equation are called exogenous, or
explanatory, or independent variables. They usually represent variables that are assumed to
influence the left hand side variable.
The variable on the left hand side of the regression equation is called the endogenous, or
outcome, or dependent variable. The dependent variable is the variable whose behavior you
are interested in analyzing.
The intercept and slopes of the regression model are called parameters. The intercept and
slopes of the estimated (or fitted) regression model are called estimated parameters.
The noise term in the regression model is called the error or noise. The estimated error is
called the residual, or estimated error.
Regression model: Y = α + βX + u
Y: outcome variable; X: explanatory variable; u: error (noise); α and β: parameters

Fitted (estimated) model: Ŷ = α̂ + β̂X
Ŷ: fitted (estimated) outcome variable; α̂ and β̂: parameter estimates

Residual (estimated error): û = Y − Ŷ

© Copyright 2003. Do not distribute or copy without permission.
230
Regression Analysis
OLS estimates the regression model parameters by selecting parameter values that minimize
the variance of the residuals.
Residual = difference between actual and fitted values of the outcome variable.

[Scatter plot with fitted line y = −11,648,868x + 771,670; the vertical distance from each point to the line is that observation's residual.]
© Copyright 2003. Do not distribute or copy without permission.
231
Regression Analysis
OLS estimates the regression model parameters by selecting parameter values that minimize
the variance of the residuals.
Residual = difference between actual and fitted values of the outcome variable.

Choosing different parameter values moves the estimated regression line away (on average) from the data points. This results in increased variance in the residuals.

[Scatter plot with fitted line y = −11,648,868x + 771,670]
© Copyright 2003. Do not distribute or copy without permission.
232
Regression Analysis
To perform regression in Excel: (1) Select TOOLS, then DATA ANALYSIS
(2) Select REGRESSION
© Copyright 2003. Do not distribute or copy without permission.
233
Regression Analysis
To perform regression in Excel: (3) Enter the range of cells containing outcome (“Y”) and
explanatory (“X”) variables
(4) Enter a range of cells for the output
Constant is zero
Check this box to force the vertical intercept to be
zero.
Confidence level
Excel automatically reports 95% confidence intervals.
Check this box and enter a level of confidence if you
want a different confidence interval.
Residuals
Check this box if you want Excel to report the
residuals.
Standardized residuals
Check this box if you want Excel to report the
residuals in terms of standard deviations from the
mean.
© Copyright 2003. Do not distribute or copy without permission.
234
Regression Analysis
Regression results
[Excel regression output, annotated: vertical intercept estimate; slope estimate; standard deviation of each estimate; test statistic and p-value for H0: parameter = 0; 95% confidence interval around each parameter estimate.]
© Copyright 2003. Do not distribute or copy without permission.
235
Distribution of Regression Parameter Estimates
If we select a different sample of observations from a population and then perform OLS, we
will obtain slightly different parameter estimates.
Thus, regression parameter estimates are random variables.
Let ˆ be a regression parameter estimate.
The properties of a regression parameter estimates:
Population parameter  
Standard deviation of  varies depending on the regression mode
ˆ is distributed t N k , where k = number of parameters in the regression model
© Copyright 2003. Do not distribute or copy without permission.
236
Distribution of Regression Parameter Estimates
Regression demo
Enter population values here.
Spreadsheet selects a sample from
the population and calculates
parameter estimates based on the
sample.
Press F9 to select a new sample.
© Copyright 2003. Do not distribute or copy without permission.
237
Regression Analysis
Example:
Proponents of trade restrictions claim that free trade costs American jobs because of foreign
competition. Free trade advocates claim that free trade creates American jobs because of
foreign demand for American products.
Using regression analysis, test the hypothesis that higher levels of unemployment accompany
lower levels of trade restrictions.
© Copyright 2003. Do not distribute or copy without permission.
238
Regression Analysis
1. State the regression model.

Unemp Rate_t = β₀ + β₁(Freedom of Trade_t) + u_t

Problem: We don't have a measure for freedom of trade.
Solution: Greater trade freedom results in more trade, so use total trade as a proxy for freedom of trade.

Unemp Rate_t = β₀ + β₁(Total Trade_t) + u_t

Problem: Because the economy grows over time, we would expect total trade to grow over time also.
Solution: Instead of looking at total trade, look at trade as a percentage of GDP. This measure tells us what percentage of total economic activity is devoted to trade.

Unemp Rate_t = β₀ + β₁(Total Trade / GDP)_t + u_t

© Copyright 2003. Do not distribute or copy without permission.
239
Regression Analysis
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade / GDP)_t + u_t

2. Collect the data.

Data Set #5 contains the following information (for the U.S., 1/92 through 3/03):
1. Unemployment rate
2. Volume of exports
3. Volume of imports
4. Gross domestic product (GDP)

→ Calculate total trade as a % of GDP
© Copyright 2003. Do not distribute or copy without permission.
240
Regression Analysis
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade / GDP)_t + u_t

3. State the hypotheses.

Our hypothesis is: "Higher levels of unemployment accompany lower levels of trade restrictions."

The explanatory variable we are using is a proxy for freedom of trade, not trade restrictions. Restating in terms of freedom of trade, our hypothesis becomes: "Higher levels of unemployment accompany higher levels of freedom of trade."

In statistical notation, the hypotheses are:

H0: β₁ ≥ 0
Ha: β₁ < 0
© Copyright 2003. Do not distribute or copy without permission.
241
Regression Analysis
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade / GDP)_t + u_t

H0: β₁ ≥ 0
Ha: β₁ < 0
4. Estimate the regression parameters using OLS.
SUMMARY OUTPUT

Regression Statistics
Multiple R: 0.8957
R Square: 0.8023
Adjusted R Square: 0.8008
Standard Error: 0.00477
Observations: 135

ANOVA
Regression: df = 1, SS = 0.012283, MS = 0.012283, F = 539.59, Significance F = 1.19E-48
Residual: df = 133, SS = 0.003028, MS = 2.28E-05
Total: df = 134, SS = 0.015311

 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept (β̂₀) | 0.1902 | 0.00586 | 32.44 | 4.82E-65 | 0.1786 | 0.2018
X Variable 1 (β̂₁) | −7.2053 | 0.3102 | −23.23 | 1.19E-48 | −7.8189 | −6.5918

© Copyright 2003. Do not distribute or copy without permission.
242
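For readers working outside Excel, the same kind of regression can be run with Python's statsmodels package. The arrays below are small illustrative stand-ins, not the actual Data Set #5 series:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative stand-in data (not the real series)
trade_gdp = np.array([0.0165, 0.0168, 0.0172, 0.0176, 0.0181, 0.0186, 0.0192, 0.0199])
unemp     = np.array([0.073, 0.071, 0.068, 0.065, 0.061, 0.058, 0.054, 0.050])

X = sm.add_constant(trade_gdp)            # adds the intercept column
results = sm.OLS(unemp, X).fit()
print(results.params)                     # intercept and slope estimates
print(results.tvalues, results.pvalues)   # t stats and two-tailed p-values
```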
Regression Analysis
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade / GDP)_t + u_t

H0: β₁ ≥ 0
Ha: β₁ < 0
5. Construct the test statistic.

Test statistic = (test value − hypothesized value) / standard deviation = (β̂₁ − β₁) / s_β̂₁ = (−7.205 − 0) / 0.310 = −23.23
© Copyright 2003. Do not distribute or copy without permission.
243
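A minimal Python sketch of the test on the slope estimate (values taken from the regression output above; names ours):

```python
from scipy.stats import t

beta_hat, se_beta = -7.205, 0.310   # slope estimate and its standard error
df = 135 - 2                        # N - k, with k = 2 estimated parameters

test_stat = (beta_hat - 0) / se_beta   # ~ -23.23
p_lower_tail = t.cdf(test_stat, df)    # area toward the alternative (left) tail, ~0
print(test_stat, p_lower_tail)
```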
Regression Analysis
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade / GDP)_t + u_t

H0: β₁ ≥ 0
Ha: β₁ < 0

6. Picture the distribution and identify the null and alternative areas.

[t distribution with 133 degrees of freedom: alternative area in the left tail, null area to the right]
© Copyright 2003. Do not distribute or copy without permission.
244
Regression Analysis
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade / GDP)_t + u_t

H0: β₁ ≥ 0
Ha: β₁ < 0

7. Insert the test statistic and find the area of the alternative tail (p-value approach).

t Distribution
Test statistic: −23.23
Degrees of freedom: 133
Pr(t > test statistic) = 100.00%
Pr(t < test statistic) = 0.00%

p-value = 0.00%
The probability of our being wrong in believing that higher levels of unemployment are associated with lower levels of free trade is virtually 0%.

© Copyright 2003. Do not distribute or copy without permission.
245
Regression Analysis
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade / GDP)_t + u_t

H0: β₁ ≥ 0
Ha: β₁ < 0
7. Insert the test statistic and find the area of the alternative tail (p-value approach).
SUMMARY OUTPUT (this is the same output shown in step 4)

Note: The test statistic and p-value (for a two-tailed test) are given directly in the output: for X Variable 1, t Stat = −23.23 and P-value = 1.19E-48.

© Copyright 2003. Do not distribute or copy without permission.
246
Regression Analysis
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade / GDP)_t + u_t

H0: β₁ ≥ 0
Ha: β₁ < 0
8. Check results by looking at a graph of the data.
[Scatter plot, January 1992 - March 2003: Trade as % of GDP (1.5%-2.3%) on the horizontal axis; Unemployment Rate (0%-9%) on the vertical axis.]
© Copyright 2003. Do not distribute or copy without permission.
247
Correlation vs. Causation
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade / GDP)_t + u_t
Our results only indicate that higher levels of free trade are associated with lower
levels of unemployment. The results do not say anything about causality.
Example:
The incidence of alarm clocks going off is strongly associated with the rising of the sun. However, this does not mean that alarm clocks cause the sun to rise. The relationship is correlational, not causal.
Example:
Could it be that the relationship between free trade and the unemployment rate is
reverse causal?
→ Perhaps lower levels of unemployment cause higher levels of trade rather than higher levels of trade causing lower levels of unemployment.
© Copyright 2003. Do not distribute or copy without permission.
248
Correlation vs. Causation
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade / GDP)_t + u_t
One way to check for causality (though, technically, this is not a rigorous test) is to look for a relationship that spans time.

Example:
If higher levels of free trade cause lower levels of unemployment, then past trade levels should be negatively related to future unemployment levels.

To run this (quasi) test for causality, let us alter our regression model as follows:

Unemp Rate_t = β₀ + β₁(Total Trade / GDP)_{t−6} + u_t

The unemployment rate today is a function of trade six months ago.
© Copyright 2003. Do not distribute or copy without permission.
249
Correlation vs. Causation
Regression model

$\text{Unemp Rate}_t = \beta_0 + \beta_1 \left( \dfrac{\text{Total Trade}}{\text{GDP}} \right)_{t-6} + u_t$

$H_0: \beta_1 \ge 0 \qquad H_a: \beta_1 < 0$
$\text{Test statistic} = \dfrac{\text{Test value} - \text{hypothesized value}}{\text{standard deviation}} = \dfrac{\hat\beta_1 - \beta_1}{s_{\hat\beta_1}} = \dfrac{-6.109 - 0}{0.359} = -17.01$

The probability of wrongly rejecting the null hypothesis is (virtually) 0%.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.833589958
R Square             0.694872218
Adjusted R Square    0.692469637
Standard Error       0.005515358
Observations         129

ANOVA
              df    SS           MS           F            Significance F
Regression    1     0.008797804  0.008797804  289.219064   1.55366E-34
Residual      127   0.003863235  3.04192E-05
Total         128   0.012661039

              Coefficients  Standard Error  t Stat        P-value      Lower 95%     Upper 95%
Intercept     0.168489919   0.006784649     24.83399257   1.39934E-50  0.155064324   0.181915514
X Variable 1  -6.109445472  0.359243017     -17.00644184  1.55366E-34  -6.820322542  -5.398568401
Correlation vs. Causation
Regression model

$\text{Unemp Rate}_t = \beta_0 + \beta_1 \left( \dfrac{\text{Total Trade}}{\text{GDP}} \right)_{t-6} + u_t$
Notice that our regression model is expressed in terms of levels. The regression assumes that
the level of the unemployment rate is a function of the level of trade (as a % of GDP).
Another way to test for causality is to look at the relationship between changes instead of
levels of data.
Such a relationship would assume that the change in the unemployment rate is a function of
the change in trade (as a % of GDP).
The level relationship says: “When trade is high, unemployment is low.”
The change relationship says: “When trade increases, unemployment decreases.”
Correlation vs. Causation
Regression model

$\Delta \text{Unemp Rate}_t = \beta_0 + \beta_1 \Delta \left( \dfrac{\text{Total Trade}}{\text{GDP}} \right)_{t-6} + u_t$

$\Delta \text{Unemp Rate}_t$ is the change in the unemployment rate from month t-1 to month t.
We use capital delta to signify "change." By convention, a delta in front of a variable
indicates the change from the previous observation to the current observation.
The regression model shown above assumes that the change in unemployment from time
t –1 to time t is a function of the change in total trade (as a % of GDP) from time t –7 to
time t –6.
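A minimal sketch of this bookkeeping, assuming pandas (the column names are mine); the values are the first rows of Data Set #4, shown in the table on the next slide:

```python
# Build the change and lag variables so the dates line up automatically.
import pandas as pd

df = pd.DataFrame(
    {
        "unemp": [0.073, 0.074, 0.074, 0.074, 0.076, 0.078, 0.077, 0.076,
                  0.076, 0.073, 0.074, 0.074, 0.073, 0.071, 0.070, 0.071],
        "trade_gdp": [0.01651, 0.01664, 0.01643, 0.01649, 0.01650, 0.01684,
                      0.01701, 0.01642, 0.01673, 0.01697, 0.01671, 0.01674,
                      0.01664, 0.01648, 0.01709, 0.01706],
    },
    index=pd.period_range("1992-01", periods=16, freq="M"),
)

df["d_unemp"] = df["unemp"].diff()                    # change, month t-1 to t
df["d_trade_lag6"] = df["trade_gdp"].diff().shift(6)  # change, month t-7 to t-6

# Rows with no matching lagged observation drop out automatically,
# just as the table discards its first observations.
reg_data = df.dropna()
```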
Correlation vs. Causation
Regression model

$\Delta \text{Unemp Rate}_t = \beta_0 + \beta_1 \Delta \left( \dfrac{\text{Total Trade}}{\text{GDP}} \right)_{t-6} + u_t$

When computing changes and taking lags, be extremely careful not to make errors in
lining up the data with the dates. The table below shows the first few rows of data for
Data Set #4 after the appropriate changes and lags have been made.
Date    Unemp_t  (Trade/GDP)_t  Unemp_{t-1}  (Trade/GDP)_{t-1}  ΔUnemp_t  Δ(Trade/GDP)_t  Δ(Trade/GDP)_{t-6}
Jan-92  0.073    0.01651
Feb-92  0.074    0.01664        0.073        0.01651            0.001     0.00013
Mar-92  0.074    0.01643        0.074        0.01664            0.000     -0.00020
Apr-92  0.074    0.01649        0.074        0.01643            0.000     0.00005
May-92  0.076    0.01650        0.074        0.01649            0.002     0.00002
Jun-92  0.078    0.01684        0.076        0.01650            0.002     0.00034
Jul-92  0.077    0.01701        0.078        0.01684            -0.001    0.00017
Aug-92  0.076    0.01642        0.077        0.01701            -0.001    -0.00059        0.00013
Sep-92  0.076    0.01673        0.076        0.01642            0.000     0.00032         -0.00020
Oct-92  0.073    0.01697        0.076        0.01673            -0.003    0.00024         0.00005
Nov-92  0.074    0.01671        0.073        0.01697            0.001     -0.00026        0.00002
Dec-92  0.074    0.01674        0.074        0.01671            0.000     0.00002         0.00034
Jan-93  0.073    0.01664        0.074        0.01674            -0.001    -0.00010        0.00017
Feb-93  0.071    0.01648        0.073        0.01664            -0.002    -0.00017        -0.00059
Mar-93  0.070    0.01709        0.071        0.01648            -0.001    0.00062         0.00032
Apr-93  0.071    0.01706        0.070        0.01709            0.001     -0.00003        0.00024

The outcome variable is ΔUnemp_t; the explanatory variable is Δ(Trade/GDP)_{t-6}.
The rows with no matching observation in the explanatory variable (those before
Aug-92) must be discarded.
Correlation vs. Causation
Regression model

$\Delta \text{Unemp Rate}_t = \beta_0 + \beta_1 \Delta \left( \dfrac{\text{Total Trade}}{\text{GDP}} \right)_{t-6} + u_t$

$H_0: \beta_1 \ge 0 \qquad H_a: \beta_1 < 0$

$\text{Test statistic} = \dfrac{\hat\beta_1 - \beta_1}{s_{\hat\beta_1}} = \dfrac{-0.978 - 0}{0.4749} = -2.059$

t Distribution
Test statistic            -2.059
Degrees of Freedom        127
Pr(t > Test statistic)    97.92%
Pr(t < Test statistic)    2.08%
SUMMARY OUTPUT

The probability of being incorrect in rejecting the null hypothesis is 2.1%.
(Warning: the P-value reported in the output below is a two-tailed p-value.)

Regression Statistics
Multiple R           0.180447306
R Square             0.03256123
Adjusted R Square    0.024883145
Standard Error       0.001397002
Observations         128

ANOVA
              df    SS           MS           F            Significance F
Regression    1     8.2764E-06   8.2764E-06   4.240800722  0.041523461
Residual      126   0.000245903  1.95161E-06
Total         127   0.00025418

              Coefficients  Standard Error  t Stat        P-value      Lower 95%     Upper 95%
Intercept     -0.000128959  0.00012384      -1.04133302   0.299714994  -0.000374035  0.000116117
X Variable 1  -0.978057529  0.474941881     -2.059320451  0.041523461  -1.917953036  -0.038162022
Correlation vs. Causation
Regression model

$\Delta \text{Unemp Rate}_t = \beta_0 + \beta_1 \Delta \left( \dfrac{\text{Total Trade}}{\text{GDP}} \right)_{t-6} + u_t$

$H_0: \beta_1 \ge 0 \qquad H_a: \beta_1 < 0$

$\text{Test statistic} = \dfrac{\hat\beta_1 - \beta_1}{s_{\hat\beta_1}} = \dfrac{-0.978 - 0}{0.4749} = -2.059$

t Distribution
Test statistic            -2.059
Degrees of Freedom        127
Pr(t > Test statistic)    97.92%
Pr(t < Test statistic)    2.08%

The probability of being incorrect in rejecting the null hypothesis is 2.1%.
Conclusion:
The data support the proposition that an increase in trade (as a % of GDP) today is
associated with a decrease in the unemployment rate six months later.
Regression Analysis
Applications of regression analysis:
1. Impact study
2. Prediction
Impact study:
Impact studies are concerned with measuring the impact of explanatory variables on an
outcome variable. Whether or not the resultant regression model adequately predicts the
outcome variable is (for the most part) inconsequential.
Prediction:
Prediction models are concerned with accounting for as many sources of influence on the
outcome variable as possible. The more sources of influence that can be accounted for, the
better the model is able to predict the outcome variable. To what extent the explanatory
variables impact the outcome variable is (for the most part) inconsequential.
Regression Analysis
Regression model

$\Delta \text{Unemp Rate}_t = \beta_0 + \beta_1 \Delta \left( \dfrac{\text{Total Trade}}{\text{GDP}} \right)_{t-6} + u_t$

R² measures the proportion of variation in the outcome variable that is accounted for
by variations in the explanatory variables.
Example:
In our regression model, fluctuations in the change in our trade measure (lagged 6
months) account for 3.3% of fluctuations in the change in the unemployment rate.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.180447306
R Square             0.03256123
Adjusted R Square    0.024883145
Standard Error       0.001397002
Observations         128

ANOVA
              df    SS           MS           F            Significance F
Regression    1     8.2764E-06   8.2764E-06   4.240800722  0.041523461
Residual      126   0.000245903  1.95161E-06
Total         127   0.00025418

              Coefficients  Standard Error  t Stat        P-value      Lower 95%     Upper 95%
Intercept     -0.000128959  0.00012384      -1.04133302   0.299714994  -0.000374035  0.000116117
X Variable 1  -0.978057529  0.474941881     -2.059320451  0.041523461  -1.917953036  -0.038162022
Regression Analysis
Regression model

$\Delta \text{Unemp Rate}_t = \beta_0 + \beta_1 \Delta \left( \dfrac{\text{Total Trade}}{\text{GDP}} \right)_{t-6} + u_t \qquad R^2 = 0.033$
If our model accounts for 3.3% of the fluctuations in changes in the unemployment
rate, then the remaining 96.7% of the fluctuations are unaccounted for. Remember that
the error term represents all factors that influence changes in unemployment other
than those explicitly appearing in the model.
We have said two (apparently) contradictory things:
1. The slope coefficient is non-zero ⇒ changes in trade significantly affect changes in
unemployment.
2. The R² is small ⇒ fluctuations in changes in trade only account for 3% of
fluctuations in changes in unemployment.
These two statements are not contradictory because the slope coefficient and the R²
measure different things.
What the results tell us is that the influence of trade on unemployment is consistent
enough to be detected against the background noise. However, the background noise
is extremely loud.
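A small simulation makes the point concrete; a sketch assuming NumPy and statsmodels, with made-up numbers rather than the course data:

```python
# With enough observations, a small but consistent effect is detected
# (large t statistic) even though it explains little variation (tiny R^2).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = 0.05 * x + rng.normal(scale=1.0, size=n)   # weak signal, loud noise

res = sm.OLS(y, sm.add_constant(x)).fit()
print(res.tvalues[1])    # slope clearly significant
print(res.rsquared)      # yet R^2 is only about 0.002
```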
Regression Analysis

[Figure: two simulated scatter plots of the model $Y_t = \beta_0 + \beta_1 X_t + u_t$, identical
except for the amount of noise. Left panel: $\sigma_u = 0.5$, $s_{\hat\beta_1} = 0.08$, $R^2 = 0.72$
(slope t statistic 11.09). Right panel: $\sigma_u = 1.0$, $s_{\hat\beta_1} = 0.16$, $R^2 = 0.44$ (slope
t statistic 5.55). Doubling the noise doubles the standard deviation of the slope estimate
and cuts R² from 0.72 to 0.44.]
Multiple Regression Analysis
In multiple regression analysis the OLS technique finds the linear relationship between an
outcome variable and a group of explanatory variables.
As in simple regression analysis, OLS filters out random noise so as to find the underlying
deterministic relationship. OLS also identifies the individual effects of each of the multiple
explanatory variables.
Simple regression:    $Y_t = \beta_0 + \beta_1 X_t + u_t$

Multiple regression:  $Y_t = \beta_0 + \beta_1 X_{1,t} + \beta_2 X_{2,t} + \dots + \beta_m X_{m,t} + u_t$
Multiple Regression Analysis
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict a truck’s round-trip travel time.
Miles Traveled  Deliveries  Travel Time (hours)
500             4           11.3
250             3           6.8
500             4           10.9
500             2           8.5
250             2           6.2
400             2           8.2
375             3           9.4
325             4           8.0
450             3           9.6
450             2           8.1
Approach #1: Calculate Average Time per Mile
Trucks in the data set required a total of 87 hours to
travel a total of 4,000 miles. Dividing hours by miles,
we find an average of 0.02 hours per mile journeyed.
Problem:
This approach ignores a possible fixed effect. For
example, if travel time is measured starting from the
time that out-bound goods begin loading, then there
will be some fixed time (the time it takes to load the
truck) tacked on to all of the trips. For longer trips this
fixed time will be “amortized” over more miles and will
have less of an impact on the time/mile ratio than for
shorter trips.
This approach also ignores the impact of the number
of deliveries.
Multiple Regression Analysis
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict a truck’s round-trip travel time.
Approach #2: Calculate Average Time per Mile and Average Time per Delivery
Trucks in the data set averaged 87 / 4,000 = 0.02 hours per mile journeyed,
and 87 / 29 = 3 hours per delivery.
Problem:
Like the previous approach, this approach ignores a possible fixed effect.
This approach does account for the impact of both miles and deliveries, but the approach
ignores the possible interaction between miles and deliveries. For example, trucks that travel
more miles likely also make more deliveries. Therefore, when we combine the time/miles
and time/delivery measures, we may be double-counting time.
Multiple Regression Analysis
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict a truck’s round-trip travel time.
Miles Traveled  Deliveries  Travel Time (hours)
500             4           11.3
250             3           6.8
500             4           10.9
500             2           8.5
250             2           6.2
400             2           8.2
375             3           9.4
325             4           8.0
450             3           9.6
450             2           8.1
Approach #3: Regress Time on Miles

$\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + u_i$

The regression model will detect and isolate any fixed effect.
Problem:
The model ignores the impact of the number of deliveries. For example, a 500 mile journey
with 4 deliveries will take longer than a 500 mile journey with 1 delivery.
Multiple Regression Analysis
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict a truck’s round-trip travel time.
Miles Traveled  Deliveries  Travel Time (hours)
500             4           11.3
250             3           6.8
500             4           10.9
500             2           8.5
250             2           6.2
400             2           8.2
375             3           9.4
325             4           8.0
450             3           9.6
450             2           8.1
Approach #4: Regress Time on Deliveries

$\text{Time}_i = \beta_0 + \beta_1(\text{deliveries}_i) + u_i$

The regression model will detect and isolate any fixed effect and will account for the
impact of the number of deliveries.
Problem:
The model ignores the impact of miles traveled. For example, a 500 mile journey with 4
deliveries will take longer than a 200 mile journey with 4 deliveries.
Multiple Regression Analysis
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict a truck’s round-trip travel time.
Miles Traveled  Deliveries  Travel Time (hours)
500             4           11.3
250             3           6.8
500             4           10.9
500             2           8.5
250             2           6.2
400             2           8.2
375             3           9.4
325             4           8.0
450             3           9.6
450             2           8.1
Approach #5: Regress Time on Both Miles and Deliveries

$\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + \beta_2(\text{deliveries}_i) + u_i$

The multiple regression model (1) will detect and isolate any fixed effect, (2) will
account for the impact of the number of deliveries, (3) will account for the impact of
miles, and (4) will eliminate the overlapping effects of miles and deliveries.
Multiple Regression Analysis
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict a truck’s round-trip travel time.
Regression model:

$\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + \beta_2(\text{deliveries}_i) + u_i$

Estimated regression model:

$\widehat{\text{Time}}_i = \hat\beta_0 + \hat\beta_1(\text{miles}_i) + \hat\beta_2(\text{deliveries}_i)$

$\hat\beta_0 = 1.13 \ (0.952) \ [0.2732]$
$\hat\beta_1 = 0.01 \ (0.002) \ [0.0005]$
$\hat\beta_2 = 0.92 \ (0.221) \ [0.0042]$
$R^2 = 0.90$

Standard deviations of parameter estimates and p-values are typically shown in
parentheses and brackets, respectively, near the parameter estimates.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.950678166
R Square             0.903788975
Adjusted R Square    0.876300111
Standard Error       0.573142152
Observations         10

ANOVA
              df    SS           MS           F            Significance F
Regression    2     21.60055651  10.80027826  32.87836743  0.00027624
Residual      7     2.299443486  0.328491927
Total         9     23.9

              Coefficients  Standard Error  t Stat       P-value      Lower 95%     Upper 95%
Intercept     1.131298533   0.951547725     1.188903619  0.273240329  -1.118752683  3.38134975
X Variable 1  0.01222692    0.001977699     6.182396959  0.000452961  0.007550408   0.016903431
X Variable 2  0.923425367   0.221113461     4.176251251  0.004156622  0.400575489   1.446275244
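For reference, Approach #5 can be reproduced outside of Excel; a sketch assuming statsmodels, with the data copied from the table above:

```python
# Multiple regression of travel time on miles and deliveries.
import pandas as pd
import statsmodels.api as sm

data = pd.DataFrame({
    "miles":      [500, 250, 500, 500, 250, 400, 375, 325, 450, 450],
    "deliveries": [4, 3, 4, 2, 2, 2, 3, 4, 3, 2],
    "time":       [11.3, 6.8, 10.9, 8.5, 6.2, 8.2, 9.4, 8.0, 9.6, 8.1],
})

X = sm.add_constant(data[["miles", "deliveries"]])
res = sm.OLS(data["time"], X).fit()
print(res.params)    # approx. 1.13, 0.012, 0.92, as in the output above
print(res.rsquared)  # approx. 0.90
```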
Multiple Regression Analysis
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict a truck’s round-trip travel time.
Estimated regression model:

$\widehat{\text{Time}}_i = \hat\beta_0 + \hat\beta_1(\text{miles}_i) + \hat\beta_2(\text{deliveries}_i)$

$\hat\beta_0 = 1.13 \ (0.952) \ [0.2732]$
$\hat\beta_1 = 0.01 \ (0.002) \ [0.0005]$
$\hat\beta_2 = 0.92 \ (0.221) \ [0.0042]$
$R^2 = 0.90$
Notes on results:
1. The constant is not significantly different from zero.
2. The slope coefficients are significantly different from zero.
3. Variation in miles and deliveries, together, accounts for 90% of the variation in time.
The parameter estimates are measures of
the marginal impact of the explanatory
variables on the outcome variable.
Marginal impact measures the impact of
one explanatory variable after the impacts
of all the other explanatory variables are
filtered out.
Marginal impacts of explanatory variables
0.01 = increase in time given increase of
1 mile traveled.
0.92 = increase in time given increase of
1 delivery.
Prediction
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict the round-trip travel time for a truck that is
traveling 600 miles and making 1 delivery.
Approach #1: Prediction based on average time-per-mile

$\widehat{\text{Time}}_i = (\text{Average hours per mile})(\text{miles}_i) = 0.02(600) = 12 \text{ hours}$

[Bar chart: predicted travel time in hours, 0-16 scale. Approach #1: 12.]
Prediction
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict the round-trip travel time for a truck that is
traveling 600 miles and making 1 delivery.
Approach #2: Prediction based on average time-per-mile and time-per-delivery

$\widehat{\text{Time}}_i = (\text{Average hours per mile})(\text{miles}_i) + (\text{Average hours per delivery})(\text{deliveries}_i) = 0.02(600) + 3(1) = 15 \text{ hours}$

[Bar chart: predicted travel time in hours, 0-16 scale. Approach #1: 12; Approach #2: 15.]
Prediction
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict the round-trip travel time for a truck that is
traveling 600 miles and making 1 delivery.
Approach #3: Prediction based on simple regression of time on miles

$\widehat{\text{Time}}_i = \hat\beta_0 + \hat\beta_1(\text{miles}_i) = 3.27 + 0.01(600) = 9.3 \text{ hours}$

[Bar chart: predicted travel time in hours, 0-16 scale. Approach #1: 12; Approach #2: 15; Approach #3: 9.3.]
Prediction
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict the round-trip travel time for a truck that is
traveling 600 miles and making 1 delivery.
Approach #4: Prediction based on simple regression of time on deliveries

$\widehat{\text{Time}}_i = \hat\beta_0 + \hat\beta_1(\text{deliveries}_i) = 5.38 + 1.14(1) = 6.5 \text{ hours}$

[Bar chart: predicted travel time in hours, 0-16 scale. Approach #1: 12; Approach #2: 15; Approach #3: 9.3; Approach #4: 6.5.]
Prediction
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict the round-trip travel time for a truck that is
traveling 600 miles and making 1 delivery.
Approach #5: Prediction based on multiple regression of time on both miles and deliveries

$\widehat{\text{Time}}_i = \hat\beta_0 + \hat\beta_1(\text{miles}_i) + \hat\beta_2(\text{deliveries}_i) = 1.13 + 0.01(600) + 0.92(1) = 8.1 \text{ hours}$
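The five predictions side by side; a sketch using the rounded coefficients quoted on the preceding slides:

```python
# Predictions for a 600-mile, 1-delivery round trip under each approach.
miles, deliveries = 600, 1

pred1 = 0.02 * miles                              # 12.0 hours
pred2 = 0.02 * miles + 3 * deliveries             # 15.0 hours
pred3 = 3.27 + 0.01 * miles                       # approx. 9.3 hours
pred4 = 5.38 + 1.14 * deliveries                  # approx. 6.5 hours
pred5 = 1.13 + 0.01 * miles + 0.92 * deliveries   # 8.05, approx. 8.1 hours
print(pred1, pred2, pred3, pred4, pred5)
```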
[Bar chart: predicted travel time in hours, 0-16 scale. Approach #1: 12; Approach #2: 15; Approach #3: 9.3; Approach #4: 6.5; Approach #5: 8.1.]
Prediction and Goodness of Fit
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict the round-trip travel time for a truck that is
traveling 600 miles and making 1 delivery.
Compare the R² (goodness of fit) from the three regression models (approaches #3,
#4, and #5):

Approach #3: $\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + u_i$, $R^2 = 0.66$
Approach #4: $\text{Time}_i = \beta_0 + \beta_1(\text{deliveries}_i) + u_i$, $R^2 = 0.38$
Approach #5: $\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + \beta_2(\text{deliveries}_i) + u_i$, $R^2 = 0.90$
In approach #3, 66% of the variation in time
is explained. This leaves 34% of the variation
in time unexplained and, therefore,
unpredictable.
In approach #4, only 38% of the variation in
time is explained. This leaves 62% of the
variation in time unexplained and, therefore,
unpredictable.
In approach #5, 90% of the variation in time
is explained. This leaves only 10% of the
variation in time unexplained and
unpredictable.
Prediction and Goodness of Fit
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict a truck’s round-trip travel time.
In the table below, we have added a new explanatory variable (called Random – see
Data Set #7) that contains randomly derived numbers. Because the numbers are
random, they have no impact on the dependent variable.
Miles Traveled  Deliveries  Random  Travel Time (hours)
500             4           0.087   11.3
250             3           0.002   6.8
500             4           0.794   10.9
500             2           0.910   8.5
250             2           0.606   6.2
400             2           0.239   8.2
375             3           0.265   9.4
325             4           0.842   8.0
450             3           0.662   9.6
450             2           0.825   8.1

Estimate the following regression model:

$\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + \beta_2(\text{deliveries}_i) + \beta_3(\text{random}_i) + u_i$
Prediction and Goodness of Fit
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict a truck’s round-trip travel time.
Notice that the goodness of fit measure has increased from 0.904
(in Approach #5) to 0.909. This would seem to indicate that this
model provides a better fit than did Approach #5.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.95328585
R Square             0.908753912
Adjusted R Square    0.863130868
Standard Error       0.60287941
Observations         10

It turns out that, every time you add an explanatory variable, the R² increases. This is
because OLS looks for any portion of the remaining noise that the new variable can
explain. At the very worst, OLS will find no explanatory power to attribute to the new
variable and the R² will not change; adding another explanatory variable never causes
R² to fall.
ANOVA
              df    SS           MS           F            Significance F
Regression    3     21.7192185   7.2397395    19.91874794  0.001603903
Residual      6     2.1807815    0.363463583
Total         9     23.9

              Coefficients  Standard Error  t Stat       P-value      Lower 95%     Upper 95%
Intercept     1.455892534   1.150895704     1.265008226  0.252777599  -1.360249864  4.272034931
X Variable 1  0.012185716   0.002081561     5.85412499   0.001097066  0.007092317   0.017279115
X Variable 2  0.894281428   0.238113014     3.755701594  0.009446164  0.311639445   1.476923411
X Variable 3  -0.394347384  0.690166078     -0.57138042  0.588489374  -2.083124175  1.294429407
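The claim that R² never falls when a variable is added can be checked by simulation; a sketch assuming NumPy and statsmodels, with made-up data:

```python
# Adding a purely random regressor cannot lower the (unadjusted) R^2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)
noise = rng.normal(size=n)               # unrelated to y by design

r2_small = sm.OLS(y, sm.add_constant(x)).fit().rsquared
r2_big = sm.OLS(y, sm.add_constant(np.column_stack([x, noise]))).fit().rsquared
print(r2_big >= r2_small)                # always True
```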
Prediction and Goodness of Fit
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict a truck’s round-trip travel time.
SUMMARY OUTPUT (repeated from the previous slide)

Regression Statistics
Multiple R           0.95328585
R Square             0.908753912
Adjusted R Square    0.863130868
Standard Error       0.60287941
Observations         10
To determine whether or not a new explanatory variable adds anything of substance,
we look at the adjusted R². The adjusted R² includes a penalty for adding more
explanatory variables.
Approach #5 had an adjusted R² of 0.876. When we added the random explanatory
variable, the adjusted R² dropped to 0.863. This indicates that the extra explanatory
power the new variable adds does not make up for the loss in degrees of freedom from
adding the variable to the model. Therefore, your model is actually improved by leaving
the new variable out.
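The penalty can be checked by hand, assuming the standard adjusted-R² formula (not stated on the slide), $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$ with k explanatory variables:

```python
# Worked check of the adjusted R^2 in the output above.
n, k, r2 = 10, 3, 0.908753912
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(adj_r2)   # 0.8631..., matching the reported Adjusted R Square
```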
Prediction and Goodness of Fit
Technical notes on R² and adjusted R²:
1. Regardless of the number of explanatory variables, R² always measures the
   proportion of variation in the outcome variable explained by variations in the
   explanatory variables.
2. You cannot compare R²'s or adjusted R²'s from two models that use different
   outcome variables.
3. Adjusted R² is often written as $\bar{R}^2$.
Properties of OLS Parameter Estimates
Provided the data you are analyzing is well behaved, the parameter estimates that you
obtain via the OLS procedure have the following properties:
1. Unbiasedness
2. Consistency
3. Efficiency
Properties of OLS Parameter Estimates
Provided the data you are analyzing is well behaved, the parameter estimates that you
obtain via the OLS procedure have the following properties:
1. The parameter estimates are unbiased.
An estimate is unbiased when the expected value of the estimate is equal to the
parameter the estimate intends to measure.
Example:
Consider rolling a die. The population mean of the die rolls is 3.5. Suppose we take a
sample of N rolls of the die. Let $X_i$ be the i-th die roll. We then estimate the
population mean via the equation

$\text{Parameter Estimator \#1} = \dfrac{1}{N} \sum_{i=1}^{N} X_i$

Parameter Estimator #1 is unbiased because, on average, it will equal 3.5.
Suppose we use a different equation, called Parameter Estimator #2, to estimate the
population mean of the die rolls.

$\text{Parameter Estimator \#2} = \dfrac{1}{N+1} \sum_{i=1}^{N} X_i$

Parameter Estimator #2 is biased because, on average, it will be less than 3.5.
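A die-rolling simulation sketch (NumPy assumed; the estimator numbering follows the slides):

```python
# Estimator #1 divides the sum of rolls by N; Estimator #2 divides by N + 1.
import numpy as np

rng = np.random.default_rng(2)
trials, N = 100_000, 10
rolls = rng.integers(1, 7, size=(trials, N))

est1 = rolls.sum(axis=1) / N         # Parameter Estimator #1
est2 = rolls.sum(axis=1) / (N + 1)   # Parameter Estimator #2

print(est1.mean())   # approx. 3.50 -> unbiased
print(est2.mean())   # approx. 3.18 -> biased low
```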
Properties of OLS Parameter Estimates
Provided the data you are analyzing is well behaved, the parameter estimates that you
obtain via the OLS procedure have the following properties:
2. The parameter estimates are consistent.
An estimate is consistent when the expected difference between the estimate and the
population parameter decreases as the sample size increases.
Example:

$\text{Parameter Estimator \#1} = \dfrac{1}{N} \sum_{i=1}^{N} X_i$

Parameter Estimator #1 is unbiased because, on average, it will equal 3.5. It is also
consistent because the estimate comes closer to 3.5 (on average) as N increases.
Similarly, Parameter Estimator #2 is biased, but it is consistent. Parameter Estimator #2
is, on average, less than 3.5, but as the number of observations increases, Parameter
Estimator #2 comes closer (on average) to 3.5.

$\text{Parameter Estimator \#2} = \dfrac{1}{N+1} \sum_{i=1}^{N} X_i$
Properties of OLS Parameter Estimates
Provided the data you are analyzing is well behaved, the parameter estimates that you
obtain via the OLS procedure have the following properties:
2. The parameter estimates are consistent.
An estimate is consistent when the expected difference between the estimate and the
population parameter decreases as the sample size increases.
Example:
Suppose we use a different equation, called Parameter Estimator #3, to estimate the
population mean of the die rolls. For the i-th die roll:

$\text{Parameter Estimator \#3} = \begin{cases} 1 & \text{if } i \text{ is odd} \\ 6 & \text{if } i \text{ is even} \end{cases}$

Parameter Estimator #3 is unbiased because, on average, it will equal 3.5. But
Parameter Estimator #3 is inconsistent because, as the sample size increases, the
parameter estimator does not come closer to the population parameter of 3.5.
Properties of OLS Parameter Estimates
Provided the data you are analyzing is well behaved, the parameter estimates that you
obtain via the OLS procedure have the following properties:
3. The parameter estimates are efficient.
An estimate is efficient when it has the lowest achievable standard deviation (among
all linear, unbiased estimators).
Example:
Suppose we use Parameter Estimator #4 to estimate the population mean of the die
rolls. Parameter Estimator #4 multiplies the N observations, takes the N-th root of the
product, and adds 0.5:

$\text{Parameter Estimator \#4} = 0.5 + \sqrt[N]{\prod_{i=1}^{N} X_i}$

Parameter Estimator #4 is unbiased because, on average, it will equal 3.5. Parameter
Estimator #4 is consistent because, as the sample size increases, the parameter
estimator comes closer (on average) to the population parameter of 3.5. Parameter
Estimator #4 is inefficient because the standard deviation of Parameter Estimator #4
is not the minimum achievable standard deviation; Parameter Estimator #1 has a
lower standard deviation.
Properties of OLS Parameter Estimates
Summary of properties of OLS parameter estimates (assuming well-behaved data):

Unbiasedness:  $E(\hat\beta) = \beta$
Consistency:   $\text{plim}(\hat\beta) = \beta$
Efficiency:    $s_{\hat\beta}$ is the minimum among all linear, unbiased estimators of $\beta$

Let $\bar{X}$ be a sample estimator for the population mean, $\mu$.

Unbiased and consistent:
$\bar{X} = \dfrac{1}{N} \sum_{i=1}^{N} X_i$
$E(\bar{X}) = \mu$; $E(|\bar{X} - \mu|)$ approaches zero as N increases.

Unbiased and inconsistent:
$\bar{X} = \begin{cases} 1 & \text{if } i \text{ is odd} \\ 6 & \text{if } i \text{ is even} \end{cases}$
$E(\bar{X}) = \mu$; $E(|\bar{X} - \mu|)$ does not approach zero as N increases.

Biased and consistent:
$\bar{X} = \dfrac{1}{N+1} \sum_{i=1}^{N} X_i$
$E(\bar{X}) \ne \mu$; $E(|\bar{X} - \mu|)$ approaches zero as N increases.

Biased and inconsistent:
$\bar{X} = 3 + \dfrac{1}{N} \sum_{i=1}^{N} X_i$
$E(\bar{X}) \ne \mu$; $E(|\bar{X} - \mu|)$ does not approach zero as N increases.
Properties of OLS Parameter Estimates
What does well behaved mean?
Well behaved is a shorthand term meaning "the data conform to all the applicable
assumptions."
The full scope of the OLS assumptions is beyond the scope of this course. Some of the
assumptions are:
1. The error term is normally distributed.
2. The error term has a population mean of zero.
3. The error term has a population variance that is constant and finite.
4. Past values of the error term are unrelated to future values of the error term.
5. The underlying relationship between the outcome and explanatory variables is
   linear.
6. The explanatory variables are not measured with error.
7. There are no relevant explanatory variables excluded from the regression model.
8. There are no irrelevant explanatory variables included in the regression model.
9. The regression parameters do not change over the sample.
Statistical Anomalies
We will look at a few of the more egregious violations of the OLS assumptions (called
statistical anomalies).
Statistical anomalies cause OLS parameter estimates to no longer be unbiased,
consistent, and efficient.
Our goal is to:
1. Recognize the impact of the anomalies on the regression results.
2. Test for the presence of statistical anomalies.
3. Correct for the statistical anomalies.
We will cover the anomalies in their (approximate) order of severity.
Note that some of these anomalies are specific to either time-series or cross-sectional
data.
Time-series: Data is indexed by time. The order of the data matters.
Cross-sectional: Data is not indexed by time. The order of the data does not matter.
Non-Stationarity
Non-stationarity (also called unit root) occurs when at least one of the variables in a
time-series model has an infinite population variance.
Example:
Stock prices are non-stationary. If you plot the Dow-Jones Industrial Average (see Data
Set #8), you will see that stock prices follow a trend. Data series that follow trends have
infinite population variances.
[Chart: Dow Jones Industrial Average, annual, 1896-2000, on a 0-12,000 scale. The
series follows a strong upward trend.]
Non-Stationarity
The chart below shows the standard deviation of the DJIA from 1896 to the indicated
date. Because the DJIA follows a trend, the standard deviation increases over time. This
means that the population standard deviation is infinite.
[Chart: standard deviation of the DJIA computed from 1896 to the indicated date,
1900-2000, on a 0-2,500 scale. The standard deviation rises steadily over time.]
Non-Stationarity
Implications of non-stationarity:
1. Parameter estimates are biased and inconsistent.
2. Standard deviations of parameter estimates are biased and inconsistent.
3. The R² measure is biased and inconsistent.
4. These results hold for all parameter estimates, regardless of which variable(s) is
   (are) non-stationary.
The implications indicate that, in the presence of non-stationarity, none of the OLS
results are useful to us. This makes non-stationarity one of the most severe of the
statistical anomalies.
Non-Stationarity
Example of the implications of non-stationarity:
Using Data Set #8, estimate the following regression model.

$\text{DJIA}_t = \beta_0 + \beta_1(\text{DJIA}_{t-1}) + u_t$

You should get the results shown below. Note that the results seem too good to be true.
1. The R² measure is very close to 1: the model explains virtually all of the variation
   in the DJIA.
2. Some of the p-values are exceptionally close to zero: the probability of the slope
   coefficient equaling zero is about the same as the probability of six killer asteroids
   all hitting the Earth within the next 60 seconds.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.985124455
R Square             0.970470193
Adjusted R Square    0.970183495
Standard Error       392.8090624
Observations         105

ANOVA
              df    SS           MS           F            Significance F
Regression    1     522302143.5  522302143.5  3385.001074  1.31605E-80
Residual      103   15892792.83  154298.9595
Total         104   538194936.4

              Coefficients  Standard Error  t Stat       P-value      Lower 95%     Upper 95%
Intercept     25.36791839   42.79835355     0.592731175  0.554660204  -59.51244428  110.2482811
X Variable 1  1.067973529   0.018356128     58.18076206  1.31605E-80  1.031568512   1.104378547
Non-Stationarity
Example of the implications of non-stationarity:
Using Data Set #8, estimate the following regression model.

$\text{DJIA}_t = \beta_0 + \beta_1(\text{DJIA}_{t-1}) + u_t$

To see the impact of non-stationarity, split the data set into three parts:
1. 1897 through 1931
2. 1897 through 1966
3. 1897 through 2002
Estimate the regression model for each of the three subsets and compare the results.
Non-Stationarity
As we add observations, R² approaches one and uncertainty approaches zero.

1896 through 1931
Regression Statistics: Multiple R 0.835430571; R Square 0.69794424; Adjusted R Square
0.688791035; Standard Error 32.42075194; Observations 35.
              Coefficients  Standard Error  t Stat       P-value
Intercept     19.81139741   10.86212922     1.823896311  0.077235343
X Variable 1  0.826611264   0.094662408     8.732201973  4.30227E-10

1896 through 1966
Regression Statistics: Multiple R 0.978701538; R Square 0.9578567; Adjusted R Square
0.957236945; Standard Error 45.10223435; Observations 70.
              Coefficients  Standard Error  t Stat       P-value
Intercept     1.743586516   7.660180603     0.227616894  0.820627082
X Variable 1  1.051148157   0.026737665     39.31338637  1.71371E-48

1896 through 2002
Regression Statistics: Multiple R 0.985124455; R Square 0.970470193; Adjusted R Square
0.970183495; Standard Error 392.8090624; Observations 105.
              Coefficients  Standard Error  t Stat       P-value
Intercept     25.36791839   42.79835355     0.592731175  0.554660204
X Variable 1  1.067973529   0.018356128     58.18076206  1.31605E-80
Non-Stationarity
The implication is that, eventually, we would be able to predict next year's DJIA from
this year's DJIA with absolute certainty.
We call these results spurious because they appear reasonable but are really the
result of a statistical anomaly, not an underlying statistical relationship.
Non-Stationarity
Detecting non-stationarity:
1. For each variable in the model: (a) regress the variable on itself lagged one period,
   (b) regress the variable on a constant term and itself lagged one period, and (c)
   regress the variable on a constant term, a time trend, and itself lagged one period.
2. Test the null hypothesis that the absolute value of the coefficient on the lagged
   variable is greater than or equal to 1. If the slope coefficient is greater than or
   equal to one (in absolute value) for any of the three tests, then the variable is
   non-stationary.
Note: This is only an approximate test. Because this test assumes non-stationarity,
the test statistic is not t-distributed but tau-distributed. As the tau distribution is
beyond the scope of this course, you can use the t distribution as an approximation.
Note also that the tails on the tau distribution are fatter than the tails on the
t distribution. Therefore, if you fail to reject the null (in step 2 above) using the
t distribution, then you would also fail to reject the null using the tau distribution.
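A sketch of the three lag regressions, assuming statsmodels; in practice, statsmodels' adfuller function runs the augmented Dickey-Fuller test directly, with proper tau critical values:

```python
# Helper for the three regressions used in the approximate unit-root test.
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller  # the standard, exact route

def lag_regression(y, constant=True, trend=False):
    """Regress y_t on y_{t-1}, optionally adding a constant and a time trend."""
    y = np.asarray(y, dtype=float)
    X = y[:-1].reshape(-1, 1)                  # y lagged one period
    if trend:
        X = np.column_stack([X, np.arange(1, len(y))])
    if constant:
        X = sm.add_constant(X)
    return sm.OLS(y[1:], X).fit()

# Usage on a series `djia` (Data Set #8):
#   res = lag_regression(djia, constant=False)
#   t_stat = (res.params[0] - 1) / res.bse[0]   # test H0: slope >= 1
#   adfuller(djia) returns the tau statistic and its p-value directly.
```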
Non-Stationarity
Example:
Test the DJIA for non-stationarity.

$\text{DJIA}_t = \beta_1(\text{DJIA}_{t-1}) + u_t$

A test of the null hypothesis that the slope is greater than or equal to one yields a test
statistic of 4.439 and a p-value of (virtually) zero. We therefore conclude that the DJIA
is non-stationary.
Because the DJIA is non-stationary, any regression including the DJIA contains biased
and inconsistent results.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.985073331
R Square             0.970369467
Adjusted R Square    0.960754083
Standard Error       391.5821301
Observations         105

ANOVA
              df    SS           MS           F            Significance F
Regression    1     522247933.7  522247933.7  3405.893011  9.67537E-81
Residual      104   15947002.72  153336.5646
Total         105   538194936.4

              Coefficients  Standard Error  t Stat       P-value      Lower 95%    Upper 95%
Intercept     0             #N/A            #N/A         #N/A         #N/A         #N/A
X Variable 1  1.072811655   0.016390124     65.45476184  2.41925E-86  1.040309466  1.105313844

(The intercept is constrained to zero in this model, so Excel reports #N/A for its statistics.)
Non-Stationarity
Correcting for non-stationarity:
1. Remove the trend from the non-stationary variable by: (a) taking the first
   difference, (b) taking the natural log, (c) taking the percentage change, or (d)
   taking the second difference.
2. Test the transformed version of the variable to verify that the transformed variable
   is now stationary.
3. Re-run the regression using the transformed version of the variable.
Note: If you have a model in which one of the variables is non-stationary and another
is not, you need only perform this transformation on the non-stationary variable.
However, often it is easier to interpret the results if you perform the same
transformation on all the variables in the model.
Non-Stationarity
Correct the DJIA model for non-stationarity:
1. Transform the DJIA into the growth rate in the DJIA. The transformation is:

$\text{GDJIA}_t = \dfrac{\text{DJIA}_t - \text{DJIA}_{t-1}}{\text{DJIA}_{t-1}}$

2. Test the growth rate in the DJIA to verify that the non-stationarity has been
   removed (test 1 of 3 - regress the dependent variable on its own lag):

$\text{GDJIA}_t = \beta_1(\text{GDJIA}_{t-1}) + u_t$
A test of the null hypothesis that the slope is greater than or equal to one yields a test
statistic of -10 and a p-value of (virtually) 100%. We therefore conclude that GDJIA is
stationary.

SUMMARY OUTPUT

Regression Statistics
Multiple R           65535
R Square             -0.12944963
Adjusted R Square    -0.13915837
Standard Error       0.236589292
Observations         104

ANOVA
              df    SS            MS           F             Significance F
Regression    1     -0.660786781  -0.66078678  -11.80514097  #NUM!
Residual      103   5.765372783   0.055974493
Total         104   5.104586002

              Coefficients  Standard Error  t Stat       P-value      Lower 95%     Upper 95%
Intercept     0             #N/A            #N/A         #N/A         #N/A          #N/A
X Variable 1  0.001742434   0.098580251     0.017675287  0.985932089  -0.193768065  0.197252934

(The nonsense entries - a Multiple R of 65535, a negative R², #NUM! - appear because
Excel computes these statistics differently when the intercept is constrained to zero.)
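A sketch of the transformation and this first check, assuming pandas and statsmodels; the series values below are toy numbers, not Data Set #8:

```python
# Growth-rate transformation and test 1 of 3 (lag regression, no constant).
import pandas as pd
import statsmodels.api as sm

djia = pd.Series([100.0, 104.0, 99.0, 107.0, 112.0, 118.0])  # toy values
gdjia = djia.pct_change().dropna()   # (DJIA_t - DJIA_{t-1}) / DJIA_{t-1}

y = gdjia.iloc[1:].to_numpy()
x = gdjia.iloc[:-1].to_numpy()
res = sm.OLS(y, x).fit()             # no constant: regress on the lag alone
t_for_unit_root = (res.params[0] - 1) / res.bse[0]   # approx. -10 on the real data
```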
Non-Stationarity
Correct the DJIA model for non-stationarity:
3. Test the growth rate in the DJIA to verify that the non-stationarity has been
   removed (test 2 of 3 - regress the dependent variable on its lag and a constant).

A test of the null hypothesis that the slope is greater than or equal to one yields a
test statistic of -11.5 and a p-value of (virtually) 100%. We therefore conclude that
GDJIA is stationary.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.128401676
R Square             0.01648699
Adjusted R Square    0.006844706
Standard Error       0.221855516
Observations         104

ANOVA
              df    SS           MS          F            Significance F
Regression    1     0.08415926   0.08415926  1.709863516  0.193942749
Residual      102   5.020426742  0.04921987
Total         103   5.104586002

              Coefficients  Standard Error  t Stat        P-value      Lower 95%     Upper 95%
Intercept     0.090018858   0.023138826     3.890381371   0.000178584  0.044123129   0.135914587
X Variable 1  -0.128568196  0.098322481     -1.307617496  0.193942749  -0.323590273  0.06645388
Non-Stationarity
Correct the DJIA model for non-stationarity:
4. Test the growth rate in the DJIA to verify that the non-stationarity has been
   removed (test 3 of 3 - regress the dependent variable on its lag, a constant, and a
   time trend).

A test of the null hypothesis that the slope is greater than or equal to one yields a
test statistic of -11.5 and a p-value of (virtually) 100%. We therefore conclude that
GDJIA is stationary.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.148095661
R Square             0.021932325
Adjusted R Square    0.002564648
Standard Error       0.222333051
Observations         104

ANOVA
              df    SS           MS           F            Significance F
Regression    2     0.111955438  0.055977719  1.132418977  0.326309523
Residual      101   4.992630564  0.049431986
Total         103   5.104586002

              Coefficients  Standard Error  t Stat        P-value      Lower 95%     Upper 95%
Intercept     0.06182561    0.044173173     1.399618949   0.164691437  -0.025802071  0.149453291
X Variable 1  -0.134769286  0.098880518     -1.362950854  0.175928971  -0.330921607  0.061383035
X Variable 2  0.000546484   0.000728767     0.749874369   0.455073571  -0.000899194  0.001992162
Non-Stationarity
Now that we know that GDJIA is stationary, we can estimate our transformed model:

$\text{GDJIA}_t = \beta_0 + \beta_1(\text{GDJIA}_{t-1}) + u_t$

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.128401676
R Square             0.01648699
Adjusted R Square    0.006844706
Standard Error       0.221855516
Observations         104

ANOVA
              df    SS           MS          F            Significance F
Regression    1     0.08415926   0.08415926  1.709863516  0.193942749
Residual      102   5.020426742  0.04921987
Total         103   5.104586002

              Coefficients  Standard Error  t Stat        P-value      Lower 95%     Upper 95%
Intercept     0.090018858   0.023138826     3.890381371   0.000178584  0.044123129   0.135914587
X Variable 1  -0.128568196  0.098322481     -1.307617496  0.193942749  -0.323590273  0.06645388

Our fitted model is:

$\widehat{\text{GDJIA}}_t = 0.09 - 0.129(\text{GDJIA}_{t-1})$
Non-Stationarity
Using the fitted model, predict the DJIA for 2003:

$\widehat{\text{GDJIA}}_t = 0.09 - 0.129(\text{GDJIA}_{t-1})$

$\text{DJIA}_{2001} = 11005, \qquad \text{DJIA}_{2002} = 10104$

$\text{GDJIA}_{2002} = \dfrac{\text{DJIA}_{2002} - \text{DJIA}_{2001}}{\text{DJIA}_{2001}} = \dfrac{10104 - 11005}{11005} = -0.082$

$\widehat{\text{GDJIA}}_{2003} = 0.09 - 0.1286(\text{GDJIA}_{2002}) = 0.09 - 0.1286(-0.082) = 0.101$

$\widehat{\text{GDJIA}}_{2003} = \dfrac{\widehat{\text{DJIA}}_{2003} - \text{DJIA}_{2002}}{\text{DJIA}_{2002}} \ \Rightarrow\ 0.101 = \dfrac{\widehat{\text{DJIA}}_{2003} - 10104}{10104} \ \Rightarrow\ \widehat{\text{DJIA}}_{2003} = 11125$
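The same arithmetic as a sketch in Python:

```python
# Predict the 2003 DJIA from the fitted growth-rate model.
djia_2001, djia_2002 = 11005, 10104

gdjia_2002 = (djia_2002 - djia_2001) / djia_2001   # approx. -0.082
gdjia_2003 = 0.09 - 0.1286 * gdjia_2002            # approx. 0.101
djia_2003 = djia_2002 * (1 + gdjia_2003)           # approx. 11,125
print(round(djia_2003))
```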
As of today, the DJIA for 2003 is 9,710. This is significantly different from the
prediction of 11,125. Note that the regression model has an R2 of less than 0.02. This
means that the model fails to explain 98% of the variation in the growth rate of the
DJIA.
Non-Stationarity
Using the spurious model, predict the DJIA for 2003:

$\widehat{\text{DJIA}}_t = 25.4 + 1.07(\text{DJIA}_{t-1})$

$\text{DJIA}_{2002} = 10104 \ \Rightarrow\ \widehat{\text{DJIA}}_{2003} = 25.4 + 1.07(10104) = 10837$

Although the spurious prediction is closer to the actual than was the prediction using
the stationary model, the prediction is extremely far from the actual considering the
(reported) R² of 0.97.

Prediction from stationary model:      11,125  (15% overestimate)
Prediction from non-stationary model:  10,837  (12% overestimate)
Actual:                                 9,710

Note that it is simply random chance that the non-stationary model gave a (slightly)
closer prediction. We would not necessarily expect the non-stationary model to give a
better (or worse) prediction.
What is important is that the non-stationary model misled us (via the high R² and low
standard deviations) into thinking that it would produce good predictions.
Non-Linearity
Non-linearity occurs when the relationship between the outcome and explanatory
variables is non-linear.
Example:
Suppose that the true relationship between two variables is:

$Y_i = \beta_0 + \beta_1 X_i^2 + u_i$

OLS assumes (incorrectly in this case) that the relationship between the outcome and
explanatory variables is linear. When OLS attempts to find the best fitting linear
relationship, it will end up with something like that shown in the figure below.
Non-linear data can cause the fitted model to be biased in one direction at the
extremes and biased in the other direction in the center.

[Figure: scatter plot of Y (roughly -10 to 35) against X (0 to 3.5) generated from the
quadratic model, with the best-fitting straight line overlaid.]
Non-Linearity
Implications of non-linearity:
1. Parameter estimates are biased and inconsistent.
2. Standard deviations of parameter estimates are biased and inconsistent.
3. The R² measure is biased and inconsistent.
The implications indicate that, in the presence of non-linearity, none of the OLS results
are useful to us. Like non-stationarity, this makes non-linearity one of the most severe of
the statistical anomalies.
Non-Linearity
In the regression demo, enter a 2 for the "X Exponent." The demo now generates data
according to the model:

$Y_i = \beta_0 + \beta_1 X_i^2 + u_i$

Repeatedly press F9 and notice that the confidence interval for the slope coefficient
does not include the population value. OLS is producing biased results.
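The demo can be approximated with a simulation; a sketch assuming NumPy and statsmodels, not the course's actual demo workbook:

```python
# Generate data from a quadratic model, fit a straight line, and watch
# the linear fit miss in opposite directions at the ends and the middle.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = np.linspace(0, 3.5, 100)
y = 1.0 + 2.0 * x**2 + rng.normal(scale=1.0, size=x.size)

res = sm.OLS(y, sm.add_constant(x)).fit()
resid = y - res.fittedvalues
# Residuals are positive at the extremes and negative in the center:
print(resid[:10].mean(), resid[40:60].mean(), resid[-10:].mean())
```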
Non-Linearity
Example:
As Director of Human Resources, you are charged with generating estimates of the cost
of labor for a firm that is opening an office in Pittsburgh. To estimate the cost of labor,
you need two numbers for each job description: (1) base salary, and (2) benefits.
You are comfortable with your base salary estimates. You need to generate estimates
for benefits. Data Set #9 contains median salary and benefits numbers for a random
sampling of white-collar jobs in the Pittsburgh area.
Using this data, generate a model that can be used to predict the cost of benefits given
base salary.
Estimate the following model:

$\text{Benefits}_i = \beta_0 + \beta_1(\text{Salary}_i) + u_i$
Non-Linearity
Fitted model:

$\widehat{\text{Benefits}}_i = -6332 + 0.5976(\text{Salary}_i)$

According to the output, this model accounts for 82% of the variation in benefits.
Example: You expect someone earning a $90,000 base salary to cost the firm an
additional -6332 + 0.5976($90,000) = $47,452 in benefits.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.905889157
R Square             0.820635165
Adjusted R Square    0.805688095
Standard Error       12339.93518
Observations         14

ANOVA
              df    SS           MS           F            Significance F
Regression    1     8360260745   8360260745   54.90274589  8.16758E-06
Residual      12    1827288004   152274000.3
Total         13    10187548749

              Coefficients  Standard Error  t Stat        P-value      Lower 95%     Upper 95%
Intercept     -6331.559158  7782.474687     -0.813566303  0.431740633  -23288.11456  10624.99625
X Variable 1  0.597573741   0.080648162     7.409638715   8.16758E-06  0.421856495   0.773290987
Non-Linearity
Now, create a plot of the data and overlay the fitted regression model. Notice that the
line appears to overestimate the center observations and underestimate the end
observations.

[Chart: scatter plot of benefits ($0 to $120,000) against base salary ($0 to $180,000)
with the fitted line overlaid.]

Warning: The apparent over- and under-estimation may be due to a few outliers in the
data. That is, if you obtained more data, this apparent non-linearity might go away.
So, do we have non-linearity or not? Can you find a theoretical justification for
non-linearity?
Yes. The value of most benefits is tied to pay (e.g., the firm contributes 5% of gross
salary to a 401k). But, as salary rises, the number of benefits also increases (e.g.,
basic health, retirement, dental, stock options, car, expense account, use of the
corporate jet, ...). Because both the value and the number of benefits increase with
salary, we should expect a non-linear relationship.
Non-Linearity
Example:
What is the form of the non-linearity?
We don't know, but we can try different forms and compare the R²'s.
Note: We can compare the R²'s from the different models because the outcome
variable is the same in all the models.

$\text{Benefits}_i = \beta_0 + \beta_1(\text{Salary}_i) + u_i$
$\text{Benefits}_i = \beta_0 + \beta_1 \ln(\text{Salary}_i) + u_i$
$\text{Benefits}_i = \beta_0 + \beta_1 e^{\text{Salary}_i} + u_i$
$\text{Benefits}_i = \beta_0 + \beta_1(\text{Salary}_i^{-1}) + u_i$
$\text{Benefits}_i = \beta_0 + \beta_1(\text{Salary}_i^{2}) + u_i$
$\text{Benefits}_i = \beta_0 + \beta_1(\text{Salary}_i^{3}) + u_i$

Note: In the exponential model, the value of exp(salary) is too large. Therefore, for
this model, we first divide salary by 100,000, then take the exponential. This will cause
the slope coefficient and the standard deviation of the slope coefficient to scale down,
but the ratio of the estimate to the standard deviation and the other regression
results will not change.
Non-Linearity
Model                                                                     Squared Correlation
$\text{Benefits}_i = \beta_0 + \beta_1(\text{Salary}_i) + u_i$            $R^2 = 0.821$
$\text{Benefits}_i = \beta_0 + \beta_1 \ln(\text{Salary}_i) + u_i$        $R^2 = 0.750$
$\text{Benefits}_i = \beta_0 + \beta_1 e^{\text{Salary}_i} + u_i$         $R^2 = 0.844$
$\text{Benefits}_i = \beta_0 + \beta_1(\text{Salary}_i^{-1}) + u_i$       $R^2 = 0.622$
$\text{Benefits}_i = \beta_0 + \beta_1(\text{Salary}_i^{2}) + u_i$        $R^2 = 0.843$
$\text{Benefits}_i = \beta_0 + \beta_1(\text{Salary}_i^{3}) + u_i$        $R^2 = 0.828$

The squared and exponential models explain more of the variation in benefits than
does the linear model, and the two yield almost identical R²'s. Since the squared
model is less complicated, we'll use that model to predict benefits.
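A sketch of this horse race, assuming statsmodels and arrays `salary` and `benefits` holding Data Set #9 (not reproduced here):

```python
# Compare R^2 across functional forms; only the regressor is transformed,
# so the R^2 values are directly comparable.
import numpy as np
import statsmodels.api as sm

def r2(transform, salary, benefits):
    X = sm.add_constant(transform(salary))
    return sm.OLS(benefits, X).fit().rsquared

forms = {
    "linear":  lambda s: s,
    "log":     np.log,
    "exp":     lambda s: np.exp(s / 100_000),  # rescaled, per the note above
    "inverse": lambda s: 1.0 / s,
    "squared": lambda s: s**2,
    "cubed":   lambda s: s**3,
}
# for name, f in forms.items(): print(name, r2(f, salary, benefits))
```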
Non-Linearity
Regression model:

$\text{Benefits}_i = \beta_0 + \beta_1(\text{Salary}_i^{2}) + u_i$

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.918129382
R Square             0.842961561
Adjusted R Square    0.829875025
Standard Error       11546.41629
Observations         14

ANOVA
              df    SS           MS           F           Significance F
Regression    1     8587712000   8587712000   64.4144123  3.63765E-06
Residual      12    1599836749   133319729.1
Total         13    10187548749

              Coefficients  Standard Error  t Stat       P-value      Lower 95%    Upper 95%
Intercept     14734.7501    4959.966734     2.970735671  0.011685201  3927.911135  25541.58907
X Variable 1  3.34675E-06   4.16996E-07     8.025858976  3.63765E-06  2.43819E-06  4.2553E-06

Estimated regression model:

$\widehat{\text{Benefits}}_i = 14735 + (3.34675 \times 10^{-6})(\text{Salary}_i^{2})$
Non-Linearity
Estimate the cost of benefits for the following salaries using both the linear model and
the preferred non-linear model.
Salaries: $20,000, $40,000, $60,000, $80,000, $100,000.

Linear model                               Preferred non-linear model
-6332 + 0.5976(20000)  = $5,620            14735 + 3.34675E-06(20000^2)  = $16,074
-6332 + 0.5976(40000)  = $17,571           14735 + 3.34675E-06(40000^2)  = $20,090
-6332 + 0.5976(60000)  = $29,523           14735 + 3.34675E-06(60000^2)  = $26,783
-6332 + 0.5976(80000)  = $41,474           14735 + 3.34675E-06(80000^2)  = $36,154
-6332 + 0.5976(100000) = $53,426           14735 + 3.34675E-06(100000^2) = $48,202

Compared to the preferred non-linear model, the linear model is biased downward at
low salary levels and biased upward at high salary levels.
Regime Change
Regime change occurs when the parameters change value at one (or more) points in
the data set.
Example:
Conventional wisdom says that (Reagan aside) Democrats contribute to greater
deficits (i.e., smaller surpluses) than do Republicans. Data Set #10 contains relevant
macroeconomic data and data on political parties in power from 1929 through 2001.
Test the hypothesis (at 5% significance) that a change in control of the Congress by
Democrats corresponds to a change in the Federal government surplus (as a % of GDP).
1. Generate the Federal budget surplus as a % of GDP.
2. State the regression model.

$\left( \dfrac{\text{Budget Surplus}}{\text{GDP}} \right)_t = \alpha + \beta(\%\text{ Congressional Seats Held by Democrats})_t + u_t$

3. Test the hypothesis.

$H_0: \beta = 0 \qquad H_a: \beta \ne 0$
Regime Change
Regime change occurs when the parameters change value at one (or more) points in
the data set.
Example:

$\left( \dfrac{\text{Budget Surplus}}{\text{GDP}} \right)_t = \alpha + \beta(\%\text{ Congressional Seats Held by Democrats})_t + u_t$

$H_0: \beta = 0 \qquad H_a: \beta \ne 0$

t Distribution
Degrees of Freedom        71
Pr(t > Critical value)    2.50%
Critical Value            1.9939

Fail to reject the null.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.124324727
R Square             0.015456638
Adjusted R Square    0.00158983
Standard Error       0.050239746
Observations         73

ANOVA
              df    SS           MS           F            Significance F
Regression    1     0.002813412  0.002813412  1.114650028  0.294652679
Residual      71    0.179206277  0.002524032
Total         72    0.182019689

              Coefficients  Standard Error  t Stat        P-value      Lower 95%     Upper 95%
Intercept     0.014202688   0.040708434     0.348888091   0.728205865  -0.066967664  0.095373039
X Variable 1  -0.073210394  0.069343136     -1.055769874  0.294652679  -0.211476747  0.06505596
Regime Change
Regime change occurs when the parameters change value at one (or more) points in
the data set.
Example:
This analysis ignores the effect of war. If the country is involved in a war, it is forced
to run greater deficits, regardless of which party controls the Congress. How can we
account for this "war effect"?
The war effect is a regime change. We are hypothesizing that, during war, the
regression model changes.
Let us propose two regression models:

Regression model when there is peace:
$\left( \dfrac{\text{Budget Surplus}}{\text{GDP}} \right)_t = \alpha_1 + \beta(\%\text{ Congressional Seats Held by Democrats})_t + u_t$

Regression model when there is war:
$\left( \dfrac{\text{Budget Surplus}}{\text{GDP}} \right)_t = \alpha_2 + \beta(\%\text{ Congressional Seats Held by Democrats})_t + u_t$

The models assume different "baseline" surpluses ($\alpha_1$ versus $\alpha_2$) but the same
marginal surpluses (the common $\beta$).
Regime Change
Example:
If we run two separate regressions, we not only fail to hold the marginal effects
constant, but we lose information from the observations that we have removed from
the regression.

Estimated peace year model
Regression Statistics: Multiple R 0.543091875; R Square 0.294948784; Adjusted R Square
0.280559984; Standard Error 0.020616858; Observations 51.
              Coefficients  Standard Error  t Stat        P-value      Lower 95%     Upper 95%
Intercept     0.060045099   0.017698364     3.392692117   0.001377543  0.024478927   0.09561127
X Variable 1  -0.136171953  0.030076456     -4.52752662   3.82563E-05  -0.196612817  -0.07573109

Estimated war year model
Regression Statistics: Multiple R 0.299629119; R Square 0.089777609; Adjusted R Square
0.04426649; Standard Error 0.079192076; Observations 22.
              Coefficients  Standard Error  t Stat        P-value      Lower 95%     Upper 95%
Intercept     -0.325767374  0.197127821     -1.652569243  0.114031776  -0.736968613  0.085433864
X Variable 1  0.47422938    0.337647235     1.404511366   0.175507104  -0.230090084  1.178548844
Regime Change
Another way to solve the problem is to think of the change in the baseline (the constant
term) as a regime change. In this regime change, the value of the constant term is
different over some subset of the data than it is over other subsets.
Let us define a dummy variable as follows:

D_t = 1 if year t is a war year, 0 otherwise
Using the dummy variable, we can combine our two models into one (avoiding the
information loss that comes from splitting the data) and hold the marginal effect
constant.
(Budget Surplus / GDP)_t = α + γD_t + β(% Congressional Seats Held by Democrats)_t + u_t

For peace years, D_t is zero. The term γD_t disappears, and we are left with our "peace year" model.
For war years, D_t is one. The term γD_t becomes γ, so the constant term is α + γ. Therefore, α + γ
is the constant term for the "war year" model. For both models, the marginal effect, β, is the same.
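As a sketch, the combined model can be estimated by adding the dummy as an ordinary regressor. The column names (`war`, `pct_dem_seats`) and file name are assumptions carried over from the earlier sketch.

```python
# Sketch only: "war" is an assumed 0/1 column marking war years.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("dataset10.csv")                  # hypothetical file name
y = df["federal_surplus"] / df["gdp"]
X = sm.add_constant(df[["war", "pct_dem_seats"]])  # alpha + gamma*D + beta*X

fit = sm.OLS(y, X).fit()
# fit.params["const"] estimates alpha, fit.params["war"] estimates gamma,
# and fit.params["pct_dem_seats"] estimates beta.
print(fit.params)
```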
Regime Change
Let us test our hypothesis accounting for a possible regime shift in the constant term
between war and peace years.
(Budget Surplus / GDP)_t = α + γD_t + β(% Congressional Seats Held by Democrats)_t + u_t

D_t = 1 if year t is a war year, 0 otherwise
H0: β = 0
Ha: β ≠ 0

t Distribution: test statistic = -1.0787 (df = 70); Pr(|t| > 1.0787) = 0.2844.
Critical value (2.50% in each tail) = 1.9944. Fail to reject the null.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.3093
R Square             0.0957
Adjusted R Square    0.0698
Standard Error       0.0485
Observations         73

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept        0.0229         0.0394         0.5807   0.5633     -0.0558      0.1016
X Variable 1    -0.0308         0.0124        -2.4920   0.0151     -0.0555     -0.0062
X Variable 2    -0.0722         0.0669        -1.0787   0.2844     -0.2057      0.0613

The estimated regression model is

(Budget Surplus / GDP)_t = 0.023 - 0.031 D_t - 0.072(% Democrats)_t
Regime Change
Is there a regime change from war to peace years? If there is no regime change, then
the coefficient attached to the dummy variable will be (statistically) zero.
(Budget Surplus / GDP)_t = α + γD_t + β(% Congressional Seats Held by Democrats)_t + u_t

D_t = 1 if year t is a war year, 0 otherwise
H0: γ = 0
Ha: γ ≠ 0

t Distribution: test statistic = -2.4920 (df = 70); Pr(|t| > 2.4920) = 0.0151.
Critical value (2.50% in each tail) = 1.9944. Since |t| exceeds the critical value, we
reject the hypothesis that there is no regime change.

The regression output is the same as on the previous slide: the coefficient on the war
dummy (X Variable 1) is -0.0308 with standard error 0.0124, giving t = -2.4920 and
p = 0.0151.
Regime Change
We designed our regression model to account for a possible regime change in the
constant term. It is possible that there is a regime change in the slope coefficient.
The slope coefficient measures the marginal effect on the budget surplus of increasing
the percentage of the Congress that is controlled by Democrats. It is possible that the
marginal effect of Democrats in Congress changes in war vs. peace years.
Consider the following model:
(Budget Surplus / GDP)_t = α + β(% Democrats)_t + δ(D_t)(% Democrats)_t + u_t

In peace years, D_t = 0, so the model becomes

(Budget Surplus / GDP)_t = α + β(% Democrats)_t + u_t

In war years, D_t = 1, so the model becomes

(Budget Surplus / GDP)_t = α + (β + δ)(% Democrats)_t + u_t
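A sketch of generating and estimating the interaction regressor, continuing the hypothetical file and column names used in the earlier sketches:

```python
# Sketch only: D_t * (% Democrats) lets the slope differ between regimes.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("dataset10.csv")                    # hypothetical file name
df["war_x_dem"] = df["war"] * df["pct_dem_seats"]    # interaction regressor

y = df["federal_surplus"] / df["gdp"]
X = sm.add_constant(df[["pct_dem_seats", "war_x_dem"]])
fit = sm.OLS(y, X).fit()
# Peace-year slope: fit.params["pct_dem_seats"]       (beta)
# War-year slope:   beta + fit.params["war_x_dem"]    (beta + delta)
print(fit.params)
```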
Regime Change
To test for a regime change in the slope coefficient, we generate a new regressor that is
% Democrats multiplied by the dummy variable. We include this new regressor in our
model.
(Budget Surplus / GDP)_t = α + β(% Democrats)_t + δ(D_t)(% Democrats)_t + u_t

D_t = 1 if year t is a war year, 0 otherwise

H0: β = 0    Ha: β ≠ 0
H0: δ = 0    Ha: δ ≠ 0
t Distribution (df = 70); critical value (2.50% in each tail) = 1.9944.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.2802
R Square             0.0785
Adjusted R Square    0.0522
Standard Error       0.0489
Observations         73

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept        0.0190         0.0397         0.4792   0.6333     -0.0602      0.0983
X Variable 1    -0.0674         0.0676        -0.9971   0.3222     -0.2023      0.0674
X Variable 2    -0.0468         0.0214        -2.1890   0.0319     -0.0894     -0.0042

For β (X Variable 1, % Democrats): |t| = 0.9971 < 1.9944, so we fail to reject H0: β = 0.
For δ (X Variable 2, the interaction): |t| = 2.1890 > 1.9944, so we reject H0: δ = 0.
Regime Change
We can account for both possible regime changes in the baseline (constant term) and
marginal effect (slope) in the same model as follows:
(Budget Surplus / GDP)_t = α + γD_t + β(% Democrats)_t + δ(D_t)(% Democrats)_t + u_t

D_t = 1 if year t is a war year, 0 otherwise

H0: γ = 0    Ha: γ ≠ 0
H0: β = 0    Ha: β ≠ 0
H0: δ = 0    Ha: δ ≠ 0
t Distribution (df = 69); critical value (2.50% in each tail) = 1.9949.
All three test statistics exceed the critical value in absolute value, so we reject each null.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.4433
R Square             0.1965
Adjusted R Square    0.1616
Standard Error       0.0460
Observations         73

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept        0.0600         0.0395         1.5193   0.1333     -0.0188      0.1389
X Variable 1    -0.3858         0.1212        -3.1826   0.0022     -0.6277     -0.1440
X Variable 2    -0.1362         0.0672        -2.0275   0.0465     -0.2702     -0.0022
X Variable 3     0.6104         0.2075         2.9421   0.0044      0.1965      1.0243

Conclusion: Adjusting for the impact of war on the budget, the evidence suggests that an
increase in Democratically controlled seats increases the budget surplus in war and
increases the budget deficit in peace.
Regime Change
We can split our results into two estimated regression models. We use one to predict the
impact of political party on the budget surplus in war years, and the other to predict the
impact in peace years.
The estimated regression model is

(Budget Surplus / GDP)_t = 0.060 - 0.386 D_t - 0.136(% Democrats)_t + 0.610 D_t (% Democrats)_t

For war years (D_t = 1), the estimated regression model is

(Budget Surplus / GDP)_t = -0.326 + 0.474(% Democrats)_t

For peace years (D_t = 0), the estimated regression model is

(Budget Surplus / GDP)_t = 0.060 - 0.136(% Democrats)_t
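The arithmetic of splitting the fitted model into its two regimes is mechanical. A small sketch using the estimated coefficients shown above:

```python
# Recover the regime-specific models from the fitted coefficients above.
a, g, b, d = 0.060, -0.386, -0.136, 0.610   # alpha, gamma, beta, delta

peace_intercept, peace_slope = a, b          # D_t = 0
war_intercept, war_slope = a + g, b + d      # D_t = 1

print(peace_intercept, peace_slope)          # 0.060, -0.136
print(war_intercept, war_slope)              # -0.326, 0.474
```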
Regime Change
The estimated regression model is

(Budget Surplus / GDP)_t = 0.060 - 0.386 D_t - 0.136(% Democrats)_t + 0.610 D_t (% Democrats)_t
We interpret the slope coefficients as follows:

"After accounting for a baseline impact of war, in peace time every 1 percentage point (0.01)
increase in Democrat-controlled seats is associated with a 0.136 percentage point (0.00136)
decrease in the budget surplus (relative to GDP). In war time, the marginal effect is
-0.136 + 0.610 = 0.474, so every 1 percentage point (0.01) increase in Democrat-controlled
seats is associated with a 0.474 percentage point (0.00474) increase in the budget surplus."
To put the numbers in perspective:
1. There are currently 440 members of Congress.
2. Replacing one Republican with a Democrat increases the number of Democrats by
0.23 percentage points (0.0023).
3. In peace time, we expect this to be associated with a -0.03% [-0.03% = (-0.136)(0.23%)]
change in the surplus (relative to GDP).
4. GDP is currently $12 trillion.
5. So, we expect that every replacement of a Republican with a Democrat will cost the
Federal government (0.03%)($12 trillion) = $3.6 billion.
Regime Change
To put the war-time numbers in perspective:
1. In war time, the replacement of one Republican with a Democrat is associated with a 0.11%
[0.11% = (-0.136 + 0.610)(0.23%)] change in the surplus (relative to GDP).
2. GDP is currently $12 trillion.
3. In war, we expect that every replacement of a Republican with a Democrat will save the
Federal government (0.11%)($12 trillion) ≈ $13 billion.
Regime Change
Implications of regime change:

1. Parameter estimates may be biased and inconsistent.
2. Standard deviations of parameter estimates may be biased and inconsistent.

If the data is time-series and the regime shift will not occur again, then parameter
estimates will be biased but consistent. As more data is added, the regime shift is
pushed further into the past and becomes increasingly insignificant.

Unlike the cases of non-stationarity and non-linearity, the R2 is a reliable estimator.
Therefore, you can compare the adjusted R2's in models with and without regime
change corrections to decide whether or not it is necessary to account for a regime
change.
Regime Change
Detecting regime change:

1. Create a dummy variable that is 1 in one state and 0 in the other state.
2. Include the dummy itself as a regressor.
3. For each regressor, X, create a new regressor that is the dummy multiplied by X.
4. Include all of these new regressors in the regression.
5. Test the hypotheses that the coefficients attached to the dummy and the new
regressors are zero.

A parameter estimate that fails the "zero test" indicates the presence of a regime
shift for that regressor (or the constant term). A sketch of this recipe appears below.
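Here is a runnable sketch of the five-step recipe on simulated data. The data, names, and numbers are invented for illustration; they are not Data Set #10.

```python
# Simulated demonstration of the detection recipe.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"D": rng.integers(0, 2, n),   # step 1: regime dummy
                   "x": rng.normal(size=n)})
# The true process has a shift in both the constant and the slope:
df["y"] = (1.0 + 0.5 * df["D"] + 2.0 * df["x"]
           + 1.5 * df["D"] * df["x"] + rng.normal(size=n))

df["D_x"] = df["D"] * df["x"]                    # step 3: dummy * regressor
X = sm.add_constant(df[["D", "x", "D_x"]])       # steps 2 and 4
fit = sm.OLS(df["y"], X).fit()
print(fit.pvalues[["D", "D_x"]])                 # step 5: zero tests
# Small p-values on D and D_x correctly flag regime shifts in the
# constant and in the slope on x.
```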
Regime Change
Correcting for regime change:

1. After determining which regressors (and/or the constant) are subject to regime
changes, include dummies for those regressors (and/or the constant).
2. You can correct for regime change using the level or deviation approach. You can
use different approaches for different regressors (and/or the constant).
3. For the deviation approach: For each regressor, X, associated with a regime
change, generate a new regressor: (D)(X). Include this new regressor in the
regression model.
4. For the level approach: For each regressor, X, associated with a regime change,
generate two new regressors: (D)(X) and (1–D)(X). Remove the original regressor
X from the regression and replace it with these two new regressors.

The two approaches are contrasted in the sketch below.
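A sketch contrasting the two parameterizations on simulated data (names and numbers invented for illustration):

```python
# Deviation vs. level parameterizations of a slope regime change.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
D = rng.integers(0, 2, n)
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * D * x + rng.normal(size=n)   # slopes: 2.0 (peace), 3.5 (war)

# Deviation approach: keep x, add (D)(x); the D*x coefficient is the CHANGE in slope.
dev = sm.OLS(y, sm.add_constant(pd.DataFrame({"x": x, "Dx": D * x}))).fit()

# Level approach: replace x with (D)(x) and (1-D)(x); each coefficient is the
# slope LEVEL within its own regime.
lev = sm.OLS(y, sm.add_constant(pd.DataFrame({"Dx": D * x,
                                              "notDx": (1 - D) * x}))).fit()
print(dev.params)   # x ~ 2.0, Dx ~ 1.5
print(lev.params)   # Dx ~ 3.5, notDx ~ 2.0
```

Both fits describe the same two regimes; the deviation approach reports the slope shift directly, while the level approach reports each regime's slope directly.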
Omitted Variables
An omitted variable is an explanatory regressor that belongs in the regression model (i.e.
the explanatory variable has a significant impact on the outcome variable) but which
does not appear in the regression model.
Example:
Suppose an outcome variable, Y, is determined by two explanatory variables, X and W.
This results in the true regression model:

Y_i = β0 + β1 X_i + β2 W_i + u_i

Suppose we hypothesize a different regression model that excludes W:

Y_i = β0 + β1 X_i + u_i

When we estimate the hypothesized model, OLS will assign some of the impact that
should have gone to W to the constant term and to X. This will result in the parameter
estimates being biased and inconsistent.
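A simulation sketch of this bias (all numbers invented): because W is built to be correlated with X, dropping W pushes part of W's effect onto X's estimated coefficient.

```python
# Omitted-variable bias demonstration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=n)
W = 0.8 * X + rng.normal(size=n)             # W is correlated with X
y = 1.0 + 2.0 * X - 3.0 * W + rng.normal(size=n)

full  = sm.OLS(y, sm.add_constant(np.column_stack([X, W]))).fit()
short = sm.OLS(y, sm.add_constant(X)).fit()  # omits W

print(full.params[1])    # ~  2.0: unbiased
print(short.params[1])   # ~ 2.0 + (-3.0)(0.8) = -0.4: badly biased
```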
Omitted Variables
Example:
Data Set #11 contains voter demographics and the percentage of voters (by voting
district) who claim to have voted for a candidate in the last election. Your goal is to
attempt to use the voter demographics to predict what percentage of the vote your
candidate will garner in other districts.
You hypothesize the following regression model:

Votes Garnered_i = β0 + β1(Income_i) + u_i

H0: β1 = 0
Ha: β1 ≠ 0
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.2355
R Square             0.0555
Adjusted R Square    0.0144
Standard Error       0.0797
Observations         25

              Coefficients   Standard Error   t Stat    P-value   Lower 95%    Upper 95%
Intercept        0.4717         0.1152         4.0947   0.0004     0.2334        0.7099
X Variable 1     2.33E-06       2.00E-06       1.1621   0.2571    -1.82E-06      6.47E-06

Looking at the marginal effect: every $1,000 increase in average income in a district implies a
projected (1,000)(0.000002) = 0.2% increase in garnered votes.
Omitted Variables
Suppose that garnered votes are not only a function of average income within a district,
but also the disparity of income across households. Unknown to you, the true regression
model is:
Votes Garnered_i = β0 + β1(Income_i) + β2(Income Disparity_i) + u_i

Your hypothesized model excludes Income Disparity; therefore, your model suffers from
the omitted variable problem. The results below are those you would have obtained had
you included Income Disparity in the model.
Looking at the marginal effect: every $1,000 increase in average income in a district implies a
projected (1,000)(0.000004) = 0.4% increase in garnered votes. This is twice the impact that
you estimated.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.5414
R Square             0.2931
Adjusted R Square    0.2289
Standard Error       0.0705
Observations         25

              Coefficients   Standard Error   t Stat    P-value   Lower 95%    Upper 95%
Intercept        0.4483         0.1022         4.3845   0.0002     0.2363        0.6604
X Variable 1     3.92E-06       1.87E-06       2.1008   0.0473     5.02E-08      7.79E-06
X Variable 2    -6.32E-06       2.32E-06      -2.7197   0.0125    -1.11E-05     -1.50E-06

By excluding Income Disparity from your model, you force OLS to attribute some of the
negative impact of Income Disparity to Income. This causes your estimate of the coefficient
on Income to be biased downward.
Omitted Variables
Implications of omitted variables:

1. Parameter estimates may be biased and inconsistent.
2. Standard deviations of parameter estimates may be biased and inconsistent.

The higher the correlation between the omitted variable and the other variables in the
model, the greater will be the bias and inconsistency. If the omitted variable is not
correlated with one or more of the included variables, then the estimates for those
variables will be unbiased and consistent.

Unlike the cases of non-stationarity and non-linearity, the R2 is a reliable estimator.
Therefore, you can compare the adjusted R2's in models with and without the
suspected variable to decide whether or not it is necessary to include it.
Omitted Variables
Detecting and correcting omitted variables:

1. If you have reason to believe that a given explanatory variable is excluded, include
the variable and test whether its coefficient is non-zero.
2. If the coefficient is non-zero, the variable should be included in the regression
model.

Warning: It is possible that, by random chance, a given explanatory variable will pass
the test for a non-zero coefficient when, in fact, the variable does not belong in the
equation. Therefore, you should first have a theoretically justifiable reason why the
variable should be included before considering inclusion.
Extraneous Variables
An extraneous variable is an explanatory regressor that does not belong in the regression
model but which does appear in the regression model.
Example:
Suppose an outcome variable, Y, is determined by one explanatory variable: X. This
results in the true regression model:

Y_i = β0 + β1 X_i + u_i

Suppose we hypothesize a different regression model that includes both X and another
variable, W:

Y_i = β0 + β1 X_i + β2 W_i + u_i

When we estimate the hypothesized model, OLS will pick up some (randomly occurring)
relationship between W and Y, and will attribute that relationship to W when, in fact, it
should be attributed to the error term, u. This will result in the parameter estimates being
inefficient.
Extraneous Variables
Example:
Applying the following regression model to the data in Data Set #11, we obtain the
results shown below.
Votes Garnered_i = β0 + β1(Income_i) + β2(Income Disparity_i) + u_i
              Coefficients   Standard Error   t Stat    P-value   Lower 95%    Upper 95%
Intercept        0.4483         0.1022         4.3845   0.0002     0.2363        0.6604
X Variable 1     3.92E-06       1.87E-06       2.1008   0.0473     5.02E-08      7.79E-06
X Variable 2    -6.32E-06       2.32E-06      -2.7197   0.0125    -1.11E-05     -1.50E-06
We can generate a third variable consisting of randomly selected numbers and include
this in the regression. Because this third variable does not impact the outcome variable,
the third variable is extraneous. The results of this regression are shown below.
Votes Garnered_i = β0 + β1(Income_i) + β2(Income Disparity_i) + β3(Random_i) + u_i
              Coefficients   Standard Error   t Stat    P-value   Lower 95%    Upper 95%
Intercept        0.4489         0.1051         4.2697   0.0003     0.2303        0.6675
X Variable 1     3.94E-06       1.94E-06       2.0301   0.0552    -9.61E-08      7.97E-06
X Variable 2    -6.34E-06       2.41E-06      -2.6352   0.0155    -1.13E-05     -1.34E-06
X Variable 3    -0.0026         0.0465        -0.0564   0.9556    -0.0993        0.0941

The presence of an extraneous variable increases the standard errors of the parameter
estimates.
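A simulation sketch of the same experiment (numbers invented): a pure-noise regressor leaves the estimates roughly unchanged but tends to inflate their standard errors.

```python
# Extraneous-variable demonstration: pure noise added as a regressor.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 25                                        # small sample, as in Data Set #11
X = rng.normal(size=n)
y = 0.5 + 0.3 * X + rng.normal(scale=0.5, size=n)

base  = sm.OLS(y, sm.add_constant(X)).fit()
noise = rng.normal(size=n)                    # extraneous regressor
extra = sm.OLS(y, sm.add_constant(np.column_stack([X, noise]))).fit()

print(base.bse[1], extra.bse[1])   # the SE on X typically rises when noise is added
```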
Extraneous Variables
Implications of extraneous variables:

1. Parameter estimates are unbiased and consistent.
2. Parameter estimates are inefficient.

Because the implications of extraneous variables are much less onerous than those of
omitted variables, when in doubt as to whether to include a given explanatory variable
in a model, it is usually wise to err on the side of including rather than excluding.
Extraneous Variables
Detecting and correcting extraneous variables:

1. If you have reason to believe that a given explanatory variable is extraneous, test
whether the coefficient attached to the variable is (statistically) zero.
2. If the coefficient is zero, the variable should be excluded from the regression
model.

Warning: It is possible that, by random chance, a given explanatory variable will pass
the test for a zero coefficient when, in fact, the variable does belong in the equation.
Therefore, if you have a theoretically justifiable reason for why the variable should be
included in the model, you may want to leave the variable in the model even if its
coefficient is zero. If the variable truly does influence the outcome variable, the
coefficient may come up as non-zero with different sample data.
Multicollinearity
Multicollinearity occurs when two or more of the explanatory variables are correlated.
Example:
Data Set #12 contains clinical trial data for a new blood pressure drug. Using the data,
estimate the following regression model.
Blood Pressure_i = β0 + β1(Dosage_i) + β2(Reported Stress_i) + β3(Daily Caffeine Intake_i) + u_i
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.6751
R Square             0.4558
Adjusted R Square    0.4203
Standard Error       17.2624
Observations         50

              Coefficients   Standard Error   t Stat     P-value     Lower 95%    Upper 95%
Intercept       111.7046        10.9194       10.2299    1.97E-13     89.7249      133.6842
X Variable 1     -0.1054         0.0429       -2.4560    0.0179       -0.1917       -0.0190
X Variable 2      3.5497         1.9508        1.8196    0.0753       -0.3771        7.4764
X Variable 3      1.5572         1.7806        0.8746    0.3864       -2.0269        5.1413

Dosage of the drug appears to have a strongly significant impact on blood pressure (p = 0.02).
Stress appears to have a marginally significant effect on blood pressure (p = 0.08).
Caffeine intake appears not to affect blood pressure (p = 0.39).
Multicollinearity
Example:
Now estimate the model with Daily Caffeine Intake removed.
Blood Pressure_i = β0 + β1(Dosage_i) + β2(Reported Stress_i) + u_i
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.6684
R Square             0.4467
Adjusted R Square    0.4232
Standard Error       17.2192
Observations         50

              Coefficients   Standard Error   t Stat     P-value     Lower 95%    Upper 95%
Intercept       115.0315        10.2097       11.2669    5.94E-15     94.4923      135.5708
X Variable 1     -0.1057         0.0428       -2.4705    0.0172       -0.1918       -0.0196
X Variable 2      5.0467         0.9333        5.4075    2.10E-06      3.1692        6.9243

Dosage of the drug again appears to have a strongly significant impact on blood pressure (p = 0.02).
Stress now appears to have a remarkably significant effect on blood pressure (p = 0.00).
The results for the marginal impact of stress on blood pressure changed dramatically when we
dropped Caffeine Intake from the model.
Multicollinearity
Example:
Now estimate the model with Daily Caffeine Intake included and Reported Stress
removed.
Blood Pressure_i = β0 + β1(Dosage_i) + β3(Daily Caffeine Intake_i) + u_i
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.6454
R Square             0.4166
Adjusted R Square    0.3918
Standard Error       17.6817
Observations         50

              Coefficients   Standard Error   t Stat     P-value     Lower 95%    Upper 95%
Intercept       110.3315        11.1579        9.8882    4.60E-13     87.8847      132.7783
X Variable 1     -0.1080         0.0439       -2.4584    0.0177       -0.1964       -0.0196
X Variable 2      4.4001         0.8747        5.0303    7.59E-06      2.6404        6.1599

Dosage of the drug again appears to have a strongly significant impact on blood pressure (p = 0.02).
Caffeine Intake now appears to have a remarkably significant effect on blood pressure (p = 0.00).
The results for the marginal impact of caffeine on blood pressure changed dramatically when we
dropped Reported Stress from the model.
Multicollinearity
Example:
The results you are seeing are typical of multicollinearity. It is likely that Caffeine Intake
and Reported Stress are correlated. Because they are correlated, they (at least in part)
reflect the same information. When you include only one (either one) of the regressors in
the model, you get a significant marginal effect. But when you include both, OLS
attempts to allocate an amount of explanatory power that is worthy of only one regressor
across two regressors. As a result, neither of them appears particularly significant.
All regressors included:
       Coefficients   Standard Error   P-value
β0       111.705         10.919         0.000
β1        -0.105          0.043         0.018
β2         3.550          1.951         0.075
β3         1.557          1.781         0.386

Reported Stress included, Caffeine Intake excluded:
       Coefficients   Standard Error   P-value
β0       115.032         10.210         0.000
β1        -0.106          0.043         0.017
β2         5.047          0.933         0.000

Reported Stress excluded, Caffeine Intake included:
       Coefficients   Standard Error   P-value
β0       110.332         11.158         0.000
β1        -0.108          0.044         0.018
β3         4.400          0.875         0.000
Multicollinearity
Implications of multicollinearity:

1. Parameter estimates are unbiased and consistent.
2. Parameter estimates are inefficient.

The higher the correlation between the multicollinear regressors, the greater the
inefficiency (i.e. the greater the standard errors associated with the parameter
estimates).

In the extreme case of perfect multicollinearity (one explanatory regressor is an exact
linear function of another), the regression will fail. Either the software will return an
error or the results will show an R2 of one and standard errors of zero or infinity.
A small demonstration appears below.
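A minimal numerical sketch of why the regression fails: with an exactly collinear regressor, X'X is rank deficient, so the OLS normal equations cannot be solved.

```python
# Perfect multicollinearity: one column is an exact linear function of another.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 2.0 * x1 + 1.0                           # exact linear function of x1
X = np.column_stack([np.ones(50), x1, x2])    # constant, x1, x2

print(np.linalg.matrix_rank(X))               # 2, not 3: rank deficient
print(np.linalg.cond(X.T @ X))                # astronomically large condition number
# OLS needs (X'X)^(-1); with a rank-deficient X the inverse does not exist, so
# software either reports an error or returns degenerate results.
```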
Multicollinearity
Detecting multicollinearity:

1. To detect multicollinearity, calculate the Variance Inflation Factor (VIF) for each
explanatory variable.
2. A VIF greater than 4 indicates detectable multicollinearity. A VIF greater than 10
indicates severe multicollinearity.
Correcting multicollinearity:
The correction for multicollinearity often introduces worse anomalies than the
multicollinearity. The correction is to drop from the model the explanatory variable with
the greatest VIF. However, if the offending explanatory variable does affect the outcome
variable, then by dropping the variable you eliminate multicollinearity but create an
omitted variable.
As the implications of the omitted variable anomaly are more onerous than those of
multicollinearity, it is usually desirable to just live with the multicollinearity.
An exception is in the case of severe multicollinearity (a VIF greater than 10). In this
case, the bias and inconsistency caused by omitting the variable may be of less
consequence than the inefficiency caused by the multicollinearity.
Multicollinearity
Variance Inflation Factor:
To compute the VIF for explanatory regressor j, regress explanatory regressor j on a
constant term and all of the other explanatory regressors, and let R²_j denote the R² from
that auxiliary regression. Then:

VIF_j = 1 / (1 − R²_j)
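A sketch of the auxiliary-regression computation on simulated data (numbers invented). As an aside, statsmodels also provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`, which expects the design matrix to include its constant column.

```python
# VIF via the auxiliary regression defined above.
import numpy as np
import statsmodels.api as sm

def vif(exog, j):
    """Regress column j on a constant and the other columns; return 1/(1 - R^2)."""
    others = np.delete(exog, j, axis=1)
    r2 = sm.OLS(exog[:, j], sm.add_constant(others)).fit().rsquared
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=100)    # deliberately correlated with x1
exog = np.column_stack([x1, x2])

print([round(vif(exog, j), 2) for j in range(exog.shape[1])])   # both near 10
```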
Multicollinearity
Example:
Calculate the VIF's for Dosage, Reported Stress and Caffeine Intake.

Dosage_i = β0 + β2(Reported Stress_i) + β3(Daily Caffeine Intake_i) + u_i
VIF_Dosage = 1 / (1 − 0.0076) = 1.01

Reported Stress_i = β0 + β1(Dosage_i) + β3(Daily Caffeine Intake_i) + u_i
VIF_Reported Stress = 1 / (1 − 0.7717) = 4.38

Daily Caffeine Intake_i = β0 + β1(Dosage_i) + β2(Reported Stress_i) + u_i
VIF_Daily Caffeine Intake = 1 / (1 − 0.7715) = 4.38
Multicollinearity
The VIF's indicate that there is detectable multicollinearity for Reported Stress and Daily
Caffeine Intake. However, because the VIF's are well less than 10, we would not drop
either variable from the model.

VIF_Dosage = 1.01    VIF_Reported Stress = 4.38    VIF_Daily Caffeine Intake = 4.38
Summary of Statistical Anomalies
Anomaly                 Properties of OLS Parameter Estimates
Non-stationarity        Biased, inconsistent, inefficient
Non-linearity           Biased, inconsistent, inefficient
Regime change           Biased, (possibly) inconsistent, inefficient
Omitted variables       Biased, inconsistent, inefficient
Extraneous variables    Unbiased, consistent, inefficient
Multicollinearity       Unbiased, consistent, inefficient