Marginal Probability

Probability of a single event occurring.

Event A = "price of IBM stock rises by at least $1 in one day"
Pr(A) = 0.04 = 4%

Joint Probability

Probability of all of multiple events occurring.

Event A = "price of IBM stock rises by at least $1 in one day"
Event B = "price of GE stock rises by at least $1 in one day"
Pr(A) = 0.04 = 4%
Pr(B) = 0.03 = 3%

"Probability of both IBM and GE rising by at least $1 in one day" = Pr(A and B) = 0.02 = 2%

Two events are independent if the occurrence of one is not contingent on the occurrence of the other.

A = "Price of IBM rises by at least $1 in one day."
B = "Price of IBM rises by at least $1 in one week."

These events are not independent because the occurrence of A raises the probability of B. For independent events:

Pr(A and B) = Pr(A) Pr(B)

Disjoint Probability

Probability of any of multiple events occurring.

Event A = "price of IBM stock rises by at least $1 in one day"
Event B = "price of GE stock rises by at least $1 in one day"
Pr(A) = 0.04, Pr(B) = 0.03, Pr(A and B) = 0.02

Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)

"Probability of either IBM or GE rising by at least $1 in one day" = Pr(A or B) = 0.04 + 0.03 - 0.02 = 0.05

Venn Diagram

[Venn diagram: two overlapping circles A and B. Region 1 lies outside both circles; region 2 is A only; region 3 is the overlap; region 4 is B only.]

Area(s)  Meaning
1        A' and B'
2        A and B'
3        A and B
4        A' and B
23       A
34       B
124      A' or B'
123      A or B'
234      A or B
134      A' or B
1234     A or A'
1234     B or B'
Empty    A and A'
Empty    B and B'

A = "Price of IBM stock rises by at least $1 in one day."
B = "Price of GE stock rises by at least $1 in one day."
Pr(A and B) = 0.02, Pr(A) = 0.04, Pr(B) = 0.03

[Venn diagram: Pr(A and B') = 0.02, Pr(A and B) = 0.02, Pr(A' and B) = 0.01; the remaining area is 1 - 0.02 - 0.02 - 0.01 = 0.95.]

What is the probability of the price of GE rising by at least $1 and the price of IBM not rising by at least $1?
Pr(B and A') = Pr(A' and B) = 0.01

What is the probability of neither the price of IBM rising by at least $1 nor the price of GE rising by at least $1?
Pr(A' and B') = 0.95

Conditional Probability

Probability of an event occurring given that another event has already occurred.

Event A = "price of IBM stock rises by at least $1 in one day"
Event B = "price of IBM stock rises by at least $1 in the same week"

Pr(B|A) = Pr(A and B) / Pr(A)
Pr(A|B) = Pr(A and B) / Pr(B)

Pr(A) = 0.04, Pr(B) = 0.02, Pr(A and B) = 0.01
Pr(B|A) = 0.01 / 0.04 = 0.25
Pr(A|B) = 0.01 / 0.02 = 0.50

[Venn diagram: Pr(A and B') = 0.03, Pr(A and B) = 0.01, Pr(A' and B) = 0.01.]
Pr(A|B) = 0.01 / (0.01 + 0.01); Pr(B|A) = 0.01 / (0.01 + 0.03)
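The rules above can be checked mechanically. Here is a minimal Python sketch (not part of the original slides, which use an Excel worksheet) applying the joint, disjoint, and conditional rules to the IBM/GE numbers:

```python
# Marginal, joint, disjoint, and conditional probabilities for the
# IBM (A) / GE (B) example above.
p_a = 0.04          # Pr(A): IBM rises by at least $1 in one day
p_b = 0.03          # Pr(B): GE rises by at least $1 in one day
p_a_and_b = 0.02    # Pr(A and B), as given on the slide

p_a_or_b = p_a + p_b - p_a_and_b   # disjoint ("or") rule -> 0.05
p_b_given_a = p_a_and_b / p_a      # Pr(B|A) -> 0.50
p_a_given_b = p_a_and_b / p_b      # Pr(A|B) -> 0.667

# A and B are not independent here: Pr(A)Pr(B) = 0.0012 != 0.02.
print(p_a_or_b, p_b_given_a, p_a_given_b, p_a * p_b == p_a_and_b)
```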
Conditional Probability

The table shows the number of NYC police officers promoted and not promoted. Question: did the force exhibit gender discrimination in promoting?

          Promoted   Not Promoted
Male      288        672
Female    36         204

Define the events. There are two events:
1. An officer can be male.
2. An officer can be promoted.

Being female is not a separate event; it is "not being a male."

Events:
M = being a male; M' = being a female
P = being promoted; P' = not being promoted

Divide all counts by 1,200 (the total number of officers) to find the probability associated with each area.

[Venn diagram: Pr(M and P') = 672/1,200 = 56%, Pr(M and P) = 288/1,200 = 24%, Pr(M' and P) = 36/1,200 = 3%, Pr(M' and P') = 204/1,200 = 17%.]

What is the probability of being male and being promoted?
Pr(M and P) = 0.24

What is the probability of being female and being promoted?
Pr(M' and P) = 0.03

Males appear to be promoted at 8 times the frequency of females.

But perhaps Pr(M and P) is greater than Pr(M' and P) simply because there are more males on the force. The comparison we want to make is Pr(P|M) vs. Pr(P|M').

Pr(P|M) = Pr(P and M) / Pr(M) = 0.24 / (0.56 + 0.24) = 0.30
Pr(P|M') = Pr(P and M') / Pr(M') = 0.03 / (0.03 + 0.17) = 0.15

Males are promoted at 2 times the frequency of females.

Mutually Exclusive and Jointly Exhaustive Events

A set of events is mutually exclusive if no more than one of the events can occur.
A = IBM stock rises by at least $1; B = IBM stock falls by at least $1.
A and B are mutually exclusive but not jointly exhaustive.

A set of events is jointly exhaustive if at least one of the events must occur.
A = IBM stock rises by at least $1; B = IBM stock rises by at least $2; C = IBM stock rises by less than $1 (or falls).
A, B, and C are jointly exhaustive but not mutually exclusive.

A set of events is mutually exclusive and jointly exhaustive if exactly one of the events must occur.
A = IBM stock rises; B = IBM stock falls; C = IBM stock does not change.
A, B, and C are mutually exclusive and jointly exhaustive.

Bayes' Theorem

Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)

For N mutually exclusive and jointly exhaustive events A1 through AN:

Pr(B) = Pr(B|A1) Pr(A1) + Pr(B|A2) Pr(A2) + ... + Pr(B|AN) Pr(AN)
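A quick numeric check of the promotion example against Bayes' theorem as just stated; this is a sketch, not part of the original worksheet materials:

```python
# Promotion counts from the table above.
male_promoted, male_not = 288, 672
female_promoted, female_not = 36, 204
total = male_promoted + male_not + female_promoted + female_not  # 1,200

p_m = (male_promoted + male_not) / total           # Pr(M) = 0.80
p_p = (male_promoted + female_promoted) / total    # Pr(P) = 0.27
p_p_and_m = male_promoted / total                  # Pr(P and M) = 0.24

p_p_given_m = p_p_and_m / p_m                          # 0.30
p_p_given_f = (female_promoted / total) / (1 - p_m)    # 0.15

# Bayes' theorem: Pr(P|M) = Pr(M|P) Pr(P) / Pr(M)
p_m_given_p = p_p_and_m / p_p
assert abs(p_p_given_m - p_m_given_p * p_p / p_m) < 1e-12
print(p_p_given_m, p_p_given_f)
```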
Bayes' Theorem

Example: Your firm purchases steel bolts from two suppliers, #1 and #2. 65% of the units come from supplier #1; the remaining 35% come from supplier #2. Inspecting the bolts for quality is costly, so your firm only inspects periodically. Historical data indicate that 2% of supplier #1's units fail and 5% of supplier #2's units fail. During production, a bolt fails, causing a production line shutdown. What is the probability that the defective bolt came from supplier #1?

The naïve answer is that there is a 65% chance that the bolt came from supplier #1, since 65% of the bolts come from supplier #1. The naïve answer ignores the fact that the bolt failed. We want to know not Pr(bolt came from #1), but Pr(bolt came from #1 | bolt failed).

Define the following events:
S1 = bolt came from supplier #1; S2 = bolt came from supplier #2; F = bolt fails

Solution: We know Pr(F|S1) = 2%, Pr(S1) = 65%, Pr(F|S2) = 5%, and Pr(S2) = 35%. We want to know Pr(S1|F).

Bayes' theorem: Pr(S1|F) = Pr(F|S1) Pr(S1) / Pr(F)

Because S1 and S2 are mutually exclusive and jointly exhaustive:

Pr(F) = Pr(F|S1) Pr(S1) + Pr(F|S2) Pr(S2) = (2%)(65%) + (5%)(35%) = 3.1%

Therefore: Pr(S1|F) = (2%)(65%) / (3.1%) = 42%
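The same computation in Python (a sketch; the exact total probability is 3.05% before the slide's rounding to 3.1%):

```python
# Bayes' theorem applied to the defective-bolt problem above.
p_s1, p_s2 = 0.65, 0.35                    # Pr(S1), Pr(S2)
p_f_given_s1, p_f_given_s2 = 0.02, 0.05    # historical failure rates

# Law of total probability over the mutually exclusive,
# jointly exhaustive events S1 and S2:
p_f = p_f_given_s1 * p_s1 + p_f_given_s2 * p_s2   # 0.0305, about 3.1%

p_s1_given_f = p_f_given_s1 * p_s1 / p_f          # about 0.426, about 42%
print(round(p_f, 4), round(p_s1_given_f, 3))
```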
Probability Measures: Summary

Pr(A and B) = Pr(A) Pr(B), where A and B are independent events
Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)
Pr(A|B) = Pr(A and B) / Pr(B) = Pr(B|A) Pr(A) / Pr(B)
Pr(A) = Pr(A|B1) Pr(B1) + Pr(A|B2) Pr(B2) + ... + Pr(A|Bn) Pr(Bn), where B1 through Bn are mutually exclusive and jointly exhaustive

Probability Measures: Where We're Going

Random events and their probabilities:
Probabilities given: simple probability, joint probability, disjoint probability, conditional probability.
Probabilities not given, random event is discrete: binomial, hypergeometric, Poisson, negative binomial.
Probabilities not given, random event is continuous: exponential, normal, t, log-normal, chi-square, F.

Probability Distributions

So far, we have seen examples in which the probabilities of events are known (e.g., the probability of a bolt failing, the probability of being male and promoted). The behavior of a random event (or "random variable") is summarized by the variable's probability distribution. A probability distribution is a set of probabilities, one for each possible event. A random variable is a mechanism that selects one event out of all possible events.

Example: a die roll is a random variable. There are 6 possible events, and the probability of each event occurring is the same (1/6) for all of them. We call this distribution a uniform distribution.

Let X be the random variable defined as the roll of a die. There are six possible events: X = {1, 2, 3, 4, 5, 6}.

Pr(X = 1) = Pr(X = 2) = ... = Pr(X = 6) = 1/6 = 16.7%

The probability distribution function for X gives the probability of each event occurring:

Pr(X = k) = 0.167

The cumulative distribution function for X gives the probability of any one of a set of events occurring:

Pr(X <= k) = 0.167k

Discrete vs. Continuous Distributions

In discrete distributions, the random variable takes on specific values. For example: if X can take on the values {1, 2, 3, 4, 5, ...}, then X is a discrete random variable; the number of profitable quarters is a discrete random variable. If X can take on any value between 0 and 10, then X is a continuous random variable; a P/E ratio is a continuous random variable.

Discrete Distributions: Terminology

Trial: an opportunity for an event to occur or not occur.
Success: the occurrence of an event.

Binomial Distribution

The binomial distribution gives the probability of an event occurring multiple times.

N = number of trials; x = number of successes; p = probability of a single success

$\Pr(x \text{ successes out of } N \text{ trials}) = \binom{N}{x} p^x (1-p)^{N-x}$, where $\binom{N}{x} = \frac{N!}{x!\,(N-x)!}$

mean = Np; variance = Np(1 - p)

Example: A CD manufacturer produces CD's in batches of 10,000. On average, 2% of the CD's are defective. A retailer purchases CD's in batches of 1,000 and will return any shipment if 3 or more CD's are found to be defective. For each batch received, the retailer inspects thirty CD's. What is the probability that the retailer will return the batch?

N = 30 trials, x = 3 successes, p = 0.02:

$\Pr(3 \text{ out of } 30) = \binom{30}{3} 0.02^3 (1-0.02)^{30-3} \approx 0.019 = 1.9\%$

Error: the formula gives us the probability of exactly 3 successes out of 30 trials. But the retailer will return the shipment if it finds at least 3 defective CD's. What we want is Pr(3 out of 30) + Pr(4 out of 30) + ... + Pr(30 out of 30).

N = 30, x = 4: $\binom{30}{4} 0.02^4 (0.98)^{26} \approx 0.003 = 0.3\%$
N = 30, x = 5: $\binom{30}{5} 0.02^5 (0.98)^{25} \approx 0.0003 = 0.03\%$
Etc. out to x = 30 successes.

Alternatively, because Pr(0 or more successes) = 1, we have an easier path to the answer:

Pr(3 or more successes) = 1 - Pr(2 or fewer successes)

x = 0: $\binom{30}{0} 0.02^0 (0.98)^{30} \approx 0.545$
x = 1: $\binom{30}{1} 0.02^1 (0.98)^{29} \approx 0.334$
x = 2: $\binom{30}{2} 0.02^2 (0.98)^{28} \approx 0.099$

Pr(2 or fewer successes) = 0.545 + 0.334 + 0.099 = 0.978
Pr(3 or more successes) = 1 - 0.978 = 0.022 = 2.2%

Using the Probabilities worksheet:
1. Find the section of the worksheet titled "Binomial Distribution."
2. Enter the probability of a single success.
3. Enter the number of trials.
4. Enter the number of successes.
5. For "Cumulative?" enter FALSE to obtain Pr(x successes out of N trials); enter TRUE to obtain Pr(<= x successes out of N trials).

Example:
Binomial Distribution
  Prob of a Single Success   0.02
  Number of Trials           30
  Number of Successes        2
  Cumulative?                TRUE
  P(# of successes)          0.978
  1 - P(# of successes)      0.022

TRUE yields Pr(x <= 2) instead of Pr(x = 2); 1 - Pr(x <= 2) = Pr(x >= 3).
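As a cross-check of the worksheet numbers, here is a sketch using scipy (an assumption of this writeup; the deck itself works in Excel):

```python
# The CD example: Pr(exactly 3 defects) and Pr(3 or more defects).
from scipy.stats import binom

n, p = 30, 0.02
print(binom.pmf(3, n, p))        # ~0.0188 -> Pr(exactly 3 defects)
print(1 - binom.cdf(2, n, p))    # ~0.022  -> Pr(3 or more defects)
print(binom.sf(2, n, p))         # the same quantity via the survival function
```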
Binomial Distribution

Application: Management proposes tightening quality control so as to reduce the defect rate from 2% to 1%. QA estimates that the resources required to implement the additional quality controls will cost the firm an additional $70,000 per year. Suppose the firm ships 10,000 batches of CD's annually, and it costs the firm $1,000 every time a batch is returned. Is it worth it for the firm to implement the additional quality controls?

Low QA (defect rate = 2%):
Pr(batch will be returned) = Pr(3 or more defects out of 30) = 2.2%
Expected annual cost of product returns = (2.2%)($1,000 per batch)(10,000 batches shipped annually) = $220,000

High QA (defect rate = 1%):
Pr(batch will be returned) = Pr(3 or more defects out of 30) = 0.3%
Expected annual cost of product returns = (0.3%)($1,000 per batch)(10,000 batches) = $30,000

Going with improved QA results in cost savings of $190,000 at a cost of $70,000, for a net gain of $120,000.

Application: Ford suspects that the tread on Explorer tires will separate from the tire, causing a fatal accident. Tests indicate that this will happen on one set of (four) tires out of 5 million. As of 2000, Ford had sold 875,000 Explorers. Ford estimated the cost of a general recall to be $30 million, and that every accident involving separated treads would cost Ford $3 million to settle. Should Ford recall the tires?

What we know:
Success = tread separation
Pr(a single success) = 1 / 5 million = 0.0000002
Number of trials = 875,000

Employing the pdf for the binomial distribution, we have:
Pr(0 successes) = 83.9%
Pr(1 success) = 14.7%
Pr(2 successes) = 1.3%
Pr(3 successes) = 0.1%

Expectation: an expectation is the sum of the probabilities of all possible events multiplied by the outcome of each event. Suppose there are three mutually exclusive and jointly exhaustive events A, B, and C. The costs to a firm of events A, B, and C occurring are, respectively, TCA, TCB, and TCC, and the probabilities of the events are pA, pB, and pC. The expected cost to the firm is:

E(cost) = (TCA)(pA) + (TCB)(pB) + (TCC)(pC)

Should Ford issue a recall?

Issue recall: cost = $30 million.
Do not issue recall:
E(cost) = Pr(0 incidents)(cost of 0 incidents) + Pr(1 incident)(cost of 1 incident) + ...
= (83.9%)($0 m) + (14.7%)($3 m) + (1.3%)($6 m) + (0.1%)($9 m) ≈ $528,000

Since the expected cost of not recalling ($528,000) is far below the $30 million cost of a recall, the expected-cost criterion says not to recall.
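The expected-cost sum can be computed directly from the binomial pmf. A sketch (the exact expectation is N x p x $3 million = $525,000; the slide's $528,000 reflects its rounded probabilities):

```python
# Expected settlement cost for the Ford example, not issuing a recall.
from scipy.stats import binom

p, n = 1 / 5_000_000, 875_000
cost_per_accident = 3_000_000

# Sum Pr(k accidents) * (cost of k accidents) over the plausible range of k.
expected_cost = sum(binom.pmf(k, n, p) * k * cost_per_accident
                    for k in range(10))
print(expected_cost)   # ~ $525,000, vs. the ~$30,000,000 recall
```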
Hypergeometric Distribution

The hypergeometric distribution gives the probability of an event occurring multiple times when the number of possible successes is fixed.

N = number of possible trials; n = number of actual trials; X = number of possible successes; x = number of actual successes

$\Pr(x \text{ successes out of } n \text{ trials}) = \frac{\binom{X}{x}\binom{N-X}{n-x}}{\binom{N}{n}}$, where $\binom{N}{x} = \frac{N!}{x!\,(N-x)!}$

Example: A CD manufacturer ships a batch of 1,000 CD's to a retailer. The manufacturer knows that 20 of the CD's are defective. The retailer will return any shipment if 3 or more CD's are found to be defective. For each batch received, the retailer inspects thirty CD's. What is the probability that the retailer will return the batch?

N = 1,000 possible trials; n = 30 actual trials; X = 20 possible successes; x = 3 actual successes

$\Pr(3 \text{ successes out of } 30 \text{ trials}) = \frac{\binom{20}{3}\binom{980}{27}}{\binom{1000}{30}} \approx 0.017$

Error: the formula gives us the probability of exactly 3 successes. The retailer will return the shipment if there are 3 or more defects. Therefore, we want Pr(return shipment) = Pr(3 defects) + Pr(4 defects) + ... + Pr(20 defects). Note: there are a maximum of 20 defects.

x = 0: $\frac{\binom{20}{0}\binom{980}{30}}{\binom{1000}{30}} \approx 0.541$
x = 1: $\frac{\binom{20}{1}\binom{980}{29}}{\binom{1000}{30}} \approx 0.341$
x = 2: $\frac{\binom{20}{2}\binom{980}{28}}{\binom{1000}{30}} \approx 0.099$

Pr(return shipment) = 1 - (0.541 + 0.341 + 0.099) = 0.019 = 1.9%

Using the Probabilities worksheet:
1. Find the section of the worksheet titled "Hypergeometric Distribution."
2. Enter the number of possible trials.
3. Enter the number of possible successes.
4. Enter the number of actual trials.
5. Enter the number of actual successes.
Note: Excel does not offer the option of calculating the cumulative distribution function. You must do this manually.

Example:
Hypergeometric Distribution
  Number of Possible Trials        1,000
  Number of Possible Successes     20
  Number of Actual Trials          30
  Number of Actual Successes       3
  P(# of successes in sample)      0.017
  1 - P(# of successes in sample)  0.983

The worksheet returns Pr(x = 3); 1 - Pr(x = 3) = Pr(x != 3), not the cumulative probability.

If we erroneously use the binomial distribution, what is our estimate of the probability that the retailer will return the batch?

Results using the hypergeometric distribution (possible trials = 1,000; actual trials = 30; possible successes = 20; actual successes = 0, 1, 2):
Pr(return shipment) = 1 - (0.541 + 0.341 + 0.099) = 0.019 = 1.9%

Results using the binomial distribution (trials = 30; successes = 0, 1, 2; probability of a single success = 20 / 1,000 = 0.02):
Pr(return shipment) = 2.2%

Using the incorrect distribution overstates the probability of return by only 0.3 percentage points (2.2% vs. 1.9%). Who cares? Suppose each return costs us $1,000 and we ship 10,000 batches per year.

Estimated cost of returns using the hypergeometric distribution: ($1,000)(10,000)(1.9%) = $190,000
Estimated cost of returns using the binomial distribution: ($1,000)(10,000)(2.2%) = $220,000

Using the incorrect distribution resulted in a $30,000 overestimation of costs.
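Both estimates in one sketch (note scipy's hypergeom takes the population size, the number of success states in the population, and the number of draws, in that order):

```python
# Hypergeometric vs. binomial for the CD example.
from scipy.stats import hypergeom, binom

pop, defects, draws = 1000, 20, 30
print(hypergeom.sf(2, pop, defects, draws))   # Pr(3 or more defects) ~ 0.019
print(binom.sf(2, draws, defects / pop))      # binomial approximation ~ 0.022
```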
Hypergeometric Distribution

How does the hypergeometric distribution differ from the binomial distribution? With the binomial distribution, the probability of a success does not change as trials are realized. With the hypergeometric distribution, the probabilities of subsequent successes change as trials are realized.

Binomial example: Suppose the probability of a given CD being defective is 50%. You have a shipment of 2 CD's. You inspect one of the CD's: there is a 50% chance that it is defective. You inspect the other CD: there is a 50% chance that it is defective. On average, you expect 1 defective CD. However, it is possible that there are no defective CD's, and it is also possible that both CD's are defective. Because the probability of a defect is constant, this process is binomial.

Hypergeometric example: Suppose there is one defective CD in a shipment of two CD's. You inspect one of the CD's: there is a 50% chance that it is defective. You inspect the second CD: even without inspecting, you know for certain whether the second CD is defective. Because you know that one of the CD's is defective, if the first one is not defective, then the second one must be defective; if the first one is defective, then the second one cannot be. Because the probability of the second CD being defective depends on whether or not the first CD was defective, the process is hypergeometric.
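A small simulation of the two shipment examples above (a sketch): independent draws behave binomially, while draws from a fixed pool of successes behave hypergeometrically.

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 100_000

# Binomial: each of 2 CDs is independently defective with p = 0.5,
# so 0, 1, or 2 defects are all possible.
counts = np.bincount(rng.binomial(2, 0.5, size=trials), minlength=3) / trials
print(counts)   # ~[0.25, 0.50, 0.25]

# Hypergeometric: exactly 1 defective CD in a pool of 2; inspecting
# both CDs always finds exactly one defect, since the second draw is
# fully determined by the first.
print(rng.hypergeometric(ngood=1, nbad=1, nsample=2, size=5))   # always 1
```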
Hypergeometric Distribution

Example: Andrew Fastow, former CFO of Enron, was tried for securities fraud. As is usual in these cases, if the prosecution requests documents, then the defense is obligated to surrender those documents, even if the documents contain information that is damaging to the defense. One tactic is for the defense to submit the requested documents along with many other documents (called "decoys") that are not damaging to the defense. The point is to bury the prosecution under a blizzard of paperwork so that it becomes difficult for the prosecution to find the few incriminating documents among the many decoys.

Suppose that the prosecutor requests all documents related to Enron's financial status. Fastow's lawyers know that there are 10 incriminating documents among the set requested. Fastow's lawyers also know that the prosecution will be able to examine only 50 documents between now and the trial date. If the prosecution finds no incriminating documents, it is likely that Fastow will be found not guilty. Assuming that each document requires the same amount of time to examine, and assuming that the prosecution will randomly select 50 documents out of the total for examination, how many documents (decoys plus the 10 incriminating documents) should Fastow's lawyers submit so that the probability of the prosecution finding no incriminating documents is 90%?

Define: success = an incriminating document; N = unknown; n = 50; X = 10; x = 0.

Solving for the N that makes Pr(0 successes out of 50 trials) = 0.900 gives N = 4,775.

Hypergeometric Distribution (worksheet)
  Number of Possible Trials        4,775
  Number of Possible Successes     10
  Number of Actual Trials          50
  Number of Actual Successes       0
  P(# of successes in sample)      0.900
  1 - P(# of successes in sample)  0.100

Poisson Distribution

The Poisson distribution gives the probability of an event occurring multiple times within a given time interval.

δ = average number of successes per unit time; x = number of successes; e ≈ 2.71828

$\Pr(x \text{ successes per unit time}) = \frac{e^{-\delta}\,\delta^x}{x!}$

Example: Over the course of a typical eight-hour day, 100 customers come into a store. Each customer remains in the store for 10 minutes (on average). One salesperson can handle no more than three customers in 10 minutes. If it is likely that more than three customers will show up in a single 10-minute interval, then the store will have to hire another salesperson. What is the probability that more than 3 customers will arrive in a single 10-minute interval?

Time interval = 10 minutes. There are 48 ten-minute intervals during an 8-hour work day, so 100 customers per day / 48 intervals ≈ 2.08 customers per interval.

δ = 2.08 successes per interval (on average); x = 4, 5, 6, ... successes

Pr(x >= 4) = 1 - Pr(x = 0) - Pr(x = 1) - Pr(x = 2) - Pr(x = 3)

$\Pr(0) = \frac{e^{-2.08}\,2.08^0}{0!} \approx 0.125$
$\Pr(1) = \frac{e^{-2.08}\,2.08^1}{1!} \approx 0.260$
$\Pr(2) = \frac{e^{-2.08}\,2.08^2}{2!} \approx 0.270$
$\Pr(3) = \frac{e^{-2.08}\,2.08^3}{3!} \approx 0.187$

Pr(x >= 4) = 1 - (0.125 + 0.260 + 0.270 + 0.187) = 0.158 = 15.8%

Using the Probabilities worksheet:
1. Find the section of the worksheet titled "Poisson Distribution."
2. Enter the average number of successes per time interval.
3. Enter the number of successes per time interval.
4. For "Cumulative?" enter FALSE to obtain Pr(x successes); enter TRUE to obtain Pr(<= x successes).

Example:
Poisson Distribution
  E(Successes / time interval)             2.08
  Successes / time interval                3
  Cumulative?                              TRUE
  P(# successes in a given interval)       0.842
  1 - P(# successes in a given interval)   0.158

TRUE yields Pr(x <= 3) instead of Pr(x = 3); 1 - Pr(x <= 3) = Pr(x >= 4).

Suppose you want to hire an additional salesperson on a part-time basis. On average, for how many hours per week will you need this person? (Assume a 40-hour work week.)

There is a 15.8% probability that, in any given 10-minute interval, more than 3 customers will arrive. During these intervals, you will need another salesperson. In one work day, there are 48 ten-minute intervals; in a 5-day work week, there are (48)(5) = 240 ten-minute intervals. On average, you need a part-time worker for 15.8% of these, or (0.158)(240) = 37.92 intervals. 37.92 ten-minute intervals ≈ 379 minutes ≈ 6.3 hours, or about 6 hours 20 minutes.

Note: an easier way to arrive at the same answer is (40 hours)(0.158) = 6.3 hours.
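The staffing numbers in one sketch:

```python
# Poisson check for the staffing example.
from scipy.stats import poisson

rate = 100 / 48                      # ~2.08 customers per 10-minute interval
p_overflow = poisson.sf(3, rate)     # Pr(4 or more arrivals) ~ 0.158
print(p_overflow, 40 * p_overflow)   # ~15.8% of intervals; ~6.3 hours per 40-hour week
```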
Negative Binomial Distribution

The negative binomial distribution gives the probability of the xth occurrence of an event happening on the Nth trial.

N = number of trials; x = number of successes; p = probability of a single success occurring

$\Pr(x\text{th success occurring on the } N\text{th trial}) = \binom{N-1}{x-1} p^x (1-p)^{N-x}$, where $\binom{N-1}{x-1} = \frac{(N-1)!}{(x-1)!\,(N-x)!}$

Discrete Distributions: Summary

Distribution     Pertinent information
Binomial         Probability of a single success; number of trials; number of successes
Hypergeometric   Number of possible trials; number of actual trials; number of possible successes; number of actual successes
Poisson          Average successes per time interval; number of successes per time interval

Continuous Distributions

While the discrete distributions are useful for describing phenomena in which the random variable takes on discrete (e.g., integer) values, many random variables are continuous and so are not adequately described by discrete distributions. Examples: income, financial ratios, sales.

Technically, financial variables are discrete because they are measured in discrete units (cents). However, the size of the discrete unit is so small relative to the typical values of the random variable that these variables behave like continuous random variables. E.g., a firm that typically earns $10 million has an income level that is 1 billion times the size of the discrete unit in which income is measured.

Continuous Uniform Distribution

The continuous uniform distribution is a distribution in which the probability of the random variable taking on a given range of values is equal for all ranges of the same size.

Example: X is a uniformly distributed random variable that can take on any value in the range [1, 5].

Pr(1 < X < 2) = 1/4 = 0.25
Pr(2 < X < 3) = 1/4 = 0.25
Pr(3 < X < 4) = 1/4 = 0.25
Pr(4 < X < 5) = 1/4 = 0.25

Note: the probability of X taking on a specific value is zero.

In general, we say that the probability density function for X is:

pdf(X) = 0.25 for all k in [1, 5] (note: Pr(X = k) = 0 for all k)

and the cumulative density function for X is:

Pr(X <= k) = (k - 1) / 4

mean = (a + b) / 2; variance = (b - a)² / 12

a = minimum value of the random variable; b = maximum value of the random variable
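A simulation check of the uniform example's cdf, mean, and variance (a sketch):

```python
import numpy as np

a, b = 1, 5
rng = np.random.default_rng(0)
x = rng.uniform(a, b, size=1_000_000)

print((x <= 2).mean())             # Pr(X <= 2) = (2 - 1)/4 = 0.25
print(x.mean(), (a + b) / 2)       # sample mean vs. (a + b)/2 = 3
print(x.var(), (b - a)**2 / 12)    # sample variance vs. (b - a)^2/12 ~ 1.333
```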
Exponential Distribution

The exponential distribution gives the probability of the maximum amount of time required until the next occurrence of an event.

λ = average number of time intervals between occurrences of successes; x = maximum number of time intervals until the next success occurs

$\Pr(\text{the next success occurs in } x \text{ or fewer time intervals}) = 1 - e^{-x/\lambda}$

mean = λ; variance = λ²

Normal Distribution

Many continuous random processes are normally distributed. Among them are:
1. Proportions (provided that the proportion is not close to the extremes of 0 or 1).
2. Sample means (provided that the means are computed based on a large enough sample size).
3. Differences in sample means (provided that the means are computed based on a large enough sample size).
4. Mean differences (provided that the means are computed based on a large enough sample size).
5. Most natural processes (including many economic and financial processes).

There are an infinite number of normal distributions, each with a different mean and variance. We describe a normal distribution by its mean and variance:
µ = population mean; σ² = population variance

The normal distribution with a mean of zero and a variance of one is called the standard normal distribution: µ = 0, σ² = 1.

The pdf (probability density function) of a normal distribution is bell-shaped. This means that the random variable can take on any value over the range -∞ to +∞, but the probability of the random variable straying from its mean decreases as the distance from the mean increases.

For all normal distributions, approximately:
50% of the observations lie within ±(2/3)σ of the mean
68% of the observations lie within ±1σ
95% of the observations lie within ±2σ
99% of the observations lie within ±3σ

Example: Suppose the return on a firm's stock price is normally distributed with a mean of 10% and a standard deviation of 6%. We would expect that, at any given point in time:
1. There is a 50% probability that the return on the stock is between 6% and 14%.
2. There is a 68% probability that the return on the stock is between 4% and 16%.
3. There is a 95% probability that the return on the stock is between -2% and 22%.
4. There is a 99% probability that the return on the stock is between -8% and 28%.

Population measures:
µ = population mean; σ² = population variance. Calculated using all possible observations.

Sample measures (estimates of population measures):
x̄ = sample mean; s² = sample variance. Calculated using a subset of all possible observations.

Variance measures the average squared deviation of observations around the mean.

Sample variance: $s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$

Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
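The two variance formulas differ only in their divisor, which numpy exposes as the ddof argument; and the "within k standard deviations" rules can be read off the standard normal cdf. A sketch:

```python
import numpy as np
from scipy.stats import norm

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # any small data set
print(x.var(ddof=1))   # sample variance: divisor N - 1 (~4.571)
print(x.var(ddof=0))   # population variance: divisor N (4.0)

# Probability of falling within k standard deviations of the mean:
for k in (2/3, 1, 2, 3):
    print(k, norm.cdf(k) - norm.cdf(-k))   # ~0.50, 0.68, 0.95, 0.997
```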
Problem of Unknown Population Parameters

If we do not have all possible observations, then we cannot compute the population mean and variance. What to do? Take a sample of observations and use the sample mean and sample variance as estimates of the population parameters.

Problem: if we use the sample mean and sample variance instead of the population mean and population variance, then we can no longer say that "50% of observations lie within ±(2/3)σ," etc. In fact, the normal distribution no longer describes the distribution of observations. We must use the t-distribution. The t-distribution accounts for the facts that (1) the observations are normally distributed and (2) we aren't sure what the mean and variance of the distribution are.

t-Distribution

There are an infinite number of t-distributions, each with different degrees of freedom. Degrees of freedom is a function of the number of observations in a data set; for most purposes, degrees of freedom = N - 1, where N is the number of observations in the sample. The more degrees of freedom (i.e., observations) that exist, the closer the t-distribution is to the standard normal distribution. The standard normal distribution is the same as the t-distribution with an infinite number of degrees of freedom.

Approximate percentage of observations within ±k standard deviations of the mean, by degrees of freedom:

Degrees of freedom:   5     10    20    30    ∞
Within 2/3 sd:        47%   48%   49%   49%   50%
Within 1 sd:          64%   66%   67%   68%   68%
Within 2 sd:          90%   93%   94%   95%   95%
Within 3 sd:          97%   98%   99%   99%   99%

Example: Consumer Reports tests the gas mileage of seven SUV's. They find that the sample of SUV's has a mean mileage of 15 mpg with a standard deviation of 3 mpg. Assuming that the population of gas mileages is normally distributed, based on this sample, what percentage of SUV's get more than 20 mpg?

We don't know the area to the right of 20 mpg directly, because we don't know the properties of a t-distribution with a mean of 15 and a standard deviation of 3. However, we can convert this distribution to a distribution whose properties we do know. The formula for conversion is:

test statistic = (test value - mean) / standard deviation

"Test value" is the value we are examining (in this case, 20 mpg), "mean" is the mean of the sample observations (15 mpg), and "standard deviation" is the standard deviation of the sample observations (3 mpg).

test statistic = (20 - 15) / 3 ≈ 1.67

We can look up the area to the right of 1.67 on a t-distribution with 6 degrees of freedom: the area is 0.073.

t Distribution (worksheet)
  Test statistic           1.670
  Degrees of Freedom       6
  Pr(t > Test statistic)   7.30%
  Pr(t < Test statistic)   92.70%

So about 7.3% of SUV's get more than 20 mpg.
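The SUV lookup in one sketch:

```python
# Area to the right of the test statistic on a t-distribution, df = 6.
from scipy.stats import t

t_stat = (20 - 15) / 3            # ~1.67
print(t.sf(t_stat, df=6))         # ~0.073, i.e. ~7.3% of SUVs exceed 20 mpg
```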
Management defines "short-lived" as a burn-life so short that 999 out of 1,000 bulbs burn longer. What is the minimum number of hours a test bulb must burn for production not to be recalibrated?

We want the test value X such that the area to its left is 0.001 (equivalently, the area to its right is 0.999). On a t-distribution with 29 degrees of freedom, the critical value with 0.1% of the area to its left is -3.3963.

t Distribution (worksheet)
  Degrees of Freedom       29
  Pr(t > Critical value)   99.90%
  Critical Value           -3.3963

test statistic = (test value - mean) / standard deviation
-3.3963 = (X - 1,500) / 200, so X = 1,500 - (3.3963)(200) ≈ 821

A test bulb must burn at least 821 hours for production not to be recalibrated.

Example: Continuing with the previous example, suppose we had used the normal distribution instead of the t-distribution to answer the question. The Probabilities spreadsheet gives the following results:

Standard Normal Distribution (Z)
  Pr(Z > Critical value)   0.10%
  Critical Value           3.0902

test value = 1,500 - (3.0902)(200) ≈ 882
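Both cutoffs in one sketch:

```python
# Recalibration cutoffs: the 0.1% quantile under the t-distribution
# vs. under the standard normal.
from scipy.stats import t, norm

mean, sd, df = 1500, 200, 29
print(mean + t.ppf(0.001, df) * sd)    # ~821 hours (t-distribution)
print(mean + norm.ppf(0.001) * sd)     # ~882 hours (normal approximation)
```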
t-Distribution vs. Normal Distribution

Correct distribution: using the t-distribution, we recalibrate production whenever we observe a light bulb with a life of 821 or fewer hours.

Incorrect distribution: using the standard normal distribution, we recalibrate production whenever we observe a light bulb with a life of 882 or fewer hours.

By incorrectly using the standard normal distribution, we would recalibrate production too frequently.

When can we use the normal distribution? As an approximation, when the number of observations is large enough that the difference in results is negligible. The difference starts to become negligible at 30 or more degrees of freedom. For more accurate results, use the t-distribution.

Test Statistic vs. Critical Value

Terminology: we have been using the terms "test statistic" and "critical value" somewhat interchangeably. Which term is appropriate depends on whether the number described is being used to find an implied probability (test statistic) or represents a known probability (critical value). When we wanted to know the probability of an SUV getting more than 20 mpg, we constructed the test statistic and asked, "What is the probability of observing the test statistic?" When we wanted to know what cut-off to impose for recalibrating production of light bulbs, we found the critical value that gave us the probability we wanted, and then asked, "What test value has the probability implied by the critical value?"

Example: The return on IBM stock has averaged 19.3% over the past 10 years with a standard deviation of 4.5%. Assuming that past performance is indicative of future results and assuming that the population of rates of return is normally distributed, what is the probability that the return on IBM next year will be between 10% and 20%?

1. Picture the problem with respect to the appropriate distribution.
2. Determine what area(s) represent the answer to the problem: here, the area between 10% and 20%.
3. Determine what area(s) you must look up (this depends on how the probability table or function is defined): here, the two tail areas.
4. Perform computations to find the desired area based on the known areas.

Convert the question to a form that can be analyzed:

left test statistic = (10% - 19.3%) / 4.5% ≈ -2.07
right test statistic = (20% - 19.3%) / 4.5% ≈ 0.16

With 9 degrees of freedom:

t Distribution (worksheet)
  Test statistic           -2.070
  Degrees of Freedom       9
  Pr(t > Test statistic)   96.58%

t Distribution (worksheet)
  Test statistic           0.160
  Degrees of Freedom       9
  Pr(t > Test statistic)   43.82%

The area to the left of -2.07 is 100% - 96.58% = 3.42%. The area to the right of 0.16 is 43.82%.

3.42% + 43.82% = 47.24%; 100% - 47.24% = 52.76%.

There is about a 53% chance that IBM will yield a return between 10% and 20% next year.
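The same answer in one sketch:

```python
# Probability that next year's IBM return lies between 10% and 20%.
from scipy.stats import t

mean, sd, df = 0.193, 0.045, 9
lo = (0.10 - mean) / sd    # ~ -2.07
hi = (0.20 - mean) / sd    # ~  0.16
print(t.cdf(hi, df) - t.cdf(lo, df))   # ~0.528, i.e. ~53%
```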
Based on the historical inflation numbers below, calculate the probability of labor costs increasing by at least $2 million next year.

Year  Inflation Rate   Year  Inflation Rate
1982  6.2%             1993  3.0%
1983  3.2%             1994  2.6%
1984  4.3%             1995  2.8%
1985  3.6%             1996  3.0%
1986  1.9%             1997  2.3%
1987  3.6%             1998  1.6%
1988  4.1%             1999  2.2%
1989  4.8%             2000  3.4%
1990  5.4%             2001  2.8%
1991  4.2%             2002  1.6%
1992  3.0%             2003  1.8%

Calculate the mean and standard deviation for inflation: sample mean = 3.2%, sample stdev = 1.2%, N = 22.

A $2 million increase on a $38 million base is 2/38 = 5.26%.

test statistic = (5.26% - 3.2%) / 1.2% ≈ 1.717

t Distribution (worksheet)
  Test statistic           1.717
  Degrees of Freedom       21
  Pr(t > Test statistic)   5.03%
  Pr(t < Test statistic)   94.97%

There is about a 5% probability of labor costs increasing by at least $2 million next year.

The CFO wants to know the magnitude of a possible "worst-case" scenario. Answer the following: "There is a 90% chance that the increase in labor costs will be no more than what amount?"

On a t-distribution with 21 degrees of freedom, the critical value with 10% of the area to its right is 1.3232.

t Distribution (worksheet)
  Degrees of Freedom       21
  Pr(t > Critical value)   10.00%
  Critical Value           1.3232

1.3232 = (test value - 3.2%) / 1.2%, so test value ≈ 4.79%

A 4.79% increase on a $38 million base is (4.79%)($38 million) ≈ $1.82 million.
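Both numbers in one sketch. Note that the slide rounds the sample mean and stdev to 3.2 and 1.2 before computing; the unrounded values (~3.25 and ~1.22) shift the answers slightly.

```python
import numpy as np
from scipy.stats import t

infl = np.array([6.2, 3.2, 4.3, 3.6, 1.9, 3.6, 4.1, 4.8, 5.4, 4.2, 3.0,
                 3.0, 2.6, 2.8, 3.0, 2.3, 1.6, 2.2, 3.4, 2.8, 1.6, 1.8])
print(infl.mean(), infl.std(ddof=1))   # ~3.25, ~1.22 before the slide's rounding

m, s, df = 3.2, 1.2, 21                # rounded values used on the slide
print(t.sf((5.26 - m) / s, df))        # Pr(inflation >= 5.26%) ~ 0.0503
print(m + t.ppf(0.90, df) * s)         # 90th-percentile increase ~ 4.79%
```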
This is not a rigorous test for the appropriateness of the normal distribution, but it is not too bad for a "quick and dirty" assessment.

Should the government reject the shipment?

Naïve answer: don't reject the shipment, because none of the grenades detonated in less than 8 seconds, so Pr(detonation in less than 8 seconds) = 0.

[Histogram of the sample: number of grenades by seconds to detonation; no grenades detonated in less than 8 seconds.]

Correct answer: we use the sample data to infer the shape of the population distribution. The inferred population distribution shows a positive probability of finding detonation times of less than 8 seconds.

1. Find the test statistic that corresponds to 8 seconds:

test statistic = (8 - 10.05) / 1.2 ≈ -1.71

2. Find the area to the left of the test statistic:

t Distribution (worksheet)
  Test statistic           -1.710
  Degrees of Freedom       19
  Pr(t > Test statistic)   94.82%
  Pr(t < Test statistic)   5.18%

Pr(detonation < 8 seconds) ≈ 5.2%

3. Reject the shipment, because the probability of early detonation (5.2%) exceeds the 1% threshold.
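The grenade test starting from the raw sample (a sketch; the slide's 5.18% uses the standard deviation rounded to 1.20, while the unrounded value ~1.19 gives ~5.1%):

```python
import numpy as np
from scipy.stats import t

times = np.array([8]*2 + [9]*3 + [10]*10 + [11]*3 + [12] + [13])
m, s = times.mean(), times.std(ddof=1)          # 10.05, ~1.19
print(t.cdf((8 - m) / s, df=len(times) - 1))    # Pr(detonation < 8 s) ~ 0.05
```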
Lognormal Distribution

In the previous example, we noted that the normal distribution may not properly describe the behavior of random variables that are bounded. A normally distributed random variable can take on any value from negative infinity to positive infinity. If the random variable you are analyzing is bounded (i.e., it cannot cover the full range from negative to positive infinity), then using the normal distribution to predict the behavior of the random variable can lead to erroneous results.

Example: Using the data from the hand grenade example, the inferred distribution assigns a small but positive probability to a grenade detonating in less than zero seconds. Since a negative detonation time is logically impossible, we must conclude that the normal distribution is not the appropriate distribution for describing time-to-detonation.

In instances in which a random variable must take on a positive value, it is often the case that the random variable has a lognormal distribution. A random variable is lognormally distributed when the natural logarithm of the random variable is normally distributed.

Example: Return to the hand grenade example.

Time to Detonation   Log of Time to Detonation   Number of Grenades
8 seconds            2.0794                      2
9 seconds            2.1972                      3
10 seconds           2.3026                      10
11 seconds           2.3979                      3
12 seconds           2.4849                      1
13 seconds           2.5649                      1

As time approaches positive infinity, ln(time) approaches positive infinity. As time approaches zero, ln(time) approaches negative infinity.

Assuming that the times-to-detonation were normally distributed, we found a 5.2% probability of detonation occurring in under 8 seconds. Assuming that the times-to-detonation are lognormally distributed, what is the probability of detonation occurring in under 8 seconds?

Mean of the logs = 2.3010; standard deviation of the logs = 0.1175.

test statistic = (ln 8 - 2.3010) / 0.1175 ≈ -1.886

t Distribution (worksheet)
  Test statistic           -1.886
  Degrees of Freedom       19
  Pr(t > Test statistic)   96.27%
  Pr(t < Test statistic)   3.73%

Pr(detonation < 8 seconds) ≈ 3.7%

Example: You are considering buying stock in a small cap firm. The firm's sales over the past nine quarters are shown below. You expect your investment to appreciate in value next quarter provided that the firm's sales next quarter exceed $27 million. Based on this assumption, what is the probability that your investment will appreciate in value? Because sales cannot be negative, it may be more appropriate to model the firm's sales as lognormal rather than normal.

Quarter   Sales (millions)   ln(Sales)
1         $25.2              3.227
2         $12.1              2.493
3         $27.9              3.329
4         $28.9              3.364
5         $32.0              3.466
6         $29.9              3.398
7         $34.4              3.538
8         $29.8              3.395
9         $23.2              3.144

What is the probability that the firm's sales will exceed $27 million next quarter?

Mean of the logs = 3.261; standard deviation of the logs = 0.311.

test statistic = (ln 27 - 3.261) / 0.311 ≈ 0.1106

t Distribution (worksheet)
  Test statistic           0.1106
  Degrees of Freedom       8
  Pr(t > Test statistic)   45.73%

Pr(sales exceeding $27 million next quarter) ≈ 46%. Odds are that the investment will decline in value.

Example: Suppose we, incorrectly, assumed that sales were normally distributed. Mean = 27.044; standard deviation = 6.520.

test statistic = (27 - 27.044) / 6.520 ≈ -0.007

t Distribution (worksheet)
  Test statistic           -0.007
  Degrees of Freedom       8
  Pr(t > Test statistic)   50.27%

Pr(sales exceeding $27 million next quarter) > 50%: odds are that the investment will increase in value. The incorrect distribution yields the opposite conclusion.

Warning: the "mean of the logs" is not the same as the "log of the mean."

Mean sales = 27.044 and ln(27.044) = 3.298, but the mean of log sales = 3.261. The same is true for the standard deviation: the "standard deviation of the logs" is not the same as the "log of the standard deviation." In using the lognormal distribution, we need the mean of the logs and the standard deviation of the logs.
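The sales example in one sketch; per the warning just above, the lognormal branch works with the mean and standard deviation of the logs, not the log of the mean:

```python
import numpy as np
from scipy.stats import t

sales = np.array([25.2, 12.1, 27.9, 28.9, 32.0, 29.9, 34.4, 29.8, 23.2])
logs = np.log(sales)
m, s, df = logs.mean(), logs.std(ddof=1), len(sales) - 1   # ~3.261, ~0.311, 8

# Lognormal model: Pr(sales > $27m) ~ 0.46.
print(t.sf((np.log(27) - m) / s, df))

# Incorrect normal model, for comparison: ~0.50.
print(t.sf((27 - sales.mean()) / sales.std(ddof=1), df))
```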
Lognormal Distribution

When should one use the lognormal distribution? You should use the lognormal distribution if the random variable is either non-negative or non-positive. Can one use the normal distribution as an approximation of the lognormal distribution? Yes, but only when the ratio of the mean to the standard deviation is large (e.g., greater than 8). Note: if the random variable is only positive (or only negative), then you are always better off using the lognormal distribution vs. the normal or t-distributions; the rules above give guidance for using the normal or t-distributions as approximations.

Hand grenade example: mean / standard deviation = 10.05 / 1.20 = 8.38. The t-distribution misestimated the probability of early detonation by about 1.5 percentage points (3.7% for the lognormal vs. 5.2% for the t-distribution).

Quarterly sales example: mean / standard deviation = 27.04 / 6.52 = 4.15. The normal model overestimated the probability of appreciation by about 4.6 percentage points (45.7% for the lognormal vs. 50.3% for the t-distribution). The smaller the mean-to-stdev ratio, the worse the approximation.

Distribution of Sample Means

So far, we have looked at the distribution of individual observations: gas mileage for a single SUV, burn-life for a single light bulb, the return on IBM stock next quarter, the inflation rate next year, time to detonation for a single hand grenade, a firm's sales next quarter. In each case, we had sample means and sample standard deviations and asked, "What is the probability of the next observation lying within some range?"

Note: although we drew on information contained in a sample of many observations, the probability questions we asked always concerned a single observation. In these cases, the random variable we analyzed was a "single draw" from the population.

We now want to ask probability questions about sample means.

Example: EPA standards require that the mean gas mileage for a manufacturer's cars be at least 20 mpg. Every year, the EPA takes a sampling of the gas mileages of a manufacturer's cars. If the mean of the sample is below 20 mpg, the manufacturer is fined. In 2001, GM produced 145,000 cars. Suppose five EPA analysts each select 10 cars and measure their mileages. The analysts obtain the following results:

Analyst #1: 17, 16, 19, 21, 19, 21, 16, 16, 19, 22
Analyst #2: 22, 22, 19, 22, 25, 18, 16, 24, 18, 15
Analyst #3: 16, 20, 17, 17, 23, 23, 19, 22, 20, 15
Analyst #4: 21, 20, 22, 20, 18, 22, 19, 23, 17, 21
Analyst #5: 24, 24, 20, 22, 17, 23, 22, 15, 19, 15

Notice that each analyst obtained a different sample mean:

Analyst #1: 18.6
Analyst #2: 20.1
Analyst #3: 19.2
Analyst #4: 20.3
Analyst #5: 20.1

The analysts obtain different sample means because their samples consist of different observations. Which is correct? Each sample mean is an estimate of the population mean. The sample means vary depending on the observations picked: the sample means are, themselves, random variables.

Notice that we have identified two distinct random variables:
1. The process that generates the observations (e.g., the mechanism that determines each car's mpg).
2. The mean of a sample of observations (e.g., the average mpg of a sample of cars).

The distribution of sample means is governed by the central limit theorem.

Central Limit Theorem: Regardless of the distribution of the random variable generating the observations, the sample means of the observations are t-distributed. Example: it doesn't matter whether mileage is distributed normally, lognormally, or according to any other distribution; the sample means of gas mileages are t-distributed.
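The five analysts' means in one sketch, illustrating that the same population yields different sample means from different samples:

```python
import numpy as np

samples = np.array([
    [17, 16, 19, 21, 19, 21, 16, 16, 19, 22],   # analyst 1
    [22, 22, 19, 22, 25, 18, 16, 24, 18, 15],   # analyst 2
    [16, 20, 17, 17, 23, 23, 19, 22, 20, 15],   # analyst 3
    [21, 20, 22, 20, 18, 22, 19, 23, 17, 21],   # analyst 4
    [24, 24, 20, 22, 17, 23, 22, 15, 19, 15],   # analyst 5
])
print(samples.mean(axis=1))   # [18.6, 20.1, 19.2, 20.3, 20.1]
```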
Example: the following slides show sample means taken from a uniformly distributed random variable. The random variable can take on any number over the range 0 through 1 with equal probability. Each slide shows the distribution of 1,000 sample means of this uniformly distributed random variable.

[Histogram: one thousand sample means, each derived from 1 observation; value of sample mean (x-axis) vs. number of sample means observed (y-axis)]
[Histogram: one thousand sample means, each derived from 2 observations]
[Histogram: one thousand sample means, each derived from 5 observations]
[Histogram: one thousand sample means, each derived from 20 observations]
[Histogram: one thousand sample means, each derived from 200 observations]

Notice two things that occur as we increase the number of observations that feed into each sample mean:
1. The distribution of sample means very quickly becomes "bell shaped." This is the result of the central limit theorem: basing a sample mean on more observations causes the sample mean's distribution to approach the normal distribution.
2. The variance of the distribution decreases. This is the result of our next topic: the variance of a sample mean decreases as the number of observations comprising the sample increases.

Standard deviation of sample means (called the "standard error") = (standard deviation of the observations) / sqrt(number of observations comprising each sample mean)

Example: in the previous slides, we saw sample means of observations drawn from a uniformly distributed random variable. The variance of a uniformly distributed random variable that ranges from 0 to 1 is 1/12.
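The histograms described above can be reproduced with a short simulation. This is an illustrative sketch, assuming numpy and matplotlib; the random seed and bin count are arbitrary choices, not part of the deck:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# For each sample size n, draw 1,000 samples from Uniform(0, 1)
# and histogram the 1,000 resulting sample means.
fig, axes = plt.subplots(1, 5, figsize=(15, 3), sharex=True)
for ax, n in zip(axes, [1, 2, 5, 20, 200]):
    means = rng.uniform(0, 1, size=(1000, n)).mean(axis=1)
    ax.hist(means, bins=25, range=(0, 1))
    ax.set_title(f"n = {n}, var = {means.var():.4f}")
plt.show()
```

The printed variances shrink roughly as (1/12)/n, anticipating the arithmetic that follows.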
Dividing this variance by the number of observations in each sample gives the variance of the sample means:

Variance of sample means based on 1 observation = (1/12) / 1 = 0.0833
Variance of sample means based on 2 observations = (1/12) / 2 = 0.0417
Variance of sample means based on 5 observations = (1/12) / 5 = 0.0167
Variance of sample means based on 20 observations = (1/12) / 20 = 0.0042
Variance of sample means based on 200 observations = (1/12) / 200 = 0.0004

Example: let us return to the EPA analysts. Analyst #1 looked at 10 cars and found an average mileage of 18.6 and a standard deviation of 2.271. Analyst #1's data were: 17, 16, 19, 21, 19, 21, 16, 16, 19, 22.

Based on this sample, GM can expect 95% of cars to have mileages between what two extremes? From the t distribution with 9 degrees of freedom, the critical values that put 2.5% in each tail are ±2.262.
Left = 18.6 − (2.262)(2.271) = 13.5
Right = 18.6 + (2.262)(2.271) = 23.7

Based on this sample, GM can expect 95% of analysts who look at 10 cars each to find average mileages between what two extremes?
Standard error = 2.271 / sqrt(10) = 0.718
Left = 18.6 − (2.262)(0.718) = 17.0
Right = 18.6 + (2.262)(0.718) = 20.2

Based on this sample, GM can expect 95% of analysts who look at 20 cars each to find average mileages between what two extremes?
Standard error = 2.271 / sqrt(20) = 0.508
Left = 18.6 − (2.262)(0.508) = 17.5
Right = 18.6 + (2.262)(0.508) = 19.7

In summary: 95% of cars will have mileages between 13.5 mpg and 23.7 mpg; 95% of analysts who look at 10 cars each should find average mileages between 17.0 mpg and 20.2 mpg; and 95% of analysts who look at 20 cars each should find average mileages between 17.5 mpg and 19.7 mpg.

Confidence Intervals
While we can't know the values of the population parameters (unless we have the entire population of data), we can make statements about how likely it is that the population parameters lie within certain ranges. We construct confidence intervals to describe ranges over which population parameters are likely to exist.
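A sketch of the three 95% ranges above, assuming scipy for the t critical value. Note that, following the slides, the df = 9 critical value from analyst #1's 10-car sample is reused even for the 20-car case:

```python
import numpy as np
from scipy import stats

xbar, s = 18.6, 2.271
cv = stats.t.ppf(0.975, df=9)      # 2.262; the slides reuse df = 9 throughout

for label, spread in [("single car", s),
                      ("mean of 10 cars", s / np.sqrt(10)),
                      ("mean of 20 cars", s / np.sqrt(20))]:
    print(f"{label}: [{xbar - cv * spread:.1f}, {xbar + cv * spread:.1f}] mpg")
```

The only difference between the three ranges is the spread: the standard deviation of the observations for a single car, and the standard error for a sample mean.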
Example: suppose EPA analyst #1 found the following data: sample mean = 18.6 mpg, sample standard deviation = 2.271 mpg, sample size = 20, standard error = 0.508. From the t9 distribution, we know that:
50% of sample means lie within 0.7027 standard errors of the population mean;
75% of sample means lie within 1.2297 standard errors of the population mean;
95% of sample means lie within 2.2622 standard errors of the population mean.

We can use this information to construct confidence intervals around the population mean, where a confidence interval is:

measure ± (critical value)(standard deviation of the measure)

The measure and its standard deviation are found in the data; the critical value we select depends on the level of confidence we desire.

There is a 50% chance that the population mean is found within the range 18.6 ± (0.7027)(0.508) = [18.2, 19.0].
There is a 75% chance that the population mean is found within the range 18.6 ± (1.2297)(0.508) = [18.0, 19.2].
There is a 95% chance that the population mean is found within the range 18.6 ± (2.2622)(0.508) = [17.5, 19.7].
Increasing the level of confidence widens the range of focus.

At the extremes, we can say:
1. There is a 100% chance that the population mean is between negative infinity and positive infinity.
2. There is a 0% chance that the population mean is exactly 18.60000000000000…
The first statement gives perfect certainty about an infinitely unfocused range; the second gives zero certainty about an infinitely focused range. Usually, when statisticians mention "error," they are referring to the range of a 95% confidence interval.

Example: you take a sample of 40 technology companies. The average P/E for the sample is 71.8, and the standard deviation of the P/Es of the 40 companies is 22.4. What is the measurement error associated with this average (at the 95% confidence level)?
Sample mean = 71.8
Standard error = 22.4 / sqrt(40) = 3.54
Critical value (from t39) = 2.0227
Average P/E ratio for all tech companies = 71.8 ± (2.0227)(3.54) = 71.8 ± 7.16

Example: your firm solicits estimates for constructing a new building and receives the following seven estimates: $10 million, $12 million, $15 million, $13 million, $11 million, $14 million, $12 million. Based on this information, construct a 90% confidence interval for the estimated cost of the building.
Sample mean = $12.4 million
Standard deviation = $1.7 million
Critical value (from t6) = 1.9432
$12.4 million ± (1.9432)($1.7 million) = [$9.1 million, $15.7 million]
This is a 90% confidence interval for the cost of the building. What if we had used the standard deviation of the sample mean (the "standard error") instead of the standard deviation of the observations? The difference lies in the choice of standard deviations.
Measure ± (critical value)(standard deviation of the measure):
Sample mean = $12.4 million
Standard deviation of the sample mean = $1.7 million / sqrt(7) = $643,000
Critical value (from t6) = 1.9432
$12.4 million ± (1.9432)($643,000) = [$11.2 million, $13.6 million]
This is not a 90% confidence interval for the cost of the building, but a 90% confidence interval for the average cost of seven buildings.

Confidence interval for the cost of the building: there is a 90% probability that the cost of a single building will be between $9.1 million and $15.7 million.
Confidence interval for the average cost of the buildings: there is a 90% probability that, when constructing seven buildings, the average cost per building will be between $11.2 million and $13.6 million.

Distribution of Proportions
Proportions are means of categorical data. Categorical data are usually non-numeric and represent a state or condition rather than a value. Example: in a vote between George Bush and Al Gore, the data are categorical, e.g., "Bush, Gore, Gore, Bush, Gore, Bush, Bush, Bush, Gore, …" A proportion measures the frequency of a single category relative to all categories. For example, if the data set includes 10 "Bush" votes and 12 "Gore" votes, then the category "Gore" represents 12 / (10 + 12) = 55% of all observations.
A population proportion (usually denoted π) is calculated from the entire population of data. A sample proportion (usually denoted p) is calculated from a sample of the data. The properties of the sample proportion are:

Population mean = π
Sample standard deviation = sqrt( p(1 − p) / N )
Distribution = normal (provided Np > 5 and N(1 − p) > 5)

Example: there are 8.3 million registered voters in Florida. Within the first few hours after the polls closed in the 2000 election, the count showed 50.5% of the vote going to George Bush. This estimate was based on only 200,000 votes. Build a 99% confidence interval for the population proportion of votes for Bush.

Measure: p = 0.505, N = 200,000
Standard deviation of the measure = sqrt( 0.505(1 − 0.505) / 200,000 ) = 0.00112
Critical value (standard normal, with half of 1%, i.e. 0.5%, in each tail) = 2.5758
0.505 ± (2.5758)(0.00112): there is a 99% probability that the population proportion of votes for Bush is between 50.2% and 50.8%.
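The same interval can be computed in a few lines; this sketch assumes numpy and scipy for the normal critical value:

```python
import numpy as np
from scipy import stats

p, n = 0.505, 200_000
se = np.sqrt(p * (1 - p) / n)       # ~0.00112
z = stats.norm.ppf(1 - 0.01 / 2)    # 2.5758 for 99% confidence
print(f"99% CI: [{p - z * se:.3%}, {p + z * se:.3%}]")  # ~[50.2%, 50.8%]
```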
Example: given that a sample of voters shows one candidate with a 1% lead (50.5% vs. 49.5%), what is the minimal number of votes that can be cast such that a 99.99% confidence interval for the candidate's population proportion is greater than 50%?

Measure: p = 0.505
Standard deviation of the measure = sqrt( 0.505(1 − 0.505) / N ) = sqrt( 0.249975 / N )
Critical value (standard normal) = 3.8906
Left end of confidence interval = 0.505 − (3.8906) sqrt( 0.249975 / N )
Setting the left end equal to 0.50 and solving for N: 0.505 − (3.8906) sqrt( 0.249975 / N ) = 0.50, which gives N ≈ 151,353.

For elections in which the winner wins by at least 1%, one can poll (approximately) 150,000 voters and get, with a margin of error of 0.01%, the same result as that obtained by polling all voters. This margin of error implies 1 miscalled election out of every 10,000 elections.

Sampling Bias
Given these results, why were the political parties so concerned with counting "every vote" in Florida? Polling 150,000 people works only if the people are selected randomly. In Florida, political parties were advocating recounts only for subsets of voters (i.e., states and counties) that were predominantly aligned with one or the other party. The argument in the Florida election ultimately revolved around attempts to introduce and block sampling biases.

Sampling bias is a systematic tendency for samples to misrepresent the population from which they are drawn. A sample is not biased merely because it fails to represent the population; there is always a measurable probability that a given sample will fail to represent the population. Rather, the data selection process is biased if repeated samples consistently misrepresent the population. Types of sampling biases:

Selection bias: the researcher excludes atypical subsets of the population from the data. E.g., estimate the average rate of return on low P/B stocks. Problem: firms with low P/B fail at a higher rate than firms with high P/B, and failed firms do not appear in the data set. Result: the sample mean return is greater than the population mean return.

Non-response bias: atypical subsets of subjects exclude themselves from the data. E.g., estimate the standard deviation of household incomes. Problem: individuals at the high and low extremes are less likely to respond. Result: the sample standard deviation is less than the population standard deviation.

Measurement bias: the measurement applied to the sample atypically approximates the population. E.g., estimate average purchasing power by measuring income over time. Problem: as prices rise, incomes rise, but purchasing power does not. Result: the sample mean of income exceeds the population mean of purchasing power.

Hypothesis Testing
Thus far, we have:
1. Estimated the probability of finding single observations that are certain distances away from the population mean.
2. Estimated the probability of finding sample means that are certain distances away from the population mean.
3. Estimated left and right boundaries that contain the population mean at varying degrees of confidence.
We now want to test statements about the population mean. Procedure for testing a hypothesis:
1. State a null hypothesis concerning the population parameter. The null hypothesis is what we will assume is true.
2. State an alternative hypothesis concerning the population parameter. The alternative hypothesis is what we will assume to be true if the null hypothesis is false.
3. Calculate the probability of observing a sample that disagrees with the null at least as much as the sample you observed.
Example: suppose we want to test the hypothesis that Bush obtained more than 50% of the vote in Florida.
1. Our null hypothesis is H0: π ≥ 0.5.
2. Our alternative hypothesis is Ha: π < 0.5.
3. Based on a sample of 200,000 votes, we observed p = 0.505. Calculate the probability of observing p = 0.505 (or less) when, in fact, π ≥ 0.5.

Since we are assuming that π ≥ 0.5, or (in the most conservative case) π = 0.5, we are also assuming that the standard deviation of p is sqrt( 0.5(1 − 0.5) / 200,000 ) = 0.001118.

We now ask: "Assuming that the null hypothesis is true, what is the probability of observing a sample that disagrees with the null at least as much as the sample we observed?" According to the null hypothesis, we center the distribution at 0.5. The area to the right of 0.505 is the probability of finding a sample proportion of at least 0.505 when, in fact, the population proportion is 0.5; the area to the left of 0.505 is the probability of finding a sample proportion of at most 0.505 when, in fact, the population proportion is 0.5.

Because the setup of the distribution assumes that the population proportion is at least 0.5, we are more concerned with the alternative tail. The area of the alternative tail tells us the probability of observing a sample "as good or worse" than the one we observed when, in fact, the null hypothesis is true. Using the formula for the test statistic, we find that the area of the alternative tail is 0.9996. We say: "Assuming that Bush would gain at least 50% of the vote, there is a 99.96% chance that a sample of 200,000 votes would show at most 50.5% for Bush."

Notice that this statement is not very enlightening. What it says, in effect, is: "If you assume that Bush wins, then the sample results we see are reasonable." This sounds like a circular argument. Example:
1. You buy a new house and, although you have seen no termites in the house, you assume that the house is in danger of termite infestation.
2. You spend $5,000 on a new treatment that is supposed to guarantee that termites will never infest your house.
3. Following the treatment, you see no termites.
4. You conclude that the treatment was worth the $5,000.
The problem with this line of reasoning is that your belief that the expensive treatment works is based on the (possibly false) assumption that you had termites.
Following the termite treatment, two things can happen: either you don't see termites in the house, or you do. If you don't see termites, you can conclude nothing. It could be the case that the treatment works, or it could be the case that the treatment doesn't work but you'll never know because you don't have termites. If you do see termites, you can conclude that the treatment doesn't work.

Returning to the election example, finding a sample proportion of 0.505 does not tell us that the population proportion is greater than 0.5, because we began the analysis assuming that the population proportion was greater than 0.5. However, if we found a sample proportion of (for example) 49.8%, this may tell us something.

H0: π ≥ 0.5
Ha: π < 0.5
Assuming (in the most conservative case) that π = 0.5: stdev(p) = sqrt( 0.5(1 − 0.5) / 200,000 ) = 0.001118
Test statistic = (test value − mean) / standard deviation = (0.498 − 0.5) / 0.001118 = −1.7889

The area of the alternative tail is 3.7%. We conclude: if, in fact, the population proportion of votes for Bush is at least 50%, then there is only a 3.7% chance of observing a sample proportion of, at most, 49.8%.

The area corresponding to the alternative hypothesis is called the "p-value" ("p" stands for "probability"). In words, the p-value is the probability of rejecting the null hypothesis when, in fact, the null hypothesis is true. For example, suppose that the sample of 200,000 voters had shown a sample proportion of 49.8% voting for Bush. The null hypothesis is that the population proportion exceeds 0.5, i.e., "Bush wins the election." So, if Bush were to concede the election before the entire population of votes were tallied (i.e., if Bush were to reject the null hypothesis), there would be a 3.7% chance that he would be conceding when, in fact, the population of votes is in his favor.

In making decisions on the basis of samples, you can make either of two types of errors.
Type I error: reject the null hypothesis when, in fact, the null hypothesis is true. Example: conclude that the termite treatment does work when, in fact, it does not work.
Type II error: fail to reject the null hypothesis when, in fact, the null hypothesis is false. Example: conclude that the termite treatment does not work when, in fact, it does work.
Because all of our analyses begin with an assumption about the population, our p-values will always refer to Type I errors. This does not mean that we are immune from Type II errors; rather, it means that the calculation of Type II errors is beyond the scope of this course.
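A sketch of the election p-value computed above, assuming numpy and scipy; the test statistic and tail area match the figures on the slides:

```python
import numpy as np
from scipy import stats

p_hat, n, pi0 = 0.498, 200_000, 0.5
se = np.sqrt(pi0 * (1 - pi0) / n)    # stdev of p under the null, ~0.001118
z = (p_hat - pi0) / se               # ~ -1.789
p_value = stats.norm.cdf(z)          # area of the left (alternative) tail
print(f"z = {z:.4f}, p-value = {p_value:.1%}")  # ~3.7%
```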
Returning to the EPA example, there are two ways the EPA analyst could construct the hypotheses.

H0: μ ≥ 20; Ha: μ < 20. Presumption: GM is in compliance unless the data indicate otherwise. Implications of results: reject the null, GM is not in compliance; fail to reject the null, no conclusion.

H0: μ ≤ 20; Ha: μ > 20. Presumption: GM is not in compliance unless the data indicate otherwise. Implications of results: reject the null, GM is in compliance; fail to reject the null, no conclusion.

H0: μ ≥ 20
Ha: μ < 20
Sample mean = 18.6; standard deviation of the sample mean = 0.508
Test statistic = (18.6 − 20) / 0.508 = −2.756
t distribution, df = 9: Pr(t > −2.756) = 98.89%; Pr(t < −2.756) = 1.11%
Conclusion: if the fleet's average mileage did meet or exceed 20 mpg, then the probability of finding a sample with an average mileage of at most 18.6 would be 1.1%. Alternatively: the null hypothesis is that GM's fleet meets or exceeds EPA requirements; based on the sample data, were the EPA to declare GM in violation of EPA requirements (i.e., reject the null hypothesis), there would be a 1.1% chance that the EPA's ruling would be incorrect.

Two approaches to hypothesis testing.
Procedure for hypothesis testing using the significance level approach:
1. State the null and alternative hypotheses.
2. Picture the distribution and identify the null and alternative areas.
3. Using the significance level, identify the critical value(s) that separate the null and alternative areas.
4. Calculate the test statistic.
5. Place the test statistic on the distribution. If it falls in the alternative area, reject the null hypothesis; if it falls in the null area, fail to reject the null hypothesis.
Procedure for hypothesis testing using the p-value approach:
1. State the null and alternative hypotheses.
2. Picture the distribution and identify the null and alternative areas.
3. Calculate the test statistic.
4. Find the area from the test statistic toward the alternative area(s). This area is the p-value.
5. Interpretation: the p-value is the probability of rejecting the null when, in fact, the null is true.

Example (significance level approach): using the EPA data from analyst #1, test the hypothesis that the (population) average mileage of GM's car fleet exceeds 20 mpg at the 5% significance level.
Area in the alternative tail = 5%; critical value = −1.833
Test statistic = −1.949
The test statistic falls in the alternative tail, so we reject the null hypothesis.

Example (p-value approach): using the EPA data from analyst #1, test the hypothesis that the (population) average mileage of GM's car fleet exceeds 20 mpg.
Test statistic = −1.949 (t9)
Area from the test statistic toward the alternative area = 4.16%
Interpretation: if we were to reject the null, there would be a 4.16% chance that we would be incorrect.
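Both EPA computations can be checked against analyst #1's raw data; a sketch assuming numpy and scipy:

```python
import numpy as np
from scipy import stats

mpg = np.array([17, 16, 19, 21, 19, 21, 16, 16, 19, 22])  # analyst #1's cars
se = mpg.std(ddof=1) / np.sqrt(len(mpg))                   # ~0.718

t_stat = (mpg.mean() - 20) / se                            # ~ -1.949
p_value = stats.t.cdf(t_stat, df=len(mpg) - 1)             # left tail, ~4.16%
crit = stats.t.ppf(0.05, df=len(mpg) - 1)                  # ~ -1.833
print(f"t = {t_stat:.3f}, p = {p_value:.2%}, 5% critical value = {crit:.3f}")
```

The test statistic lies below the critical value, reproducing the "reject at 5%" conclusion, and the printed p-value matches the 4.16% figure.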
Example: test the hypothesis that the average real rate of return on 12-month municipal bonds exceeds 3%, at a 5% significance level.
Sample data: N = 50; x̄ = 4.2%; s_x = 9.9%; standard error = 9.9% / sqrt(50) = 1.4%
Hypotheses: H0: μ ≥ 3%; Ha: μ < 3%
Critical value: the value that causes the alternative tail area to equal the significance level. Ha is to the left, so from t49 the critical value is −1.677.
Test statistic = (4.2% − 3%) / 1.4% = 0.8571
The test statistic falls in the null area, so we fail to reject H0.

Example: a paint manufacturer advertises that, when applied correctly, its paint will resist peeling for 5 years. A consumer watchdog group has filed a class action suit against the manufacturer for false advertising. Based on the following data (numbers reflect years prior to peeling), test the manufacturer's claim at the 1% level of significance.
Sample data: 4.9, 5.2, 3.7, 5.3, 4.8, 4.5, 5.1, 5.8, 4.1, 4.7
N = 10; x̄ = 4.81; s_x = 0.6064; standard error = 0.6064 / sqrt(10) = 0.1917
Presumption of innocence: H0: μ ≥ 5; Ha: μ < 5 (t9, Ha on the left)
Critical value = −2.8214
Test statistic = (4.81 − 5) / 0.1917 = −0.991
The test statistic falls in the null tail, so we fail to reject the null hypothesis.

Based on the same data, calculate the p-value for the manufacturer's claim. Using the p-value approach, we find the area of the alternative tail starting at the test statistic, −0.991 on t9: area = 0.174. Conclusion: assuming that the null hypothesis is true, there is a 17.4% chance of finding a sample mean (based on 10 observations) of 4.81 or less. Alternatively: we can reject the null hypothesis, but there would be a 17.4% chance that we would be wrong in doing so.

Distribution of a Difference in Sample Means
Frequently, we are interested in comparing the means of two populations. Statistically, this is a more complicated problem than testing a single sample mean. In the means tests we have seen thus far, we have always compared a sample mean to some fixed number. Example: in testing the hypothesis that the mean return on bonds exceeds 3%, we compared a random variable (the sample mean) to a fixed number (3%). When we perform a test on a single sample mean, we are comparing a single random variable to a fixed number; when we perform a test comparing two sample means, we are comparing two random variables to each other.

Let x̄a − x̄b be a difference in sample means. The properties of the difference in sample means are:

Population mean = μa − μb
Sample standard deviation: s(x̄a − x̄b) = sqrt( sa²/Na + sb²/Nb )
Distribution: t with df = ( sa²/Na + sb²/Nb )² / [ (sa²/Na)² / (Na − 1) + (sb²/Nb)² / (Nb − 1) ]
Example: test the hypothesis (at a 1% significance level) that the average rate of return on 12-month Aaa bonds is less than the average rate of return on 12-month municipal bonds. We draw two samples from two different populations (Aaa bonds and municipal bonds), so we now have two random variables: the sample means from each population. Our hypotheses are:

H0: μAaa − μmuni ≤ 0%
Ha: μAaa − μmuni > 0%

We obtain the following sample data:
NAaa = 43, Nmuni = 50
x̄Aaa = 5.1%, x̄muni = 4.2%
sAaa = 1.4%, smuni = 1.1%

s(x̄Aaa − x̄muni) = sqrt( 0.014²/43 + 0.011²/50 ) ≈ 0.003

The degrees of freedom are:

df = ( 0.014²/43 + 0.011²/50 )² / [ (0.014²/43)² / (43 − 1) + (0.011²/50)² / (50 − 1) ] ≈ 79

Test statistic = (0.051 − 0.042 − 0) / 0.003 = 3.407
From the t distribution with df = 79, the critical value for 1% in the alternative (right) tail is 2.3745. The test statistic falls in the alternative tail, so we reject the null hypothesis.

To find the p-value for this hypothesis: the probability of finding a sample that disagrees with the null by at least as much as the sample we observed when, in fact, the null hypothesis is true is Pr(t79 > 3.407) = 0.05%. We can reject the null hypothesis, but there is only a 0.05% chance that we would be wrong in doing so.
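The standard deviation and degrees-of-freedom formulas above can be wrapped in a small helper; a sketch assuming numpy and scipy, demonstrated on the Aaa vs. municipal bond numbers (the function name is illustrative, not from the deck):

```python
import numpy as np
from scipy import stats

def welch(x1, s1, n1, x2, s2, n2):
    """Difference-in-means test statistic, df, and standard error
    from summary statistics, per the formulas above."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = np.sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (x1 - x2) / se, df, se

t, df, se = welch(0.051, 0.014, 43, 0.042, 0.011, 50)
print(f"se = {se:.4f}, t = {t:.3f}, df = {df:.1f}, p = {stats.t.sf(t, df):.2%}")
# se ~0.003, t ~3.407, df ~79, one-tailed p ~0.05%
```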
Using Data Set #1, test the hypothesis (at a 5% level of significance) that the average cost of unleaded gas is more expensive today than in the past. Suggestion: the data set ranges from January 1976 through April 2003. Split the data set into three parts (1/76 through 12/84, 1/85 through 12/93, and 1/94 through 4/03) and test for a difference in population means between the first and third parts.

1. State the hypotheses:
H0: μ1 − μ3 ≤ 0
Ha: μ1 − μ3 > 0
2. Calculate sample statistics and the test statistic:
N1 = 108, N3 = 112
x̄1 = 80.58, x̄3 = 106.92
s1 = 24.49, s3 = 15.37
s(x̄1 − x̄3) = sqrt( 24.49²/108 + 15.37²/112 ) = 2.768
Test statistic = (80.58 − 106.92) / 2.768 = −9.515, df ≈ 178.85
3. Find the appropriate critical value: from the t distribution with df = 178, the critical value for 5% in the alternative (right) tail is 1.6535.
4. Compare the test statistic to the critical value: −9.515 falls in the null area, so we fail to reject the null hypothesis.

Note: the question asks whether the average cost of unleaded gas is more expensive today than in the past. One way to interpret this is in terms of price (which we have done). Another way is in terms of purchasing power. If the researcher intended this latter interpretation, then we may have introduced measurement bias: the price of gas in dollars may not reflect the cost of gas in purchasing power.

Using Data Set #2, test the hypothesis (at a 5% level of significance) that the average cost of unleaded gas, in terms of purchasing power, is more expensive today than in the past. Suggestion: again, split the data set into three parts and compare the sample means of parts 1 and 3. This data set includes average hourly earnings of private sector employees. Use the ratio of the price of gas to average hourly earnings as a measure of the purchasing-power cost of gas. Note: the cost of gas in terms of purchasing power is the price of gas divided by the wage rate: ($ / gal) / ($ / hr) = hr / gal = how many hours a person must work to be able to afford one gallon of gas.

1. State the hypotheses:
H0: μ1 − μ3 ≤ 0
Ha: μ1 − μ3 > 0
2. Calculate sample statistics and the test statistic:
N1 = 108, N3 = 112
x̄1 = 0.116, x̄3 = 0.082
s1 = 0.021, s3 = 0.009
s(x̄1 − x̄3) = sqrt( 0.021²/108 + 0.009²/112 ) = 0.002
Test statistic = (0.116 − 0.082) / 0.002 = 15.508, df ≈ 143.91
3. Find the appropriate critical value: from the t distribution with df = 144, the critical value for 5% in the alternative (right) tail is 1.6555.
4. Compare the test statistic to the critical value: 15.508 falls in the alternative tail, so we reject the null hypothesis.
You work for a printing firm. In the past, the firm employed people to maintain the high-speed copier. Six months ago, in an effort to reduce costs, management laid off the maintenance crew and contracted out service of the machine. You are looking at maintenance logs for the copier and note the following times between copier breakdowns. Test the hypothesis (at the 5% level) that the contracted maintenance does as good a job as the in-house maintenance did.

In-house maintenance: 26, 27, 22, 13, 8, 10, 28, 7, 16, 23, 26, 25
Contracted maintenance: 17, 13, 21, 17, 8, 6, 27, 6, 2, 20, 8, 9

Convert the data to logs: the data represent time, so we can consider using the lognormal distribution. Note: this is an analysis of sample means, and the central limit theorem tells us that sample means are (asymptotically) t-distributed regardless of the distribution of the underlying data. So, while taking logs will improve accuracy, it is not necessary (and becomes less necessary the larger the data set).

ln(in-house): 3.26, 3.30, 3.09, 2.56, 2.08, 2.30, 3.33, 1.95, 2.77, 3.14, 3.26, 3.22
ln(contracted): 2.83, 2.56, 3.04, 2.83, 2.08, 1.79, 3.30, 1.79, 0.69, 3.00, 2.08, 2.20

1. State the hypotheses:
H0: μin-house − μcontracted = 0
Ha: μin-house − μcontracted ≠ 0
2. Calculate sample statistics and the test statistic:
N = 12 in each sample
x̄ln(in-house) = 2.855, x̄ln(contracted) = 2.350
s ln(in-house) = 0.508, s ln(contracted) = 0.729
s(difference in means) = sqrt( 0.508²/12 + 0.729²/12 ) = 0.256
Test statistic = (2.855 − 2.350) / 0.256 = 1.967, df ≈ 19.64
3. Find the appropriate critical values: this is a two-tailed test with 2.5% in each tail, so from the t distribution with df = 19 the critical values are ±2.0930.
4. Compare the test statistic to the critical values: 1.967 falls in the null area, between −2.0930 and 2.0930, so we fail to reject the null hypothesis at the 5% level.

Now test the hypothesis at the 10% significance level. From the t distribution with df = 19 and 5% in each tail, the critical values are ±1.7291. The test statistic, 1.967, now falls in the alternative area, so we reject the null hypothesis.

Conclusion:
1. We reject the hypothesis that the contracted maintenance does as good a job as the in-house maintenance at a 10% level of significance.
2. We fail to reject that hypothesis at a 5% level of significance.
3. p-value = (3.20%)(2) = 6.4%, the probability of rejecting the null when the null is true. From the t distribution with df = 19, Pr(t > 1.967) = 3.20%; we multiply by two because this is a two-tailed test: an equal portion of the alternative area exists on the opposite side of the distribution.
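scipy ships this entire test; a sketch assuming scipy's ttest_ind with equal_var=False (the unequal-variance, or Welch, form), applied to the logged breakdown times:

```python
import numpy as np
from scipy import stats

in_house = np.log([26, 27, 22, 13, 8, 10, 28, 7, 16, 23, 26, 25])
contracted = np.log([17, 13, 21, 17, 8, 6, 27, 6, 2, 20, 8, 9])

# Two-sided difference-in-means test on the logged times
res = stats.ttest_ind(in_house, contracted, equal_var=False)
print(f"t = {res.statistic:.3f}, two-tailed p = {res.pvalue:.1%}")  # ~1.97, ~6.4%
```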
A plaintiff claims that an on-the-job injury has reduced his ability to earn tips. He is suing for lost future income. His tips for twelve weeks before and after the injury are shown below. Test the hypothesis that his injury reduced his earning power.

Before injury: 200, 210, 250, 180, 220, 200, 210, 230, 240, 190, 220, 250
After injury: 200, 230, 190, 180, 200, 190, 210, 200, 220, 200, 180, 220

Presumption of innocence:
H0: μbefore − μafter ≤ 0
Ha: μbefore − μafter > 0
Nbefore = 12, Nafter = 12
x̄before = 216.67, x̄after = 201.67
sbefore = 22.697, safter = 15.859
Test statistic = 1.877, df ≈ 19
From the t distribution with df = 19: Pr(t > 1.877) = 3.80% (the p-value) and Pr(t < 1.877) = 96.20%.

Distribution of a Difference in Proportions
A difference-in-proportions test examines samples from two populations in an attempt to compare the two population proportions. Let pa − pb be a difference in sample proportions. Its properties are:

Population proportion = πa − πb
Sample standard deviation: s(pa − pb) = sqrt( pa(1 − pa)/Na + pb(1 − pb)/Nb )
Distribution: standard normal, provided Na·pa > 5, Na(1 − pa) > 5, Nb·pb > 5, and Nb(1 − pb) > 5

Example: an ABC News poll (summer 2003) of 551 women and 478 men shows that 31% of men and 36% of women would rather see Hillary Clinton as President in 2004 than George Bush. Test the hypothesis that the two proportions are equal.

H0: πmen − πwomen = 0
Ha: πmen − πwomen ≠ 0
Nmen = 478, Nwomen = 551; pmen = 0.31, pwomen = 0.36
Nmen·pmen = 148.2 > 5; Nmen(1 − pmen) = 329.8 > 5; Nwomen·pwomen = 198.4 > 5; Nwomen(1 − pwomen) = 352.6 > 5, so the difference in sample proportions is normally distributed.
s(pmen − pwomen) = sqrt( (0.31)(0.69)/478 + (0.36)(0.64)/551 ) = 0.029
Test statistic = (0.31 − 0.36) / 0.029 = −1.699 (standard normal)
Pr(Z > −1.699) = 95.53%; Pr(Z < −1.699) = 4.47%
p-value = (4.47%)(2) = 8.94%: the probability of rejecting the null hypothesis when the null is true is about 9%.
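A sketch of the two-proportion test, assuming numpy and scipy:

```python
import numpy as np
from scipy import stats

p1, n1 = 0.31, 478   # men
p2, n2 = 0.36, 551   # women

se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # ~0.029
z = (p1 - p2) / se                                     # ~ -1.699
p_value = 2 * stats.norm.cdf(-abs(z))                  # two-tailed, ~8.9%
print(f"z = {z:.3f}, p-value = {p_value:.2%}")
```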
Finite Population Correction Factor
So far, we have assumed that the population of data is infinite. For example, in the case of bond yield data, the population of returns on IBM bonds is all the returns that ever were, ever will be, or ever could have been. There are some instances in which the population is not only finite, but small in comparison to the sample size. In these instances, the sample carries more information than usual because it represents, not a sample from an infinitely sized population, but a significant portion of the entire population.

For example, suppose we want to construct a 95% confidence interval for the average price of retail gas in Pittsburgh. There are 500 gas stations, and we have the following sample data: $1.05, $1.12, $1.15, $1.17, $1.08, $0.99, $1.15, $1.22, $1.14, $1.17, $1.05, $1.10. The mean of the sample is $1.12 and the standard deviation is $0.06. The question is about the mean of the price of gas; according to the central limit theorem, sample means are t-distributed regardless of the distributions of the underlying data, so we can skip the lognormal transformation. The critical value for a 95% confidence interval on a t11 distribution is 2.201, so the 95% confidence interval is:

$1.12 ± (2.201)($0.06 / sqrt(12)) = [$1.08, $1.16]

Now, suppose that we have the same sample, but that there are only 25 gas stations in Pittsburgh. The 12 observations in our sample now constitute a large portion of the total population. As such, the information we obtain from the sample reflects the population more closely than it did when there were 500 gas stations in the population. To account for this additional information, we adjust the standard deviation of the mean by the finite population correction factor (fpcf). The fpcf shrinks the standard deviation of the mean to reflect the fact that the sample represents a large portion of the total population:

Corrected s(x̄) = s(x̄) × sqrt( (N − n) / (N − 1) ), where N = population size and n = sample size

With 25 gas stations, the corrected 95% confidence interval is:

$1.12 ± (2.201)($0.06 / sqrt(12)) × sqrt( (25 − 12) / (25 − 1) ) = [$1.10, $1.14]

Notes on the finite population correction factor:
1. The correction does not apply to standard deviations of observations. The fpcf applies only to standard deviations covered by the central limit theorem: standard deviations of means, of proportions, of differences in means, and of differences in proportions.
2. The correction becomes necessary only when the sample size exceeds 5% of the population size.
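A sketch of the corrected interval for the 25-station case, assuming numpy and scipy. The exact endpoints can differ from the slide's rounded [$1.10, $1.14] by about a cent, because the slide works from rounded summary statistics:

```python
import numpy as np
from scipy import stats

prices = np.array([1.05, 1.12, 1.15, 1.17, 1.08, 0.99,
                   1.15, 1.22, 1.14, 1.17, 1.05, 1.10])
n, N = len(prices), 25                      # sample size, population size

se = prices.std(ddof=1) / np.sqrt(n)        # uncorrected standard error
fpcf = np.sqrt((N - n) / (N - 1))           # finite population correction factor
cv = stats.t.ppf(0.975, df=n - 1)           # 2.201 for df = 11

half = cv * se * fpcf
print(f"95% CI: [${prices.mean() - half:.2f}, ${prices.mean() + half:.2f}]")
```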
Distribution of Sample Variances
The analyses we have seen thus far all involve single observations or sample means. Often, we will also want to conduct tests on variances. Example: two paint companies both claim that their paints will resist peeling for an average of 10 years. You collect relevant durability data on both brands of paint:

Brand A: 10, 12, 10, 9, 10, 11, 8, 12, 9, 9
Brand B: 12, 6, 6, 1, 6, 17, 5, 17, 17, 13

Both samples have means of 10, but the sample from brand A exhibits a standard deviation of 1.3 compared to 5.9 for brand B. While both brands appear to have the same average performance, brand A has more uniform product quality (i.e., lower variance).

Let s be a sample standard deviation and let σ be the population standard deviation. Then

(N − 1)s² / σ² is distributed chi-square with N − 1 degrees of freedom.

Variance Test
Example: a tire manufacturer wants to keep the standard deviation of useful miles below 20,000. A sample of 12 tires has a standard deviation of 18,000 miles. At a 5% level of significance, test the hypothesis that the production process does not require adjustment.

H0: σ ≤ 20,000; Ha: σ > 20,000
Test statistic = (12 − 1)(18,000²) / 20,000² = 8.91 (chi-square, df = 11)
Critical value for 5% in the upper (alternative) tail = 19.675
The test statistic falls in the null area, so we fail to reject the null hypothesis.

Now, at a 5% level of significance, test the hypothesis that the production process does require adjustment.

H0: σ ≥ 20,000; Ha: σ < 20,000
Test statistic = 8.91 (chi-square, df = 11)
Critical value for 5% in the lower (alternative) tail = 4.575 (i.e., Pr(χ² > 4.575) = 95%)
The test statistic falls in the null area, so we fail to reject the null hypothesis.

We have tested both sets of hypotheses and, in each case, failed to reject the null hypothesis. Isn't this contradictory, since the two nulls are opposites? No. Remember: failing to reject the null (technically) leaves us with no conclusion. Therefore, what happened is that we ran two tests and neither resulted in a conclusion.
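A sketch of both one-sided tests, assuming scipy:

```python
from scipy import stats

n, s, sigma0 = 12, 18_000, 20_000
stat = (n - 1) * s**2 / sigma0**2                 # 8.91, chi-square with df = 11

upper_cv = stats.chi2.ppf(0.95, df=n - 1)         # 19.675, for Ha: sigma > 20,000
lower_cv = stats.chi2.ppf(0.05, df=n - 1)         # 4.575,  for Ha: sigma < 20,000
p_upper = stats.chi2.sf(stat, df=n - 1)           # ~63.0%
p_lower = stats.chi2.cdf(stat, df=n - 1)          # ~37.0%

print(f"stat = {stat:.2f}; critical values = ({lower_cv:.3f}, {upper_cv:.3f})")
print(f"upper-tail p = {p_upper:.2%}, lower-tail p = {p_lower:.2%}")
# 8.91 lies between the two critical values, so neither one-sided test rejects.
```

The two tail areas printed here are exactly the p-values discussed next.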
What is the p-value for the hypothesis that the production process does not require adjustment?
H0: σ ≤ 20,000; Ha: σ > 20,000; test statistic = 8.91 (chi-square, df = 11)
The p-value is the area from the test statistic toward the alternative area: Pr(χ² > 8.91) = 63.02%. The p-value is "the probability of erroneously rejecting the null hypothesis."

What is the p-value for the hypothesis that the production process does require adjustment?
H0: σ ≥ 20,000; Ha: σ < 20,000; test statistic = 8.91 (chi-square, df = 11)
The p-value is again the area from the test statistic toward the alternative area: Pr(χ² < 8.91) = 36.98%.

The production process is faulty if the population standard deviation exceeds 20,000; the production process is OK if the population standard deviation is less than 20,000. There is a 63% chance that we would be wrong in believing that the production process requires adjustment, and a 37% chance that we would be wrong in believing that the production process is OK. Under most circumstances, we only regard probabilities below 5% as "unusual." Therefore, the sample data do not clearly refute either null hypothesis. The data tell us nothing.

Example: at a 5% level of significance, test the hypothesis that the population standard deviation equals 20,000.
H0: σ = 20,000; Ha: σ ≠ 20,000
Test statistic = 8.91 (chi-square, df = 11)
This is a two-tailed test with 2.5% in each tail. The critical values are 3.816 (Pr(χ² > 3.816) = 97.5%) and 21.920 (Pr(χ² > 21.920) = 2.5%).
The test statistic falls in the null area, between the two critical values, so we fail to reject the null hypothesis.
Test of the mean (using logs, since the chlorine readings are non-negative):

H0: μ = 3
Ha: μ ≠ 3

Sample mean of logs = 1.156; sample stdev of logs = 0.195.

Test statistic = (1.156 − ln 3) / (0.195 / √10) = 0.923

The test statistic falls in the null area, so we fail to reject the null hypothesis.

Note: Although the data are non-negative, for the analysis of the sample variance it is not necessary to perform the log-normal transformation. This is because the distributions we use for analyzing variances and standard deviations (the chi-square and F distributions) already account for the fact that a sample variance is non-negative.

Test of the standard deviation:

H0: σ ≤ 0.4
Ha: σ > 0.4

Sample standard deviation = 0.622.

Test statistic = (10 − 1)(0.622²) / 0.4² = 21.76

With 9 degrees of freedom, the critical value that puts 1% in the upper tail is 21.666. The test statistic falls in the alternative area, so we reject the null hypothesis.
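Both chlorine tests can be reproduced from the raw readings. A small Python sketch with scipy (my own variable names):

import numpy as np
from scipy.stats import chi2

x = np.array([3.7, 2.5, 3.8, 3.4, 2.9, 2.6, 2.4, 3.3, 4.3, 3.4])

# Mean test on logged data: t = (mean(log x) - ln 3) / (s_log / sqrt(n))
logs = np.log(x)
t = (logs.mean() - np.log(3)) / (logs.std(ddof=1) / np.sqrt(x.size))
print(t)                            # approx. 0.923

# Variance test: (n-1)s^2 / 0.4^2 compared with chi-square(9)
s = x.std(ddof=1)                   # approx. 0.622
stat = (x.size - 1) * s**2 / 0.4**2
print(stat)                         # approx. 21.76
print(chi2.ppf(0.99, x.size - 1))   # 1% critical value, approx. 21.666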
The tests we conducted were at the 1% significance level. This means there is a 1% probability that we might draw a sample that causes us to erroneously reject the null hypothesis. Suppose we want to err on the side of caution: we would rather risk finding that the water is not adequately treated when, in fact, it is, than risk finding that the water is adequately treated when, in fact, it is not. How should we adjust our significance level? Increasing the significance level of the test increases the probability of rejecting the null when, in fact, the null is true.

Retest the hypotheses at the 10% significance level.

For the mean: H0: μ = 3, Ha: μ ≠ 3. The test statistic is unchanged at 0.923, which still falls in the null area, so we fail to reject the null hypothesis.

For the standard deviation: H0: σ ≤ 0.4, Ha: σ > 0.4. The test statistic is unchanged at 21.76. With 9 degrees of freedom, the critical value that puts 10% in the upper tail is 14.684. The test statistic falls in the alternative area, so we reject the null hypothesis.

Confidence Interval for a Variance

When we constructed confidence intervals for sample means and for observations, we used the formula:

measure ± (critical value)(stdev of measure)

This formula comes from the test statistic for normally (and t-) distributed random variables. Note:

lower limit = measure − (cv)(stdev), so cv = (measure − lower limit) / stdev
upper limit = measure + (cv)(stdev), so cv = (upper limit − measure) / stdev

The formula for the critical value (cv) shown above is the same as the formula for the test statistic:

test statistic = (estimate − parameter) / (stdev of estimate)

Therefore, when we find a confidence interval, what we are really doing is: (1) setting the test statistic equal to the critical value that gives us the desired level of confidence, and (2) solving for the parameter.

Because the formula for the test statistic for a sample variance is different from the formula for the test statistic for a sample mean, we would expect the formula for the confidence interval to be different also.

Test statistic = (N − 1)(estimate²) / parameter²

Setting the test statistic equal to the critical value that gives us the desired level of confidence, and solving for the parameter, we get:

parameter = √[ (N − 1)(estimate²) / critical value ]

Note that we use only the positive root because standard deviations are non-negative.
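As a check on this algebra, a short Python sketch (scipy's ppf is the inverse CDF; the variable names are mine) that computes the interval for the example that follows, a sample of 10 observations with standard deviation 3:

from scipy.stats import chi2

n, s = 10, 3
df = n - 1
lower = ((df * s**2) / chi2.ppf(0.975, df)) ** 0.5   # 19.023: 2.5% of the distribution lies above it
upper = ((df * s**2) / chi2.ppf(0.025, df)) ** 0.5   # 2.700: 97.5% lies above it
print(lower, upper)                                  # approx. 2.06 and 5.48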
Example: A sample of 10 observations has a standard deviation of 3. Find the 95% confidence interval for the population standard deviation. To find a 95% confidence interval, we need the two critical values that give 2.5% in the upper and lower tails. With 9 degrees of freedom, these are 2.700 (97.5% of the distribution lies above it) and 19.023 (2.5% lies above it).

Upper limit = √[ (10 − 1)(3²) / 2.700 ] = 5.48
Lower limit = √[ (10 − 1)(3²) / 19.023 ] = 2.06

We find that there is a 95% probability that the population standard deviation lies between 2.06 and 5.48.

Distribution of a Difference in Sample Variances

In the same way that we had different procedures for testing a sample mean versus testing a difference in two sample means, we have different procedures for testing a sample variance versus testing a difference in two sample variances. Where variances are concerned, however, we look not at the difference in the sample variances, but at the ratio of the sample variances. Let s_a / s_b be a ratio of two sample standard deviations.

If s_a > s_b, then the ratio s_a / s_b will be greater than 1.
If s_a < s_b, then the ratio s_a / s_b will be less than 1.
If s_a = s_b, then the ratio s_a / s_b will equal 1.

Let s_a and s_b be sample standard deviations taken from different populations. The key property of a ratio of sample standard deviations: for a population ratio σ_a / σ_b,

s_a² / s_b² is distributed F(N_a − 1, N_b − 1)

Difference in Variances Test

Example: A recent consumer behavior study was designed to test the "beer goggles" effect. A group of volunteers was shown pictures (head shots) of members of the opposite sex and asked to rate the people in the pictures according to attractiveness. Another group of volunteers was given two units of alcohol, shown the same pictures, and also asked to rate the people in the pictures according to attractiveness. Test the hypothesis that, when subjects consume alcohol, they (on average) find pictures of the opposite sex more attractive.

The straightforward hypothesis test is a difference of means test where:

H0: μ_drunk − μ_sober ≤ 0
Ha: μ_drunk − μ_sober > 0

Suppose we collect data, run the appropriate test, and reject the null hypothesis. Can we conclude (roughly speaking) that, on average, drinking alcohol causes one to find the opposite sex more attractive? Yes. However, it may be the case that the alcohol affects only a subset of the population. For example, perhaps only men are affected; or perhaps only those who rarely drink are affected. The difference in means test does not detect these cases; it only detects differences in the average of all subjects in the samples.
Consider the following two scenarios (calculate the means and stdevs for each data set):

Scenario #1
Sober: 3, 2, 3, 1, 1, 3, 4, 2, 3, 4
Drunk: 4, 3, 4, 2, 2, 4, 5, 3, 4, 5
Everyone is affected. The average rating for sober is 2.6 compared to an average rating for drunk of 3.6. The standard deviations for both sober and drunk are 1.07 because all 10 subjects were affected by the alcohol.

Scenario #2
Sober: 3, 2, 3, 1, 1, 3, 4, 2, 3, 4
Drunk: 3, 2, 3, 1, 1, 5, 6, 4, 5, 6
Only males are affected. The average rating for sober is again 2.6 compared to an average rating for drunk of 3.6. The standard deviation for sober is 1.07, but for drunk it is 1.90 because only the males (the last 5 observations) were affected by the alcohol.

Implication: A difference in means test would report the same result for scenarios #1 and #2 (the population mean for drunk is greater than the population mean for sober). But a difference in variances test would show that all of the subjects were affected by the alcohol in scenario #1, while only some of the subjects were affected in scenario #2.

Using the scenario #2 data, test the hypotheses (at the 10% significance level):

H0: σ_drunk = σ_sober
Ha: σ_drunk ≠ σ_sober

With 9 degrees of freedom in both the numerator and the denominator, the two-tailed critical values at the 10% level are 0.315 (95% of the F distribution lies above it) and 3.179 (5% lies above it).
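A Python sketch of this F test on the scenario #2 data (scipy; the doubling for a two-sided p-value is my addition):

import numpy as np
from scipy.stats import f

sober = np.array([3, 2, 3, 1, 1, 3, 4, 2, 3, 4])
drunk = np.array([3, 2, 3, 1, 1, 5, 6, 4, 5, 6])

stat = drunk.var(ddof=1) / sober.var(ddof=1)   # ratio of sample variances, approx. 3.12
dfn, dfd = drunk.size - 1, sober.size - 1

print(stat)
print(f.ppf(0.05, dfn, dfd), f.ppf(0.95, dfn, dfd))          # approx. 0.315 and 3.179
print(2 * min(f.sf(stat, dfn, dfd), f.cdf(stat, dfn, dfd)))  # two-sided p-value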
N_drunk = 10, N_sober = 10; s_drunk = 1.90, s_sober = 1.07.

Test statistic = s_drunk² / s_sober² = 1.90² / 1.07² = 3.12

The test statistic (3.12) falls in the null area (below the upper critical value of 3.179), so we fail to reject the null hypothesis.

In scenario #2, we know for certain that only males were affected, so we should expect to see a difference in the standard deviations across the two samples (sober vs. drunk). Why did we end up failing to reject the null hypothesis? The result may be due to the small number of observations in the samples. What if we had only one more observation in each sample, but our sample standard deviations remained the same? The sample stdevs don't change, so the test statistic doesn't change. But with 10 degrees of freedom in the numerator and denominator, the upper critical value falls to 2.978, and now we reject the null hypothesis.

Hypothesis Testing: Summary

Procedure for Hypothesis Testing
1. State the hypotheses.
2. Picture the distribution.*
3. Identify the null and alternative regions.
4. Calculate the test statistic.*

Significance level approach:
5. Find the critical value(s) that define alternative area(s) equal to the significance level.
6. If the test statistic falls in the alternative area, reject the null hypothesis. If the test statistic falls in the null area, fail to reject the null hypothesis.

p-value approach:
5. p-value = area from the test statistic toward the alternative tail(s). The p-value is the "probability of being wrong in rejecting the null," or the "probability of the results being due to random chance rather than due to the null."

*Procedure varies depending on the type of test being performed.

Test statistics and their distributions (see the sketch below for the difference-in-means degrees of freedom):

Mean: (x̄ − μ) / s_x̄, distributed t(N − 1).

Difference in means: [ (x̄₁ − x̄₂) − (μ₁ − μ₂) ] / s_(x̄₁ − x̄₂), distributed t(N), where N = (s²_x̄₁ + s²_x̄₂)² / [ s⁴_x̄₁ / (N₁ − 1) + s⁴_x̄₂ / (N₂ − 1) ].

Proportion: (p̄ − p) / σ_p̄, distributed standard normal provided Np ≥ 5 and N(1 − p) ≥ 5.

Difference in proportions: [ (p̄₁ − p̄₂) − (p₁ − p₂) ] / σ_(p̄₁ − p̄₂), distributed standard normal provided N₁p₁ ≥ 5, N₂p₂ ≥ 5, N₁(1 − p₁) ≥ 5, and N₂(1 − p₂) ≥ 5.

Variance: (N − 1)s² / σ², distributed χ²(N − 1).

Difference in variances: s₁² / s₂², distributed F(N₁ − 1, N₂ − 1).
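The degrees-of-freedom formula for the difference-in-means test (often called the Welch–Satterthwaite formula) is easy to get wrong by hand. A small Python sketch (the helper name is mine):

def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for the difference-in-means t test."""
    v1, v2 = s1**2 / n1, s2**2 / n2   # variances of the two sample means
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(welch_df(1.90, 10, 1.07, 10))   # scenario #2 ratings, for illustration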
Causal vs. Exploratory Analysis

The goal of exploratory analysis is to obtain a measure of a phenomenon. Example: Subjects are given a new breakfast cereal to taste and asked to rate the cereal. The measured phenomenon is taste. Although taste is subjective, by taking the average of the measures from a large number of subjects, we can measure the underlying objective components that give rise to the subjective feeling of taste.

The goal of causal analysis is to obtain the change in the measure of a phenomenon due to the presence vs. absence of a control variable. Example: Two groups of subjects are given the same breakfast cereal to taste and are asked to rate the cereal. One group is given the cereal in a black-and-white box, the other in a multi-colored box. The two groups of subjects exist under identical conditions (same cereal, same testing environment, etc.), with the exception of the color of the cereal box. Because the color of the cereal box is the only difference between the two groups, we call the color of the box the control variable. If we find a difference in subjects' reported tastes, then we know that the difference in perceived taste is due to the color (or lack of color) of the cereal box.

It is possible that, apart from random chance, one group of subjects reports liking the cereal and the other does not (e.g., one group was tested in the morning and the other in the evening). We would call this a confound. A confound is the presence of an additional (and unwanted) difference between the two groups. When a confound is present, it is difficult (perhaps impossible) to determine how much of the difference in reported taste between the two groups is due to the control and how much is due to the confound.

Because the techniques for causal and exploratory analysis are identical (except that causal analysis includes the use of a control variable whereas exploratory analysis does not), we will limit our discussion to causal analysis.

Designing Survey Instruments

The Likert Scale. We use the Likert scale to rate responses to qualitative questions.

Example: "Which of the following best describes your opinion of the taste of Coke?"
Too Sweet (1), Very Sweet (2), Just Right (3), Slightly Sweet (4), Not Sweet (5)

The Likert scale elicits more information than a simple "Yes/No" response: the analyst can gauge the degree, rather than simply the direction, of opinion.

Rules for Using the Likert Scale

1. Use 5 or 7 gradations of response. Fewer than 5 yields too little information; more than 7 makes it too difficult for respondents to distinguish one response from another.
2. Always include a mid-point (or neutral) response.
3. When appropriate, include a separate response for "Not applicable" or "Don't know."
4. When possible, include a descriptor with each response rather than simply a single descriptor on each end of the scale.

Example:
Yes: Very Bad (1), Bad (2), Neutral (3), Good (4), Very Good (5)
No: Very Bad (1), (2), (3), (4), Good (5)

The presence of the lone words at the ends of the scale introduces a bias by causing subjects to shun the center of the scale.

5. Use the same words and (where possible) the same number of words for each descriptor.

Example:
Yes: Very Bad (1), Bad (2), Neutral (3), Good (4), Very Good (5)
No: Bad (1), Poor (2), OK (3), Better (4), Best (5)

When using different words for different descriptors, subjects may perceive varying quantities of difference between points on the scale. For example, subjects may perceive that the difference between "Bad" and "Poor" is less than the difference between "Poor" and "OK."

6. Avoid using zero as an endpoint on the scale.
Example:
Yes: Very Bad (1), Bad (2), Neutral (3), Good (4), Very Good (5)
No: Very Bad (0), Bad (1), Neutral (2), Good (3), Very Good (4)

On average, subjects will associate the number zero with "bad." Thus, using zero at the endpoint of the scale can bias subjects away from the side of the scale with the zero.

7. Avoid using unbalanced negative numbers.

Example:
Yes: Very Bad (−2), Bad (−1), Neutral (0), Good (1), Very Good (2)
No: Very Bad (−3), Bad (−2), Neutral (−1), Good (0), Very Good (1)

Subjects associate negative numbers with "bad." If you have more negative numbers on one side of the scale than the other, subjects will be biased away from that side of the scale.

8. Keep the descriptors balanced.

Example:
Yes: Very Bad (1), Bad (2), Neutral (3), Good (4), Very Good (5)
No: Very Bad (1), Bad (2), Slightly Good (3), Good (4), Very Good (5)

Subjects will be biased toward the side with more descriptors.

9. Arrange the scale so as to maintain (1) symmetry around the neutral point, and (2) consistency in the intervals between points.

Example: Three scales carry the same descriptors, Very Bad (1) through Very Good (5), but differ in the visual spacing of the points (the spacing is lost in this text rendering). In the second example, subjects perceive the difference between "Neutral" and "Very Bad" to be greater than the difference between "Neutral" and "Very Good"; responses will be biased toward the right side of the scale. In the third example, subjects perceive the difference between "Very Bad" and "Bad" to be greater than the difference between "Bad" and "Neutral"; responses will be biased toward the center of the scale.

10. Use multi-item scales for ill-defined constructs.

Example:
Yes:
"I liked the product." Strongly Agree (1), Agree (2), Neutral (3), Disagree (4), Strongly Disagree (5)
"I am satisfied with the product." Strongly Agree (1), Agree (2), Neutral (3), Disagree (4), Strongly Disagree (5)
"I believe that this is a good product." Strongly Agree (1), Agree (2), Neutral (3), Disagree (4), Strongly Disagree (5)
No:
"I liked the product." Strongly Agree (1), Agree (2), Neutral (3), Disagree (4), Strongly Disagree (5)

Ill-defined constructs may be interpreted differently by different people. Use the multi-item scale (usually three items) and then average the items to obtain a single response for the ill-defined construct.

Example: The ill-defined construct is product satisfaction. We construct three questions, each of which touches on the idea of product satisfaction. A subject gives the following responses:

"I liked the product." 4
"I am satisfied with the product." 4
"I believe that this is a good product." 3

The average response for product satisfaction is 3.67.
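A tiny Python sketch of the averaging step (the item labels are from the example above; the dictionary layout is mine):

from statistics import mean

items = {"liked": 4, "satisfied": 4, "good": 3}   # one subject's three item responses
construct_score = mean(items.values())            # (4 + 4 + 3) / 3
print(round(construct_score, 2))                  # 3.67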
Be careful that the multi-item scales all measure the same ill-defined construct.

Yes:
"I liked the product."
"I am satisfied with the product."
"I believe that this is a good product."

No:
"I liked the product."
"I am satisfied with the product."
"I will purchase the product."

The statement "I will purchase the product" includes the consideration of price, which the other two questions do not.

11. Occasionally, it is useful to verify that the subjects are giving considered (as opposed to random) answers. To do this, ask the same question more than once at different points in the survey. Look at the variance of the responses across the multiple instances of the question. If the subject is giving considered answers, the variance should be small.

12. Avoid self-referential questions.

Yes: "How do you perceive that others around you feel right now?"
No: "How do you feel right now?"

Self-referential questions elicit bias because they encourage the respondent to answer subsequent questions consistently with the self-referential question. Example: If we ask a subject how he feels and he responds positively, then his subsequent answers will be biased in a positive direction. The subject will, unconsciously, attempt to behave consistently with his reported feelings. Exception: You can ask a self-referential question if it is the last question in the survey. As long as the subject does not go back and change previous answers, there is no opportunity for the self-reference to bias the subject's responses.

Example: We want to test the effect of relevant news on purchase decisions. Specifically, we want to know if the presence of positive news about a low-cost product increases the probability of consumers purchasing that product.

Causal Design: We will expose two groups of subjects to news announcements about aspirin. The control group will see a neutral announcement that says nothing about the performance of aspirin. The experimental group will see a positive announcement that says that aspirin has positive health benefits. After exposure to the announcements, we will ask each group to rate their attitudes toward aspirin. Our hypothesis is that there is no difference in the average attitudes toward aspirin between the two groups.

To account for possible preconceptions about aspirin, before we show the subjects the news announcements, we will ask how frequently they take aspirin. To account for possible gender effects, we will also ask subjects to report their genders.

All subjects are first asked to respond to these questions:

How often do you take aspirin? (7-point scale: 1 = Infrequently, midpoint = Occasionally, 7 = Frequently)
Please identify your gender (M/F).

Subjects in the control group see a neutral news announcement; the analyst reads the headline and the introductory paragraph. Subjects in the control group are then asked to answer this question:

Please rate your attitude toward aspirin. (7-point scale: 1 = Unfavorable, 4 = Neutral, 7 = Favorable)
Subjects in the experimental group see the positive news announcement; the analyst reads the headline and the introductory paragraph. Subjects in the experimental group are then asked to answer the same question:

Please rate your attitude toward aspirin. (7-point scale: 1 = Unfavorable, 4 = Neutral, 7 = Favorable)

Results: Results for an actual experiment are shown below (the data are in Data Set #3). Test the following hypotheses:

H0: μ_experimental = μ_control    Ha: μ_experimental ≠ μ_control
H0: σ_experimental = σ_control    Ha: σ_experimental ≠ σ_control

Rejecting the null in the first set of hypotheses would indicate that the news had an impact on subjects' average attitude toward aspirin. Rejecting the null in the second set would indicate that the news had an impact on the degree of disparity in subjects' attitudes toward aspirin.

Attitude  Use  Gender (1=male, 0=female)  Group (0=control, 1=experimental)
7  1  1  0
4  2  1  0
5  3  1  0
5  3  0  0
4  4  0  0
6  4  0  0
7  1  1  0
4  4  1  0
5  4  0  0
1  2  0  0
5  4  1  0
5  2  0  1
5  3  1  1
3  1  1  1
2  1  1  1
5  2  0  1
5  2  0  1
4  1  1  1
6  3  1  1
4  1  0  1
6  2  1  1
4  1  0  1
5  4  1  1

Note: The survey responses are non-negative (the lowest possible response is 1). This may suggest that a log-normal transformation is appropriate. However, we are testing the mean of the observations and, therefore, by the Central Limit Theorem, do not need to perform the log-normal transformation.

Test the first set of hypotheses:

H0: μ_experimental = μ_control
Ha: μ_experimental ≠ μ_control

x̄_control = 4.82, s_control = 1.66, N_control = 11
x̄_experimental = 4.50, s_experimental = 1.17, N_experimental = 12

stdev(x̄₁ − x̄₂) = √(1.66²/11 + 1.17²/12) = 0.604
Test statistic = (4.82 − 4.50) / 0.604 = 0.530, distributed t with df = 17.82 ≈ 18.
Pr(t > 0.530) = 30.13%, so the two-tailed p-value = (30.13%)(2) = 60.26%.

There is a 60% chance that we would be incorrect in believing that the news altered the subjects' average attitude toward aspirin.

Test the second set of hypotheses:

H0: σ_experimental = σ_control
Ha: σ_experimental ≠ σ_control

Test statistic = s_control² / s_experimental² = 1.66² / 1.17² = 2.01, distributed F(10, 11).
Pr(F > 2.01) = 13.38%, so the two-tailed p-value = (13.38%)(2) = 26.76%.

There is a 27% chance that we would be incorrect in believing that the news altered the disparity in subjects' attitudes toward aspirin.

Conclusion: In market research, we typically use 10% as the cut-off for determining "significance" of results. The news announcement had no significant effect on the average attitude toward aspirin, nor on the disparity of attitudes toward aspirin.
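A Python sketch of both tests from the summary statistics (scipy; the variable names are mine, with the two-sided doubling shown explicitly):

from math import sqrt
from scipy.stats import t, f

xb1, s1, n1 = 4.82, 1.66, 11   # control group
xb2, s2, n2 = 4.50, 1.17, 12   # experimental group

# Difference in means (unequal variances)
se = sqrt(s1**2 / n1 + s2**2 / n2)
tstat = (xb1 - xb2) / se
df = (s1**2/n1 + s2**2/n2)**2 / ((s1**2/n1)**2/(n1-1) + (s2**2/n2)**2/(n2-1))
print(tstat, df, 2 * t.sf(tstat, df))   # approx. 0.53, 17.8, 0.60

# Difference in variances
fstat = s1**2 / s2**2
print(fstat, 2 * f.sf(fstat, n1 - 1, n2 - 1))   # approx. 2.01, 0.27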
The results appear to indicate that the news announcement had no effect at all on the subjects. It is possible, however, that the news announcement does not affect people who do not take aspirin. Let us filter the data set, removing all subjects who report that they infrequently use aspirin. Our filtered data set includes only those subjects who responded with at least 2 to the question regarding frequency of use.

Filtered data set:

Attitude  Use  Gender (1=male, 0=female)  Group (0=control, 1=experimental)
4  2  1  0
5  3  1  0
5  3  0  0
4  4  0  0
6  4  0  0
4  4  1  0
5  4  0  0
1  2  0  0
5  4  1  0
5  2  0  1
5  3  1  1
5  2  0  1
5  2  0  1
6  3  1  1
6  2  1  1
5  4  1  1

Test the first set of hypotheses:

H0: μ_experimental = μ_control
Ha: μ_experimental ≠ μ_control

x̄_control = 4.33, s_control = 1.41, N_control = 9
x̄_experimental = 5.33, s_experimental = 0.52, N_experimental = 7

stdev(x̄₁ − x̄₂) = 0.509
Test statistic = (4.33 − 5.33) / 0.509 = −1.963, distributed t with df = 10.61 ≈ 10.
Pr(t < −1.963) = 3.90%, so the two-tailed p-value = (3.90%)(2) = 7.8%.

There is an 8% chance that we would be incorrect in believing that the news altered the subjects' average attitude toward aspirin.

Test the second set of hypotheses:

H0: σ_experimental = σ_control
Ha: σ_experimental ≠ σ_control

Test statistic = s_control² / s_experimental² = 1.41² / 0.52² = 7.35, distributed F(8, 6).
Pr(F > 7.35) = 1.28%, so the two-tailed p-value = (1.28%)(2) = 2.56%.

There is a 3% chance that we would be incorrect in believing that the news altered the disparity in subjects' attitudes toward aspirin.

The results using the filtered data appear to indicate that, for subjects who report using aspirin more than "infrequently":
1. The news announcement significantly changed (increased) subjects' average attitude toward aspirin.
2. The news announcement significantly changed (decreased) the disparity in subjects' attitudes toward aspirin.

The increase in subjects' attitudes toward aspirin is what the aspirin manufacturer would hope for. The decrease in the disparity of attitudes is an added bonus: it can be interpreted as a reduction in the "uncertainty" of the benefit of aspirin.

A Look Back…

Thus far, we have learned the following statistical techniques:

Calculating probabilities using: marginal probability, joint probability, disjoint probability, conditional probability, and Bayes' theorem.
Estimating probabilities for: binomial processes, hypergeometric processes, and Poisson processes.
Constructing confidence intervals for: single observations, population means, population proportions, and population variances.
Conducting hypothesis tests for: a population mean, a population proportion, a population variance, a difference in two population means, a difference in two population proportions, and a difference in two population variances.

Regression Analysis

In regression analysis, we look at how one variable (or a group of variables) affects another variable. We use a technique called "ordinary least squares," or OLS.
The OLS technique looks at a sample of two (or more) variables and filters out random noise so as to find the underlying deterministic relationship among the variables.

Example: A retailer suspects that monthly sales follow unemployment rate announcements with a one-month lag. When the Bureau of Labor Statistics announces that the unemployment rate is up, one month later sales appear to fall. When the BLS announces that the unemployment rate is down, one month later sales appear to rise. The retailer wants to know if this relationship actually exists. If so, the retailer can use BLS announcements to help predict future sales.

In linear regression analysis, we assume that the relationship between the two variables (in this example, sales and the unemployment rate) is linear and that any deviation from the linear relationship must be due to noise (i.e., unaccounted randomness in the data).

The table below shows data (see Data Set #4) on sales and the unemployment rate collected over a 10-month period.

Date       Monthly Sales (current month)   Unemployment Rate (previous month)
January    $257,151   4.5%
February   $219,202   4.7%
March      $222,187   4.6%
April      $267,041   4.4%
May        $265,577   4.8%
June       $192,566   4.9%
July       $197,655   5.0%
August     $200,370   4.9%
September  $203,730   4.7%
October    $181,303   4.8%

Notice that the relationship (if there is one) between the unemployment rate and sales is subject to some randomness. Over some months (e.g., May to June), an increase in the previous month's unemployment rate corresponds to a decrease in the current month's sales. But over other months (e.g., June to July), an increase in the previous month's unemployment rate corresponds to an increase in the current month's sales.

It is easier to picture the relationship between unemployment and sales if we graph the data. Since we are hypothesizing that changes in the unemployment rate cause changes in sales, we put unemployment on the horizontal axis and sales on the vertical axis. [Scatter plot: unemployment rate (previous month, 4.3%–5.1%) on the horizontal axis vs. sales (current month, $160,000–$280,000) on the vertical axis.]

OLS finds the line that most closely fits the data. Because we have assumed that the relationship is linear, two numbers describe the relationship: (1) the slope and (2) the vertical intercept.

Estimated line: Sales-hat = 771,670 − 11,648,868 × (unemp rate)

Slope = −11,648,868; vertical intercept = 771,670.
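A short Python sketch that fits this line to the ten observations above (numpy; variable names are mine, and the printed values should match the reported slope and intercept up to rounding):

import numpy as np

unemp = np.array([0.045, 0.047, 0.046, 0.044, 0.048,
                  0.049, 0.050, 0.049, 0.047, 0.048])   # previous month's rate
sales = np.array([257151, 219202, 222187, 267041, 265577,
                  192566, 197655, 200370, 203730, 181303])

slope, intercept = np.polyfit(unemp, sales, 1)   # OLS fit of a degree-1 polynomial
print(slope, intercept)        # approx. -11,648,868 and 771,670

# Fitted value and residual at an unemployment rate of 4.5%
fitted = intercept + slope * 0.045   # approx. 247,471
print(sales[0] - fitted)             # residual, approx. 9,680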
The graph of the fitted line shows two relationships: (1) the regression model, which is the scattering of dots and represents the actual data, and (2) the estimated (or fitted) regression model, which is the line and represents the regression model after random noise has been removed.

Regression model: Sales_t = α + β(unemp rate_{t−1}) + u_t, where α and β are the true intercept and slope, and u_t is the noise (also called the "error term").

Estimated regression model: Sales-hat_t = α̂ + β̂(unemp rate_{t−1}), giving the estimated intercept, slope, and sales after estimating and removing the noise.

For example, an unemployment rate of 4.5% is observed with sales of $257,151. After eliminating noise, we estimate that sales should have been 771,670 − (11,648,868)(0.045) = $247,471. The estimated noise associated with this observation is û_t = $257,151 − $247,471 = $9,680.

Terminology:
Variables on the right-hand side of the regression equation are called exogenous, or explanatory, or independent variables. They usually represent variables that are assumed to influence the left-hand-side variable.
The variable on the left-hand side of the regression equation is called the endogenous, or outcome, or dependent variable. The dependent variable is the variable whose behavior you are interested in analyzing.
The intercept and slopes of the regression model are called parameters. The intercept and slopes of the estimated (or fitted) regression model are called parameter estimates.
The noise term in the regression model is called the error or noise. The estimated error is called the residual, or estimated error.

Regression model: Y = α + βX + u
Fitted (estimated) model: Ŷ = α̂ + β̂X
Residual (estimated error): û = Y − Ŷ

OLS estimates the regression model parameters by selecting the parameter values that minimize the variance of the residuals, where the residual is the difference between the actual and fitted values of the outcome variable. Choosing different parameter values moves the estimated regression line away (on average) from the data points; this results in increased variance in the residuals.
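The minimization claim can be illustrated numerically. A Python sketch (my construction) comparing the residual variance of the OLS line with that of an alternative line through the same data:

import numpy as np

unemp = np.array([0.045, 0.047, 0.046, 0.044, 0.048,
                  0.049, 0.050, 0.049, 0.047, 0.048])
sales = np.array([257151, 219202, 222187, 267041, 265577,
                  192566, 197655, 200370, 203730, 181303])

slope, intercept = np.polyfit(unemp, sales, 1)
ols_resid = sales - (intercept + slope * unemp)

# An alternative line: a flatter (hypothetical) slope, with the best intercept for that slope
alt_slope = -9_000_000
alt_intercept = sales.mean() - alt_slope * unemp.mean()
alt_resid = sales - (alt_intercept + alt_slope * unemp)

print(ols_resid.var(), alt_resid.var())   # the OLS residual variance is the smaller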
To perform regression in Excel: (1) select TOOLS, then DATA ANALYSIS; (2) select REGRESSION; (3) enter the range of cells containing the outcome ("Y") and explanatory ("X") variables; (4) enter a range of cells for the output.

Options in the regression dialog:
Constant is zero: check this box to force the vertical intercept to be zero.
Confidence level: Excel automatically reports 95% confidence intervals. Check this box and enter a level of confidence if you want a different confidence interval.
Residuals: check this box if you want Excel to report the residuals.
Standardized residuals: check this box if you want Excel to report the residuals in terms of standard deviations from the mean.

The regression output reports the vertical intercept and slope estimates, the standard deviation of each estimate, a 95% confidence interval around each parameter estimate, and a test statistic and p-value for H0: parameter = 0.

Distribution of Regression Parameter Estimates

If we select a different sample of observations from a population and then perform OLS, we will obtain slightly different parameter estimates. Thus, regression parameter estimates are random variables. Let β̂ be a regression parameter estimate. The key property of a regression parameter estimate: for population parameter β, and a standard deviation s_β̂ that varies depending on the regression model,

(β̂ − β) / s_β̂ is distributed t(N − k), where k = the number of parameters in the regression model.

Regression demo: Enter population values into the spreadsheet; it selects a sample from the population and calculates parameter estimates based on the sample. Press F9 to select a new sample.

Example: Proponents of trade restrictions claim that free trade costs American jobs because of foreign competition. Free trade advocates claim that free trade creates American jobs because of foreign demand for American products. Using regression analysis, test the hypothesis that higher levels of unemployment accompany lower levels of trade restrictions.

1. State the regression model.

Unemp Rate_t = β₀ + β₁(Freedom of Trade_t) + u_t

Problem: We don't have a measure for freedom of trade. Solution: Greater trade freedom results in more trade, so use total trade as a proxy for freedom of trade.

Unemp Rate_t = β₀ + β₁(Total Trade_t) + u_t

Problem: Because the economy grows over time, we would expect total trade to grow over time also. Solution: Instead of looking at total trade, look at trade as a percentage of GDP. This measure tells us what percentage of total economic activity is devoted to trade.

Unemp Rate_t = β₀ + β₁(Total Trade / GDP)_t + u_t

2. Collect the data. Data Set #5 contains the following information (for the U.S., 1/92 through 3/03): (1) unemployment rate, (2) volume of exports, (3) volume of imports, and (4) gross domestic product (GDP). Calculate total trade as a % of GDP.

3. State the hypotheses. Our hypothesis is: "Higher levels of unemployment accompany lower levels of trade restrictions." The explanatory variable we are using is a proxy for freedom of trade, not trade restrictions.
Restating in terms of freedom of trade, our hypothesis becomes: "Higher levels of unemployment accompany higher levels of freedom of trade." In statistical notation (putting that claim in the null, so that rejecting it supports the free-trade advocates):

H0: β₁ ≥ 0
Ha: β₁ < 0

4. Estimate the regression parameters using OLS. The Excel output (135 observations) gives:

β̂₀ = 0.1902, s_β̂₀ = 0.0059
β̂₁ = −7.2053, s_β̂₁ = 0.3102
R² = 0.802

5. Construct the test statistic.

Test statistic = (estimate − hypothesized value) / standard deviation = (β̂₁ − β₁) / s_β̂₁ = (−7.205 − 0) / 0.310 = −23.23

6. Picture the distribution and identify the null and alternative areas: the test statistic is distributed t(133), with the alternative area in the lower tail.

7. Insert the test statistic and find the area of the alternative tail (p-value approach). Pr(t < −23.23) ≈ 0.00%, so the p-value ≈ 0.00%. The probability of our being wrong in believing that higher levels of unemployment are associated with lower levels of free trade is virtually 0%.

Note: The test statistic and a p-value are given in the Excel output; the reported p-value (1.19E-48) is for a two-tailed test.

8. Check the results by looking at a graph of the data. [Scatter plot, January 1992 – March 2003: trade as a % of GDP (1.5%–2.3%) on the horizontal axis vs. the unemployment rate (0%–9%) on the vertical axis.]
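A Python sketch of step 7 from the reported coefficient and standard error (scipy; the values are those reported above):

from scipy.stats import t

beta_hat, se, df = -7.2053, 0.3102, 133
tstat = (beta_hat - 0) / se
print(tstat)                   # approx. -23.23
print(t.cdf(tstat, df))        # one-tailed p-value, effectively 0
print(2 * t.cdf(tstat, df))    # two-tailed p-value, matching Excel's approx. 1.19e-48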
Correlation vs. Causation

Our results only indicate that higher levels of free trade are associated with lower levels of unemployment. The results do not say anything about causality.

Example: The incidence of alarm clocks going off is strongly associated with the rising of the sun. However, this does not mean that alarm clocks cause the sun to rise. The relationship is correlational, not causal.

Example: Could it be that the relationship between free trade and the unemployment rate is reverse causal? Perhaps lower levels of unemployment cause higher levels of trade rather than higher levels of trade causing lower levels of unemployment.

One way to check for causality (though, technically, this is not a rigorous test) is to look for a relationship that spans time. Example: If a higher level of free trade causes a lower level of unemployment, then past trade levels should be negatively related to future unemployment levels. To run this (quasi) test for causality, let us alter our regression model as follows:

Unemp Rate_t = β₀ + β₁(Total Trade / GDP)_{t−6} + u_t

The unemployment rate today is a function of trade six months ago.

H0: β₁ ≥ 0
Ha: β₁ < 0

The estimates (129 observations): β̂₀ = 0.1685 (s = 0.0068); β̂₁ = −6.1094 (s = 0.3592); R² = 0.695.

Test statistic = (β̂₁ − β₁) / s_β̂₁ = (−6.109 − 0) / 0.359 = −17.01

The probability of wrongly rejecting the null hypothesis is (virtually) 0%.

Notice that our regression model is expressed in terms of levels: the regression assumes that the level of the unemployment rate is a function of the level of trade (as a % of GDP). Another way to test for causality is to look at the relationship between changes instead of levels of the data. Such a relationship would assume that the change in the unemployment rate is a function of the change in trade (as a % of GDP). The level relationship says: "When trade is high, unemployment is low." The change relationship says: "When trade increases, unemployment decreases."
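Constructing the changes and the six-month lag is mostly a data-alignment exercise. A Python sketch with pandas (the column names are mine) that mirrors the table shown below:

import pandas as pd

df = pd.DataFrame({
    "unemp":     [0.073, 0.074, 0.074, 0.074, 0.076, 0.078, 0.077, 0.076],
    "trade_gdp": [0.01651, 0.01664, 0.01643, 0.01649, 0.01650, 0.01684, 0.01701, 0.01642],
})  # first rows of the data set

df["d_unemp"] = df["unemp"].diff()                     # change from month t-1 to month t
df["d_trade_lag6"] = df["trade_gdp"].diff().shift(6)   # change in trade, lagged 6 months
print(df.dropna())   # rows without a matching lagged observation are discarded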
We use a capital delta (Δ) to signify "change." By convention, a delta in front of a variable indicates the change from the previous observation to the current observation.

Regression model: ΔUnemp Rate_t = β₀ + β₁ Δ(Total Trade / GDP)_{t−6} + u_t

The regression model shown above assumes that the change in unemployment from time t−1 to time t is a function of the change in total trade (as a % of GDP) from time t−7 to time t−6.

When computing changes and taking lags, be extremely careful not to make errors in lining up the data with the dates. We must discard observations for which there are no matching observations in the explanatory variable. The table below shows the first few rows of Data Set #5 after the appropriate changes and lags have been made (the one-month-lagged level columns used to compute the changes are omitted here, since they are just shifted copies of the level columns). The outcome variable is ΔUnemployment_t; the explanatory variable is Δ(Trade/GDP)_{t−6}.

Date    Unemp_t  (Trade/GDP)_t  ΔUnemp_t  Δ(Trade/GDP)_t  Δ(Trade/GDP)_{t−6}
Jan-92  0.073   0.01651   –       –         –
Feb-92  0.074   0.01664   0.001   0.00013   –
Mar-92  0.074   0.01643   0.000  −0.00020   –
Apr-92  0.074   0.01649   0.000   0.00005   –
May-92  0.076   0.01650   0.002   0.00002   –
Jun-92  0.078   0.01684   0.002   0.00034   –
Jul-92  0.077   0.01701  −0.001   0.00017   –
Aug-92  0.076   0.01642  −0.001  −0.00059   0.00013
Sep-92  0.076   0.01673   0.000   0.00032  −0.00020
Oct-92  0.073   0.01697  −0.003   0.00024   0.00005
Nov-92  0.074   0.01671   0.001  −0.00026   0.00002
Dec-92  0.074   0.01674   0.000   0.00002   0.00034
Jan-93  0.073   0.01664  −0.001  −0.00010   0.00017
Feb-93  0.071   0.01648  −0.002  −0.00017  −0.00059
Mar-93  0.070   0.01709  −0.001   0.00062   0.00032
Apr-93  0.071   0.01706   0.001  −0.00003   0.00024

H0: β₁ ≥ 0
Ha: β₁ < 0

The estimates (128 observations): β̂₀ = −0.000129 (s = 0.000124); β̂₁ = −0.9781 (s = 0.4749); R² = 0.0326.

Test statistic = (β̂₁ − β₁) / s_β̂₁ = (−0.978 − 0) / 0.4749 = −2.059

With 126 degrees of freedom, Pr(t < −2.059) = 2.08%, so the probability of being incorrect in rejecting the null hypothesis is 2.1%. (Warning: Excel's reported p-value, 0.0415, is a two-tailed p-value.)

Conclusion: The data support the proposition that an increase in trade (as a % of GDP) today is associated with a decrease in the unemployment rate six months later.

Regression Analysis: Applications

Applications of regression analysis: (1) impact study, and (2) prediction.

Impact study: Impact studies are concerned with measuring the impact of explanatory variables on an outcome variable.
Whether or not the resultant regression model adequately predicts the outcome variable is (for the most part) inconsequential.

Prediction: Prediction models are concerned with accounting for as many sources of influence on the outcome variable as possible. The more sources of influence that can be accounted for, the better the model is able to predict the outcome variable. To what extent the explanatory variables impact the outcome variable is (for the most part) inconsequential.

R² measures the proportion of variation in the outcome variable that is accounted for by variations in the explanatory variables. Example: In our regression model, fluctuations in the change in our trade measure (lagged 6 months) account for 3.3% of the fluctuations in the change in the unemployment rate (R² = 0.033).

If our model accounts for 3.3% of the fluctuations in changes in the unemployment rate, then the remaining 96.7% of the fluctuations are unaccounted for. Remember that the error term represents all factors that influence changes in unemployment other than those explicitly appearing in the model.

We have said two (apparently) contradictory things:
1. The slope coefficient is non-zero: changes in trade significantly affect changes in unemployment.
2. The R² is small: fluctuations in changes in trade account for only 3% of fluctuations in changes in unemployment.

These two statements are not contradictory because the slope coefficient and the R² measure different things. What the results tell us is that the influence of trade on unemployment is consistent enough to be detected against the background noise. However, the background noise is extremely loud.

To illustrate, consider two simulated data sets generated from the same model, Y_t = β₀ + β₁X_t + u_t, differing only in the amount of noise. With σ_u = 0.5, the slope estimate has standard deviation s_β̂₁ = 0.08 and R² = 0.72; with σ_u = 1.0, s_β̂₁ = 0.16 and R² = 0.44. More noise weakens both the test statistic and the R² (the reported test statistics are 11.09 and 5.55), yet the slope remains strongly significant in both cases. (A simulation sketch follows below.)
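A Python sketch of such a simulation (all specific values here, including β₀ = 1 and β₁ = 0.9, are my illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 4, 100)
b0, b1 = 1.0, 0.9   # hypothetical true parameters

for sigma_u in (0.5, 1.0):
    y = b0 + b1 * x + rng.normal(0, sigma_u, x.size)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    r2 = 1 - resid.var() / y.var()
    print(sigma_u, round(slope, 3), round(r2, 2))   # more noise: noisier slope, lower R^2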
Example: A trucking company wants to be able to predict the round-trip travel time of its trucks. Data Set #6 contains historical information on miles traveled, number of deliveries per trip, and total travel time. Use the information to predict a truck's round-trip travel time.

Miles Traveled   Deliveries   Travel Time (hours)
500              4            11.3
250              3            6.8
500              4            10.9
500              2            8.5
250              2            6.2
400              2            8.2
375              3            9.4
325              4            8.0
450              3            9.6
450              2            8.1

Approach #1: Calculate average time per mile. Trucks in the data set required a total of 87 hours to travel a total of 4,000 miles. Dividing hours by miles gives an average of roughly 0.02 hours per mile journeyed.

Problem: This approach ignores a possible fixed effect. For example, if travel time is measured starting from the time that outbound goods begin loading, then there will be some fixed time (the time it takes to load the truck) tacked on to all of the trips. For longer trips this fixed time will be "amortized" over more miles and will have less of an impact on the time/mile ratio than for shorter trips. This approach also ignores the impact of the number of deliveries.

Approach #2: Calculate average time per mile and average time per delivery. Trucks in the data set averaged 87 / 4,000 ≈ 0.02 hours per mile journeyed, and 87 / 29 = 3 hours per delivery.

Problem: Like the previous approach, this approach ignores a possible fixed effect. It does account for the impact of both miles and deliveries, but it ignores the possible interaction between miles and deliveries. For example, trucks that travel more miles likely also make more deliveries. Therefore, when we combine the time/mile and time/delivery measures, we may be double-counting time.

Approach #3: Regress time on miles.

$\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + u_i$

The regression model will detect and isolate any fixed effect.

Problem: The model ignores the impact of the number of deliveries. For example, a 500-mile journey with 4 deliveries will take longer than a 500-mile journey with 1 delivery.
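A sketch of approaches #1 through #3, using the ten observations shown above. The printed values should match the averages and the fitted simple regression reported in these notes:

```python
import numpy as np

miles = np.array([500, 250, 500, 500, 250, 400, 375, 325, 450, 450])
deliveries = np.array([4, 3, 4, 2, 2, 2, 3, 4, 3, 2])
time = np.array([11.3, 6.8, 10.9, 8.5, 6.2, 8.2, 9.4, 8.0, 9.6, 8.1])

# Approach #1 and #2: simple ratios.
print(time.sum() / miles.sum())       # ~0.022 hours per mile
print(time.sum() / deliveries.sum())  # 3.0 hours per delivery

# Approach #3: Time_i = b0 + b1*miles_i + u_i. The intercept b0 captures
# any fixed (loading-time) effect.
b1, b0 = np.polyfit(miles, time, 1)
print(b0, b1)                         # ~3.27 and ~0.0136 (R^2 ~ 0.66)
```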
Approach #4: Regress time on deliveries.

$\text{Time}_i = \beta_0 + \beta_1(\text{deliveries}_i) + u_i$

The regression model will detect and isolate any fixed effect and will account for the impact of the number of deliveries.

Problem: The model ignores the impact of miles traveled. For example, a 500-mile journey with 4 deliveries will take longer than a 200-mile journey with 4 deliveries.

Approach #5: Regress time on both miles and deliveries.

$\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + \beta_2(\text{deliveries}_i) + u_i$

The multiple regression model (1) will detect and isolate any fixed effect, (2) will account for the impact of the number of deliveries, (3) will account for the impact of miles, and (4) will eliminate the overlapping effects of miles and deliveries.

Estimated regression model:

$\widehat{\text{Time}}_i = \hat\beta_0 + \hat\beta_1(\text{miles}_i) + \hat\beta_2(\text{deliveries}_i)$

$\hat\beta_0 = 1.13\ (0.952)\ [0.2732]$, $\hat\beta_1 = 0.0122\ (0.0020)\ [0.0005]$, $\hat\beta_2 = 0.92\ (0.221)\ [0.0042]$, $R^2 = 0.90$

Standard deviations of parameter estimates and p-values are typically shown in parentheses and brackets, respectively, near the parameter estimates.

Regression statistics: R^2 = 0.9038, adjusted R^2 = 0.8763, standard error = 0.5731, observations = 10; F = 32.88 (significance F = 0.00028).

             Coefficient   Std. error   t stat   p-value
Intercept    1.1313        0.9515       1.189    0.2732
Miles        0.0122        0.0020       6.182    0.0005
Deliveries   0.9234        0.2211       4.176    0.0042

Notes on results:
1. The constant is not significantly different from zero.
2. The slope coefficients are significantly different from zero.
3. Variation in miles and deliveries, together, accounts for 90% of the variation in time.
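A sketch of approach #5 on the same ten observations; the printed values should reproduce the estimates reported above (intercept ≈ 1.13, miles ≈ 0.0122, deliveries ≈ 0.92, R^2 ≈ 0.904):

```python
import numpy as np

miles = np.array([500, 250, 500, 500, 250, 400, 375, 325, 450, 450])
deliveries = np.array([4, 3, 4, 2, 2, 2, 3, 4, 3, 2])
time = np.array([11.3, 6.8, 10.9, 8.5, 6.2, 8.2, 9.4, 8.0, 9.6, 8.1])

# Time_i = b0 + b1*miles_i + b2*deliveries_i + u_i
X = np.column_stack([np.ones(len(time)), miles, deliveries])
beta, *_ = np.linalg.lstsq(X, time, rcond=None)
resid = time - X @ beta
r2 = 1 - resid @ resid / ((time - time.mean()) @ (time - time.mean()))
print(beta)  # ~[1.13, 0.0122, 0.92]
print(r2)    # ~0.904
```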
The parameter estimates are measures of the marginal impact of the explanatory variables on the outcome variable. Marginal impact measures the impact of one explanatory variable after the impacts of all the other explanatory variables are filtered out.

Marginal impacts of the explanatory variables:
0.01 = increase in time (hours) given an increase of 1 mile traveled.
0.92 = increase in time (hours) given an increase of 1 delivery.

Prediction

Example: Use the information in Data Set #6 to predict the round-trip travel time for a truck that is traveling 600 miles and making 1 delivery. (The predictions below use the rounded coefficient estimates.)

Approach #1: Prediction based on average time per mile.
$\widehat{\text{Time}}_i = (\text{average hours per mile})(\text{miles}_i) = 0.02(600) = 12$ hours

Approach #2: Prediction based on average time per mile and average time per delivery.
$\widehat{\text{Time}}_i = 0.02(600) + 3(1) = 15$ hours

Approach #3: Prediction based on the simple regression of time on miles.
$\widehat{\text{Time}}_i = \hat\beta_0 + \hat\beta_1(\text{miles}_i) = 3.27 + 0.01(600) = 9.3$ hours

Approach #4: Prediction based on the simple regression of time on deliveries.
$\widehat{\text{Time}}_i = \hat\beta_0 + \hat\beta_1(\text{deliveries}_i) = 5.38 + 1.14(1) = 6.5$ hours

[Bar chart: predicted travel time (hours) for each approach.]
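The prediction arithmetic for the four approaches so far, as a small script (using the rounded coefficient estimates reported above):

```python
# Predictions for a 600-mile trip with 1 delivery.
miles_new, deliveries_new = 600, 1

pred1 = 0.02 * miles_new                        # Approach #1: 12 hours
pred2 = 0.02 * miles_new + 3 * deliveries_new   # Approach #2: 15 hours
pred3 = 3.27 + 0.01 * miles_new                 # Approach #3: ~9.3 hours
pred4 = 5.38 + 1.14 * deliveries_new            # Approach #4: ~6.5 hours
print(pred1, pred2, pred3, pred4)
```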
Approach #5: Prediction based on the multiple regression of time on miles and deliveries.
$\widehat{\text{Time}}_i = \hat\beta_0 + \hat\beta_1(\text{miles}_i) + \hat\beta_2(\text{deliveries}_i) = 1.13 + 0.01(600) + 0.92(1) = 8.1$ hours

Prediction and Goodness of Fit

Compare the R^2 (goodness of fit) from the three regression models (approaches #3, #4, and #5):

Approach #3: $\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + u_i$, R^2 = 0.66
Approach #4: $\text{Time}_i = \beta_0 + \beta_1(\text{deliveries}_i) + u_i$, R^2 = 0.38
Approach #5: $\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + \beta_2(\text{deliveries}_i) + u_i$, R^2 = 0.90

In approach #3, 66% of the variation in time is explained. This leaves 34% of the variation in time unexplained and, therefore, unpredictable. In approach #4, only 38% of the variation in time is explained, leaving 62% unexplained and unpredictable. In approach #5, 90% of the variation in time is explained, leaving only 10% unexplained and unpredictable.

In the table below, we have added a new explanatory variable (called Random; see Data Set #7) that contains randomly derived numbers. Because the numbers are random, they have no impact on the dependent variable.

Miles Traveled   Deliveries   Random   Travel Time (hours)
500              4            0.087    11.3
250              3            0.002    6.8
500              4            0.794    10.9
500              2            0.910    8.5
250              2            0.606    6.2
400              2            0.239    8.2
375              3            0.265    9.4
325              4            0.842    8.0
450              3            0.662    9.6
450              2            0.825    8.1

Estimate the following regression model:

$\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + \beta_2(\text{deliveries}_i) + \beta_3(\text{random}_i) + u_i$

Notice that the goodness-of-fit measure increases from 0.904 (in approach #5) to 0.909 (R^2 = 0.9088, adjusted R^2 = 0.8631, standard error = 0.6029, observations = 10). This would seem to indicate that this model provides a better fit than did approach #5.

It turns out that every time you add an explanatory variable, the R^2 increases. This is because OLS looks for any portion of the remaining noise that the new variable can explain. At the very worst, OLS will find no explanatory power to attribute to the new variable and the R^2 will not change; adding another explanatory variable never causes R^2 to fall.
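A sketch verifying this on the data above: refitting with and without the random column, R^2 can only stay the same or rise.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on X (X includes the constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

miles = np.array([500, 250, 500, 500, 250, 400, 375, 325, 450, 450])
deliveries = np.array([4, 3, 4, 2, 2, 2, 3, 4, 3, 2])
random_x = np.array([0.087, 0.002, 0.794, 0.910, 0.606,
                     0.239, 0.265, 0.842, 0.662, 0.825])
time = np.array([11.3, 6.8, 10.9, 8.5, 6.2, 8.2, 9.4, 8.0, 9.6, 8.1])

X2 = np.column_stack([np.ones(10), miles, deliveries])
X3 = np.column_stack([np.ones(10), miles, deliveries, random_x])
print(r_squared(X2, time))  # ~0.904
print(r_squared(X3, time))  # ~0.909, never lower than the model above
```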
             Coefficient   Std. error   t stat   p-value
Intercept    1.4559        1.1509       1.265    0.2528
Miles        0.0122        0.0021       5.854    0.0011
Deliveries   0.8943        0.2381       3.756    0.0094
Random       -0.3943       0.6902       -0.571   0.5885

To determine whether or not a new explanatory variable adds anything of substance, we look at the adjusted R^2. The adjusted R^2 includes a penalty for adding more explanatory variables. Approach #5 had an adjusted R^2 of 0.876. When we added the random explanatory variable, the adjusted R^2 dropped to 0.863. This indicates that the extra explanatory power the new variable adds does not make up for the loss in degrees of freedom from adding the variable to the model. Therefore, your model is actually improved by leaving the new variable out.

Technical notes on R^2 and adjusted R^2:
1. Regardless of the number of explanatory variables, R^2 always measures the proportion of variation in the outcome variable explained by variations in the explanatory variables.
2. You cannot compare R^2's or adjusted R^2's from two models that use different outcome variables.
3. Adjusted R^2 is often written as $\bar{R}^2$.

Properties of OLS Parameter Estimates

Provided the data you are analyzing are well behaved, the parameter estimates that you obtain via the OLS procedure have the following properties:
1. Unbiasedness
2. Consistency
3. Efficiency

1. The parameter estimates are unbiased. An estimate is unbiased when the expected value of the estimate is equal to the parameter the estimate intends to measure.

Example: Consider rolling a die. The population mean of the die rolls is 3.5. Suppose we take a sample of N rolls of the die, and let $X_i$ be the i-th die roll. We then estimate the population mean via the equation

Parameter Estimator #1: $\dfrac{1}{N}\sum_{i=1}^{N} X_i$

Parameter Estimator #1 is unbiased because, on average, it will equal 3.5. Suppose we use a different equation, called Parameter Estimator #2, to estimate the population mean of the die rolls:

Parameter Estimator #2: $\dfrac{1}{N+1}\sum_{i=1}^{N} X_i$

Parameter Estimator #2 is biased because, on average, it will be less than 3.5.
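A quick Monte Carlo sketch of the bias claim: average each estimator over many simulated samples of N die rolls.

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 10, 100_000
rolls = rng.integers(1, 7, size=(trials, N))   # die rolls, values 1..6

est1 = rolls.sum(axis=1) / N        # (1/N) * sum(X_i)
est2 = rolls.sum(axis=1) / (N + 1)  # (1/(N+1)) * sum(X_i)
print(est1.mean())  # ~3.5  (unbiased)
print(est2.mean())  # ~3.18 (biased below 3.5 by the factor N/(N+1))
```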
2. The parameter estimates are consistent. An estimate is consistent when the expected difference between the estimate and the population parameter decreases as the sample size increases.

Example: Parameter Estimator #1 is unbiased and is also consistent because the estimate comes closer to 3.5 (on average) as N increases. Similarly, Parameter Estimator #2 is biased but consistent: it is, on average, less than 3.5, but as the number of observations increases it becomes closer (on average) to 3.5.

Suppose we use a different equation, called Parameter Estimator #3, to estimate the population mean of the die rolls. For the i-th die roll:

Parameter Estimator #3: 1 if i is odd, 6 if i is even

Parameter Estimator #3 is unbiased because, on average, it will equal 3.5. But Parameter Estimator #3 is inconsistent because, as the sample size increases, the estimator does not come closer to the population parameter of 3.5.

3. The parameter estimates are efficient. An estimate is efficient when it has the lowest achievable standard deviation among all linear, unbiased estimators.

Example: Suppose we use Parameter Estimator #4, which multiplies the N observations together and then takes the N-th root of the product:

Parameter Estimator #4: $0.5 + \left(\prod_{i=1}^{N} X_i\right)^{1/N}$

Parameter Estimator #4 is unbiased because, on average, it will equal 3.5. Parameter Estimator #4 is consistent because, as the sample size increases, the estimator comes closer (on average) to the population parameter of 3.5. But Parameter Estimator #4 is inefficient because its standard deviation is not the minimum achievable standard deviation: Parameter Estimator #1 has a lower standard deviation.
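A Monte Carlo sketch comparing the spreads. The exact algebraic form of Estimator #4 above is a reconstruction from the slide's verbal description, so treat the second estimator here as illustrative; the point is only that its sampling standard deviation exceeds that of the sample mean.

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 10, 100_000
rolls = rng.integers(1, 7, size=(trials, N))

est1 = rolls.mean(axis=1)                        # Estimator #1
est4 = 0.5 + np.exp(np.log(rolls).mean(axis=1))  # 0.5 + N-th root of product

print(est1.mean(), est1.std())  # ~3.5, spread ~0.54
print(est4.mean(), est4.std())  # ~3.5, visibly larger spread than est1
```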
Summary of properties of OLS parameter estimates (assuming well-behaved data):

Unbiasedness: $E(\hat\beta) = \beta$
Consistency: $\text{plim}(\hat\beta) = \beta$
Efficiency: $s_{\hat\beta}$ is the minimum over all linear, unbiased estimators of $\beta$

Let $\bar{X}$ be a sample estimator for the population mean, $\mu$.

Unbiased and consistent: $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$; $E(\bar{X}) = \mu$, and $E(|\bar{X} - \mu|)$ approaches zero as N increases.

Unbiased and inconsistent: $\bar{X} = 1$ if i is odd, 6 if i is even; $E(\bar{X}) = \mu$, but $E(|\bar{X} - \mu|)$ does not approach zero as N increases.

Biased and consistent: $\bar{X} = \frac{1}{N+1}\sum_{i=1}^{N} X_i$; $E(\bar{X}) \ne \mu$, but $E(|\bar{X} - \mu|)$ approaches zero as N increases.

Biased and inconsistent: $\bar{X} = 3 + \frac{1}{N}\sum_{i=1}^{N} X_i$; $E(\bar{X}) \ne \mu$, and $E(|\bar{X} - \mu|)$ does not approach zero as N increases.

What does well behaved mean? Well behaved is a shorthand term meaning "the data conform to all the applicable assumptions." The full scope of the OLS assumptions is beyond the scope of this course. Some of the assumptions are:
1. The error term is normally distributed.
2. The error term has a population mean of zero.
3. The error term has a population variance that is constant and finite.
4. Past values of the error term are unrelated to future values of the error term.
5. The underlying relationship between the outcome and explanatory variables is linear.
6. The explanatory variables are not measured with error.
7. There are no relevant explanatory variables excluded from the regression model.
8. There are no irrelevant explanatory variables included in the regression model.
9. The regression parameters do not change over the sample.

Statistical Anomalies

We will look at a few of the more egregious violations of the OLS assumptions (called statistical anomalies). Statistical anomalies cause OLS parameter estimates to no longer be unbiased, consistent, and efficient. Our goal is to:
1. Recognize the impact of the anomalies on the regression results.
2. Test for the presence of statistical anomalies.
3. Correct for the statistical anomalies.

We will cover the anomalies in their (approximate) order of severity. Note that some of these anomalies are specific to either time-series or cross-sectional data.

Time-series: Data are indexed by time; the order of the data matters.
Cross-sectional: Data are not indexed by time; the order of the data does not matter.

Non-Stationarity

Non-stationarity (also called a unit root) occurs when at least one of the variables in a time-series model has an infinite population variance.

Example: Stock prices are non-stationary. If you plot the Dow Jones Industrial Average (see Data Set #8), you will see that stock prices follow a trend. Data series that follow trends have infinite population variances.

[Figure: Dow Jones Industrial Average, 1896 through 2000, trending upward from near zero to roughly 11,000.]

The chart below shows the standard deviation of the DJIA from 1896 to the indicated date. Because the DJIA follows a trend, the standard deviation increases over time. This means that the population standard deviation is infinite.
[Figure: standard deviation of the DJIA from 1896 to the indicated date, rising steadily through 2000.]

Implications of non-stationarity:
1. Parameter estimates are biased and inconsistent.
2. Standard deviations of parameter estimates are biased and inconsistent.
3. The R^2 measure is biased and inconsistent.
4. These results hold for all parameter estimates, regardless of which variable(s) is (are) non-stationary.

The implications indicate that, in the presence of non-stationarity, none of the OLS results are useful to us. This makes non-stationarity one of the most severe of the statistical anomalies.

Example of the implications of non-stationarity: Using Data Set #8, estimate the following regression model.

$\text{DJIA}_t = \beta_0 + \beta_1\,\text{DJIA}_{t-1} + u_t$

You should get the results shown below. Note that the results seem too good to be true:
1. The R^2 measure is very close to 1: the model explains virtually all of the variation in the DJIA.
2. Some of the p-values are exceptionally close to zero: the probability of the slope coefficient equaling zero is about the same as the probability of six killer asteroids all hitting the Earth within the next 60 seconds.

Regression statistics: R^2 = 0.9705, adjusted R^2 = 0.9702, standard error = 392.81, observations = 105; F = 3385 (significance F = 1.3E-80).

             Coefficient   Std. error   t stat   p-value
Intercept    25.368        42.798       0.593    0.5547
DJIA_{t-1}   1.0680        0.0184       58.18    1.3E-80

To see the impact of non-stationarity, split the data set into three parts and estimate the regression model for each subset:

1896 through 1931: R^2 = 0.6979, adjusted R^2 = 0.6888, standard error = 32.42, observations = 35.
  Intercept: 19.811 (10.862), t = 1.824, p = 0.0772
  DJIA_{t-1}: 0.8266 (0.0947), t = 8.732, p = 4.3E-10

1896 through 1966: R^2 = 0.9579, adjusted R^2 = 0.9572, standard error = 45.10, observations = 70.
  Intercept: 1.744 (7.660), t = 0.228, p = 0.8206
  DJIA_{t-1}: 1.0511 (0.0267), t = 39.31, p = 1.7E-48

1896 through 2002: R^2 = 0.9705, adjusted R^2 = 0.9702, standard error = 392.81, observations = 105.
  Intercept: 25.368 (42.798), t = 0.593, p = 0.5547
  DJIA_{t-1}: 1.0680 (0.0184), t = 58.18, p = 1.3E-80

As we add observations, R^2 approaches one and the reported uncertainty approaches zero.
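You can reproduce this pattern with any trending series. A sketch on a synthetic random walk (not the DJIA data):

```python
import numpy as np

# Regress a pure random walk on its own lag. Even though each step is pure
# noise, the reported R^2 is large and typically creeps toward 1 as the
# sample grows.
rng = np.random.default_rng(2)
walk = np.cumsum(rng.normal(size=2000))   # non-stationary series

for n in (100, 500, 2000):
    y, x = walk[1:n], walk[:n - 1]
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    print(n, round(r2, 4))
```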
The implication is that, eventually, we would be able to predict next year's DJIA from this year's DJIA with absolute certainty. We call these results spurious because they appear reasonable but are really the result of a statistical anomaly, not an underlying statistical relationship.

Detecting non-stationarity:
1. For each variable in the model: (a) regress the variable on itself lagged one period, (b) regress the variable on a constant term and itself lagged one period, and (c) regress the variable on a constant term, a time trend, and itself lagged one period.
2. Test the null hypothesis that the absolute value of the coefficient on the lagged variable is greater than or equal to 1. If the slope coefficient is greater than or equal to one (in absolute value) for any of the three regressions, then the variable is non-stationary.

Note: This is only an approximate test. Because this test assumes non-stationarity, the test statistic is not t-distributed but tau-distributed. As the tau distribution is beyond the scope of this course, you can use the t-distribution as an approximation. Note also that the tails of the tau distribution are fatter than the tails of the t-distribution; therefore, if you fail to reject the null (in step 2 above) using the t-distribution, you would also fail to reject the null using the tau distribution.

Example: Test the DJIA for non-stationarity using

$\text{DJIA}_t = \beta_1\,\text{DJIA}_{t-1} + u_t$

(no constant term). The estimated slope is 1.0728 with standard error 0.0164 (observations = 105). Testing the null hypothesis that the slope is greater than or equal to one gives a test statistic of (1.0728 - 1)/0.0164 = 4.439: the estimated slope sits well above one, so we cannot reject the null. We therefore conclude that the DJIA is non-stationary. Because the DJIA is non-stationary, any regression including the DJIA produces biased and inconsistent results.

Correcting for non-stationarity:
1. Remove the trend from the non-stationary variable by: (a) taking the first difference, (b) taking the natural log, (c) taking the percentage change, or (d) taking the second difference.
2. Test the transformed version of the variable to verify that it is now stationary.
3. Re-run the regression using the transformed version of the variable.

Note: If you have a model in which one of the variables is non-stationary and another is not, you need only perform this transformation on the non-stationary variable. However, it is often easier to interpret the results if you perform the same transformation on all the variables in the model.
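The procedure above is an informal version of the (augmented) Dickey-Fuller test, which statsmodels implements with the proper tau critical values. A sketch on a synthetic random walk (Data Set #8 is not reproduced here):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
walk = np.cumsum(rng.normal(size=500))

stat, pvalue, *_ = adfuller(walk, regression="c")   # constant, no trend
print(stat, pvalue)   # high p-value: cannot reject a unit root

growth = np.diff(walk)   # first difference removes the stochastic trend
stat, pvalue, *_ = adfuller(growth, regression="c")
print(stat, pvalue)   # p-value near zero: the differenced series is stationary
```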
Correct the DJIA model for non-stationarity:

1. Transform the DJIA into the growth rate of the DJIA:

$\text{GDJIA}_t = \dfrac{\text{DJIA}_t - \text{DJIA}_{t-1}}{\text{DJIA}_{t-1}}$

2. Test the growth rate of the DJIA to verify that the non-stationarity has been removed (test 1 of 3: regress the dependent variable on its own lag, with no constant):

$\text{GDJIA}_t = \beta_1\,\text{GDJIA}_{t-1} + u_t$

The estimated slope is 0.0017 with standard error 0.0986 (observations = 104; the R^2 statistics Excel reports for a no-intercept regression are artifacts and can be ignored). Testing the null hypothesis that the slope is greater than or equal to one gives a test statistic of (0.0017 - 1)/0.0986 ≈ -10; the probability of a value this low under the null is virtually zero, so we reject the null and conclude that GDJIA is stationary.

3. Test 2 of 3 (regress the dependent variable on a constant and its own lag): R^2 = 0.0165, adjusted R^2 = 0.0068, standard error = 0.2219, observations = 104.
  Intercept: 0.0900 (0.0231), t = 3.890, p = 0.0002
  GDJIA_{t-1}: -0.1286 (0.0983), t = -1.308, p = 0.1939
Testing the null that the slope is greater than or equal to one gives a test statistic of (-0.1286 - 1)/0.0983 ≈ -11.5; we again reject the null and conclude that GDJIA is stationary.

4. Test 3 of 3 (regress the dependent variable on a constant, a time trend, and its own lag): R^2 = 0.0219, adjusted R^2 = 0.0026, standard error = 0.2223, observations = 104; F = 1.13 (significance F = 0.3263).
  Intercept: 0.0618 (0.0442), t = 1.400, p = 0.1647
  GDJIA_{t-1}: -0.1348 (0.0989), t = -1.363, p = 0.1759
  Time trend: 0.00055 (0.00073), t = 0.750, p = 0.4551
The test statistic is again approximately -11.5; we reject the null and conclude that GDJIA is stationary.
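A sketch of the three-regression check itself, run on a synthetic stationary growth series (a stand-in for GDJIA; the data set is not reproduced here):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
g = rng.normal(loc=0.05, scale=0.2, size=200)   # stand-in growth-rate series
y, lag = g[1:], g[:-1]
t = np.arange(len(y))

specs = {
    "lag only": lag.reshape(-1, 1),
    "constant + lag": sm.add_constant(lag),
    "constant + trend + lag": sm.add_constant(np.column_stack([t, lag])),
}
for name, X in specs.items():
    fit = sm.OLS(y, X).fit()
    b, se = fit.params[-1], fit.bse[-1]   # coefficient on the lag (last column)
    # Test statistic for H0: slope >= 1; deeply negative values reject the
    # null, indicating a stationary series.
    print(name, (b - 1) / se)
```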
Now that we know that GDJIA is stationary, we can estimate our transformed model:

$\text{GDJIA}_t = \beta_0 + \beta_1\,\text{GDJIA}_{t-1} + u_t$

Regression statistics: R^2 = 0.0165, adjusted R^2 = 0.0068, standard error = 0.2219, observations = 104.

              Coefficient   Std. error   t stat   p-value
Intercept     0.0900        0.0231       3.890    0.0002
GDJIA_{t-1}   -0.1286       0.0983       -1.308   0.1939

Our fitted model is:

$\widehat{\text{GDJIA}}_t = 0.09 - 0.129\,\text{GDJIA}_{t-1}$

Using the fitted model, predict the DJIA for 2003. With DJIA(2001) = 11,005 and DJIA(2002) = 10,104:

$\text{GDJIA}_{2002} = \dfrac{10{,}104 - 11{,}005}{11{,}005} = -0.082$

$\widehat{\text{GDJIA}}_{2003} = 0.09 - 0.1286(-0.082) = 0.101$

$\widehat{\text{DJIA}}_{2003} = \text{DJIA}_{2002}\,(1 + \widehat{\text{GDJIA}}_{2003}) = 10{,}104 \times 1.101 = 11{,}125$

As of today, the DJIA for 2003 is 9,710. This is significantly different from the prediction of 11,125. Note that the regression model has an R^2 of less than 0.02: the model fails to explain 98% of the variation in the growth rate of the DJIA.

Using the spurious model instead, $\widehat{\text{DJIA}}_t = 25.4 + 1.07\,\text{DJIA}_{t-1}$, gives

$\widehat{\text{DJIA}}_{2003} = 25.4 + 1.07(10{,}104) = 10{,}837$

Although the spurious prediction is closer to the actual value than the prediction from the stationary model, it is still extremely far from the actual considering the (reported) R^2 of 0.97:

Prediction from stationary model: 11,125 (15% overestimate)
Prediction from non-stationary model: 10,837 (12% overestimate)
Actual: 9,710

Note that it is simply random chance that the non-stationary model gave a (slightly) closer prediction. We would not necessarily expect the non-stationary model to give a better (or worse) prediction. What is important is that the non-stationary model misled us (via the high R^2 and low standard deviations) into thinking that it would produce good predictions.

Non-Linearity

Non-linearity occurs when the relationship between the outcome and explanatory variables is non-linear.

Example: Suppose that the true relationship between two variables is

$Y_i = \beta_0 + \beta_1 X_i^2 + u_i$

OLS assumes (incorrectly in this case) that the relationship between the outcome and explanatory variables is linear. When OLS attempts to find the best-fitting linear relationship, it ends up with something like the figure below.

[Figure: scatter of data generated from the quadratic model with a straight fitted line. Non-linear data can cause the fitted model to be biased in one direction at the extremes and biased in the other direction in the center.]
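A sketch of that bias pattern on synthetic data (the coefficient values are invented for the example):

```python
import numpy as np

# Data generated from Y = b0 + b1*X^2 + u, fitted with a straight line.
rng = np.random.default_rng(5)
x = np.linspace(0, 3.5, 80)
y = 1.0 + 2.0 * x**2 + rng.normal(scale=1.0, size=x.size)

b1, b0 = np.polyfit(x, y, 1)          # misspecified linear fit
resid = y - (b0 + b1 * x)

# Residuals are positive at both ends and negative in the middle: the line
# underestimates the extremes and overestimates the center.
print(resid[:10].mean(), resid[35:45].mean(), resid[-10:].mean())
```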
Implications of non-linearity:
1. Parameter estimates are biased and inconsistent.
2. Standard deviations of parameter estimates are biased and inconsistent.
3. The R^2 measure is biased and inconsistent.

The implications indicate that, in the presence of non-linearity, none of the OLS results are useful to us. Like non-stationarity, this makes non-linearity one of the most severe of the statistical anomalies.

In the regression demo, enter a 2 for the "X Exponent." The demo now generates data according to the model $Y_i = \beta_0 + \beta_1 X_i^2 + u_i$. Repeatedly press F9 and notice that the confidence interval for the slope coefficient does not include the population value: OLS is producing biased results.

Example: As Director of Human Resources, you are charged with generating estimates of the cost of labor for a firm that is opening an office in Pittsburgh. To estimate the cost of labor, you need two numbers for each job description: (1) base salary and (2) benefits. You are comfortable with your base salary estimates; you need to generate estimates for benefits. Data Set #9 contains median salary and benefits numbers for a random sampling of white-collar jobs in the Pittsburgh area. Using these data, generate a model that can be used to predict the cost of benefits given base salary.

Estimate the following model:

$\text{Benefits}_i = \beta_0 + \beta_1(\text{Salary}_i) + u_i$

Fitted model:

$\widehat{\text{Benefits}}_i = -6332 + 0.5976(\text{Salary}_i)$

According to the output, this model accounts for 82% of the variation in benefits. You expect someone earning a $90,000 base salary to cost the firm an additional -6332 + 0.5976($90,000) = $47,452 in benefits.

Regression statistics: R^2 = 0.8206, adjusted R^2 = 0.8057, standard error = 12,340, observations = 14; F = 54.90 (significance F = 8.2E-06).

            Coefficient   Std. error   t stat   p-value
Intercept   -6331.6       7782.5       -0.814   0.4317
Salary      0.5976        0.0806       7.410    8.2E-06

Now create a plot of the data and overlay the fitted regression model. Notice that the line appears to overestimate the center observations and underestimate the end observations.

[Figure: benefits versus base salary ($0 to $180,000), with the fitted regression line.]

Warning: The apparent over- and under-estimation may be due to a few outliers in the data. That is, if you obtained more data, this apparent non-linearity might go away. So, do we have non-linearity or not? Can you find a theoretical justification for non-linearity?

Yes. The value of most benefits is tied to pay (e.g., the firm contributes 5% of gross salary to a 401k). But as salary rises, the number of benefits also increases (e.g., basic health, retirement, dental, stock options, car, expense account, use of the corporate jet). Because both the value and the number of benefits increase with salary, we should expect a non-linear relationship.

What is the form of the non-linearity? We don't know, but we can try different forms and compare the R^2, as sketched below.
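A sketch of the comparison loop. Data Set #9 is not reproduced here, so the salary and benefits arrays below are illustrative placeholders; with the real data, the printed R^2 values would match the table that follows.

```python
import numpy as np

def r_squared(x_transformed, benefits):
    """R^2 from regressing benefits on a constant plus one transformed regressor."""
    X = np.column_stack([np.ones(len(benefits)), x_transformed])
    beta, *_ = np.linalg.lstsq(X, benefits, rcond=None)
    resid = benefits - X @ beta
    tss = (benefits - benefits.mean()) @ (benefits - benefits.mean())
    return 1 - resid @ resid / tss

salary = np.array([30e3, 45e3, 60e3, 75e3, 90e3, 120e3])      # placeholder
benefits = np.array([18e3, 21e3, 27e3, 35e3, 47e3, 65e3])     # placeholder

forms = {
    "salary": salary,
    "ln(salary)": np.log(salary),
    "exp(salary/100000)": np.exp(salary / 100_000),  # rescaled to avoid overflow
    "salary^-1": salary**-1.0,
    "salary^2": salary**2,
    "salary^3": salary**3,
}
for name, x in forms.items():
    print(name, round(r_squared(x, benefits), 3))
```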
Note: We can compare the R^2's from the different models because the outcome variable is the same in all the models.

Model                                                              R^2
$\text{Benefits}_i = \beta_0 + \beta_1(\text{Salary}_i) + u_i$                 0.821
$\text{Benefits}_i = \beta_0 + \beta_1 \ln(\text{Salary}_i) + u_i$             0.750
$\text{Benefits}_i = \beta_0 + \beta_1 e^{\text{Salary}_i/100{,}000} + u_i$    0.844
$\text{Benefits}_i = \beta_0 + \beta_1(\text{Salary}_i^{-1}) + u_i$            0.622
$\text{Benefits}_i = \beta_0 + \beta_1(\text{Salary}_i^{2}) + u_i$             0.843
$\text{Benefits}_i = \beta_0 + \beta_1(\text{Salary}_i^{3}) + u_i$             0.828

Note: In the exponential model, the value of exp(salary) is too large, so we first divide salary by 100,000 and then take the exponential. This scales down the slope coefficient and its standard deviation, but the ratio of the estimate to its standard deviation and the other regression results do not change.

The squared and exponential models explain more of the variation in benefits than does the linear model, and the two yield almost identical R^2's. Since the squared model is less complicated, we'll use that model to predict benefits.

Regression model: $\text{Benefits}_i = \beta_0 + \beta_1(\text{Salary}_i^2) + u_i$

Regression statistics: R^2 = 0.8430, adjusted R^2 = 0.8299, standard error = 11,546, observations = 14; F = 64.41 (significance F = 3.6E-06).

            Coefficient   Std. error   t stat   p-value
Intercept   14,734.75     4,959.97     2.971    0.0117
Salary^2    3.347E-06     4.170E-07    8.026    3.6E-06

Estimated regression model:

$\widehat{\text{Benefits}}_i = 14{,}735 + 0.0000033(\text{Salary}_i^2)$
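Evaluating the two fitted models on a grid of salaries, using the unrounded coefficient estimates from the outputs above; the printed values should match the comparison table that follows.

```python
import numpy as np

salaries = np.array([20_000, 40_000, 60_000, 80_000, 100_000])

linear = -6331.56 + 0.597574 * salaries        # linear fit
squared = 14734.75 + 3.34675e-06 * salaries**2 # preferred squared fit
for s, lin, sq in zip(salaries, linear, squared):
    print(f"${s:,}: linear ${lin:,.0f}, squared ${sq:,.0f}")
```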
Estimate the cost of benefits for the following salaries using both the linear model and the preferred non-linear model.

Salary     Linear: -6332 + 0.5976(Salary)   Non-linear: 14,735 + 0.0000033(Salary^2)
$20,000    $5,620                           $16,073
$40,000    $17,571                          $20,090
$60,000    $29,523                          $26,783
$80,000    $41,474                          $36,154
$100,000   $53,426                          $48,202

Compared to the preferred non-linear model, the linear model is biased downward at low salary levels and biased upward at high salary levels.

Regime Change

Regime change occurs when the parameters change value at one (or more) points in the data set.

Example: Conventional wisdom says that (Reagan aside) Democrats contribute to greater deficits (i.e., smaller surpluses) than do Republicans. Data Set #10 contains relevant macroeconomic data and data on the political parties in power from 1929 through 2001. Test the hypothesis (at 5% significance) that a change in control of the Congress by Democrats corresponds to a change in the Federal government surplus (as a % of GDP).

1. Generate the Federal budget surplus as a % of GDP.
2. State the regression model:

$\left(\dfrac{\text{Budget Surplus}}{\text{GDP}}\right)_t = \alpha + \beta\,(\%\text{ Congressional Seats Held by Democrats})_t + u_t$

3. Test the hypothesis: $H_0: \beta = 0$ versus $H_a: \beta \ne 0$.

The test statistic is -1.056 with 71 degrees of freedom; the critical value with 2.5% in each tail is 1.9939. Fail to reject the null.

Regression statistics: R^2 = 0.0155, adjusted R^2 = 0.0016, standard error = 0.0502, observations = 73; F = 1.11 (significance F = 0.2947).

              Coefficient   Std. error   t stat   p-value
Intercept     0.0142        0.0407       0.349    0.7282
% Democrats   -0.0732       0.0693       -1.056   0.2947

This analysis ignores the effect of war. If the country is involved in a war, it is forced to run greater deficits, regardless of which party controls the Congress. How can we account for this "war effect"? The war effect is a regime change: we are hypothesizing that, during war, the regression model changes. Let us propose two regression models.

Regression model when there is peace:
$\left(\text{Budget Surplus}/\text{GDP}\right)_t = \alpha_1 + \beta\,(\%\text{ Democrats})_t + u_t$

Regression model when there is war:
$\left(\text{Budget Surplus}/\text{GDP}\right)_t = \alpha_2 + \beta\,(\%\text{ Democrats})_t + u_t$

The models assume different "baseline" surpluses but the same marginal surpluses.

If we run two separate regressions, we not only fail to hold the marginal effects constant, but we also lose information from the observations removed from each regression.

Estimated peace-year model: R^2 = 0.2949, adjusted R^2 = 0.2806, standard error = 0.0206, observations = 51.
  Intercept: 0.0600 (0.0177), t = 3.393, p = 0.0014
  % Democrats: -0.1362 (0.0301), t = -4.528, p = 3.8E-05

Estimated war-year model: R^2 = 0.0898, adjusted R^2 = 0.0443, standard error = 0.0792, observations = 22.
  Intercept: -0.3258 (0.1971), t = -1.653, p = 0.1140
  % Democrats: 0.4742 (0.3376), t = 1.405, p = 0.1755

Another way to solve the problem is to think of the change in the baseline (the constant term) as a regime change, in which the value of the constant term is different over some subset of the data than it is over other subsets. Let us define a dummy variable:

$D_t = 1$ if year t is a war year, 0 otherwise

Using the dummy variable, we can combine our two models into one (avoiding the information loss that comes from splitting the data) and hold the marginal effect constant.
$\left(\text{Budget Surplus}/\text{GDP}\right)_t = \alpha + \gamma D_t + \beta\,(\%\text{ Congressional Seats Held by Democrats})_t + u_t$

For peace years, $D_t$ is zero: the term $\gamma D_t$ disappears, and we are left with our "peace year" model with constant $\alpha$. For war years, $D_t$ is one: the term $\gamma D_t$ becomes $\gamma$, so the constant term is $\alpha + \gamma$, the constant of the "war year" model. For both models, the marginal effect, $\beta$, is the same.

Let us test our hypothesis accounting for a possible regime shift in the constant term between war and peace years: $H_0: \beta = 0$ versus $H_a: \beta \ne 0$. The test statistic is -1.079 with 70 degrees of freedom; the critical value with 2.5% in each tail is 1.9944. Fail to reject the null.

The estimated regression model is

$\widehat{\left(\text{Budget Surplus}/\text{GDP}\right)}_t = 0.023 - 0.031 D_t - 0.072\,(\%\text{ Democrats})_t$

Regression statistics: R^2 = 0.0957, adjusted R^2 = 0.0698, standard error = 0.0485, observations = 73.

              Coefficient   Std. error   t stat   p-value
Intercept     0.0229        0.0394       0.581    0.5633
D (war)       -0.0308       0.0124       -2.492   0.0151
% Democrats   -0.0722       0.0669       -1.079   0.2844

Is there a regime change from war to peace years? If there is no regime change, then the coefficient attached to the dummy variable will be (statistically) zero. Testing $H_0: \gamma = 0$ versus $H_a: \gamma \ne 0$, the test statistic of -2.492 exceeds the critical value of 1.9944 in absolute value. Reject the hypothesis that there is no regime change.
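A sketch of the dummy-variable regression. Data Set #10 is not reproduced here, so the arrays below are simulated placeholders with invented coefficient values; with the real data, the fitted parameters would match the output above.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 73
pct_dem = rng.uniform(0.4, 0.7, size=n)   # share of seats held by Democrats
war = rng.integers(0, 2, size=n)          # D_t = 1 in war years
surplus_gdp = (0.02 - 0.03 * war - 0.07 * pct_dem
               + rng.normal(scale=0.05, size=n))

# Surplus/GDP = alpha + gamma*D + beta*(%Dem) + u
X = sm.add_constant(np.column_stack([war, pct_dem]))
fit = sm.OLS(surplus_gdp, X).fit()
print(fit.params)    # [alpha, gamma, beta]
print(fit.pvalues)   # a small p-value on gamma signals a regime change
```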
We designed our regression model to account for a possible regime change in the constant term. It is also possible that there is a regime change in the slope coefficient. The slope coefficient measures the marginal effect on the budget surplus of increasing the percentage of the Congress that is controlled by Democrats, and this marginal effect may change in war versus peace years. Consider the following model:

$\left(\text{Budget Surplus}/\text{GDP}\right)_t = \alpha + \beta\,(\%\text{ Democrats})_t + \delta\,(D_t)(\%\text{ Democrats})_t + u_t$

In peace years, $D_t = 0$, so the model becomes
$\left(\text{Budget Surplus}/\text{GDP}\right)_t = \alpha + \beta\,(\%\text{ Democrats})_t + u_t$

In war years, $D_t = 1$, so the model becomes
$\left(\text{Budget Surplus}/\text{GDP}\right)_t = \alpha + (\beta + \delta)\,(\%\text{ Democrats})_t + u_t$

To test for a regime change in the slope coefficient, we generate a new regressor that is % Democrats multiplied by the dummy variable and include this new regressor in our model. We test $H_0: \beta = 0$ and $H_0: \delta = 0$ (against two-sided alternatives) with 70 degrees of freedom and a critical value of 1.9944.

Regression statistics: R^2 = 0.0785, adjusted R^2 = 0.0522, standard error = 0.0489, observations = 73.

                  Coefficient   Std. error   t stat   p-value
Intercept         0.0190        0.0397       0.479    0.6333
% Democrats       -0.0674       0.0676       -0.997   0.3222
D x % Democrats   -0.0468       0.0214       -2.189   0.0319

We can account for both possible regime changes, in the baseline (constant term) and in the marginal effect (slope), in the same model:

$\left(\text{Budget Surplus}/\text{GDP}\right)_t = \alpha + \gamma D_t + \beta\,(\%\text{ Democrats})_t + \delta\,(D_t)(\%\text{ Democrats})_t + u_t$

We test $H_0: \gamma = 0$, $H_0: \beta = 0$, and $H_0: \delta = 0$ (against two-sided alternatives) with 69 degrees of freedom and a critical value of 1.9949. Conclusion: reject each null.

Regression statistics: R^2 = 0.1965, adjusted R^2 = 0.1616, standard error = 0.0460, observations = 73.

                  Coefficient   Std. error   t stat   p-value
Intercept         0.0600        0.0395       1.519    0.1333
D (war)           -0.3858       0.1212       -3.183   0.0022
% Democrats       -0.1362       0.0672       -2.027   0.0465
D x % Democrats   0.6104        0.2075       2.942    0.0044

Adjusting for the impact of war on the budget, the evidence suggests that an increase in Democratically controlled seats increases the budget surplus in war and increases the budget deficit in peace.

We can split our results into two estimated regression models, one to predict the impact of political party on the budget surplus in war years and the other in peace years. The estimated regression model is

$\widehat{\left(\text{Budget Surplus}/\text{GDP}\right)}_t = 0.060 - 0.386 D_t - 0.136\,(\%\text{ Democrats})_t + 0.610\,D_t\,(\%\text{ Democrats})_t$

For war years ($D_t = 1$): $\widehat{\left(\text{Budget Surplus}/\text{GDP}\right)}_t = -0.326 + 0.474\,(\%\text{ Democrats})_t$

For peace years ($D_t = 0$): $\widehat{\left(\text{Budget Surplus}/\text{GDP}\right)}_t = 0.060 - 0.136\,(\%\text{ Democrats})_t$
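A small sketch of building the interaction regressor and recovering the regime-specific models from the combined estimates reported above (the dummy and seat-share arrays are placeholders):

```python
import numpy as np

war = np.array([0, 1, 0, 1])                  # placeholder dummy values
pct_dem = np.array([0.55, 0.60, 0.48, 0.52])  # placeholder seat shares
interaction = war * pct_dem                   # the new regressor D*(%Dem)

alpha, gamma, beta, delta = 0.060, -0.386, -0.136, 0.610
print("peace model:", alpha, "+", beta, "* pct_dem")                  # 0.060 - 0.136x
print("war model:  ", alpha + gamma, "+", beta + delta, "* pct_dem")  # -0.326 + 0.474x
```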
We interpret the slope coefficients as follows: after accounting for a baseline impact of war, in peace time every 1 percentage point (0.01) increase in Democrat-controlled seats is associated with a 0.136 percentage point (0.00136) decrease in the budget surplus (relative to GDP). In war time, every 1 percentage point (0.01) increase in Democrat-controlled seats is associated with a 0.474 percentage point (0.00474) increase in the budget surplus.

To put the numbers in perspective:
1. There are currently 440 members of Congress, so replacing one Republican with a Democrat increases the share of Democrats by 0.23 percentage points (0.0023).
2. In peace time, we expect this to be associated with a -0.03% [-0.03% = (-0.136)(0.23%)] change in the surplus (relative to GDP). GDP is currently $12 trillion, so we expect every replacement of a Republican with a Democrat to cost the Federal government (0.03%)($12 trillion) = $3.6 billion.
3. In war time, the replacement of one Republican with a Democrat is associated with a +0.11% [0.11% = (0.474)(0.23%)] change in the surplus (relative to GDP), so we expect every replacement of a Republican with a Democrat to save the Federal government roughly (0.11%)($12 trillion) = $13 billion.

Implications of regime change:
1. Parameter estimates may be biased and inconsistent.
2. Standard deviations of parameter estimates may be biased and inconsistent.

If the data are time-series and the regime shift will not occur again, then parameter estimates will be biased but consistent: as more data are added, the regime shift is pushed further into the past and becomes increasingly insignificant. Unlike the cases of non-stationarity and non-linearity, the R^2 is a reliable estimator. Therefore, you can compare the adjusted R^2's of models with and without regime-change corrections to decide whether or not it is necessary to account for a regime change.

Detecting regime change:
1. Create a dummy variable that is 1 in one state and 0 in the other state.
2. Include the dummy itself as a regressor.
3. For each regressor, X, create a new regressor that is the dummy multiplied by X.
4. Include all of these new regressors in the regression.
5. Test the hypotheses that the coefficients attached to the dummy and the new regressors are zero. A parameter estimate that fails the "zero test" indicates the presence of a regime shift for that regressor (or the constant term).

Correcting for regime change (see the sketch after this list):
1. After determining which regressors (and/or the constant) are subject to regime changes, include dummies for those regressors (and/or the constant).
2. You can correct for regime change using the level approach or the deviation approach, and you can use different approaches for different regressors (and/or the constant).
3. Deviation approach: for each regressor, X, associated with a regime change, generate a new regressor, (D)(X), and include this new regressor in the regression model.
4. Level approach: for each regressor, X, associated with a regime change, generate two new regressors, (D)(X) and (1-D)(X); remove the original regressor X from the regression and replace it with these two new regressors.
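A minimal sketch of constructing the corrected design matrices under both approaches (placeholder arrays; in practice these columns come from the data set):

```python
import numpy as np

d = np.array([0, 0, 1, 1, 0])              # regime dummy
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # regressor subject to regime change

# Deviation approach: keep x and add the interaction d*x (plus the dummy).
X_dev = np.column_stack([np.ones(5), d, x, d * x])

# Level approach: replace x with its two regime-specific columns.
X_lvl = np.column_stack([np.ones(5), d, d * x, (1 - d) * x])
print(X_dev.shape, X_lvl.shape)
```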
Omitted Variables

An omitted variable is an explanatory regressor that belongs in the regression model (i.e., it has a significant impact on the outcome variable) but does not appear in the regression model.

Example: Suppose an outcome variable, Y, is determined by two explanatory variables, X and W. This gives the true regression model:

  Y_i = β0 + β1·X_i + β2·W_i + u_i

Suppose we hypothesize a different regression model that excludes W:

  Y_i = β0 + β1·X_i + u_i

When we estimate the hypothesized model, OLS will assign some of the impact that should have gone to W to the constant term and to X. This results in parameter estimates that are biased and inconsistent.

Omitted Variables

Example: Data Set #11 contains voter demographics and the percentage of voters (by voting district) who claim to have voted for a candidate in the last election. Your goal is to use the voter demographics to predict what percentage of the vote your candidate will garner in other districts. You hypothesize the following regression model:

  Votes Garnered_i = β0 + β1(Income_i) + u_i        H0: β1 = 0 vs. Ha: β1 ≠ 0

SUMMARY OUTPUT
  Multiple R          0.2355
  R Square            0.0555
  Adjusted R Square   0.0144
  Standard Error      0.0797
  Observations        25

                 Coefficients  Std Error   t Stat   P-value   Lower 95%   Upper 95%
  Intercept        0.4717       0.1152      4.095    0.0004    0.2334      0.7099
  X Variable 1     2.33E-06     2.00E-06    1.162    0.2571   -1.82E-06    6.47E-06

Looking at the marginal effect: every $1,000 increase in average income in a district implies a projected (1,000)(0.0000023) ≈ 0.2% increase in garnered votes.

Omitted Variables

Suppose that garnered votes are a function not only of average income within a district but also of the disparity of income across households. Unknown to you, the true regression model is:

  Votes Garnered_i = β0 + β1(Income_i) + β2(Income Disparity_i) + u_i

Your hypothesized model excludes Income Disparity; therefore your model suffers from the omitted variable problem. The results below are those you would have obtained had you included Income Disparity in the model.

SUMMARY OUTPUT
  Multiple R          0.5414
  R Square            0.2931
  Adjusted R Square   0.2289
  Standard Error      0.0705
  Observations        25

                 Coefficients  Std Error   t Stat   P-value   Lower 95%   Upper 95%
  Intercept        0.4483       0.1022      4.385    0.0002    0.2363      0.6604
  X Variable 1     3.92E-06     1.87E-06    2.101    0.0473    5.02E-08    7.79E-06
  X Variable 2    -6.32E-06     2.32E-06   -2.720    0.0125   -1.11E-05   -1.50E-06

Looking at the marginal effect: every $1,000 increase in average income in a district implies a projected (1,000)(0.0000039) ≈ 0.4% increase in garnered votes, roughly twice the impact you estimated. By excluding Income Disparity from your model, you force OLS to attribute some of the negative impact of Income Disparity to Income. This causes your estimate of the coefficient on Income to be biased downward.
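The bias is easy to reproduce by simulation. The sketch below uses made-up coefficients rather than the voting data: it generates data from a true two-variable model in which X and W are correlated, then fits the model with and without W. The slope on X is recovered correctly only when W is included.

```python
# A simulation sketch of omitted-variable bias; all coefficients and
# data here are illustrative assumptions, not Data Set #11.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
w = rng.normal(size=n)
x = 0.8 * w + rng.normal(size=n)                  # x and w are correlated
y = 1.0 + 2.0 * x - 3.0 * w + rng.normal(size=n)  # true model uses both

full = sm.OLS(y, sm.add_constant(np.column_stack([x, w]))).fit()
short = sm.OLS(y, sm.add_constant(x)).fit()       # omits w

print("slope on x, w included:", full.params[1])   # close to the true 2.0
print("slope on x, w omitted: ", short.params[1])  # biased well below 2.0
```

Because w's true coefficient is negative and w moves with x, the short regression pushes the slope on x downward, the same mechanism as Income Disparity biasing the Income coefficient in the example.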
Omitted Variables

Implications of omitted variables:
1. Parameter estimates may be biased and inconsistent. The higher the correlation between the omitted variable and the other variables in the model, the greater the bias and inconsistency. If the omitted variable is not correlated with one or more of the included variables, then the estimates for those variables will be unbiased and consistent.
2. Standard deviations of parameter estimates may be biased and inconsistent.

Unlike the cases of non-stationarity and non-linearity, the R² is a reliable estimator. Therefore, you can compare the adjusted R²'s of models with and without the candidate variable to decide whether or not it belongs in the model.

Omitted Variables

Detecting and correcting omitted variables:
1. If you have reason to believe that a given explanatory variable has been excluded, include the variable and test whether its coefficient is non-zero.
2. If the coefficient is non-zero, the variable should be included in the regression model.

Warning: it is possible that, by random chance, a given explanatory variable will pass the test for a non-zero coefficient when, in fact, the variable does not belong in the equation. Therefore, you should first have a theoretically justifiable reason why the variable should be included before considering inclusion.

Extraneous Variables

An extraneous variable is an explanatory regressor that does not belong in the regression model but which appears in the regression model anyway.

Example: Suppose an outcome variable, Y, is determined by one explanatory variable, X. This gives the true regression model:

  Y_i = β0 + β1·X_i + u_i

Suppose we hypothesize a different regression model that includes both X and another variable, W:

  Y_i = β0 + β1·X_i + β2·W_i + u_i

When we estimate the hypothesized model, OLS will pick up some (randomly occurring) relationship between W and Y and attribute that relationship to W when, in fact, it should be attributed to the error term, u. This results in parameter estimates that are inefficient.

Extraneous Variables

Example: Applying the following regression model to the data in Data Set #11, we obtain the results shown below.

  Votes Garnered_i = β0 + β1(Income_i) + β2(Income Disparity_i) + u_i

                 Coefficients  Std Error   t Stat   P-value   Lower 95%   Upper 95%
  Intercept        0.4483       0.1022      4.385    0.0002    0.2363      0.6604
  X Variable 1     3.92E-06     1.87E-06    2.101    0.0473    5.02E-08    7.79E-06
  X Variable 2    -6.32E-06     2.32E-06   -2.720    0.0125   -1.11E-05   -1.50E-06

We can generate a third variable consisting of randomly selected numbers and include it in the regression. Because this third variable does not impact the outcome variable, it is extraneous. The results of this regression are shown below.

  Votes Garnered_i = β0 + β1(Income_i) + β2(Income Disparity_i) + β3(Random_i) + u_i

                 Coefficients  Std Error   t Stat   P-value   Lower 95%   Upper 95%
  Intercept        0.4489       0.1051      4.270    0.0003    0.2303      0.6675
  X Variable 1     3.94E-06     1.94E-06    2.030    0.0552   -9.61E-08    7.97E-06
  X Variable 2    -6.34E-06     2.41E-06   -2.635    0.0155   -1.13E-05   -1.34E-06
  X Variable 3    -0.0026       0.0465     -0.056    0.9556   -0.0993      0.0941

The presence of an extraneous variable increases the standard errors of the parameter estimates.
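The same experiment is easy to run on simulated data. The sketch below uses illustrative values, not Data Set #11: it fits a true one-variable model with and without a pure-noise regressor. The slope estimate on X stays centered on its true value either way (unbiased), but its standard error typically grows.

```python
# A simulation sketch of an extraneous variable; data are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 25
x = rng.normal(size=n)
y = 0.5 + 2.0 * x + rng.normal(size=n)   # true model: x only
noise = rng.normal(size=n)               # extraneous regressor

lean = sm.OLS(y, sm.add_constant(x)).fit()
padded = sm.OLS(y, sm.add_constant(np.column_stack([x, noise]))).fit()

# The slope on x remains near 2.0 in both fits, but its standard
# error is typically larger once the extraneous column is included.
print("SE of slope, x only:   ", lean.bse[1])
print("SE of slope, x + noise:", padded.bse[1])
```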
Extraneous Variables

Implications of extraneous variables:
1. Parameter estimates are unbiased and consistent.
2. Parameter estimates are inefficient.

Because the implications of extraneous variables are much less onerous than those of omitted variables, when in doubt as to whether to include a given explanatory variable in a model, it is usually wise to err on the side of including rather than excluding.

Extraneous Variables

Detecting and correcting extraneous variables:
1. If you have reason to believe that a given explanatory variable is extraneous, test whether the coefficient attached to the variable is (statistically) zero.
2. If the coefficient is zero, the variable should be excluded from the regression model.

Warning: it is possible that, by random chance, a given explanatory variable will pass the test for a zero coefficient when, in fact, the variable does belong in the equation. Therefore, if you have a theoretically justifiable reason why the variable should be included in the model, you may want to leave it in even if its coefficient tests as zero. If the variable truly does influence the outcome variable, the coefficient may come up as non-zero with different sample data.

Multicollinearity

Multicollinearity occurs when two or more of the explanatory variables are correlated.

Example: Data Set #12 contains clinical trial data for a new blood pressure drug. Using the data, estimate the following regression model:

  Blood Pressure_i = β0 + β1(Dosage_i) + β2(Reported Stress_i) + β3(Daily Caffeine Intake_i) + u_i

SUMMARY OUTPUT
  Multiple R          0.6751
  R Square            0.4558
  Adjusted R Square   0.4203
  Standard Error     17.2624
  Observations        50

                 Coefficients  Std Error   t Stat   P-value    Lower 95%  Upper 95%
  Intercept        111.7046     10.9194    10.230    1.97E-13   89.7249   133.6842
  X Variable 1      -0.1054      0.0429    -2.456    0.0179     -0.1917    -0.0190
  X Variable 2       3.5497      1.9508     1.820    0.0753     -0.3771     7.4764
  X Variable 3       1.5572      1.7806     0.875    0.3864     -2.0269     5.1413

Dosage of the drug appears to have a strongly significant impact on blood pressure (p = 0.02). Stress appears to have a slightly significant effect on blood pressure (p = 0.08). Caffeine intake appears not to affect blood pressure (p = 0.39).

Multicollinearity

Example: Now estimate the model with Daily Caffeine Intake removed:

  Blood Pressure_i = β0 + β1(Dosage_i) + β2(Reported Stress_i) + u_i

SUMMARY OUTPUT
  Multiple R          0.6684
  R Square            0.4467
  Adjusted R Square   0.4232
  Standard Error     17.2192
  Observations        50

                 Coefficients  Std Error   t Stat   P-value    Lower 95%  Upper 95%
  Intercept        115.0315     10.2097    11.267    5.94E-15   94.4923   135.5708
  X Variable 1      -0.1057      0.0428    -2.470    0.0172     -0.1918    -0.0196
  X Variable 2       5.0467      0.9333     5.407    2.10E-06    3.1692     6.9243

Dosage of the drug still appears to have a strongly significant impact on blood pressure (p = 0.02), but Stress now appears to have a remarkably significant effect on blood pressure (p = 0.00). The results for the marginal impact of stress on blood pressure changed dramatically when we dropped Caffeine Intake from the model.
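Before turning to formal diagnostics, a quick screen for this kind of instability is to inspect the pairwise correlations among the regressors. The sketch below does this for simulated stand-ins; building the caffeine series as a noisy function of the stress series is an assumption made for illustration, not a feature of Data Set #12.

```python
# A quick multicollinearity screen on simulated stand-in regressors.
import numpy as np

rng = np.random.default_rng(4)
stress = rng.normal(size=50)
caffeine = 0.9 * stress + rng.normal(scale=0.4, size=50)  # built to correlate
dosage = rng.normal(size=50)                              # independent of both

# Rows are variables; off-diagonal entries near +/-1 flag trouble.
print(np.corrcoef(np.vstack([dosage, stress, caffeine])))
```

Pairwise correlations only catch two-variable overlap; the Variance Inflation Factor introduced below is the standard diagnostic because it also detects a regressor that is jointly explained by several others.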
Multicollinearity

Example: Now estimate the model with Daily Caffeine Intake included and Reported Stress removed:

  Blood Pressure_i = β0 + β1(Dosage_i) + β3(Daily Caffeine Intake_i) + u_i

SUMMARY OUTPUT
  Multiple R          0.6454
  R Square            0.4166
  Adjusted R Square   0.3918
  Standard Error     17.6817
  Observations        50

                 Coefficients  Std Error   t Stat   P-value    Lower 95%  Upper 95%
  Intercept        110.3315     11.1579     9.888    4.60E-13   87.8847   132.7783
  X Variable 1      -0.1080      0.0439    -2.458    0.0177     -0.1964    -0.0196
  X Variable 2       4.4001      0.8747     5.030    7.59E-06    2.6404     6.1599

Dosage of the drug again appears to have a strongly significant impact on blood pressure (p = 0.02), and Caffeine Intake now appears to have a remarkably significant effect on blood pressure (p = 0.00). The results for the marginal impact of caffeine on blood pressure changed dramatically when we dropped Reported Stress from the model.

Multicollinearity

Example: These results are typical of multicollinearity. It is likely that Caffeine Intake and Reported Stress are correlated. Because they are correlated, they (at least in part) reflect the same information. When you include only one (either one) of the regressors in the model, you get a significant marginal effect. But when you include both, OLS attempts to allocate an amount of explanatory power that is worthy of only one regressor across two regressors. As a result, neither appears particularly significant.

  All regressors included:
                 Coefficient   Std Error   P-value
    β0             111.705      10.919      0.000
    β1              -0.105       0.043      0.018
    β2               3.550       1.951      0.075
    β3               1.557       1.781      0.386

  Reported Stress included, Caffeine Intake excluded:
                 Coefficient   Std Error   P-value
    β0             115.032      10.210      0.000
    β1              -0.106       0.043      0.017
    β2               5.047       0.933      0.000

  Reported Stress excluded, Caffeine Intake included:
                 Coefficient   Std Error   P-value
    β0             110.332      11.158      0.000
    β1              -0.108       0.044      0.018
    β3               4.400       0.875      0.000

Multicollinearity

Implications of multicollinearity:
1. Parameter estimates are unbiased and consistent.
2. Parameter estimates are inefficient. The higher the correlation between the multicollinear regressors, the greater the inefficiency (i.e., the greater the standard errors associated with the parameter estimates).

In the extreme case of perfect multicollinearity (one explanatory regressor is an exact linear function of another), the regression will fail: either the software will return an error or the results will show an R² of one and standard errors of zero or infinity.
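The exact failure mode for perfect multicollinearity depends on the software. The sketch below constructs an exactly collinear design on simulated data; some packages raise an error outright, while others, statsmodels among them, fall back on a pseudoinverse and carry on, so checking the rank of the design matrix is a reliable symptom either way.

```python
# A sketch of perfect multicollinearity: x2 is an exact linear
# function of x1, so the design matrix is rank-deficient.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x1 = rng.normal(size=30)
x2 = 3.0 * x1 + 1.0                      # exact linear function of x1
y = 1.0 + 2.0 * x1 + rng.normal(size=30)

X = sm.add_constant(np.column_stack([x1, x2]))
# Three columns but only rank 2: the telltale sign of perfect collinearity.
print("columns:", X.shape[1], "rank:", np.linalg.matrix_rank(X))

fit = sm.OLS(y, X).fit()  # statsmodels proceeds via pseudoinverse here
```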
Multicollinearity

Detecting multicollinearity:
1. Calculate the Variance Inflation Factor (VIF) for each explanatory variable.
2. A VIF greater than 4 indicates detectable multicollinearity. A VIF greater than 10 indicates severe multicollinearity.

Correcting multicollinearity: The correction for multicollinearity often introduces worse anomalies than the multicollinearity itself. The correction is to drop from the model the explanatory variable with the greatest VIF. However, if the offending explanatory variable does affect the outcome variable, then by dropping the variable you eliminate the multicollinearity but create an omitted variable. As the implications of the omitted-variable anomaly are more onerous than those of multicollinearity, it is usually desirable to just live with the multicollinearity. An exception is the case of severe multicollinearity (a VIF greater than 10). In this case, the bias and inconsistency caused by omitting the variable may be of less consequence than the inefficiency caused by the multicollinearity.

Multicollinearity

Variance Inflation Factor: To compute the VIF for explanatory regressor j, regress explanatory variable j on a constant term and all of the other explanatory regressors, and let R²_j denote the R² from that auxiliary regression. Then

  VIF_j = 1 / (1 - R²_j)

Multicollinearity

Example: Calculate the VIFs for Dosage, Reported Stress, and Daily Caffeine Intake.

  Dosage_i = β0 + β2(Reported Stress_i) + β3(Daily Caffeine Intake_i) + u_i
    VIF_Dosage = 1 / (1 - 0.0076) = 1.01

  Reported Stress_i = β0 + β1(Dosage_i) + β3(Daily Caffeine Intake_i) + u_i
    VIF_Reported Stress = 1 / (1 - 0.7717) = 4.38

  Daily Caffeine Intake_i = β0 + β1(Dosage_i) + β2(Reported Stress_i) + u_i
    VIF_Daily Caffeine Intake = 1 / (1 - 0.7715) = 4.38

The VIFs indicate that there is detectable multicollinearity for Reported Stress and Daily Caffeine Intake. However, because the VIFs are well less than 10, we would not drop either variable from the model.

Summary of Statistical Anomalies

  Anomaly                 Properties of OLS Parameter Estimates
  Non-stationarity        Biased, inconsistent, inefficient
  Non-linearity           Biased, inconsistent, inefficient
  Regime change           Biased, (possibly) inconsistent, inefficient
  Omitted variables       Biased, inconsistent, inefficient
  Extraneous variables    Unbiased, consistent, inefficient
  Multicollinearity       Unbiased, consistent, inefficient
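As a closing sketch, the function below computes VIFs exactly as defined above: it runs each auxiliary regression and applies 1 / (1 - R²_j). The regressor matrix is a simulated stand-in for Dosage, Reported Stress, and Daily Caffeine Intake, with the correlation between the last two built in by assumption. (statsmodels also ships a variance_inflation_factor helper in statsmodels.stats.outliers_influence that can be used instead.)

```python
# VIF via auxiliary regressions, matching VIF_j = 1 / (1 - R_j^2).
import numpy as np
import statsmodels.api as sm

def vif(X):
    """Regress each column of X on a constant plus the other columns
    and return 1 / (1 - R^2) for each."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = sm.OLS(X[:, j], sm.add_constant(others)).fit().rsquared
        out.append(1.0 / (1.0 - r2))
    return out

# Simulated stand-ins: dosage independent; caffeine correlated with stress.
rng = np.random.default_rng(6)
stress = rng.normal(size=50)
X = np.column_stack([rng.normal(size=50),                             # dosage
                     stress,                                          # stress
                     0.9 * stress + rng.normal(scale=0.4, size=50)])  # caffeine
print(vif(X))  # expect roughly 1 for dosage, well above 1 for the other two
```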