Forecasting Techniques
Materials required for this course are at:
www.business.duq.edu/faculty/davies
© Copyright 2006. Do not distribute or copy without permission.
Don’t Trust Your Eyes
Desirable Properties of Estimators
Unbiasedness: $E(\hat{\theta}) = \theta$
Consider rolling a die. The population mean of the die rolls is 3.5. Suppose we take a
sample of N rolls of the die. Let $X_i$ be the $i$th die roll. We then estimate the
population mean via one of the following estimators:
Parameter Estimator #1: $\hat{\theta}_1 = \frac{1}{N} \sum_{i=1}^{N} X_i$
Parameter Estimator #2: $\hat{\theta}_2 = \frac{1}{N+1} \sum_{i=1}^{N} X_i$
Parameter Estimator #3: $\hat{\theta}_3 = 1$ if $\sum_{i=1}^{N} X_i$ is odd, and $\hat{\theta}_3 = 6$ otherwise
Estimators #1 and #3 are unbiased because, on average, they will equal 3.5.
Estimator #2 is biased because, on average, it will be less than 3.5.
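The bias claims can be checked by simulation. The sketch below is illustrative only: it assumes a fair six-sided die, and the function name `simulate_estimators`, the sample size, the number of trials, and the seed are arbitrary choices that do not come from the course materials.

```python
import random

def simulate_estimators(n_rolls, n_trials, seed=0):
    """Average each of the three estimators over many simulated samples."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(n_trials):
        rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
        s = sum(rolls)
        totals[0] += s / n_rolls              # Estimator #1: sample mean
        totals[1] += s / (n_rolls + 1)        # Estimator #2: divides by N + 1
        totals[2] += 1 if s % 2 == 1 else 6   # Estimator #3: 1 if the sum is odd, else 6
    return [t / n_trials for t in totals]

avg1, avg2, avg3 = simulate_estimators(n_rolls=10, n_trials=20000)
print(avg1, avg2, avg3)
```

With N = 10, the averages of Estimators #1 and #3 should land near 3.5, while the average of Estimator #2 should land near 3.5 × 10/11, which is below 3.5.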
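Consistency: $\lim_{N \to \infty} \Pr(|\hat{\theta} - \theta| < \varepsilon) = 1$ for every $\varepsilon > 0$
Consider rolling a die. The population mean of the die rolls is 3.5. Suppose we take a
sample of N rolls of the die. Let $X_i$ be the $i$th die roll. We then estimate the
population mean via one of the following estimators:
Parameter Estimator #1: $\hat{\theta}_1 = \frac{1}{N} \sum_{i=1}^{N} X_i$
Parameter Estimator #2: $\hat{\theta}_2 = \frac{1}{N+1} \sum_{i=1}^{N} X_i$
Parameter Estimator #3: $\hat{\theta}_3 = 1$ if $\sum_{i=1}^{N} X_i$ is odd, and $\hat{\theta}_3 = 6$ otherwise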
Estimators #1 and #2 are both consistent because, as N increases, the estimates
(on average) approach 3.5.
Estimator #3 is inconsistent because, as N increases, the estimate does not
approach 3.5.
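A quick way to see consistency is to compute each estimator on ever larger samples. This is a sketch under the assumption of a fair die; `estimates_at`, the seed, and the chosen sample sizes are hypothetical and serve only as illustration.

```python
import random

def estimates_at(n_rolls, seed=0):
    """Return the three estimates computed from one sample of n_rolls die rolls."""
    rng = random.Random(seed)
    s = sum(rng.randint(1, 6) for _ in range(n_rolls))
    estimator_1 = s / n_rolls            # sample mean
    estimator_2 = s / (n_rolls + 1)      # divides by N + 1
    estimator_3 = 1 if s % 2 == 1 else 6 # depends only on the parity of the sum
    return estimator_1, estimator_2, estimator_3

for n in (10, 1000, 100000):
    print(n, estimates_at(n))
```

As N grows, the first two estimates settle close to 3.5. The third always equals 1 or 6, no matter how large N becomes.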
Efficiency: $\sigma_{\hat{\theta}}$ = minimum attainable $\sigma$ in the class of linear, unbiased estimators
Standard Error of Estimator #1 $= \frac{1.7078}{\sqrt{N}}$
Standard Error of Estimator #3 $= \sqrt{\frac{(1 - 3.5)^2 + (6 - 3.5)^2}{2}} = 2.5$
Estimator #3 is not efficient because there is another linear unbiased estimator
(Estimator #1) that attains a lesser standard error.
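The two standard errors can be reproduced directly. This sketch assumes a fair die; the names `sd_roll`, `se_estimator_1`, and `se_estimator_3` are illustrative.

```python
import math

faces = [1, 2, 3, 4, 5, 6]
mean = sum(faces) / len(faces)  # population mean, 3.5

# Population standard deviation of a single die roll (about 1.7078).
sd_roll = math.sqrt(sum((x - mean) ** 2 for x in faces) / len(faces))

def se_estimator_1(n):
    """Standard error of the sample mean of n independent rolls."""
    return sd_roll / math.sqrt(n)

# Estimator #3 takes the values 1 and 6 with equal probability.
se_estimator_3 = math.sqrt(((1 - 3.5) ** 2 + (6 - 3.5) ** 2) / 2)

print(round(sd_roll, 4), se_estimator_3, se_estimator_1(10))
```

Because 1.7078 is already less than 2.5, Estimator #1 has the smaller standard error for every N ≥ 1, and the gap widens as N grows.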
Impacts of Anomalies on Estimator Properties
                     Parameter Estimates
Anomaly              Biased    Inconsistent
Non-Stationarity     X         X
Omitted Variable     X         X
Non-Linearity        X         X
Regime Shift         X         (X)
Non-Zero Errors      X         (X)
Measurement Error    X         X
Truncated Data       X         X
Censored Data        X         X
Standard Error Estimates
Inefficient
Biased
Inconsistent
X
X
Serial Correlation
X
X
Heteroskedasticity
X
X
Extraneous Variable
X
Multicollinearity
X
Multiple Regression Analysis
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #15 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict a truck’s round-trip travel time.
Miles Traveled   Deliveries   Travel Time (hours)
500              4            11.3
250              3            6.8
500              4            10.9
500              2            8.5
250              2            6.2
400              2            8.2
375              3            9.4
325              4            8.0
450              3            9.6
450              2            8.1
Approach #1: Calculate Average Time per Mile
Trucks in the data set required a total of 87 hours to
travel a total of 4,000 miles. Dividing hours by miles,
we find an average of about 0.022 hours per mile journeyed.
Problem:
This approach ignores a possible fixed effect. For
example, if travel time is measured starting from the
time that out-bound goods begin loading, then there
will be some fixed time (the time it takes to load the
truck) tacked on to all of the trips. For longer trips this
fixed time will be “amortized” over more miles and will
have less of an impact on the time/mile ratio than for
shorter trips.
This approach also ignores the impact of the number
of deliveries.
Approach #2: Calculate Average Time per Mile and Average Time per Delivery
Trucks in the data set averaged 87 / 4,000 ≈ 0.022 hours per mile journeyed,
and 87 / 29 = 3 hours per delivery.
Problem:
Like the previous approach, this approach ignores a possible fixed effect.
This approach does account for the impact of both miles and deliveries, but the approach
ignores the possible interaction between miles and deliveries. For example, trucks that travel
more miles likely also make more deliveries. Therefore, when we combine the time/miles
and time/delivery measures, we may be double-counting time.
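The two ratios, and the double-counting problem, can be seen with a few lines of arithmetic. The sketch below copies the ten trips from the example's data table; the variable names and the choice of the first trip as the test case are illustrative assumptions.

```python
# Ten trips from the example's data table.
miles = [500, 250, 500, 500, 250, 400, 375, 325, 450, 450]
deliveries = [4, 3, 4, 2, 2, 2, 3, 4, 3, 2]
hours = [11.3, 6.8, 10.9, 8.5, 6.2, 8.2, 9.4, 8.0, 9.6, 8.1]

hours_per_mile = sum(hours) / sum(miles)            # 87 / 4,000
hours_per_delivery = sum(hours) / sum(deliveries)   # 87 / 29

# Naively adding the two ratio-based contributions for the first trip
# (500 miles, 4 deliveries) counts the same hours twice.
naive_prediction = hours_per_mile * miles[0] + hours_per_delivery * deliveries[0]
print(hours_per_mile, hours_per_delivery)
print(naive_prediction, "predicted vs.", hours[0], "observed")
```

Each ratio attributes all 87 hours to a single factor, so summing the two contributions roughly doubles the predicted time.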
Approach #3: Regress Time on Miles
$\text{Time}_i = \beta_0 + \beta_1 (\text{miles}_i) + u_i$
The regression model will detect and isolate any fixed effect.
Problem:
The model ignores the impact of the number of deliveries. For example, a 500 mile journey
with 4 deliveries will take longer than a 500 mile journey with 1 delivery.
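As a rough illustration of Approach #3, the sketch below fits the one-regressor model by ordinary least squares using the textbook closed-form formulas. The helper `fit_line` is a hypothetical name, and the data are copied from the example's table.

```python
miles = [500, 250, 500, 500, 250, 400, 375, 325, 450, 450]
hours = [11.3, 6.8, 10.9, 8.5, 6.2, 8.2, 9.4, 8.0, 9.6, 8.1]

def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = fit_line(miles, hours)
print("fixed effect (hours):", intercept)
print("hours per additional mile:", slope)
```

The intercept plays the role of the fixed effect: the time the model attributes to a trip before any miles are driven.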
Approach #4: Regress Time on Deliveries
$\text{Time}_i = \beta_0 + \beta_1 (\text{deliveries}_i) + u_i$
The regression model will detect and isolate any fixed effect and will account for the impact
of the number of deliveries.
Problem:
The model ignores the impact of miles traveled. For example, a 500 mile journey with 4
deliveries will take longer than a 200 mile journey with 4 deliveries.
Approach #5: Regress Time on Both Miles and Deliveries
$\text{Time}_i = \beta_0 + \beta_1 (\text{miles}_i) + \beta_2 (\text{deliveries}_i) + u_i$
The multiple regression model (1) will detect and isolate any fixed effect, (2) will account
for the impact of the number of deliveries, (3) will account for the impact of miles, and (4)
will separate out the overlapping effects of miles and deliveries.
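A minimal sketch of Approach #5, assuming ordinary least squares solved through the normal equations $(X'X)b = X'y$. The helpers `solve` and `ols` are illustrative names written for this example, not part of the course materials; in practice a spreadsheet or statistics package would do this step.

```python
miles = [500, 250, 500, 500, 250, 400, 375, 325, 450, 450]
deliveries = [4, 3, 4, 2, 2, 2, 3, 4, 3, 2]
hours = [11.3, 6.8, 10.9, 8.5, 6.2, 8.2, 9.4, 8.0, 9.6, 8.1]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(columns, y):
    """Return [intercept, slope_1, slope_2, ...] for a regression of y on the columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

b0, b1, b2 = ols([miles, deliveries], hours)
print(b0, b1, b2)
```

Because both regressors enter the same set of equations, each slope is estimated holding the other regressor fixed, which is how the overlapping effects get separated.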
Regression model:
$\text{Time}_i = \beta_0 + \beta_1 (\text{miles}_i) + \beta_2 (\text{deliveries}_i) + u_i$
Estimated regression model:
$\widehat{\text{Time}}_i = \hat{\beta}_0 + \hat{\beta}_1 (\text{miles}_i) + \hat{\beta}_2 (\text{deliveries}_i)$
$\hat{\beta}_0 = 1.13$ (0.952) [0.2732]
$\hat{\beta}_1 = 0.01$ (0.002) [0.0005]
$\hat{\beta}_2 = 0.92$ (0.221) [0.0042]
$R^2 = 0.90$
Standard deviations of parameter estimates and p-values are typically shown in
parentheses and brackets, respectively, near the parameter estimates.
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.950678166
R Square            0.903788975
Adjusted R Square   0.876300111
Standard Error      0.573142152
Observations        10
ANOVA
             df   SS            MS            F             Significance F
Regression   2    21.60055651   10.80027826   32.87836743   0.00027624
Residual     7    2.299443486   0.328491927
Total        9    23.9
               Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
Intercept      1.131298533    0.951547725      1.188903619   0.273240329   -1.118752683   3.38134975
X Variable 1   0.01222692     0.001977699      6.182396959   0.000452961   0.007550408    0.016903431
X Variable 2   0.923425367    0.221113461      4.176251251   0.004156622   0.400575489    1.446275244
Estimated regression model:
$\widehat{\text{Time}}_i = \hat{\beta}_0 + \hat{\beta}_1 (\text{miles}_i) + \hat{\beta}_2 (\text{deliveries}_i)$
$\hat{\beta}_0 = 1.13$ (0.952) [0.2732]
$\hat{\beta}_1 = 0.01$ (0.002) [0.0005]
$\hat{\beta}_2 = 0.92$ (0.221) [0.0042]
$R^2 = 0.90$
Notes on results:
1. The constant is not significantly different from zero.
2. The slope coefficients are significantly different from zero.
3. Variation in miles and deliveries, together, accounts for 90% of the variation in time.
The parameter estimates are measures of the marginal impact of the explanatory
variables on the outcome variable. Marginal impact measures the impact of one
explanatory variable after the impacts of all the other explanatory variables are
filtered out.
Marginal impacts of explanatory variables:
0.01 = increase in time (hours) given an increase of 1 mile traveled.
0.92 = increase in time (hours) given an increase of 1 delivery.
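The marginal-impact reading can be illustrated by plugging the reported coefficients into the estimated model. In this sketch the coefficient values are taken from the regression output; the function `predict_hours` and the 400-mile, 3-delivery baseline trip are hypothetical.

```python
# Coefficients as reported in the regression output.
B0, B1, B2 = 1.131298533, 0.01222692, 0.923425367

def predict_hours(miles, deliveries):
    """Predicted round-trip time (hours) from the estimated regression model."""
    return B0 + B1 * miles + B2 * deliveries

baseline = predict_hours(400, 3)
extra_mile = predict_hours(401, 3) - baseline      # one more mile, deliveries held fixed
extra_delivery = predict_hours(400, 4) - baseline  # one more delivery, miles held fixed

print(baseline, extra_mile, extra_delivery)
```

Holding deliveries fixed, one extra mile adds the miles coefficient to the prediction; holding miles fixed, one extra delivery adds the deliveries coefficient.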
Causal vs. Exploratory Analysis
The goal of exploratory analysis is to obtain a measure of a phenomenon.
Example:
Subjects are given a new breakfast cereal to taste and asked to rate the cereal.
The measured phenomenon is taste. Although taste is subjective, by taking the average
of the measures from a large number of subjects, we can measure the underlying
objective components that give rise to the subjective feeling of taste.
The goal of causal analysis is to obtain the change in measure of a phenomenon due to the
presence vs. absence of a control variable.
Example:
Two groups of subjects are given the same breakfast cereal to taste and are asked to rate the
cereal. One group is given the cereal in a black and white box. The other in a multi-colored
box.
The two groups of subjects exist under identical conditions (same cereal, same testing
environment, etc.), with the exception of the color of the cereal box. Because the color of the
cereal box is the only difference between the two groups, we call the color of the box the
control variable. If we find a difference in subjects’ reported tastes, then we know that the
difference in perceived taste is due to the color (or lack of color) of the cereal box.
It is possible that the two groups differ in some way other than the control variable (e.g.,
one group was tested in the morning and the other in the evening), so that one group reports
liking the cereal and the other does not for a reason unrelated to the box color.
We call this a confound. A confound is the presence of an additional (and
unwanted) difference in the two groups. When a confound is present, it makes it difficult
(perhaps impossible) to determine how much of the difference in reported taste between the
two groups is due to the control and how much is due to the confound.
Because the techniques for causal and exploratory analysis are identical (with the
exception that causal analysis includes the use of a control variable whereas exploratory
analysis does not), we will limit our discussion to causal analysis.
Designing Survey Instruments
The Likert Scale
We use the Likert scale to rate responses to qualitative questions.
Example:
“Which of the following best describes your opinion of the taste of Coke?”
Too Sweet   Very Sweet   Just Right   Slightly Sweet   Not Sweet
    1           2            3              4              5
The Likert scale elicits more information than a simple “Yes/No” response: the analyst
can gauge the degree rather than simply the direction of opinion.
Rules for Using the Likert Scale
1. Use 5 or 7 gradations of response.
   - Fewer than 5 yields too little information.
   - More than 7 creates too much difficulty for respondents in distinguishing one
     response from another.
2. Always include a mid-point (or neutral) response.
3. When appropriate, include a separate response for “Not applicable” or “Don’t
   know.”
4. When possible, include a descriptor with each response rather than simply a single
   descriptor on each end of the scale.
Example:
Yes:
Very Bad   Bad   Neutral   Good   Very Good
   1        2       3        4        5
No:
Very Bad                            Good
   1        2       3        4        5
The presence of the lone words at the ends of the scale will introduce a bias by
causing subjects to shun the center of the scale.
5. Use the same words and (where possible) the same number of words for each
   descriptor.
Example:
Yes:
Very Bad   Bad   Neutral   Good   Very Good
   1        2       3        4        5
No:
Bad   Poor   OK   Better   Best
 1     2     3      4       5
When using different words for different descriptors, subjects may perceive varying
quantities of difference between points on the scale.
For example, subjects may perceive that the difference between “Bad” and “Poor”
is less than the difference between “Poor” and “OK.”
6. Avoid using zero as an endpoint on the scale.
Example:
Yes:
Very Bad   Bad   Neutral   Good   Very Good
   1        2       3        4        5
No:
Very Bad   Bad   Neutral   Good   Very Good
   0        1       2        3        4
On average, subjects will associate the number zero with “bad.” Thus, using zero at
the endpoint of the scale can bias subjects away from the side of the scale with the
zero.
7. Avoid using unbalanced negative numbers.
Example:
Yes:
Very Bad   Bad   Neutral   Good   Very Good
  -2       -1       0        1        2
No:
Very Bad   Bad   Neutral   Good   Very Good
  -3       -2      -1        0        1
Subjects associate negative numbers with “bad.” If you have more negative
numbers on one side of the scale than the other, subjects will be biased away from
that side of the scale.
8. Keep the descriptors balanced.
Example:
Yes:
Very Bad   Bad   Neutral         Good   Very Good
   1        2       3              4        5
No:
Very Bad   Bad   Slightly Good   Good   Very Good
   1        2          3           4        5
Subjects will be biased toward the side with more descriptors.
9. Arrange the scale so as to maintain (1) symmetry around the neutral point, and (2)
   consistency in the intervals between points.
Example:
Yes:
Very Bad      Bad      Neutral      Good      Very Good
   1           2          3           4           5
No:
Very Bad                     Bad  Neutral  Good  Very Good
   1                          2      3      4       5
No:
Very Bad          Bad  Neutral  Good          Very Good
   1               2      3      4                5
In the second example, subjects perceive the difference between “Neutral” and
“Very Bad” to be greater than the difference between “Neutral” and “Very Good.”
Responses will be biased toward the right side of the scale.
In the third example, subjects perceive the difference between “Very Bad” and
“Bad” to be greater than the difference between “Bad” and “Neutral.”
Responses will be biased toward the center of the scale.
10. Use multi-item scales for ill-defined constructs.
Example:
Yes:
“I liked the product.”
Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree
      1            2        3         4               5
“I am satisfied with the product.”
Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree
      1            2        3         4               5
“I believe that this is a good product.”
Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree
      1            2        3         4               5
No:
“I liked the product.”
Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree
      1            2        3         4               5
Ill-defined constructs may be interpreted differently by different people. Use the
multi-item scale (usually three items) and then average the items to obtain a single
response for the ill-defined construct.
Example:
The ill-defined construct is product satisfaction.
We construct three questions, each of which touches on the idea of product
satisfaction. A subject gives the following responses:
“I liked the product.”                       4
“I am satisfied with the product.”           4
“I believe that this is a good product.”     3
The average response for product satisfaction is 3.67.
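The averaging step is simple enough to script. In this sketch the three responses are the ones from the example; the dictionary layout and the name `satisfaction_score` are illustrative.

```python
# One subject's responses to the three items that make up the construct.
responses = {
    "I liked the product.": 4,
    "I am satisfied with the product.": 4,
    "I believe that this is a good product.": 3,
}

# The construct score is the simple average of the item responses.
satisfaction_score = sum(responses.values()) / len(responses)
print(round(satisfaction_score, 2))  # 3.67
```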
Be careful that the multi-item scales all measure the same ill-defined construct.
Yes
“I liked the product.”
“I am satisfied with the product.”
“I believe that this is a good product.”
No
“I liked the product.”
“I am satisfied with the product.”
“I will purchase the product.”
The statement “I will purchase the product” includes the consideration of “price”
which the other two questions do not.
11. Occasionally, it is useful to verify that the subjects are giving considered (as
opposed to random) answers. To do this, ask the same question more than once at
different points in the survey. Look at the variance of the responses across the
multiple instances of the question. If the subject is giving considered answers, the
variance should be small.
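One possible way to automate this check is sketched below. The function name `is_considered` and the cutoff `max_variance=1.0` are assumptions made for illustration; the slides do not specify how small the variance should be.

```python
from statistics import pvariance

def is_considered(repeated_answers, max_variance=1.0):
    """Flag a subject as giving considered answers when the variance across
    repeated instances of the same question is small.
    max_variance is a hypothetical cutoff, not a value from the slides."""
    return pvariance(repeated_answers) <= max_variance

print(is_considered([4, 4, 5]))  # answers cluster together
print(is_considered([1, 5, 3]))  # answers are scattered across the scale
```

The cutoff should be chosen with the scale in mind: a variance of 1 means something different on a 5-point scale than on a 7-point scale.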
12. Avoid self-referential questions.
Yes
“How do you perceive that others around you feel right now?”
No
“How do you feel right now?”
Self-referential questions elicit bias because they encourage the respondent to
answer subsequent questions consistently with the self-referential question.
Example:
If we ask the subject how he feels and he responds positively, then his subsequent
answers will be biased in a positive direction. The subject will, unconsciously,
attempt to behave consistently with his reported feelings.
Exception:
You can ask a self-referential question if it is the last question in the survey. As long
as the subject does not go back and change previous answers, there is no
opportunity for the self-reference to bias the subject’s responses.
Example:
We want to test the effect of relevant news on purchase decisions. Specifically, we
want to know if the presence of positive news about a low-cost product increases the
probability of consumers purchasing that product.
Causal Design:
We will expose two groups of subjects to news announcements about aspirin. The control group
will see a neutral announcement that says nothing about the performance of aspirin.
The experimental group will see a positive announcement that says that aspirin has
positive health benefits.
After exposure to the announcements, we will ask each group to rate their attitudes
toward aspirin. Our null hypothesis is that there is no difference in the average attitudes
toward aspirin between the two groups.
To account for possible preconceptions about aspirin, before we show the subjects
the news announcements, we will ask how frequently they take aspirin. To account
for possible gender effects, we will also ask subjects to report their genders.
All subjects are first asked to respond to these questions.
How often do you take aspirin?
Infrequently              Occasionally              Frequently
     1        2      3         4         5      6        7
Please identify your gender (M/F).
Subjects in the control group see this news announcement. The analyst reads the
headline and the introductory paragraph.
Subjects in the control group are then asked to answer this question.
Please rate your attitude toward aspirin.
Unfavorable               Neutral               Favorable
     1        2      3       4       5      6       7
Subjects in the experimental group see this news announcement. The analyst reads the
headline and the introductory paragraph.
Subjects in the experimental group are then asked to answer this question.
Please rate your attitude toward aspirin.
Unfavorable               Neutral               Favorable
     1        2      3       4       5      6       7
Statistical inference within the context of classical regression analysis requires that
the researcher test a hypothesis against a sample data set.
Example
1. Hypothesize that the following relationship exists
$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + u$
2. Determine the likelihood of observing the sample data assuming that the
hypothesis is correct.
The process requires the specification of a hypothesized model: the theory leads
the data.
However, in many applications, there is no hypothesized model because there is no
(or incomplete) theory: the data lead the theory.
Example: Clinical data, financial data
In these instances, stepwise regression procedures are typically employed.
Stepwise Regression
Begin with a set of K candidate regressors. “Smartly” select regressors so as to
achieve the highest adjusted $R^2$.
For a set of K candidate regressors, the number of regression models that can be
constructed is $2^K - 1$.
Ideally, the researcher would look at all $2^K - 1$ models and select the “best” model.
In practice, the number of models becomes prohibitively large very quickly.
- With 40 candidate regressors, one can construct more than 1 trillion models.
- It would take 100 years to estimate every one of the models.
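The growth in the number of models is easy to tabulate. The sketch below counts the models for several values of K and backs out the estimation speed implied by the 100-year figure; `model_count` is an illustrative name, and the models-per-second rate is inferred here, not stated in the slides.

```python
def model_count(k):
    """Number of non-empty subsets of k candidate regressors."""
    return 2 ** k - 1

for k in (5, 10, 20, 30, 40):
    print(k, f"{model_count(k):,}")

# Estimation speed implied by "100 years for 40 candidate regressors".
seconds_per_century = 100 * 365.25 * 24 * 3600
implied_models_per_second = model_count(40) / seconds_per_century
print(round(implied_models_per_second))
```

Each additional candidate regressor roughly doubles the number of models, which is why the count passes one trillion at K = 40.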
[Figure: the number of models, $2^K - 1$, plotted on a log scale against the number of candidate regressors, K = 0 to 50.]
As K increases, the time required to run all $2^K - 1$ models increases exponentially.
The practical limit (running on a single computer) is 25 to 30 candidate regressors.
[Figure: hours to estimate all possible models (log scale) against the number of candidate regressors (1 to 43). Annotation: 100 years to estimate all possible models that can be constructed from 40 candidate factors.]
Stepwise Regression
To avoid looking at all $2^K - 1$ models, stepwise methods employ heuristics to arrive
at a “good” model in a reasonable amount of time.
Problems:
1. The likelihood of stepwise methods finding the best model falls as the
number of candidate regressors rises. For K > 7, stepwise methods almost
always fail to find the best model (according to stepwise’s definition of “best”).
2. Which model stepwise returns usually depends on the (arbitrary) starting point.
3. The likelihood of stepwise methods finding a spurious model rises with K.
4. Stepwise provides no information as to the quality of the returned model relative
to other possible models.
Stepwise methods select a starting model and then iteratively adjust the model in an
attempt to find better models.
[Figure: the set of all $2^K - 1$ possible models, shaded by quality of model (Bad, Poor, Better, Good, Best), with four stepwise starting points labeled A, B, C, and D.]
Of the four starting points shown, only with starting point
D will stepwise methods discover the best model.
Alternative Approach: All Subsets or “Exhaustive” Regression
Find the “best” model by estimating all $2^K - 1$ possible models.
Problem:
How should “best” be defined? Typically, as the highest adjusted $R^2$.
But even for relatively small K (e.g., K > 10), the likelihood of the model with the
highest adjusted $R^2$ being spurious is high.
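A small sketch of exhaustive regression, assuming “best” means highest adjusted $R^2$. It reuses the trucking data from the earlier example purely as an illustration with K = 2 candidate regressors (3 models); the helpers `solve` and `adjusted_r2` are hypothetical names written for this example.

```python
from itertools import combinations

candidates = {
    "miles": [500, 250, 500, 500, 250, 400, 375, 325, 450, 450],
    "deliveries": [4, 3, 4, 2, 2, 2, 3, 4, 3, 2],
}
hours = [11.3, 6.8, 10.9, 8.5, 6.2, 8.2, 9.4, 8.0, 9.6, 8.1]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def adjusted_r2(columns, y):
    """Fit y on an intercept plus the given columns; return adjusted R-squared."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])  # number of estimated parameters, including the intercept
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Estimate every non-empty subset of the candidate regressors.
results = {}
for size in range(1, len(candidates) + 1):
    for subset in combinations(candidates, size):
        results[subset] = adjusted_r2([candidates[name] for name in subset], hours)

best = max(results, key=results.get)
for subset, score in results.items():
    print(subset, round(score, 4))
print("best:", best)
```

With only two candidates the search is trivial. The same loop over `combinations` is what becomes infeasible as K grows, and a high adjusted $R^2$ from such a search says nothing by itself about whether the winning model is spurious.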
Question:
The $R^2$ measure is a “within model” measure as it is based on information contained
in a specific model.
Can a measure be constructed that utilizes information “across models” in an
attempt to guard against spurious results?
Conclusion: Employing assumptions weaker than those of the Classical Linear
Model, we can obtain unbiased parameter estimates via taking the mean of
parameter estimates across models.
Question: What is the distribution of parameter estimates across models?
Construct a “cross-model test statistic” for each of the K candidate regressors.
$H_0: \beta_i = 0$
$H_a: \beta_i \neq 0$
$c_i = \sum_{j=1}^{2^{K-1}} \left( \frac{\hat{\beta}_{i,j}}{s_i} \right)^2 \sim \chi^2_{2^{K-1}}$
where $\hat{\beta}_{i,j}$ is the estimate of the ith coefficient derived from the jth subset of
candidate regressors, and $s_i$ is the standard deviation of the ith coefficient across
the subsets of candidate regressors.