Vienna Regression

Regression Analysis: Outline
• Review of Regression Analysis
• Regression with Categorical Explanatory Variables
• Pooled Regression: Fixed Effect and Random Effect Models

Regression Analysis in the overall context of Research
• Research Purpose
– Research questions, objectives, hypotheses
• Methodology
– Type of study
– Sampling plan and sample size determination
– Data collection methods
– Data analysis plan
• Execution
– Data collection and data analysis
– Discussion and conclusion
– Research evaluations

Regression Analysis: Review
• What is Regression?
– Dependence measure ~ estimates the overall relationship between the dependent and independent variables
– Examples of dependent and independent variables?
– Regression and causality (~ experiment, theory)
– Regression (~ predicts the dependent variable) and correlation (~ linear association)
• Uses of Regression
– Descriptive ~ describes the relationship and how strong it is
– Inference ~ which variables are most important/significant?
– Predictive ~ forecasting
– Hypothesis testing
• Sample Size

Type of Variables in Regression Analysis
• Independent
• Dependent
• Moderating
• Mediating
• Moderation-mediation

Moderating Variables
• Moderating Variables
• Testing Moderation
– Y = b0 + b1*X + b2*Z + b3*XZ + e
– Rearranged: Y = [b0 + b2*Z] + [b1 + b3*Z]*X + e, so the slope of X, b1 + b3*Z, depends on the moderator Z

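The moderation model above can be estimated by ordinary least squares with the product X*Z added as a regressor. The following is a minimal sketch using made-up, noise-free data; the helper names `solve`, `ols` and `slope_of_x` are hypothetical and only serve the illustration.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data generated without noise from Y = 1 + 2X + 0.5Z + 1.5XZ
X = [0, 1, 2, 3, 0, 1, 2, 3]
Z = [0, 0, 0, 0, 1, 1, 1, 1]
Y = [1 + 2 * x + 0.5 * z + 1.5 * x * z for x, z in zip(X, Z)]

# Design matrix columns: intercept, X, Z, interaction XZ
design = [[1.0, x, z, x * z] for x, z in zip(X, Z)]
b0, b1, b2, b3 = ols(design, Y)


def slope_of_x(z):
    """Slope of X at a given value of the moderator Z: b1 + b3*Z."""
    return b1 + b3 * z


print("b3 (interaction):", round(b3, 6))
print("slope of X when Z=0:", round(slope_of_x(0), 6))
print("slope of X when Z=1:", round(slope_of_x(1), 6))
```

A b3 that differs from zero is what signals moderation; with real, noisy data its significance would still have to be tested.
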
Mediator Variables
• Mediator Variables
[Figure: path diagram linking Attitude, BI and B, with paths labelled a, b and c]

Multivariate Research Methods: Regression Analysis: Review
• How does it work?
– Formalization of the regression model:
y = b0 + b1 X1 + b2 X2 + … + bk Xk + error
» Systematic part: b0 + b1 X1 + b2 X2 + … + bk Xk
» Unsystematic part: error
» Intercept, slope, error
» Examples??
– What do we observe? Y and the X's; we estimate the b's
– Which variables to include?
» Theory, prior research, common sense
» If you don't have any idea? Statistical criteria: stepwise, forward and backward selection (in cases of only metric data??)
– Moderator effects ~ interaction variables
• How to obtain estimates?
– Least squares method of regression
– Any straight line you fit will have some error
– The objective is to minimize those errors, i.e. the sum of the squared differences between Y and Y-predicted (the sum of squared errors)
– Y = a + b*X + e leads to e = Y - a - b*X
– e^2 = (Y - a - b*X)^2 ~ minimize the sum of e^2

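For a single explanatory variable, the least squares estimates have a closed form. This is a minimal sketch on made-up numbers; `fit_line` and `sse` are hypothetical helper names.

```python
def fit_line(x, y):
    """Return (a, b) minimizing the sum of (y - a - b*x)^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sse(a, b, x, y):
    """Sum of squared errors e^2 = (Y - a - b*X)^2 for a given line."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


# Made-up data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
best = sse(a, b, x, y)

# Any other line gives a larger sum of squared errors
print("a, b:", a, b)
print("SSE at the estimates:", best)
print("SSE with a shifted intercept:", sse(a + 0.1, b, x, y))
print("SSE with a shifted slope:", sse(a, b + 0.1, x, y))
```
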
Multivariate Research Methods: Regression Analysis: Review
• Interpretation of parameter estimates?
– Intercept
» Mean of the dependent variable when the values of all independent variables are zero
» Mean of the dependent variable when all slopes are zero
» Not always meaningful
– Slopes
» Change in Y for a one-unit change in X
» Zero slope: X does not affect Y
» b1, b2, …, bk: partial regression coefficients
» e.g. b1 = change in the value of Y if X1 is changed by one unit while all other explanatory variables (X2 … Xk) are kept constant

Multivariate Research Methods: Regression Analysis: Review
• Interpretation of parameter estimates?
– Size of the regression coefficient
» Depends on the scale of the explanatory variable
» So the size of a coefficient is not a good indicator of whether a variable is a good explanatory variable
» Scale of the independent variables ~ within 10 times of each other
– Beta coefficients, or standardized coefficients
» Provide relative importance
– Elasticity: the percentage change in the dependent variable for a 1% change in the independent variable
» elasticity = b * (X / Y)

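The scale dependence of the raw coefficient can be seen by recording the same variable in two units. This is a minimal sketch on made-up price and sales figures; it assumes the common conventions beta = b * sd(X) / sd(Y) and elasticity evaluated at the sample means, and the helper names are hypothetical.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Simple least squares: return (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def beta(b, x, y):
    """Standardized coefficient (assumed convention: b * sd(X) / sd(Y))."""
    return b * stdev(x) / stdev(y)


def elasticity_at_means(b, x, y):
    """Elasticity b * (X / Y), evaluated at the sample means (an assumption)."""
    return b * mean(x) / mean(y)


# Made-up data: the same price recorded in dollars and in cents
price_dollars = [1.0, 2.0, 3.0, 4.0, 5.0]
price_cents = [100 * p for p in price_dollars]
sales = [50.0, 44.0, 41.0, 33.0, 32.0]

_, b_dollars = fit_line(price_dollars, sales)
_, b_cents = fit_line(price_cents, sales)

print("raw slopes:", b_dollars, b_cents)  # differ by a factor of 100
print("betas:", beta(b_dollars, price_dollars, sales), beta(b_cents, price_cents, sales))
print("elasticities:", elasticity_at_means(b_dollars, price_dollars, sales),
      elasticity_at_means(b_cents, price_cents, sales))
```

The raw slope changes with the unit of measurement, while the standardized coefficient and the elasticity do not.
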
Multivariate Research Methods: Regression Analysis: Review
• Is the regression coefficient significant?
• Overall goodness of fit?
– r^2 = ESS / TSS = 1 - RSS / TSS
– 0 ≤ r^2 ≤ 1
• Is the regression significant?
– r ~ coefficient of multiple correlation
– adjusted r^2
[Figure: Y plotted against X with the fitted line Y = b0 + bX, marking TSS, ESS and RSS (error)]

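The decomposition TSS = ESS + RSS and the two equivalent forms of r^2 can be checked numerically. This is a minimal sketch on made-up data; the adjusted r^2 uses the usual formula 1 - (1 - r^2)(n - 1)/(n - k - 1), with k the number of explanatory variables, and the function names are hypothetical.

```python
def sums_of_squares(x, y):
    """Fit y = b0 + b*x by least squares and return (ESS, RSS, TSS)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b * x_bar
    y_hat = [b0 + b * xi for xi in x]
    ess = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual (error)
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total
    return ess, rss, tss


def adjusted_r2(r2, n, k):
    """Adjusted r^2 for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Made-up data
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.5, 5.5, 8.5, 9.0, 12.5]

ess, rss, tss = sums_of_squares(x, y)
r2 = ess / tss
adj = adjusted_r2(r2, n=len(x), k=1)

print("ESS + RSS vs TSS:", ess + rss, tss)
print("r^2:", r2, "and 1 - RSS/TSS:", 1 - rss / tss)
print("adjusted r^2:", adj)
```
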
Multivariate Research Methods: Regression Analysis: Review
Major assumptions
• The variance of the error term is constant (violation: heteroscedasticity)
• There is no autocorrelation in the error term (violation: autocorrelation)
• There is no exact linear relationship in the independent variables (violation: multicollinearity)
• There must be variability in the independent variables
• The regression model is correctly specified
• The regression model is linear in parameters
• The mean value of the error term is zero
• No covariation between errors and independent variables
• The error term is normally distributed

Multivariate Research Methods: Regression Analysis: Review
• Detecting problems with the assumptions?
• Heteroscedasticity
– Error variances are not the same
– Arises when errors are related to either the dependent or the independent variables
– e.g. more stable saving (or consumption) among lower-income families; larger variances among brand switchers than among brand-loyal customers
[Figure: plot of Saving against Income, annotated "Variance"]
– Remedy?? If we know the nature of the heteroscedasticity, we can use WLS (weighted least squares)
– Volatility ~ Finance??

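Weighted least squares down-weights the observations with the noisier errors. This is a minimal sketch for one explanatory variable; it assumes, purely for illustration, that the error standard deviation is proportional to income, which gives weights 1/income^2. The data and the name `wls_line` are made up.

```python
def wls_line(x, y, w):
    """Return (a, b) minimizing the weighted sum of w * (y - a - b*x)^2."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Made-up data: saving becomes more erratic as income grows
income = [10, 20, 30, 40, 50, 60]
saving = [1.1, 1.9, 3.4, 3.2, 6.5, 4.1]

# Assumed variance structure: Var(e) proportional to income^2
weights = [1 / xi ** 2 for xi in income]

a_wls, b_wls = wls_line(income, saving, weights)
a_ols, b_ols = wls_line(income, saving, [1.0] * len(income))  # equal weights = OLS

print("OLS:", a_ols, b_ols)
print("WLS:", a_wls, b_wls)
```

With equal weights the function reduces to ordinary least squares, so the two fits can be compared directly.
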
Regression Analysis : Review
• Detecting problems with the assumptions?
• Autocorrelation ~ more a time-series problem
– Occurs when the errors of consecutive observations are correlated
– Reasons?
» Omitted variables
» Model mis-specification
– Detection
» Graphical methods
» Durbin-Watson ~ DW = 2(1 - r); DW varies between 0 and 4, and the ideal number is 2
– Problem?
» Overestimates the coefficient of determination and underestimates the standard errors
[Figure: residual plots of e_t against e_t-1 illustrating positive and negative autocorrelation, and a plot of Y against X]

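The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. This is a minimal sketch on two made-up residual series; `durbin_watson` is a hypothetical helper name.

```python
def durbin_watson(e):
    """DW = sum over t of (e_t - e_{t-1})^2, divided by the sum of e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den


# Made-up residuals: runs of the same sign (positive autocorrelation)
e_positive = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1]
# Made-up residuals: sign flips every period (negative autocorrelation)
e_negative = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1]

dw_pos = durbin_watson(e_positive)
dw_neg = durbin_watson(e_negative)

print("DW, positive autocorrelation:", dw_pos)  # below 2
print("DW, negative autocorrelation:", dw_neg)  # above 2
```

Values well below 2 point to positive autocorrelation and values well above 2 to negative autocorrelation; DW = 2(1 - r) holds only approximately in finite samples.
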
Multivariate Research Methods: Regression Analysis: Review
• Detecting problems with the assumptions?
• Multicollinearity
[Figure: two diagrams relating X1, X2 and Y]
– Presence of very high interrelations among the explanatory variables (does not violate any assumption)
– Symptoms: the standard errors are likely to be high; estimates are not reliable?
• Detection
– Bivariate correlation
– Variance Inflation Factor (VIF) ~ values above about 10 signal a problem
– Tolerance = 1/VIF
– VIF = 1 / (1 - ri^2), where ri^2 is the r^2 from regressing Xi on the other explanatory variables
• Remedies
– Drop variables
– Composite variables, e.g. family life cycle, social status
– Factor analysis

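With exactly two explanatory variables, the r^2 from regressing one on the other equals their squared bivariate correlation, so the VIF can be computed from a single correlation. This is a minimal sketch of that special case on made-up data; `corr` and `vif_two_regressors` are hypothetical names.

```python
from math import sqrt


def corr(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def vif_two_regressors(x1, x2):
    """VIF = 1 / (1 - r^2) when the model has only the regressors x1 and x2."""
    r = corr(x1, x2)
    return 1 / (1 - r * r)


# Made-up data
x1 = [1, 2, 3, 4, 5, 6]
x2_similar = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]  # almost a copy of x1
x2_unrelated = [1, -2, 1, 1, -2, 1]          # uncorrelated with x1

vif_high = vif_two_regressors(x1, x2_similar)
vif_low = vif_two_regressors(x1, x2_unrelated)

print("VIF, nearly collinear pair:", vif_high, "tolerance:", 1 / vif_high)
print("VIF, uncorrelated pair:", vif_low, "tolerance:", 1 / vif_low)
```

With more than two explanatory variables, ri^2 has to come from a multiple regression of Xi on all the others.
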
Multivariate Research Methods: Regression Analysis: Review
• Detecting problems with the assumptions?
• Linear in parameters
– Y = a + b*X^2 + e ~ linear in parameters but non-linear in variables
– Y = a + b^2*X1 + b*X2 + e ~ non-linear in parameters: non-linear regression
• The regression model is correctly specified
– Functional form, e.g. new consumer durable sales
– Influential observations
– Outliers
– Whether one or a few observations??

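A model that is non-linear in the variables but linear in the parameters, such as Y = a + b*X^2 + e, can still be fitted by ordinary least squares after transforming the regressor. This is a minimal sketch on made-up, noise-free data; `fit_line` is a hypothetical helper.

```python
def fit_line(x, y):
    """Simple least squares: return (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Made-up data generated without noise from Y = 1 + 3*X^2
x = [0, 1, 2, 3, 4]
y = [1 + 3 * xi ** 2 for xi in x]

# Regress Y on the transformed variable X^2, not on X itself
x_squared = [xi ** 2 for xi in x]
a, b = fit_line(x_squared, y)

print("a:", a, "b:", b)
```
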
Regression Analysis: Review
• Outliers: In linear regression, an outlier is an observation with a large residual. Problem with the dependent variable??
• Leverage: An observation with an extreme value on an independent variable is called a point with high leverage. Leverage is a measure of how far an independent variable deviates from its mean. These leverage points can have an unusually large effect on the estimates of the regression coefficients.
• Influence: An observation is said to be influential if removing it substantially changes the estimates of the coefficients.
• Detection
– Residual check
» Standardized residual: ei* = ei / (s * sqrt(1 - hi))
» Studentized residual: ei* = ei / (si * sqrt(1 - hi))
» Here hi is the leverage of observation i, s the residual standard error, and si the residual standard error estimated without observation i
» Problem approx.: abs. value > 2

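For a single explanatory variable the leverage has the closed form hi = 1/n + (xi - mean(x))^2 / Sxx, which makes the residual checks easy to compute by hand. This is a minimal sketch on made-up data with one planted outlier; the delete-one variance uses the standard identity si^2 = ((n - p) s^2 - ei^2 / (1 - hi)) / (n - p - 1), and `residual_diagnostics` is a hypothetical name.

```python
from math import sqrt


def residual_diagnostics(x, y):
    """Return (leverages, standardized residuals, studentized residuals)."""
    n, p = len(x), 2  # p = number of estimated coefficients (a and b)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    e = [yi - a - b * xi for xi, yi in zip(x, y)]
    h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    s2 = sum(ei ** 2 for ei in e) / (n - p)
    standardized, studentized = [], []
    for ei, hi in zip(e, h):
        standardized.append(ei / sqrt(s2 * (1 - hi)))
        # Residual variance re-estimated without observation i
        s2_i = ((n - p) * s2 - ei ** 2 / (1 - hi)) / (n - p - 1)
        studentized.append(ei / sqrt(s2_i * (1 - hi)))
    return h, standardized, studentized


# Made-up data: roughly y = 2x, with the 7th observation pushed upward
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 4.1, 5.9, 8.0, 10.1, 12.0, 20.0, 16.1]

h, standardized, studentized = residual_diagnostics(x, y)
flagged = [i for i, r in enumerate(standardized) if abs(r) > 2]

print("leverages:", [round(v, 3) for v in h])
print("observations with |standardized residual| > 2:", flagged)
```
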
Regression Analysis: Review
• Transformation of variables
– Dependent variable should be normally distributed, have constant variance, etc.
– e.g. GNP per capita, Log(Price), etc.
– Retransformation??
• Forecasting
– Model fit versus forecasting
– Forecasting the independent variables
• Model selection / comparing models
– Adjusted R-sq
• Model validation
– Cross-validation
– Jackknife validation

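The difference between model fit and forecasting can be illustrated with leave-one-out cross-validation, which uses the same delete-one idea as the jackknife: each observation is predicted from a model fitted without it. This is a minimal sketch on made-up data; the function names are hypothetical.

```python
def fit_line(x, y):
    """Simple least squares: return (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def in_sample_mse(x, y):
    """Mean squared error when the model is fitted and evaluated on all data."""
    a, b = fit_line(x, y)
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / len(x)


def leave_one_out_mse(x, y):
    """Mean squared error when each point is predicted without using it."""
    errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        errors.append((y[i] - (a + b * x[i])) ** 2)
    return sum(errors) / len(errors)


# Made-up data
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.8, 4.4, 5.7, 8.6, 9.5, 12.9, 13.6, 16.4]

fit_mse = in_sample_mse(x, y)
loo_mse = leave_one_out_mse(x, y)

print("in-sample MSE:", fit_mse)
print("leave-one-out MSE:", loo_mse)
```

For least squares the leave-one-out error is never smaller than the in-sample error, which is why in-sample fit tends to overstate forecasting accuracy.
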
Multivariate Research Methods: Regression Analysis: Limitations
• Nominal independent variables ~ dummy variable regression
– Gender, income groups, ethnicity, region, race, etc.
• Measurement error ~ structural equation models
– XTrue = Xobs + ex
– Y = b0 + b1*XTrue + eY
– Y = b0 + b1*(Xobs + ex) + eY
– Y = b0 + b1*Xobs + (b1*ex + eY)
– The error term (b1*ex + eY) is correlated with the x-variable ~ this violates the regression assumption

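The consequence of the correlated error term can be seen in a small simulation. This is a minimal sketch under assumed settings (true slope 2, measurement noise with the same variance as the true X, a fixed random seed); all names and numbers are illustrative.

```python
import random


def fit_line(x, y):
    """Simple least squares: return (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


random.seed(0)
n = 5000
b0_true, b1_true = 1.0, 2.0

x_true = [random.gauss(0, 1) for _ in range(n)]
# Observed X = true X plus noise, so ex = XTrue - Xobs is minus that noise
x_obs = [xt + random.gauss(0, 1) for xt in x_true]
y = [b0_true + b1_true * xt + random.gauss(0, 0.5) for xt in x_true]

_, slope_with_true_x = fit_line(x_true, y)
_, slope_with_observed_x = fit_line(x_obs, y)

print("slope using XTrue:", slope_with_true_x)
print("slope using Xobs:", slope_with_observed_x)
```

Under these assumptions the slope estimated from Xobs is pulled toward zero, to roughly half of the true value, because the noise variance equals the variance of XTrue.
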
Regression Analysis: Limitations
• Limited dependent variable
– Censored dependent variable ~ lots of zeros
» Expenditures in home buying
» Demand in a supply-restricted situation
» Vacation expenditures
» Tobit regression
[Figure: scatter plot of Y (e.g. housing exp.) against X (e.g. income)]
– Truncated dependent variable ~ duration analysis, available in LIMDEP
» Interpurchase times
» Duration of unemployment

Regression with Categorical Explanatory Variables
• Some modeling problems
– Is gender important in determining the level of expenditure on medical expenses?
– Do Nescafe’s supermarket coffee sales vary by state?
– How would you model the impact of local crime on housing prices if the crime rate were rated none, moderate or high?
– How do I include income as a determinant of cigarette demand when data have only been collected by income class?
• Examples
– Medical expenditure = intercept + b1*Gender + b2*Age group + error
– Sales = intercept + b1*Provinces + error

Interpretation of regression coefficients: Binary Coding
• Midterm exam scores by sex
– Yi = b0 + b1*Di
– Yi = score
– Di = 1 if male, 0 if female
– Yavg,fem = b0
– Yavg,male = b0 + b1
[Figure: chart of score for female and male, marking b0 and b1]

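With binary (0/1) coding, the least squares intercept reproduces the mean of the reference group and the slope reproduces the difference between the group means. This is a minimal sketch on made-up exam scores; `fit_line` is a hypothetical helper.

```python
def fit_line(x, y):
    """Simple least squares: return (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Made-up midterm scores
female_scores = [70, 80, 90]
male_scores = [60, 65, 70, 75]

# Binary coding: D = 0 for female, D = 1 for male
D = [0] * len(female_scores) + [1] * len(male_scores)
scores = female_scores + male_scores

b0, b1 = fit_line(D, scores)

mean_female = sum(female_scores) / len(female_scores)
mean_male = sum(male_scores) / len(male_scores)

print("b0:", b0, "female mean:", mean_female)
print("b0 + b1:", b0 + b1, "male mean:", mean_male)
```
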
Interpretation of regression coefficients: Effect Coding
• Midterm exam scores by sex
– Yi = b0 + b1*Di
– Yi = score
– Di = 1 if male, -1 if female
– Yoverall mean = b0
– Yavg,male = b0 + b1
– b1 + b2 = 0, so b2 = -b1
– Note: we are not estimating b2
[Figure: chart of score for female and male, marking b0, b1 and b2]

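With effect (+1/-1) coding, the intercept becomes the mean of the two group means and the slope is the deviation of the male group from it. This is a minimal sketch on made-up scores with equal group sizes, in which case b0 also equals the overall mean; with unequal group sizes b0 is the unweighted average of the group means instead.

```python
def fit_line(x, y):
    """Simple least squares: return (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Made-up midterm scores, equal group sizes
female_scores = [70, 80, 90]
male_scores = [60, 70, 80]

# Effect coding: D = -1 for female, D = +1 for male
D = [-1] * len(female_scores) + [1] * len(male_scores)
scores = female_scores + male_scores

b0, b1 = fit_line(D, scores)
b2 = -b1  # implied by b1 + b2 = 0, not estimated separately

print("b0 (overall mean):", b0)
print("male mean, b0 + b1:", b0 + b1)
print("female mean, b0 + b2:", b0 + b2)
```
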
Regression Analysis: Non-Linear Regression
• Example: sales and price dynamics of a new product
[Figure: two panels plotted against Time: first-purchase Sales, and Price]

Pooled Regression: Fixed Effect and Random Effect models
• Panel data – cross-sectional time-series data
– Observations on “n” individuals (or countries, firms, etc.), each measured at T points in time (T can be different for each measuring unit)
– Observations are not independent
– Use the panel structure to get better parameter estimates
– Control for fixed or random individual differences
• Example of data setup….
• Software: LIMDEP (also SAS…)
• Example: Cross-sectional survey 50% Female Participation in Labor Force??

Pooled Regression: Fixed Effect and Random Effect models
• Fixed Effect – individual intercepts are different, shifted by a “fixed” amount
– y_it = α + μ_i + X'_it β + e_it
– Equivalently, with α_i = α + μ_i: y_it = α_i + X'_it β + e_it
• Random Effect – individual differences are random rather than fixed: random intercept terms. The intercept is a function of the mean intercept value plus a random error
– y_it = α + X'_it β + (e_it + u_i)
– u_i: unobserved heterogeneity that is stable over time
– This u_i is uncorrelated with the X's

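One common way to estimate the fixed effect model is the within estimator: subtract each individual's own means from y and X, then run least squares on the demeaned data. This is a minimal sketch on a made-up, noise-free panel with one explanatory variable, in which the individual intercepts are deliberately correlated with X; all names are hypothetical.

```python
def fit_line(x, y):
    """Simple least squares: return (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def demean(values):
    """Subtract the individual's own mean from each observation."""
    m = sum(values) / len(values)
    return [v - m for v in values]


# Made-up panel: y_it = alpha_i + 2 * x_it, with alpha_i rising with x
panel = {
    "A": {"x": [1, 2, 3, 4], "alpha": 0.0},
    "B": {"x": [3, 4, 5, 6], "alpha": 10.0},
    "C": {"x": [5, 6, 7, 8], "alpha": 20.0},
}

x_pooled, y_pooled, x_within, y_within = [], [], [], []
for unit in panel.values():
    y_unit = [unit["alpha"] + 2 * xi for xi in unit["x"]]
    x_pooled += unit["x"]
    y_pooled += y_unit
    x_within += demean(unit["x"])
    y_within += demean(y_unit)

# Pooled OLS ignores the individual intercepts
_, b_pooled = fit_line(x_pooled, y_pooled)
# Within estimator: regression through the origin on demeaned data
b_within = sum(xw * yw for xw, yw in zip(x_within, y_within)) / sum(
    xw ** 2 for xw in x_within
)

print("pooled slope:", b_pooled)
print("within (fixed effect) slope:", b_within)
```

Because the intercepts are correlated with X in this made-up panel, pooled least squares overstates the slope while the within estimator recovers it.
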
Pooled Regression: Fixed Effect and Random Effect models
• The Hausman Test
– Model selection: Fixed Effect vs Random Effect
» H0: random effects would be consistent and efficient, versus
» H1: random effects would be inconsistent
– Chi-square test statistic

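In its usual form the Hausman statistic is H = d' [V_FE - V_RE]^(-1) d, where d is the difference between the fixed effect and random effect coefficient vectors, and it is compared with a chi-square distribution whose degrees of freedom equal the number of coefficients compared. This is a minimal sketch of the one-coefficient case; the estimates and variances below are made-up numbers, not results from any data set.

```python
def hausman_scalar(b_fe, var_fe, b_re, var_re):
    """Hausman statistic for a single coefficient: (b_FE - b_RE)^2 / (V_FE - V_RE)."""
    var_diff = var_fe - var_re
    if var_diff <= 0:
        raise ValueError("V_FE - V_RE must be positive for the statistic to be defined")
    return (b_fe - b_re) ** 2 / var_diff


# Made-up estimates and sampling variances for illustration only
b_fe, var_fe = 2.0, 0.04
b_re, var_re = 2.5, 0.01

H = hausman_scalar(b_fe, var_fe, b_re, var_re)
CHI2_1DF_5PCT = 3.841  # 5% critical value of chi-square with 1 degree of freedom

print("Hausman statistic:", H)
print("reject H0 (random effects consistent)?", H > CHI2_1DF_5PCT)
```

A statistic above the critical value argues against the random effect model and in favour of fixed effects.
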