Notes for Regression Models with Excel

advertisement
Notes for Regression Models with Excel
Population  parameter (a characteristic)

Sample  a statistic
Statistics:
Parametric: assume a known distribution – mostly Normal
Non-Parametric: no assumption as to underlying distribution
Descriptive: no assumption or detailed analysis – mean, mode, median,
frequency, etc. use “explore” or “pivot tables” or “Chart Wizard” or “Crosstabs” or
scatter diagrams to see how data relates to each other
r= correlation coefficient -1 to +1
r2 = coefficient of determination 0 to 1
Variables:
Lots of different kinds – a symbol that can have one of many values
Independent
Dependent
Lurking
Moderating
Also Factors – is sort of a variable of variables
Scales for Variables:
Four major types:
Nominal – no scale or direction – male/female
Ordinal – no scale but does have a rank or direction – Likert scale
Least preferred 1 2 3 4 5 most preferred
Scale – interval – direction and magnitude – number of square feet in shopping
mall
ratio – same as interval but contains zero
Parametric statistics: use scale variables – t, z and F statistics
Non-parametric statistics: use ordinal, nominal variables - binomial distributions,
χ2 (chi) squared tests
MGTC74
Page 1 of 4
Regression R1
Applications:
Hypothesis testing
Experimental Design
Statistical Process Control
Factor Analysis
Cluster Analysis
Multidimensional Scaling
Parametric Assumptions
1. observations must be independent
2. observations are from a normally distributed population
3. populations should have equal variances σ2= 
4. measurements are at least arithmetic (scale)
Multiple Linear Regression
Single Linear Regression
Y=mx + b + 
Time Series
X axis is time in equal intervals
Multiple Linear Regression
MGTC74
Page 2 of 4
Regression R1
Assumptions of MLR
1. observations are independent
2. relationship between any two variables is linear
3. for each value of the independent variable there is a normal distribution for the
dependent variable
4. all distributions for the independent variables have same variance σ2= 
Very Very General Procedure to Build a MLR Model
1. check assumptions, choose variables, assess type of variable
2. perform exploratory statistics
- mean, mode, median
- scatter diagrams with Chart Wizard (histograms, stem-and-leaf plots etc)
3. run a correlation table
- determine order of adding variables
- eliminate highly correlated variables
4. Run a step-wise regression. Assess via r2, F and t
Excel’s regression Tool provides an extensive analysis of the regression
equation. Its first table lists a number of statistics, including three measures of fit.
As in the case of simple regression we can use r2 as a basic measure of fit. In
the case of multiple regression r2 is called the coefficient of multiple determination
and its square root, r is called the multiple r or the multiple correlation coefficient.
Excel reports both values. The table also reports an adjusted r2. The reason for
introducing another measure is that adding independent variables to the
regression will never decrease the r2 and will usually increase it even if there is
MGTC74
Page 3 of 4
Regression R1
no reason why the added variable should help predict the dependent variable. A
suitable adjustment accounts for the effects of the number of variables in the
equation and will cause the adjusted r2 to decrease unless there is a
compensating reduction in the differences between observed and predicted
values. One informal guideline for how many independent variables to include in
a regression equation is to find the largest value of the adjusted r2. However, we
strongly recommend including variables that can be justified on practical or
theoretical grounds, even if the adjusted r2 is not at its largest possible value.
The “Principle of Parsimony” holds that the best model is the one with the
fewest independent variables – after all, this is a business course and it costs
money to collect data – you want to sort of find the maximum r2 for the least cost;
adding more variables for small improvements is often not practical nor does it
make sense.
The regression report also lists a value for the F statistic (labeled significant F),
which has the same interpretation as in a simple regression: it measures the
probability of getting the observed r2 or higher, if, in fact all the regression
coefficients were zero. This is a measure of the overall fit of the regression model
to the data, but it is not a particularly stringent test. We would expect this
probability to be low in any well considered model for which relevant data were
being used.
The estimated coefficients of the regression equation, which we did noted as 
can be found under the coefficients heading in the Excel regression output.
Corresponding to each coefficient, the table provides a p-value which measures
the probability that we would get an estimate at least this large if the true
regression coefficient or zero. While the p-value does provide you with a quick
check on each coefficient, the respective confidence intervals generally provide
more useful information.
5. Write out model. Make appropriate plots and interpretations. .
MGTC74
Page 4 of 4
Regression R1
Download