Notes for Regression Models with Excel

Notes for Regression Models with Excel Population  parameter (a characteristic)  Sample  a statistic Statistics: Parametric: assume a known distribution – mostly Normal Non-Parametric: no assumption as to underlying distribution Descriptive: no assumption or detailed analysis – mean, mode, median, frequency, etc. use “explore” or “pivot tables” or “Chart Wizard” or “Crosstabs” or scatter diagrams to see how data relates to each other r= correlation coefficient -1 to +1 r2 = coefficient of determination 0 to 1 Variables: Lots of different kinds – a symbol that can have one of many values Independent Dependent Lurking Moderating Also Factors – is sort of a variable of variables Scales for Variables: Four major types: Nominal – no scale or direction – male/female Ordinal – no scale but does have a rank or direction – Likert scale Least preferred 1 2 3 4 5 most preferred Scale – interval – direction and magnitude – number of square feet in shopping mall ratio – same as interval but contains zero Parametric statistics: use scale variables – t, z and F statistics Non-parametric statistics: use ordinal, nominal variables - binomial distributions, χ2 (chi) squared tests MGTC74 Page 1 of 4 Regression R1 Applications: Hypothesis testing Experimental Design Statistical Process Control Factor Analysis Cluster Analysis Multidimensional Scaling Parametric Assumptions 1. observations must be independent 2. observations are from a normally distributed population 3. populations should have equal variances σ2=  4. measurements are at least arithmetic (scale) Multiple Linear Regression Single Linear Regression Y=mx + b +  Time Series X axis is time in equal intervals Multiple Linear Regression MGTC74 Page 2 of 4 Regression R1 Assumptions of MLR 1. observations are independent 2. relationship between any two variables is linear 3. for each value of the independent variable there is a normal distribution for the dependent variable 4. all distributions for the independent variables have same variance σ2=  Very Very General Procedure to Build a MLR Model 1. check assumptions, choose variables, assess type of variable 2. perform exploratory statistics - mean, mode, median - scatter diagrams with Chart Wizard (histograms, stem-and-leaf plots etc) 3. run a correlation table - determine order of adding variables - eliminate highly correlated variables 4. Run a step-wise regression. Assess via r2, F and t Excel’s regression Tool provides an extensive analysis of the regression equation. Its first table lists a number of statistics, including three measures of fit. As in the case of simple regression we can use r2 as a basic measure of fit. In the case of multiple regression r2 is called the coefficient of multiple determination and its square root, r is called the multiple r or the multiple correlation coefficient. Excel reports both values. The table also reports an adjusted r2. The reason for introducing another measure is that adding independent variables to the regression will never decrease the r2 and will usually increase it even if there is MGTC74 Page 3 of 4 Regression R1 no reason why the added variable should help predict the dependent variable. A suitable adjustment accounts for the effects of the number of variables in the equation and will cause the adjusted r2 to decrease unless there is a compensating reduction in the differences between observed and predicted values. One informal guideline for how many independent variables to include in a regression equation is to find the largest value of the adjusted r2. However, we strongly recommend including variables that can be justified on practical or theoretical grounds, even if the adjusted r2 is not at its largest possible value. The “Principle of Parsimony” holds that the best model is the one with the fewest independent variables – after all, this is a business course and it costs money to collect data – you want to sort of find the maximum r2 for the least cost; adding more variables for small improvements is often not practical nor does it make sense. The regression report also lists a value for the F statistic (labeled significant F), which has the same interpretation as in a simple regression: it measures the probability of getting the observed r2 or higher, if, in fact all the regression coefficients were zero. This is a measure of the overall fit of the regression model to the data, but it is not a particularly stringent test. We would expect this probability to be low in any well considered model for which relevant data were being used. The estimated coefficients of the regression equation, which we did noted as  can be found under the coefficients heading in the Excel regression output. Corresponding to each coefficient, the table provides a p-value which measures the probability that we would get an estimate at least this large if the true regression coefficient or zero. While the p-value does provide you with a quick check on each coefficient, the respective confidence intervals generally provide more useful information. 5. Write out model. Make appropriate plots and interpretations. . MGTC74 Page 4 of 4 Regression R1

Notes for Regression Models with Excel

Related documents

Products

Support

Notes for Regression Models with Excel

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib