UNIT 3
3.14.19 Assumption checking (cont.)
Durbin-Watson (DW) test
o Tests H0: the lag-1 autocorrelation of the residuals is 0
o DW = 2(1 − r1), where r1 is the lag-1 autocorrelation, so DW ranges from 0 to 4
o A value around 2 means we retain H0; 1.5 to 2.5 is considered a fairly large (lenient) acceptable range
Normality of residuals
o Histogram of the residuals, overlaid with a normal curve
o Normal Q-Q plot
Example after slide 6 (SPSS handout)
o We do not expect the plot to be perfectly linear
o If the line were fairly flat, we would suspect there is no linear relationship
o Since the line goes up, we can conclude the slope is not 0
Unstandardized residuals
o These were plotted from a simple regression model
o Can you see the clustering problem?
Box plots
o The bar within the box is the median
o The blue cross is the mean
o The blue crosses sit at different levels across the boxes, going up and down
Page 3 of the Unit 3 SPSS handout
o Tests of normality output: if the test is not significant, we have no evidence against the normality assumption
o Request the histogram and outlier plots to get the normality table
o Skewness and kurtosis do the same job
o Check whether any of them are significant
Dealing with violation of assumptions
o We need to use generalized variables
o We can transform to z scores, but that is a linear transformation: it changes the scale, not the shape, so it does not fix non-normality
o We need to try different transformations; there is a lot of trial and error
o The main drawback is that the outcome is more difficult to interpret
o Some transformations are used more frequently in the physical sciences
Coal mining example
o A one-unit change on the ln X scale results in a change of b units on the Y axis
o If it makes sense, you can use a transformation
Slide 7: Dealing with violation of assumptions
o Nonconstant (conditional) variance
o The problem is that the standard error is underestimated
o If we rely on the t test (the estimate divided by its standard error), t is inflated
o That type of error makes us reject H0 more often than we should
o A robust test (robust standard errors) accounts for nonnormality as well as nonconstant variance
Slide 8: Dealing with violation of assumptions
o If you know the variables are not measured very well, that will be a problem
o To fix that, use multiple measures of each
construct and use structural equation modeling (SEM)
o Nonindependence problem
  o Clustered units: multilevel models
  o Serial dependency (due to time): time series analysis
Detecting outliers
o Outliers may or may not be problematic
o Look at the standardized residuals: anything beyond plus or minus 3 is an outlier
o A box plot flags points more than 3 interquartile ranges beyond the 1st or 3rd quartile
o You can use a chi-square critical value at p = .001 with df = total number of variables included (dependent and independent); .001 is pretty extreme
o Leverage: how do we know if hii is small or large?
  o Large if hii > 2k/n in large samples or > 3k/n in small samples
o Mahalanobis distance = (n − 1)hii (with hii as the centered leverage that SPSS reports)
Influence on regression estimates
o Influential means the case actually changes our regression estimates
o Y-hat is the predicted (fitted) value
o Without the outlier, the fitted value changes
o DFFITS
  o Compare the model fitted with the subject in question vs. without that subject
  o Use the model without the subject to predict that same subject
  o We look at the change: how omitting the subject in question affects its fitted value
  o Interpretation: DFFITS estimates the number of SDs by which Y-hat would change if the case were deleted from the data
  o Criterion: |DFFITS| > 1 is considered large; for large n, a smaller size-adjusted cutoff is used (commonly 2√(k/n))
o Cook's D
  o Cook's D is not very sensitive
  o It is roughly the square of the DFFITS statistic (divided by the number of parameters)
o Specific measure, DFBETAS (how case i affects an individual bj)
  o j stands for the specific parameter in our regression model
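The outlier and influence measures in these notes (standardized residuals, leverage hii, Mahalanobis distance, DFFITS, Cook's D, DFBETAS) all follow from the "omit the case, refit, and look at the change" idea. Below is a minimal pure-Python sketch for a simple regression with one predictor only. The function names fit_line and influence_table are hypothetical, the data are made up, and I am assuming the standard textbook formulas match what SPSS reports (for example, Mahalanobis distance as (n − 1) times the centered leverage hii − 1/n).

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - b1 * mx, b1


def influence_table(x, y):
    """Leave-one-out influence diagnostics for a simple regression (needs n >= 4)."""
    n, p = len(x), 2                                   # p = parameters (b0 and b1)
    b0, b1 = fit_line(x, y)
    mx = sum(x) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sum_x2 = sum(a * a for a in x)
    yhat = [b0 + b1 * a for a in x]
    resid = [b - f for b, f in zip(y, yhat)]
    s = math.sqrt(sum(e * e for e in resid) / (n - p))  # residual SD, full data
    rows = []
    for i in range(n):
        h = 1 / n + (x[i] - mx) ** 2 / sxx              # leverage hii (uncentered)
        x_del, y_del = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        c0, c1 = fit_line(x_del, y_del)                 # refit with case i omitted
        yhat_del = [c0 + c1 * a for a in x]             # predictions from that refit
        s_del = math.sqrt(sum((b - (c0 + c1 * a)) ** 2
                              for a, b in zip(x_del, y_del)) / (n - 1 - p))
        rows.append({
            "leverage": h,
            "mahalanobis": (n - 1) * (h - 1 / n),       # (n - 1) * centered leverage
            "std_resid": resid[i] / s,
            "dffits": (yhat[i] - yhat_del[i]) / (s_del * math.sqrt(h)),
            "cooks_d": sum((f - g) ** 2 for f, g in zip(yhat, yhat_del)) / (p * s * s),
            "dfbetas_b0": (b0 - c0) / (s_del * math.sqrt(sum_x2 / (n * sxx))),
            "dfbetas_b1": (b1 - c1) / (s_del * math.sqrt(1 / sxx)),
        })
    return rows


if __name__ == "__main__":
    # Made-up data: nine points near y = x plus one high-leverage case at x = 20.
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
    y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1, 9.0, 5.0]
    for i, r in enumerate(influence_table(x, y)):
        flags = []
        if abs(r["std_resid"]) > 3:
            flags.append("std resid")
        if abs(r["dffits"]) > 1:
            flags.append("DFFITS")
        print(i, {k: round(v, 3) for k, v in r.items()}, flags)
```

With these made-up numbers, the last case (x = 20) has a standardized residual well inside plus or minus 3, yet its DFFITS and Cook's D are by far the largest, which is why leverage and influence are checked alongside residuals. For models with several predictors, a matrix-based library (for example statsmodels' OLSInfluence class) is the practical route.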
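The Durbin-Watson relationship DW = 2(1 − r1) from the assumption-checking part of these notes can be checked numerically. This minimal sketch assumes the residuals are already in time (collection) order and come from a model with an intercept, so they average 0; the function names are hypothetical and the residuals are made up.

```python
def durbin_watson(residuals):
    """DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2 (ranges from 0 to 4)."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def lag1_autocorrelation(residuals):
    """r1 = sum_{t=2..n} e_t * e_{t-1} / sum_{t=1..n} e_t^2 (residuals assumed mean 0)."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


if __name__ == "__main__":
    # Made-up residuals in time order, for illustration only.
    e = [0.5, 0.3, -0.2, -0.6, -0.1, 0.4, 0.2, -0.3, -0.4, 0.2]
    dw, r1 = durbin_watson(e), lag1_autocorrelation(e)
    print(f"DW = {dw:.3f}; approximation 2(1 - r1) = {2 * (1 - r1):.3f}")
    # Rule of thumb from the notes: roughly 1.5 to 2.5 -> retain H0
    print("retain H0" if 1.5 <= dw <= 2.5 else "possible lag-1 autocorrelation")
```

The exact statistic and the approximation differ only by the end terms (e1² + en²)/Σe², which shrink as n grows. The formal test compares DW with tabled lower and upper bounds; the 1.5 to 2.5 range is the quick rule of thumb.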
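For the skewness and kurtosis check on page 3 of the handout, and the point that a z-score transformation changes scale but not shape, the sketch below computes bias-adjusted skewness and excess kurtosis with their usual standard errors and z = statistic / SE. I am assuming these are the formulas behind SPSS's descriptive output; the function names and data are hypothetical. A common rule treats |z| > 1.96 as significant at .05.

```python
import math


def _moments(data):
    """Return n and the central moments m2, m3, m4 (divisor n)."""
    n = len(data)
    mean = sum(data) / n
    dev = [v - mean for v in data]
    return (n, sum(d ** 2 for d in dev) / n,
            sum(d ** 3 for d in dev) / n, sum(d ** 4 for d in dev) / n)


def skewness_test(data):
    """Bias-adjusted skewness G1, its standard error, and z = G1 / SE (needs n >= 3)."""
    n, m2, m3, _ = _moments(data)
    g1 = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return g1, se, g1 / se


def kurtosis_test(data):
    """Bias-adjusted excess kurtosis G2, its standard error, and z (needs n >= 4)."""
    n, m2, _, m4 = _moments(data)
    g2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * (m4 / m2 ** 2 - 3) + 6)
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return g2, se, g2 / se


def z_scores(data):
    """Linear transformation to mean 0, SD 1 (sample SD with n - 1)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
    return [(v - mean) / sd for v in data]


if __name__ == "__main__":
    # Made-up, right-skewed values (all positive so the log is defined).
    r = [1.2, 1.5, 1.7, 2.0, 2.2, 2.9, 3.5, 4.8, 7.5, 12.0]
    for label, d in [("raw", r), ("z-scored", z_scores(r)), ("log", [math.log(v) for v in r])]:
        g1, _, z1 = skewness_test(d)
        g2, _, z2 = kurtosis_test(d)
        print(f"{label:9s} skew={g1:.3f} (z={z1:.2f})  kurt={g2:.3f} (z={z2:.2f})")
```

Running it shows the same skewness for the raw and z-scored values, while the log transformation (a nonlinear change) pulls the skewness down.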
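The coal mining point (a one-unit change on the ln X scale changes Y by b units) can be made concrete by regressing Y on ln(X). The sketch uses hypothetical data generated exactly from Y = 3 + 2 ln(X), not the course's coal mining data, so the recovered slope is known in advance; simple_ols is a hypothetical helper name.

```python
import math


def simple_ols(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - b1 * mx, b1


if __name__ == "__main__":
    # Hypothetical data generated exactly from y = 3 + 2 * ln(x).
    x = [1, 2, 4, 8, 16, 32]
    y = [3 + 2 * math.log(v) for v in x]
    b0, b1 = simple_ols([math.log(v) for v in x], y)  # regress Y on ln(X)
    print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
    print(f"one-unit increase in ln(X): Y changes by {b1:.3f}")
    print(f"doubling X: Y changes by b1 * ln 2 = {b1 * math.log(2):.3f}")
    print(f"1% increase in X: Y changes by about b1 / 100 = {b1 / 100:.4f}")
```

A one-unit change in ln X means multiplying X by e (about 2.72), which is one reason log-scale coefficients are harder to explain; restating the effect as "doubling X" or "a 1% increase in X" usually reads better.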
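For the Slide 7 point about nonconstant variance, the sketch below contrasts the classical slope standard error with a heteroscedasticity-consistent one in a simple regression. I use White's HC0 estimator as one concrete choice of "robust" standard error; the notes do not say which robust method the course uses, and the function name and data are hypothetical.

```python
import math


def slope_with_standard_errors(x, y):
    """Simple regression slope with classical and HC0 (White) standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    # Classical SE assumes constant error variance: Var(b1) = s^2 / Sxx
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)
    # HC0 SE lets each case have its own variance: sum (x_i - xbar)^2 e_i^2 / Sxx^2
    se_hc0 = math.sqrt(sum((a - mx) ** 2 * e * e for a, e in zip(x, resid)) / sxx ** 2)
    return b1, se_classical, se_hc0


if __name__ == "__main__":
    # Made-up data whose spread grows with x (nonconstant variance).
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    y = [2.1, 2.9, 3.2, 4.8, 3.9, 6.5, 4.6, 8.2, 4.9, 9.8]
    b1, se_c, se_r = slope_with_standard_errors(x, y)
    print(f"b1 = {b1:.3f}")
    print(f"classical SE = {se_c:.3f}, t = {b1 / se_c:.2f}")
    print(f"HC0 robust SE = {se_r:.3f}, t = {b1 / se_r:.2f}")
```

The t statistic is b1 divided by whichever standard error is used, so an underestimated standard error inflates t and H0 is rejected too often. The variants HC1 to HC3 add small-sample corrections to HC0.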