Uploaded by Christie Tawiah

03.14.19 stats notes

UNIT 3
3.14.19 Assumption checking (cont.)
 DW ≈ 2(1 - r1), where r1 is the lag-1 autocorrelation of the residuals
 DW = 1.5-2.5 is considered a large range
DW = Durbin-Watson test of H0: 0 lag-1 autocorrelation; DW around 2 -> retain H0
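A minimal sketch of the Durbin-Watson calculation, assuming the residuals are listed in time order. The residual values below are made up for illustration and do not come from the handout. It compares DW with the 2(1 - r1) shortcut.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), residuals in time order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def lag1_autocorrelation(residuals):
    """r1 = sum(e_t * e_{t-1}) / sum(e_t^2), treating the residual mean as 0."""
    num = sum(residuals[t] * residuals[t - 1]
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals (illustration only): neighbours tend to share a sign,
# i.e. positive lag-1 autocorrelation.
residuals = [0.5, 0.8, 0.3, -0.2, -0.6, -0.4, 0.1, 0.4, 0.2, -0.3, -0.5, -0.3]

dw = durbin_watson(residuals)
r1 = lag1_autocorrelation(residuals)
print(f"DW = {dw:.3f}, r1 = {r1:.3f}, 2(1 - r1) = {2 * (1 - r1):.3f}")
```

The shortcut is only approximate: it differs from DW by the end terms (e_1^2 + e_n^2) / sum(e_t^2), which shrink as n grows.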
Normality of residuals
o Histogram of residuals & overlay with a normal curve
o Normal q-q plot
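A rough sketch of what a normal q-q plot puts on its axes, assuming the plotting positions (i - 0.5)/n. That is one common convention and not necessarily the one SPSS uses. The residuals are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(residuals):
    """Pair each sorted residual with the normal quantile expected at its rank."""
    n = len(residuals)
    ordered = sorted(residuals)
    reference = NormalDist(mean(residuals), stdev(residuals))
    # (i - 0.5) / n keeps the probabilities strictly between 0 and 1
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical residuals (illustration only)
residuals = [-1.2, -0.7, -0.4, -0.1, 0.0, 0.2, 0.3, 0.6, 0.9, 2.4]

for expected, observed in normal_qq_points(residuals):
    print(f"expected {expected:6.2f}   observed {observed:6.2f}")
```

If the residuals are close to normal, the (expected, observed) pairs fall near a straight line.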
Example after slide 6: SPSS handout
o We do not expect it to be perfectly linear
o If the line were somewhat flat, we would expect no linear relationship
o Since the line is going up, we can conclude the slope is not 0
Unstandardized residual
o They plotted it from a simple regression model
 Can you see the clustering problem?
 Box plots
 The bar within the box is the median
 Blue cross is the mean
 The blue crosses are on different lines, there are ups and downs
Page 3 SPSS handout of Unit 3
o Tests of normality output: if the test is not significant, the data fit the assumption of normality
 Histogram and outlier plots to get the normality table
o Skewness and kurtosis do the same
o Any significant
Dealing with violation of assumptions
o We need to use generalized variables
o We can do a transformation to z scores: a linear transformation
 Changes scale, not shape
o Need to try different transformations; lots of trial and error going on
o The main drawback is that the outcome is more difficult to interpret
o Some transformations are used more frequently in the physical sciences
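A small sketch of the "scale, not shape" point, using made-up right-skewed data. It assumes the moment-based skewness (mean of cubed z scores), which may differ slightly from the bias-adjusted skewness SPSS reports. The natural log is just one candidate nonlinear transformation.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Moment-based skewness: the mean of the cubed z scores."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Hypothetical right-skewed scores (illustration only)
y = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]

z = [(v - mean(y)) / pstdev(y) for v in y]   # linear transformation
log_y = [math.log(v) for v in y]             # nonlinear transformation

print(f"raw   skewness: {skewness(y):.3f}")
print(f"z     skewness: {skewness(z):.3f}")
print(f"ln(y) skewness: {skewness(log_y):.3f}")
```

The z scores keep the same skewness as the raw scores, while ln(y) pulls in the long right tail.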
Coal mining example
o A per-unit change on the ln(X) scale will result in a change on the Y axis
o If it makes sense you can use transformation
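A short worked derivation of what a slope on the ln(X) scale means, assuming a simple model with a log-transformed predictor. The symbols b_0, b_1, and the multiplier c are introduced here for illustration and are not taken from the handout.

```latex
\begin{align*}
\hat{Y} &= b_0 + b_1 \ln X \\
\hat{Y}(cX) - \hat{Y}(X) &= b_1\,[\ln(cX) - \ln X] = b_1 \ln c \\
c = e:&\quad \Delta\hat{Y} = b_1
  \quad\text{(one unit on the $\ln X$ scale)} \\
c = 1.01:&\quad \Delta\hat{Y} = b_1 \ln(1.01) \approx 0.01\, b_1
  \quad\text{(a 1\% increase in $X$)}
\end{align*}
```

So the same change in Y goes with multiplying X by a fixed factor, not with adding a fixed amount to X.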
Slide 7 Dealing with violation of assumptions
o Nonconstant (conditional) variance
o The problem is the standard error, which is underestimated
o We rely on the t test, which is the estimate over its standard error
o With that type of error, the test will reject more often than it should
o A robust test will account for violations of normality as well as nonconstant variance
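A sketch of how nonconstant variance shows up in the slope's standard error, for a one-predictor regression. The robust version here is the White (HC0) heteroscedasticity-consistent estimator, chosen for illustration. The notes do not say which robust method the slide uses. The data are constructed so that the spread of Y grows with X.

```python
import math


def slope_standard_errors(x, y):
    """Return slope, classical SE, and HC0 (White) robust SE for simple OLS."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Classical SE assumes one common residual variance (MSE) for every case
    mse = sum(e ** 2 for e in resid) / (n - 2)
    se_classical = math.sqrt(mse / sxx)
    # HC0 lets each case contribute its own squared residual
    se_robust = math.sqrt(
        sum((xi - xbar) ** 2 * e ** 2 for xi, e in zip(x, resid))
    ) / sxx
    return b1, se_classical, se_robust


# Hypothetical data: two cases per X value, spread of 0.2 * X^2 around the line
x, y = [], []
for xv in [1, 2, 3, 4, 5]:
    for sign in (1, -1):
        x.append(xv)
        y.append(2 + 0.5 * xv + sign * 0.2 * xv ** 2)

b1, se_classical, se_robust = slope_standard_errors(x, y)
print(f"slope = {b1:.3f}")
print(f"classical SE = {se_classical:.3f}, t = {b1 / se_classical:.2f}")
print(f"robust SE    = {se_robust:.3f}, t = {b1 / se_robust:.2f}")
```

In this constructed example the classical SE is the smaller of the two, so its t statistic is larger and the test rejects more readily.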
Slide 8 Dealing with violation of assumptions
o If you know the variables are not very well measured, it will be a problem
o To fix that, use measures of each construct & SEM
o Nonindependence problem
 Clustering units: multilevel models
 Serial dependency is due to time
 Time series analysis
Detecting outliers
o Outliers may or may not be problematic
o You can look at standardized residuals
 Anything beyond plus or minus 3 is an outlier
o A box plot is a way to determine outliers
 3 interquartile ranges from the 1st and 3rd quartiles
o You can use a critical value at .001 with df = total number of variables included (dependent and independent variables)
o .001 is pretty extreme
o How do we know if leverage (hii) is small or large?
 Large if hii > 2k/n in large samples or > 3k/n in small samples
 Mahalanobis distance: (n-1)hii
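A sketch of these checks for a one-predictor regression (k = 1), with made-up data. It assumes hii is the centered leverage (the version without the 1/n term), which is what makes the k/n cutoffs and the (n-1)hii relation line up. The small-sample cutoff 3k/n is used because n = 10.

```python
import math


def outlier_diagnostics(x, y):
    """Standardized residual, centered leverage, Mahalanobis distance per case."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))  # residual SD
    rows = []
    for xi, e in zip(x, resid):
        lev = (xi - xbar) ** 2 / sxx  # centered leverage, averages k/n
        rows.append({
            "std_resid": e / s,
            "leverage": lev,
            "mahalanobis": (n - 1) * lev,
        })
    return rows


# Hypothetical data (illustration only): the last case is far out on X
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 8.2, 8.8, 10.1, 12.0]

k, n = 1, len(x)
rows = outlier_diagnostics(x, y)
cutoff = 3 * k / n  # small-sample rule from the notes
high_leverage = [i for i, r in enumerate(rows) if r["leverage"] > cutoff]
large_resid = [i for i, r in enumerate(rows) if abs(r["std_resid"]) > 3]

print("high leverage cases:", high_leverage)
print("cases beyond +/- 3 SD:", large_resid)
```

Leverage only looks at how unusual a case is on the predictors, so a case can be flagged here without having a large residual.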
Influence on regression estimates
o Influential means the case actually changes our regression estimates
o Yhat is the predicted (fitted) value
o Without the outlier, the fitted estimate changes
o DfFits
 Fit the model with the subject in question vs. without the subject in question
 Use the model without the subject to predict for the same subject we are interested in
 We are looking at the change
 Basically, we are looking at how omitting the subject in question will affect the estimate in question
 How to interpret the DfFits stat
 Estimates the number of SDs by which yhat would change if the case were deleted from the data
 Criterion: > 1 is considered large; or > K is big for large n
o Cook’s D
 Cook’s D is not very sensitive
 It’s sort of like the square of the DfFits statistic
o Specific measure (how case i affects individual Bj)
 j stands for the specific parameter in our regression model
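A leave-one-out sketch of the three influence measures for a one-predictor regression, using made-up data. It assumes the usual textbook definitions: DfFits scaled by the deleted-case residual SD and the full (uncentered) leverage, Cook's D scaled by p times MSE with p = 2 parameters, and a standardized DfBeta for the slope. SPSS may scale its saved values differently.

```python
import math


def fit_line(x, y):
    """Simple OLS: intercept, slope, SSE, mean of x, and Sxx."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse, xbar, sxx


def influence(x, y):
    """DfFits, Cook's D, and slope DfBetas for each case, by refitting."""
    n, p = len(x), 2  # p = number of parameters (intercept + slope)
    b0, b1, sse, xbar, sxx = fit_line(x, y)
    mse = sse / (n - p)
    out = []
    for i in range(n):
        # Refit without case i
        b0_i, b1_i, sse_i, _, _ = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        s_i = math.sqrt(sse_i / (n - 1 - p))      # residual SD without case i
        h = 1 / n + (x[i] - xbar) ** 2 / sxx      # full leverage of case i
        # Change in case i's own fitted value, in SD units
        dffits = ((b0 + b1 * x[i]) - (b0_i + b1_i * x[i])) / (s_i * math.sqrt(h))
        # Change in all fitted values, squared and summed
        cooks_d = sum(
            ((b0 + b1 * xj) - (b0_i + b1_i * xj)) ** 2 for xj in x
        ) / (p * mse)
        # Change in the slope only (j = slope)
        dfbetas_slope = (b1 - b1_i) / (s_i / math.sqrt(sxx))
        out.append({"dffits": dffits, "cooks_d": cooks_d,
                    "dfbetas_slope": dfbetas_slope})
    return out


# Hypothetical data (illustration only): the last case is far out on X
# and well below the trend of the others
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.0, 3.1, 3.9, 5.2, 6.0, 6.8, 8.1, 9.0, 9.9, 12.0]

results = influence(x, y)
for i, r in enumerate(results):
    print(f"case {i}: DfFits = {r['dffits']:8.2f}, "
          f"Cook's D = {r['cooks_d']:8.2f}, "
          f"DfBetas(slope) = {r['dfbetas_slope']:8.2f}")
```

A case with |DfFits| above 1 would be flagged by the criterion in the notes. Cook's D works from squared changes in fitted values, which is why it behaves like a squared DfFits.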