Laboratory 5 Regression Assumptions & Multicollinearity Assumption Checking Random sampling to get X1 (n=30) and E (n=30) from a normal distribution population, N (0,1). Let X2 = X1*X1; X3 = X1*X1*X1; Y1= X1 + X2 + X3 + E; Y2= X1 + X2 + X3 + E². Please run the following SAS program to fit the regression models Y1= X1 X2 X3 and Y2= X1 X2 X3. Check the assumption of multiple regression by examining and plotting the residuals. Report and interpret the results (both testing and graphs) for model Y2= X1 X2 X3 SAS program title1 '*********************************************'; title2 'Assumption Checking by Henian Chen 2002-09-03'; title3 '*********************************************'; data assumption; do i=1 to 100; x1=rannor(0); x2=(rannor(0))**2; x3=(rannor(0))**3; e=rannor(0); y1=x1+x2+x3+e; y2=x1+x2+x3+e**2; output; end; proc reg; model y1=x1 x2 x3/r; plot student.*x1; plot npp.*residual.; output out=reg1 r=resd1; model y2=x1 x2 x3/r; plot student.*x1; plot npp.*residual.; output out=reg2 r=resd2; data reg; set reg1 reg2; proc univariate normal; var resd1 resd2; run; Applied Epidemiologic Analysis – P8400 Page 1 Lab5: Regression Assumptions & Multicollinearity Henian Chen Fall 2002 Collinearity Random sampling to get X1 (n=30) from a normal distribution population, N (8,2²), and e (n=30) from another normal distribution population, N (0,1). Let X2 = 2X1 + e; Y = 3X1 + 1 + e. Please run the following SAS program to fit the regression models Y= X1 , Y= X2 , and Y= X1 X2. Why is X2 significant in model Y= X2, but not in model Y= X1 X2 ? Please run this program 10 times. Make a table to record the parameter estimate, SE, and P value of the final model for each model y=x1 x2/selection=forward. What did you learn from this computer experiment by comparing the β and SE? SAS program title1 '*****************************************'; title2 ''Collinearity by Henian Chen 2002-08-26'; title3 '*****************************************'; data collinearity; do i=1 to 30; x1=rannor(0)*2+8; x2=rannor(0)+x1*2; y=rannor(0)+x1*3+1; output; end; proc reg; model y=x1; model y=x2; model y=x1 x2/collin collinoint vif tol; run; proc reg; model y=x1 x2/selection=forward; run; We will practice Case-Control Analysis and Logistic regression in the next week’s laboratory. Please bring the dataset ‘case-control978.dat’, and the SAS program ‘case-control978.sas’ for labs 6 & 7. Applied Epidemiologic Analysis – P8400 Page 2 Lab5: Regression Assumptions & Multicollinearity Henian Chen Fall 2002