Psy 524
Ainsworth
Psy 524 Lab #2
Data Screening
Write any answers and paste any figures into this document after the appropriate question, and either print it out and hand it in or email it in by class on Tuesday, Feb. 17th.
1. Create a .lsp file out of the forclass.sav data (download the "Creating a .lsp file"
document on the class webpage).
a. Save forclass.sav as a tab-delimited file (forclass.dat).
b. Open arcinstruction.txt in Notepad; insert actual information in any line
that has a ! in front of it, and delete the first two lines of directions.
c. Open forclass.dat in Notepad as well (you may need to open Notepad
separately so that both are open at the same time). Delete the variable
names from the top of the data file and use them in the .lsp syntax.
Copy all of the columns in the forclass.dat data and insert them
between the parentheses. "Save as" forclass.lsp.
Dataset = Forclass
Begin description
Data set I had lying around and thought I’d use as an example
End description
Begin variables
Col 0 = gender = female = 1 and male = 0
Col 1 = ethn = ethnicity
Col 2 = sos = social opinion survey
Col 3 = ego = egotism
Col 4 = n = neuroticism
Col 5 = e = extroversion
Col 6 = o = openness
Col 7 = a = agreeableness
Col 8 = c = conscientiousness
Col 9 = soitot
End variables
Begin data
(
1 0 89 162.5 26 35 36 36 …
0 1 52 151 38 36 34 40 …
0 0 57 185.5 32 43 43 52 …
1 0 57 179.5 28 46 31 35 …
1 0 90 167 42 53 47 37 …
… 51 33 48 48 31 …
… -.487677316090359 -.987537222508109 -.840172443534852 .53362825330652 -.362217276609241 …
All the way to the end of the data file; make sure to include the closing parenthesis below.
)
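If you would rather script step 1 than edit the files by hand in Notepad, here is a rough Python sketch of the same conversion. The function name and the dataset/description defaults are mine, not part of the assignment; it assumes the first line of the .dat file holds the tab-separated variable names, just as the instructions describe.

```python
# Sketch: build a .lsp file from a tab-delimited .dat file whose first
# line contains the variable names (mirrors steps 1a-1c above).
def dat_to_lsp(dat_path, lsp_path, dataset="Forclass",
               description="Example data"):
    with open(dat_path) as f:
        lines = [ln.rstrip("\n") for ln in f if ln.strip()]
    names, rows = lines[0].split("\t"), lines[1:]
    header = [f"Dataset = {dataset}",
              "Begin description", description, "End description",
              "Begin variables"]
    # One "Col i = name" line per variable, as in the template above
    header += [f"Col {i} = {name}" for i, name in enumerate(names)]
    header += ["End variables", "Begin data", "("]
    # Data rows go between the parentheses, whitespace-separated
    body = [r.replace("\t", " ") for r in rows]
    with open(lsp_path, "w") as f:
        f.write("\n".join(header + body + [")"]) + "\n")
```

After saving the tab-delimited file, `dat_to_lsp("forclass.dat", "forclass.lsp")` would write the .lsp in one step.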
2. Look at the graphical relationships among variables. Open Arc and load
forclass.lsp. Create a multipanel plot of SOI (fixed axis on horizontal) versus
N, E, A, O, C, ego, and SOS, marking by gender (assume females are 1). To
do this, go to Graph and Fit, move SOI into "Fixed Axis" and the others (n
through sos) into "Changing Axis", move gender into "Mark by", and then hit
OK. Set OLS = 1 and use the bottom slide bar to cycle through the predictors.
Which variables have, or appear to have, relatively strong relationships with
SOI? Copy and paste the graph with the strongest relationship below.
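As a numeric cross-check on what the multipanel plot shows, you could rank the predictors by their correlation with SOI. This is only a sketch, not part of the assignment; it assumes the data have been loaded into a pandas DataFrame with the same lowercase column names as the .lsp file.

```python
import pandas as pd

def strongest_predictors(df, target="soitot",
                         predictors=("n", "e", "o", "a", "c", "ego", "sos")):
    """Return (predictor, r) pairs sorted by |r| with the target,
    strongest relationship first."""
    r = {p: df[target].corr(df[p]) for p in predictors}
    return sorted(r.items(), key=lambda kv: -abs(kv[1]))
```

The predictor at the top of the list is the one whose panel should show the tightest scatter around the OLS line.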
3. Looking at the same multipanel plot from #2, put OLS back at 0 and select
"Remove Linear Trend". Do any of the variables seem to be heteroscedastic?
There's a slight pattern of tighter variance at the lower sos values.
4. Examine the histograms for each of the variables (except gender, ethn, and
case numbers) and identify any variables with outliers (disconnection) or
skewness (use 10 bins in each histogram). Go to Graph and Fit -> Plot of, and
move each variable in one at a time. If there is a variable with an outlier, copy
and paste it below and then delete the outliers from the distribution. If there
is a variable that is skewed, copy and paste it below.
ego has an outlier
soitot is skewed
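A rough programmatic stand-in for spotting disconnected points in the histograms: flag cases whose standardized score exceeds 3.29, the usual two-tailed p < .001 cutoff for univariate outliers. The function name and the default threshold are my choices, not anything in Arc.

```python
import numpy as np

def flag_outliers(x, z_cut=3.29):
    """Indices of values whose |z| exceeds the cutoff
    (3.29 corresponds to p < .001, two-tailed)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)
    return np.where(np.abs(z) > z_cut)[0]
```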
5. If there was a variable from #4 that is skewed, transform it. Go to the forclass
menu (or whatever you put in the "dataset =" part of the .lsp syntax).
Click on Transform, move the variable over, change it to a log transform, put
the number 3 in the "c" box (this is a constant you're adding to take the
variable off a scale that includes 0), and hit OK. Plot a histogram of this newly
transformed variable and paste it below.
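The same transform written out in Python, assuming Arc's log transform with constant c amounts to log10(x + c). The base-10 assumption is mine; check Arc's documentation if the base matters for your write-up.

```python
import numpy as np

def log_transform(x, c=3):
    """Log transform with additive constant c, as in step 5:
    c shifts the scale off zero before taking the log.
    Assumes base 10 (not confirmed against Arc)."""
    x = np.asarray(x, dtype=float)
    return np.log10(x + c)
```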
6. Download the salary data set from the class webpage. Using SPSS, open it (it
is called "salery.sav"; it was spelled that way when I got it).
a. Create a casenum variable by using Compute -> casenum =
$casenum.
b. Calculate Mahalanobis distances. Analyze -> Regression -> Linear,
and move casenum into Dependent and everything else except
"female" into Independent(s). Hit "Save" and click on
"Mahalanobis". Continue -> OK. Close the output because you don't
need it. What is the cutoff value for multivariate outliers given these
predictors? The cutoff would be 18.467 (chi-square with 4 dfs at
alpha = .001). And are there any cases that qualify as multivariate
outliers? Yes. If so, which case(s)? Subject number 14, with a
Mahalanobis distance of 19.03417.
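For reference, the regression trick in 6b is just a convenient way to get Mahalanobis distances out of SPSS; the distances depend only on the predictors, not on the dummy dependent variable. Computed directly (a numpy/scipy sketch with my own function names):

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_d2(X):
    """Squared Mahalanobis distance of each row of X from the
    column means, using the sample covariance matrix."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    # D^2_i = (x_i - mean)' S^{-1} (x_i - mean) for each case i
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

def outlier_cutoff(n_predictors, alpha=0.001):
    """Chi-square critical value: a case is a multivariate outlier
    if its D^2 exceeds this."""
    return chi2.ppf(1 - alpha, df=n_predictors)
```

`outlier_cutoff(4)` reproduces the 18.467 cutoff quoted above.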
c. Split the file by Data -> Split File; click on "Compare groups" and
move "female" into "Groups Based on". What function does this
serve? It allows you to screen women and men separately for
outliers. Repeat "b" above. Did the values change? Yes, the values
changed. Are there any multivariate outliers this time? No. What
did this exercise demonstrate? That screening by groups can lead to
different results than screening without grouping. If you are
performing a grouped analysis (ANOVA, MANOVA, etc.), screen by
groups separately.
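To see why grouped screening matters, here is a small pandas sketch (the function and column names are hypothetical): a score that looks extreme against the pooled sample can be perfectly ordinary within its own group, and vice versa.

```python
import pandas as pd

def groupwise_z(df, group_col, value_cols):
    """Within-group z-scores: each value is standardized against its
    own group's mean and SD, mirroring what Split File does for the
    screening in 6c."""
    g = df.groupby(group_col)[list(value_cols)]
    return (df[list(value_cols)] - g.transform("mean")) / g.transform("std")
```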
7. Download “social.sav” from the class webpage. Perform a missing value
analysis.
a. Transform -> Compute and enter num = nvalid(ciccomp to supcomp)
-> OK. Then Data -> Select Cases -> "If condition is satisfied" -> If,
enter num <> 0 into the window -> Continue, and set "Unselected
cases are" to DELETED. This gets rid of any subject without scores on
any quantitative variable.
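The NVALID filter in 7a has a direct pandas analogue (column list hypothetical): keep only rows with at least one valid score on the quantitative variables.

```python
import pandas as pd

def drop_all_missing(df, cols):
    """Drop rows that are missing on every column in cols,
    like num <> 0 with unselected cases deleted."""
    num_valid = df[list(cols)].notna().sum(axis=1)
    return df[num_valid != 0]
```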
b. Analyze -> Missing Value Analysis; include everything except gender,
order, and num in Quantitative Variables.
c. Click on "Descriptives", select "t tests with groups…" and
"Include probabilities in table", then Continue.
d. Select "EM", then hit the "EM" button -> "Save completed data" ->
"File" and title it social2.sav; click Continue and then hit OK.
e. Interpret the output and pick the variable causing the largest
dependency in the other variables' missingness. What is the
probability from Little's MCAR test? Repeat the steps above,
removing the variable causing the biggest problem. What is the
MCAR probability now? Pretend it's OK and continue.
[SPSS output: the "Separate Variance t Tests" table, reporting t, df, P(2-tail), # Present, # Missing, Mean(Present), and Mean(Missing) for each pair of the variables CICCOMP, QDICOMP, SRCHCOMP, EICOMP, SUBCOMP, OOCOMP, and SUPCOMP, is omitted here.]
This is a piece of the output. If you look at the rows for eicomp and oocomp, you'll
see that there is a significant dependency with srchcomp. You could get rid of
eicomp and oocomp, because both of them have missingness dependent on
srchcomp, but then you're losing two variables. I would cheat and get rid of
srchcomp instead.
Little's MCAR test: chi-square = 200.386, df = 167, Prob = .04. This is significant, so
you really shouldn't impute the data.
If you remove srchcomp, the dependencies seem to be OK; there's one at p =
.041, but that's close enough not to worry about. Little's test is still at a probability of
.044, which isn't great, but even though I can't find a reference for it, I bet you
can interpret this as a violation only when it is lower than .001 or something like that
(don't quote me).
f. Highlight gender and order in the data window and copy them. Open
social2.sav and insert the variables into the new data set.
8. Explore the new data separately by gender. You should:
a. Test for skewness and univariate outliers using Explore
i. Calculate z for skewness for each variable and tell me which
variables violate this test.
qdicomp, eicomp, subcomp, and oocomp all violate skewness.
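The z test in 8a.i divides the skewness statistic by its approximate standard error, sqrt(6/N); |z| > 3.29 flags significant skew at p < .001. A sketch (scipy's `skew` returns the sample g1, which is close enough for screening purposes):

```python
import numpy as np
from scipy.stats import skew

def skew_z(x):
    """z for skewness: skewness divided by its approximate
    standard error sqrt(6/N)."""
    x = np.asarray(x, dtype=float)
    return skew(x) / np.sqrt(6.0 / len(x))
```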
ii. Take care of any outliers as you see fit (paste graphs with
outliers, tell me why it's an outlier, and explain what you did to
fix it: delete, change, etc.).
This is all up to you; there is no right or wrong with this, so
I'm not going to give you a definitive answer, because there
isn't one.
Although it does seem like the skewness in qdicomp is fixed by
simply fixing the one extreme outlier (delete it or move it in).
b. Do a further test of normality by asking for a P-P plot (make sure to
split the file first). Graphs -> Legacy Dialogs -> P-P, move the
variables you want over, and hit Continue. Paste the graph that seems
to show the worst violation and interpret it (tell me what it means;
refer to the book for help).
Any of the plots where the dots do not fall on the line indicate that the
variable is not normally distributed, and in the detrended plot most of
the violation seems to indicate a curvilinear pattern instead of a
linear one.
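For interpretation, it may help to see what a P-P plot actually pairs up: the empirical cumulative proportion of each sorted score against the normal CDF evaluated at that score. Points on the diagonal mean the data match a normal distribution. A sketch (the plotting-position formula below is one common choice, not necessarily the one SPSS uses):

```python
import numpy as np
from scipy.stats import norm

def pp_points(x):
    """Return (theoretical, empirical) cumulative probabilities
    for a normal P-P plot of x."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    empirical = (np.arange(1, n + 1) - 0.5) / n  # plotting positions
    theoretical = norm.cdf((x - x.mean()) / x.std(ddof=1))
    return theoretical, empirical
```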
c. Perform square root transformations using Compute and page 83,
"Syntax for Common Data Transformations", or the last page of the
"Data Screening Checklist" on the webpage (I would use "Paste" and
run it from the syntax, hint hint). Run Explore again and tell me
which variables this worked for (normalized), if any, by looking at the
histograms and z for skewness.
It seems like oocomp and eicomp are the best candidates for
transformation. With the square root transforms sqrt(29 - eicomp) and
sqrt(29 - oocomp), they get normalized in terms of the z test, even
though they are rather odd looking.
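The reflect-and-square-root transform used here, as a one-liner. The constant 29 is taken from the answer above and presumably equals the scale maximum plus 1; treat that as an assumption about these particular variables.

```python
import numpy as np

def reflect_sqrt(x, k=29):
    """Reflect a negatively skewed variable around k (assumed to be
    scale maximum + 1) and take the square root, as in sqrt(29 - x)."""
    return np.sqrt(k - np.asarray(x, dtype=float))
```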
d. Do the same thing as in c for any variables the square root didn't
normalize, but using a log10 transformation of the original variables
(not the square-rooted ones). Does this help any? For which variables?
Don't really need to do it.