Worksheet
SPSS Workshop
Introductory use and describing data

GSBRC Survey Analysis Workshop
Dr Helen Klieve
Lecturer, Research Methods
School of Education & Professional Studies
Griffith University

SPSS Workshop Notes

These notes provide support for working through the analysis of a dataset using SPSS “tools” and analytic techniques. There is no defined order and not all aspects need to be covered. I have provided examples from the sample data set, but you can test these techniques with your own data.

Sections:
The SPSS System:
  o Data Entry
  o Layout
Tools:
  o Data Quality – categorical or continuous variables
  o Options for analysis – selecting the appropriate statistics
  o Help on SPSS
  o Analysing subgroups: the SPLIT data function / MEANS
  o Transform – COMPUTE and RECODE functions
Analysis:
  o 1 variable:
      1 variable – DESCRIPTIVES, FREQUENCIES
      1 variable – MEANS (use Tool: MEANS to do for subgroups)
  o 2+ variables:
      2 variables – CROSSTABS
      2 variables – T-TESTS (Paired, Independent)
      3+ variables – ONE-WAY ANOVA
  o Comparing 2 variables – CORRELATE – Bivariate
  o Assessing a Scale – SCALE (use Tool: TRANSFORM (COMPUTE) the Scale value and RECODE into a 1-5 value)
Linking to other Applications:
  o Moving data to other applications
      Graphing (Excel)
      Presentation (PowerPoint)
Other Techniques – will discuss in session:
  o Factor Analysis
  o Reliability
  o Discriminant Analysis

THE SPSS SYSTEM

Open SPSS-17 and open a known dataset (ie open SPSS and open the given data file):
  START > PROGRAMS > SPSS INC > SPSS STATISTICS 17.0 > ENTER
  Dialogue box: “What would you like to do?”
  o OPEN AN EXISTING DATA SOURCE > MORE FILES
  o Identify E: “SPSS Workshop 1 10 09.sav” > ENTER

Note – as soon as you open your file SPSS will maintain an output file with all your requests and associated output tables (you need to save this file to retain it).

EXPLORE THE LAYOUT OF SPSS

When SPSS opens the data file you will see two tabs at bottom left: Data View / Variable View. Click on each in turn and look at the data:
  o Rows are records from individuals, columns are records on variables
  o Look at the description of the data set – or your own set
  o How many individuals were sampled?
  o How many variables are there?
  o What type of records (nominal, ordinal)?
  o Look at the VALUES column [1] – this provides labels for variable options. See variables Q1* to Q11* and compare the Value Labels for these with those for the last 4 Var*REV variables. Why are these different?
  o Can you see that you can add comments as variables in a data set? Note that these can be searched for words at a later stage.

ADD AN ADDITIONAL VARIABLE – eg COMMENT2
  o In “VARIABLE VIEW” place the cursor on Q1EasyToLearn
  o Go into EDIT on the toolbar
      o Click on Insert Variable – it will add a variable above
      o Click on this and add a name, eg COMMENT2
      o Move across the columns and give it the same characteristics as Comment from the drop-down menus
      o Go to “DATA VIEW” and add comments to some of the cases – these can be edited at any time

[1] It’s important to know how your data is coded, what it looks like and also how to edit it or add additional variables.

NOTE – the name of a variable must be 1 “word”, ie start with a letter, with no spaces and only alphanumerics. Make names short and meaningful – when you have a lot of variables this becomes critical for efficient working.

Note you can open/enter data in a number of ways, including directly accessing Excel data.
Additional cases can be added to the dataset.

Note: in referring to variables, a * is used to stand for unstated text (the remainder of a variable name).

TOOLS TO WORK IN SPSS

DATA QUALITY

Consider the quality of each variable:
  o Nominal – defined categories, no order (eg 1=male, 2=female)
  o Ordinal – basic order but no defined distance between numbers (eg Likert scale)
  o Continuous – set scale with equal distance between values (eg interval/ratio data)

OPTIONS FOR ANALYSIS BASED ON DATA QUALITY – selecting the appropriate statistics

This is important – you need to consider the assumptions of any analytical approach, for example:
  o Is the data of an appropriate type (categorical (ie nominal) or continuous (interval))?
  o Is the data approximately normal?

Attachment 1 Table (from Field, 2005, “Discovering Statistics Using SPSS”) provides one useful decision tool to check your selection against (most texts have such tables). Note that this one is based around the data quality and doesn’t clearly separate parametric and non-parametric statistics.

HELP ON SPSS

SPSS has an excellent and extensive help system. You can access this either:
  o By pressing “Help” on the top menu, eg HELP > TOPICS box – enter Crosstabs > RETURN. You will then see a range of commands you can select to see more information on.
OR
  o For more specific details, by accessing help within any “analyse” option. For example: ANALYSE > DESCRIPTIVE STATISTICS > CROSSTABS > HELP > SHOW ME. This will take you through a mini presentation on using Crosstabs, the data needs, the process etc.

ANALYSING SUBGROUPS – THE “SPLIT” AND “MEANS” COMMANDS
(you may wish to come back to these after you have started your analysis)

Sometimes you will want to get the same analysis for different subgroups – eg you may want mean values for each gender or age class, or you may want to do a separate crosstabs for males and females.

SPLIT – this allows you to split the data set:
  DATA > SPLIT FILE > COMPARE GROUPS
  o Move Gender into the “Groups based on” box
  o ENTER

A Frequency on Q1EasyToLearn will now give separate results by gender:
  ANALYSE > DESCRIPTIVE STATISTICS > FREQUENCIES
  o Add Q1EasyToLearn

To “undo” this setting go back into SPLIT FILE and click on “ANALYSE ALL CASES”.

MEANS – if you want to get the means of subgroups use the MEANS function, eg let’s compare male and female responses on Q1EasyToLearn:
  ANALYSE > COMPARE MEANS > MEANS
  o Move Q1EasyToLearn into the Dependent variable box
  o Move Gender into the Independent variable box
  o RETURN

This provides the mean values for males and females on this variable separately.

COMPUTING A VARIABLE (eg for use in a SCALE)

One example of use: we will assume that the Scale provides a reasonable measure of attitudes to maths (if you have time you could do a “Reliability” analysis of the Scale, also a “Factor Analysis” of the items – see summary steps below).

To calculate the Scale value all items must run in the same “direction”, ie a low value (1 or 2) shows a low level of comfort with maths. Look at the Scale items and check you agree that agreeing with Items 1, 2, 6 and 8 DOES NOT reflect this. Thus for these items we use the last 4 variables, where the responses have been reversed (using RECODE).

We want to COMPUTE the sum, for all respondents, on the Scale:
  o Select TRANSFORM on the top menu > COMPUTE VARIABLE
  o Type “ScaleSCORE” in Target Variable – this is your variable name
  o Then, variable by variable, add the elements of the scale by highlighting, clicking the arrow, and building an equation, ie:
      Q1REV + Q2REV + Q3NoMathsMind + Q4HardToGet + Q5*** + Q6REV + Q7*** + Q8REV + Q9*** + Q10*** + Q11***
  o When the equation is complete press OK

Note I am using *** to represent the remainder of the variable name.

Look in the Data View/Variable View to see the new variable at the end of your list. You can now include this variable in analysis.
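
As a cross-check outside SPSS, the COMPUTE step can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: items are scored 1-5, a reversed item is obtained by mapping a response x to 6 - x, and the short keys Q1 to Q11 are hypothetical stand-ins for the full variable names in the dataset. The responses shown are made up for illustration.

```python
# Items whose wording runs in the opposite direction (Items 1, 2, 6 and 8).
REVERSED_ITEMS = {"Q1", "Q2", "Q6", "Q8"}
SCALE_MIN, SCALE_MAX = 1, 5  # assumed 1-5 response scale


def reverse(value):
    """Reverse a response on the assumed 1-5 scale (1<->5, 2<->4, 3 stays 3)."""
    return SCALE_MIN + SCALE_MAX - value


def scale_score(responses):
    """Sum all 11 items, reversing Items 1, 2, 6 and 8 first."""
    total = 0
    for item, value in responses.items():
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"{item}: response {value} is outside 1-5")
        total += reverse(value) if item in REVERSED_ITEMS else value
    return total


# One made-up respondent; keys are hypothetical short names for the 11 items.
respondent = {"Q1": 5, "Q2": 4, "Q3": 2, "Q4": 1, "Q5": 2, "Q6": 5,
              "Q7": 1, "Q8": 4, "Q9": 2, "Q10": 1, "Q11": 2}
print(scale_score(respondent))
```

With 11 items scored 1-5 the total always falls between 11 and 55, which is a quick sanity check to run against the new ScaleSCORE column.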

Note you can change the position of a variable by just highlighting and dragging it while in the Variable View.

RECODING A VARIABLE

You may now want to RECODE the new variable into a more manageable dimension, eg a 1-5 variable rather than an approximately interval value ranging from 11-55. This means that, for example, when comparing this value through CROSSTABS you keep the cell sizes manageable, and thus the potential for statistical significance.

  o Select TRANSFORM on the top menu > RECODE INTO DIFFERENT VARIABLES
  o Select variable ScaleSCORE and move it into the input variable box
  o Name the output variable ScaleSCOREsht and press CHANGE
  o Click OLD AND NEW VALUES
  o In the left-hand Old Value panel, click the “lowest through” range option and enter “19”
  o Enter “1” in the New Value box, and click ADD to include it in the “Old to New” definition
  o Now go back and enter the other values in the Range box for old values, clicking ADD after each:
      20 to 28   new value 2
      29 to 37   new value 3
      38 to 46   new value 4
      47 to 55   new value 5
  o Click CONTINUE
  o Press OK – you now have a new variable with values of 1-5 which can, for example, be easily cross-tabulated against gender

Note: I am recoding into a different variable to retain the original values – you can recode on top of the original variable, but this means you lose the original values.

Note: you are recoding into 5 equal-width groups:

  Old value (ScaleSCORE)      11-19   20-28   29-37   38-46   47-55
  New value (ScaleSCOREsht)     1       2       3       4       5

ANALYSES

Doing simple analyses or “playing with the dataset”

Note you can go into the top toolbar from either Variable View or Data View.

Describing the data – single variable:
  ANALYSE > DESCRIPTIVE STATISTICS > DESCRIPTIVES
  o Highlight variables (eg Gender, Age, School) and click on the arrow to transfer them to the box
  o Click on Q1EasyToLearn, click the arrow
  o Click on the Options box and select some descriptives, eg Mean, Min, Max, Variance, Skewness
  o CONTINUE
  o OK

Now you can click on the Output box on the bottom bar (it will have appeared) and view your results. This will collect all the requests you make and the results provided.
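
The RECODE bands and the Descriptives output from this section can be reproduced by hand as a check. The following is a minimal sketch, assuming ScaleSCORE totals between 11 and 55. The scores listed are made up, and the skewness formula is the sample-adjusted form that SPSS is generally understood to report (treat that as an assumption and compare against your own output).

```python
import statistics

# (lowest, highest, new value) for each band, as entered in the RECODE dialog.
BANDS = [(11, 19, 1), (20, 28, 2), (29, 37, 3), (38, 46, 4), (47, 55, 5)]


def recode(score):
    """Return the 1-5 band for a ScaleSCORE total."""
    for low, high, new_value in BANDS:
        if low <= score <= high:
            return new_value
    raise ValueError(f"score {score} is outside 11-55")


def describe(values):
    """Mean, min, max, sample variance and adjusted sample skewness."""
    n = len(values)  # the skewness formula needs at least 3 values
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    skew = n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)
    return {"mean": mean, "min": min(values), "max": max(values),
            "variance": statistics.variance(values), "skewness": skew}


scores = [17, 22, 30, 33, 41, 55]  # made-up ScaleSCORE totals
print([recode(s) for s in scores])
print(describe(scores))
```

Listing the bands explicitly, rather than computing them with a formula, mirrors what is typed into the Old and New Values dialog and makes a mistyped boundary easy to spot.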

  ANALYSE > DESCRIPTIVE STATISTICS > FREQUENCIES
  o Transfer variables, eg Age, Q1EasyToLearn
  o Select some Statistics, press CONTINUE
  o Click on Charts – select Bar Chart
  o CONTINUE
  o OK
  o View Output – this will be at the bottom

We can also just request the means of any variables using MEANS; however, this looks at the means of subsets (see Tools above).

Describing the data – 2 variables:

Let’s look at the relationship between Age and Gender, and between Gender and an attitude response. CROSSTABS provides a 2-way frequency table. It also provides the capacity to test statistically for a pattern between the 2 variables.

  ANALYSE > DESCRIPTIVE STATISTICS > CROSSTABS
  o Place Gender into the Columns box
  o Place Age in the Rows box
  o Click on EXACT – select the 3rd dot (Exact) > CONTINUE
  o Click on STATISTICS – select CHI-SQUARE > CONTINUE
  o Click on CELLS
      o In the first box (Counts) select Observed
      o In the next box (Percentages) select Row and also Column
      o CONTINUE
  o OK

  ANALYSE > DESCRIPTIVE STATISTICS > CROSSTABS
  o Place Gender into the Columns box
  o Place Q1EasyToLearn in the Rows box
  o Click on EXACT – select the 3rd dot (Exact) > CONTINUE
  o Click on STATISTICS – select CHI-SQUARE > CONTINUE
  o Click on CELLS
      o In the first box (Counts) select Observed
      o In the next box (Percentages) select Row and also Column
      o CONTINUE
  o OK

Note: with this second table you can test whether males and females have the same pattern of attitude on Q1.

Note – in using Crosstabs, SPSS will give a warning if any of its calculated expected cell values are <5. If you have Gender (2 cols) x Attitude (5 rows) you have 10 cells, so you would need at least n=50 to satisfy this requirement (and more if responses are unevenly spread across the cells). You may want to recode variables to reduce the number of cells, eg age coded into 10-year categories or even 3 groups (eg <35, 35-55, >55). A scale might be recoded to a 1-5 or 1-3 range.

Now go to the output file and look at the results:
  o The initial box summarises the data (including valid cases)
  o The 2nd box gives you the 2-way frequency table (in numbers and row and column %)
  o The 3rd box then gives the statistics for the Chi-Square test
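
To see where the expected-count warning and the Chi-Square value come from, the calculation can be sketched by hand. This is a minimal sketch, not SPSS's own routine: the 5 x 2 table of counts is made up, and the function returns only the Pearson chi-square statistic, its degrees of freedom, the expected counts and the number of cells with an expected count below 5 (no exact significance).

```python
def chi_square(table):
    """Pearson chi-square for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count in each cell = row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum((obs - exp) ** 2 / exp
                    for obs_row, exp_row in zip(table, expected)
                    for obs, exp in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    small_cells = sum(exp < 5 for exp_row in expected for exp in exp_row)
    return statistic, df, expected, small_cells


# Made-up counts: rows are the 5 responses to Q1, columns are the 2 genders.
counts = [[4, 6], [8, 7], [10, 9], [7, 8], [3, 5]]
statistic, df, expected, small_cells = chi_square(counts)
print(f"chi-square = {statistic:.3f}, df = {df}, "
      f"cells with expected < 5: {small_cells}")
```

In this made-up table n=67, yet several cells still have expected counts below 5 because the end categories are thinly populated. That is why n=50 is a minimum for a 10-cell table rather than a guarantee.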

Of interest is the Chi-Square value, the degrees of freedom (df) and the exact significance – if this is less than 0.05 we have a significant difference in pattern.
  o Are the results for Age by Gender significant?
  o Are the results for Gender by Q1 significant?
  o Do any of the cells have very small numbers of observations? See the Note on expected cell values above.

Note: degrees of freedom – for a Chi-Square this is (rows-1)(cols-1). Thus a 2*2 table has df=1, and a 5*2 table has df=(5-1)(2-1)=4.

Calculating some simple parametric statistics

Simple statistics – parametric – we will assume that ScaleSCORE is approximately normal (look at a histogram to check this, and also at Skewness and Kurtosis, which should be close to 0).

Note: parametric statistics are based on a distribution. This refers to data that we assume comes from a distribution, eg a normal distribution. Note that with a large sample size you can generally assume approximate normality. If the assumptions are not OK then use non-parametric tests.

A T-TEST is used to see if the mean score is different for males and females. This is an independent t-test, as we are comparing the means of 2 groups and not, for example, before/after scores in all participants (as in a paired t-test).
  ANALYSE > COMPARE MEANS > INDEPENDENT SAMPLES T-TEST
  o Select ScaleSCORE as the “test” variable
  o Select Gender as the “grouping variable”
  o Click on “define groups” and enter 1, 2 in the boxes (ie males are scored as 1, females as 2)
  o CONTINUE
  o OK
  o Look at the results – note the options for equal variances, and whether there is a significant difference

We can also check for a significant difference on a measure such as ScaleSCORE across Classes 1-3:
  ANALYSE > COMPARE MEANS > ONE-WAY ANOVA
  o Select ScaleSCORE as the “test” variable
  o Select school as the “factor”
  o You could also request a post-hoc test (eg Bonferroni) to see, if there is a difference between classes, which ones are significantly different from the others
  o CONTINUE
  o OK
  o Check Output

Note: an independent t-test is used if 2 groups are being compared (males/females).
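
The two “equal variances” rows in the independent t-test output correspond to two ways of computing the t statistic. As a rough sketch only (made-up scores, statistic only, no degrees of freedom or significance value), the two versions can be compared by hand:

```python
import math
import statistics


def t_statistics(group1, group2):
    """Return (pooled t, Welch t) for two independent groups."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    # Equal variances assumed: pool the two sample variances.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Equal variances not assumed (Welch): use each group's own variance.
    t_welch = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
    return t_pooled, t_welch


males = [30, 34, 28, 41, 37]        # made-up ScaleSCORE values, gender = 1
females = [25, 29, 33, 27, 31, 26]  # made-up ScaleSCORE values, gender = 2
t_pooled, t_welch = t_statistics(males, females)
print(f"equal variances assumed:     t = {t_pooled:.3f}")
print(f"equal variances not assumed: t = {t_welch:.3f}")
```

When the two groups have the same size and the same variance the two statistics coincide. They drift apart as the group sizes and spreads become more unequal, which is when the choice of row in the output matters.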
A paired t-test is used when, for example, a PRE/POST test is administered with the same people doing both tests.

It may also be interesting to compare how 2 interval variables vary together, using a Pearson’s correlation:
  ANALYSE > CORRELATE > BIVARIATE
  o Add 2 interval variables (you can use 2 of the attitude statements, though not really interval, to see how it works)
  o Click Pearson’s
  o Select the two-tailed test (ie not suggesting one is greater or less than the other)
  o OK
  o See results in the output file

To graph the data:
  GRAPHS > CHART BUILDER > OK (to define chart)
  o Double click on “Simple Scatter”
  o Highlight and drag the variables into the X and Y axis label boxes
  o Press OK

Note: if you are going to do a correlation it is useful to look at the patterns visually. A correlation will be between -1 and +1. The size and sign are important. If the correlation is positive then both variables increase/decrease together. If it is negative, one increases as the other decreases. The number says how strong the correlation is, eg:
  Weak     ± .3
  Medium   ± .5
  Strong   ± .8
The sample size has a major impact on significance.

Additional analysis – RELIABILITY of a SCALE
  ANALYSE > SCALE > RELIABILITY ANALYSIS
  o Include variables – Q1-Q11 and the reversed versions of Q1, Q2, Q6 and Q8
  o Select Statistics
      o Select Item, Scale if Item Deleted and Correlations
      o CONTINUE
  o Select Model
      o ALPHA
      o CONTINUE
  o OK
  o Look at the results, in particular alpha
  o You may want to delete Q1***, Q2***, Q6***, Q8***
  o Rerun and look at the output

Are there any identifiable sub-factors?
  ANALYSE > DIMENSION REDUCTION > FACTOR ANALYSIS
  o Include variables – Q1-Q11 (removing Q1, Q2, Q6 and Q8) plus the reversed versions of Q1, Q2, Q6 and Q8
  o ROTATION – select the varimax method (see help for other options)
  o CONTINUE
  o OK
  o Look at the results
      o Look at the level of variation explained
      o How many factors have eigenvalues >1?
      o Which variables load on which factors (eg use a loading of .6 as a cutoff)?

Linking to other applications

Making a graph in Excel / transferring to PowerPoint
  o Select a simple Crosstabs output table (eg Gender by Q1)
  o Copy/paste to Excel
  o Make a table from this (you may need to copy headings and columns separately to make the table)
  o Highlight the table
  o Click CHART WIZARD (bar graph icon on toolbar) and follow the steps
  o You can edit the chart (right click on features – scale, fill, lines)
  o Right click on the chart, select Copy
  o Go to the PowerPoint slide (among the open files on the bottom bar), right click, Paste

Note that you can also edit your graph/data in PowerPoint.