Survey and Basic SPSS

Worksheet
SPSS Workshop
Introductory use and describing data
GSBRC Survey Analysis Workshop
Dr Helen Klieve
Lecturer, Research Methods
School of Education & Professional Studies
Griffith University
SPSS Workshop Notes
These notes support working through the analysis of a dataset using SPSS “tools” and analytic techniques. There is no defined order, and not every aspect needs to be covered. I have provided examples from the sample data set, but you can also test these techniques with your own data.
Sections:
The SPSS System:
o Data Entry
o Layout
Tools:
o Data Quality – categorical or continuous variables
o Options for analysis – selecting the appropriate statistics
o Help on SPSS
o Analysing subgroups: the SPLIT data function/MEANS
o Transform – COMPUTE and RECODE functions
Analysis:
o 1 variable:
 1 variable – DESCRIPTIVE, FREQUENCIES
 1 variable – MEANS (Use Tool: MEANS to do for subgroups)
o 2+ variables:
 2 variables – CROSSTABS
 2 variables – T-TESTS (Paired, Independent)
 3+ variables – ONE-WAY ANOVA
o Comparing 2 variables – CORRELATE – Bivariate
o Assessing a Scale – SCALE (Use Tool: TRANSFORM (COMPUTE) the Scale value and RECODE into a 1-5 value)
Linking to other Applications:
o Moving data to other applications
 Graphing (EXCEL)
 Presentation (PowerPoint)
Other Techniques – will discuss in session:
o Factor Analysis
o Reliability
o Discriminant Analysis
THE SPSS SYSTEM
Open SPSS-17 and open a known dataset
 START
o PROGRAMS
 SPSS INC
 SPSS STATISTICS 17.0 ENTER
 Dialogue Box : What would you like to do?
o OPEN AN EXISTING DATA SOURCE
 MORE FILES
 Identify E: “SPSS Workshop 1 10 09.sav”
 ENTER
Note – open SPSS and then open the given data file. As soon as you open your file, SPSS will maintain an output file with all your requests and associated output tables (you need to save this file to retain it).
EXPLORE THE LAYOUT OF SPSS
When SPSS opens the data file you will see two boxes at bottom left:
 Data View / Variable view
Click on each in turn and look at the data:
 Rows are records from individuals, columns are records on variables
 Look at the description of data set – or your own set
 How many individuals were sampled?
 How many variables are there?
 What type of records (nominal, ordinal)?
 Look at the VALUES column – this provides labels for variable options. See variables Q1* to Q11* and compare the Value Labels for these to those for the last 4 Var*REV variables. Why are these different?
 Can you see that you can add comments as variables in a data set – note that
these can be searched for words at a later stage
ADD AN ADDITIONAL VARIABLE – eg COMMENT2
 In “VARIABLE VIEW” place cursor on Q1EasyToLearn
 Go into EDIT on toolbar
o Click on Insert Variable – it will add a variable above. Click on this and
add name eg COMMENT2
o Move across columns and give it same characteristics as Comment from
the drop down menus
o Go to “DATA VIEW” and add comments to some of the cases – these
can be edited at any time
Note: it’s important to know how your data is coded, what it looks like and also how to edit it or add additional variables.
NOTE – the names of variables must be 1 “word” – ie start with a letter, no spaces, only alphanumerics. Make names short and meaningful – when you have a lot of variables this becomes critical for efficient working.
Note you can open/enter data in a number of ways, including directly accessing Excel data. Additional cases can be added to this dataset.
Note: in referring to variables, a * is used to refer to unstated text.
TOOLS TO WORK IN SPSS
DATA QUALITY
Consider the quality of each variable
 Nominal – defined categories, no order (eg 1=male, 2=female)
 Ordinal – basic order but no defined distance between numbers (eg Likert scale)
 Continuous – set scale with equal distance between values (eg Interval/ratio data)
OPTIONS FOR ANALYSIS BASED ON DATA QUALITY – selecting the appropriate statistics:
This is important – you need to consider the assumptions of any analytical approach, for
example:
 Is it of appropriate type (categorical (ie nominal) or continuous (interval))?
 Is the data approximately normal?
Attachment 1 Table (From Field, 2005 “Discovering Statistics Using SPSS”) provides one useful
decision tool to check selection against (most texts have such tables). Note that this one is
based around the data quality and doesn’t clearly separate parametric and non-parametric
statistics.
HELP ON SPSS
SPSS has an excellent and extensive help system. You can access this either by
pressing “Help” on the top menu, eg
 HELP
o TOPICS box – enter Crosstabs
 RETURN
You will then see a range of commands you can select to see more information on
OR
For more specific details access help within any “analyse” option.
For example
 ANALYSE
o DESCRIPTIVE STATISTICS
 CROSSTABS
o HELP
 SHOW ME
This will take you through a mini presentation on using Crosstabs, the data needs, the
process etc.
ANALYSING SUBGROUPS – THE “SPLIT” AND “MEANS” COMMANDS
(you may wish to come back to these after you have started your analysis)
Sometimes you will want to get the same analysis for different subgroups – eg you may want mean values for each gender or age class, or you may want to do a separate crosstabs for males and females.
SPLIT – this allows you to split the data set:
 DATA
o SPLIT FILE
 COMPARE GROUPS
 Move Gender into the “Groups based on” box
o ENTER
A Frequency on Q1EasyToLearn will now give separate results by gender:
 ANALYSE
o DESCRIPTIVE STATISTICS
 FREQUENCIES
 Add Q1EasyToLearn
To “undo” this setting go back in and click on “ANALYSE ALL CASES”.
If you want to get the means of subgroups use the MEANS function, eg let’s compare male and female responses on Q1EasyToLearn:
 ANALYSE
o COMPARE MEANS
 MEANS
 Move Q1EasyToLearn into the Dependent variable box
 Move Gender into the Independent variable box
 RETURN
This provides the mean values for males and females on this variable separately.
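To see what MEANS is doing behind the menus, here is a minimal Python sketch (not something SPSS itself runs). The records and the function name means_by_group are made up for illustration; the gender codes (1 = male, 2 = female) follow the coding example used in these notes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (gender code, Q1EasyToLearn response on a 1-5 scale).
# Codes 1 = male, 2 = female are an assumption taken from the notes' example.
records = [(1, 4), (2, 3), (1, 5), (2, 2), (2, 4), (1, 3)]


def means_by_group(rows):
    """Return {group code: mean response} for (group, response) pairs."""
    groups = defaultdict(list)
    for group, response in rows:
        groups[group].append(response)
    return {group: mean(values) for group, values in sorted(groups.items())}


subgroup_means = means_by_group(records)
print(subgroup_means)
```

Each group's mean is calculated only from the cases in that group, which is the same idea as moving Gender into the Independent box.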
COMPUTING A VARIABLE (eg for use in a SCALE)
One example of use:
We will assume that the Scale provides a reasonable measure of attitudes to maths (if
you have time you could do a “Reliability” analysis of the Scale, also a “Factor
Analysis” of the items – see summary steps below).
To calculate the Scale value all items must be in the same “direction” – ie a low value (1-2) shows a low level of comfort with maths. Look at the Scale items and check you agree that agreeing with Items 1, 2, 6 and 8 DOES NOT reflect this. Thus we use the last 4 variables, where these responses have been reversed (using RECODE).
We want to COMPUTE the sum, for all respondents, on the Scale:
 Select TRANSFORM on the top menu
o COMPUTE variable
 Type in “ScaleSCORE” in Target Variable – this is your variable name
 Then, variable by variable add the elements of the scale by
highlighting, clicking the arrow, and making an equation, ie:
Q1REV + Q2REV +Q3NoMathsMind +Q4HardToGet + Q5*** + Q6REV + Q7*** + Q8REV +
Q9*** + Q10*** +Q11***
 When equation complete Press OK
 Look in the Data View/Variable View to see a new variable at the end of your list
 This is now a new variable which you can include in analysis. Note you can change the position of a variable by highlighting and moving it while in the Variable View.
Note: I am using *** to represent the remainder of the variable name.
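The reversing and summing described above can be sketched in a few lines of Python. This is an illustration only, assuming each item is answered on a 1-5 scale (consistent with the 11-55 range of the summed score); the function names and the example respondent are hypothetical.

```python
# Items that must be reversed before summing (from the notes: Items 1, 2, 6 and 8).
REVERSED_ITEMS = {1, 2, 6, 8}


def reverse_code(value, low=1, high=5):
    """Flip a response on a low..high scale, so 1 becomes 5, 2 becomes 4, etc."""
    return low + high - value


def scale_score(responses):
    """Sum the item responses (dict: item number -> 1..5), reversing where needed."""
    total = 0
    for item, value in responses.items():
        total += reverse_code(value) if item in REVERSED_ITEMS else value
    return total


# Hypothetical respondent who answered 2 on every one of the 11 items.
respondent = {item: 2 for item in range(1, 12)}
print(scale_score(respondent))
```

In SPSS the reversed values are already stored in the *REV variables, so COMPUTE only has to add them up.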
RECODING a variable
You may now want to RECODE the new variable into a more manageable dimension, eg a 1-5 variable rather than an approximately interval value ranging from 11-55. This means that, for example, when comparing this value through CROSSTABS you keep the number of cells small, so each cell holds enough cases for a meaningful significance test.
 Select TRANSFORM on top menu
o RECODE into Different Variables
 Select variable ScaleSCORE and move into input variable box
 Name output variable ScaleSCOREsht and press CHANGE
 Click OLD AND NEW VALUES
 In LH Old values, click “lowest thru” and enter “19”
 Enter “1” in new value box, and click ADD to include in “Old to New”
definition, Click CONTINUE
 Now go back and enter other values in the Range box for old values:
 20 to 28 new value 2
 29 to 37 new value 3
 38 to 46 new value 4
 47 to 55 new value 5
 Press OK - you now have a new variable with values of 1-5 which can, for example, be easily cross tabulated against gender.
Note: I am recoding into Different Variables to retain the original value – you can recode on top of the original variable but this means you lose the original variable.
You are recoding into 5 equal sized groups:
11-19 = 1
20-28 = 2
29-37 = 3
38-46 = 4
47-55 = 5
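The same banding can be written as a small Python function. This is a sketch for checking your recode by hand; the function name is hypothetical. Unlike “lowest thru 19” in SPSS, it rejects scores outside 11-55 rather than assigning them to a band.

```python
def recode_scale_score(score):
    """Map a ScaleSCORE of 11-55 onto the five bands used in these notes."""
    if not 11 <= score <= 55:
        raise ValueError("ScaleSCORE should lie between 11 and 55")
    # (upper limit of band, new value), checked from the lowest band up.
    bands = [(19, 1), (28, 2), (37, 3), (46, 4), (55, 5)]
    for upper, new_value in bands:
        if score <= upper:
            return new_value


for example in (11, 19, 20, 37, 55):
    print(example, recode_scale_score(example))
```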
ANALYSES
Doing simple analyses or “Playing with the dataset”
Note you can go into the top toolbar from either Variable or Data View
Describing the data – single variable:
 Click on ANALYSE
o DESCRIPTIVE STATISTICS
 DESCRIPTIVES
 Highlight variables (eg Gender, Age, School) and click on the arrow to transfer them to the box
 Click on Q1EasyToLearn, click arrow
 Click on the Options box and select some descriptives, eg Mean, Min, Max, Variance, Skewness
 CONTINUE
 OK
Now you can click on the Output Box on the bottom bar (it will have appeared) and view your
results. This will collect all the requests you make and the results provided.
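If you want to check a DESCRIPTIVES table by hand, the following Python sketch computes the same kind of summary. The responses are made-up data, and the skewness formula is one common bias-adjusted sample formula; I am assuming, not asserting, that it corresponds to the figure SPSS reports.

```python
from statistics import mean, variance


def skewness(values):
    """Bias-adjusted sample skewness (one common formula; an assumption here)."""
    n = len(values)
    m = mean(values)
    s = variance(values) ** 0.5  # sample standard deviation (n - 1 divisor)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def describe(values):
    """Return the descriptives selected in the Options box as a dict."""
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "variance": variance(values),
        "skewness": skewness(values),
    }


# Hypothetical responses on a 1-5 item, symmetric around 3.
responses = [1, 2, 2, 3, 3, 3, 4, 4, 5]
summary = describe(responses)
print(summary)
```

A skewness near 0 indicates a roughly symmetric distribution; a positive value indicates a longer tail towards high values.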
 ANALYSE
o DESCRIPTIVE STATISTICS
 FREQUENCIES
 Transfer variables eg Age, Q1EasyToLearn
 Select some Statistics, press continue
 Click on Charts – select Bar Chart
 CONTINUE
 OK
 View Output – this will be at the bottom
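A FREQUENCIES table is simply a count of each value with its percentage of the total. A minimal Python sketch, using made-up responses and a hypothetical function name:

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(value, counts[value], 100 * counts[value] / n) for value in sorted(counts)]


# Hypothetical responses to a 1-5 attitude item from 10 people.
responses = [4, 3, 5, 4, 2, 4, 3, 5, 4, 1]
table = frequency_table(responses)
for value, count, percent in table:
    print(value, count, percent)
```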
We can also request just the means of any variables; the MEANS function gives these for subsets (see above in Tools).
Describing the Data – 2 variables:
Let’s look at the relationship between Age and Gender, and Gender and an attitude response.
This provides a 2-way frequency presentation. It also provides the capacity to test statistically
for a pattern between the 2 variables.
 ANALYSE
o DESCRIPTIVE STATISTICS
 CROSSTABS
 Place Gender into the Columns box
 Place Age in the Rows box
 Click on EXACT – select the 3rd dot – Exact
o CONTINUE
 Click on STATISTICS
o Select CHI-SQUARED
 CONTINUE
 Click on Cells
o On first box select Count
o Next box select rows and also columns
o CONTINUE
o OK
 ANALYSE
o DESCRIPTIVE STATISTICS
 CROSSTABS
 Place Gender into the Columns box
 Place Q1EasytoLearn in the Rows box
 Click on Exact – select the 3rd dot – Exact
o CONTINUE
 Click on STATISTICS
o Select CHI-SQUARED
 CONTINUE
 Click on Cells
o On first box select Count
o Next box select rows and also columns
o CONTINUE
o OK
Note: you can test whether males and females have the same pattern of attitude on Q1.
Note – in using Crosstabs, SPSS will give a warning if its calculated expected cell values are <5. If you have Gender (2 cols) x Attitude (5 rows) you have 10 cells. You would need at least n=50 to satisfy this requirement (and more if the responses are unevenly spread across the cells).
You may want to recode variables to reduce the number of cells, eg Age coded into 10yr categories or even 3 (eg <35, 35-55, >55). A scale might be recoded to a 1-5 or 1-3 range.
Now go to the output file and look at the results
This provides an initial box summarising the data (including valid cases)
The 2nd box gives you the 2 way frequency table (in numbers and row and column %)
The 3rd box then gives the statistics – for the Chi-Squared test.
Of interest is the Chi-Sq value, the Degrees of Freedom (df) and it also provides the
exact significance – if this is less than 0.05 we have a significant difference in pattern.
Are the results for Age by Gender significant?
Are the results for Gender by Q1 significant ?
Do any of the cells have very small numbers of observations? See Note.
Note: degrees of freedom – for a Chi-Sq test this is (rows-1)(cols-1). Thus a 2*2 table has df=1, and a 5*2 table has df=(5-1)(2-1)=4.
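The Chi-Squared value, the degrees of freedom and the expected cell values can be reproduced with a short Python sketch. The observed counts below are made up, and the function name is hypothetical. The significance (p) value needs the Chi-Squared distribution and is not calculated here; read that from the SPSS output.

```python
def chi_square(table):
    """Return (statistic, df, expected counts) for a 2-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for a cell = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical 2*2 table of counts (rows = response, columns = gender).
observed = [[10, 20], [20, 10]]
statistic, df, expected = chi_square(observed)
# Count the cells that would trigger the "expected value < 5" warning.
small_cells = sum(1 for row in expected for e in row if e < 5)
print(round(statistic, 3), df, small_cells)
```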
Calculating some simple parametric statistics
Simple Statistics – parametric – we will assume that the ScaleSCORE is appropriately normal (look at a histogram to check this, also Skew and Kurtosis which should be close to 0)
A T-TEST to see if the mean score is different for males and females. Thus an independent t-test, as we are comparing the means of 2 groups and not, for example, before/after scores in all participants (as in a paired t-test)
 ANALYSE
o COMPARE MEANS
 INDEPENDENT SAMPLES T-TEST
 Select “test” variables ScaleSCORE
 Select gender as a “grouping variable”
 Click on “define groups” and enter 1,2 in box (ie male is scored as 1, female as 2)
 OK
 Look at results - note options of equal variances, and whether there is a significant difference.
Note: parametric statistics – based on a distribution. This refers to data that we assume comes from a distribution, eg a normal distribution. Note that with a large sample size you can generally assume approximate normality. If assumptions are not OK then use non-parametric tests.
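The two rows of the independent t-test output (equal variances assumed / not assumed) correspond to two versions of the t statistic. The Python sketch below computes both for made-up scores; the function name is hypothetical, and the p-values are left to SPSS.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Return (t with pooled variance, its df, t without assuming equal variances)."""
    n1, n2 = len(group_a), len(group_b)
    m1, m2 = mean(group_a), mean(group_b)
    v1, v2 = variance(group_a), variance(group_b)  # sample variances
    # Equal variances assumed: pool the two variances, df = n1 + n2 - 2.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_equal = (m1 - m2) / sqrt(pooled * (1 / n1 + 1 / n2))
    # Equal variances not assumed: each group keeps its own variance.
    t_unequal = (m1 - m2) / sqrt(v1 / n1 + v2 / n2)
    return t_equal, n1 + n2 - 2, t_unequal


# Hypothetical ScaleSCORE values for two groups of five people.
males = [30, 32, 34, 36, 38]
females = [26, 28, 30, 32, 34]
t_equal, df_equal, t_unequal = independent_t(males, females)
print(t_equal, df_equal, t_unequal)
```

When the two groups have the same size and the same variance, as in this example, the two versions of t agree.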
We can also check for significant differences on a measure such as ScaleSCORE across Classes 1-3
 ANALYSE
o COMPARE MEANS
 ONE-WAY ANOVA
 Select “test” variables ScaleSCORE
 Select school as a “factor”
 You could also request a post-hoc test (eg Bonferroni) to see, if there was a difference between classes, which ones are significantly different from the others.
 Continue
 OK
Check Output
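The F value in the ONE-WAY ANOVA output compares the variation between group means with the variation within groups. A minimal Python sketch with made-up scores for three classes (the function name is hypothetical; significance and post-hoc tests are left to SPSS):

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df between groups, df within groups) for a list of groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical ScaleSCORE values for three classes of three people each.
classes = [[28, 30, 32], [30, 32, 34], [32, 34, 36]]
f_value, df_between, df_within = one_way_anova(classes)
print(f_value, df_between, df_within)
```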
Note: an independent T-TEST is used if 2 groups are being compared (males/females). A paired T-TEST is used when, for example, a PRE/POST test is administered with the same people doing both tests.
It may also be interesting to see how 2 interval variables relate to each other – using a Pearson’s Correlation
 ANALYSE
o CORRELATE
 BIVARIATE
 Add 2 Interval variables (you can use 2 of the attitude
statements, though not really Interval, to see how it
works)
 Click Pearson’s
 Select 2 sided test (ie not suggesting 1 greater or less
than the other)
 OK
 See results in output file
To graph the data:
 GRAPHS
o CHART BUILDER
 OK (to define chart)
 Double click on “Simple Scatter”
o Highlight and drag the variables into the X and Y
axis label boxes
o Press OK
Note: if you are going to do a correlation it is useful to look at the patterns visually.
A correlation will be between -1 and +1. The size and sign are important. If the correlation is +ve then both variables increase/decrease together. If it’s –ve one increases as the other decreases.
The number says how strong the correlation is, eg
Weak +- .3
Med +- .5
Strong +- .8
The sample size has a major impact on significance.
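Pearson’s correlation can be calculated by hand from the two sets of scores. The Python sketch below uses made-up data and a hypothetical function name; it gives the correlation only, not its significance.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores from five people on two variables.
first = [1, 2, 3, 4, 5]
second = [2, 1, 4, 3, 5]
r = pearson_r(first, second)
print(r)
```

A value of +1 means the points lie exactly on a rising straight line, and -1 means they lie exactly on a falling one.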
Additional analysis – RELIABILITY of a SCALE
 ANALYSE
o SCALE
 RELIABILITY ANALYSIS
 Include variables – Q1-Q11 and Q1,2,6,8 REVERSE
 Select Statistics
 Select Item, Scale if Item deleted and correlations
o Continue
 Select Model
o ALPHA
 CONTINUE
o OK
o Look at results, in particular alpha
o You may want to delete Q1***, Q2***, Q6***, Q8***
o Rerun and look at output
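The alpha value in the output is Cronbach’s alpha, which compares the sum of the individual item variances with the variance of the total score. A minimal Python sketch with made-up responses to three items (the data and function name are hypothetical):

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of per-item response lists,
    each listing the same respondents in the same order."""
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]  # total per respondent
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses from four people to three items.
items = [
    [1, 2, 3, 4],
    [2, 2, 3, 5],
    [1, 3, 3, 4],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Items must all run in the same direction before alpha is calculated, which is why the reversed versions of Q1, Q2, Q6 and Q8 are used.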
Are there any identifiable sub-factors?
 ANALYSE
o DIMENSION REDUCTION
 FACTOR ANALYSIS
 Include variables – Q1-Q11 (remove Q1, Q2, Q6, Q8) and Q1,2,6,8
REVERSE
 ROTATION – select varimax method (see help for other options)
 CONTINUE
 OK
 Look at results
o Look at the level of variation explained
o How many factors have eigenvalues >1?
o Which variables load on which factors (eg use a weight of .6 as a cutoff)
Linking to other applications
Making a graph in Excel / transferring to PowerPoint
o Select a simple Crosstabs output table (eg Gender by Q1)
o Copy/paste to Excel
o Make a table from this (you may need to copy headings, columns separately to make table)
o Highlight Table
o Click CHART WIZARD (bar graph icon on toolbar), follow steps
o You can edit table (right click on features - scale, fill, lines)
o Right Click on Chart, select copy
o Go to PowerPoint slide (on bottom files), right click Paste
Note that you can also edit your graph/data in PowerPoint