SPSS course. Exercises

UCL
EDUCATION & INFORMATION SUPPORT DIVISION
INFORMATION SYSTEMS
SPSS v12
Using SPSS
Exercises
Document No. IS-077 v4
UCL Information Systems
Contents
SPSS course exercises
Exercise 1: Defining variables and entering data
Exercise 2: Variable and value labels
Exercise 3: Missing data
Exercise 4: Importing an Excel file
Exercise 5: Sort Cases and Select Cases
Exercise 6: Recoding variables
Exercise 7: Computing variables
Exercise 8: Creating and saving output
Exercise 9: Frequencies command
Exercise 10: Descriptives command
Exercise 11: Crosstabulations command
Exercise 12: Means command
Exercise 13: T Tests
Exercise 14: Correlation
Exercise 15: Regression
Exercise 16: Graphical plots
SPSS course exercises
The following exercises form part of the Using SPSS course. Refer to the relevant section in the course
notes before performing each exercise.
Location of data files: r:\training.dir\spss
Please save and load all your work in this folder during the training session.
Exercise 1: Defining variables and entering data
You are responsible for collecting data from a clinical trial of Drug X. Drug X is postulated to affect
blood levels of a certain hormone (hormone H), so levels of the hormone will be measured before and
after treatment with X. In addition to the hormone data, five other pieces of information will be
collected from each participant in the trial.
1. Switch to the Variable View and define the seven variables listed in the table below. Use numeric
variables in SPSS for categorical data.
Variable name    Data type
Surname          Text
Gender           Categorical (categories are Female and Male)
Age              Continuous
Income           Continuous
Smoker           Categorical (categories are Smoker and Non-smoker)
Hbefore          Continuous
Hafter           Continuous
2. The table below shows the data which you have collected from five patients who took part in a
pilot study for the clinical trial. Switch to Data View and enter the data for these five
patients. Remember to use numeric codes where necessary.
Surname          Gender   Age   Income   Smoker       Hbefore   Hafter
ROBBINS          Female   32    46000    Non-smoker   94.58     88.79
MCGREGOR         Male     33    58000    Non-smoker   106.12    78.25
KUMAR            Male     38    47000    Smoker       88.11     102.45
ALLINSON-HENRY   Female   51    55000    Non-smoker   83.62     63.82
OLDER            Male     44    28000    Non-smoker   72.31     77.50
Save the data as pilotgroup.sav.
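The idea of storing a category as a numeric code with a separate label can also be seen outside SPSS. The Python sketch below is an illustration only: the gender codes follow the 1 = Female, 2 = Male convention used later in these exercises, the smoker codes are an assumed choice, and the names GENDER_LABELS, SMOKER_LABELS and encode are invented for this example.

```python
# Illustrative sketch only: numeric codes plus value labels as plain dictionaries.
GENDER_LABELS = {1: "Female", 2: "Male"}
SMOKER_LABELS = {0: "Non-smoker", 1: "Smoker"}  # assumed coding


def encode(label, labels):
    """Return the numeric code whose value label matches `label`."""
    for code, text in labels.items():
        if text == label:
            return code
    raise ValueError(f"no code defined for {label!r}")


# First pilot patient from the table above: Female, age 32, non-smoker.
record = {
    "surname": "ROBBINS",
    "gender": encode("Female", GENDER_LABELS),
    "age": 32,
    "smoker": encode("Non-smoker", SMOKER_LABELS),
}

print(record)
print(GENDER_LABELS[record["gender"]])  # look the label up again from the code
```

Only the codes are stored with the data; the labels are looked up when the value is displayed, which is roughly what value labels do in Variable View.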
Exercise 2: Variable and value labels
Open the data file medicaltrialX.sav in SPSS.
1. Create the following labels for the variables:
income     Household income
smoker     Smoker or non-smoker
hbefore    Blood levels of H before treatment
hafter     Blood levels of H after treatment
2. Create the following value labels for the gender variable:
1 = Female
2 = Male
3. Save the file with the new definitions.
Exercise 3: Missing data
Some of the data in medicaltrialX.sav contains missing values.
1. Inspect the data in your data sheet and spot any missing values. Take note of which variables
have missing values.
2. For each of the variables identified above, decide on an appropriate coding for missing values.
For example, for a numeric variable which cannot be negative, -1 might be used. For text data, an
X might be used.
3. In Variable View, specify missing values for each variable identified in step 1.
4. After specifying the missing values, return to Data View and change any blank cells to the
appropriate missing value code.
5. Run the Frequencies command to check that the missing values have been coded. (Hint: Use
the menu option Analyze | Descriptive Statistics | Frequencies, move all the variables into the
right-hand box, then click OK.)
6. Close the output window (do not save the output). Save the changed data file, giving it the new
name fixed.sav.
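The effect of a missing-value code can be sketched in a few lines of Python. This is a rough illustration rather than a description of what SPSS does internally: the ages are made up, -1 is the assumed missing-value code, and None stands in for a blank cell.

```python
from collections import Counter

MISSING_AGE = -1  # assumed missing-value code for a variable that cannot be negative

# Hypothetical ages; None stands for a blank cell in the data sheet.
raw_ages = [32, None, 38, 51, None, 44]

# Step 4: replace every blank with the missing-value code.
coded_ages = [MISSING_AGE if age is None else age for age in raw_ages]

# Step 5: tabulate the values, reporting valid and missing cases separately.
counts = Counter(coded_ages)
n_missing = counts.pop(MISSING_AGE, 0)
n_valid = sum(counts.values())

print("valid:", n_valid, "missing:", n_missing)
```

Because the code is declared as missing, those cases are counted but kept out of any statistic computed on the valid values.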
Exercise 4: Importing an Excel file
1. Start a new, blank data sheet.
2. Import the file results.xls into SPSS, using the column headings as variable names.
3. Switch to Variable View. Add a new variable called SEN, and give it the label “Special
Educational Needs”.
4. The SEX variable (as in an earlier exercise) uses values of 1 and 2 to indicate “Female” and
“Male” respectively. Enter the value labels for the SEX variable.
5. Save the file to results2.sav.
Exercise 5: Sort Cases and Select Cases
Load the data file medicaltrialX.sav.
1. Sort the data in order of age (oldest first).
2. Sort the file in order of smoker within gender (i.e. gender is the primary ordering).
3. Select all the males in the group.
4. Select “all cases” again.
5. Now select the patients whose hormone levels were greater after the treatment than they were
before.
There is no need to save the data file at this point.
If you have extra time, see if you can select all the smokers.
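For comparison, the sorting and selection steps above can be written as a short Python sketch. The records are hypothetical, gender is assumed to be coded 1 = Female and 2 = Male, and smoker is assumed to hold the text values Y and N.

```python
# Hypothetical records; gender is coded 1 = Female, 2 = Male.
patients = [
    {"age": 32, "gender": 1, "smoker": "N", "hbefore": 90.0, "hafter": 85.0},
    {"age": 51, "gender": 2, "smoker": "Y", "hbefore": 80.0, "hafter": 95.0},
    {"age": 44, "gender": 2, "smoker": "N", "hbefore": 70.0, "hafter": 75.0},
    {"age": 38, "gender": 1, "smoker": "Y", "hbefore": 88.0, "hafter": 60.0},
]

# Step 1: sort by age, oldest first.
by_age = sorted(patients, key=lambda p: p["age"], reverse=True)

# Step 2: smoker within gender, so gender is the primary sort key.
by_gender_smoker = sorted(patients, key=lambda p: (p["gender"], p["smoker"]))

# Step 3: select the males only.
males = [p for p in patients if p["gender"] == 2]

# Step 5: select cases whose hormone level rose after treatment.
increased = [p for p in patients if p["hafter"] > p["hbefore"]]

print([p["age"] for p in by_age])
print(len(males), len(increased))
```

The condition in step 5 is the same expression you would type into the If condition of Select Cases: hafter > hbefore.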
Exercise 6: Recoding variables
Helpful hint:
If you still have a subset of the cases selected (from the previous exercise), make sure you select all cases
before you proceed.
We need to recode some of the data in medicaltrialX.sav (or fixed.sav). The information in the smoker
column is coded as text (Y and N). It would be better to code it as numeric data (e.g. 1 and 0), and to use
labels to indicate the meaning of these numbers.
1. Use Recode to convert the smoker information into a new variable called smoker1, so that 'Y'
becomes 1 and 'N' becomes 0. (Define the label for this new variable as "Smoker or non-smoker".)
2. Create value labels for the smoker1 variable so that 1 is displayed as 'Smoker', and 0 is displayed as
'Non-smoker'.
3. Check that the recode has worked properly. If it has, then delete the old variable smoker from
your data sheet.
4. Using Recode, create a new variable incband (with label "Income band") to categorise the
household income: up to and including $25,000 as band 1, more than $25,000 and up to $40,000 as
band 2, and more than $40,000 as band 3.
5. Save the data file if you are happy with the results of this exercise.
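The two recodes can be sketched in Python as a lookup table and a small banding function. The data are hypothetical, the names SMOKER_CODES and income_band are invented here, and the handling of the exact boundary values (25,000 in band 1, 40,000 in band 2) is an assumption about how the bands are meant to be read.

```python
SMOKER_CODES = {"Y": 1, "N": 0}
SMOKER1_LABELS = {1: "Smoker", 0: "Non-smoker"}


def income_band(income):
    """Band 1: up to 25,000; band 2: above 25,000 up to 40,000; band 3: above 40,000."""
    if income <= 25000:
        return 1
    if income <= 40000:
        return 2
    return 3


# Hypothetical (smoker, income) pairs, chosen to include both boundary values.
rows = [("Y", 18000), ("N", 25000), ("N", 32500), ("Y", 40000), ("N", 58000)]

recoded = [(SMOKER_CODES[s], income_band(inc)) for s, inc in rows]

for (s, inc), (smoker1, incband) in zip(rows, recoded):
    print(s, inc, "->", SMOKER1_LABELS[smoker1], "band", incband)
```

Testing values on and just beside each boundary, as the sample rows do, is a quick way to check that a recode behaves as intended before deleting the old variable.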
Exercise 7: Computing variables
Open the data file results2.sav saved in Exercise 4. This shows the exam scores for a class of high-school
students.
1. Each student has a percentage mark for Maths, English, and History. Compute a new variable,
named total, to calculate their total score out of 300.
2. We wish to compute the average (mean) mark over the three tests for each student. Compute a
new variable, average, calculating this information.
3. Save the file.
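The two computed variables are simple arithmetic on each row. A minimal Python sketch follows, with made-up marks and assumed variable names maths, english and history (the actual names depend on the column headings in the imported file).

```python
# Hypothetical percentage marks for two students.
students = [
    {"maths": 62, "english": 71, "history": 55},
    {"maths": 48, "english": 80, "history": 67},
]

for s in students:
    s["total"] = s["maths"] + s["english"] + s["history"]  # score out of 300
    s["average"] = s["total"] / 3                          # mean of the three marks

for s in students:
    print(s["total"], round(s["average"], 2))
```

In the Compute Variable dialog the equivalent numeric expressions would be a sum of the three variables and that sum divided by 3.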
Exercise 8: Creating and saving output
This exercise uses the file results2.sav saved in Exercise 4.
1. Use the Case Summaries command to produce a listing of your data.
2. Save the SPSS output in a file called result1.spo.
3. Produce a printout of the SPSS Viewer window. (Or use Print preview if you don't have access to a
printer.)
Exercise 9: Frequencies command
This exercise uses the medicaltrialX data, and the incband variable as calculated earlier. If you do not have
this, you can load the data file medicaltrialX-part2.sav.
1. Create a Frequencies command with a bar chart to find the following:
   • The number of subjects within each of the three incband classifications.
   • How many subjects are male, and how many are female.
2. Create a Frequencies command for the two hormone level variables (i.e. hbefore and hafter).
Set up the output as follows:
   • Do not display the frequency tables.
   • Include a histogram with a normal curve superimposed for each variable.
   • Produce values for the mean, median and mode.
Objects from the SPSS Viewer can be copied and pasted into other applications (e.g. Microsoft Word or
Excel). If you have time, open a new document in Microsoft Word. Copy and paste one of the
histograms which you have just produced into the Word document. Save it with the title histogram.doc.
Do not save the SPSS Viewer output.
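As a rough picture of what the Frequencies output summarises, the Python sketch below counts the cases in each category, draws a crude text bar chart, and computes the three measures of central tendency. All values are hypothetical.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical data.
incband = [1, 2, 2, 3, 3, 3, 2, 1]
hbefore = [88.0, 92.5, 92.5, 101.0, 79.5]

# Frequency table with a simple text bar for each category.
band_counts = Counter(incband)
for band in sorted(band_counts):
    n = band_counts[band]
    print(f"band {band}: {n:2d} {'#' * n}")

# Summary statistics for a continuous variable.
print("mean", mean(hbefore), "median", median(hbefore), "mode", mode(hbefore))
```

A frequency table is useful for a variable with a few categories such as incband, but for continuous readings such as hbefore almost every value is distinct, which is why the exercise suppresses the table and asks for a histogram instead.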
Exercise 10: Descriptives command
Use the medicaltrialX data you have been working on — or you may wish to load the data file
medicaltrialX-part2.sav.
1. Use the Descriptives command to display the default information for age and income.
   • How old is the oldest participant?
   • And the youngest?
   • What is the average household income of the participants?
2. Produce a Descriptives command for the variables hbefore and hafter. This time use the Options to
include the skewness and range in the output.
   • Which of the two measurements has the larger range of readings? What is the range?
   • Use the Help menu to find out what skewness means. Using this information, which of the two
     measurements do you think is closer to being normally distributed? Explain why.
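To make range and skewness concrete, here is a small Python sketch on two made-up sets of readings. The skewness function uses the adjusted sample formula that statistics packages commonly report; treat the choice of formula as an assumption rather than a statement of exactly what SPSS computes.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


# Hypothetical readings: one symmetric set, one with a long right tail.
symmetric = [70.0, 80.0, 90.0, 100.0, 110.0]
right_skewed = [70.0, 72.0, 75.0, 80.0, 130.0]

for name, data in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    print(name,
          "min", min(data), "max", max(data),
          "range", max(data) - min(data),
          "skewness", round(sample_skewness(data), 3))
```

A skewness near zero indicates a roughly symmetric distribution, while a clearly positive value indicates a long tail of high readings.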
Exercise 11: Crosstabulations command
This exercise uses the incband column as defined earlier. If you do not have this variable in your data, you
can load the data file medicaltrialX-part2.sav.
1. Run a Crosstabs command for the variables incband and gender, including the following
information:
   • Each cell of the table should list the observed values, the expected values, and the
     unstandardised residuals.
   • Also run the chi-square test.
2. Run a second Crosstabs command for the same variables. This time do not run the chi-square
test, but include the following within each cell of the table:
   • The row, column and total percentages.
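The cell statistics requested above follow from the row and column totals. The Python sketch below works them out by hand for a hypothetical set of cases, assuming gender is coded 1 = Female and 2 = Male. It stops at the Pearson chi-square statistic and its degrees of freedom and does not compute the significance value.

```python
from collections import Counter

# Hypothetical (incband, gender) pairs; gender 1 = Female, 2 = Male.
cases = [(1, 1), (1, 2), (1, 2), (2, 1), (2, 1), (2, 2),
         (3, 1), (3, 2), (3, 2), (3, 2)]

observed = Counter(cases)
row_totals = Counter(band for band, _ in cases)
col_totals = Counter(gender for _, gender in cases)
n = len(cases)

chi_square = 0.0
for band in sorted(row_totals):
    for gender in sorted(col_totals):
        obs = observed[(band, gender)]
        # Expected count if incband and gender were independent.
        expected = row_totals[band] * col_totals[gender] / n
        residual = obs - expected  # unstandardised residual
        chi_square += residual ** 2 / expected
        print(f"band {band} gender {gender}: observed {obs}, "
              f"expected {expected:.2f}, residual {residual:+.2f}")

df = (len(row_totals) - 1) * (len(col_totals) - 1)
print(f"Pearson chi-square = {chi_square:.3f}, df = {df}")
```

Large residuals mark the cells where the observed count departs most from what independence would predict, and those cells contribute most to the chi-square statistic.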
Exercise 12: Means command
Use the medicaltrialX data you have been working on — or you may wish to load the data file
medicaltrialX-part2.sav.
1. Build a Means command where the dependent variables are the two sets of hormone level
measurements (i.e. hbefore and hafter) and the independent variable is incband. Answer the following
questions:
   • Which group has the highest mean hormone level, and is this before or after treatment?
   • Which group has the lowest mean hormone level before the treatment?
   • Which group shows the most varied hormone levels before the treatment? And after?
     (Hint: look at the standard deviations.)
2. Build a Means command similar to the previous one, but this time make gender the independent
variable. Answer the following question:
   • Which gender showed the larger increase in blood hormone levels, on average?
3. Build another Means command, this time looking at the results for each gender within each
incband group. (Hint: enter one category variable in the independent list and click on the Next
button before inserting the second category variable.)
   • Which group of men shows the highest mean level of H before treatment? What is that mean level?
   • In the highest income category, which gender shows the higher mean level of H after treatment?
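The Means command amounts to splitting the cases into groups and summarising each group. A minimal Python sketch of that idea, using hypothetical (incband, hbefore) pairs:

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (incband, hbefore) pairs.
cases = [(1, 80.0), (1, 90.0), (2, 70.0), (2, 100.0), (3, 95.0), (3, 97.0)]

# Collect the readings for each income band.
groups = defaultdict(list)
for band, hbefore in cases:
    groups[band].append(hbefore)

# Mean and standard deviation per group.
summary = {band: (mean(vals), stdev(vals)) for band, vals in groups.items()}

for band in sorted(summary):
    m, sd = summary[band]
    print(f"band {band}: mean {m:.2f}, sd {sd:.2f}, n {len(groups[band])}")
```

In this made-up example bands 1 and 2 have the same mean but very different standard deviations, which is the kind of contrast the hint about standard deviations is pointing at.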
Exercise 13: T Tests
Use the medicaltrialX data you have been working on, or you may wish to load the data file
medicaltrialX-part2.sav.
1. Perform a T Test to show whether there is a significant difference in the initial hormone levels
(hbefore) between men and women. Is there?
2. Perform a T Test across all the cases, to decide whether there is a significant difference in the
mean hormone concentrations before and after the treatment.
3. Now use the Select Cases function to select only the women in the study, and repeat step 2.
What do you find?
If you have time, repeat the test, selecting men instead of women.
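Step 1 compares two independent groups, whereas steps 2 and 3 compare two measurements on the same people, so they call for different tests. The Python sketch below computes both t statistics on hypothetical data. The function names are invented here, the independent test assumes equal variances, and no significance value is computed.

```python
from math import sqrt
from statistics import mean, stdev, variance


def independent_t(a, b):
    """Pooled-variance t statistic for two independent groups, with its df."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def paired_t(before, after):
    """Paired-samples t statistic on the differences after - before, with its df."""
    diffs = [y - x for x, y in zip(before, after)]
    t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical readings.
women_hbefore = [90.0, 85.0, 95.0, 88.0]
men_hbefore = [80.0, 82.0, 78.0, 84.0]
women_hafter = [92.0, 88.0, 94.0, 93.0]

t_ind, df_ind = independent_t(women_hbefore, men_hbefore)
t_pair, df_pair = paired_t(women_hbefore, women_hafter)

print(f"independent samples: t = {t_ind:.3f}, df = {df_ind}")
print(f"paired samples:      t = {t_pair:.3f}, df = {df_pair}")
```

Each statistic is then compared against a t distribution with the stated degrees of freedom to obtain the significance value.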
Exercise 14: Correlation
Before we test for correlations, we need to calculate a new variable. Refer to the Computing new data section in
the course workbook if you need to. Use the medicaltrialX data you have been working on — or you may wish
to load the data file medicaltrialX-part2.sav.
1. We need to calculate the change in blood levels of hormone H using the two values we already
have (the blood levels before and after). Create a new variable dh using the Compute function to
calculate the "after" level minus the "before" level.
Now to perform the correlations.
2. Measure the strength of association between income and age with a correlation coefficient and its
significance.
3. Measure the strength of association between age and dh with a correlation coefficient and its
significance.
4. Produce a scatter plot of dh against age (placing dh on the vertical axis), and:
   • Include a title.
   • Produce a fit line on the graph. (Hint: Double-click on the chart to edit it, then click on
     one of the data points to select the data points, and from the menus choose
     Chart | Add chart element | Fit line at total.)
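The computation of dh and the correlation coefficient can be sketched in Python as follows. The data are hypothetical and pearson_r is a hand-written helper for this example; the significance of the coefficient is not computed.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data.
age = [30, 40, 50, 60]
hbefore = [90.0, 88.0, 95.0, 80.0]
hafter = [92.0, 87.0, 89.0, 70.0]

# Step 1: change in hormone level, "after" minus "before".
dh = [after - before for before, after in zip(hbefore, hafter)]

# Step 3: strength of association between age and dh.
r = pearson_r(age, dh)

print("dh =", dh)
print(f"r = {r:.4f}")
```

A coefficient close to -1 or +1 indicates a strong linear association, and the sign shows whether dh tends to fall or rise with age.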
Exercise 15: Regression
Use the medicaltrialX data you have been working on — or you may wish to load the data file
medicaltrialX-part2.sav.
Perform a linear regression analysis of the dependence of dh upon age (i.e. the same variables as in
the previous exercise), and draw some conclusions about the regression line produced.
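A simple linear regression fits the line dh = intercept + slope × age by least squares. The Python sketch below does this by hand on hypothetical data. The helper name least_squares is invented for this example, and it returns only the coefficients and R squared, not the full table of output that SPSS produces.

```python
from statistics import mean


def least_squares(x, y):
    """Return (intercept, slope, r_squared) for the line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical data.
age = [30, 40, 50, 60]
dh = [2.0, -1.0, -6.0, -10.0]

intercept, slope, r_squared = least_squares(age, dh)

print(f"dh = {intercept:.2f} + ({slope:.2f}) * age")
print(f"R squared = {r_squared:.3f}")
```

The slope is the estimated change in dh for each additional year of age, and R squared is the proportion of the variation in dh that the line accounts for.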
Exercise 16: Graphical plots
Use the medicaltrialX data you have been working on — or you may wish to load the data file
medicaltrialX-part2.sav.
1. Produce a pie chart showing the number of people that fall into each of the three income bands.
   • Include a title in the output.
   • Produce labels for the frequency of each income band and the percentage of the population.
2. Produce a clustered bar chart summarising the mean "before" and "after" measurements
separately for men and for women. (Hint: gender is the category variable.)