Exploring Check-All Questions: Frequencies, Multiple Response, and Aggregation
Target Software & Version: SPSS v19
Last Updated on May 4, 2012
Created by Laura Atkins
Sometimes several responses or measurements are recorded for a single question. For example, a questionnaire may include questions that allow a respondent to select any number of the listed responses, as in the example below:
Q1. Where else, other than your home, do you use the internet? (Check all that apply).
Library
School
Workplace
Internet on a cell phone
Other
Many people refer to these as ‘Check all that apply’ or ‘Checklist’ questions. They are often used when the researcher wants respondents to consider each of the possibilities. A respondent could use the internet at all of these places, none of them, or any combination of them. Because every combination is possible, the data must be recorded so that each category is a separate variable in the dataset, essentially a yes/no answer for each location. If five people were asked this question, the data would look something like this:
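ID | Q1_1 (library) | Q1_2 (school) | Q1_3 (workplace) | Q1_4 (internet on a cell phone) | Q1_5 (other)
[Data rows for respondents 1 to 5: a checked location holds its category code (1 in Q1_1, 2 in Q1_2, and so on up to 5 in Q1_5) and an unchecked location is left blank. Q1_1 contains the value 1 twice, Q1_2 contains 2 four times, Q1_3 contains 3 twice, Q1_4 contains 4 once, and Q1_5 contains 5 once.]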
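As a rough illustration of this layout, the Python sketch below converts the set of locations one respondent checked into the five variables Q1_1 to Q1_5. It assumes the category-code scheme used in this example (code k is stored in Q1_k when location k is checked, and the variable is missing otherwise), with Python's None standing in for an SPSS missing value. The function name `to_wide` and the `LOCATIONS` list are hypothetical and not part of SPSS.

```python
# Hypothetical sketch: record a check-all answer as one variable per option.
# Assumption: a checked option stores its category code (1-5) and an
# unchecked option is missing, represented here by None.
from typing import Dict, Iterable, Optional

LOCATIONS = ["library", "school", "workplace", "internet on a cell phone", "other"]


def to_wide(checked: Iterable[str]) -> Dict[str, Optional[int]]:
    """Turn the locations one respondent checked into Q1_1 .. Q1_5."""
    chosen = set(checked)
    unknown = chosen - set(LOCATIONS)
    if unknown:
        raise ValueError(f"unknown location(s): {sorted(unknown)}")
    row: Dict[str, Optional[int]] = {}
    for code, location in enumerate(LOCATIONS, start=1):
        # Variable Q1_k receives code k only when location k was checked.
        row[f"Q1_{code}"] = code if location in chosen else None
    return row


if __name__ == "__main__":
    print(to_wide(["school", "workplace"]))
    # {'Q1_1': None, 'Q1_2': 2, 'Q1_3': 3, 'Q1_4': None, 'Q1_5': None}
```

Because every option has its own variable, "none", "all five", and any mix in between can all be recorded. A pure yes/no coding (1 = checked, 0 = not checked) would change only the value stored in each variable.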
Once the data is recorded properly, there are three ways to explore these variables in SPSS: running basic Frequencies, using the Multiple Response command, or using the Aggregate command.
I. Frequencies
The frequencies procedure generates a frequency table illustrating how cases are distributed across the
values of a variable.
1. In the menu bar select:
Analyze> Descriptive Statistics>Frequencies
2. Place each variable (Q1_1 to Q1_5) into the Variables box and click OK. SPSS treats each variable as a separate entity and produces five separate frequency tables, one per location.
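To make the "five separate tables" point concrete, here is a small Python sketch that tallies each variable on its own, which is what running Frequencies on Q1_1 to Q1_5 amounts to. The respondent rows are invented for illustration (only their column totals are chosen to match the Multiple Response output in Section II), None stands for a blank cell, and the function name `frequency_tables` is hypothetical.

```python
# Hypothetical sketch: one frequency table per variable, as Frequencies gives
# when Q1_1..Q1_5 are all placed in the Variables box.
from collections import Counter
from typing import Dict, List, Optional

VARIABLES = ["Q1_1", "Q1_2", "Q1_3", "Q1_4", "Q1_5"]

# Invented respondents (ID, Q1_1..Q1_5); None means the location was not checked.
RAW = [
    (1, 1, 2, None, None, None),
    (2, None, 2, 3, None, None),
    (3, 1, 2, None, 4, None),
    (4, None, 2, 3, None, None),
    (5, None, None, None, None, 5),
]
ROWS: List[Dict[str, Optional[int]]] = [dict(zip(["ID"] + VARIABLES, r)) for r in RAW]


def frequency_tables(rows, variables) -> Dict[str, Counter]:
    """Count each variable separately; blanks are tallied under 'missing'."""
    return {
        var: Counter("missing" if row[var] is None else row[var] for row in rows)
        for var in variables
    }


if __name__ == "__main__":
    for var, counts in frequency_tables(ROWS, VARIABLES).items():
        print(var, dict(counts))
```

Each table answers "how many respondents checked this one location?" None of them describes the question as a whole, which is the gap the Multiple Response procedure fills.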
II. Multiple Response
A simple but limited and temporary approach is to use the Multiple Response procedure. It creates a single summary table of counts and percents based on several variables that contain responses to one question, so all five variables are combined into one table rather than five separate tables.
1. First, make note of how the variables of interest are coded. For this example there are five categories
(1-5).
2. Next, tell SPSS that a set of variables represents responses to a single question of interest. In the menu bar, go to Analyze>Multiple Response>Define Variable Sets. To define a multiple response set in SPSS, we must specify the list of variables that make up the set, the type of coding used, and a name.
3. Using the arrow button, place variables Q1_1 through Q1_5 in the “Variables in Set” box.
4. Under “Variables Are Coded As”, select “Categories” and enter 1 through 5 for the range.
5. Give the new set a name (e.g., Where_Internet), give it a label, and click “Add”. The set name now appears in the Multiple Response Sets list box with a $ prefix, which distinguishes the set name from an ordinary SPSS variable name.
6. Click “Close”.
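The set definition in steps 2 to 5 boils down to a few pieces of information: the variables in the set, the coding type (here, categories within a range), a name, and a label. The Python sketch below models that idea; the class `MultipleResponseSet` and its fields are hypothetical stand-ins for the dialog, not an SPSS API, and None stands for a missing value.

```python
# Hypothetical model of a multiple response set as defined in
# Analyze > Multiple Response > Define Variable Sets: name, label,
# the variables in the set, and the category range to count.
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MultipleResponseSet:
    name: str                   # shown with a "$" prefix, like $Where_Internet
    label: str
    variables: Tuple[str, ...]  # the "Variables in Set"
    low: int                    # category range, inclusive
    high: int

    @property
    def display_name(self) -> str:
        return "$" + self.name

    def responses(self, row: Dict[str, Optional[int]]) -> List[int]:
        """Codes from one case that fall inside the category range."""
        return [
            row[v] for v in self.variables
            if row[v] is not None and self.low <= row[v] <= self.high
        ]


WHERE_INTERNET = MultipleResponseSet(
    name="Where_Internet",
    label="Where else do you use the internet?",
    variables=("Q1_1", "Q1_2", "Q1_3", "Q1_4", "Q1_5"),
    low=1,
    high=5,
)

if __name__ == "__main__":
    case = {"Q1_1": 1, "Q1_2": 2, "Q1_3": None, "Q1_4": None, "Q1_5": None}
    print(WHERE_INTERNET.display_name, WHERE_INTERNET.responses(case))
```

In this sketch a code outside the declared range (say, a stray 9) is skipped, which mirrors the role of the range entered in step 4.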
7. Return to Analyze>Multiple Response. You will now see that two options have been activated:
Frequencies and Crosstabs. Below is an example of frequency output for the Where_Internet variable:
$Where_Internet Frequencies
Where else do you use the internet?(a) | Responses: N | Responses: Percent | Percent of Cases
library                  | 2  | 20.0%  | 40.0%
school                   | 4  | 40.0%  | 80.0%
workplace                | 2  | 20.0%  | 40.0%
internet on a cell phone | 1  | 10.0%  | 20.0%
other                    | 1  | 10.0%  | 20.0%
Total                    | 10 | 100.0% | 200.0%
a. Group
A single table was created based on responses to the five variables (Q1_1 to Q1_5). The N column indicates how many respondents mentioned each location; school is the most commonly mentioned location. The Percent of Responses column (Percent under Responses) indicates what share of all the locations mentioned falls in each category. The Percent of Cases column indicates what percentage of respondents used each location, and it is usually the column to interpret. It can total more than 100% because each respondent can select more than one category; if every respondent selected all five categories, the Total Percent of Cases would equal 500%. Note that the multiple response set remains active only until a different data file is opened or you exit SPSS. Another limitation of this method is that the chi-square test of statistical significance and graphs cannot be obtained with Multiple Response.
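The arithmetic behind the three columns can be checked with a short Python sketch. It assumes that the base for Percent of Cases is the number of respondents with at least one response in the set (all five respondents in this example). The respondent rows are invented, with column totals chosen to match the output above, and `multiple_response_table` is a hypothetical name.

```python
# Hypothetical sketch of the $Where_Internet columns: Responses N,
# Responses Percent (share of all mentions) and Percent of Cases.
from collections import Counter
from typing import Dict, Tuple

VARIABLES = ["Q1_1", "Q1_2", "Q1_3", "Q1_4", "Q1_5"]
LABELS = {1: "library", 2: "school", 3: "workplace",
          4: "internet on a cell phone", 5: "other"}

# Invented respondents (ID, Q1_1..Q1_5); column totals are 2, 4, 2, 1, 1.
RAW = [
    (1, 1, 2, None, None, None),
    (2, None, 2, 3, None, None),
    (3, 1, 2, None, 4, None),
    (4, None, 2, 3, None, None),
    (5, None, None, None, None, 5),
]
ROWS = [dict(zip(["ID"] + VARIABLES, r)) for r in RAW]


def multiple_response_table(rows, variables, labels, low: int = 1,
                            high: int = 5) -> Dict[str, Tuple[int, float, float]]:
    counts: Counter = Counter()
    cases = 0  # assumed base: respondents with at least one response in range
    for row in rows:
        codes = [row[v] for v in variables
                 if row[v] is not None and low <= row[v] <= high]
        if codes:
            cases += 1
        counts.update(codes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses in the set")
    table = {}
    for code in range(low, high + 1):
        n = counts[code]
        # (N, percent of all responses, percent of responding cases)
        table[labels[code]] = (n, 100 * n / total, 100 * n / cases)
    table["Total"] = (total, 100.0, 100 * total / cases)
    return table


if __name__ == "__main__":
    for label, (n, pct_resp, pct_cases) in multiple_response_table(
            ROWS, VARIABLES, LABELS).items():
        print(f"{label:<25} {n:>3} {pct_resp:6.1f}% {pct_cases:6.1f}%")
```

If every respondent checked all five locations, each Percent of Cases entry would be 100% and the Total would reach 500%, the ceiling described above.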
III. Aggregate
A third option for exploring these data is to create a combination variable that identifies each unique combination of responses in the dataset. The Aggregate Data command aggregates groups of cases in the active dataset into single cases and either creates a new, aggregated file or adds new variables containing aggregated data to the active dataset. Cases are aggregated based on the values of one or more break (grouping) variables. This approach takes more effort and requires more advanced skills. However, if you expect to use this variable for statistical tests or beyond a single session, it could be the right choice, because it lets you save your work and use the variable in later analyses.
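The grouping that Aggregate performs with break variables can be sketched in Python: every case contributes a tuple of its Q1_1 to Q1_5 values, and each distinct tuple becomes one output row with a count that plays the role of N_BREAK. The rows are invented for illustration, None stands for a missing value, and the function name `aggregate` is hypothetical.

```python
# Hypothetical sketch of Data > Aggregate with Q1_1..Q1_5 as break variables
# and "Number of cases" saved as N_BREAK.
from collections import Counter
from typing import Dict, List, Optional

VARIABLES = ["Q1_1", "Q1_2", "Q1_3", "Q1_4", "Q1_5"]

# Invented respondents (ID, Q1_1..Q1_5); None means the location was not checked.
RAW = [
    (1, 1, 2, None, None, None),
    (2, None, 2, 3, None, None),
    (3, 1, 2, None, 4, None),
    (4, None, 2, 3, None, None),
    (5, None, None, None, None, 5),
]
ROWS = [dict(zip(["ID"] + VARIABLES, r)) for r in RAW]


def aggregate(rows, break_vars) -> List[Dict[str, Optional[int]]]:
    """One row per unique combination of break-variable values, plus N_BREAK."""
    counts = Counter(tuple(row[v] for v in break_vars) for row in rows)
    return [dict(zip(break_vars, combo), N_BREAK=n) for combo, n in counts.items()]


if __name__ == "__main__":
    for out_row in aggregate(ROWS, VARIABLES):
        print(out_row)
```

With these five invented respondents the result has four rows, because respondents 2 and 4 checked the same pair of locations (school and workplace) and collapse into one row with N_BREAK = 2.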
The steps are outlined below:
1. Go to Data>Aggregate. Place Q1_1 through Q1_5 in the “Break Variable(s)” box to find the unique combinations. Cases are grouped together based on the values of the break variables, and each unique combination of break variable values defines a group. When creating a new, aggregated data file, all break variables are saved in the new file with their existing names and information.
2. Check the “Number of cases” box. This saves a new variable, named N_BREAK by default, that records how many cases fall into each group, that is, how many respondents gave each combination of responses.
3. Select the “Create a new dataset” option and give the new dataset a name. This specifies a new dataset into which the aggregated data will be placed. When finished, click OK.
4. In Data View of the new dataset, each row represents a unique combination of responses, and N_BREAK shows how many cases share that combination. Sorting N_BREAK in descending order puts the most common combinations first. If there is a row in which all five variables are missing (respondents who checked nothing), it can be deleted, because it does not need the unique identifier assigned in the next step.
5. To assign a unique identifier to each row, go to Transform>Compute Variable. Enter a name for the Target Variable (e.g., “Values”). In the Function group list, select “All” and place $Casenum into the Numeric Expression box by double-clicking it or using the arrow button. $CASENUM is the sequential case number, so each row (each unique combination of responses) receives its own identifier.
In Data View, you will now see a new column containing a unique value for each row, that is, for each unique combination of responses.
Note that if the file was sorted by N_BREAK in descending order before this step, the identifiers follow the ranking: the most common combination = 1, the second most common = 2, and so on.
6. Next, merge the aggregated dataset into the original dataset. Both files must be sorted by the key variables before merging, so in each dataset go to:
Data>Sort Cases> Sort by: Q1_1 through Q1_5
7. In the original dataset, go to Data>Merge Files>Add Variables. Select the open aggregated dataset and click Continue.
8. In the Add Variables dialog box, select “Match cases on key variables in sorted files” and then “Non-active dataset is keyed table”. Move Q1_1 through Q1_5 to the “Key Variables” box and N_BREAK to the “Excluded Variables” box, then click OK. SPSS warns that the merge will fail if the files were not sorted on the key variables first; click OK.
9. The new variable “Values” now appears in Variable View of the original dataset. Set its decimals to “0” and define its missing values.
10. Finally, add value labels to “Values”, using the aggregated dataset to look up which combination of responses each code stands for.
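Steps 4 through 9 amount to ranking the unique combinations, numbering them, and looking up each case's combination in that numbered list. The Python sketch below follows that path under stated assumptions: the rows are invented, None stands for a missing value, ties in the count keep first-seen order (SPSS's order for ties may differ), and the names `combination_ids` and `add_values` are hypothetical. A dictionary lookup replaces the sorted keyed-table match, so the sketch does not need the sort from step 6; that requirement is specific to SPSS's sorted-file merge.

```python
# Hypothetical sketch of steps 4-9: rank unique combinations by N_BREAK,
# number them like $CASENUM, then attach that number ("Values") to every case.
from collections import Counter
from typing import Dict, List, Optional, Tuple

VARIABLES = ["Q1_1", "Q1_2", "Q1_3", "Q1_4", "Q1_5"]

# Invented respondents (ID, Q1_1..Q1_5); None means the location was not checked.
RAW = [
    (1, 1, 2, None, None, None),
    (2, None, 2, 3, None, None),
    (3, 1, 2, None, 4, None),
    (4, None, 2, 3, None, None),
    (5, None, None, None, None, 5),
]
ROWS = [dict(zip(["ID"] + VARIABLES, r)) for r in RAW]

Combo = Tuple[Optional[int], ...]


def combination_ids(rows, key_vars, drop_all_missing: bool = True) -> Dict[Combo, int]:
    """Map each unique combination to 1, 2, 3, ... in descending count order."""
    counts = Counter(tuple(row[v] for v in key_vars) for row in rows)
    if drop_all_missing:
        # Step 4: a combination where nothing was checked gets no identifier.
        counts.pop(tuple(None for _ in key_vars), None)
    # Step 4: most common first; sorted() is stable, so ties keep first-seen order.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # Step 5: number the ranked rows, as $CASENUM does after sorting.
    return {combo: rank for rank, (combo, _n) in enumerate(ranked, start=1)}


def add_values(rows, key_vars, ids: Dict[Combo, int]) -> List[dict]:
    """Steps 6-8: keyed-table lookup; cases with no matching key get None."""
    return [{**row, "Values": ids.get(tuple(row[v] for v in key_vars))} for row in rows]


if __name__ == "__main__":
    ids = combination_ids(ROWS, VARIABLES)
    for row in add_values(ROWS, VARIABLES, ids):
        print(row["ID"], row["Values"])
```

With the invented rows, the school-and-workplace combination (two respondents) is numbered 1, and the Values column for respondents 1 to 5 comes out as 2, 1, 3, 1, 4. A respondent who checked nothing keeps a missing Values code, matching the advice in step 4.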
Again, the Aggregate approach is a useful option if you expect to use the variable for statistical tests or beyond a single session. Save the changes to your original dataset, and the new variable will be kept with it.