Lecture 1: Theory, variables, and datasets

SPS 580 Lecture 1
Theories, Variables, Data Sets
I. Public Policy and quantitative research methods
Role of research methods – to empirically test an idea to see if it is right
A. Idea = belief, speculation, prediction about how the social world works
1. Younger people have more interest in recycling
2. People from the suburbs oppose permits to use Chicago beaches
3. Poor people are more likely to use public transit
B. Theory = a predicted causal relationship among variables
To make an idea a theory you have to have an operational definition of:
1. the variables
2. the relationships
3. the unit of analysis
II. Variable = a measurement process that sorts individuals (UNITS) into mutually
exclusive and exhaustive categories. Parts of a variable . . .
name . . . Interest in recycling
verbal definition . . . answer to the question: How much interest do you have in
recycling, a great deal, some, hardly any, none?
set of 2+ categories . . . Great Deal, Some, Hardly Any, None
procedure for sorting --> measuring, coding rule --> Survey

Interest in Recycling    Frequency
1 Great Deal                 3,314
2 Some                       2,003
3 Hardly Any                   400
4 None                         377
8 Don't Know                    41
Total                        6,135

III. Relationship = statement of what causes what
1. Verbal . . . Age causes Interest in recycling
2. Causal diagram . . . Age --> Interest in recycling
3. WARNING: theories are always expressed in terms of variables
4. Ideas are frequently expressed in terms of categories of variables, e.g., young people like
recycling
5. Abstractly . . . X (the independent variable) --causes--> Y (the dependent variable)
6. Causal ordering
i. The variable that causes the change is the independent variable = X
ii. The variable for which change is being caused is the dependent variable = Y
iii. If you change X, do you change Y, or vice versa?
IV. Identify the variables and the causal relationship in the following . . .
People from the suburbs oppose permits to use Chicago beaches
Place of residence --causes--> Opinion on beach permits
Poor people are more likely to use public transit
Income --causes--> Transportation behavior
V. Unit of analysis = the group the prediction is about
In this class, until further notice, all the theories will be about people at one point in
time; all of the underlying ideas have the word “people” as the subject
1. Younger people have more interest in recycling
2. People from the suburbs oppose permits to use Chicago beaches
3. Poor people are more likely to use public transit
Toward the end of the class we’ll get into more complex problems
4. Renters have a higher housing cost burden
Unit = ??? household
5. Poor neighborhoods have less health insurance
Unit = Community area
6. Wealthy villages get more microfinance loans
Unit = Village
7. America is becoming more in favor of environmental spending
i. One of the variables is time
VI. Identifying data to test theories
A. Pick (or create) a Data Set – MCIC metro survey
B. To identify variables in the MCIC Metro Survey, go to . . .
The MCIC Metro Survey Summary Report 1991 – 2002 -- Download from d2L
Sample pages from the MCIC Metro Survey Summary Report . . .
We want the variable . . . Interest in recycling
In the bookmarks go to Table 2 . . .
[Sample page from the Summary Report, Table 2. Annotated columns: SPSS variable name; variables of interest; categories of response (not necessarily all available); total for Chicago region; crosstabs for Chicago region, by location, race/eth and income]
C. To locate exact question wordings, go to . . .
MCIC Metro Survey Combined Codebook -- Download from d2L
Search on SPSS variable name, or words in question
[Sample page from the Combined Codebook. Annotated parts: SPSS name; wording; skip pattern?; all categories for response; summary data for years question was asked]
VII. Finding data to test theories -- STEPS IN THE PROCESS
So far, have identified the data set and the variables we want (STEP 1 . . .)
Steps 2-4 have to do with accessing the data from the data set
Steps 5-6 have to do with taking the output from the data set and creating a finished product
1. Survey data . . . MCIC report, MCIC codebook
2. Data File . . . SPSS Data File
3. Programming Commands . . . SPSS Syntax File
4. Software Output . . . SPSS Output file
5. Post Processing . . . Excel table, chart, graph
6. Presentation Quality Output . . . Word document
Step 2 . . . SPSS Data File . . .
1. The SPSS Data File contains all the survey data. It has been laboriously and
painstakingly created for you so that you can concentrate on the fun stuff. Other data files
we might be getting into . . . General Social Survey, ACS, Indian Microfinance
Database
2. Download the SPSS Data File MCIC METRO SURVEY 1991 2002 (from d2L, or
from your flash drive). To create a working copy, do FILE/SAVE AS/<NAME> into
My Documents.
3. Go to the saved Data File; CLICK on the FILE to open it with the SPSS program. (This is
what you do on the computer in my office; we’ll test in LAB what you need to do
there.)
4. The File will open. It will have a Variable View which lists all the variables in
alphabetical order, and a Data View which shows all the data for all the variables for all
the cases in the survey. You can Search for Text in the Variable View to find a specific
variable of interest, e.g. i_recy
a. The Label column can give you helpful information, e.g. Interest in Recycling
5. The Values column can give you helpful information: all values on the data set, e.g.
1 Great Deal 2 Some 3 Hardly Any 4 None as well as 7 REF 8 DK 9 NA
(note: 7, 8, 9 are potentially problematic)
6. But mostly we don’t do anything directly with the data file. Instead we work with . . .
Step 3 . . . SPSS Syntax File
1. The SPSS Syntax File keeps track of all the commands you give to the SPSS program to
prepare and analyze your data. The SPSS SYNTAX FILE is cumulative – it keeps a
record of all the commands you have given. This is helpful when you are working with
the same variables over and over. It is like a WORD file – you can add lines to it from
other files, delete lines that are mistakes or no longer needed. You can COPY and
PASTE from WORD or from Excel.
2. The first time ever you give an SPSS command using this data set – e.g.
ANALYZE/DESCRIPTIVE STATISTICS/FREQUENCIES/ i_recy click PASTE and
the SPSS program will automatically open an SPSS Syntax File for you with the
command in it . . . e.g.,
FREQUENCIES VARIABLES=i_recy
/ORDER=ANALYSIS.
3. This SPSS Syntax File has no name attached to it. So, the first time ever you give an
SPSS command using this data set use SAVE AS <NAME> to save a copy of this file
to your flash drive. This is your SPSS Syntax File that goes with this data set. After the
first time you open the SPSS Data File, you will then use
SPSS/FILE/OPEN/SYNTAX/<NAME> to open your SPSS Syntax File. During the
semester I will be providing SPSS Syntax commands to illustrate how to do specific
things I want you to do. These will be in the form of an SPSS Syntax File called mcic;
download it frequently from d2L
4. The way you execute a command in SPSS is to highlight the lines in the Syntax File that
you want to execute and then RIGHT CLICK/RUN CURRENT
Step 4 . . . SPSS Output File
1. The SPSS Output File is where the SPSS program writes the tables and text that result
from the commands you give to the program to execute during this SPSS session. You
get a different Output File each SPSS session.
2. The first time you are in the Syntax File and RIGHT CLICK/RUN CURRENT the SPSS
program automatically opens the Output File. The computer will jump to the Output File.
3. On the right hand side of the Output File you will see a listing of the commands from the
Syntax File you executed, followed by the output the SPSS program generates (assuming
the commands are in correct syntax; otherwise you will get error messages that are even
less helpful than Microsoft error messages).
FREQUENCIES VARIABLES=i_recy
/ORDER=ANALYSIS.                         <-- The command you executed
Frequencies                              <-- The beginning of the output
[DataSet1] C:\Documents and Settings\DTAYLOR8\My Documents\AA DePaul DGT courses and lectures\SPS 580 Statistics\CCC Data sets and other uploads\MCIC METRO SURVEY 1991 2002.sav
                                         <-- The data set the output is from (duh!)
Statistics
i_recy Interest in Recycling
N   Valid      6135
    Missing   30555                      <-- The number of valid and missing cases for the
                                             command from the original data set

i_recy Interest in Recycling
                         Frequency   Percent   Valid Percent   Cumulative Percent
Valid     1 Great Deal        3314       9.0            54.0                 54.0
          2 Some              2003       5.5            32.6                 86.7
          3 Hardly Any         400       1.1             6.5                 93.2
          4 None               377       1.0             6.1                 99.3
          8 Don't Know          41        .1              .7                100.0
          Total               6135      16.7           100.0
Missing   7 Refused              3        .0
          9 No Answer            8        .0
          System             30544      83.2
          Total              30555      83.3
Total                        36690     100.0

Annotations: i_recy is the variable name and Interest in Recycling is the variable label.
The first column lists the categories, the Frequency column gives the counts, and the
percent columns are other (sometimes) helpful information.
4. On the left hand side of the Output File you will see a metadata-type summary of the
contents of the right hand side. You can point and click on the left hand side to jump
around the output on the right hand side. You can delete no-longer-useful output by
RIGHT CLICK/CUT on either the left hand side metadata or on the right hand side
output.
5. If you want to look at your output later, you can SAVE the SPSS Output File with
FILE/SAVE AS/<NAME>. It’s a good idea to do this at the beginning of the semester
until you are comfortable with how the STEPS IN THE PROCESS work, or if you are
having a problem and need to show your work to the tutor.
6. But the main reason we use the SPSS Output File is for . . .
Step 5 . . . Post Processing with Excel
1. This is where you take the SPSS Output and create tables, charts and graphs in Excel
according to Rules for Presentation Quality.
2. Mouse over the part of the SPSS Output File you want, RIGHT CLICK/COPY specific
piece of output (table, list, etc.) from the Output File and then go to Excel and RIGHT
CLICK/PASTE into an Excel File for Post Processing.
3. When you PASTE to EXCEL you will get exactly the same information as above, but
now in rows and columns of a spreadsheet. Something like this . . .
Interest in Recycling
                         Frequency
Valid     1 Great Deal       3,314
          2 Some             2,003
          3 Hardly Any         400
          4 None               377
          8 Don't Know          41
          Total              6,135
Missing   7 Refused              3
          9 No Answer            8
          System            30,544
          Total             30,555
Total                       36,690
4. From this you need to make a presentation quality percentage table and a presentation
quality column chart. Also, when we calculate percents for a table or chart, we will
always EXCLUDE the DK, NA and REFUSED from the case base for the
calculations. This means you need to turn the table above into something (exactly) like
this presentation quality univariate percentage table . . .
Interest in Recycling        <-- Variable name
Great Deal       54%         <-- Categories
Some             33%
Hardly Any        7%
None              6%
                100%         <-- Total to show these are meant to add to 100%
5. Every percentage table in this class will also be accompanied by a chart or graph that
visually shows the same information. To use Excel to make a univariate column chart,
highlight this part of the univariate percentage table . . .
Interest in Recycling
Great Deal       54%
Some             33%
Hardly Any        7%
None              6%
(I.e., don’t include the total)
6. Then choose INSERT/COLUMN/2D COLUMN and you will get this . . .
[Default Excel column chart: title "Interest in Recycling"; Y axis from 0% to 60% in steps of 10%; legend "Interest in Recycling"; X axis categories Great Deal, Some, Hardly Any, None]
Which is made into a Presentation Quality Univariate Column Chart by editing in Excel
so it looks like this . . .
[Presentation quality column chart: title "Interest in Recycling"; data labels 54%, 33%, 7%, 6% for Great Deal, Some, Hardly Any, None]
Edits made in Excel:
Remove Legend
Remove grid lines
Remove Y Axis
Remove X Axis Tick Marks
Add data labels, no decimals
Make smaller
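The arithmetic behind the univariate percentage table for Interest in Recycling can be sketched in a few lines of Python. This is only an illustration and not part of the SPSS/Excel workflow used in this class; the function name `valid_percents` and the dictionary layout are assumptions made for the example. The counts are the ones from the SPSS frequency output, and the Refused, Don't Know and No Answer codes are dropped from the case base before dividing.

```python
# Frequencies for i_recy as reported in the SPSS output: code -> (label, count).
frequencies = {
    1: ("Great Deal", 3314),
    2: ("Some", 2003),
    3: ("Hardly Any", 400),
    4: ("None", 377),
    7: ("Refused", 3),
    8: ("Don't Know", 41),
    9: ("No Answer", 8),
}

# Codes excluded from the case base (REF, DK, NA), following the class rule.
EXCLUDED_CODES = {7, 8, 9}


def valid_percents(freqs, excluded):
    """Return {label: whole-number percent} using only the substantive categories."""
    kept = {code: item for code, item in freqs.items() if code not in excluded}
    base = sum(count for _, count in kept.values())  # the case base
    return {label: round(100 * count / base) for label, count in kept.values()}


percents = valid_percents(frequencies, EXCLUDED_CODES)
for label, pct in percents.items():
    print(f"{label:<12}{pct:>4}%")
print(f"{'Total':<12}{sum(percents.values()):>4}%")
```

The case base here is 6,094 (the 6,135 valid cases minus the 41 Don't Know answers), which is why the results differ slightly from the Valid Percent column SPSS prints. Rounded percents will not always sum to exactly 100 for other variables.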
D. For the AGE variable, go to . . . MCIC Metro Survey Summary Report 1991 – 2002
This gives you the SPSS variable name
You can probably go directly to the SPSS Data File, but for some variables in the MCIC survey
there is additional useful information in the MCIC Metro Survey Combined Codebook . . .
[Sample codebook entry for the age variable. Annotated parts: question wording; explanatory text on how it was asked and coded; summary data, which shows this variable was asked in all years]
E. Armed with the variable name, go to SPSS and ask for the Frequencies (THE COMMAND) . . .
FREQUENCIES VARIABLES=age01
/ORDER=ANALYSIS.
You will get . . .
age01 Age Of Respondent
                              Frequency
Valid     14                          1
          16                          5
          17                          5
          18                         44
          19                        274
          20                        274
          21                        342
          . . .   (data in between for each individual year)
          96                          3
          97                          2
          98                          2
          998 Do Not Know            23
          Total                  35,874
Missing   997 Refused               736
          999 No Answer              10
          System                     70
          Total                     816
Total                            36,690
This isn’t fun because . . .
There are too many categories
And it includes DK in the counts
F. Use SPSS to RECODE the age01 variable into fewer categories . . .
You decide how many new categories you want, and how to group them together (a new,
smaller set of mutually exclusive and exhaustive categories),
then go to TRANSFORM/RECODE INTO DIFFERENT VARIABLES/
Choose age01 as INPUT VARIABLE
Make up OUTPUT VARIABLE NAME . . . AGE4
ADD A LABEL
Go to OLD AND NEW VALUES
Map old values into the new values (mine are below). Note: always keep control of your
“missing value” category
CONTINUE
CHANGE
PASTE --> this will write the instructions to the syntax file so you can see, save, modify
RECODE age01 (10 thru 30=1) (31 thru 45=2) (46 thru 64=3) (65 thru 98=4) (ELSE=9) INTO AGE4.
VARIABLE LABELS AGE4 'age4 '.
I added these commands to the SPSS Syntax File . . .
VALUE LABELS age4 1 'under 30' 2 '31-45' 3 '46-64' 4 '65+' 9 'not valid'.
MISSING VALUES age4 (9) .
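The logic of the RECODE command can also be sketched outside SPSS. The Python below is only an illustration: the function name `recode_age4` is an assumption made for the example, not something SPSS provides, and it assumes whole-number ages. It mirrors the four ranges and the ELSE=9 catch-all, then checks that every possible input lands in exactly one category, which is what mutually exclusive and exhaustive means.

```python
# Ranges copied from the RECODE command: (low, high, new_code), inclusive like "thru".
AGE4_RANGES = [(10, 30, 1), (31, 45, 2), (46, 64, 3), (65, 98, 4)]
NOT_VALID = 9  # the ELSE=9 catch-all, later declared a missing value

# Labels copied from the VALUE LABELS command.
AGE4_LABELS = {1: "under 30", 2: "31-45", 3: "46-64", 4: "65+", 9: "not valid"}


def recode_age4(age01):
    """Map a raw age01 value to its AGE4 code; anything unmatched becomes 9."""
    if age01 is None:  # stands in for an SPSS system-missing value
        return NOT_VALID
    for low, high, code in AGE4_RANGES:
        if low <= age01 <= high:
            return code
    return NOT_VALID


# Mutually exclusive: no value falls in more than one range.
# Exhaustive: every value gets a code because of the catch-all.
for value in range(0, 1000):
    matches = [code for low, high, code in AGE4_RANGES if low <= value <= high]
    assert len(matches) <= 1
    assert recode_age4(value) in AGE4_LABELS

for raw in (18, 30, 31, 64, 65, 98, 997, 998, 999, None):
    code = recode_age4(raw)
    print(raw, "->", code, AGE4_LABELS[code])
```

Note that 998 (Do Not Know) falls through to code 9, which is how the recode takes the DK answers out of the valid counts.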
G. Then in the SPSS Syntax File use the mouse to highlight the commands that RECODE
age01 into AGE4 and then RIGHT CLICK/ RUN CURRENT
RECODE age01 (10 thru 30=1) (31 thru 45=2) (46 thru 64=3) (65 thru 98=4) (ELSE=9) INTO AGE4.
VARIABLE LABELS AGE4 'age4 '.
VALUE LABELS age4 1 'under 30' 2 '31-45' 3 '46-64' 4 '65+' 9 'not valid'.
MISSING VALUES age4 (9) .
If you did it right SPSS will say TRANSFORMATIONS PENDING
If you didn’t do it right, SPSS will write a completely useless error message to the Output File
--> You need to fix the error by studying what you are supposed to be doing
H. Get the frequencies for the revised variable AGE4
FREQUENCIES VARIABLES=AGE4
/ORDER=ANALYSIS.
The key part of the SPSS Output will look like this . . .
AGE4 age4
                           Frequency   Percent   Valid Percent   Cumulative Percent
Valid     1.00 under 30         7370      20.1            20.6                 20.6
          2.00 31-45           13574      37.0            37.9                 58.4
          3.00 46-64            9968      27.2            27.8                 86.2
          4.00 65+              4939      13.5            13.8                100.0
          Total                35851      97.7           100.0
Missing   9.00 not valid         839       2.3
Total                          36690     100.0
I. After post-processing, the presentation quality work product should look like this
Age
Under 30       21%
31-45          38%
46-64          28%
65+            14%
              100%
[Presentation quality column chart: title "Age"; data labels 21%, 38%, 28%, 14% for Under 30, 31-45, 46-64, 65+]
ASSIGNMENT : Write a report (3 pages maximum) that proposes three ideas about how public
policy works that you are going to test with MCIC Metro Survey data.
Write the paper in flowing English. For each idea, explain the following . . .
1. What is the idea, what are the variables, what is the theory – i.e., a formal statement of the
predicted causal relationship among the variables
2. Draw the causal diagram, clearly labeling the independent and dependent variables
3. Use the MCIC Metro Survey Data to show how you will operationally test the theory, for
each variable . . .
a. What question from the survey is used (include wording)
b. Describe efficiently how the variable is coded, or recoded, for your proposed research
c. Using data from the MCIC Metro Survey, include a presentation quality univariate
percentage table and a presentation quality univariate column chart
4. For each dependent variable . . .
a. Comment on the findings from the univariate percentage table and/or column chart
and implications they might have for the test of your theory