SPS 580 Lecture 1
Theories, Variables, Data Sets

I. Public Policy and quantitative research methods
Role of research methods: to empirically test an idea to see if it is right.
A. Idea = a belief, speculation, or prediction about how the social world works
1. Younger people have more interest in recycling
2. People from the suburbs oppose permits to use Chicago beaches
3. Poor people are more likely to use public transit
B. Theory = a predicted causal relationship among variables
To make an idea a theory you have to have an operational definition of:
1. the variables
2. the relationships
3. the unit of analysis

II. Variable = a measurement process that sorts individuals (UNITS) into mutually exclusive and exhaustive categories.
Parts of a variable . . .

Interest in Recycling
                      Frequency
1 Great Deal              3,314
2 Some                    2,003
3 Hardly Any                400
4 None                      377
8 Don't Know                 41
Total                     6,135

name . . . Interest in recycling
verbal definition . . . answer to the question: "How much interest do you have in recycling: a great deal, some, hardly any, none?"
set of 2+ categories . . . Great Deal, Some, Hardly Any, None
procedure for sorting (measuring, coding rule) . . . Survey

III. Relationship = statement of what causes what
1. Verbal . . . Age causes Interest in recycling
2. Causal diagram . . . Age → Interest in recycling
3. WARNING: theories are always expressed in terms of variables.
4. Ideas are frequently expressed in terms of categories of variables, e.g., "young people like recycling."
5. Abstractly: X (the independent variable) causes Y (the dependent variable), i.e., X → Y
6. Causal ordering
i. The variable that causes the change is the independent variable = X
ii. The variable for which change is being caused is the dependent variable = Y
iii. Ask: if you change X, do you change Y, or is it the other way around?

IV. Identify the variables and the causal relationship in the following . . .
People from the suburbs oppose permits to use Chicago beaches . . . Place of residence causes Opinion on beach permits
Poor people are more likely to use public transit . . . Income causes Transportation behavior

V. Unit of analysis = the group the prediction is about
In this class, until further notice, all the theories will be about people at one point in time; i.e., all of the underlying ideas have the word "people" as the subject:
1. Younger people have more interest in recycling
2. People from the suburbs oppose permits to use Chicago beaches
3. Poor people are more likely to use public transit
Toward the end of the class we'll get into more complex problems:
4. Renters have a higher housing cost burden . . . Unit = ??? (household)
5. Poor neighborhoods have less health insurance . . . Unit = Community area
6. Wealthy villages get more microfinance loans . . . Unit = Village
7. America is becoming more in favor of environmental spending
i. One of the variables is time

VI. Identifying data to test theories
A. Pick (or create) a Data Set: the MCIC Metro Survey
B. To identify variables in the MCIC Metro Survey, go to . . . the MCIC Metro Survey Summary Report 1991 – 2002 (download from d2L).
We want the variable . . . Interest in recycling. In the bookmarks, go to Table 2 . . .
[Sample page from Table 2 of the MCIC Metro Survey Summary Report, annotated to show: the SPSS variable name; the variables of interest; the categories of response (not necessarily all available); the total for the Chicago region; and crosstabs for the Chicago region by location, race/ethnicity, and income.]
C. To locate exact question wordings, go to . . . the MCIC Metro Survey Combined Codebook (download from d2L). Search on the SPSS variable name, or on words in the question.
[Sample codebook entry, annotated to show: the SPSS name; the question wording; the skip pattern, if any; all categories for the response; and summary data for the years the question was asked.]

VII. Finding data to test theories: STEPS IN THE PROCESS
So far, we have identified the data set and the variables we want (STEP 1 . . .).
Steps 2-4 have to do with accessing the data from the data set.
Steps 5-6 have to do with taking the output from the data set and creating a finished product.

Step                               Tool
1. Survey data                     MCIC report, MCIC codebook
2. Data File                       SPSS Data File
3. Programming Commands            SPSS Syntax File
4. Software Output                 SPSS Output File
5. Post Processing                 Excel table, chart, graph
6. Presentation Quality Output     Word document

Step 2 . . . SPSS Data File
1. The SPSS Data File contains all the survey data. It has been laboriously and painstakingly created for you so that you can concentrate on the fun stuff. Other data files we might be getting into . . . General Social Survey, ACS, Indian Microfinance Data base.
2. Download the SPSS Data File MCIC METRO SURVEY 1991 2002 (from d2L, or from your flash drive). To create a working copy, do FILE/SAVE AS/<NAME> into My Documents.
3. Go to the saved Data File and CLICK on the FILE to open it with the SPSS program. (This is what you do on the computer in my office; we'll test in LAB what you need to do there.)
4. The File will open. It will have a Variable View, which lists all the variables in alphabetical order, and a Data View, which shows all the data for all the variables for all the cases in the survey. You can Search for Text in the Variable View to find a specific variable of interest, e.g., i_recy.
a. The Label column can give you helpful information: Interest in Recycling
b. The Values column can give you helpful information: all values in the data set . . . 1 Great Deal, 2 Some, 3 Hardly Any, 4 None, as well as 7 REF, 8 DK, 9 NA (note: 7, 8, 9 are potentially problematic)
5. But mostly we don't do anything directly with the data file. Instead we work with . . .

Step 3 . . . SPSS Syntax File
1. The SPSS Syntax File keeps track of all the commands you give to the SPSS program to prepare and analyze your data. The SPSS SYNTAX FILE is cumulative: it keeps a record of all the commands you have given. This is helpful when you are working with the same variables over and over. It is like a WORD file: you can add lines to it from other files and delete lines that are mistakes or no longer needed. You can COPY and PASTE from WORD or from Excel.
2. The first time ever you give an SPSS command using this data set, e.g., ANALYZE/DESCRIPTIVE STATISTICS/FREQUENCIES/ i_recy, click PASTE and the SPSS program will automatically open an SPSS Syntax File for you with the command in it, e.g.,

    FREQUENCIES VARIABLES=i_recy
      /ORDER=ANALYSIS.

3. This SPSS Syntax File has no name attached to it. So, the first time ever you give an SPSS command using this data set, use SAVE AS <NAME> to save a copy of this file to your flash drive. This is your SPSS Syntax File that goes with this data set. After that first time, whenever you open the SPSS Data File you will use SPSS/FILE/OPEN/SYNTAX/<NAME> to open your SPSS Syntax File. During the semester I will be providing SPSS Syntax commands to illustrate how to do specific things I want you to do; these will be in the form of an SPSS Syntax File called mcic. Download it frequently from d2L.
4. The way you execute a command in SPSS is to highlight the lines in the Syntax File that you want to execute and then RIGHT CLICK/RUN CURRENT.

Step 4 . . . SPSS Output File
1. The SPSS Output File is where the SPSS program writes the tables and text that result from the commands you give the program to execute during this SPSS session. You get a different Output File each SPSS session.
2. The first time you are in the Syntax File and RIGHT CLICK/RUN CURRENT, the SPSS program automatically opens the Output File, and the computer will jump to it.
3. On the right hand side of the Output File you will see a listing of the commands from the Syntax File you executed, followed by the output the SPSS program generates (assuming the commands are in correct syntax; otherwise you will get error messages that are even less helpful than Microsoft error messages).

The command you executed . . .

    FREQUENCIES VARIABLES=i_recy
      /ORDER=ANALYSIS.

The beginning of the output . . .
Frequencies

The data set the output is from (duh!) . . .
[DataSet1] C:\Documents and Settings\DTAYLOR8\My Documents\AA DePaul DGT courses and lectures\SPS 580 Statistics\CCC Data sets and other uploads\MCIC METRO SURVEY 1991 2002.sav

The number of valid and missing cases for the command, from the original data set . . .
Statistics
i_recy Interest in Recycling
N    Valid        6135
     Missing     30555

The variable name (i_recy) and variable label (Interest in Recycling), the category counts and totals (Frequency), and other (sometimes) helpful information (Percent, Valid Percent, Cumulative Percent) . . .

i_recy Interest in Recycling
                           Frequency   Percent   Valid Percent   Cumulative Percent
Valid     1 Great Deal        3314        9.0        54.0             54.0
          2 Some              2003        5.5        32.6             86.7
          3 Hardly Any         400        1.1         6.5             93.2
          4 None               377        1.0         6.1             99.3
          8 Don't Know          41         .1          .7            100.0
          Total               6135       16.7       100.0
Missing   7 Refused              3         .0
          9 No Answer            8         .0
          System             30544       83.2
          Total              30555       83.3
Total                        36690      100.0

4. On the left hand side of the Output File you will see a metadata-type summary (an outline) of the contents of the right hand side. You can point and click on the left hand side to jump around the output on the right hand side. You can delete no-longer-useful output by RIGHT CLICK/CUT on either the left hand side metadata or the right hand side output.
5. If you want to look at your output later, you can SAVE the SPSS Output File with FILE/SAVE AS/<NAME>. It's a good idea to do this at the beginning of the semester, until you are comfortable with how the STEPS IN THE PROCESS work, or if you are having a problem and need to show your work to the tutor.
6. But the main reason we use the SPSS Output File is for Step 5, post processing.
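The Percent, Valid Percent, and Cumulative Percent columns differ only in their case base: Percent divides by every case in the file (36,690), while Valid Percent and Cumulative Percent divide by the valid cases only (6,135). As a hedged sketch in plain Python rather than SPSS (the function name frequency_columns is hypothetical, and the counts are copied from the i_recy output above), the columns can be rebuilt from the counts like this:

```python
def frequency_columns(valid, missing):
    """Rebuild the columns of an SPSS FREQUENCIES table from category counts.

    valid, missing: dicts mapping a category label to its count, in row order.
    Returns (label, frequency, percent, valid_percent, cumulative_percent) tuples;
    missing rows get None for the two columns that use the valid-case base.
    """
    n_valid = sum(valid.values())
    n_total = n_valid + sum(missing.values())
    rows = []
    running = 0
    for label, count in valid.items():
        running += count
        rows.append((label, count,
                     round(100 * count / n_total, 1),     # Percent: base = every case in the file
                     round(100 * count / n_valid, 1),     # Valid Percent: base = valid cases only
                     round(100 * running / n_valid, 1)))  # Cumulative Percent of valid cases
    for label, count in missing.items():
        rows.append((label, count, round(100 * count / n_total, 1), None, None))
    return rows


# Counts copied from the i_recy Frequencies output.
i_recy_valid = {"1 Great Deal": 3314, "2 Some": 2003, "3 Hardly Any": 400,
                "4 None": 377, "8 Don't Know": 41}
i_recy_missing = {"7 Refused": 3, "9 No Answer": 8, "System": 30544}

for row in frequency_columns(i_recy_valid, i_recy_missing):
    print(row)
```

The printed rows should line up with the SPSS table (e.g., 9.0, 54.0, 54.0 for Great Deal). Note that SPSS counts code 8 (Don't Know) as valid here because, as the output shows, only codes 7 and 9 are treated as missing for i_recy in this data file.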
Step 5 . . . Post Processing with Excel
1. This is where you take the SPSS Output and create tables, charts, and graphs in Excel according to the Rules for Presentation Quality.
2. Mouse over the part of the SPSS Output File you want, RIGHT CLICK/COPY the specific piece of output (table, list, etc.), and then go to Excel and RIGHT CLICK/PASTE it into an Excel File for Post Processing.
3. When you PASTE to EXCEL you will get exactly the same information as above, but now in the rows and columns of a spreadsheet. Something like this . . .

Interest in Recycling
                            Frequency
Valid     1 Great Deal          3,314
          2 Some                2,003
          3 Hardly Any            400
          4 None                  377
          8 Don't Know             41
          Total                 6,135
Missing   7 Refused                 3
          9 No Answer               8
          System               30,544
          Total                30,555
Total                          36,690

4. From this you need to make a presentation quality percentage table and a presentation quality column chart. Also, when we calculate percents for a table or chart, we will always EXCLUDE the DK, NA, and REFUSED from the case base for the calculations. This means you need to turn the table above into something (exactly) like this presentation quality univariate percentage table . . .

Interest in Recycling
Great Deal      54%
Some            33%
Hardly Any       7%
None             6%
Total          100%

(The title is the variable name, the rows are its categories, and the Total row shows that these are meant to add to 100%.)

5. Every percentage table in this class will also be accompanied by a chart or graph that visually shows the same information. To use Excel to make a univariate column chart, highlight this part of the univariate percentage table (i.e., don't include the Total) . . .

Interest in Recycling
Great Deal      54%
Some            33%
Hardly Any       7%
None             6%

6. Then INSERT/COLUMN/2D COLUMN and you will get this . . .
[Default Excel column chart titled "Interest in Recycling", with a 0% to 60% Y axis, a legend reading "Interest in Recycling", and one column each for Great Deal, Some, Hardly Any, and None.]
This is made into a Presentation Quality Univariate Column Chart by editing it in Excel so it looks like this . . .
[Presentation quality column chart titled "Interest in Recycling", with columns labeled 54% (Great Deal), 33% (Some), 7% (Hardly Any), and 6% (None).]
Edits: Remove Legend; Remove grid lines; Remove Y Axis; Remove X Axis Tick Marks; Add data labels, no decimals; Make smaller.

D. For the AGE variable, go to . . . the MCIC Metro Survey Summary Report 1991 – 2002. This gives you the SPSS variable name. You can probably go directly to the SPSS Data File, but for some variables in the MCIC survey there is additional useful information in the MCIC Metro Survey Combined Codebook . . .
[Codebook entry for age, annotated to show: the question wording; explanatory text on how it was asked and coded; and summary data showing this variable was asked in all years.]

E. Armed with the variable name, go to SPSS and ask for the Frequencies (THE COMMAND) . . .

    FREQUENCIES VARIABLES=age01
      /ORDER=ANALYSIS.

You will get . . .

age01 Age Of Respondent
                            Frequency
Valid     14                        1
          16                        5
          17                        5
          18                       44
          19                      274
          20                      274
          21                      342
          . . .
          96                        3
          97                        2
          98                        2
          998 Do Not Know          23
          Total                35,874
Missing   997 Refused             736
          999 No Answer            10
          System                   70
          Total                   816
Total                          36,690

This isn't fun because . . .
There are too many categories (data in between for each individual year), and it includes DK in the counts.

F. Use SPSS to RECODE the age01 variable into fewer categories . . .
You decide how many new categories you want and how to group them together: a new, smaller set of mutually exclusive and exhaustive categories. Then go to TRANSFORM/RECODE INTO DIFFERENT VARIABLES/
Choose age01 as the INPUT VARIABLE
Make up an OUTPUT VARIABLE NAME . . . AGE4
ADD A LABEL
Go to OLD AND NEW VALUES
Map the old values into the new values (mine are below; note: always keep control of your "missing value" category)
CONTINUE
CHANGE
PASTE: this will write the instructions to the syntax file so you can see, save, and modify them.

    RECODE age01 (10 thru 30=1) (31 thru 45=2) (46 thru 64=3) (65 thru 98=4) (ELSE=9) INTO AGE4.
    VARIABLE LABELS AGE4 'age4 '.

I added these commands to the SPSS Syntax File . . .

    VALUE LABELS age4 1 'under 30' 2 '31-45' 3 '46-64' 4 '65+' 9 'not valid'.
    MISSING VALUES age4 (9).

G. Then in the SPSS Syntax File use the mouse to highlight the commands that RECODE age01 into AGE4, and then RIGHT CLICK/RUN CURRENT.

    RECODE age01 (10 thru 30=1) (31 thru 45=2) (46 thru 64=3) (65 thru 98=4) (ELSE=9) INTO AGE4.
    VARIABLE LABELS AGE4 'age4 '.
    VALUE LABELS age4 1 'under 30' 2 '31-45' 3 '46-64' 4 '65+' 9 'not valid'.
    MISSING VALUES age4 (9).

If you did it right, SPSS will say TRANSFORMATIONS PENDING.
If you didn't do it right, SPSS will write a completely useless error message to the Output File. You need to fix the error by studying what you are supposed to be doing.

H. Get the frequencies for the revised variable AGE4.

    FREQUENCIES VARIABLES=AGE4
      /ORDER=ANALYSIS.

The key part of the SPSS Output will look like this . . .

AGE4 age4
                            Frequency   Percent   Valid Percent   Cumulative Percent
Valid     1.00 under 30        7370       20.1        20.6             20.6
          2.00 31-45          13574       37.0        37.9             58.4
          3.00 46-64           9968       27.2        27.8             86.2
          4.00 65+             4939       13.5        13.8            100.0
          Total               35851       97.7       100.0
Missing   9.00 not valid        839        2.3
Total                         36690      100.0

I. After post-processing, the presentation quality work product should look like this . . .

Age
Under 30     21%
31-45        38%
46-64        28%
65+          14%
Total       100%

[Presentation quality column chart titled "Age", with columns labeled 21% (Under 30), 38% (31-45), 28% (46-64), and 14% (65+).]

ASSIGNMENT: Write a report (3 pages maximum) that proposes three ideas about how public policy works that you are going to test with MCIC Metro Survey data. Write the paper in flowing English; for each idea, explain the following . . .
1. What is the idea, what are the variables, and what is the theory, i.e., a formal statement of the predicted causal relationship among the variables
2. Draw the causal diagram, clearly labeling the independent and dependent variables
3. Use the MCIC Metro Survey Data to show how you will operationally test the theory; for each variable . . .
a. What question from the survey is used (include wording)
b. Describe efficiently how the variable is coded, or recoded, for your proposed research
c. Using data from the MCIC Metro Survey, include a presentation quality univariate percentage table and a presentation quality univariate column chart
4. For each dependent variable . . .
a. Comment on the findings from the univariate percentage table and/or column chart and the implications they might have for the test of your theory
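To double-check the arithmetic behind the recycling and age tables before building your own, here is a minimal sketch in plain Python. This is an assumption on our part; the course itself does this work in SPSS and Excel. It mirrors the RECODE command from section F and the Step 5 rule of dropping DK, NA, and REFUSED (and AGE4's "not valid" code) from the percentage base. The helper names recode_age4 and presentation_percents are hypothetical.

```python
def recode_age4(age):
    """Map one age01 value (None stands in for system-missing) to its AGE4 code,
    following RECODE age01 (10 thru 30=1) (31 thru 45=2) (46 thru 64=3)
    (65 thru 98=4) (ELSE=9) INTO AGE4."""
    if age is None:
        return 9                      # SPSS's ELSE also catches system-missing
    if 10 <= age <= 30:
        return 1
    if 31 <= age <= 45:
        return 2
    if 46 <= age <= 64:
        return 3
    if 65 <= age <= 98:
        return 4
    return 9                          # 997 Refused, 998 Do Not Know, 999 No Answer


def presentation_percents(counts, exclude):
    """Whole-number percents with the excluded codes dropped from the case base.
    Note: Python's round() sends exact .5 ties to the even neighbor."""
    kept = {code: n for code, n in counts.items() if code not in exclude}
    base = sum(kept.values())
    return {code: round(100 * n / base) for code, n in kept.items()}


# AGE4 counts from the output in section H; code 9 ('not valid') is excluded.
age4_counts = {1: 7370, 2: 13574, 3: 9968, 4: 4939, 9: 839}
print(presentation_percents(age4_counts, exclude={9}))

# i_recy counts from the Step 4 output; 7 Refused, 8 DK, 9 NA are excluded.
recy_counts = {1: 3314, 2: 2003, 3: 400, 4: 377, 7: 3, 8: 41, 9: 8}
print(presentation_percents(recy_counts, exclude={7, 8, 9}))
```

The recycling result reproduces the 54/33/7/6 table from Step 5. The age result reproduces the 21/38/28/14 table from section I, although those rounded values sum to 101. Rounded percentages need not add to exactly 100 even when the Total row reports 100%. Note also that code 1 includes age 30 itself, so a label such as "30 and under" would describe it more exactly than "under 30."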