Data Reference (the very, very basics)

Data-reference: what do we need?
  - Tools
  - Strategies
  - Terminology
  - Understanding of what we are looking for: not books or articles -- or facts.

La trahison des images (The Treachery of Images), René Magritte
  This is not "data." It's statistics!

Data vs. Statistics

  Data                                   Statistics
  -------------------------------------  ---------------------------------------------------
  Raw (for analysis)                     Cooked (facts)
  Intended for use by computer           For human use: eye-readable, charts, tables, graphs
  Computer-readable                      Can be print, micro, computer readable
  Collected based on social science      Produced from data
    methodologies or administrative
    procedures

Where do statistical babies come from?
  (Figure: a "+ =" diagram relating data and statistics.)

Data or Statistics: Why does it matter?
  - Different search strategies and tools.
  - Defines your goal.
  - Helps you know when you've found it!

Tip: Data or Statistics?
  Determine if the user wants (needs) statistics or data.
  - Do you want one number?
  - Are you looking for a fact or figure?
  - Do you want to know "how many?"
  Or…
  - Do you want a series of numbers?
  - Do you want to identify trends, make comparisons, model relationships?
  - Will you be using statistical software (not Excel)?

  http://factfinder.census.gov/
  http://www.census.gov/compendia/statab/elections/election.pdf
  http://www.census.gov/compendia/statab/tables/06s0405.xls
  ftp://ftp.bls.gov/pub/special.requests/lf/aat44.txt
  http://www.bls.gov/webapps/legacy/cpsatab7.htm

From survey to data to statistics…

Survey instrument
  Q1. [enter zip code]
  Q2. [enter R's first name]
  Q3. [enter sex of R]
  Q4. What was your major in college?
  Q5. What was your income last year?
  Q6. Did you go to church last week?

Answers to Questions

  Zip    Name     Sex  Major    income  church
  29002  Wilma    F    lit           0  y
  99005  Barney   M    engin        10  n
  99005  Betty    F    .             0  n
  92005  Ethel    F    theater    1000  y
  12534  Fred M.  M    PE        10000  y
  12534  Lucy     F    lit         700  y
  25000  Ricky    M    music     11000  y
  20000  Fred A.  M    dance     10500  n
  15000  Ginger   F    math       9500  y

Must anonymize the data!
  Names are replaced with sequential identifiers.

  Zip    Name  Sex  Major    income  church
  29002  001   F    lit           0  y
  99005  002   M    engin        10  n
  99005  003   F    .             0  n
  92005  004   F    theater    1000  y
  12534  005   M    PE        10000  y
  12534  006   F    lit         700  y
  25000  007   M    music     11000  y
  20000  008   M    dance     10500  n
  15000  009   F    math       9500  y

Change Text to Numeric Codes
  First, sex is coded.

  Zip    Name  Sex  Major    income  church
  29002  001   1    lit           0  y
  99005  002   2    engin        10  n
  99005  003   1    .             0  n
  92005  004   1    theater    1000  y
  12534  005   2    PE        10000  y
  12534  006   1    lit         700  y
  25000  007   2    music     11000  y
  20000  008   2    dance     10500  n
  15000  009   1    math       9500  y

  The "codebook" must document the numeric codes used! For example:
    Variable: "sex"
      1 = female
      2 = male

  Then major and church are coded the same way.

  Zip    Name  Sex  Major  income  church
  29002  001   1    0075        0  1
  99005  002   2    0070       10  2
  99005  003   1    .           0  2
  92005  004   1    0076     1000  1
  12534  005   2    0001    10000  1
  12534  006   1    0075      700  1
  25000  007   2    0077    11000  1
  20000  008   2    0078    10500  2
  15000  009   1    0050     9500  1

  Sometimes, even numeric variables are encoded in ranges. For example:
    Variable: "income"
      1 = less than 1000
      2 = 1000 - 4999
      3 = 5000 - 10000
      4 = more than 10000
      9 = not reported

  Zip    Name  Sex  Major  income  church
  29002  001   1    0075   1       1
  99005  002   2    0070   1       2
  99005  003   1    .      1       2
  92005  004   1    0076   2       1
  12534  005   2    0001   3       1
  12534  006   1    0075   1       1
  25000  007   2    0077   4       1
  20000  008   2    0078   4       2
  15000  009   1    0050   3       1

Data Files do not need "headers"

  29002  001  1  0075  1  1
  99005  002  2  0070  1  2
  99005  003  1  .     1  2
  92005  004  1  0076  2  1
  12534  005  2  0001  3  1
  12534  006  1  0075  1  1
  25000  007  2  0077  4  1
  20000  008  2  0078  4  2
  15000  009  1  0050  3  1

Data Files do not need extra space

  290020011007511
  990050022007012
  990050031.   12
  920050041007621
  125340052000131
  125340061007511
  250000072007741
  200000082007842
  150000091005031

Codebook must document locations

  123456789
  290020011007511
  990050022007012
  990050031.   12
  920050041007621
  125340052000131
  125340061007511
  250000072007741
  200000082007842
  150000091005031

  For example:
    Variable: "sex"
      location: column 9
      width: 1

Codebook documents question, location, codes.
  For example:
    Q3. [enter sex of R]
    Variable: "sex"
      location: column 9
      width: 1
      1 = female
      2 = male

To Use Data You Need 3 Things
  - Data: the datafile (the raw numbers)
  - Metadata: the "codebook" (where the numbers are and what they mean)
  - Statistical Software (for reading the datafile and analyzing the data)

Data + Codebook + Statistical software
  - Student writes SPSS program (SPSS commands) to analyze data…
  - SPSS reads the program.
  - SPSS reads the data.
  - And produces charts, tables, analysis, etc.

  (Figure: bar chart of Count by respondent SEX (MALE, FEMALE) for recoded question 7; legend: Very Good / Good, Fair, Poor / Very Poor, no opinion. Cases weighted by WGHT.)

Example respondent: Female, 49 years old
  - Codebook entry for variable PRES92 (question text, responses): Voted for Clinton
  - Codebook entry for variable DEGREE (question text, responses): Junior college

Tip: "variables" contain the essential, important content of data files

Tip: Data-reference is not about searching for an answer…
  - Data reference is often less about searching to find an answer. (That's a statistical reference question.)
  - Data reference is often more about exploring to find data that will enable users to ask a question.

What have we learned?
  - Data and statistics are not the same.
  - Data reference leads to primary research material, not facts or statistics.
  - To use data, a user must have data, metadata, and statistical software.
  - And… "variables" are what contain the critical, important content of data files.
  - And that means that the gold standard of data reference is variable-level searching.

http://gort.ucsd.edu/calpol/
  Study of July 2003: Question Text (Variable 34)
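As a rough illustration of the "data + codebook + statistical software" idea from the survey walkthrough above, here is a minimal Python sketch that reads the nine packed records, applies a codebook, and turns the data into statistics. Python stands in for the statistical software here; it is not the SPSS program shown in the slides. Only the location of "sex" (column 9, width 1) and the "sex" and "income" code lists come from the slides. The other column positions and the church codes (1 = yes, 2 = no) are assumptions inferred from the example tables, and the names LAYOUT, LABELS, parse_record, frequencies, and crosstab are hypothetical.

```python
from collections import Counter

# The datafile: nine fixed-width records, no headers, no extra space.
# The third record has a missing major, shown as "." padded to width 4.
RECORDS = [
    "290020011007511",
    "990050022007012",
    "990050031.   12",
    "920050041007621",
    "125340052000131",
    "125340061007511",
    "250000072007741",
    "200000082007842",
    "150000091005031",
]

# Codebook, part 1: variable -> (start column, width), columns counted from 1.
# Only "sex" is documented in the slides; the rest is inferred (assumption).
LAYOUT = {
    "zip": (1, 5),
    "id": (6, 3),
    "sex": (9, 1),
    "major": (10, 4),
    "income": (14, 1),
    "church": (15, 1),
}

# Codebook, part 2: what the numeric codes mean.
# The church labels are an assumption inferred from the y/n column.
LABELS = {
    "sex": {"1": "female", "2": "male"},
    "church": {"1": "yes", "2": "no"},
    "income": {
        "1": "less than 1000",
        "2": "1000 - 4999",
        "3": "5000 - 10000",
        "4": "more than 10000",
        "9": "not reported",
    },
}


def parse_record(line):
    """Cut one record into named variables using the codebook locations."""
    row = {}
    for name, (start, width) in LAYOUT.items():
        raw = line[start - 1:start - 1 + width].strip()
        row[name] = None if raw in ("", ".") else raw  # "." marks missing
    return row


def frequencies(rows, variable):
    """Count respondents per labelled category of one variable."""
    labels = LABELS[variable]
    return Counter(labels.get(row[variable], "missing") for row in rows)


def crosstab(rows, row_var, col_var):
    """Count respondents per (row label, column label) pair."""
    return Counter(
        (
            LABELS[row_var].get(row[row_var], "missing"),
            LABELS[col_var].get(row[col_var], "missing"),
        )
        for row in rows
    )


rows = [parse_record(line) for line in RECORDS]
sex_counts = frequencies(rows, "sex")
income_counts = frequencies(rows, "income")
church_by_sex = crosstab(rows, "sex", "church")

print(sex_counts)
print(income_counts)
print(church_by_sex)
```

Without LAYOUT and LABELS the records are just strings of digits, which is the point of the codebook. With them, the nine records yield statistics such as five female and four male respondents.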