Data Reference
(the very, very basics)
Data-reference: what do we need?
Tools
Strategies
Terminology
Understanding of what we are looking for: not books or articles -- or facts.
La trahison des images, The treachery of images, Rene Magritte
Ceci n’est pas les “data.” C’est les statistiques!
(This is not “data.” It's statistics!)
Data                                   Statistics
Raw (for analysis)                     Cooked (facts)
Intended for use by computer           For human use: eye-readable, charts, tables, graphs
Computer-readable                      Can be print, micro, computer readable
Collected based on social science      Produced from data
methodologies or from administrative
procedures
Where do statistical babies come from?
[Diagram with the labels "Data" and "Statistics" and the symbols "+" and "="]
Data or Statistics: Why does it matter?
Different search strategies and tools.
Defines your goal.
Helps you know when you've found it!
Tip: Data or Statistics?
Determine if the user wants (needs) statistics or data.
– Do you want one number?
– Are you looking for a fact or figure?
– Do you want to know “how many?”
– Or… do you want a series of numbers?
– Do you want to identify trends, make comparisons, model relationships?
– Will you be using statistical software (not Excel)?
http://factfinder.census.gov/
http://www.census.gov/compendia/statab/elections/election.pdf
http://www.census.gov/compendia/statab/tables/06s0405.xls
ftp://ftp.bls.gov/pub/special.requests/lf/aat44.txt
http://www.bls.gov/webapps/legacy/cpsatab7.htm
From survey to data to statistics…
Survey instrument
Q1. [enter zip code ]
Q2. [enter R’s first name ]
Q3. [enter sex of R ]
Q4. What was your major in College?
Q5. What was your income last year?
Q6. Did you go to church last week?
Answers to Questions
Zip    Name     Sex  Major    income  church
29002  Wilma    F    lit      0       y
99005  Barney   M    engin    10      n
99005  Betty    F    .        0       n
92005  Ethel    F    theater  1000    y
12534  Fred M.  M    PE       10000   y
12534  Lucy     F    lit      700     y
25000  Ricky    M    music    11000   y
20000  Fred A.  M    dance    10500   n
15000  Ginger   F    math     9500    y
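To make the data/statistics distinction concrete, here is a minimal Python sketch (not part of the original slides). It treats the nine answers above as data and derives two statistics from them. The tuple layout and the variable names are assumptions chosen for illustration.

```python
# Each tuple is one respondent's answers: (zip, name, sex, major, income, church).
# "." marks a missing answer, as in the table of answers.
answers = [
    ("29002", "Wilma",   "F", "lit",     0,     "y"),
    ("99005", "Barney",  "M", "engin",   10,    "n"),
    ("99005", "Betty",   "F", ".",       0,     "n"),
    ("92005", "Ethel",   "F", "theater", 1000,  "y"),
    ("12534", "Fred M.", "M", "PE",      10000, "y"),
    ("12534", "Lucy",    "F", "lit",     700,   "y"),
    ("25000", "Ricky",   "M", "music",   11000, "y"),
    ("20000", "Fred A.", "M", "dance",   10500, "n"),
    ("15000", "Ginger",  "F", "math",    9500,  "y"),
]

# A statistic is a single "cooked" number produced from the raw data.
went_to_church = sum(1 for row in answers if row[5] == "y")
mean_income = sum(row[4] for row in answers) / len(answers)

print(went_to_church)         # how many respondents answered "y"
print(round(mean_income, 2))  # average reported income
```

The list of tuples is the data; the two printed numbers are statistics.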
Must anonymize the data!
Zip    Name  Sex  Major    income  church
29002  001   F    lit      0       y
99005  002   M    engin    10      n
99005  003   F    .        0       n
92005  004   F    theater  1000    y
12534  005   M    PE       10000   y
12534  006   F    lit      700     y
25000  007   M    music    11000   y
20000  008   M    dance    10500   n
15000  009   F    math     9500    y
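The anonymization step on this slide can be sketched in a few lines of Python. This is only an illustration of replacing names with sequential IDs; the function name `anonymize` is hypothetical, and the slides do not say how the IDs were actually assigned.

```python
names = ["Wilma", "Barney", "Betty", "Ethel", "Fred M.",
         "Lucy", "Ricky", "Fred A.", "Ginger"]

def anonymize(names):
    """Replace each name with a zero-padded sequential ID (001, 002, ...)."""
    return [f"{i:03d}" for i, _ in enumerate(names, start=1)]

ids = anonymize(names)
print(ids)
```

The IDs keep one row per respondent without carrying the name into the datafile.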
Change Text to Numeric Codes
Zip    Name  Sex  Major    income  church
29002  001   1    lit      0       y
99005  002   2    engin    10      n
99005  003   1    .        0       n
92005  004   1    theater  1000    y
12534  005   2    PE       10000   y
12534  006   1    lit      700     y
25000  007   2    music    11000   y
20000  008   2    dance    10500   n
15000  009   1    math     9500    y
Change Text to Numeric Codes
The “codebook” must
Zip
Name Sex Major income church
document the29002
numeric
001
1 lit
0
y
codes used! 99005 002
2 engin
10
n
99005
For example:92005
12534
Variable: 12534
“sex”
25000
1 = female
2 = male20000
15000
003
004
005
006
007
008
009
1
1
2
1
2
2
1
.
0
theater 1000
PE
10000
lit
700
music 11000
dance 10500
math
9500
n
y
y
y
y
n
y
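As a rough illustration of why the codebook matters, the following Python sketch encodes the "sex" column with the codes given on the slide and then decodes it again. The dictionary names are assumptions; only the mapping 1 = female, 2 = male comes from the slides.

```python
# Codebook entry for the variable "sex", as given on the slide.
SEX_CODES = {"F": 1, "M": 2}            # 1 = female, 2 = male
SEX_LABELS = {1: "female", 2: "male"}

sex_text = ["F", "M", "F", "F", "M", "F", "M", "M", "F"]

# Encode: text answers become numeric codes.
sex_coded = [SEX_CODES[s] for s in sex_text]

# Decode: the numbers only regain their meaning through the codebook.
sex_decoded = [SEX_LABELS[c] for c in sex_coded]

print(sex_coded)
print(sex_decoded[:2])
```

Without `SEX_LABELS`, a reader of the datafile has no way to know which sex the value 1 stands for.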
Change Text to Numeric Codes
Zip    Name  Sex  Major  income  church
29002  001   1    0075   0       y
99005  002   2    0070   10      n
99005  003   1    .      0       n
92005  004   1    0076   1000    y
12534  005   2    0001   10000   y
12534  006   1    0075   700     y
25000  007   2    0077   11000   y
20000  008   2    0078   10500   n
15000  009   1    0050   9500    y
Change Text to Numeric Codes
Zip    Name  Sex  Major  income  church
29002  001   1    0075   0       1
99005  002   2    0070   10      2
99005  003   1    .      0       2
92005  004   1    0076   1000    1
12534  005   2    0001   10000   1
12534  006   1    0075   700     1
25000  007   2    0077   11000   1
20000  008   2    0078   10500   2
15000  009   1    0050   9500    1
Change Text to Numeric Codes
Sometimes, even numeric variables are encoded in ranges. For example:
Variable: “income”
1 = less than 1000
2 = 1000 - 4999
3 = 5000 - 10000
4 = more than 10000
9 = not reported
Zip    Name  Sex  Major  income  church
29002  001   1    0075   1       1
99005  002   2    0070   1       2
99005  003   1    .      1       2
92005  004   1    0076   2       1
12534  005   2    0001   3       1
12534  006   1    0075   1       1
25000  007   2    0077   4       1
20000  008   2    0078   4       2
15000  009   1    0050   3       1
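The range encoding for "income" can be sketched as a small Python function. The cut points come from the codebook entry on the slide; the function name `income_code`, the use of `None` for an unreported income, and the assumption that incomes are whole numbers are all illustrative choices.

```python
def income_code(income):
    """Map a reported income to the range codes in the "income" codebook entry."""
    if income is None:
        return 9      # 9 = not reported (None is an assumed marker)
    if income < 1000:
        return 1      # 1 = less than 1000
    if income <= 4999:
        return 2      # 2 = 1000 - 4999
    if income <= 10000:
        return 3      # 3 = 5000 - 10000
    return 4          # 4 = more than 10000

incomes = [0, 10, 0, 1000, 10000, 700, 11000, 10500, 9500]
codes = [income_code(x) for x in incomes]
print(codes)
```

Once incomes are stored as range codes, the exact amounts cannot be recovered from the datafile, so the codebook is the only record of what each code covers.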
Data Files do not need “headers”
29002  001  1  0075  1  1
99005  002  2  0070  1  2
99005  003  1  .     1  2
92005  004  1  0076  2  1
12534  005  2  0001  3  1
12534  006  1  0075  1  1
25000  007  2  0077  4  1
20000  008  2  0078  4  2
15000  009  1  0050  3  1
Data Files do not need extra space
290020011007511
990050022007012
990050031.   12
920050041007621
125340052000131
125340061007511
250000072007741
200000082007842
150000091005031
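Squeezing out the headers and the extra space amounts to writing fixed-width records. The Python sketch below packs the coded rows into 15-character lines. The function name `pack` is hypothetical, and padding the missing-value marker "." on the right to fill the four "major" columns is an assumption about how the original file was laid out.

```python
# One tuple per respondent, already coded: (zip, id, sex, major, income, church).
rows = [
    ("29002", "001", 1, "0075", 1, 1),
    ("99005", "002", 2, "0070", 1, 2),
    ("99005", "003", 1, ".",    1, 2),
    ("92005", "004", 1, "0076", 2, 1),
    ("12534", "005", 2, "0001", 3, 1),
    ("12534", "006", 1, "0075", 1, 1),
    ("25000", "007", 2, "0077", 4, 1),
    ("20000", "008", 2, "0078", 4, 2),
    ("15000", "009", 1, "0050", 3, 1),
]

def pack(row):
    """Concatenate one row into a fixed-width record with no separators."""
    zip_code, rid, sex, major, income, church = row
    # ljust keeps every record the same width even when "major" is missing (assumed padding).
    return f"{zip_code}{rid}{sex}{major.ljust(4)}{income}{church}"

records = [pack(r) for r in rows]
print("\n".join(records))
```

Every record has the same length, which is what lets a codebook describe each variable by column position alone.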
Codebook must document locations
123456789
290020011007511
990050022007012
990050031.   12
920050041007621
125340052000131
125340061007511
250000072007741
200000082007842
150000091005031
For example:
Variable: “sex”
location: column 9
width: 1
Codebook documents question, location, codes.
For example:
Q3. [enter sex of R ]
Variable: “sex”
location: column 9
width: 1
1 = female
2 = male
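Reading such a file back is the reverse operation: slice each record at the documented column locations, then look up the codes. In the Python sketch below, only the entry for "sex" (column 9, width 1; 1 = female, 2 = male) is taken from the slide. The other locations in `LAYOUT` are inferred from the sample records, and the names `LAYOUT` and `parse` are hypothetical.

```python
# Variable -> (start column, width), with columns counted from 1 as in a codebook.
# Only "sex" is documented on the slide; the other entries are inferred.
LAYOUT = {
    "zip": (1, 5),
    "id": (6, 3),
    "sex": (9, 1),
    "major": (10, 4),
    "income": (14, 1),
    "church": (15, 1),
}
SEX_LABELS = {"1": "female", "2": "male"}

def parse(record):
    """Slice one fixed-width record into named fields using the layout."""
    return {name: record[start - 1:start - 1 + width].strip()
            for name, (start, width) in LAYOUT.items()}

row = parse("290020011007511")
print(row)
print(SEX_LABELS[row["sex"]])
```

The same record string is unreadable without `LAYOUT`, which is the point of the slide: the codebook has to document where each variable sits.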
To Use Data You Need 3 Things
Data: the datafile (the raw numbers)
Metadata: the “codebook” (where the numbers are and what they mean)
Statistical Software (for reading the datafile and analyzing the data)

Data + Codebook + Statistical software
Student writes SPSS program (SPSS commands) to analyze data…
SPSS reads the program.
SPSS reads the data.
And produces charts, tables, analysis, etc.
[Chart: count of responses to recoded question 7 (Very Good / Good, Fair, Poor / Very Poor, no opinion) by RESPONDENTS SEX (MALE, FEMALE); cases weighted by WGHT]
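The slides use SPSS for this step. As a stand-in only (this is Python, not SPSS syntax), the sketch below shows the same idea on the sample datafile: software reads coded values, applies the codebook labels, and produces a table. The mapping for "church" (1 = yes, 2 = no) is inferred from the recoded table rather than stated in a codebook entry.

```python
from collections import Counter

SEX = {1: "female", 2: "male"}
CHURCH = {1: "yes", 2: "no"}   # inferred from the recoding y -> 1, n -> 2

# (sex, church) codes for the nine respondents in the sample datafile.
pairs = [(1, 1), (2, 2), (1, 2), (1, 1), (2, 1),
         (1, 1), (2, 1), (2, 2), (1, 1)]

# Cross-tabulate church attendance by sex using the labels from the codebook.
table = Counter((SEX[s], CHURCH[c]) for s, c in pairs)

for (sex, church), count in sorted(table.items()):
    print(f"{sex:<6} {church:<3} {count}")
```

The printed counts are statistics; they exist only because the data, the codebook, and the software were all available.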
Codebook entry for variable PRES92 [screenshot: question text and responses]
Codebook entry for variable DEGREE [screenshot: question text and responses]
[Example shown: Female, 49 years old; Pres92: Voted for Clinton; Degree: Junior college]
Tip: "variables" contain the essential,
important content of data files
Tip: Data-reference is not about searching for an answer…
Data reference is often less about searching to find an answer. (That's a statistical reference question.)
Data reference is often more about exploring to find data that will enable users to ask a question.
What have we learned?
Data and statistics are not the same.
Data reference leads to primary research material, not facts or statistics.
To use data, a user must have data, metadata, and statistical software.
And…
"Variables" are what contain the critical, important content of data files.
And that means that the gold standard of data reference is variable-level searching.
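Variable-level searching means looking inside codebooks for individual questions rather than searching study titles. The Python sketch below illustrates the idea on a tiny, invented catalog. The study names, variable names, and the function `search_variables` are all hypothetical and do not describe any particular search tool.

```python
# Hypothetical catalog: one entry per variable, drawn from several codebooks.
# Study names, variable names, and some question texts are invented for illustration.
VARIABLES = [
    {"study": "Study A", "variable": "Q5", "question": "What was your income last year?"},
    {"study": "Study A", "variable": "Q6", "question": "Did you go to church last week?"},
    {"study": "Study B", "variable": "V12", "question": "How often do you attend church?"},
]

def search_variables(term, catalog=VARIABLES):
    """Return the variables whose question text mentions the search term."""
    term = term.lower()
    return [v for v in catalog if term in v["question"].lower()]

hits = search_variables("church")
for v in hits:
    print(v["study"], v["variable"], v["question"])
```

A title-level search would miss both hits here, since neither invented study name mentions church.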
http://gort.ucsd.edu/calpol/
[Screenshot: Study of July 2003, Question Text (Variable 34)]