American Community Survey and Understanding Sampling Error

Class prep



Go to S:\classes\UEP_ENV
Copy whole folder “American Community
Survey Error Exploration” to your Desktop
Make writable: right-click on folder =>
properties => uncheck read-only
Class prep
1. Using Windows Explorer, go to the following folder:
American Community Survey Error Exploration\AFF_data_tables\Median_HH_Income_tract
2. Open:
ACS_10_SF4_B19013_metadata.csv – this is the metadata file for the ACS data
ACS_10_SF4_B19013_Med_HH_Income.xlsx – this is the data table (median household income)
Today




Census mapping basics review and
questions
Understanding American Community Survey
margins of error
Calculating a reliability index (coefficient of
variation or CV)
Visualizing the CV on a map
Questions about joining tables to
geography?
Federal Information Processing Standards (FIPS) Codes
Area     Name            FIPS
State    Massachusetts   25
County   Suffolk         25025
Tract    000601          25025000601
We JOIN the data table to the geography table using the common ID column
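A minimal sketch of the join idea in Python, outside ArcMap. The helper name make_geoid, the second tract ID, and the placeholder value are assumptions made for illustration; only the 25 / 25025 / 25025000601 codes come from the slide. The key point is that the ID is built and compared as text so leading zeros survive.

```python
def make_geoid(state: str, county: str, tract: str) -> str:
    """Concatenate zero-padded FIPS parts into an 11-character tract ID."""
    return state.zfill(2) + county.zfill(3) + tract.zfill(6)


# Geography table: one row per tract (second ID is hypothetical, no data row for it).
geography_rows = [
    {"GEOID": "25025000601"},
    {"GEOID": "25025000602"},
]

# Data table: the value is a placeholder, not a real ACS estimate.
data_rows = [
    {"GEOID": make_geoid("25", "25", "601"), "value": 100},
]

# Index the data table by the common ID column, then attach matches to each tract.
data_by_id = {row["GEOID"]: row for row in data_rows}
joined = [{**geo, **data_by_id.get(geo["GEOID"], {})} for geo in geography_rows]

for row in joined:
    print(row)
```

Tracts with no matching data row simply keep their geography fields, which is roughly what an unmatched record looks like after a "keep all records" join.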
Mapping Numbers
Graduated color
Graduated Colors…number of renters
Graduated Symbols…number of renters
Normalization (“divide by”)
Number of population in rental units normalized by
total population in occupied housing units
Fraction of
renters living in
each tract out of
total population
in occupied
housing units
Using “Normalization”


Normalize by means “divide by”
Percentage – e.g., number of renters over
total population in occupied housing




Result is a fraction, e.g., .45
Fractions are translated into percentages by
multiplying by 100
.45 = 45%
Density – population normalized by area
(e.g., sq mi, acre)
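A small sketch of what "normalize by" does to each tract's numbers. All counts and the area below are made up for illustration, and the function name normalize is hypothetical.

```python
def normalize(numerator, denominator):
    """Divide one field by another; return None where the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


# Made-up tract: 450 renters out of 1,000 people in occupied housing units.
renter_fraction = normalize(450, 1000)
renter_percent = renter_fraction * 100

# Made-up tract: 5,000 people living on 2.5 square miles.
people_per_sq_mi = normalize(5000, 2.5)

print(renter_fraction, renter_percent, people_per_sq_mi)
```

Returning None for a zero denominator keeps empty tracts (for example, tracts with no occupied housing) from raising a division error.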
Classes and Classification Methods
Classification methods

Details from ArcGIS 10.1 Help – standard
classification methods
1. Natural breaks – good for skewed data
2. Equal interval, defined interval, and standard deviation – good for evenly distributed data to show differences
3. Quantiles – good for evenly distributed data to show relative difference (e.g., top and bottom 20 percent)
4. Geometric interval – compromise that attempts to have a similar number of features in each class with intervals being roughly the same
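To see why different methods give very different maps, here is a rough sketch of two of the methods in plain Python. The values are made up and deliberately skewed, the function names are hypothetical, and the quantile rule used by Python's statistics module is an assumption that may not match ArcGIS exactly.

```python
import statistics


def equal_interval_breaks(values, n_classes):
    """Upper class limits that split the data range into equal-width classes."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_classes
    return [lowest + width * i for i in range(1, n_classes + 1)]


def quantile_breaks(values, n_classes):
    """Upper class limits that put roughly the same number of values in each class."""
    cut_points = statistics.quantiles(values, n=n_classes, method="inclusive")
    return cut_points + [max(values)]


# Made-up, skewed data: nine small values and one very large one.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

equal_breaks = equal_interval_breaks(values, 5)
quant_breaks = quantile_breaks(values, 5)

# How many values land in the first (lowest) class under each method?
in_first_equal = sum(v <= equal_breaks[0] for v in values)
in_first_quantile = sum(v <= quant_breaks[0] for v in values)

print("equal interval:", equal_breaks, "first class holds", in_first_equal)
print("quantiles:     ", quant_breaks, "first class holds", in_first_quantile)
```

With skewed data, equal interval crowds almost everything into the lowest class, while quantiles spread the values across classes.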
Classification methods

Details from ArcGIS 10.1 Help – standard
classification methods

Equal interval

Defined interval

Natural breaks

Quantiles

Standard deviation

Geometric interval
Try them out!
Which classification method
is best?
Formatting numeric labels
But make it better!
Clutter and data speak!
Clearer and cleaner
Review

Categories versus numbers

Proportional versus graduated symbols

Understanding classification methods


No “right” method – explore

Different methods => very different results

Number of classes – hard to distinguish over 6
Understanding normalization (“divide by”)
Mapping a particular area – two selection
options:

Select the town first, then perform select by
location to get all tracts that intersect that town
(or have their centroid in that town)

Zoom into an area slightly larger than the region
you want to map, then interactively select all the
tracts in that area (e.g., use the select tool
to make a box around them)

Then Create Layer from Selected Features
Copying and pasting the same layer in your
table of contents

If you want to map several variables that are
within the same joined table(s), you can simply
copy and paste the layer so that you have
another copy

Then create maps from a different variable in
each layer

Make sure to change title, legend
American Community Survey
What users need to know
Test: why do we need to use
ACS data in policy /
environmental analysis?
Because it has important information
about our communities…
So we need to learn to use the
information reliably…
And especially to understand the margin
of error for ACS estimates
Review – What is the ACS?




American Community Survey
A continuous monthly survey of households
Long set of questions covering many topics
Data is released once a year



1 Year averages – areas with a population 65,000+
3 Year averages – areas with a population 20,000+
5 Year averages - all other areas (including census
tracts and block groups)
E.g., average number of people commuting by
bicycle for 2007-2011
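The population thresholds above can be sketched as a small lookup. The function name available_estimates is hypothetical, and it assumes 5-year averages exist for every area, with the shorter averages added once an area passes each threshold.

```python
def available_estimates(population):
    """List the ACS averaging periods published for an area of this size."""
    products = ["5-year"]  # assumed available for all areas, including tracts
    if population >= 20000:
        products.insert(0, "3-year")
    if population >= 65000:
        products.insert(0, "1-year")
    return products


# Made-up populations: a small tract-sized area, a mid-sized town, a large city.
for population in (4000, 25000, 600000):
    print(population, available_estimates(population))
```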
Use Census 2010 data where possible
because it is a 100% count, so it is not
subject to sampling error

Population Counts




Age
Race / Hispanic Ethnicity
Housing Unit Counts and Tenure (rented,
owner-occupied)
Household and Family Relationships
ACS: Use the highest aggregation you can in terms of
tables (can be hard to find)
ACS and Margin of Error
Workers 16 and Over
Means of transportation for commute – Tract Level - ACS 2005-2009 5 year estimates
Universe is workers 16 and over
Open the Excel files…
ACS_10_SF4_B19013_Med_HH_Income.xlsx
– this is the data table (median household
income)
ACS_10_SF4_B19013_metadata.csv – this is
the metadata file for the ACS data
Metadata file and data table…
So let’s understand the
margin of error…
What is Sampling Error?
Definition
The uncertainty associated with an estimate
that is based on data gathered from a sample
of the population rather than the full
population
Illustration of Sampling Error
Estimate average number of children per
household for a population with 3 households
living in a block:
Household A has 1 child
Household B has 2 children
Household C has 3 children
The block average based on the full population
is two children per household: (1+2+3)/3
Conceptualizing Sampling Error
Three different samples of 2 households:
1. Households A and B (1 child, 2 children)
2. Households B and C (2 children, 3 children)
3. Households A and C (1 child, 3 children)
Three different averages based on which sample is used:
1. (1 + 2) / 2 = 1.5 children
2. (2 + 3) / 2 = 2.5 children
3. (1 + 3) / 2 = 2 children
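The three-household illustration can be reproduced in a few lines of Python. The variable names are just for illustration; the child counts are the ones on the slide.

```python
from itertools import combinations
from statistics import mean

# Children per household, as in the illustration.
children = {"A": 1, "B": 2, "C": 3}

# Average from the full population (no sampling error).
population_mean = mean(children.values())

# Average from every possible sample of 2 households.
sample_means = {
    pair: mean(children[household] for household in pair)
    for pair in combinations(sorted(children), 2)
}

print("full population:", population_mean)
for pair, average in sample_means.items():
    print("sample", pair, "->", average)
```

Only one of the three samples reproduces the true average; the spread among the sample averages is the sampling error.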
Sampling Error




Census 2010 is a 100% count, so it has
no sampling error (other kinds of error remain)
ACS data is based on samples – error is
larger
The smaller the geography, the larger the
error (because the sample is smaller)
Especially true for variables that sample a
small number of people, e.g., bike commuters
American Community Survey
and sampling error
The margin of error is calculated and
included with each estimate
Calculated at 90% confidence level
What does that mean?
Confidence level of 90%

We don’t know for sure how many people in
Tract 3.02 take public transit to work

Based on the ACS sample, our estimate over 5
years is that an average of 747 people take
transit, +/- 226 at 90% confidence level

If we drew many, many samples of that same
tract, about 90% of the resulting ranges (like our 521–973 people)
would contain the real number of commuters taking transit.

10% of the time it would not
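A rough simulation of the "many, many samples" idea. It assumes simple random samples from a made-up normal population, which is not the ACS's actual sample design, and all parameter values and names here are hypothetical. The 1.645 multiplier is the one used for the 90% confidence level.

```python
import random
import statistics

Z_90 = 1.645  # multiplier for a 90% confidence level


def coverage(true_mean=50.0, true_sd=10.0, sample_size=200, trials=2000, seed=1):
    """Share of repeated samples whose estimate +/- MOE contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        estimate = statistics.fmean(sample)
        standard_error = statistics.stdev(sample) / sample_size ** 0.5
        margin_of_error = Z_90 * standard_error
        if estimate - margin_of_error <= true_mean <= estimate + margin_of_error:
            hits += 1
    return hits / trials


share_covered = coverage()
print(f"{share_covered:.1%} of the ranges contained the true value")
```

The share should come out close to 90%, with the remaining samples producing ranges that miss the true value.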
Confidence level of 90%


The confidence level of a margin of error
indicates the likelihood that the true
population value (real number) falls within the
range of the estimate plus or minus the margin of error
We can be 90% confident that somewhere
between 521 and 973 people take transit to
work in tract 3.02
Also we know that Tract 3.02 has
somewhere between 1958 and 2684
workers
So maybe half the workers take transit,
or maybe just a fifth of them do. Ugh!!!
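The "half or a fifth" comparison can be checked with quick arithmetic. This is only the rough bounding used on the slide (lowest count over highest total, and the reverse), not a formal margin of error for a proportion; the function name moe_range is hypothetical.

```python
def moe_range(estimate, margin_of_error):
    """Lower and upper bounds implied by an estimate and its margin of error."""
    return estimate - margin_of_error, estimate + margin_of_error


# Transit commuters in Tract 3.02: 747 +/- 226 at the 90% confidence level.
transit_low, transit_high = moe_range(747, 226)

# Range for total workers in the same tract, as given on the slide.
workers_low, workers_high = 1958, 2684

lowest_share = transit_low / workers_high    # fewest riders, most workers
highest_share = transit_high / workers_low   # most riders, fewest workers

print(f"transit commuters: {transit_low} to {transit_high}")
print(f"transit share: roughly {lowest_share:.0%} to {highest_share:.0%}")
```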
If using ACS data, pay
attention to margin of error!
ACS table from American
Factfinder….
Use metadata file plus AFF web site
This table is showing
Educational
Attainment for
universe of people 25
years and older
Use AFF web site plus metadata file
Bottom line for ACS




More up to date information
Continuous versus point in time
measurement
5 year estimates are the most reliable
because they have the largest samples
But…


Poorer precision at finer scales (e.g., census tract)
or areas of low population (rural areas)
Poorer precision for variables with low numbers
(e.g., people who bike to work)
Don’t go any lower than
tracts for mapping ACS data
Geographic Hierarchy
Measures associated with
sampling error
Look at Excel file for
Med_HH_Income
Measures Associated with Sampling
Error

Standard Error (SE)

Margin of Error (MOE)

Coefficient of Variation (CV)
Standard Error (SE)
Definition
A measure of the variability of an estimate
due to sampling
Depends on variability in the population and
sample size
Formula
SE = MOE / 1.645 (for 90% confidence
level)
Margin of Error (MOE)
Definition
A measure of the precision of an estimate at
a given level of confidence (90%, 95%, 99%)
MOEs at the 90% confidence level are
published for all ACS estimates
Coefficient of Variation (CV)
Definition
The relative amount of sampling error
associated with a sample estimate
A measure of reliability
Formula
CV = Standard Error / Estimate * 100%
CV% is a measure of reliability. So
what is a good CV %?

No agreement

Depends on purpose

Census case studies:

less than 15% may be reliable

15-30% - not reliable, be very careful

Over 30% - not reliable, use with extreme caution
To calculate CV, we first calculate the
SE:
SE = (MOE / 1.645)
Then the CV% formula is:

CV = (SE / estimate)*100
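The two formulas can be sketched as a pair of Python functions, the same calculation you would do in an Excel column. The function names are hypothetical, the labels follow the Census case-study thresholds quoted earlier in these slides, and the example reuses the Tract 3.02 transit estimate (747, +/- 226).

```python
Z_90 = 1.645  # ACS margins of error are published at the 90% confidence level


def standard_error(margin_of_error):
    """SE = MOE / 1.645."""
    return margin_of_error / Z_90


def cv_percent(estimate, margin_of_error):
    """CV = (SE / estimate) * 100; None where the estimate is zero."""
    if estimate == 0:
        return None
    return standard_error(margin_of_error) / estimate * 100


def reliability_class(cv):
    """Bucket a CV% using the 15 and 30 breaks from the slides."""
    if cv < 15:
        return "less than 15%: may be reliable"
    if cv <= 30:
        return "15-30%: be very careful"
    return "over 30%: use with extreme caution"


transit_cv = cv_percent(747, 226)
print(f"SE = {standard_error(226):.1f}, CV = {transit_cv:.1f}%")
print(reliability_class(transit_cv))
```

Estimates of zero are returned as None rather than dividing by zero, so those tracts can be shown as "no data" on the CV map.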
Two examples
Median household income and biking to
work
Why do you think median household
income generally shows lower CVs
(more reliable estimates)?
Exploring Error and the
American Community Survey
Your turn!
The American Community Survey
Margin of Error Tutorial goes through all
this, so do this on your own time for
practice
Census data table modifications

Preparing data takes understanding and time

Probably best to do it in Excel ahead of time

Always remember to process the GeoID2
field to make it text

To be compatible with shape file:

Column names – 10 characters max, no spaces
or symbols
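If you prefer to script this cleanup instead of doing it by hand in Excel, a sketch along these lines may help. Both function names are hypothetical, the column names in the example are made up, and keeping underscores assumes they are acceptable in shapefile field names.

```python
import re


def shapefile_safe_names(names):
    """Shorten column names to 10 characters with no spaces or symbols, kept unique."""
    safe, used = [], set()
    for name in names:
        # Spaces become underscores; everything else that is not a letter or digit is dropped.
        base = re.sub(r"[^A-Za-z0-9_]", "", name.replace(" ", "_"))[:10] or "FIELD"
        candidate, counter = base, 1
        while candidate.upper() in used:
            suffix = str(counter)
            candidate = base[: 10 - len(suffix)] + suffix
            counter += 1
        used.add(candidate.upper())
        safe.append(candidate)
    return safe


def geoid_as_text(value, width=11):
    """Store the tract ID as text, restoring any leading zero a spreadsheet dropped."""
    return str(value).strip().zfill(width)


clean_names = shapefile_safe_names(["Median HH Income ($)", "Median HH Income MOE"])
print(clean_names)
print(geoid_as_text(25025000601))
print(geoid_as_text(1001020100))  # an ID whose state code starts with 0
```

Truncation can make two long names collide, so the helper appends a number to keep each field name distinct.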
Close Excel tables before
opening ArcMap
From desktop, open the following
mapfile

American Community Survey Error
Exploration \ Exploring Error in the American
Community Survey.mxd
Showing in ArcMap





Join the fixed Household Median Income
table to Census Tract shape file
Create a map of Household Median income –
5 classes by quantiles
Right-click and copy tract layer
Right-click on Layers and choose Paste
Layers
Map CV – 3 classes, with breaks at 15, 30,
and max value
Symbolizing
CV with
hatch
patterns
Hands on exploration of
commute data
GIS Tools for Mapping ACS
Estimates and Data Quality
Information
http://gesg.gmu.edu/
For your census mapping assignment




You need to make 6 maps
6 different census variables (not necessarily
from 6 different tables)
At least two of the maps have to show ACS
variables
You don’t have to show CV on your maps but
if you want to experiment, it’s good practice!
For your census mapping assignment


You can use census data you find from GIS
clearinghouses – e.g., MassGIS
Instructions for clipping coastal tracts on GIS
Tips and Tutorials web site
ACS and Error





Always be aware of error
Have a statement about error if you are
making maps
Might be good to visualize the CV as well, at
least as an inset?
In tables, include the margin of error
It’s your reputation that’s at stake!