ACS Public Use Microdata Samples of 2005 and 2006

ACS Public Use Microdata Samples of 2005 and 2006: How to Use the Replicate Weights
B. Dale Garrett and Michael Starsinic
U.S. Census Bureau
AAPOR Conference, New Orleans
May 16, 2008

Public Data
• The American Community Survey (ACS) produces an annual Public Use Microdata Sample (PUMS) file.
• You can download these files for free.
• Write your own program to tally and analyze data.

Key Points
• PUMS data users want to know the reliability of an estimate.
• This paper explains how to use PUMS replicate weights to estimate standard errors.

Outline
• The American Community Survey (ACS)
• The Public Use Microdata Sample (PUMS)
  – Sample design
  – Confidentiality
  – Weights
  – Standard errors
  – Issues with standard errors

The American Community Survey
• The 2005 ACS
  – Sample of 250,000 housing units per month.
  – Every county represented in the fifty states, District of Columbia and Puerto Rico.
  – Collects population and housing characteristics.
• The 2006 ACS was similar but added
  – A sample of both institutional and noninstitutional Group Quarters (GQ) population.
  – GQ sample size was 16,000 persons per month.

PUMS Sample Design
• PUMS is a subsample of ACS
  – Sort the ACS interviews on geography, mode of interview, types of housing units, demographics.
  – Sample size:
    • One percent of the total housing units (HUs) and household (HH) persons in 2005 and 2006.
    • One percent of total GQ persons in 2006.
  – Systematic sampling at the state and PUMA level.

PUMA Definition
• PUMA: Public Use Microdata Area
  – Designed for public release of information by local state officials.
  – Large enough to achieve disclosure avoidance.
• An area of 100,000 population or more as of the 2000 Census.

PUMS Protects Confidentiality
• PUMS does not reveal:
  – Names of persons.
  – Address.
  – Detailed type of group quarters.
  – Geographic data below the PUMA level.
• The respondent’s identity is protected.
  – Top-coding of age, income and other variables.
  – Data swapping
  – Synthetic data
  – Perturbation of data

Rural PUMAs in KY (map)

PUMAs in Baltimore Co., MD (map)

PUMS Weighting
• The PUMS initial weight was equal to the ACS final weight times the sampling interval.
• The 2006 PUMS file was ratio-estimated to ACS:
  – Persons in households by sex by PUMA
  – Housing units by vacant/occupied by PUMA
  – Persons in group quarters by institutional/noninstitutional by state

How to Program an Estimate: Counts, Aggregates, Ratios, Medians
• Totals (counts)
  – Sum the PUMS weights (for the characteristic).
• Aggregates
  – Sum the product of the PUMS weight times the value.
• Ratios
  – Form the total or aggregate for the numerator.
  – Sum the PUMS weights for the characteristic in the denominator.
  – Divide.
• Medians
  – Use weighted distributions.

ACS Standard Errors
• The ACS uses the successive difference model of replicate weights to estimate standard errors.
• The successive difference model of Kirk Wolter was developed for ACS by Robert Fay and George Train.
  http://www.census.gov/hhes/www/saipe/asapaper/FayTrain95.pdf

Two Methods for PUMS Standard Errors
• Design factor method
  – Design factors are factors to multiply times the standard error of a simple random sample.
  – Easier to use than the replicate weights.
• Replicate weight method
  – Generally, you get a more accurate standard error estimate by using the replicate weights.
  – Somewhat more work than design factors.
  http://acsweb2.acs.census.gov/acs/www/Downloads/2006/AccuracyPUMS.pdf

Three Steps to Standard Errors Using Replicate Weights
• Write a program to derive an estimate using the PUMS weight.
• Run the program 80 more times using each of the 80 replicate weights.
• Use the PUMS estimate and the 80 replicate estimates in the Standard Error formula.

ACS PUMS Replicate Weight Formula for a Standard Error

$$SE = \sqrt{\frac{4}{80} \sum_{r=1}^{80} (X_r - X)^2}$$

• where:
  – $X$ is the estimate formed from the PUMS weight.
  – $X_r$ is the estimate formed from the $r$th replicate weight.
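The three steps and the formula above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' example program. The column names `PWGTP`, `PWGTP1` … `PWGTP80` and `AGEP` are assumed here to be the person weight, the replicate weights, and age, and the three toy records are invented so that the arithmetic is easy to check by hand.

```python
import math

def weighted_total(records, weight_key, predicate):
    """Total (count): sum one weight column over records with the characteristic."""
    return sum(rec[weight_key] for rec in records if predicate(rec))

def replicate_se(estimate, replicate_estimates):
    """SE = sqrt((4 / R) * sum_r (X_r - X)^2); R is 80 for the ACS PUMS."""
    r = len(replicate_estimates)
    squared_diffs = sum((x_r - estimate) ** 2 for x_r in replicate_estimates)
    return math.sqrt(4.0 / r * squared_diffs)

def total_with_se(records, predicate, n_reps=80):
    # Step 1: estimate with the PUMS weight (column name is an assumption).
    estimate = weighted_total(records, "PWGTP", predicate)
    # Step 2: repeat the same tally with each replicate weight.
    reps = [weighted_total(records, f"PWGTP{i}", predicate)
            for i in range(1, n_reps + 1)]
    # Step 3: plug the estimate and the replicate estimates into the formula.
    return estimate, replicate_se(estimate, reps)

# Invented toy data: every record has weight 100; replicate weights
# alternate between 101 and 99.
records = []
for age in (30, 70, 82):
    rec = {"AGEP": age, "PWGTP": 100}
    for i in range(1, 81):
        rec[f"PWGTP{i}"] = 101 if i % 2 else 99
    records.append(rec)

estimate, se = total_with_se(records, lambda rec: rec["AGEP"] >= 65)
print(estimate, se)
```

With these toy weights, two records qualify, so the estimate is 200 and every replicate estimate is 202 or 198, which gives a standard error of 4. An aggregate or a ratio would follow the same pattern: compute the statistic once per weight column, then pass the 81 results to `replicate_se`.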

Standard Errors of Differences
• There are two estimates, A and B.
• You want to use a Z-test to see if the difference (A - B) is significant.
• The Z-test requires the standard error of the difference.

For Independent Estimates
• Use the standard errors of the two estimates to estimate the standard error of the difference.

$$SE_{A-B} = \sqrt{SE_A^2 + SE_B^2}$$

• $SE_{A-B}$: the standard error of (A - B)
• $SE_A$: the standard error of estimate A
• $SE_B$: the standard error of estimate B

For Correlated Estimates
• Directly use the replicate weights to calculate the standard error of the difference.
  – Let $X = (A - B)$, the difference.
  – Let $X_r = (A_r - B_r)$ for the 80 replicate differences $X_1, \ldots, X_{80}$.
• Use the replicate weight formula (seen earlier).

Replicate Weight Issues
• Estimate is zero, standard error is not zero.
  – Cannot use replicate weights to estimate the standard error.
  – See the PUMS Accuracy document for a formula.
• The replicate standard error is zero, estimate is not zero.
  – Zero means that if you reselected the sample the answer would be the same.
  – Acceptable if the estimate is controlled in the weighting.
  – Not acceptable if the estimate is a median. Often a direct median gives a zero standard error.

Standard Error Options for Medians
• Direct median with replicate weights may give a zero standard error. This is not good.
• Categorical median with replicate weights will give a more stable standard error, but still some zero standard errors.
• Design factor method
  – Start with either the direct or categorical median, use design factors for the standard error.

Conclusion
• Replicate weights for ACS PUMS are:
  – Available for 2005 PUMS and later.
  – Easy to use for most estimates.
  – Subject to few issues.
• For medians
  – Replicate weight standard errors may be zeros.
  – To avoid the zeros use the design factor method.
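The two ways of getting the standard error of a difference (independent versus correlated estimates) can be compared in a short Python sketch. The function names and all numbers are invented for illustration. `replicate_se` restates the replicate weight formula from the earlier slide so that the block runs on its own.

```python
import math

def replicate_se(estimate, replicate_estimates):
    """SE = sqrt((4 / R) * sum_r (X_r - X)^2); R is 80 for the ACS PUMS."""
    r = len(replicate_estimates)
    return math.sqrt(
        4.0 / r * sum((x_r - estimate) ** 2 for x_r in replicate_estimates))

def se_diff_independent(se_a, se_b):
    """Independent estimates: combine the two standard errors."""
    return math.sqrt(se_a ** 2 + se_b ** 2)

def se_diff_correlated(a, b, a_reps, b_reps):
    """Correlated estimates: apply the replicate formula to X_r = A_r - B_r."""
    diffs = [a_r - b_r for a_r, b_r in zip(a_reps, b_reps)]
    return replicate_se(a - b, diffs)

def z_statistic(a, b, se_diff):
    """Z-test statistic for the difference (A - B)."""
    return (a - b) / se_diff

# Invented estimates whose replicate values move in the same direction.
a, b = 500.0, 440.0
a_reps = [a + (3 if i % 2 else -3) for i in range(80)]
b_reps = [b + (1 if i % 2 else -1) for i in range(80)]

se_a = replicate_se(a, a_reps)
se_b = replicate_se(b, b_reps)
se_ind = se_diff_independent(se_a, se_b)
se_cor = se_diff_correlated(a, b, a_reps, b_reps)
z = z_statistic(a, b, se_cor)
print(se_ind, se_cor, z)
```

In this toy case the replicate estimates of A and B rise and fall together, so the replicate-difference standard error (4) is smaller than the one obtained by treating the estimates as independent (about 6.32).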

References
• US Census Bureau: Accuracy of the Data (2006) for ACS:
  – http://www.census.gov/acs/www/Downloads/ACS/accuracy2006.pdf
• US Census Bureau: PUMS Accuracy of the Data (2006):
  – http://acsweb2.acs.census.gov/acs/www/Downloads/2006/AccuracyPUMS.pdf
• US Census Bureau: Design and Methodology: American Community Survey, Technical Paper 67, May 2006:
  – http://www.census.gov/acs/www/Downloads/tp67.pdf
• Fay & Train, Aspects of Survey and Model-Based Postcensal Estimation of Income and Poverty Characteristics for States and Counties, 1995:
  – http://www.census.gov/hhes/www/saipe/asapaper/FayTrain95.pdf

Contact Information
• For questions about this presentation, or for an example program to generate standard errors, contact me at B.Dale.Garrett@census.gov

Views expressed in this paper are those of the authors and not necessarily those of the U.S. Census Bureau.

How to Derive an Estimate: Direct Medians
• The direct median is the weighted sample median or the distributional median.
• Sum the weights for the characteristic total.
• Sort the file on the value of interest.
• Sum the weights until the 50% point.
• The direct median is the value of the record which crosses the 50% point.
• Or a point between the values of two records that divide the file into two exact halves.

How to Derive an Estimate: Categorical Medians
• Categorical or interpolated medians.
  – Used for published ACS statistics in Factfinder.
• Categorical medians are interpolations:
  – A weighted distribution of the characteristic.
  – Each bin or row is assigned a range of values.
  – Uses linear interpolation for most variables.
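The direct-median steps listed above (sum the weights, sort, accumulate to the 50% point) can be sketched as follows. This is an illustration that assumes strictly positive weights. Taking the midpoint when two records split the file into exact halves is one reasonable reading of "a point between the values of two records", not a rule stated in the slides. The inputs reuse the incomes and percentage shares from the deck's five-record example.

```python
def direct_median(values, weights):
    """Weighted sample median: the value of the record that crosses 50%."""
    pairs = sorted(zip(values, weights))   # sort the file on the value of interest
    half = sum(weights) / 2.0              # the 50% point of the total weight
    cumulative = 0.0
    for i, (value, weight) in enumerate(pairs):
        cumulative += weight               # sum the weights until the 50% point
        if cumulative > half:
            return value                   # this record crosses the 50% point
        if cumulative == half:
            # Two records divide the file into exact halves: use the midpoint
            # (an assumption; any point between the two values would qualify).
            return (value + pairs[i + 1][0]) / 2.0
    raise ValueError("no records with positive weight")

# Incomes and percent-of-total shares from the five-record example.
incomes = [18000, 33000, 41000, 49000, 62000]
shares = [18, 22, 20, 15, 25]
median_income = direct_median(incomes, shares)
print(median_income)
```

To get a replicate-weight standard error for the median, the same function would be called once with the PUMS weight and once with each replicate weight. Because the result is always one of the record values, many replicates can return the identical value, which is how a zero standard error can arise.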

Direct Median Example Based on 5 Records

Record #   Percent of Total   Income from record
1          18%                18,000
2          22%                33,000
3          20%                41,000
4          15%                49,000
5          25%                62,000

Direct median: 41,000

Direct and Categorical Medians Example Based on 5 Records

Income Range         Record #   Percent of Total   Income from record
-59,000 to 20,000    1          18%                18,000
20,000 to 40,000     2          22%                33,000
40,000 to 60,000     3          20%                41,000
                     4          15%                49,000
60,000 +             5          25%                62,000

Direct median: 41,000
Categorical median: 45,700
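The categorical median in the example above can be reproduced with plain linear interpolation inside the bin that contains the 50% point. The sketch below is an illustration under that assumption only. It does not cover variables for which a different interpolation rule is used, and the function name and bin representation are invented here.

```python
def categorical_median(bins):
    """Linear interpolation within the bin that contains the 50% point.

    bins: list of (lower, upper, weight) in ascending order; upper may be
    None for an open-ended top bin.
    """
    half = sum(weight for _, _, weight in bins) / 2.0
    cumulative = 0.0
    for lower, upper, weight in bins:
        if weight > 0 and cumulative + weight >= half:
            if upper is None:
                raise ValueError("median falls in the open-ended bin")
            # Fraction of the way through this bin needed to reach 50%.
            fraction = (half - cumulative) / weight
            return lower + fraction * (upper - lower)
        cumulative += weight
    raise ValueError("no bins with positive weight")

# Bins from the example; records 3 and 4 share the 40,000 to 60,000 bin,
# so that bin holds 20% + 15% = 35% of the total.
bins = [
    (-59000, 20000, 18),
    (20000, 40000, 22),
    (40000, 60000, 35),
    (60000, None, 25),
]
median_income = categorical_median(bins)
print(round(median_income))
```

Here 40% of the weight lies below 40,000, so the median sits 10/35 of the way through the 20,000-wide bin: 40,000 + (10/35) × 20,000 ≈ 45,714, which matches the 45,700 on the slide after rounding to the nearest hundred.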
