Weighting and imputation

PHC 6716
July 13, 2011
Chris McCarty

Weighting
• Weighting is the process of adjusting the contribution of each observation in a survey sample based on independent knowledge about the appropriate distributions
• Before weighting, the implied weight of each observation is 1.0
• After weighting, some observations will have weights >1.0, some <1.0, and some exactly 1.0
• No observation should have a weight of 0
• Two general types of weighting:
  – Design weights: adjust for differences due to intentional disproportionate sampling (e.g., oversampling African Americans or certain regions)
  – Post-stratification weights: adjust for differences from the population or households when the released sample is intended to be representative (e.g., adjustments for non-response among young people)

Common sources for calculating weights
• U.S. Census
• Current Population Survey
• American Community Survey
• For Florida county, age, race, and ethnicity: the BEBR Population Program

How frequency procedures use weights
• All major statistical packages have options for incorporating weights into their procedures
• For frequency procedures, the weighted frequency of a category is the sum of the weights of its observations (when every observation in a cell has the same weight, this equals weight × unweighted frequency); percentages are then calculated on the weighted frequencies

How to make a simple weight
A: Region; B: Frequency; C: Percent of sample; D: Percent from other source; E: Weight (D/C); F: Adjusted frequency (B×E); G: Adjusted percent

A        B     C        D        E        F      G
North    1     10.0     25.0     2.5      2.5    25.0
South    4     40.0     25.0     0.625    2.5    25.0
East     2     20.0     25.0     1.25     2.5    25.0
West     3     30.0     25.0     0.833    2.5    25.0
Total    10    100.0    100.0    –        10     100.0

What that would look like in the data set
Observation   Region   Employed   Weight
1             N        Y          2.5
2             S        N          0.625
3             S        Y          0.625
4             S        Y          0.625
5             S        N          0.625
6             E        N          1.25
7             E        N          1.25
8             W        Y          0.833
9             W        Y          0.833
10            W        N          0.833
Total         –        –          9.999
(The weights sum to 10 apart from rounding: 0.833 is 25/30 rounded.)

Original and adjusted frequency of Employment variable
Employed   Frequency   Percent   Frequency adjusted to weights*   Percent adjusted to weights
Y          5           50.0      5.417                            54.17
N          5           50.0      4.583                            45.83
Total      10          100.0     10.000                           100.00
*This is the sum of the weights for the category (using the exact West weight 25/30).

Notes on weighting
• Typically you don't want weights to make enormous differences in the results
• Keep in mind that by weighting you are asserting that you have information, external to the survey process, about the proper distribution
• You could conceivably up-weight results from a small sample stratum
• Weights are typically used to obtain accurate estimates of prevalence
• Models that test relationships do not need weights if they include the variables you would have used to weight

Weighting with more than one variable
• Combined weight by multiplication
  – Create an individual weight for each variable, then multiply the weights to get a single weight (W_age × W_gender)
  – Not a good solution with many variables
• Combine weights iteratively
  – Calculate the weight for the first variable using its frequency table
  – Apply that weight in the frequency table of the second variable to create the next weight
  – Apply that weight in the frequency table of the third variable to create the next weight
  – And so on

Consumer Confidence
• Survey of approximately 500 Florida households each month
• RDD (random digit dialing) landline survey
• Five questions (components) averaged into an Overall Index
• Until now, weighting has been post-stratification only, by the proportion of households in each county

Potential weighting variables: County
• Typically we get underrepresentation from large South Florida counties (Miami-Dade) and overrepresentation from northern counties (Alachua)
• Household proportions between census years are estimated by BEBR
• Weights June 2011.xls

Potential weighting variables: Age
• RDD tends to oversample seniors, who are more likely to have landlines
• Cell phones emerged as a problem around 2005
• No reliable age-group data were available until the 2010 Census
• The elderly tend to be less confident than younger respondents because of fixed incomes
• Weights June 2011.xls

Potential weighting variables: Hispanic Ethnicity
• Cell phones tend to be used disproportionately by Hispanics
• The 2010 Census provided reliable data about the proportion of Hispanic Floridians
• Hispanics tend to have lower confidence than non-Hispanics
• Weights June 2011.xls

Potential weighting variables: Gender
• Cell phones are disproportionately used by young males
• The monthly CCI uses youngest male/oldest female respondent selection
• Weights June 2011.xls

[Figure: Overall Index, by month (YM), 2005 to 2011; series indexus, indexus_cnty, indexus_cnty_a, indexus_cnty_a_h, indexus_cnty_a_h_s]
[Figure: Personal Finances Now Compared to a Year Ago, by month (YM), 2005 to 2011; series icurfin, icurfin_cnty, icurfin_cnty_a, icurfin_cnty_a_h, icurfin_cnty_a_h_s]
[Figure: Personal Finances Expected a Year From Now, by month (YM), 2005 to 2011; series ifutfin, ifutfin_cnty, ifutfin_cnty_a, ifutfin_cnty_a_h, ifutfin_cnty_a_h_s]
[Figure: US Economic Conditions Over Next Year, by month (YM), 2005 to 2011; series iusfufi, iusfufi_cnty, iusfufi_cnty_a, iusfufi_cnty_a_h, iusfufi_cnty_a_h_s]
[Figure: US Economic Conditions Over Next 5 Years, by month (YM), 2005 to 2011; series iusnex5, iusnex5_cnty, iusnex5_cnty_a, iusnex5_cnty_a_h, iusnex5_cnty_a_h_s]
[Figure: Good Time to Buy Big Ticket Items, by month (YM), 2005 to 2011; series igbtime, igbtime_cnty, igbtime_cnty_a, igbtime_cnty_a_h, igbtime_cnty_a_h_s]
[Figure: Overall Index, closeup, by month (YM), 2005 to 2011; series indexus, indexus_cnty, indexus_cnty_a, indexus_cnty_a_h, indexus_cnty_a_h_s]
[Figure: Personal Finances Now Compared to a Year Ago, closeup, by month (YM), 2005 to 2011; series icurfin, icurfin_cnty, icurfin_cnty_a, icurfin_cnty_a_h, icurfin_cnty_a_h_s]

Example 2 – FHIS
• The state of Florida wanted to estimate rates of the uninsured
• They stratified the state into 17 regions and wanted to be able to make estimates for the state and for each region with a tolerable margin of error
• At the state level they wanted to be able to say something about Blacks, Hispanics, and those under 200 percent of the poverty level
• Data on these demographics for each Florida telephone exchange were obtained prior to sampling
• Strata were created from exchanges
• This made it possible to create weights based on the known households in each exchange
• This required design weights to adjust for disproportionate sampling

Example 3 – Medicaid survey
• The state wanted to evaluate the Medicaid Reform being conducted in Duval and Broward counties
• They wanted to administer a modified CAHPS instrument to adults and children separately
• They also wanted to stratify by plan, sampling a minimum number of observations per plan
• In the end they wanted to compare plans, counties, and adults versus children
• These weights required knowledge of total enrollment for each of these characteristics (plan, age, county)

Imputation
• Like weighting, imputation involves adjusting the analysis after data collection
• Unlike weighting, imputation is the deliberate creation of data that were not actually collected
• The main reason for imputation is to retain observations in a statistical analysis that would otherwise be left out
• Too many missing values may compromise your ability to discover significant results
• In some cases the missing data may be systematically biased, so that not imputing produces an unrepresentative result

Imputation and regression
• Imputation is particularly common when data are analyzed with regression
• A regression model explains the variability in a dependent variable using one or more independent variables
• An observation can be included in the regression only if it has values for all variables in the model
• Models with many variables increase the probability that an observation will be missing a value for at least one of them

Example model
Income = β1(Age) + β2(Education) + β3(Employed)

Imputation algorithms
• Two general categories
  – Random imputation assigns values randomly, often based on a desired statistical distribution
  – Deterministic imputation typically assigns values based on existing knowledge
• Existing knowledge could be in the data
  – Single imputation fills in missing data with one value, such as the mean of all non-missing values for a continuous variable
  – Multiple imputation fills in missing data with a set of plausible values, producing several completed data sets whose analyses are then combined
  – Hot deck imputation fills in missing values with those of an observation that matches on key variables
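A short Python sketch can check the arithmetic on the simple-weight and Employment slides. It uses only the ten observations and the 25% regional targets shown above. The function names (region_weights, weighted_percents) are hypothetical helpers, not part of any statistical package. The sketch uses the exact ratio 25/30 for West rather than the rounded 0.833, so the weighted percents come out to 54.17 and 45.83, which sum to 100.

```python
from collections import Counter, defaultdict

# (region, employed) for observations 1..10, as listed on the slides
observations = [
    ("N", "Y"), ("S", "N"), ("S", "Y"), ("S", "Y"), ("S", "N"),
    ("E", "N"), ("E", "N"), ("W", "Y"), ("W", "Y"), ("W", "N"),
]

# Column D: percent of each region from an outside source (25% each on the slide)
target_pct = {"N": 25.0, "S": 25.0, "E": 25.0, "W": 25.0}

def region_weights(obs, target):
    """Weight = target percent / sample percent (column E = D / C)."""
    counts = Counter(region for region, _ in obs)
    n = len(obs)
    return {r: target[r] / (100.0 * counts[r] / n) for r in counts}

def weighted_percents(obs, weights):
    """Sum the weights within each Employed category, then convert to percents."""
    totals = defaultdict(float)
    for region, employed in obs:
        totals[employed] += weights[region]
    grand = sum(totals.values())
    return {k: 100.0 * v / grand for k, v in totals.items()}, dict(totals)

weights = region_weights(observations, target_pct)
pct, freq = weighted_percents(observations, weights)
for cat in ("Y", "N"):
    print(f"{cat}: weighted frequency {freq[cat]:.3f}, weighted percent {pct[cat]:.2f}")
```

The weighted frequency is computed as a sum of per-observation weights, not as weight × count. That is what a frequency procedure does when a category mixes observations from regions with different weights.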
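A small sketch can contrast the two ways of weighting on more than one variable. Repeating the iterative procedure until the weighted margins stop changing is commonly called raking (iterative proportional fitting). The sample, the target proportions, and the function names below are hypothetical. The code assumes every target category appears in the sample.

```python
from collections import defaultdict

# Hypothetical sample: (age_group, gender) for each respondent.
sample = [
    ("18-44", "M"), ("18-44", "F"), ("45+", "F"), ("45+", "F"),
    ("45+", "M"), ("45+", "F"), ("18-44", "F"), ("45+", "M"),
]
# Hypothetical population proportions, keyed by position in each tuple.
targets = {
    0: {"18-44": 0.50, "45+": 0.50},  # age group
    1: {"M": 0.48, "F": 0.52},        # gender
}

def weighted_shares(rows, w, var):
    """Weighted proportion of each category of variable `var`."""
    totals = defaultdict(float)
    for row, wi in zip(rows, w):
        totals[row[var]] += wi
    total = sum(totals.values())
    return {k: v / total for k, v in totals.items()}

def multiplied_weights(rows, targets):
    """One weight per variable from the unweighted sample, multiplied together."""
    unweighted = [1.0] * len(rows)
    w = [1.0] * len(rows)
    for var, target in targets.items():
        shares = weighted_shares(rows, unweighted, var)
        w = [wi * target[row[var]] / shares[row[var]] for row, wi in zip(rows, w)]
    return w

def raked_weights(rows, targets, max_iter=100, tol=1e-10):
    """Adjust one variable at a time using the current weights; repeat until margins match."""
    w = [1.0] * len(rows)
    for _ in range(max_iter):
        worst_gap = 0.0
        for var, target in targets.items():
            shares = weighted_shares(rows, w, var)
            worst_gap = max(worst_gap, max(abs(shares[k] - target[k]) for k in target))
            w = [wi * target[row[var]] / shares[row[var]] for row, wi in zip(rows, w)]
        if worst_gap < tol:  # every margin already matched before its adjustment
            break
    return w

for name, w in (("multiplied", multiplied_weights(sample, targets)),
                ("raked", raked_weights(sample, targets))):
    age, sex = weighted_shares(sample, w, 0), weighted_shares(sample, w, 1)
    print(f"{name}: age 18-44 {age['18-44']:.3f}, male {sex['M']:.3f}")
```

In this made-up sample, age and gender are correlated. The multiplied weights therefore leave both margins slightly off target (roughly 0.49 and 0.47). The raked weights hit 0.50 and 0.48 and still sum to the sample size.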
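Design weights like those in the FHIS example are usually the inverse of each stratum's sampling fraction. Each sampled household in stratum h stands for N_h / n_h households, where N_h is the number of known households and n_h is the number sampled. The sketch below uses two hypothetical strata with made-up counts, not FHIS data. The small stratum is deliberately oversampled.

```python
# Hypothetical strata (NOT FHIS figures): known households N_h, completed
# interviews n_h, and the number of sampled households that are uninsured.
strata = {
    "large_stratum": {"households": 400_000, "sampled": 500, "uninsured": 60},
    "small_stratum": {"households": 100_000, "sampled": 500, "uninsured": 100},
}

def design_weights(strata):
    """Each sampled household in stratum h stands for N_h / n_h households."""
    return {h: s["households"] / s["sampled"] for h, s in strata.items()}

def weighted_rate(strata, var):
    """Population rate estimated by weighting each stratum's cases."""
    w = design_weights(strata)
    cases = sum(w[h] * s[var] for h, s in strata.items())
    total = sum(w[h] * s["sampled"] for h, s in strata.items())
    return cases / total

def unweighted_rate(strata, var):
    """Rate that ignores the disproportionate sampling."""
    return sum(s[var] for s in strata.values()) / sum(s["sampled"] for s in strata.values())

print("design weights:", design_weights(strata))
print(f"unweighted rate: {unweighted_rate(strata, 'uninsured'):.3f}")
print(f"weighted rate:   {weighted_rate(strata, 'uninsured'):.3f}")
```

Here the small stratum was sampled at four times the rate of the large one. Its higher uninsured rate would pull an unweighted estimate up (0.160). The design weights bring the estimate back to 0.136, and the weights sum to the 500,000 known households.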
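The point about regression and missing values can be made concrete with the example model Income = β1(Age) + β2(Education) + β3(Employed). The six records below are hypothetical. Each variable is missing only once, yet the regression could use only the complete cases.

```python
# Hypothetical records for the example model; None marks a missing value.
records = [
    {"Income": 52000, "Age": 41,   "Education": 16,   "Employed": 1},
    {"Income": None,  "Age": 35,   "Education": 12,   "Employed": 1},
    {"Income": 30000, "Age": None, "Education": 12,   "Employed": 0},
    {"Income": 61000, "Age": 50,   "Education": None, "Employed": 1},
    {"Income": 45000, "Age": 29,   "Education": 14,   "Employed": 1},
    {"Income": 22000, "Age": 63,   "Education": 10,   "Employed": None},
]
model_vars = ["Income", "Age", "Education", "Employed"]

def complete_cases(rows, variables):
    """Rows usable by the regression: no missing value in any model variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

usable = complete_cases(records, model_vars)
for v in model_vars:
    missing = sum(r[v] is None for r in records)
    print(f"{v}: {missing} of {len(records)} missing")
print(f"Complete cases for the regression: {len(usable)} of {len(records)}")
```

Suppose each of k variables were independently missing with probability p. Then the expected share of complete cases would be (1 − p)^k. With 10 variables each 5% missing, that is about 60%, which is why models with many variables lose observations quickly.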
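The imputation algorithms listed above can be sketched on a hypothetical data set with missing incomes:

- mean imputation (single, deterministic);
- random imputation (drawing from the observed values, one simple choice of distribution);
- hot deck imputation (copying from a donor that matches on age group and education).

The data, the matching variables, and the function names are illustrative assumptions, not a standard implementation.

```python
import random
from statistics import mean

# Hypothetical respondents; None marks a missing income.
people = [
    {"id": 1, "age_group": "18-44", "educ": "college", "income": 48000},
    {"id": 2, "age_group": "18-44", "educ": "college", "income": None},
    {"id": 3, "age_group": "45+",   "educ": "hs",      "income": 31000},
    {"id": 4, "age_group": "45+",   "educ": "hs",      "income": None},
    {"id": 5, "age_group": "45+",   "educ": "college", "income": 70000},
    {"id": 6, "age_group": "18-44", "educ": "hs",      "income": 26000},
]

def mean_impute(rows, var):
    """Deterministic single imputation: fill every gap with the observed mean."""
    m = mean(r[var] for r in rows if r[var] is not None)
    return [{**r, var: m} if r[var] is None else dict(r) for r in rows]

def random_impute(rows, var, rng):
    """Random imputation: fill each gap with a value drawn from the observed values."""
    observed = [r[var] for r in rows if r[var] is not None]
    return [{**r, var: rng.choice(observed)} if r[var] is None else dict(r) for r in rows]

def hot_deck_impute(rows, var, keys, rng):
    """Hot deck: copy the value of a random donor that matches on every key variable."""
    out = []
    for r in rows:
        if r[var] is not None:
            out.append(dict(r))
            continue
        donors = [d[var] for d in rows
                  if d[var] is not None and all(d[k] == r[k] for k in keys)]
        # With no matching donor the value stays missing; real systems relax the match.
        out.append({**r, var: rng.choice(donors) if donors else None})
    return out

rng = random.Random(2011)
for label, filled in (
    ("mean", mean_impute(people, "income")),
    ("random", random_impute(people, "income", rng)),
    ("hot deck", hot_deck_impute(people, "income", ["age_group", "educ"], rng)),
):
    print(label, [r["income"] for r in filled])
```

Running random_impute or hot_deck_impute several times with different seeds yields several completed data sets. Multiple imputation analyzes each of these and pools the results (Rubin's rules), a step this sketch leaves out. Mean imputation keeps every observation but shrinks the variance of the imputed variable.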
