National Health and Nutrition Examination Survey: A Very General Overview
Taken from various NHANES sources and Lein’s comments.

NHANES Objective
To measure and assess the health and nutritional status of adults and children in the United States

When did NHANES start?
• The Health Examination Survey: the forerunner in the 1960s
• The first three National Health Examination Surveys (NHES) were conducted between 1960 and 1970. These surveys were known as NHES I, II, and III.
• Between 1971 and 1975, a large nutrition component was added. Name was changed to NHANES.

Sample
• Civilian, non-institutionalized household population:
• Residents of all states and the District of Columbia
• All ages
• Unique in combining a home interview with health examinations conducted in a Mobile Examination Center (MEC)
• New Survey Available: 2009-2010

Six Principal Data Collection Methods
• Household interview
• Personal interviews
• Physical examination (MEC)
• Anthropometry (MEC)
• Diagnostic screening (MEC)
• Laboratory analysis (MEC)

NHANES Mobile Exam Center (MEC)

MEC examination components
• Dietary interviews/MEC interviews
• Phlebotomy
• Urine collection
• Blood pressure
• Physician’s exam
• Hearing
• Eye exam
• Dental exam
• DXA
• Muscle strength
• Balance
• Anthropometry
• Skin disease/Melanoma
• TB skin test
• Cognitive testing
• Cardiorespiratory fitness
• Peripheral vascular disease
• Peripheral neuropathy

The major categories of NHANES data files
• Demographics files: survey design and demographic variables
• Examination files: information collected through physical exams, dental exams, and dietary interview components
• Laboratory files: results from specimens such as blood, urine, hair, air, tuberculosis skin test, and household dust and water specimens
• Questionnaire files: household interview and mobile examination center (MEC) interview

Examples of NHANES Findings and Uses
Landmark findings and public health results (finding: resulting action)
• High blood lead levels: lead out of gasoline
• Low folate levels: mandatory food fortification
• Rising levels of obesity: public health action plan
• Racial and ethnic disparities in Hepatitis B: universal vaccination of all infants & children

Trends in Child and Adolescent Overweight
[Figure: percent overweight (y-axis, 0 to 20) by survey period (1963-65, 1966-70, 1971-74, 1976-80, 1988-94, 1999-2000, 2001-02, 2003-04), with separate series for ages 6-11 y and 12-19 y]
Note: Overweight is defined as BMI >= gender- and age-specific 95th percentile from the 2000 CDC Growth Charts.
Source: National Health Examination Surveys II (ages 6-11) and III (ages 12-17), National Health and Nutrition Examination Surveys I, II, III and 1999-2004, NCHS, CDC.

NHANES Complex Survey Design
(sometimes called “multi-stage” survey design)
NHANES data are NOT obtained using a simple random sample. Rather, a “complex”, multistage, probability sampling design is used to select participants representative of the civilian, non-institutionalized US population.

In Brief
• The entire US is broken into about 40 strata.
• Each stratum is divided into many primary sampling units (PSUs), mostly single counties.
• Within each stratum two PSUs are selected.
• Within each of the selected PSUs, individual households are selected and then individual subjects are selected.

Household/Individual Oversampling
In some geographic areas, certain age, ethnic, or income groups are oversampled to provide for accurate subgroup reporting. E.g.: Native American subjects

More on Individual Weights
A sample weight is assigned to each individual subject. It is a measure of the number of people in the population represented by that sample person in NHANES.

This design creates three kinds of survey design variables
• STRATA (names end with the letters “STRA”)
• PSUs (names end with “PSU”)
• Individual WEIGHTs (names begin with the prefix “WT”)

Do I have to use sample weights and other survey design variables?
• Yes.
For NHANES datasets, the use of sampling weights and sample design variables is recommended for all analyses.
• If you fail to account for the sampling parameters, you may obtain biased estimates and overstate statistical significance.

Selecting the Correct Weight
To produce estimates appropriately adjusted for survey non-response, check all of the variables in your analysis and select the weight of the smallest analysis subpopulation (for example, the day 1 dietary weight WTDRD1 when any variable comes from the dietary recall).

Using Weighting Variables in SAS
SAS allows for three survey design statements in its survey procedures:
• STRATA statement (for strata variables)
• CLUSTER statement (for PSU variables)
• WEIGHT statement (for weight variables)

An example of a SAS Survey Procedure with NHANES data
(the procedure name always begins with the prefix “survey”)

proc surveymeans;
  var kcal;
  cluster SDMVPSU;
  strata SDMVSTRA;
  weight WTDRD1;
run;

What kinds of NHANES documents are available online?
• Codebook: lists all the variables in the data file.
• Data file documentation: provides a brief description of the file.
• Frequency tables: contain the frequency count for each item in the data file and can be used to verify the sample size.

Where and how can I access NHANES data files?
• NHANES data can be downloaded from the NHANES website: http://www.cdc.gov/nchs/nhanes.htm
• NHANES files are in SAS transport file format (.xpt).
• .xpt files can be read directly by SAS or converted to a .sas7bdat file with StatTransfer.

How do you read .xpt files directly with SAS?

libname demog xport 'c:\nhanes\demo_e.xpt';
data demo;
  set demog.demo_e;
run;

SAS log:

1    libname demog xport 'c:\nhanes\demo_e.xpt';
NOTE: Libref DEMOG was successfully assigned as follows:
      Engine:        XPORT
      Physical Name: c:\nhanes\demo_e.xpt
2    data demo; set demog.demo_e;
3    run;
NOTE: There were 10149 observations read from the data set DEMOG.DEMO_E.
NOTE: The data set WORK.DEMO has 10149 observations and 43 variables.
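
The DATA step above copies the file into the temporary WORK library. If you would rather keep a permanent .sas7bdat copy without leaving SAS, one possible approach is PROC COPY from the XPORT libref into an ordinary library. This is only a sketch: the libref name perm and the folder c:\nhanes\sas are hypothetical, and the folder is assumed to exist already.

```sas
/* Transport file, read through the XPORT engine as in the example above */
libname demog xport 'c:\nhanes\demo_e.xpt';

/* Hypothetical permanent library; the folder must already exist */
libname perm 'c:\nhanes\sas';

/* Copy every member of the transport file (here, DEMO_E) into PERM.
   This writes demo_e.sas7bdat into the folder assigned to PERM. */
proc copy in=demog out=perm;
run;
```

Later programs can then refer to the data set as perm.demo_e.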

StatTransfer: another option for using SAS .xpt files
You can also use StatTransfer to convert SAS .xpt files into .sas7bdat files.
[StatTransfer screenshot]

Do I have to format and label all variables?
• NHANES provides variable labels built into their data sets.
• Formats are not included, so you must create your own formats by using PROC FORMAT and the FORMAT statement.

Lots of Merging with NHANES
The data files remain separate by type of measurement. This requires that you merge files together for analysis, matching subjects on the respondent sequence number (SEQN).

Things to know about Survey Analysis
• Not all software packages are equipped to analyze complex survey data.
• See “Summary of Survey Analysis Software” at Harvard Med’s site for list and limitations: www.hcp.med.harvard.edu/statistics/survey-soft/
• All have limitations

More…
• Stata allows for an interactive “survey mode”.
• SPSS provides limited menu-driven procedures.
• SAS provides limited procedures.
• SUDAAN is a program that can work with SAS to expand its survey procedures. Not widely available at UCB.

Analyzing Sub-Populations in Survey Analysis
When analyzing sub-populations in a complex survey design, it’s important NOT to subset your data. Instead, create a sub-population indicator variable. Correctly written statistical software will allow the sub-population variable to be included in the model specification.

Quote from AJE Article (Graubard and Korn, 1996)
“One frequently analyzes a subset of the data collected in a survey when interest focuses on individuals in a certain subpopulation of the sampled population. Although it may seem natural to eliminate from the data set all data from individuals outside the subpopulation before analysis, this procedure may yield incorrect standard errors and confidence intervals.”

Why is this? It seems counterintuitive.
In complex (multi-stage) survey designs, the design variables (strata, PSUs, and weights) for all subjects are used to compute the standard errors for subpopulations.
The mathematics of this is complex. In general terms, though, the relative weight of each subject can only be fully accounted for by analyzing all of the subjects.

How does SAS deal with subpopulations?
Most SAS survey procedures use a DOMAIN statement for sub-population analysis. This statement identifies a sub-population indicator variable.

Your SAS Review Assignment will provide some practice in using NHANES data with SAS.
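
To tie the merging, formatting, and DOMAIN ideas together, here is a minimal sketch of a sub-population analysis. It is an illustration under stated assumptions, not the assignment's solution. The file DR1TOT_E (day 1 dietary totals) and the variables SEQN, RIAGENDR, RIDAGEYR, and DR1TKCAL are NHANES names that do not appear in the slides above, the file paths are hypothetical, and the indicator adultfem and its format are invented for the example.

```sas
/* Hypothetical locations of the downloaded transport files */
libname demog xport 'c:\nhanes\demo_e.xpt';
libname diet  xport 'c:\nhanes\dr1tot_e.xpt';

/* Sort both files by the respondent sequence number before merging */
proc sort data=demog.demo_e  out=demo;  by seqn; run;
proc sort data=diet.dr1tot_e out=diet1; by seqn; run;

/* NHANES ships no formats, so define one for the indicator */
proc format;
  value adultfemf 0 = 'Everyone else'
                  1 = 'Women aged 20+';
run;

/* Merge and keep ALL subjects; flag the sub-population rather than subsetting */
data analysis;
  merge demo diet1;
  by seqn;
  /* RIAGENDR = 2 is female, RIDAGEYR is age in years (assumed codings) */
  adultfem = (riagendr = 2 and ridageyr >= 20);
run;

/* The full design is supplied; DOMAIN requests estimates by indicator level */
proc surveymeans data=analysis mean stderr;
  strata  sdmvstra;
  cluster sdmvpsu;
  weight  wtdrd1;
  domain  adultfem;
  var     dr1tkcal;
  format  adultfem adultfemf.;
run;
```

The point of the design is that no WHERE statement or subsetting IF removes anyone outside the sub-population, so the standard errors are computed from the complete set of strata and PSUs.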