Current Population Survey
Technical Paper 63RV

Design and Methodology

TP63RV

U.S. Department of Labor
BUREAU OF LABOR STATISTICS

U.S. Department of Commerce
Economics and Statistics Administration
U.S. CENSUS BUREAU

Acknowledgments

This technical paper was written under the supervision of Donna L. Kostanich of the U.S. Census Bureau and Cathryn S. Dippo of the Bureau of Labor Statistics. It has been produced through the combined efforts of many individuals. Some wrote sections of the paper, some reviewed sections of the paper, and others did both. Some are still with their agencies, and some have since left.

Contributing from the U.S. Census Bureau were Adelle Berlinger, Chet Bowie, John Bushery, Lawrence S. Cahoon, Maryann Chapin, Jeffrey S. Corteville, Deborah Fenstermaker, Robin Fisher, Richard Griffiths, Carol A. Gunlicks, Frederick W. Hollmann, David V. Hornick, Vicki J. Huggins, Melinda Kraus, J. Mackey Lewis, Khandaker A. Mansur, Pamela D. McGovern, Thomas F. Moore, John Paletta, Maria Reed, Brad L. Rigby, Jennifer M. Rothgeb, Robert Rothhaas, Blair Russell, Wendy Scholetzky, Irwin Schreiner, Harland H. Shoemaker Jr., Katherine J. Thompson, Ronald R. Tucker, Gregory D. Weyland, Patricia Wilson, Andrew Zbikowski, and the many other people who have held positions in the Current Population Survey Branch of the Demographic Statistical Methods Division, as well as in other divisions throughout the Census Bureau.

Contributing from the Bureau of Labor Statistics were John E. Bregger, Martha Duff, Gloria P. Green, Harvey R. Hamel, Brian Harris-Kojetin, Diane E. Herz, Vernon C. Irby, Bodhini R. Jayasuriya, Janice K. Lent, Robert J. McIntire, Sandi Mason, Thomas Nardone, Anne E. Polivka, Ed Robison, Philip L. Rones, Richard Tiller, Clyde Tucker, and Tamara Sue Zimmerman.

Sharon Berwager, Mary Ann Braham, and Cynthia L. Wellons-Hazer provided assistance in preparing the many drafts of this paper. John S. Linebarger was the Census Bureau's Contracting Officer's Technical Representative. The review of the final manuscript prior to publication was coordinated by Antoinette Lubich of the Census Bureau. Patricia Dean Brick served as an editorial consultant, and Joseph Waksberg of Westat, Inc., assisted in the document's review for technical accuracy. Kenton Kilgore and Larry Long of the Census Bureau provided final comments. EEI Communications constructed the subject index and list of acronyms.

Kim D. Ottenstein, Cynthia G. Brooks, Frances B. Scott, Susan M. Kelly, Neeland Queen, Patricia Edwards, Gloria T. Davis, and Laurene V. Qualls of the Administrative and Customer Services Division, Walter C. Odom, Chief, provided publications and printing management, graphics design and composition, and editorial review for print and electronic media. General direction and production management were provided by Michael G. Garland, Assistant Division Chief, and Gary J. Lauffer, Chief, Publications Services Branch. William Hazard and Alan R. Plisch of the Census Bureau provided assistance in placing electronic versions of this document on the Internet (see http://www.census.gov/prod/2002pubs/tp63rv.pdf).

We are grateful for the assistance of these individuals, and all others who are not specifically mentioned, for their help with the preparation and publication of this document.

Current Population Survey
Technical Paper 63RV

Design and Methodology

TP63RV

Issued March 2002

U.S. Department of Labor
Elaine L. Chao, Secretary

Bureau of Labor Statistics
Lois Orr, Acting Commissioner

U.S. Department of Commerce
Donald L. Evans, Secretary
Samuel W. Bodman, Deputy Secretary

Economics and Statistics Administration
Kathleen B. Cooper, Under Secretary for Economic Affairs

U.S. CENSUS BUREAU
William G. Barron, Jr., Acting Director
William G. Barron, Jr., Deputy Director
Nancy M. Gordon, Associate Director for Demographic Programs
Cynthia Z. F. Clark, Associate Director for Methodology and Standards

BUREAU OF LABOR STATISTICS
Lois Orr, Acting Commissioner
Lois Orr, Deputy Commissioner
Cathryn S. Dippo, Associate Commissioner for Survey Methods Research
John M. Galvin, Associate Commissioner, Office of Employment and Unemployment Statistics
Philip L. Rones, Assistant Commissioner, Office of Current Employment Analysis

Foreword

The Current Population Survey (CPS) is one of the oldest, largest, and most well-recognized surveys in the United States. It is immensely important, providing information on many of the things that define us as individuals and as a society: our work, our earnings, our education. It is also immensely complex. Staff of the Census Bureau and the Bureau of Labor Statistics have attempted, in this publication, to provide data users with a thorough description of the design and methodology used in the CPS. The preparation of this technical paper was a major undertaking, spanning several years and involving dozens of statisticians, economists, and others from the two agencies.

This paper is the first major update of CPS documentation in more than two decades, and, while the basic approach to collecting labor force and other data through the CPS has remained intact over the intervening years, much has changed. In particular, a redesigned CPS was introduced in January 1994, centered around the survey's first use of a computerized survey instrument by field interviewers. The questionnaire itself was rewritten to better communicate CPS concepts to the respondent, and to take advantage of computerization. This document describes the design and methodology that existed for the CPS as of December 1995. Some of the appendices cover updates that have been made to the survey since then.
Users of CPS data should have access to up-to-date information about the survey's methodology. The advent of the Internet allows us to provide updates to the material contained in this report on a more timely basis. Please visit our CPS web site at http://www.bls.census.gov/cps, where updated survey information will be made available. Also, we welcome comments from users about the value of this document and ways that it could be improved.

Kenneth Prewitt
Director, U.S. Census Bureau
March 2000

Katharine G. Abraham
Commissioner, Bureau of Labor Statistics
March 2000

Contents

Summary of Changes ..... xi

Chapter 1. Background
  Background ..... 1-1

Chapter 2. History of the Current Population Survey
  Introduction ..... 2-1
  Major Changes in the Survey: A Chronology ..... 2-1
  References ..... 2-6

Chapter 3. Design of the Current Population Survey Sample
  Introduction ..... 3-1
  Survey Requirements and Design ..... 3-1
  First Stage of the Sample Design ..... 3-2
  Second Stage of the Sample Design ..... 3-7
  Third Stage of the Sample Design ..... 3-14
  Rotation of the Sample ..... 3-15
  References ..... 3-17

Chapter 4. Preparation of the Sample
  Introduction ..... 4-1
  Listing Activities ..... 4-4
  Third Stage of the Sample Design (Subsampling) ..... 4-6
  Interviewer Assignments ..... 4-6

Chapter 5. Questionnaire Concepts and Definitions for the Current Population Survey
  Introduction ..... 5-1
  Structure of the Survey Instrument ..... 5-1
  Concepts and Definitions ..... 5-1
  References ..... 5-7

Chapter 6. Design of the Current Population Survey Instrument
  Introduction ..... 6-1
  Motivation for the Redesign of the Questionnaire Collecting of Labor Force Data ..... 6-1
  Objectives of the Redesign ..... 6-1
  Highlights of the Questionnaire Revision ..... 6-3
  Continuous Testing and Improvements of the Current Population Survey and Its Supplements ..... 6-8
  References ..... 6-8

Chapter 7. Conducting the Interviews
  Introduction ..... 7-1
  Noninterviews and Household Eligibility ..... 7-1
  Initial Interview ..... 7-3
  Subsequent Months' Interviews ..... 7-4

Chapter 8. Transmitting the Interview Results
  Introduction ..... 8-1
  Transmission of Interview Data ..... 8-1

Chapter 9. Data Preparation
  Introduction ..... 9-1
  Daily Processing ..... 9-1
  Industry and Occupation (I&O) Coding ..... 9-1
  Edits and Imputations ..... 9-1

Chapter 10. Estimation Procedures for Labor Force Data
  Introduction ..... 10-1
  Unbiased Estimation Procedure ..... 10-1
  Adjustment for Nonresponse ..... 10-2
  Ratio Estimation ..... 10-3
  First-Stage Ratio Adjustment ..... 10-3
  Second-Stage Ratio Adjustment ..... 10-5
  Composite Estimator ..... 10-9
  Seasonal Adjustment ..... 10-10
  Monthly State and Substate Estimates ..... 10-10
  Producing Other Labor Force Estimates ..... 10-12
  References ..... 10-14

Chapter 11. Current Population Survey Supplemental Inquiries
  Introduction ..... 11-1
  Criteria for Supplemental Inquiries ..... 11-1
  Recent Supplemental Inquiries ..... 11-2
  Annual Demographic Supplement (March Supplement) ..... 11-4
  Summary ..... 11-8

Chapter 12. Data Products From the Current Population Survey
  Introduction ..... 12-1
  Bureau of Labor Statistics ..... 12-1
  U.S. Census Bureau ..... 12-2

Chapter 13. Overview of Data Quality Concepts
  Introduction ..... 13-1
  Quality Measures in Statistical Science ..... 13-1
  Quality Measures in Statistical Process Monitoring ..... 13-2
  Summary ..... 13-3
  References ..... 13-3

Chapter 14. Estimation of Variance
  Introduction ..... 14-1
  Variance Estimates by the Replication Method ..... 14-1
  Method for Estimating Variance for 1990 Design ..... 14-2
  Variances for State and Local Area Estimates ..... 14-3
  Generalizing Variances ..... 14-3
  Variance Estimates to Determine Optimum Survey Design ..... 14-5
  Total Variances As Affected by Estimation ..... 14-6
  Design Effects ..... 14-7
  References ..... 14-9

Chapter 15. Sources and Controls on Nonsampling Error
  Introduction ..... 15-1
  Sources of Coverage Error ..... 15-2
  Controlling Coverage Error ..... 15-2
  Sources of Error Due to Nonresponse ..... 15-4
  Controlling Error Due to Nonresponse ..... 15-5
  Sources of Response Error ..... 15-6
  Controlling Response Error ..... 15-6
  Sources of Miscellaneous Errors ..... 15-8
  Controlling Miscellaneous Errors ..... 15-8
  References ..... 15-9

Chapter 16. Quality Indicators of Nonsampling Errors
  Introduction ..... 16-1
  Coverage Errors ..... 16-1
  Nonresponse ..... 16-3
  Response Variance ..... 16-5
  Mode of Interview ..... 16-6
  Time in Sample ..... 16-7
  Proxy Reporting ..... 16-10
  Summary ..... 16-10
  References ..... 16-10

Figures
  3-1   CPS Rotation Chart: January 1996 - April 1998 ..... 3-16
  4-1   Example of Systematic Sampling ..... 4-3
  5-1   Questions for Employed and Unemployed ..... 5-6
  7-1   Introductory Letter ..... 7-2
  7-2   Noninterviews: Types A, B, and C ..... 7-3
  7-3   Noninterviews: Main Housing Unit Items Asked for Types A, B, and C ..... 7-4
  7-4   Interviews: Main Housing Unit Items Asked in MIS 1 and Replacement Households ..... 7-5
  7-5   Summary Table for Determining Who Is To Be Included As a Member of the Household ..... 7-6
  7-6   Interviews: Main Demographic Items Asked in MIS 1 and Replacement Households ..... 7-7
  7-7   Demographic Edits in the CPS Instrument ..... 7-8
  7-8   Interviews: Main Items (Housing Unit and Demographic) Asked in MIS 5 Cases ..... 7-9
  7-9   Interviewing Results (December 1996) ..... 7-10
  7-10  Telephone Interview Rates (December 1996) ..... 7-10
  8-1   Overview of CPS Monthly Operations ..... 8-3
  11-1  Diagram of the ADS Weighting Scheme ..... 11-6
  16-1  Average Yearly Noninterview and Refusal Rates for the CPS 1964 - 1996 ..... 16-2
  16-2  Monthly Noninterview and Refusal Rates for the CPS 1984 - 1996 ..... 16-4
  F-1   2000 Decennial and Survey Boundaries ..... F-2
  J-1   SCHIP Rotation Chart: September 2000 to December 2002 ..... J-5

Illustrations
  1   Segment Folder, BC-1669(CPS) ..... B-2
  2   Unit/Permit Listing Sheet (Unit Segment) ..... B-4
  3   Unit/Permit Listing Sheet (Multiunit Structure) ..... B-5
  4   Unit/Permit Listing Sheet (Large Special Exception) ..... B-6
  5   Incomplete Address Locator Actions, BC-1718(ADP) ..... B-7
  6   Area Segment Listing Sheet ..... B-8
  7   Area Segment Map ..... B-10
  8   Segment Locator Map ..... B-11
  9   Map Legend ..... B-12
  10  Group Quarters Listing Sheet ..... B-13
  11  Unit/Permit Listing Sheet ..... B-14
  12  Permit Address List ..... B-15
  13  Permit Sketch Map ..... B-16

Tables
  3-1   Number and Estimated Population of Strata for 792-PSU Design by State ..... 3-6
  3-2   Summary of Sampling Frames ..... 3-10
  3-3a  Old Construction Within-PSU Sorts ..... 3-11
  3-3b  Permit Frame Within-PSU Sort ..... 3-12
  3-4   Sampling Example Within a BPC for Any of the Three Old Construction Frames ..... 3-12
  3-5   Index Numbers Selected During Sampling for Code Assignment Examples ..... 3-13
  3-6   Example of Postsampling Survey Design Code Assignments Within a PSU ..... 3-14
  3-7   Example of Postsampling Operational Code Assignments Within a PSU ..... 3-14
  3-8   Proportion of Sample in Common for 4-8-4 Rotation System ..... 3-16
  10-1  State First-Stage Factors for CPS 1990 Design (September - December 1995) ..... 10-5
  10-2  Initial Cells for Hispanic/Non-Hispanic by Age and Sex ..... 10-6
  10-3  Initial Cells for Black/White/Other by Age, Race, and Sex ..... 10-6
  11-1  Current Population Survey Supplements September 1994 - December 2001 ..... 11-3
  11-2  Summary of ADS Interview Month ..... 11-5
  11-3  Summary of SCHIP Adjustment Factor ..... 11-8
  12-1  Bureau of Labor Statistics Data Products From the Current Population Survey ..... 12-3
  14-1  Parameters for Computation of Standard Errors for Estimates of Monthly Levels ..... 14-5
  14-2  Components of Variance for FSC Monthly Estimates ..... 14-6
  14-3  Effects of Weighting Stages on Monthly Relative Variance Factors ..... 14-7
  14-4  Effect of Compositing on Monthly Variance and Relative Variance Factors ..... 14-8
  14-5  Design Effects for Total Monthly Variances ..... 14-8
  16-1  CPS Coverage Ratios by Age, Sex, Race, and Ethnicity ..... 16-2
  16-2  Components of Type A Nonresponse Rates, Annual Averages for 1993 - 1996 ..... 16-3
  16-3  Percentage of Households by Number of Completed Interviews During the 8 Months in the Sample ..... 16-4
  16-4  Labor Force Status by Interview/Noninterview Status in Previous and Current Month ..... 16-5
  16-5  Percentage of CPS Items With Missing Data ..... 16-5
  16-6  Comparison of Responses to the Original Interview and the Reinterview ..... 16-6
  16-7  Index of Inconsistency for Selected Labor Force Characteristics February - July 1994 ..... 16-6
  16-8  Percentage of Households With Completed Interviews With Data Collected by Telephone (CAPI Cases Only) ..... 16-7
  16-9  Effect of Centralized Telephone Interviewing on Selected Labor Force Characteristics ..... 16-8
  16-10 Month-In-Sample Bias Indexes (and Standard Errors) in the CPS for Selected Labor Force Characteristics ..... 16-9
  16-11 Month-In-Sample Indexes in the CPS for Type A Noninterview Rates January - December 1994 ..... 16-10
  16-12 Percentage of CPS Labor Force Reports Provided by Proxy Reporters ..... 16-10
  B-1   Listings for Multiunit Addresses ..... B-3
  E-1   Reliability of CPS Estimators Under the Current Design ..... E-1
  F-1   Average Monthly Workload by Regional Office: 2001 ..... F-3
  H-1   The Proportion of Sample Cut and Expected CVs Before and After the January 1996 Reduction ..... H-2
  H-2   January 1996 Reduction: Number and Estimated Population of Strata for 754-PSU Design for the Nine Reduced States ..... H-3
  H-3   Reduction Groups Dropped From SR Areas ..... H-3
  H-4   State First-Stage Factors for CPS January 1996 Reduction ..... H-4
  H-5   Components of Variance for FSC Monthly Estimates ..... H-6
  H-6   Effects of Weighting Stages on Monthly Relative Variance Factors ..... H-7
  H-7   Effect of Compositing on Monthly Variance and Relative Variance Factors ..... H-7
  H-8   Design Effects for Total Monthly Variance ..... H-8
  H-9   Components of Variance for Composited Month-to-Month Change Estimates ..... H-8
  H-10  Effect of Compositing on Month-to-Month Change Variance Factors ..... H-9
  J-1   Size of SCHIP General Sample Increase and Effect on Detectable Difference by State ..... J-3
  J-2   Phase-in Table (First 11 Months of Sample Expansion) ..... J-4
  J-3   Effect of SCHIP on National Labor Force Estimates: June 2001 ..... J-6
  J-4   Effect of SCHIP on State Sampling Intervals and Coefficients of Variation ..... J-7
  J-5   CPS Sample Housing Unit Counts With and Without SCHIP ..... J-8
  J-6   Standard Errors on Percentage of Uninsured, Low-Income Children Compared at Stages of the SCHIP Expansion ..... J-9

Appendix A. Maximizing Primary Sampling Unit (PSU) Overlap
  Introduction ..... A-1
  Adding a County to a PSU ..... A-3
  Dropping a County From a PSU ..... A-3
  Dropping and Adding PSU's ..... A-4
  References ..... A-5

Appendix B. Sample Preparation Materials
  Introduction . . .
  Unit Frame Materials . . .
  Area Frame Materials . . .
  Group Quarters Frame Materials . . .
  Permit Frame Materials . . .
. . . . . . . . . . . . . . . . . . . . . . B–1 B–1 B–3 B–9 B–9 Appendix C. Maintaining the Desired Sample Size Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Maintenance Reductions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C–1 C–1 Appendix D. Derivation of Independent Population Controls Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Population Universe for CPS Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Calculation of Population Projections for the CPS Universe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Adjustment of CPS Controls for Net Underenumeration in the 1990 Census. . . . . . . . . . . . The Post-Enumeration Survey and Dual-System Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Monthly and Annual Revision Process for Independent Population Controls . . . . . . . Procedural Revisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary List of Sources for CPS Population Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–1 D–3 D–4 D–20 D–20 D–22 D–22 D–23 D–23 x Appendix E. State Model-Based Labor Force Estimation Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
Time Series Component Modeling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Diagnostic Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Monthly Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . End-of-the-Year Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E–1 E–1 E–4 E–4 E–4 E–4 E–5 E–6 Appendix F. Organization and Training of the Data Collection Staff Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Organization of Regional Offices/CATI Facilities. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Training Field Representatives. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Field Representative Training Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Field Representative Performance Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . Evaluating Field Representative Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . F–1 F–1 F–1 F–1 F–3 F–4 Appendix G. Reinterview: Design and Methodology Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Response Error Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Quality Control Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Reinterview Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G–1 G–1 G–1 G–2 G–2 G–2 Appendix H. Sample Design Changes of the Current Population Survey: January 1996 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Changes to the CPS Sample Design. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Revisions to CPS Estimation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Estimation of Variances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
H–1 H–2 H–4 H–4 Appendix I. New Compositing Procedure Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Composite Estimation in the CPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Computing Composite Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A Note About Final Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . I–1 I–1 I–2 I–2 I–3 Appendix J. Changes to the Current Population Survey Sample in July 2001 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . General Sample Increase in Selected States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Other Aspects of the SCHIP Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Effect of SCHIP Expansion on Standard Errors of Estimates of Uninsured Low-Income Children. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . J–1 J–1 J–8 J–8 J–10 Acronyms .......................................................................... ACRONYMS–1 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . INDEX–1 xi Summary of Changes (Changes made to Current Population Survey Technical Paper 63 to Produce Technical Paper 63RV, March 2002) Chapter 1. Background • page 1-1, left column, fourth paragraph:1 added a footnote about the sample size increase detailed in Appendix J. Chapter 2. History of the Current Population Survey • page 2-3, third paragraph of December 1971-March 1973: changed 1992 to 1972.2 • page 2-4, April 1984: changed 1995 to 1985. • page 2-5: added a section January 1998 describing a two-step composite estimation method. • page 2-5: added a section July 2001 describing the SCHIP. Chapter 3. Design of the Current Population Survey Sample • page 3-1, chapter heading: added a reference to Appendix J. • page 3-1, after second paragraph of INTRODUCTION: added text on the SCHIP. • page 3-11, Table 3-3a: corrected sorts. Chapter 9. Data Preparation • page 9-1, first paragraph of INDUSTRY AND OCCUPATION (I&O) CODING: added a footnote about the increase of cases for coding because of the SCHIP. Chapter 10. Estimation Procedures for Labor Force Data • page 10-1, last paragraph of INTRODUCTION: added text about a new compositing procedure detailed in Appendix I. • page 10-5, Table 10-1: changed 1999 in heading to 1990. • page 10-6, Table 10-3: inserted missing age category 55-59. • page 10-11, second paragraph of Estimates for States: added a footnote that all states are now based on a model. Chapter 11. Current Population Survey Supplemental Inquiries • pages 11-1–11-3: made clarifications and revisions to improve readability. • pages 11-3–11-8, Annual Demographic Supplement (March Supplement): major revisions due to more current methodologies and the inclusion of the SCHIP. Chapter 14. Estimation of Variance • page 14-5, Table 14-1: added a footnote that refers to Appendix H. Chapter 16. 
Quality Indicators of Nonsampling Errors
• page 16-1, chapter heading: added a reference to a BLS internet site.

Appendix E. State Model-Based Labor Force Estimation
• page E-1, Table E-1: added a footnote about the SCHIP.

Appendix F. Organization and Training of the Data Collection Staff
• pages F-1–F-3, all sections: updated several of the numbers/percentages with 2001 data.
• page F-3, Figure F-2: changed to Table F-1 and updated it.
• page F-3, fourth paragraph of FIELD REPRESENTATIVE PERFORMANCE GUIDELINES: changed CARMIN to CARMN.

Appendix G. Reinterview: Design and Methodology
• pages G-1–G-3, all sections: major revisions/deletions due to more current methodologies.

Appendix H. Sample Design Changes of the Current Population Survey: January 1996
• page H-1, chapter heading: added a reference to Appendix J.

Appendix J. Changes to the Current Population Survey Sample in July 2001
• This is an entirely new appendix, focusing on the changes that are collectively known as the SCHIP sample expansion.

1 Page number, column, and paragraph number are those from TP63. They indicate where the change begins.
2 Boldface indicates a change or addition from TP63 to TP63RV.

Chapter 1. Background

The Current Population Survey (CPS), sponsored jointly by the U.S. Census Bureau and the U.S. Bureau of Labor Statistics (BLS), is the Nation's primary source of labor force statistics for the entire population. The CPS is the source of numerous high-profile economic statistics, including the Nation's unemployment rate, and provides data on a wide range of issues relating to employment and earnings. The CPS also collects extensive demographic data that complement and enhance our understanding of labor market conditions in the Nation overall, among many different population groups, in the various states, and even in substate areas.
Although the labor market information is central to the CPS, the survey provides a wealth of other social and economic data that are widely used by social scientists in both the public and private sectors. In addition, because of its long history, the CPS has been a model for other household surveys, both in the United States and in many other countries. Thus, the CPS is a source of information for both social science research and the study of survey methodology. This report aims to provide all users of the CPS with a comprehensive guide to the survey. The report focuses on labor force data because the timely and accurate collection of those data remains the principal purpose of the survey. The CPS is administered by the Census Bureau using a scientifically selected sample of some 50,000 occupied households.1 The fieldwork is conducted during the calendar week that includes the 19th of the month. The questions refer to activities during the prior week; that is, the week that includes the 12th of the month.2 Households from all 50 states and the District of Columbia are in the survey for 4 consecutive months, out for 8, and then return for another 4 months before leaving the sample permanently. This design ensures a high degree of continuity from 1 month to the next (as well as over the year). The 4-8-4 sampling scheme has the added benefit of allowing for the constant replenishment of the sample without excessive response burden. To be eligible to participate in the CPS, individuals must be 15 years of age or over and not in the Armed Forces. Persons in institutions, such as prisons, long-term care hospitals, and nursing homes are, by definition, ineligible to be interviewed in the CPS. In general, the BLS publishes labor force data only for persons age 16 and over, since those under 16 are substantially limited in their labor market activities by compulsory schooling and child labor laws. 
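The 4-8-4 rotation scheme described above can be sketched in a few lines of Python. This is an illustrative model only, not Census Bureau code; the cohort numbering is hypothetical and serves only to show why three-quarters of the sample overlaps from month to month and half overlaps year to year.

```python
# Minimal sketch of the CPS 4-8-4 rotation: a household is interviewed for
# 4 consecutive months, rests for 8, then is interviewed for 4 more months.
# One-eighth of the sample enters each month, so 8 cohorts are active at once.

def months_in_sample(entry_month):
    """Calendar months (as integers) a household entering in `entry_month`
    is interviewed under the 4-8-4 scheme."""
    first_spell = [entry_month + i for i in range(4)]
    second_spell = [entry_month + 12 + i for i in range(4)]
    return first_spell + second_spell

def sample_in_month(month):
    """Set of entry cohorts interviewed in a given calendar month."""
    return {e for e in range(month - 16, month + 1)
            if month in months_in_sample(e)}

this_month = sample_in_month(100)   # 8 active cohorts
next_month = sample_in_month(101)
year_later = sample_in_month(112)

# 6 of 8 cohorts are shared month to month (75 percent overlap);
# 4 of 8 are shared year to year (50 percent overlap).
print(len(this_month), len(this_month & next_month), len(this_month & year_later))
```

The overlap counts fall directly out of the arithmetic: shifting the window by one month drops one cohort from each 4-month spell, while shifting it by twelve months leaves only the cohorts whose second spell lines up with another cohort's first.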
No upper age limit is used, and full-time students are treated the same as nonstudents. One person generally responds for all eligible members of the household. The person who responds is called the ‘‘reference person’’ and usually is the person who either owns or rents the housing unit. If the reference person is not knowledgeable about the employment status of the others in the household, attempts are made to contact those individuals directly.

1 Beginning with July 2001, the sample size increased to 60,000 occupied households. (See Appendix J for details.)
2 In the month of December, the survey is often conducted 1 week earlier to avoid conflicting with the holiday season.

The CPS questionnaire is a completely computerized document that is administered by Census Bureau field representatives across the country through both personal and telephone interviews. Additional telephone interviewing also is conducted from the Census Bureau's three centralized collection facilities in Hagerstown, Maryland; Jeffersonville, Indiana; and Tucson, Arizona.

Within 2 weeks of the completion of these interviews, the BLS releases the major results of the survey. Also included in BLS's analysis of labor market conditions are data from a survey of nearly 400,000 employers (the Current Employment Statistics (CES) survey, conducted concurrently with the CPS). These two surveys are complementary in many ways. The CPS focuses on the labor force status (employed, unemployed, not in labor force) of the working-age population and the demographic characteristics of workers and nonworkers. The CES focuses on aggregate estimates of employment, hours, and earnings for several hundred industries that would be impossible to obtain with the same precision through a household survey. The CPS reports on individuals not covered in the CES, such as the self-employed, agricultural workers, and unpaid workers in a family business. Information also is collected in the CPS about persons who are not working. In addition to the regular labor force questions, the CPS often includes supplemental questions on subjects of interest to labor market analysts. These include annual work activity and income, veteran status, school enrollment, contingent employment, worker displacement, and job tenure, among other topics. Because of the survey's large sample size and broad population coverage, a wide range of sponsors use CPS supplements to collect data on topics as diverse as expectation of family size, tobacco use, computer use, and voting patterns. The supplements are described in greater detail in Chapter 11.

The labor force concepts and definitions used in the CPS have undergone only slight modification since the survey's inception in 1940. Those concepts and definitions are discussed in Chapter 5.

Chapter 2. History of the Current Population Survey

INTRODUCTION

The Current Population Survey (CPS) has its origin in a program established to provide direct measurement of unemployment each month on a sample basis. There were several earlier attempts to estimate the number of unemployed using various devices ranging from guesses to enumerative counts. The problem of measuring unemployment became especially acute during the Economic Depression of the 1930s. The Enumerative Check Census, taken as part of the 1937 unemployment registration, was the first attempt to estimate unemployment on a nationwide basis using probability sampling. During the latter half of the 1930s, the Work Projects Administration (WPA) developed techniques for measuring unemployment, first on a local area basis and later on a national basis. This research, combined with the experience from the Enumerative Check Census, led to the Sample Survey of Unemployment, which was started in March 1940 as a monthly activity by the WPA.
MAJOR CHANGES IN THE SURVEY: A CHRONOLOGY

In August 1942, responsibility for the Sample Survey of Unemployment was transferred to the Bureau of the Census, and in October 1943, the sample was thoroughly revised. At that time, the use of probability sampling was expanded to cover the entire sample, and new sampling theory and principles were developed and applied to increase the efficiency of the design. The households in the revised sample were in 68 Primary Sampling Units (PSUs) (see Chapter 3), comprising 125 counties and independent cities. By 1945, about 25,000 housing units were designated for the sample, of which about 21,000 contained interviewed households.

One of the most important changes in the CPS sample design took place in 1954 when, for the same total budget, the number of PSUs was expanded from 68 to 230, without any change in the number of sample households. The redesign resulted in a more efficient system of field organization and supervision and provided more information per unit of cost. Thus, the accuracy of published statistics improved, as did the reliability of some regional as well as national estimates. Since the mid-1950s, the CPS sample has undergone major revision after every decennial census.

The following list chronicles the important modifications to the CPS starting in the mid-1940s:

• July 1945. The CPS questionnaire was revised. The revision consisted of the introduction of four basic employment status questions. Methodological studies showed that the previous questionnaire produced results that misclassified large numbers of part-time and intermittent workers, particularly unpaid family workers. These groups were erroneously reported as not active in the labor force.

• August 1947. The selection method was revised. The method of selecting sample units within a sample area was changed so that each unit selected would have the same basic weight. This change simplified tabulations and estimation procedures.

• July 1949.
Previously excluded dwelling places were now covered. The sample was extended to cover special dwelling places—hotels, motels, trailer camps, etc. This led to improvements in the statistics (i.e., reduced bias), since residents of these places often have characteristics different from the rest of the population.

• February 1952. Document-sensing procedures were introduced in the survey process. The CPS questionnaire was printed on a document-sensing card. In this procedure, responses were recorded by drawing a line through the oval representing the correct answer using an electrographic lead pencil. Punch cards were automatically prepared from the questionnaire by document-sensing equipment.

• January 1953. Ratio estimates now used data from the 1950 population census. Starting in January 1953, population data from the 1950 census were introduced into the CPS estimation procedure. Prior to that date, the ratio estimates had been based on 1940 census relationships for the first-stage ratio estimate, and 1940 population data were used to adjust for births, deaths, etc., for the second-stage ratio estimate. In September 1953, a question on ‘‘color’’ was added and the question on ‘‘veteran status’’ was deleted in the second-stage ratio estimate. This change made it feasible to publish separate, absolute numbers for persons by race, whereas only percentage distributions had previously been possible.

• July 1953. The 4-8-4 rotation system was introduced. This sample rotation system was adopted to improve measurement over time. In this system, households are interviewed for 4 consecutive months 1 year, leave the sample for 8 months, and return for the same period of 4 months the following year. In the previous system, households were interviewed for 6 months and then replaced. The 4-8-4 system provides some year-to-year overlap, thus improving estimates of change on both a month-to-month and year-to-year basis.

• September 1953.
High-speed electronic equipment was introduced for tabulations. The introduction of electronic calculation greatly increased timeliness and led to other improvements in estimation methods. Other benefits included the substantial expansion of the scope and content of the tabulations and the computation of sampling variability. The shift to modern computers was made in 1959. Keeping abreast of modern computing has proved a continuous process, and to this day, the Census Bureau is still updating and replacing its computer environment.

• February 1954. The number of PSUs was expanded to 230. The number of PSUs was increased from 68 to 230 while retaining the overall sample size of 25,000 designated housing units. The 230 PSUs consisted of 453 counties and independent cities. At the same time, a substantially improved estimation procedure (see Chapter 10, Composite Estimation) was introduced. Composite estimation took advantage of the large overlap in the sample from month to month. These two changes improved the reliability of most of the major statistics by a magnitude that could otherwise be achieved only by doubling the sample size.

• May 1955. Monthly questions on part-time workers were added. Monthly questions exploring the reasons for part-time work were added to the standard set of employment status items. In the past, this information had been collected quarterly or less frequently and was found to be valuable in studying labor market trends.

• July 1955. Survey week was moved. The CPS survey week was moved to the calendar week containing the 12th day of the month to align the CPS time reference with that of other employment statistics. Previously, the survey week had been the calendar week containing the 8th day of the month.

• May 1956. The number of PSUs was expanded to 330. The number of PSUs was expanded from 230 to 330. The overall sample size also increased by roughly two-thirds to a total of about 40,000 housing units (about 35,000 occupied units).
The expanded sample covered 638 counties and independent cities. All of the former 230 PSUs were also included in the expanded sample. The expansion increased the reliability of the major statistics by around 20 percent and made it possible to publish more detailed statistics.

• January 1957. Employment status definition was changed. Two relatively small groups of persons, both formerly classified as employed ‘‘with a job but not at work,’’ were assigned to new classifications. The reassigned groups were (1) persons on layoff with definite instructions to return to work within 30 days of the layoff date and (2) persons waiting to start new wage and salary jobs within 30 days of the interview. Most of the persons in these two groups were shifted to the unemployed classification. The only exception was the small subgroup in school during the survey week who were waiting to start new jobs; these persons were transferred to ‘‘not in labor force.’’ This change in definition did not affect the basic question or the enumeration procedures.

• June 1957. Seasonal adjustment was introduced. Some seasonally adjusted unemployment data were introduced early in 1955. An extension of the data—using more refined seasonal adjustment methods programmed on electronic computers—was introduced in July 1957. The new data included a seasonally adjusted rate of unemployment and trends of seasonally adjusted total employment and unemployment. Significant improvements in methodology emerged from research conducted at the Bureau of Labor Statistics and the Census Bureau in the ensuing years.

• July 1959. Responsibility for CPS was moved between agencies. Responsibility for the planning, analysis, and publication of the labor force statistics from the CPS was transferred to the BLS as part of a large exchange of statistical functions between the Commerce and Labor Departments.
The Census Bureau continued to have (and still has) responsibility for the collection and computer processing of these statistics, for maintenance of the CPS sample, and for related methodological research. Interagency review of CPS policy and technical issues continues under the aegis of the Statistical Policy Division, Office of Management and Budget.

• January 1960. Alaska and Hawaii were added to the population estimates and the CPS sample. Upon achieving statehood, Alaska and Hawaii were included in the independent population estimates and in the sample survey. This increased the number of sample PSUs from 330 to 333. The addition of these two states affected the comparability of population and labor force data with previous years. Another result was an increase of about 500,000 in the noninstitutional population of working age and about 300,000 in the labor force, four-fifths of this in nonagricultural employment. The levels of other labor force categories were not appreciably changed.

• October 1961. Conversion to the Film Optical Sensing Device for Input to the Computer (FOSDIC) system. The CPS questionnaire was converted to the FOSDIC type used by the 1960 census. Entries were made by filling in small circles with an ordinary lead pencil. The questionnaires were photographed to microfilm. The microfilms were then scanned by a reading device which transferred the information directly to computer tape. This system permitted a larger form and a more flexible arrangement of items than the previous document-sensing procedure and did not require the preparation of punch cards. This data entry system was used through December 1993.

• January 1963. New descriptive information was made available. In response to recommendations of a review committee, two new items were added to the monthly questionnaire. The first was an item, formerly carried only intermittently, on whether the unemployed were seeking full- or part-time work.
The second was an expanded item on household relationships, formerly included only annually, to provide greater detail on the marital status and household relationship of unemployed persons.

• March 1963. The sample and population data used in ratio estimates were revised. From December 1961 to March 1963, the CPS sample was gradually revised. This revision reflected the changes in both population size and distribution as established by the 1960 census. Other demographic changes, such as the industrial mix between areas, were also taken into account. The overall sample size remained the same, but the number of PSUs increased slightly to 357 to provide greater coverage of the fast-growing portions of the country. For most of the sample, census lists developed in the 1960 census replaced the traditional area sampling. These changes resulted in further gains in reliability of about 5 percent for most statistics. The census-based updated population information was used in April 1962 for first- and second-stage ratio estimates.

• January 1967. The sample was expanded to 449 PSUs. The CPS sample was expanded from 357 to 449 PSUs. An increase in total budget allowed the overall sample size to increase by roughly 50 percent to a total of about 60,000 housing units (52,500 occupied units). The expanded sample had households in 863 counties and independent cities, with at least some coverage in every state. This expansion increased the reliability of the major statistics by about 20 percent and made it possible to publish more detailed statistics. The concepts of employment and unemployment were also modified. In line with the basic recommendations of the President's Committee to Appraise Employment and Unemployment Statistics (Eckler, 1972), a several-year study was conducted to develop and test proposed changes in the labor force concepts. The principal research results were implemented in January 1967.
The changes included a revised age cutoff in defining the labor force and new questions to improve the information on hours of work, the duration of unemployment, and the self-employed. The definition of unemployment was also revised slightly. The revised definition of unemployment led to small differences in the estimates of level and month-to-month change.

• March 1968. Separate age/sex ratio estimation cells were introduced for Negro and other races. Previously, the second-stage ratio estimation used non-White and White race categories by age groups and sex. The revised procedures allowed for separate ratio estimates for Negro and Other1 race categories. This change amounts essentially to an increase in the number of ratio estimation cells from 68 to 116.

• January 1971 and January 1972. The 1970 census occupational classification was introduced. The questions on occupation were made more comparable to those used in the 1970 census by adding a question on major activities or duties of the current job. The new classification was introduced into the CPS coding procedures in January 1971. Tabulated data were produced in the revised version beginning in January 1972.

• December 1971-March 1973. Sample was expanded to 461 PSUs and data used in ratio estimation were updated. From December 1971 to March 1973, the CPS sample was revised gradually to reflect the changes in population size and distribution as described by the 1970 census. As part of an overall sample optimization, the sample size was reduced slightly (from 60,000 to 58,000 housing units), but the number of PSUs increased to 461. Also, the cluster design was changed from six nearby (but not contiguous) households to four usually contiguous households. This change was undertaken after research found that smaller cluster sizes would increase sample efficiency. Even with the reduction in sample size, this change led to a small gain in reliability for most characteristics.
The noninterview adjustment and first-stage ratio estimation adjustment were also modified to improve the reliability of estimates for central cities and the rest of the Standard Metropolitan Statistical Areas (SMSAs). In January 1972, the population estimates used in the second-stage ratio estimation were updated to the 1970 census base. • January 1974. Inflation-deflation method was introduced for deriving independent estimates of the population. The derivation of independent estimates of the civilian noninstitutional population by age, race, and sex used in second-stage ratio estimation in preparing the monthly labor force estimates now used the inflation-deflation method (see Chapter 10). • September 1975. State supplementary samples were introduced. An additional sample, consisting of about 14,000 interviews each month, was introduced in July 1975 to supplement the national sample in 26 states and the District of Columbia. In all, 165 new PSUs were involved. The supplemental sample was added to meet a specific reliability standard for estimates of the annual average number of unemployed persons for each state. In August 1976, an improved estimation procedure and modified reliability requirements led to the supplement PSUs being dropped from three states. Thus, the size of the supplemental sample was reduced to about 11,000 households in 155 PSUs.

1 Other includes American Indian, Eskimo, Aleut, Asian, and Pacific Islander.

• October 1978. Procedures for determining demographic characteristics were modified. At this time, changes were made in the collection methods for household relationship, race, and ethnicity. Race was now determined by the respondent rather than by the interviewer. Other modifications included the introduction of earnings questions for the two outgoing rotations. New items focused on usual hours worked, hourly wage rate, and usual weekly earnings. Earnings items were asked of currently employed wage and salary workers. • January 1979.
A new two-level, first-stage ratio estimation procedure was introduced. This procedure was designed to improve the reliability of metropolitan/nonmetropolitan estimates. Another newly introduced feature was the monthly tabulation of children’s demographic data, including relationship, age, sex, race, and origin. • September/October 1979. The final report of the National Commission on Employment and Unemployment Statistics (NCEUS; ‘‘Levitan’’ Commission) (Executive Office of the President, 1976) was issued. This report shaped many of the future changes to the CPS. • January 1980. To improve coverage, about 450 households were added to the sample, increasing the total number of PSUs to 629. • May 1981. The sample was reduced by approximately 6,000 assigned households, bringing the total sample size to approximately 72,000 assigned households. • January 1982. The race categories in the second-stage ratio estimation adjustment were changed from White/Non-White to Black/Non-Black. These changes were made to eliminate classification differences in race that existed between the 1980 census and the CPS. The change did not result in notable differences in published household data. Nevertheless, it did result in more variability for certain ‘‘White,’’ ‘‘Black,’’ and ‘‘Other’’ characteristics. As is customary, the CPS uses ratio estimates from the most recent decennial census. Beginning in January 1982, these ratio estimates were based on findings from the 1980 census. The use of the 1980 census-based population estimates, in conjunction with the revised second-stage adjustment, resulted in about a 2 percent increase in the estimates for total civilian noninstitutional population 16 years and over, civilian labor force, and unemployed persons. The magnitude of the differences between 1970 and 1980 census-based ratio estimates affected the historical comparability and continuity of major labor force series; therefore, the BLS revised approximately 30,000 series back to 1970.
• November 1982. The question series on earnings was extended to include items on union membership and union coverage. • January 1983. The occupational and industrial data were coded using the 1980 classification systems. While the effect on industry-related data was minor, the conversion was viewed as a major break in occupation-related data series. The Census Bureau developed a ‘‘list of conversion factors’’ to translate occupation descriptions based on the 1970 census coding classification system to their 1980 equivalents. Most of the data historically published for the ‘‘Black and Other’’ population group were replaced by data which relate only to the ‘‘Black’’ population. • October 1984. School enrollment items were added for persons 16-24 years of age. • April 1984. The 1970 census-based sample was phased out through a series of changes that were completed by July 1985. The redesigned sample used data from the 1980 census to update the sampling frame, took advantage of recent research findings to improve the efficiency and quality of the survey, and used a state-based design to improve the estimates for the states without any change in sample size. • September 1984. Collection of veterans’ data for females was started. • January 1985. Estimation procedures were changed to use data from the 1980 census and the new sample. The major change was to the second-stage adjustment, which replaced population estimates for ‘‘Black’’ and ‘‘Non-Black’’ (by sex and age groups) with population estimates for ‘‘White,’’ ‘‘Black,’’ and ‘‘Other’’ population groups. In addition, a separate, intermediate step was added as a control to the Hispanic2 population. The combined effect of these changes on labor force estimates and aggregates for most population groups was negligible; however, the Hispanic population and associated labor force estimates were greatly affected, and revisions were made back to January 1980 to the extent possible. • June 1985.
The CPS Computer Assisted Telephone Interviewing (CATI) facility was opened at Hagerstown, Maryland. Over the next few years, a series of tests was conducted to identify and resolve the operational issues associated with the use of CATI. Later tests focused on CATI-related issues, such as data quality, costs, and mode effects on labor force estimates. Samples used in these tests were not used as part of the CPS. • April 1987. First CATI cases were used in CPS monthly estimates. Initially, CATI started with 300 cases a month. As operational issues were resolved and new telephone centers were opened—Tucson, Arizona (May 1992) and Jeffersonville, Indiana (September 1994)—the CATI workload was gradually increased to about 9,200 cases a month (January 1995).

2 Hispanics may be of any race.

• June 1990. The first of a series of experiments to test alternative labor force questionnaires was started at the Hagerstown Telephone Center. These tests used random digit dialing and were conducted in 1990 and 1991. • July 1992. The CATI and Computer Assisted Personal Interviewing (CAPI) Overlap (CCO) experiments began. CATI and automated laptop versions of the revised CPS questionnaire were used in a sample of about 12,000 households selected from the National Crime Victimization Survey sample. The experiment continued through December 1993. The CCO ran parallel with the official CPS. The CCO’s main purpose was to gauge the combined effect of the new questionnaire and computer-assisted data collection. It is estimated that the redesign had no statistically significant effect on the total unemployment rate, but it did affect statistics related to unemployment, such as the reasons for unemployment, the duration of unemployment, and the industry and occupational distribution of the unemployed with previous work experience.
It also is estimated that the redesign significantly increased the employment-to-population ratio and the labor force participation rate for women, but significantly decreased the employment-to-population ratio for men. Along with the changes in employment, it was estimated that the redesign significantly influenced the measurement of characteristics related to employment, such as the proportion of the employed working part-time, the proportion working part-time for economic reasons, the number of individuals classified as self-employed, and the industry and occupational distribution of the employed. • January 1994. A new questionnaire designed solely for use in computer-assisted interviewing was introduced in the official CPS. Computerization allowed for the use of a very complex questionnaire without increasing response burden, increased consistency by reducing interviewer error, permitted editing at the time of interview, and allowed for the use of dependent interviewing, whereby information reported in one month (industry/occupation, retired/disabled statuses, and duration of unemployment) was confirmed or updated in subsequent months. Industry and occupation codes from the 1990 census were introduced. Population estimates were converted to the 1990 census base for use in ratio estimation procedures. • April 1994. The 16-month phase-in of the redesigned sample based on the 1990 census began. The primary purpose of this sample redesign was to maintain the efficiency of the sampling frames. Once phased in, this resulted in a monthly sample of 56,000 eligible housing units in 792 sample areas. The details of the 1990 sample redesign are described in Chapter 3. • December 1994. Starting in December 1994, a new set of response categories was phased in for the relationship to reference person. This modification was directed at individuals not formally related to the reference person, to identify unmarried partners in a household.
The old partner/roommate category was deleted and replaced with the following categories: unmarried partner, housemate/roommate, and roomer/boarder. This modification was phased in two rotation groups at a time and was fully in place by March 1995. This change had no effect on the family statistics produced by CPS. • January 1996. The 1990 CPS design was changed because of a funding reduction. The original reliability requirements of the sample were relaxed, allowing a reduction in the national sample size from roughly 56,000 eligible housing units to 50,000 eligible housing units. The reduced CPS national sample contains 754 PSUs. The details of the sample design changes as of January 1996 are described in Appendix H. • January 1998. A new two-step composite estimation method for the CPS was implemented (See Appendix I). The first step involves computation of composite estimates for the main labor force categories, classified by important demographic characteristics. The second adjusts person weights, through a series of ratio adjustments, to agree with the composite estimates, thus incorporating the effect of composite estimation into the person weights. This new technique provides increased operational simplicity for microdata users and improves the accuracy of labor force estimates by using different compositing coefficients for different labor force categories. The weighting adjustment method assures additivity while allowing this variation in compositing coefficients. • July 2001. Effective with the release of July 2001 data, official labor force estimates from the CPS and Local Area Unemployment Statistics (LAUS) program reflect the expansion of the monthly CPS sample from about 50,000 to about 60,000 eligible households. This expansion of the monthly CPS sample was one part of the Census Bureau’s plan to meet the requirements of the State Children’s Health Insurance Program (SCHIP) legislation. 
The SCHIP legislation requires the Census Bureau to improve state estimates of the number of children who live in low-income families and lack health insurance. These estimates are obtained from the Annual Demographic Supplement to the CPS. In September 2000, the Census Bureau began expanding the monthly CPS sample in 31 states and the District of Columbia. States were identified for sample supplementation based on the standard error of their March estimate of low-income children without health insurance. The additional 10,000 households were added to the sample over a 3-month period. The BLS chose not to include the additional households in the official labor force estimates, however, until it had sufficient time to evaluate the estimates from the 60,000 household sample. See Appendix J, Changes to the Current Population Survey Sample in July 2001, for details.

REFERENCES

Eckler, R.A. (1972), U.S. Census Bureau, New York: Praeger.

Executive Office of the President, Office of Management and Budget, Statistical Policy Division (1976), Federal Statistics: Coordination, Standards, Guidelines: 1976, Washington, DC: Government Printing Office.

Chapter 3. Design of the Current Population Survey Sample

(See Appendix H for sample design changes as of January 1996 and Appendix J for changes in July 2001.)

INTRODUCTION

For more than five decades, the Current Population Survey (CPS) has been one of the major sources of up-to-date information on the labor force and demographic characteristics of the U.S. population. Because of the CPS’s importance and high profile, the reliability of the estimates has been evaluated periodically. The design has been under constant and close scrutiny in response to demand for new data and to improve the reliability of the estimates by applying research findings and new types of information (especially census results). All changes are implemented with concern for minimizing cost and maximizing comparability of estimates across time.
A sample redesign takes place after each census. The most recent decennial revision, which incorporated new information from the 1990 census, was in the main complete as of July 1995. Thus, this chapter describes the CPS sample design as of July 1995. In January 1996, the CPS design was again modified because of funding reductions. This time, the redesign was restricted to the most populous states, where the reliability requirements were relaxed compared to earlier years. These changes in the sample are not reflected in the main body of this report; the reader is referred to Appendix H, which provides an up-to-date description of the changes introduced in 1996. Modifications to the CPS design are also occasionally needed to respond to growth of the housing stock. Appendix C discusses how this type of design modification is implemented. Any changes to the CPS design that postdate publication of the present document will be described separately in future appendices. Effective with the release of July 2001 data, official labor force estimates from the CPS and Local Area Unemployment Statistics (LAUS) program reflect the expansion of the monthly CPS sample. The State Children’s Health Insurance Program (SCHIP) legislation requires the Census Bureau to improve state estimates of the number of children who live in low-income families and lack health insurance. These estimates are obtained from the Annual Demographic Supplement to the CPS. States were identified for sample supplementation based on the standard error of their March estimate of low-income children without health insurance. See Appendix J for details. This chapter is directed to a general audience and presents many topics with varying degrees of detail. The following section provides a broad overview of the CPS design and is recommended for all readers.
Later sections of this chapter provide a more in-depth description of the CPS design and are recommended for readers who require greater detail. SURVEY REQUIREMENTS AND DESIGN Survey Requirements The following list briefly describes the major characteristics of the CPS sample as of July 1995: 1. The CPS sample is a probability sample. 2. The sample is designed primarily to produce national and state estimates of labor force characteristics of the civilian noninstitutional population 16 years of age and older (CNP16+). 3. The CPS sample consists of independent samples in each state and the District of Columbia. In other words, each state’s sample is specifically tailored to the demographic and labor market conditions that prevail in that particular state. California and New York State are each further divided into two substate areas that also have independent designs: the Los Angeles-Long Beach metropolitan area and the rest of California, and New York City and the rest of New York State. Since the CPS design consists of independent designs for the states and substate areas, it is said to be state-based. 4. Sample sizes are determined by reliability requirements, which are expressed in terms of the coefficient of variation, or CV. The CV is a relative measure of the sampling error, calculated as the sampling error divided by the expected value of the given characteristic. The specified CV for the monthly unemployment level for the nation, given a 6 percent unemployment rate, is 1.8 percent. The 1.8 percent CV is based on the requirement that a difference of 0.2 percentage point in the unemployment rate for two consecutive months be significant at the 0.10 level. 5.
The specified CV for the monthly unemployment level for 11 states, given a 6 percent unemployment rate, is 8 percent.1 The specified CV for the monthly unemployment level for the California and New York substate areas, given a 6 percent unemployment rate, is 9 percent. This latter specification leads to California and New York State having CVs somewhat less than 8 percent. 6. The required CV on the annual average unemployment level for the other 39 states and the District of Columbia, given a 6 percent unemployment rate, is 8 percent.

1 The 11 states are California, Florida, Illinois, Massachusetts, Michigan, New Jersey, New York, North Carolina, Ohio, Pennsylvania, and Texas.

Overview of Survey Design

The CPS sample is a multistage stratified sample of approximately 56,000 housing units from 792 sample areas designed to measure demographic and labor force characteristics of the civilian noninstitutional population 16 years of age and older. The CPS samples housing units from lists of addresses obtained from the 1990 Decennial Census of Population and Housing. These lists are updated continuously for new housing built after the 1990 census. The first stage of sampling involves dividing the United States into primary sampling units (PSUs) — most of which comprise a metropolitan area, a large county, or a group of smaller counties. Every PSU falls within the boundary of a state. The PSUs are then grouped into strata on the basis of independent information, that is, information obtained from the decennial census or other sources. The strata are constructed so that they are as homogeneous as possible with respect to labor force and other social and economic characteristics that are highly correlated with unemployment. One PSU is sampled per stratum. The probability of selection for each PSU in the stratum is proportional to its population as of the 1990 census. In the second stage of sampling, a sample of housing units within the sample PSUs is drawn.
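The first-stage selection just described draws one PSU per stratum with probability proportional to its 1990 census population (PPS). A minimal sketch of such a cumulative-total PPS draw follows; the PSU names and population figures are invented for illustration, not actual CPS PSUs.

```python
import random

def select_psu_pps(stratum, rng):
    """Draw one PSU from a stratum with probability proportional to
    size (PPS): one uniform draw compared against cumulative totals."""
    total = sum(pop for _, pop in stratum)
    draw = rng.uniform(0, total)
    cumulative = 0.0
    for name, pop in stratum:
        cumulative += pop
        if draw <= cumulative:
            return name
    return stratum[-1][0]  # guard against floating-point edge cases

# Hypothetical NSR stratum of three county-group PSUs and their populations.
stratum = [("PSU A", 120_000), ("PSU B", 90_000), ("PSU C", 60_000)]
# PSU A should be drawn with probability 120,000/270,000, PSU B with
# 90,000/270,000, and PSU C with 60,000/270,000.
print(select_psu_pps(stratum, random.Random(1990)))
```

Over repeated draws the selection frequencies approach the population shares, which is exactly what "probability proportional to its population" requires.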
Ultimate sampling units (USUs) are clusters of about four housing units. The bulk of the USUs sampled in the second stage consists of sets of addresses which are systematically drawn from sorted lists of addresses of housing units prepared as part of the 1990 census. Housing units from blocks with similar demographic composition and geographic proximity are grouped together in the list. In parts of the United States where addresses are not recognizable on the ground, USUs are identified using area sampling techniques. Occasionally, a third stage of sampling is necessary when the actual USU size is extremely large. A final addition to the USUs is a sample of building permits, which compensates for the exclusion of housing built since 1990 from the list of addresses in the 1990 census. Each month, interviewers collect data from the sample housing units. A housing unit is interviewed for 4 consecutive months, dropped from the sample for the next 8 months, and then interviewed again for the following 4 months. In all, a sample housing unit is interviewed eight times. Households are rotated in and out of the sample in a way that improves the accuracy of the month-to-month and year-to-year change estimates. The rotation scheme ensures that in any 1 month, one-eighth of the housing units are interviewed for the first time, another eighth for the second time, and so on. That is, after the first month, 6 of the 8 rotation groups will have been in the survey for the previous month, so there will always be a 75 percent month-to-month overlap. When the system has been in full operation for 1 year, 4 of the 8 rotation groups in any month will have been in the survey for the same month 1 year earlier; there will always be a 50 percent year-to-year overlap. This rotation scheme fully upholds the scientific tenets of probability sampling, so that each month’s sample produces a true representation of the target population.
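The 4-8-4 rotation pattern and the overlap figures quoted above (75 percent month-to-month, 50 percent year-to-year) can be verified with a short sketch; the month numbering is arbitrary and purely illustrative.

```python
IN_SAMPLE = list(range(4)) + list(range(12, 16))  # 4 months in, 8 out, 4 in

def interview_months(entry_month):
    """Months in which a rotation group that entered the sample in
    `entry_month` is interviewed, under the 4-8-4 rotation scheme."""
    return {entry_month + k for k in IN_SAMPLE}

def groups_interviewed(month):
    """Rotation groups (labeled by entry month) interviewed in `month`,
    assuming one new rotation group enters the sample every month."""
    return {e for e in range(month - 15, month + 1)
            if month in interview_months(e)}

m = 40  # any month after the scheme is fully phased in
assert len(groups_interviewed(m)) == 8        # 8 rotation groups each month
month_overlap = groups_interviewed(m) & groups_interviewed(m - 1)
year_overlap = groups_interviewed(m) & groups_interviewed(m - 12)
print(len(month_overlap) / 8, len(year_overlap) / 8)  # 0.75 0.5
```

Of the eight groups in sample in month m, the six whose months-in-sample are 2-4 or 6-8 were also in sample in month m − 1, and the four in their second four-month stint occupy the same months-in-sample that they held a year earlier.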
Also, this rotation scheme is considered better than other candidate schemes. For example, it avoids the undue reporting burden that survey respondents would face if they constituted a permanent panel. The properties of the rotation system also show that it can be used to reduce sampling error through a composite estimation procedure2 and, at slight additional cost, by increasing the representation in the sample of USUs with unusually large numbers of housing units. Each state’s sample design ensures that most housing units within a state have the same overall probability of selection. Because of the state-based nature of the design, sample housing units in different states have different overall probabilities of selection. If we considered only the national level, a more efficient design would result from using the same overall probabilities for all states. Nevertheless, the current system of state-based designs ensures that both the state and national reliability requirements are met. FIRST STAGE OF THE SAMPLE DESIGN The first stage of the CPS sample design is the selection of counties. The purpose of selecting a subset of counties, instead of having all counties in the sample, is to reduce travel costs for the field representatives. The first-stage sampling serves two purposes: (1) to ensure that sample counties represent other counties with similar labor force characteristics that are not selected and (2) to ensure that each field representative is allotted a manageable workload in his/her sample area. The first-stage sample selection is carried out in three major steps: 1. Definition of the PSUs. 2. Stratification of the PSUs within each state. 3. Selection of the sample PSUs in each state.

2 The complete estimation procedure results in significant reduction in the sampling error of estimates of level and change for most items.
This procedure depends on the fact that data from previous months are usually highly correlated with the corresponding estimates for the current month.

Definition of the Primary Sampling Units

PSUs are delineated in such a way that they encompass the entire United States. The land areas within each PSU are made reasonably compact so they can be traversed by an interviewer without incurring unreasonable costs. The population is as heterogeneous with regard to labor force characteristics as can be made consistent with the other constraints. Strata are constructed that are homogeneous in terms of labor force characteristics to minimize between-PSU variance. Between-PSU variance is a component of total variance which arises from selecting a sample of PSUs rather than selecting housing units from all PSUs. In each stratum, a PSU is selected that is representative of the other PSUs in the same stratum. When revisions are made in the sample each decade, a procedure used for reselection of PSUs maximizes the overlap of the sample PSUs with the previous CPS sample (see Appendix A). Most PSUs are groups of contiguous counties rather than single counties. A group of counties is more likely than a single county to have diverse labor force characteristics. Limits are placed on the geographic size of a PSU to hold down the distance a field representative must travel. After some empirical research in the late 1940s to help establish rules, the PSUs were initially established in late 1949 and early 1950. The original definitions were subsequently modified and now conform to the rules listed below. Rules for Defining PSUs 1. PSUs are contained within state boundaries. 2. Metropolitan areas are defined as separate PSUs using projected 1990 Metropolitan Statistical Area (MSA) definitions. (An MSA is defined to be at least one county.) If an MSA straddles state boundaries, each state-MSA intersection is a separate PSU.3 3.
For most states, PSUs are either one county or two or more contiguous counties. For the New England states4 and part of Hawaii, minor civil divisions (towns or townships) define the PSUs. In some states, county equivalents are used: cities, independent of any county organization, in Maryland, Missouri, Nevada, and Virginia; parishes in Louisiana; and boroughs and census divisions in Alaska. 4. The area of the PSU should not exceed 3,000 square miles, except in cases where a single county exceeds the maximum area. 5. The population of the PSU is at least 7,500, except where this would require exceeding the maximum area specified in number 4. 6. In addition to meeting the limitation on total area, PSUs are formed to limit extreme length in any direction and to avoid natural barriers within the PSU.

3 Final MSA definitions were not available from the Office of Management and Budget when PSUs were defined. Fringe counties having a good chance of being in final MSA definitions are separate PSUs. Most projected MSA definitions are the same as final MSA definitions (Executive Office of the President, 1993).

4 The New England states are Rhode Island, Connecticut, Massachusetts, Maine, New Hampshire, and Vermont.

Combining counties into PSUs. The PSU definitions are reviewed each time the CPS sample design is revised. Before 1980, almost all changes in the composition of the PSUs were made to reflect changes in definitions of MSAs. For 1980, revised PSU definitions reflect new MSA definitions and ensure that the PSU definitions were compatible with a state-based sample design. For 1990, revised PSU definitions reflect changes in MSA definitions and make the PSU definitions more consistent with those used by other Census Bureau demographic surveys. The following are the steps for combining counties, county equivalents, and independent cities into PSUs for 1990. 1.
The 1980 PSUs are evaluated by incorporating into the PSU definitions those counties comprising MSAs that are new or have been redefined. 2. Any single county is classified as a separate PSU, regardless of its 1990 population, if it exceeds the maximum area limitation deemed practical for interviewer travel. 3. Other counties within the same state are examined to determine whether they might advantageously be combined with contiguous counties without violating the population and area limitations. 4. Contiguous counties with natural geographic barriers between them are placed in separate PSUs to reduce the cost of travel within PSUs. 5. The proposed combinations are reviewed. Although personal judgment can have no place in the actual selection of sample units (known probabilities of selection can be achieved only through a random selection process), there are a large number of ways in which a given population can be structured and arranged prior to sampling. Personal judgment legitimately plays an important role in devising an optimal arrangement; that is, one designed to minimize the variances of the sample estimates subject to cost constraints. These steps result in 2,007 CPS PSUs in the United States from which to draw a sample. Stratification of Primary Sampling Units The CPS sample design calls for combining PSUs into strata within each state and selecting one PSU from each stratum. For this type of sample design, sampling theory suggests forming strata with approximately equal population sizes. When the design is self-weighting (same sampling fraction in all strata) and one field representative is assigned to each sample PSU, equal stratum sizes also have the advantage of providing equal field representative workloads (at least during the early years of each decade, before population growth and migration significantly affect the PSU population sizes).
The objective of the stratification, therefore, is to group PSUs with similar characteristics into strata having approximately equal 1990 populations. Sampling theory also dictates that highly populated PSUs should be selected for the sample with certainty. The rationale is that some PSUs exceed or come close to the stratum size needed for equalizing stratum sizes. These PSUs are designated as self-representing (SR); that is, each of the SR PSUs is treated as a separate stratum and is included in the sample. The following describes the steps for stratifying PSUs for the 1990 redesign. 1. A PSU is required to be SR if it meets one of the following criteria: a. The PSU belongs to one of the 150 MSAs with the largest populations in the 1990 census, or the PSU contains counties which had a good chance of joining one of these 150 MSAs under final MSA definitions. b. The PSU belongs to an MSA that was SR for the 1980 design and among the 150 largest following the 1980 census. 2. The remaining PSUs are grouped into non-self-representing (NSR) strata within state boundaries by adhering to the following criteria: a. Roughly equal-sized NSR strata are formed within a state. b. NSR strata are formed so as to yield a reasonable field representative workload in an NSR PSU of roughly 45 to 60 housing units. The number of NSR strata in a state is a function of 1990 population, civilian labor force, state CV, and between-PSU variance on the unemployment level. (Workloads in NSR PSUs are constrained because one field representative must canvass the entire PSU. No such constraints are placed on SR PSUs.) c. NSR strata are formed with PSUs homogeneous with respect to labor force and other social and economic characteristics that are highly correlated with unemployment. This helps to minimize the between-PSU variance. d. Stratification is performed independently of previous CPS sample designs. Key variables used for stratification are: • Number of male unemployed.
• Number of female unemployed. • Number of families with female head of household. • Ratio of occupied housing units with three or more persons, of all ages, to total occupied housing units. In addition to these, a number of other variables, such as industry and wage variables obtained from the Bureau of Labor Statistics, are used for some states. The number of stratification variables in a state ranges from 3 to 12. Table 3–1 summarizes the number of SR and NSR strata in each state. (The other columns of the table are discussed in later sections of this chapter.) The algorithm for implementing the NSR stratification criteria for the 1980 and 1990 sample designs is a modified version of the Friedman-Rubin clustering algorithm (Kostanich, 1981). The algorithm consists of three basic steps: hill-climbing, size adjustment, and an exchange pass. For each state, the algorithm identifies a stratification which meets two criteria: (1) all strata are about the same size and (2) the value of the objective function—a scaled total between-PSU variance for all the stratification variables—is relatively small. Each of the algorithm’s three steps assigns slightly different priorities to criteria (1) and (2). Before the start of the first step, the program groups the PSUs within a state into randomly defined strata. The algorithm then ‘‘swaps’’ PSUs between strata to reduce size disparity between the strata or to decrease the value of the objective function. The hill-climbing procedure moves PSUs from stratum to stratum, subject to loose size constraints, in order to minimize the between-PSU variance for stratification variables.5 The size adjustment tightens size constraints and adjusts stratum sizes by making moves that lead to the smallest increases in between-PSU variance. With tight size constraints, the exchange pass seeks to further reduce between-PSU variance by exchanging PSUs between strata.
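A greatly simplified sketch of the hill-climbing step, assuming a single stratification variable (an unemployment rate) and invented PSU data; the production algorithm works with multiple scaled variables, a true between-PSU variance objective, and workload constraints.

```python
from itertools import product
from statistics import pvariance

# Hypothetical PSUs: (population in thousands, unemployment rate).
psus = [(50, 0.04), (48, 0.09), (52, 0.05), (47, 0.08), (51, 0.06), (49, 0.07)]

def objective(assign, k):
    """Crude proxy for total between-PSU variance: the population-weighted
    variance of the stratification variable within each stratum."""
    total = 0.0
    for s in range(k):
        members = [psus[i] for i, a in enumerate(assign) if a == s]
        if len(members) > 1:
            total += sum(p for p, _ in members) * pvariance([r for _, r in members])
    return total

def hill_climb(assign, k):
    """Move single PSUs between strata whenever the move strictly lowers
    the objective, keeping at least two PSUs per stratum (a loose stand-in
    for the algorithm's size constraints)."""
    improved = True
    while improved:
        improved = False
        for i, s in product(range(len(psus)), range(k)):
            trial = list(assign)
            trial[i] = s
            if all(trial.count(t) >= 2 for t in range(k)) and \
                    objective(trial, k) < objective(assign, k):
                assign, improved = trial, True
    return assign

start = [0, 0, 0, 1, 1, 1]        # a crude initial two-stratum split
final = hill_climb(start, 2)
print(final, round(objective(final, 2), 4))
```

Because each accepted move strictly decreases the objective and the number of assignments is finite, the loop terminates; here it separates the low-rate PSUs from the high-rate PSUs, mimicking how homogeneous strata reduce between-PSU variance.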
The algorithm is run several times, allowing stratum sizes to vary by differing degrees. A final stratification is chosen which minimizes, to the extent possible, variability in stratum workloads and total between-PSU variance for all stratification variables for the state. If a stratification results in an NSR PSU being placed in a stratum by itself, that PSU becomes SR. After the strata are defined, some state sample sizes are adjusted to bring the national CV for unemployment level down to 1.8 percent, assuming a 6 percent unemployment rate. (The stratification procedure for Alaska takes into account expected interview cost and between-PSU variance (Ludington, 1992).)

5 Between-PSU variance is the component of total variance arising from selecting a sample of PSUs from all possible PSUs. For this stratification process, the between-PSU variance was calculated using 1990 census data for the stratification variables.

A consequence of the above stratification criteria is that states that are geographically small, mostly urban, or demographically homogeneous are entirely SR. These states are Connecticut, Delaware, Massachusetts, New Hampshire, New Jersey, Rhode Island, Vermont, and the District of Columbia.

Selection of Sample Primary Sampling Units

Each SR PSU is in the sample by definition. As shown in Table 3–1, there are 432 SR PSUs. In each of the remaining 360 NSR strata, one PSU is selected for the sample following the guidelines described next. At each sample redesign of the CPS, it is important to minimize the cost of introducing a new set of PSUs. Substantial investment has been made in the hiring and training of field representatives in the existing sample PSUs. For each PSU dropped from the sample and replaced by another in the new sample, the expense of hiring and training a new field representative must be accepted. Furthermore, there is a temporary loss in accuracy of the results produced by new and relatively inexperienced field representatives.
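The CV requirement above drives the sampling-interval computation developed later in this chapter, where the overall state sampling interval is solved from a CV target. The sketch below evaluates that relation with purely hypothetical state values (the CNP16+, unemployment proportion, design effect, and between-PSU variance are all invented for illustration):

```python
def state_sampling_interval(cv, x, q, deff, b2):
    """Overall state sampling interval implied by a CV target:
    SI = (CV^2 * x^2 - b2) / (x * q * deff), where x is the expected
    state unemployment level, q = 1 - p (p = proportion unemployed in
    the CNP16+), deff is the within-PSU design effect, and b2 is the
    between-PSU variance contribution."""
    return (cv**2 * x**2 - b2) / (x * q * deff)

# Hypothetical state: CNP16+ of 3,000,000 with p = 0.04 unemployed
# (so x = 120,000), a design effect of 1.3, and an assumed
# between-PSU variance of 4.0e6; a 6 percent state CV target.
si = state_sampling_interval(cv=0.06, x=120_000, q=0.96, deff=1.3, b2=4.0e6)
```

The resulting interval (roughly 320, i.e., about 1 housing unit in 320 selected) is of the same order as the intervals shown in Table 3–1; a larger between-PSU variance or tighter CV target would shrink the interval and enlarge the state sample.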
Concern for these factors is reflected in the procedure used for selecting PSUs.

Objectives of the selection procedure. The selection of the sample of NSR PSUs is carried out within the strata using the 1990 population. The selection procedure accomplishes the following objectives:

1. Select one sample PSU from each stratum with probability proportional to the 1990 population.

2. Retain in the new sample the maximum number of sample PSUs from the 1980 design sample.

Using the Maximum Overlap procedure described in Appendix A, one PSU is selected per stratum with probability proportional to its 1990 population. This procedure uses mathematical programming techniques to maximize the probability of selecting PSUs that are already in sample while maintaining the correct overall probabilities of selection.

Calculation of overall state sampling interval. After stratifying the PSUs within the states, the overall sampling interval in each state is computed. The overall state sampling interval is the inverse of the probability of selection of each housing unit in a state for a self-weighting design. By design, the overall state sampling interval is fixed, but the state sample size is not fixed, allowing growth of the CPS sample because of housing units built after the 1990 census. (See Appendix C for details on how the desired sample size is maintained.) The state sampling interval is designed to meet the requirements for the variance on an estimate of the unemployment level. This variance can be thought of as a sum of variances from the first stage and the second stage of sample selection.6 The first-stage variance is called the between-PSU variance and the second-stage variance is called the within-PSU variance.

The square of the state CV, or the relative variance, on the unemployment level is expressed as

CV² = (b² + w²) / [E(x)]²   (3.1)

where

b² = between-PSU variance contribution to the variance of the state unemployment level estimator.
w² = within-PSU variance contribution to the variance of the state unemployment level estimator.
E(x) = the expected value of the unemployment level for the state.

The term w² can be written as the variance assuming a binomial distribution from a simple random sample multiplied by a design effect

w² = (N²pq / n)(deff)

where

N = the civilian noninstitutional population, 16 years of age and older (CNP16+), for the state.
p = proportion of unemployed in the CNP16+ for the state, or x/N.
q = 1 − p.
n = the state sample size.
deff = the state within-PSU design effect. This is a factor accounting for the difference between the variance calculated from a multistage stratified sample and that from a simple random sample.

This formula can be rewritten as

w² = (SI)(x)(q)(deff)   (3.2)

where

SI = the state sampling interval, or N/n.

Substituting (3.2) into (3.1) and rewriting in terms of the state sampling interval gives

SI = (CV²x² − b²) / (x q deff)

6 The variance of an estimator, u, based on a two-stage sample has the general form

Var(u) = Var_I E_II(u | set of sample PSUs) + E_I Var_II(u | set of sample PSUs)

where I and II represent the first- and second-stage designs, respectively. The left term represents the between-PSU variance, b². The right term represents the within-PSU variance, w².

Table 3–1. Number and Estimated Population of Strata for 792-PSU Design by State

                                Self-representing (SR)         Nonself-representing (NSR)
State                           Strata   Est. population¹      Strata   Est. population¹      Overall sampling interval

Total                           432      141,876,295           360      45,970,646            2,060
Alabama                         4        1,441,118             10       1,604,901             2,298
Alaska                          5        287,192               5        91,906                336
Arizona                         3        2,188,880             4        544,402               2,016
Arkansas                        4        612,369               13       1,151,315             1,316
California                      21       20,866,697            5        1,405,500             2,700
  Los Angeles                   1        6,672,035             0        0                     1,691
  Remainder of California       20       14,194,662            5        1,405,500             3,132
Colorado                        5        1,839,521             5        629,640               1,992
Connecticut                     20       2,561,010             0        0                     2,307
Delaware                        3        509,330               0        0                     505
District of Columbia            1        486,083               0        0                     356
Florida                         20       9,096,732             8        1,073,818             2,176
Georgia                         9        2,783,587             9        2,038,631             3,077
Hawaii                          3        778,364               1        49,720                769
Idaho                           8        416,284               8        302,256               590
Illinois                        11       6,806,874             12       1,821,548             1,810
Indiana                         9        2,215,745             8        1,949,996             3,132
Iowa                            5        710,910               11       1,372,871             1,582
Kansas                          3        920,819               9        906,626               1,423
Kentucky                        9        1,230,062             9        1,546,770             2,089
Louisiana                       7        1,638,220             9        1,403,969             2,143
Maine                           8        685,937               4        247,682               838
Maryland                        4        3,310,546             2        352,766               3,061
Massachusetts                   31       4,719,188             0        0                     954
Michigan                        12       5,643,817             11       1,345,074             1,396
Minnesota                       4        2,137,810             7        1,119,797             2,437
Mississippi                     6        540,351               14       1,334,898             1,433
Missouri                        7        2,394,137             6        1,457,616             3,132
Montana                         8        366,320               8        221,287               431
Nebraska                        2        557,203               9        608,513               910
Nevada                          4        819,424               2        98,933                961
New Hampshire                   13       846,029               0        0                     857
New Jersey                      11       6,023,359             0        0                     1,221
New Mexico                      6        685,543               8        410,929               867
New York                        13       12,485,029            7        1,414,548             1,709
  New York City                 1        5,721,495             0        0                     1,159
  Remainder of New York         12       6,763,534             7        1,414,548             2,093
North Carolina                  14       3,089,176             23       1,973,074             1,095
North Dakota                    4        232,865               9        235,364               363
Ohio                            13       6,238,600             13       1,958,138             1,653
Oklahoma                        2        1,244,920             11       1,092,513             1,548
Oregon                          4        1,396,261             6        761,794               1,904
Pennsylvania                    14       7,704,963             11       1,507,946             1,757
Rhode Island                    5        784,090               0        0                     687
South Carolina                  7        1,434,621             7        1,161,298             2,291
South Dakota                    5        211,146               11       291,546               376
Tennessee                       8        2,502,671             6        1,219,915             3,016
Texas                           22       8,779,997             20       3,613,170             2,658
Utah                            2        888,524               3        251,746               958
Vermont                         11       428,263               0        0                     410
Virginia                        12       3,093,947             6        1,612,667             3,084
Washington                      5        2,437,454             6        1,216,557             2,999
West Virginia                   10       803,797               9        581,544               896
Wisconsin                       7        1,773,109             9        1,889,141             2,638
Wyoming                         8        227,401               6        98,321                253

¹ Estimate of civilian noninstitutional population 16 years of age and older based on preliminary 1990 census counts.

Generally, this overall state sampling interval is used for all strata in a state, yielding a self-weighting state design. (In some states, the sampling interval is adjusted in certain strata to equalize field representative workloads.) When computing the sampling interval for the current CPS sample, a 6 percent state unemployment rate is assumed for 1995. The results are given in Table 3–1.

SECOND STAGE OF THE SAMPLE DESIGN

The second stage of the CPS sample design is the selection of sample housing units within PSUs. The objectives of within-PSU sampling are to:

1. Select a probability sample that is representative of the total civilian noninstitutional population.

2. Give each housing unit in the population one and only one chance of selection, with virtually all housing units in a state having the same overall chance of selection.

3. For the sample size used, keep the within-PSU variance on labor force statistics (in particular, unemployment) at as low a level as possible, subject to response burden, costs, and other constraints.

4. Select enough within-PSU sample for additional samples that will be needed before the next decennial census.

5. Put particular emphasis on providing reliable estimates of monthly levels and change over time of labor force items.

To accomplish the objectives of within-PSU sampling, extensive use is made of data from the 1990 Decennial Census of Population and Housing and the Building Permit Survey. The 1990 census collected information on all living quarters existing as of April 1, 1990, including characteristics of living quarters as well as the demographic composition of persons residing in these living quarters. Data on the economic well-being and labor force status of individuals were solicited for about 1 in 6 housing units. However, since the census does not cover housing units constructed since April 1, 1990, a sample of building permits issued in 1990 and later is used to supplement the census data. These data are collected via the Building Permit Survey, which is an ongoing survey conducted by the Census Bureau.

Therefore, a list sample of census addresses, supplemented by a sample of building permits, is used in most of the United States. However, where city-type street addresses from the 1990 census do not exist, or where residential construction does not require building permits, area samples are sometimes necessary. (See the next section for more detail on the development of the sampling frames.)

These sources provide sampling information for numerous demographic surveys conducted by the Census Bureau.7 In consideration of respondents, sampling methodologies are coordinated among these surveys to ensure a sampled housing unit is selected for one survey only. Consistent definition of sampling frames allows for the development of separate, optimal sampling schemes for each survey. The general strategy for each survey is to sort and stratify all the elements in the sampling frame (eligible and not eligible) to satisfy individual survey requirements, select a systematic sample, and remove the selected sample from the frame. Sample is selected for the next survey from what remains. Procedures are developed to determine eligibility of sample cases at the time of interview for each survey. This coordinated sampling approach is computer intensive and was not possible in previous redesigns.8

7 CPS sample selection was coordinated with the following demographic surveys in the 1990 redesign: the American Housing Survey - Metropolitan sample, the American Housing Survey - National sample, the Consumer Expenditure Survey - Diary sample, the Consumer Expenditure Survey - Quarterly sample, the Current Point of Purchase Survey, the National Crime Victimization Survey, the National Health Interview Survey, the Rent and Property Tax Survey, and the Survey of Income and Program Participation.

8 This sampling strategy is unbiased because if a random selection is removed from a frame, the part of the frame that remains is a random subset. Also, the sample elements selected and removed from each frame for a particular survey have similar characteristics as the elements remaining in the frame.

USUs are the sample units selected during the second stage of the CPS sample design. As discussed earlier in this chapter, most USUs consist of a geographically compact cluster of approximately four addresses, corresponding to four housing units at the time of the census. Use of housing unit clusters lowers travel costs for field representatives. Clustering slightly increases within-PSU variance of estimates for some labor force characteristics, since respondents within a compact cluster tend to have similar labor force characteristics.

Development of Sampling Frames

Overview of Sampling Sources

Results from the 1990 census, the Building Permit Survey, and the relationship between these two sources are used in developing sampling frames. Four frames are created: the unit frame, the area frame, the group quarters frame, and the permit frame. The unit, area, and group quarters frames are collectively called old construction. To describe frame development methodology, several terms must be defined.

Two types of living quarters were defined for the census. The first type is a housing unit. A housing unit is a group of rooms or a single room occupied as a separate living quarter or intended for occupancy as a separate living quarter. A separate living quarter is one in which the occupants live and eat separately from all other persons on the property and have direct access to their living quarter from the outside or through a common hall or lobby, as found in apartment buildings. A housing unit may be occupied by a family or one person, as well as by two or more unrelated persons who share the living quarter. About 98 percent of the population counted in the 1990 census resided in housing units.

The second type of living quarter is a group quarters. A group quarters is a living quarter where residents share common facilities or receive formally authorized care. Examples include college dormitories, retirement homes, and communes. For some group quarters, such as fraternity and sorority houses and certain types of group houses, a group quarters is distinguished from a housing unit if it houses ten or more unrelated people. The group quarters population is classified as institutional or noninstitutional and as military or civilian. CPS targets only the civilian noninstitutional population residing in group quarters. Military and institutional group quarters are included in the group quarters frame and given a chance of selection in case of conversion to civilian noninstitutional housing by the time they are scheduled for interview. Less than 2 percent of the population counted in the 1990 census resided in group quarters.

Old Construction Frames

Old construction consists of three sampling frames: unit, area, and group quarters.
The primary objectives in constructing the three sampling frames are maximizing the use of census information to reduce variance of estimates, ensuring adequate coverage, and minimizing cost. The sampling frames used in a particular geographic area take into account three major address features:

1. Type of living quarters — housing units or group quarters.

2. Completeness of addresses — complete or incomplete.

3. Building permit office coverage — covered or not covered.

An address is considered complete if it describes a specific location; otherwise, the address is considered incomplete. (When 1990 census addresses cannot be used to locate sample units, area listings must be performed in those areas before sample units can be selected for interview. See Chapter 4 for more detail.) Examples of complete addresses are city-delivery mailing addresses composed of a house number, street name, and possibly a unit designation, such as ''1599 Main Street'' or ''234 Elm Street, Apartment 601.'' Examples of incomplete addresses are addresses composed of postal delivery information that does not indicate a specific location, such as ''PO Box 123'' or ''Box 4'' on a rural route.

Housing units in complete blocks covered by building permit offices are assigned to the unit frame. Group quarters in complete blocks covered by building permit offices are assigned to the group quarters frame. Other blocks are assigned to the area frame.

Unit frame. The unit frame consists of housing units in census blocks that contain a very high proportion of complete addresses and are essentially covered by building permit offices. The unit frame covers most of the population. (Although building permit offices cover nearly all blocks in the unit frame, a few exceptions may slightly compromise CPS coverage of the target population; see Chapter 16.) A USU in the unit frame consists of a compact cluster of four addresses, which are identified during sample selection.
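The assignment rules above can be sketched as a simple decision function. This is schematic only: the boolean flags and string labels below are assumptions for illustration, while the production classification works from more detailed block-level census attributes.

```python
def assign_frame(living_quarters, addresses_complete, permit_covered):
    """Assign a block's living quarters to an old-construction frame
    using the three address features described in the text.

    living_quarters    -- "housing unit" or "group quarters"
    addresses_complete -- True if the block's addresses describe a
                          specific location (e.g. "1599 Main Street")
    permit_covered     -- True if a building permit office covers the block
    """
    if addresses_complete and permit_covered:
        # Complete, permit-covered blocks split by type of living quarters.
        if living_quarters == "housing unit":
            return "unit frame"
        return "group quarters frame"
    # Incomplete addresses or no permit coverage: area frame.
    return "area frame"
```

For example, a permit-covered block of complete housing-unit addresses lands in the unit frame, while the same block without permit coverage lands in the (nonpermit) area frame.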
The addresses, in most cases, are those for separate housing units. However, over time some buildings may be demolished or converted to nonresidential use, and others may be split up into several housing units. These addresses remain sample units, resulting in a small variability in cluster size. Also, USUs usually cover neighboring housing units, though occasionally they are dispersed across a neighborhood, resulting in a USU with housing units from different blocks.

Area frame. The area frame consists of housing units and group quarters in census blocks that contain a high proportion of incomplete addresses, or are not covered by building permit offices. A CPS USU in the area frame also consists of about four housing unit equivalents, except in some areas of Alaska that are difficult to access, where a USU is eight housing unit equivalents. The area frame is converted into groups of four housing unit equivalents called ''measures'' because the census addresses of individual housing units or persons within a group quarters are not used in the sampling. An integer number of area measures is calculated at the census block level. The number is referred to as the area block measure of size (MOS) and is calculated as follows:

area block MOS = H/4 + [GQ block MOS]   (3.3)

where

H = the number of housing units enumerated in the block for the 1990 census.
GQ block MOS = the integer number of group quarters measures in a block (see equation 3.4).

The first term of equation (3.3) is rounded to the nearest nonzero integer. When the fractional part is 0.5 and the term is greater than 1, it is rounded to the nearest even integer. Sometimes census blocks are combined with geographically nearby blocks before the area block MOS is calculated. This is done to ensure that newly constructed units have a chance of selection in blocks that had no housing units or group quarters at the time of the census and that are not covered by a building permit office.
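The rounding rule for the first term of equation (3.3) (nearest nonzero integer, with halves above 1 rounded to the nearest even integer) can be sketched as follows; this is one illustrative reading of the rule in the text, not the production code:

```python
import math

def round_mos_term(term):
    """Round H/4 per the text: nearest nonzero integer; when the
    fractional part is exactly 0.5 and the term is greater than 1,
    round to the nearest even integer."""
    frac = term - math.floor(term)
    if frac == 0.5:
        if term > 1:
            floor = math.floor(term)  # e.g. 1.5 -> 2, 2.5 -> 2, 3.5 -> 4
            return floor if floor % 2 == 0 else floor + 1
        return 1  # term of exactly 0.5: nearest nonzero integer is 1
    # Ordinary rounding otherwise, but never down to zero.
    return max(math.floor(term + 0.5), 1)

def area_block_mos(h, gq_block_mos=0):
    """Equation (3.3): rounded H/4 plus the integer GQ block MOS."""
    return round_mos_term(h / 4) + gq_block_mos
```

So a block of 6 housing units yields 2 measures (1.5 rounds to even 2), a block of 10 units also yields 2 (2.5 rounds to even 2), and a one-unit block still yields 1 measure rather than 0.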
This also reduces the sampling variability caused by USU size differing from four housing unit equivalents in small blocks with fewer than four housing units. Depending on whether or not a block is covered by a building permit office, area frame blocks are classified as area permit or area nonpermit. No distinction is made between area permit and area nonpermit blocks during sampling. Field procedures are developed to ensure proper coverage of housing units built after the 1990 census in the area blocks: they (1) prevent these housing units from having a chance of selection in area permit blocks and (2) give these housing units a chance of selection in area nonpermit blocks. These field procedures also help keep USU size constant as the number of housing units in the block increases because of new construction.

Group quarters frame. The group quarters frame consists of group quarters in census blocks that contain a sufficient proportion of complete addresses and are essentially covered by building permit offices. Although nearly all such blocks are covered by building permit offices, some are not, which may result in minor undercoverage. The group quarters frame covers a small proportion of the population. A CPS USU in the group quarters frame consists of four housing unit equivalents. The group quarters frame, like the area frame, is converted into housing unit equivalents because 1990 census addresses of individual group quarters or persons within a group quarters are not used in the sampling. The number of housing unit equivalents is computed by dividing the 1990 census group quarters population by the average number of persons per household (calculated from the 1990 census as 2.63). An integer number of group quarters measures is calculated at the census block level.
The number of group quarters measures is referred to as the GQ block MOS and is calculated as follows:

GQ block MOS = NIGQPOP / ((4)(2.63)) + MIL + IGQ   (3.4)

where

NIGQPOP = the noninstitutional group quarters population in the block from the 1990 census.
MIL = the number of military barracks in the block from the 1990 census.
IGQ = 1 if one or more institutional group quarters are in the block from the 1990 census, or 0 if there are none.

The first term of equation (3.4) is rounded to the nearest nonzero integer. When the fractional part is 0.5 and the term is greater than 1, it is rounded to the nearest even integer. Only the civilian noninstitutional population is interviewed for CPS. Military barracks and institutional group quarters are given a chance of selection in case group quarters convert status over the decade. A military barrack or institutional group quarters is equivalent to one measure regardless of the number of people counted there in the 1990 census.

Special situations in old construction. During development of the old construction frames, several situations are given special treatment. Military and national park blocks are treated as if covered by a building permit office to increase the likelihood of their being in the unit or group quarters frames and thereby minimize costs. Blocks in American Indian Reservations are treated as if not covered by a building permit office and are put in the area frame to improve coverage. To improve coverage of newly constructed college housing, special procedures are used so that blocks with existing college housing and small neighboring blocks are in the area frame. Blocks in Ohio covered by building permit offices that issue permits for only certain types of structures are treated as area nonpermit blocks.
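A hypothetical evaluation of equation (3.4), reusing the same rounding rule the text specifies for equation (3.3); all block counts below are invented for illustration:

```python
import math

def round_nonzero_half_even(term):
    """Nearest nonzero integer; halves above 1 round to the nearest even."""
    frac = term - math.floor(term)
    if frac == 0.5:
        if term > 1:
            floor = math.floor(term)
            return floor if floor % 2 == 0 else floor + 1
        return 1
    return max(math.floor(term + 0.5), 1)

def gq_block_mos(nigqpop, mil_barracks, has_institutional_gq):
    """Equation (3.4): noninstitutional GQ population converted to
    four-housing-unit-equivalent measures (2.63 persons per household),
    plus one measure per military barrack and one measure if any
    institutional group quarters exist in the block."""
    first_term = nigqpop / (4 * 2.63)
    return (round_nonzero_half_even(first_term)
            + mil_barracks
            + (1 if has_institutional_gq else 0))

# Hypothetical block: a 40-person dormitory (40 / 10.52 = 3.8 -> 4
# measures), no barracks, and one nursing home -> 4 + 0 + 1 = 5 measures.
```

Note how a tiny noninstitutional population (say, 3 residents) still yields one measure, reflecting the nearest-nonzero-integer rule.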
Two examples of blocks excluded from the sampling frames are blocks consisting entirely of docked maritime vessels where crews reside and street locations where only homeless people were enumerated in the 1990 census.

The Permit Frame

Permit frame sampling ensures coverage of housing units built since the 1990 census. The permit frame grows as building permits are issued during the decade. Data collected by the Building Permit Survey are used to update the permit frame monthly. About 92 percent of the population lives in areas covered by building permit offices. Housing units built since the 1990 census in areas of the United States not covered by building permit offices have a chance of selection in the nonpermit portion of the area frame. Group quarters built since the 1990 census are generally not covered in the permit frame, although the area frame does pick up new group quarters. (This minor undercoverage is discussed in Chapter 16.) A permit measure, which is equivalent to a CPS USU, is formed within a permit date and a building permit office, resulting in a cluster containing an expected four newly built housing units. The integer number of permit measures is referred to as the BPOMOS and is calculated as follows:

BPOMOS_t = HP_t / 4   (3.5)

where

HP_t = the total number of housing units for which the building permit office issues permits during a time period t, normally a month; for example, a building permit office might issue 2 permits for a total of 24 housing units to be built in month t.

The BPOMOS for time period t is rounded to the nearest integer, except when it is nonzero and less than 1, in which case it is rounded to 1. Permit cluster size varies according to the number of housing units for which permits are actually issued. Also, the number of housing units for which permits are issued may differ from the number of housing units that actually get built. When developing the permit frame, an attempt is made to ensure inclusion of all new housing units constructed after the 1990 census.
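Equation (3.5) and its rounding rule (nearest integer, but never rounding a nonzero permit count down to zero measures) can be sketched as follows; ordinary rounding is assumed here for the general case, since the text states no half-to-even rule for this equation:

```python
def bpo_mos(hp):
    """Equation (3.5): permit measures for one building permit office in
    one time period; hp is the number of housing units permitted then.
    Rounded to the nearest integer, except a nonzero value below 1 is
    rounded up to 1 so the permitted units keep a chance of selection."""
    term = hp / 4
    if 0 < term < 1:
        return 1
    return int(term + 0.5)  # ordinary nearest-integer rounding (assumed)

# The example in the text: 2 permits totaling 24 housing units in a
# month give 24 / 4 = 6 permit measures for that office and month.
```

A month with only 2 permitted units (0.5 measures) still contributes one measure, while a month with no permits contributes none.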
To do this, housing units for which building permits had been issued but which had not yet been constructed by the time of the census should be included in the permit frame. However, by including permits issued prior to the 1990 census in the permit frame, there is a risk that some of these units will have been built by the time of the census and thus included in the old construction frames. These units would then have two chances of selection in the CPS: one in the permit frame and one in the old construction frames. For this reason, permits issued too long before the census should not be included in the permit frame. However, excluding permits issued long before the census brings the risk of excluding units for which permits were issued but which had not yet been constructed by the time of the census. Such units would have no chance of selection in the CPS, since they would not be included in either the permit or old construction frames. In developing the permit frame, an attempt is made to strike a reasonable balance between these two problems.

Summary of Sampling Frames

Before providing a summary of the various sampling frames, an exception is noted. Census blocks containing sample selected by the National Health Interview Survey (NHIS) were also included in the area frame to ensure a housing unit was in sample for only one demographic survey. That is, any sample (both housing units and group quarters) selected for the NHIS was transferred to the area frame. Therefore, some group quarters were in both the area and group quarters frames. The NHIS had an all-area-frame sample design because it was not conducted under Title 13 and thus was prohibited from selecting a sample of 1990 census addresses. Table 3–2 summarizes the features of the sampling frames and CPS USU size discussed above. Roughly 65 percent of the CPS sample is from the unit frame, 30 percent is from the area frame, and 1 percent is from the group quarters frame.
In addition, about 5 percent of the sample is initially from the permit frame. The permit frame has grown, historically, about 1 percent a year. Optimal cluster size or USU composition differs for the demographic surveys. The unit frame allows each survey a choice of cluster size. For the area, group quarters, and permit frames, MOS must be defined consistently for all demographic surveys.

Table 3–2. Summary of Sampling Frames

Frame                   Typical characteristics of frame                      CPS USU

Unit frame              High percentage of complete addresses in areas        Compact cluster of four addresses
                        covered by a building permit office

Group quarters frame    High percentage of complete addresses in areas        Measure containing group quarters of four
                        covered by a building permit office                   expected housing unit equivalents

Area frame
  Area permit           Many incomplete addresses in areas covered by a       Measure containing housing units and group
                        building permit office                                quarters of four expected housing unit
  Area nonpermit        Not covered by a building permit office               equivalents

Permit frame            Housing units built since the 1990 census in areas    Cluster of four expected housing units
                        covered by a building permit office

Selection of Sample Units

The CPS sample is designed to be self-weighting by state or substate area. A systematic sample is selected from each PSU at a sampling rate of 1 in k, where k is the within-PSU sampling interval, which is equal to the product of the PSU probability of selection and the stratum sampling interval. The stratum sampling interval is usually the overall state sampling interval. (See the earlier section in this chapter, ''Calculation of overall state sampling interval.'') The first stage of selection is conducted independently for each demographic survey involved in the 1990 redesign. Sample PSUs overlap across surveys and have different sampling intervals.
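The within-PSU interval follows directly from the two quantities named above, and multiplying through shows why the design is self-weighting. The numbers below (a stratum interval of 2,000 and a PSU selected with probability 0.25) are hypothetical:

```python
def within_psu_interval(stratum_interval, psu_selection_prob):
    """k = PSU probability of selection x stratum sampling interval.
    Then the overall selection probability of a housing unit is
    psu_selection_prob * (1 / k) = 1 / stratum_interval, i.e. every
    housing unit in the state has the same overall chance."""
    return stratum_interval * psu_selection_prob

k = within_psu_interval(2_000, 0.25)  # hypothetical values -> k = 500
overall_prob = 0.25 * (1 / k)         # = 1 / 2,000, the state interval
```

A PSU selected with a small probability thus samples its housing units more densely (smaller k), exactly offsetting its lower first-stage chance.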
To ensure housing units are selected for only one survey, the largest common geographic areas obtained when intersecting each survey’s sample PSUs are identified. These intersecting areas, as well as the residual areas of those PSUs, are called basic PSU components (BPCs). A CPS stratification PSU consists of one or more BPCs. For each survey, a within-PSU sample is selected from each frame within BPCs. However, sampling by BPCs is not an additional stage of selection. After combining sample from all frames for all BPCs in a PSU, the resulting within-PSU sample is representative of the entire civilian, noninstitutional population of the PSU. When CPS is not the first survey to select a sample in a BPC, the CPS within-PSU sampling interval is decreased to maintain the expected CPS sample size after other surveys have removed sampled USUs. When a BPC does not include enough sample to support all surveys present in the BPC for the decade, each survey proportionally reduces its expected sample size for the BPC. This adjustment means a state is no longer self-weighting, but the adjustment is rarely needed.

CPS sample is selected separately within each sampling frame. Since sample is selected at a constant overall rate, the percentage of sample selected from each frame is proportional to population size. Although the procedure is the same for all sampling frames, old construction sample selection is performed once for the decade, while permit frame sample selection is an ongoing process each month throughout the decade.

Within-PSU Sort

Units or measures are arranged within sampling frames based on characteristics of the 1990 census and geography. Sorting minimizes within-PSU variance of estimates by grouping together units or measures with similar characteristics. The 1990 census data and geography are used to sort blocks and units. (Sorting is done within BPCs since sampling is performed within BPCs.)
The unit frame is sorted on block-level characteristics, keeping housing units in each block together, and then by a housing unit identification to sort the housing units geographically. Sorts are different for each frame and are provided in Tables 3–3a and 3–3b.

General Sampling Procedure

The CPS sampling is a one-time operation that involves selecting enough sample for the decade. To accommodate the CPS rotation system and the phasing in of new sample designs, 19 samples are selected. A systematic sample of USUs is selected and 18 adjacent sample USUs identified. The group of 19 sample USUs is known as a hit string. Because of the sorting variables, persons residing in USUs within a hit string are likely to have similar labor force characteristics. The within-PSU sample selection is performed independently by BPC and frame. Four dependent random numbers (one per frame) between 0 and 1 are calculated for each BPC within a PSU.⁹ Random numbers are used to calculate random starts. Random starts determine the first sampled USU in a BPC for each frame. The method used to select systematic samples of hit strings of USUs within each BPC and sampling frame follows:

1. Units or measures within the census blocks are sorted using the within-PSU sort criteria specified in Tables 3–3a and 3–3b.

2. Each successive USU not selected by another survey is assigned an index number 1 through N.

3. A random start (RS) for the BPC/frame is calculated. RS is the product of the dependent random number and the adjusted within-PSU sampling interval (SIw).

4. Sampling sequence numbers are calculated. Given N USUs, the sequence numbers are RS, RS + 1(SIw), RS + 2(SIw), ..., RS + n(SIw), where n is the largest integer such that RS + n(SIw) ≤ N. Sequence numbers are rounded up to the next integer. Each rounded sequence number represents the first unit or measure designating the beginning of a hit string.

5. Sequence numbers are compared to the index numbers assigned to USUs.
Hit strings are assigned to sequence numbers. The USU with the index number matching the sequence number is selected as the first sample. The 18 USUs that follow the sequence number are selected as the next 18 samples. This method may yield hit strings with fewer than 19 samples (called incomplete hit strings) at the beginning or end of BPCs.¹⁰ Allowing incomplete hit strings ensures that each USU has the same probability of selection.

6. A sample designation uniquely identifying 1 of the 19 samples is assigned to each USU in a hit string. For the 1990 design, sample designations A62 through A80 are assigned sequentially to the hit string. A62 is assigned to the first sample; A63 to the second sample; and assignment continues through A80 for the nineteenth sample. A sample designation suffix, A or B, is assigned in areas of Alaska that are difficult to access (in which USUs consist of eight housing unit equivalents).

⁹ Random numbers are evenly distributed by frame within BPC and by BPC within PSU to minimize variability of sample size.
¹⁰ When RS + I > SIw, an incomplete hit string occurs at the beginning of a BPC. When (RS + I) + n(SIw) > N, an incomplete hit string occurs at the end of a BPC (I = 1 to 18).

Table 3–3a. Old Construction Within-PSU Sorts

  Unit and area frames, urban PSUs:
    1. Block CBUR classification¹
    2. Proportion of minority renter-occupied housing units to all occupied housing units in the block
    3. Proportion of owner-occupied housing units to total population age 16 years and over
    4. Proportion of housing units to population age 16 years and over in the block
    5. Proportion of households headed by females to all households in the block
    6. County code
    7. Tract number
    8. Combined block number (area frame only)
    9. Housing unit ID (unit frame only)

  Unit and area frames, rural PSUs:
    1. Block CBUR classification
    2. Proportion of households headed by females to all households in the block
    3. Proportion of minority population to total population in the block
    4. Proportion of population age 65 and over to total population in the block
    5. Proportion of Black renter-occupied housing units to total Black population in the block
    6. County code
    7. Tract number
    8. Combined block number (area frame only)
    9. Housing unit ID (unit frame only)

  Group quarters frame, all PSUs:
    1. District Office
    2. Address Register Area
    3. County code
    4. MCD/CCD code
    5. Block CBUR classification
    6. Block number

  ¹ A census block is classified as C, B, U, or R, which means: central city of 1990 MSA (C); balance of 1990 urbanized area (B); other urban (U); or rural (R).

Table 3–3b. Permit Frame Within-PSU Sort

  All PSUs:
    1. County code
    2. Building permit office
    3. Permit date

Example of Within-PSU Sample Selection for Old Construction

Although this example is for old construction, selecting a systematic sample for the permit frame is similar except that sampling is performed on an imaginary universe (called a skeleton universe) consisting of an estimated number of USUs within each BPC. As monthly permit information becomes available, the skeleton universe gradually fills with issued permits, and eventually specific sample addresses are identified.

The following example illustrates selection of within-PSU sample for an old construction frame within a BPC. Assume blocks have been sorted within a BPC. The BPC contains 18 unsampled USUs (N = 18), and each USU is assigned an index number (1, 2, ..., 18). The dependent random number is 0.6528 and the SIw is 5.7604. To simplify this example, four samples are selected and sample designations A1 through A4 assigned. The random start is RS = 0.6528 x 5.7604 = 3.7604. Sequence numbers are 3.7604, 9.5208, and 15.2812, rounding up to 4, 10, and 16. These sequence numbers represent first samples and correspond to index numbers assigned to USUs. A hit string is assigned to each sequence number to obtain the remaining three samples: 4, 10, 16 (first sample); 5, 11, 17 (second sample); 6, 12, 18 (third sample); and 7, 13, 1 (fourth sample).¹¹ This example includes an incomplete hit string at the beginning and end of the BPC. After sample selection, corresponding sample designations are assigned. Table 3–4 illustrates results from sample selection.

¹¹ Since 3.7604 + I > 5.7604 (RS + I > SIw) where I = 3, an incomplete hit string occurs at the beginning of the BPC. The sequence number is calculated as RS + I - SIw.

Table 3–4. Sampling Example Within a BPC for Any of the Three Old Construction Frames

  Census block   USU number within the block   Index number   Sample designation
  101, 101A      1                              1             A4
                 2                              2
                 3                              3
                 4                              4             A1
  103, 104       1                              5             A2
                 2                              6             A3
                 3                              7             A4
                 4                              8
                 5                              9
                 6                             10             A1
                 7                             11             A2
                 8                             12             A3
  106            1                             13             A4
  107, 107A      1                             14
                 2                             15
                 3                             16             A1
  108, 108D      1                             17             A2
                 2                             18             A3

Assignment of Post-Sampling Codes

Two types of post-sampling codes are assigned to the sampled units. First, there are the CPS technical codes used to weight the data, estimate the variance of characteristics, and identify representative subsamples of the CPS sample units. The technical codes include final hit number, rotation group, and random group codes. Second, there are operational codes common to the demographic household surveys used to identify and track the sample units through data collection and processing. The operational codes include field PSU, segment number, and segment number suffix.

Final hit number.
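The selection steps and the worked example above can be sketched in code. This is a hedged illustration, not the production system; the function and variable names are ours. With N = 18, a dependent random number of 0.6528, SIw = 5.7604, and hit strings of four samples, it reproduces the sample designations shown in Table 3–4.

```python
import math

def select_hit_strings(N, dep_rand, SIw, hit_len):
    """Systematic selection of hit strings of USUs within a BPC/frame.

    N        -- number of unsampled USUs (index numbers 1..N)
    dep_rand -- dependent random number between 0 and 1
    SIw      -- adjusted within-PSU sampling interval
    hit_len  -- samples per hit string (19 in production, 4 here)
    Returns {index number: sample number within the hit string}."""
    RS = dep_rand * SIw                      # random start
    selected = {}
    n = 0
    while RS + n * SIw <= N:                 # sequence numbers RS + n*SIw
        first = math.ceil(RS + n * SIw)      # rounded up: the first sample
        for i in range(hit_len):             # following USUs complete the
            if first + i <= N:               # hit string; USUs past N form
                selected[first + i] = i + 1  # an incomplete string
        n += 1
    # Incomplete hit string at the beginning of the BPC (footnote 11):
    # samples whose offset pushes RS + I past SIw wrap to RS + I - SIw.
    for i in range(1, hit_len):
        if RS + i > SIw:
            selected[math.ceil(RS + i - SIw)] = i + 1
    return selected

sample = select_hit_strings(18, 0.6528, 5.7604, 4)
designations = {idx: f"A{s}" for idx, s in sample.items()}
# designations[4] == "A1", designations[7] == "A4", designations[1] == "A4"
```

The wrap-around step at the end is what gives index number 1 its A4 designation in the example, so every USU retains the same probability of selection.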
The final hit number identifies the original within-PSU order of selection. All USUs in a hit string are assigned the same final hit number. For each PSU, this code is assigned sequentially starting with one for both the old construction and the permit frames. The final hit number is used in the application of the CPS variance estimation method discussed in Chapter 14.

Rotation group. Sample is partitioned into eight representative subsamples called rotation groups used in the CPS rotation scheme. All USUs in a hit string are assigned to the same rotation group. Assignment is performed separately for old construction and the permit frame. Rotation groups are assigned after sorting hits by state, MSA/non-MSA status (old construction only), SR/NSR status, stratification PSU, and final hit number. Because of this sorting, the eight subsamples are balanced across stratification PSUs, states, and the nation. Rotation group is used in conjunction with sample designation to determine units in sample for particular months during the decade.

Random group. Sample is partitioned into ten representative subsamples called random groups. All USUs in the hit string are assigned to the same random group. Assignment is performed separately for old construction and the permit frame. Since random groups are assigned after sorting hits by state, stratification PSU, rotation group, and final hit number, the ten subsamples are balanced across stratification PSUs, states, and the nation. Random groups can be used to partition the sample into test and control panels for survey research.

Field PSU. A field PSU is usually a single county within a stratification PSU, except in the New England states and part of Hawaii, where a field PSU is a group of minor civil divisions. Field PSU definitions are consistent across all demographic surveys and are more useful than stratification PSUs for coordinating field representative assignments among demographic surveys.

Segment number.
A segment number is assigned to each USU within a hit string. If a hit string consists of USUs from only one field PSU, then the segment number applies to the entire hit string. If a hit string consists of USUs in different field PSUs, then each portion of the hit string/field PSU combination gets a unique segment number. The segment number is a four-digit code. The first digit corresponds to the rotation group of the hit. The remaining three digits are sequence numbers. In any 1 month, a segment within a field PSU identifies one USU, or an expected four housing units, that the field representative is scheduled to visit. A field representative’s workload usually consists of a set of segments within one or more adjacent field PSUs.

Segment number suffix. Adjacent USUs with the same segment number may be in different blocks for area and group quarters sample, or in different building permit office dates or ZIP Codes for permit sample, but in the same field PSU. If so, an alphabetic suffix appended to the segment number indicates that a hit string has crossed one of these boundaries. Segment number suffixes are not assigned to the unit sample.

Examples of Post-Sampling Code Assignments

Two examples are provided to illustrate assignment of codes. To simplify the examples, only two samples are selected, and sample designations A1 and A2 are assigned. The examples illustrate a stratification PSU consisting of all sampling frames (which often does not occur). Assume the index numbers (shown in Table 3–5) are selected in two BPCs. These sample USUs are sorted and survey design codes assigned as shown in Table 3–6. The example in Table 3–6 illustrates that assignment of rotation group and final hit number is done separately for old construction and the permit frame. Consecutive numbers are assigned across BPCs within frames. Although not shown in the example, assignment of consecutive rotation group numbers (modulo 8) carries across stratification PSUs.
For example, the first old construction hit in the next stratification PSU is assigned to rotation group 1. However, assignment of final hit numbers is performed within stratification PSUs. A final hit number of 1 is assigned to the first old construction hit and the first permit hit of each stratification PSU. Operational codes are assigned as shown in Table 3–7.

After sample USUs are selected and post-sampling codes assigned, addresses are needed in order to interview sampled units. The procedure for obtaining addresses differs by sampling frame. For operational purposes, identifiers are used in the unit frame during sampling instead of actual addresses. The procedure for obtaining unit frame addresses by matching identifiers to census files is described in Chapter 4. Field procedures, usually involving a listing operation, are used to identify addresses in other frames. A description of listing procedures is also given in Chapter 4. Illustrations of the materials used in the listing phase are shown in Appendix B.

Table 3–5. Index Numbers Selected During Sampling for Code Assignment Examples

  BPC number   Unit frame          Group quarters frame   Area frame              Permit frame
  1            3-4, 27-28, 51-52   10-11                  1 (incomplete), 32-33   7-8, 45-46
  2            10-11, 34-35        none                   6-7                     14-15

Table 3–6. Example of Postsampling Survey Design Code Assignments Within a PSU

  BPC   Frame            Index   Sample designation   Final hit number   Rotation group
  1     Unit              3      A1                   1                  8
  1     Unit              4      A2                   1                  8
  1     Unit             27      A1                   2                  1
  1     Unit             28      A2                   2                  1
  1     Unit             51      A1                   3                  2
  1     Unit             52      A2                   3                  2
  1     Group quarters   10      A1                   4                  3
  1     Group quarters   11      A2                   4                  3
  1     Area              1      A2                   5                  4
  1     Area             32      A1                   6                  5
  1     Area             33      A2                   6                  5
  2     Unit             10      A1                   7                  6
  2     Unit             11      A2                   7                  6
  2     Unit             34      A1                   8                  7
  2     Unit             35      A2                   8                  7
  2     Area              6      A1                   9                  8
  2     Area              7      A2                   9                  8
  1     Permit            7      A1                   1                  3
  1     Permit            8      A2                   1                  3
  1     Permit           45      A1                   2                  4
  1     Permit           46      A2                   2                  4
  2     Permit           14      A1                   3                  5
  2     Permit           15      A2                   3                  5

Table 3–7. Example of Postsampling Operational Code Assignments Within a PSU

  BPC   County   Frame            Block   Index   Sample designation   Final hit number   Field PSU   Segment/suffix
  1     1        Unit             1        3      A1                   1                  1           8999
  1     1        Unit             1        4      A2                   1                  1           8999
  1     1        Unit             2       27      A1                   2                  1           1999
  1     2        Unit             2       28      A2                   2                  2           1999
  1     2        Unit             3       51      A1                   3                  2           2999
  1     2        Unit             3       52      A2                   3                  2           2999
  1     1        Group quarters   4       10      A1                   4                  1           3599
  1     1        Group quarters   4       11      A2                   4                  1           3599
  1     1        Area             5        1      A2                   5                  1           4699
  1     1        Area             6       32      A1                   6                  1           5699
  1     1        Area             6       33      A2                   6                  1           5699
  2     3        Unit             7       10      A1                   7                  3           6999
  2     3        Unit             7       11      A2                   7                  3           6999
  2     3        Unit             8       34      A1                   8                  3           7999
  2     3        Unit             8       35      A2                   8                  3           7999
  2     3        Area             9        6      A1                   9                  3           8699
  2     3        Area             10       7      A2                   9                  3           8699A
  1     1        Permit                    7      A1                   1                  1           3001
  1     1        Permit                    8      A2                   1                  1           3001
  1     2        Permit                   45      A1                   2                  2           4001
  1     2        Permit                   46      A2                   2                  2           4001
  2     3        Permit                   14      A1                   3                  3           5001
  2     3        Permit                   15      A2                   3                  3           5001

THIRD STAGE OF THE SAMPLE DESIGN

Often, the actual USU size in the field can deviate from what is expected from the computer sampling. Occasionally, the deviation is large enough to jeopardize the successful completion of a field representative’s assignment. When these situations occur, a third stage of selection is conducted to maintain a manageable field representative workload. This third stage is called field subsampling.

Field subsampling occurs when a USU consists of more than 15 sample housing units identified for interview. Usually, this USU is identified after a listing operation. (See Chapter 4 for a description of field listing.) The regional office staff selects a systematic subsample of the USU to reduce the number of sample housing units to a more manageable number, from 8 to 15 housing units. To facilitate the subsampling, an integer take-every (TE) and start-with (SW) are used. An appropriate value of the TE reduces the USU size to the desired range. For example, if the USU consists of 16 to 30 housing units, a TE of 2 reduces USU size to 8 to 15 housing units. The SW is a randomly selected integer between 1 and the TE. Field subsampling changes the probability of selection for the housing units in the USU. An appropriate adjustment to the probability of selection is made by applying a special weighting factor in the weighting procedure. See ‘‘Special Weighting Adjustments’’ in Chapter 10.
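The take-every/start-with procedure can be sketched as follows. This is an illustration under stated assumptions: the rule TE = ceil(n/15) is our choice, consistent with the text's example (16 to 30 units gives TE = 2), but it is not necessarily the published rule, and the function name is ours.

```python
import math
import random

def field_subsample(n_units, rng):
    """Field subsampling of an oversized USU: choose an integer
    take-every (TE) that brings the USU down to the 8-to-15 range,
    pick a random start-with (SW) between 1 and TE, and keep every
    TE-th listed unit starting at line SW (1-based line numbers).

    TE = ceil(n_units / 15) is an assumed rule matching the text's
    example; rng is any random.Random instance."""
    if n_units <= 15:
        return 1, 1, list(range(1, n_units + 1))  # no subsampling needed
    TE = math.ceil(n_units / 15)                  # e.g., 16-30 units -> TE = 2
    SW = rng.randint(1, TE)                       # random integer in [1, TE]
    kept = list(range(SW, n_units + 1, TE))
    return TE, SW, kept

TE, SW, kept = field_subsample(24, random.Random(0))
# With 24 listed units, TE = 2 and 12 units are kept.
```

The reciprocal of 1/TE would then enter the special weighting factor mentioned above, since each retained unit now represents TE listed units.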
ROTATION OF THE SAMPLE

The CPS sample rotation scheme is a compromise between a permanent sample (from which a high response rate would be difficult to maintain) and a completely new sample each month (which results in more variable estimates of change). The CPS sample rotation scheme represents an attempt to strike a balance in the minimization of the following:

1. Variance of estimates of month-to-month change: three-fourths of the sample is the same in consecutive months.

2. Variance of estimates of year-to-year change: one-half of the sample is the same in the same month of consecutive years.

3. Variance of other estimates of change: outgoing sample is replaced by sample likely to have similar characteristics.

4. Response burden: eight interviews are dispersed across 16 months.

The rotation scheme follows a 4-8-4 pattern. A housing unit or group quarters is interviewed 4 consecutive months, not in sample for the next 8 months, interviewed the next 4 months, and then retired from sample. The rotation scheme is designed so outgoing housing units are replaced by housing units from the same hit string, which have similar characteristics. The following summarizes the main characteristics (in addition to the sample overlap described above) of the CPS rotation scheme:

1. In any 1 month, one-eighth of the sample housing units are interviewed for the first time; another eighth is interviewed for the second time; and so on.

2. The sample for 1 month is composed of units from two or three consecutive samples.

3. One new sample designation-rotation group is activated each month. The new rotation group replaces the rotation group retiring permanently from sample.

4. One rotation group is reactivated each month after its 8-month resting period. The returning rotation group replaces the rotation group beginning its 8-month resting period.

5.
Rotation groups are introduced in order of sample designation and rotation group: A62(1), A62(2), ..., A62(8), A63(1), A63(2), ..., A63(8), ..., A80(1), A80(2), ..., A80(8).

The present rotation scheme has been used since 1953. The most recent research into alternate rotation patterns was prior to the 1980 redesign, when state-based designs were introduced (Tegels, 1982).

The Rotation Chart

The CPS rotation chart illustrates the rotation pattern of CPS sample over time. Figure 3–1 presents the rotation chart beginning in January 1996. The following statements provide guidance in interpreting the chart:

1. Numbers in the chart refer to rotation groups. Sample designations appear in column headings. In January 1996, rotation groups 3, 4, 5, and 6 of A64; 7 and 8 of A65; and 1 and 2 of A66 are designated for interview.

2. Consecutive monthly samples have six rotation groups in common. The sample housing units in A64(4-6), A65(8), and A66(1-2), for example, are interviewed in January and February of 1996.

3. Monthly samples 1 year apart have four rotation groups in common. For example, the sample housing units in A65(7-8) and A66(1-2) are interviewed in January 1996 and January 1997.

4. Of the two rotation groups replaced from month to month, one is in sample for the first time and one returns after being excluded for 8 months. For example, in October 1996, the sample housing units in A67(3) are interviewed for the first time and the sample housing units in A65(7) are interviewed for the fifth time after last being in sample in January.

Overlap of the Sample

Table 3–8 shows the proportion of overlap between any 2 months of sample depending on the time lag between them. The proportion of sample in common has a strong effect on correlation between estimates from different months and, therefore, on variances of estimates of change.

Table 3–8. Proportion of Sample in Common for 4-8-4 Rotation System

  Interval (in months)   Percent of sample in common between the 2 months
  1                      75
  2                      50
  3                      25
  4-8                    0
  9                      12.5
  10                     25
  11                     37.5
  12                     50
  13                     37.5
  14                     25
  15                     12.5
  16 and greater         0

[Figure 3–1, ‘‘CPS Rotation Chart: January 1996 - April 1998,’’ shows the sample designations (A64 through A69) and rotation groups in sample for each month; the chart is not reproduced here.]

Phase-In of a New Design

When a newly redesigned sample is introduced into the ongoing CPS rotation scheme, there are a number of reasons not to discard the old CPS sample one month and replace it with a completely redesigned sample the next month. Since a redesigned sample contains different sample areas, new field representatives must be hired. Modifications in survey procedures are usually made for a redesigned sample. These factors can cause discontinuity in estimates if the transition is made at one time. Instead, a gradual transition from the old sample design to the new sample design is undertaken. Beginning in April 1994, the 1990 census-based design was phased in through a series of changes completed in July 1995 (U.S. Department of Labor, 1994).
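The overlap percentages in Table 3–8 follow directly from the 4-8-4 pattern; they can be derived with a small sketch (the function name is ours).

```python
def percent_in_common(lag):
    """Percent of sample in common between two monthly CPS samples
    `lag` months apart under the 4-8-4 rotation scheme.

    A rotation group first interviewed in month f is in sample when the
    elapsed time d = m - f is 0-3 (first four interviews) or 12-15
    (last four interviews). Of the 8 rotation groups in sample in month
    m, those also in sample in month m + lag are the ones whose elapsed
    time d + lag is still in the in-sample set."""
    in_sample = {0, 1, 2, 3, 12, 13, 14, 15}
    common = sum(1 for d in in_sample if d + lag in in_sample)
    return 100 * common / 8

# percent_in_common(1) -> 75.0; percent_in_common(9) -> 12.5;
# percent_in_common(12) -> 50.0; percent_in_common(16) -> 0.0
```

Running this for lags 1 through 16 reproduces every entry of Table 3–8, including the return of overlap at lag 9 when a rotation group comes back from its 8-month rest.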
REFERENCES

Executive Office of the President, Office of Management and Budget (1993), Metropolitan Area Changes Effective With the Office of Management and Budget’s Bulletin 93-17, June 30, 1993.

Hansen, Morris H., Hurwitz, William N., and Madow, William G. (1953), Sample Survey Methods and Theory, Vol. I, Methods and Applications, New York: John Wiley and Sons.

Hanson, Robert H. (1978), The Current Population Survey: Design and Methodology, Technical Paper 40, Washington, DC: Government Printing Office.

Kostanich, Donna, Judkins, David, Singh, Rajendra, and Schautz, Mindi (1981), ‘‘Modification of Friedman-Rubin’s Clustering Algorithm for Use in Stratified PPS Sampling,’’ paper presented at the 1981 Joint Statistical Meetings, American Statistical Association.

Ludington, Paul W. (1992), ‘‘Stratification of Primary Sampling Units for the Current Population Survey Using Computer Intensive Methods,’’ paper presented at the 1992 Joint Statistical Meetings, American Statistical Association.

Statt, Ronald, Vacca, E. Ann, Wolters, Charles, and Hernandez, Rosa (1981), ‘‘Problems Associated With Using Building Permits as a Frame of Post-Census Construction: Permit Lag and ED Identification,’’ paper presented at the 1981 Joint Statistical Meetings, American Statistical Association.
Tegels, Robert, and Cahoon, Lawrence (1982), ‘‘The Redesign of the Current Population Survey: The Investigation Into Alternate Rotation Plans,’’ paper presented at the 1982 Joint Statistical Meetings, American Statistical Association.

U.S. Department of Labor, Bureau of Labor Statistics (1994), ‘‘Redesign of the Sample for the Current Population Survey,’’ Employment and Earnings, Washington, DC: Government Printing Office, May 1994.

Chapter 4. Preparation of the Sample

INTRODUCTION

The sample preparation operations have been developed to fulfill the following goals:

1. Implement the sampling procedures described in Chapter 3.

2. Produce virtually complete coverage of the eligible population.

3. Ensure that only a trivial number of households will appear in the sample more than once over the course of a decade, or will appear in more than one of the household surveys conducted by the Bureau of the Census.

4. Provide cost-efficient collection by producing most of the sampling materials needed for both the Current Population Survey (CPS) and other household surveys in a single, integrated operation.

The CPS is one of many household surveys conducted on a regular basis by the Census Bureau. Insofar as possible, Census Bureau programs have been designed so that survey materials, survey procedures, personnel, and facilities can be used by as many surveys as possible. Sharing personnel and sampling material among a number of programs yields obvious savings. For example, CPS field representatives can be (and often are) employed on non-CPS activities because the sampling materials, listing and coverage instructions, and, to a lesser extent, questionnaire content are similar for a number of different programs. In addition, the sharing of sampling materials makes it possible to keep respondents from being in more than one sample.
The post-sampling codes described in Chapter 3 identify, among other information, the sample cases that are scheduled to be interviewed for the first time in each month of the decade and indicate the types of materials (maps, listings of addresses, etc.) needed by the census field representative to locate the sample addresses. This chapter describes how these materials are put together. This chapter may occasionally indulge in detail beyond the interest of the general reader. The next section is an overview that is recommended for all readers. Subsequent sections provide a more in-depth description of the CPS sample preparation and will prove useful to those interested in greater detail.

Census Bureau headquarters are located in the Washington, DC area. Staff at headquarters coordinate CPS functions ranging from sample design, sample selection, and resolution of subject matter issues to administration of the interviewing staffs maintained under the 12 regional offices and data processing. Census Bureau staff located in Jeffersonville, IN, also participate in CPS planning and administration. Their responsibilities include preparation and dissemination of interviewing materials, such as maps and segment folders. Nonetheless, the successful completion of the CPS data collection operations rests on the combined efforts of all Census Bureau offices.

Monthly sample preparation of the CPS has three major components:

1. Identifying addresses.

2. Listing living quarters.

3. Assigning sample to field representatives.

The within-PSU sample described in Chapter 3 is selected from four distinct sampling frames, not all of which consist of specific addresses. Since the field representatives need to know the exact location of the households or group quarters they are going to interview, much of sample preparation involves the conversion of selected sample (e.g., maps, lists of building permits) to a set of addresses. This conversion is described below.
Address Identification in the Unit Frame

About 65 percent of the CPS sample is selected from the unit frame. The unit frame sample is selected from a 1990 census file that contains the information necessary for within-PSU sample selection but does not contain address information. The address information from the 1990 census is stored in a separate file. The addresses of unit segments are obtained by matching the file of 1990 census information to the file containing the associated 1990 census addresses. This is a one-time operation for the entire unit sample and is performed at headquarters. If an address appears to be incomplete (missing a house number or street name), the 1990 census information is reviewed in an attempt to complete the address before it is sent to the field representative for interview.

Address Identification in the Area Frame

About 30 percent of the CPS sample is selected from the area frame. Measures of expected four housing units are selected during within-PSU sampling instead of individual housing units because the addresses are not city-style or there is no building permit office coverage. This essentially means that no particular housing units are as yet associated with the selected measure. The only information available is a map of the block that contains the area segment, the number of measures the block contains, and which measure is associated with the area segment. Before the individual housing units associated with the area segment can be identified, additional procedures are used to ensure that field representatives can locate the housing units and that all newly built housing units have a probability of selection. A field representative is sent to canvass the block and create a complete list of the housing units located in it. This is referred to as a listing operation, which is described more thoroughly in the next section.
A systematic sampling pattern is applied to this listing to identify the housing units in the area segment that are designated for each month's sample.

Systematic sampling pattern. Applying the systematic sampling pattern to a listing is referred to as a Start-with/Take-every procedure. The Start-with and Take-every are calculated at headquarters based on the first hit measure in the block and on the number of measures in the block, respectively. (For more information on hits and blocks, see Chapter 3.) The regional offices use the Start-with/Take-every information to identify the sample housing units to be interviewed. In simple terms, the Start-with identifies the line number on the listing sheet where the sampling pattern begins; the Take-every identifies the frequency with which sample housing units (lines on the listing sheet) are selected. To apply a Start-with/Take-every for a block, the regional office begins with the first listing sheet for the block and counts down the number of lines indicated by the Start-with. The resulting line number identifies a selected sample housing unit. Beginning with the next line listed, the regional office counts down the number of lines equal to the Take-every; the resulting line identifies the next selected sample housing unit. The regional office continues in this way until the sampling pattern has been applied to all listed units. For example, a Start-with/Take-every of 3/4 indicates that the regional office starts on line three of the listing sheet and takes every fourth line thereafter. If the listing sheet has 12 listed lines, this results in the units on lines 3, 7, and 11 being selected for interview. See Figure 4-1.

Address Identification in the Group Quarters Frame

About 1 percent of the CPS sample is selected from the group quarters frame. The decennial census files did not have information on the characteristics of the group quarters.
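The Start-with/Take-every procedure described above can be sketched as a short function. This is an illustrative sketch only, not Census Bureau production code; the function name and signature are invented for this example.

```python
def apply_start_with_take_every(num_listed_lines, start_with, take_every):
    """Return the listing-sheet line numbers selected by a
    Start-with/Take-every systematic sampling pattern: begin at the
    Start-with line, then take every Take-every-th line until the
    listing is exhausted."""
    return list(range(start_with, num_listed_lines + 1, take_every))

# The example from the text: a Start-with/Take-every of 3/4 applied to
# a listing sheet with 12 listed lines selects lines 3, 7, and 11.
print(apply_start_with_take_every(12, 3, 4))  # [3, 7, 11]
```

When an updated listing appends new units at the end of the sheet, extending the same pattern (a larger `num_listed_lines`) gives the appended units their chance of selection without disturbing the lines already chosen.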
The files contain information about the residents as of April 1, 1990, but no information about their living arrangements within the group quarters. Consequently, the census information did not provide a tangible sampling unit for the CPS. Measures were selected during within-PSU sampling since there was no way to associate the selected sample cases with persons to interview at a group quarters. A two-step process is used to identify the group quarters segment. First, the group quarters addresses are obtained by matching to the file of 1990 census addresses, similar to the process for the unit frame; this is a one-time operation done at headquarters. Second, before the persons living at the group quarters associated with the group quarters segment can be identified, a field representative visits the group quarters and creates a complete list of eligible sample units (consisting of persons, rooms, or beds). This is referred to as a listing operation. A systematic sampling pattern, as described above, is then applied to the listing to identify the individuals in the group quarters segment.

Address Identification in the Permit Frame

The proportion of the CPS sample selected from the permit frame increases over the decade as new housing units are constructed. The CPS sample redesigns are introduced about 4 or 5 years after each decennial census; at that time the permit sample makes up about 5 percent of the CPS sample, and it has historically grown by about 1 percentage point a year. Hypothetical measures are selected during within-PSU sampling (skeleton sampling) in anticipation of the construction of new housing units. Identifying the addresses for these permit measures involves a listing operation at the building permit office, clustering of addresses to form measures, and associating these addresses with the hypothetical measures (or USUs) in sample.
The Census Bureau conducts the Building Permit Survey, which collects information on a monthly basis from each building permit office in the Nation about the number of housing units authorized to be built. The Building Permit Survey results are converted to measures of expected four housing units. These measures are continuously accumulated and linked with the frame of hypothetical measures used to select the CPS sample. This matching identifies which building permit office contains the measure that is in sample. A field representative then visits the building permit office to obtain a list of addresses of units that were authorized to be built; this is the Permit Address List (PAL) operation. This list of addresses is keyed and transmitted to headquarters, where clusters are formed that correspond one-to-one with the measures. Using this link, the clusters of four addresses in each permit segment are identified.

Figure 4-1. Example of Systematic Sampling

Forming clusters. To help ensure some geographic clustering of addresses within permit measures and to make PAL listing more efficient, information collected by the Survey of Construction (SOC)1 is used to identify many of the addresses in the permit frame. For building permit offices included in SOC, the Census Bureau collects information on the characteristics of units to be built for each permit issued by the building permit office. This information is used to form measures in SOC building permit offices; this is not the case for non-SOC building permit offices.

1. SOC PALs are listings from building permit offices that are in the SOC. If a building permit office is in the SOC, then the actual permits issued by the building permit office and the number of units authorized by each permit (though not the addresses) are known in advance of the match to the skeleton universe. Therefore, the measures for sampling are formed directly from the actual permits, and the sample permits can then be identified.
These sample permits are the only ones for which addresses are collected. Because measures for SOC permits were constrained to be within permits, once the listed addresses are complete, the formation of clusters follows easily: the measures formed at the time of sampling are, in effect, the clusters. The sample measures within permits are fixed at the time of sampling; that is, there cannot be any later rearrangement of these units into more geographically compact clusters without voiding the original sampling results. Even without geographic clustering, there is some degree of compactness inherent in the manner in which measures are formed for sampling. The units within the SOC permits are assigned to measures in the same order in which they were listed for SOC; thus units within apartment buildings will normally be in the same clusters, and permits listed on adjacent lines of the SOC listing often represent neighboring structures.

2. Non-SOC PALs are listings from building permit offices not in the SOC. At the time of sampling, the only information known for non-SOC building permit offices is a cumulative count of the units authorized on all permits for an office for a given month (or year). Therefore, all addresses for a building permit office/date are collected, together with the number of apartments in multiunit buildings. The addresses are clustered using all units on the PAL. The purpose of clustering is to group units together geographically, thus enabling a reduction in field travel costs. For multiunit addresses, as many whole clusters as possible are created from the units within each address. The remaining units on the PAL are clustered within ZIP Code and permit day of issue.

1 The Survey of Construction (SOC) is conducted by the U.S. Census Bureau in conjunction with the Department of Housing and Urban Development. It provides current regional statistics on starts and completions of new single-family and multifamily units and on sales of new one-family homes.
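The clustering of non-SOC PAL units into measures of four can be sketched as follows. This is a simplified illustration only: the field names and data layout are invented, and the Census Bureau's actual ordering and tie-breaking rules are more detailed than shown here.

```python
from itertools import groupby

CLUSTER_SIZE = 4  # clusters correspond to measures of four expected housing units

def cluster_non_soc_pal(pal_units):
    """Cluster the units listed on a non-SOC PAL.

    pal_units: one dict per housing unit, with (hypothetical) keys
    'address', 'zip', and 'issue_date'.  As many whole clusters as
    possible are formed first from the units within each multiunit
    address; the remaining units are then clustered within ZIP Code
    and permit day of issue.  The final cluster may hold fewer than
    four units.
    """
    clusters, leftovers = [], []
    # Whole clusters from the units within each address.
    for _, units in groupby(sorted(pal_units, key=lambda u: u["address"]),
                            key=lambda u: u["address"]):
        units = list(units)
        while len(units) >= CLUSTER_SIZE:
            clusters.append(units[:CLUSTER_SIZE])
            del units[:CLUSTER_SIZE]
        leftovers.extend(units)
    # Remaining units grouped by ZIP Code and permit day of issue.
    leftovers.sort(key=lambda u: (u["zip"], u["issue_date"]))
    for i in range(0, len(leftovers), CLUSTER_SIZE):
        clusters.append(leftovers[i:i + CLUSTER_SIZE])
    return clusters
```

For example, a PAL with one five-unit multiunit address and three single-unit addresses in the same ZIP Code yields one whole cluster from the multiunit address and a second cluster formed from the leftover unit and the three singles.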
LISTING ACTIVITIES

When address information from the census is not available, or when it no longer corresponds to the current address situation, a listing of all eligible units must be created. Creating this list of basic addresses is referred to as listing. Listing can occur in all four frames: units within multiunit structures, living quarters in blocks, units or residents within group quarters, and addresses for building permits issued. The living quarters to be listed are usually housing units. In group quarters such as transient hotels, rooming houses, dormitories, trailer camps, etc., where the occupants have special living arrangements, the living quarters listed may be housing units, rooms, beds, etc. In this discussion, all of these living quarters are included in the term "unit" when it is used in the context of listing or interviewing.

Completed listings are sampled by regional office staff. Performing the listing and sampling in two separate steps allows each step to be verified and allows more complete control of sampling to avoid bias in designating the units to be interviewed. The units to be interviewed are selected as objectively as possible to reduce bias. In addition, the sample rotation system used in the CPS tends to remove biases that may have been introduced in the selection process. For example, if a unit USU is to be defined within a large structure where units will be assigned to several USUs, the field representative could conceivably manipulate the order of listing of the housing units to bias the selection for a given USU. However, as the CPS rotation system eventually replaces each USU by others made up of units adjacent on the listing at the address, the unknown error from the selection bias is replaced by a (measurable) variance.
In order to ensure accurate and complete coverage of the area and group quarters segments, the initial listings are updated periodically throughout the decade. The updating ensures that changes such as units missed in the initial listing, demolished units, residential/commercial conversions, and new construction are accounted for.

Listing in the Unit Frame

Listing in the unit frame is usually not necessary. It is done only when the field representative discovers that the address information from the 1990 census is no longer accurate for a multiunit structure and cannot adequately reconcile the differences. For multiunit addresses (addresses where the expected number of units is two or more), the field representative receives preprinted (computer-generated) Unit/Permit Listing Sheets (Form 11-3) showing unit designations as recorded in the 1990 census. For multiunit structures containing two to nine units, the listing sheet displays the unit designations of all units in the structure, even if some of the units are not in sample. For multiunit structures with ten or more units, the listing sheet contains only the unit designations of sample cases. Other important information on the Unit/Permit Listing Sheet includes the address, the expected number of units at the address, the sample designations and serial numbers for the selected sample units, and the 1990 census IDs corresponding to those units. For an example of a Unit/Permit Listing Sheet, see Appendix B.

The first time a multiunit address enters the sample, the field representative does one or more of the following:

• Verifies that the 1990 census information on the Unit/Permit Listing Sheet (Form 11-3) is accurate.
• Corrects the listing sheet when it does not agree with what the field representative finds at the address.
• Relists the address on a blank Unit/Permit Listing Sheet (only for some large multiunit listings).
After the field representative has an accurate listing sheet, he/she conducts an interview for each unit that has a current (preprinted) sample designation. If an address is relisted, the field representative provides information on the relisting to the regional office. The regional office staff will resample the listing sheet and provide the line numbers identifying the units that should be interviewed. A regular system of updating the listings in unit segments is not followed; however, the field representative may correct an in-sample listing during any visit if a change is noticed. The change may result in units being added to or removed from the sample. For single-unit addresses, a preprinted listing sheet is not provided to the field representative, since only one unit is expected based on the 1990 census information. If a field representative discovers other units at the address at the time of interview, these additional units are interviewed.

Listing in the Area Frame

All blocks that contain area frame sample cases must be listed. Several months before the first area segment in a block is to be interviewed, a field representative visits the block to establish a list of living quarters. Units within the block are recorded on an Area Segment Listing Sheet (Form 11-5). The heading entries on the form are filled in by a regional office clerk, and the field representative records a systematic list of all occupied and vacant housing units and group quarters within the block boundaries. If the area segment is within a jurisdiction where building permits are issued, housing units constructed since April 1, 1990, are eliminated from the area segment through the "year built" procedure to avoid giving an address more than one chance of selection. This is required because housing units constructed since April 1, 1990 (Census Day) are represented by segments in the permit frame.
When there has been significant growth in housing units since April 1, 1990, it is necessary to determine the year built for every structure in the segment at the time of listing. To determine "year built," the field representative inquires at each listed unit and circles the appropriate code on the listing sheet. For an example of an Area Segment Listing Sheet, see Appendix B. The inquiry at the time of listing is omitted in areas with low new construction activity; in such cases, new construction units are identified later, through the coverage questions asked during the interview. This approach avoids an extra contact at units built before 1990. If an area segment is not in a building permit issuing jurisdiction, then housing units constructed after the 1990 census do not have a chance of being selected for interview in the permit frame, and the field representative does not determine "year built" for units in such blocks.

After the listing of living quarters in the area segment has been completed, the listing forms are returned to the regional office. A clerk in the regional office then applies the sampling pattern to identify the units to be interviewed. (See the Systematic sampling pattern section above.)

To reflect changes in the number of units in each sample block, it is desirable to periodically review and correct (i.e., update) each segment's listing so that the current USU, and later USUs to be interviewed, will represent the current status. The following rule is used: the USU being interviewed for the first time for CPS must be identified from a listing that has been updated within the last 24 months. The area listing sheet is updated by retracing the path of travel of the original lister, verifying the existence of each unit on the listing sheet, accounting for units no longer in existence, and appending, at the end of the list, any unlisted housing units or other living quarters that are found.
By extending the original sampling pattern, the appended units are given their chance of being in sample.

Listing in the Group Quarters Frame

Group quarters addresses in the CPS sample are listed if they are in the group quarters, area, or unit frames. Group quarters found in the permit frame are not listed. Before the first interviews at a group quarters address can be conducted, a field representative visits the group quarters to establish a list of eligible units there. The same procedures for creating a list of eligible units apply for group quarters found in the area or unit frame; the only difference is the time frame in which the listing is done. If the group quarters is in the group quarters or area frame, the listing is done several months before interview. If the group quarters is discovered at the address of a unit frame sample case, the listing is done at the time of interview. Group Quarters Listing Sheets (Form 11-1) are used to record the group quarters name, group quarters type, address, and the name and telephone number of a contact person, and to list the eligible units within the group quarters. Institutional and verified military group quarters are not listed by a field representative. For an example of a Group Quarters Listing Sheet, see Appendix B. Each eligible unit of a group quarters is listed on a separate line of the group quarters listing sheet, regardless of the number of units. If there is more than one group quarters within the block, each must be listed on a separate listing sheet. More detailed information on the listing of group quarters can be found in the Listing and Coverage Manual for Field Representatives. The rule for the frequency of updating group quarters listings is the same as for area segments.

Listing in the Permit Frame

There are two phases of listing in the permit frame.
The first is the PAL operation, which establishes a list of addresses authorized to be built by a building permit office. This is done shortly after the permit has been issued by the building permit office and associated with a sample hypothetical measure. The second listing is required when the field representative visits the unit to conduct an interview and discovers that the original address information collected through the PAL operation is not accurate and cannot be reconciled.

PAL operation. For each building permit office containing a sample measure, a PAL form (Form 11-193A) is computer-generated. A field representative visits the building permit office and completes the PALs by listing the necessary permit and address information. If an address given on a permit is missing a house number or street name (or number), the address is considered incomplete. In this case, the field representative visits the new construction site and draws a sketch map showing the location of the structure and, if possible, completes the address.

Permit listing. Listing in the permit frame is necessary for all multiunit addresses. The PAL operation does not obtain unit designations at multiunit addresses, so listing is necessary to complete the addresses. Additionally, listing is required when the addresses obtained through the PAL operation do not correspond to what the field representative actually finds when visiting the address for the first interview. When this occurs, the unit frame listing procedures are followed.

THIRD STAGE OF THE SAMPLE DESIGN (SUBSAMPLING)

Chapter 3 describes a third stage of the sample design, referred to as subsampling. The need for subsampling depends on the results of the listing operations and the results of the clerical sampling. Subsampling is required when the number of housing units in a segment for a given sample is greater than 15 or when 1 of the 4 units in the USU yields more than 4 total units.
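The subsampling criterion above amounts to a simple test, sketched here for illustration only; the function and parameter names are invented, and Chapter 3 describes the actual procedure.

```python
MAX_SEGMENT_UNITS = 15   # segment-level threshold given in the text
EXPECTED_UNIT_YIELD = 4  # a single USU unit is expected to yield at most 4 units

def subsampling_required(units_in_segment, units_found_per_usu_unit):
    """Return True when third-stage subsampling is needed: the segment
    holds more than 15 housing units for a given sample, or one of the
    4 units in the USU yields more than 4 total units."""
    return (units_in_segment > MAX_SEGMENT_UNITS
            or any(n > EXPECTED_UNIT_YIELD for n in units_found_per_usu_unit))
```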
For unit segments (and permit segments) this can happen when more units than expected are found at the address at the time of the first interview. For more information on subsampling, see Chapter 3.

INTERVIEWER ASSIGNMENTS

The final stage of sample preparation includes the operations needed to break the sample down into manageable interviewer workloads and to deliver the resulting assignments to the field representatives for interview. At this point, all the sample cases for the month have been identified and all the necessary information about these sample cases is available in a central database at headquarters. The listings have been completed, sampling patterns have been applied, and the addresses are available. The central database also includes additional information, such as telephone numbers for those cases that have been interviewed in previous months. The rotation of sample used in the CPS is such that seven-eighths of the sample cases each month have been in sample in previous months (see Chapter 3). This central database is part of an integrated system described briefly below. The integrated system also affects the management of the sample and the way the interview results are transmitted to headquarters, as described in Chapter 8.

Overview of the Integrated System

In recent years, technological advances have changed the face of data preparation and collection activities at the Census Bureau. The Census Bureau has been developing computer-based methods for survey data collection, communications, management, and analysis. Within the Census Bureau, this integrated data collection system is called the Computer Assisted Interviewing System. It has two principal components: computer assisted personal interviewing (CAPI) and centralized computer assisted telephone interviewing (CATI). See Chapter 7 on the data collection aspects of this system.
The integrated system is designed to manage decentralized data collection using laptop computers, centralized telephone collection, and a central database for data management and accounting. The integrated system is made up of three main parts:

1. Headquarters operations in the central database. These include transmission of cases to field representatives, transmission of CATI cases to the telephone centers, and database maintenance.
2. Regional office operations in the central database. These include sample maintenance, keying of addresses as necessary, preparation of assignments, determination of CATI assignments, reinterview selection, and review and reassignment of cases.
3. Field representative case management operations on the laptop computer. These include receipt of assignments, completion of interview assignments, and transmittal of completed work to the central database for processing (see Chapter 8).

The central database resides at headquarters, where the file of sample cases is maintained. The database stores field representative data (name, phone number, address, etc.), information for making assignments (Field PSU, segment, address, etc.), and all the data for cases in sample.

Regional Office Operations

When the information in the database is complete, the regional offices can begin the assignment preparation phase. This includes making assignments to field representatives and sending cases to a centralized telephone facility. Regional offices access the central database to break down the assignment areas geographically and key in information that is used to aid in the monthly field representative assignment operations.
The regional office supervisor considers such characteristics as the size of the Field PSU, the workload in that Field PSU, and the number of field representatives working in that Field PSU when deciding the best geographic method for dividing the workload among field representatives. As part of the process of making field representative assignments, the CATI assignments are also made. Each regional office is informed of the recommended number of cases that should be sent to CATI and attempts to assign that number of cases to centralized telephone interviewing. A case may be assigned to CATI only if several criteria, pertaining to the Field PSU, the household, and the time in sample, are met. In general terms, the criteria are as follows:

1. The sample case must be in a Field PSU and random group that is CATI eligible. (The random group restriction is imposed to allow for statistical testing. See Chapter 3 for further description of random groups.)
2. The household must have a telephone and be willing to accept a telephone interview.
3. First and fifth month-in-sample cases are generally not eligible for a telephone or CATI interview.

The regional offices have the ability to temporarily assign cases to CATI in order to cover their workloads in certain situations. Each region has provided headquarters with the number of cases that it could send to CATI on a temporary basis each month. This is done primarily to fill in for field representatives who are ill or on vacation. When interviewing for the month is completed, these cases are automatically reassigned for CAPI interviewing. Many of the cases sent to CATI are successfully completed as telephone interviews. Those that cannot be completed from the telephone centers are returned to the field prior to the end of the interview period; these cases are called "CATI recycles." See Chapter 8 for further detail.
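The general CATI eligibility criteria above can be expressed as a simple predicate. This is an illustrative sketch: the function and argument names are invented, and the actual determination involves additional operational considerations not shown here.

```python
def cati_eligible(in_cati_psu_and_random_group, has_telephone,
                  accepts_telephone_interview, month_in_sample):
    """Return True if a case meets the general CATI criteria:
    (1) it is in a CATI-eligible Field PSU and random group,
    (2) the household has a telephone and will accept a telephone
        interview, and
    (3) it is not a first- or fifth-month-in-sample case."""
    return (in_cati_psu_and_random_group
            and has_telephone
            and accepts_telephone_interview
            and month_in_sample not in (1, 5))
```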
The final step is the certification of assignments. After all changes to the interview and CATI assignments have been made, the regional offices certify the assignments. This is done only after all assignments have been reviewed for geographic efficiency and for proper balance among field representatives. After assignments are made and certified, the regional offices transmit them to the central database. This transmission places the assignments on the telecommunications server for the field representatives and centralized telephone facilities.

Prior to the interview period, field representatives receive their assignments by initiating a transmission to the telecommunications server at headquarters. Assignments include the instrument (questionnaire and/or supplements) and the cases they must interview that month. These files are copied to the laptop during the transmission from the server. The instrument includes the control card (household demographic) information and the labor force questions. All data sent to and received from the field representatives pass through the central communications system maintained at headquarters. See Chapter 8 for more information on the transmission of interview results.

Finally, the regional offices prepare the remaining paper materials needed by the field representatives to complete their assignments. The materials include:

1. Field Representative Assignment Listing (CAPI-35).
2. Segment folders for cases to be interviewed (including maps, listing sheets, and other pertinent information that will aid the interviewer in locating specific cases).
3. Blank listing sheets, payroll forms, respondent letters, and other supplies requested by the field representative.
4. Segment folders and listing sheets for blocks and group quarters to be listed for future CPS samples.
Once the field representative has received the above materials and has successfully completed a transmission to retrieve his/her assignments, the sample preparation operations are complete and the field representative is ready to conduct the interviews.

Chapter 5. Questionnaire Concepts and Definitions for the Current Population Survey

INTRODUCTION

An extremely important component of the Current Population Survey (CPS) is the questionnaire, also called the survey instrument. Although the concepts and definitions of both the labor force items and the associated demographic information collected in the CPS have, with a few exceptions, remained relatively constant over the past several decades, the survey instrument was radically redesigned in January 1994. The purpose of the redesign was to reflect changes that have occurred in the economy and to take advantage of the possibilities inherent in the automated data collection methods introduced in 1994, that is, computer assisted personal interviewing (CAPI) and computer assisted telephone interviewing (CATI). This chapter briefly describes the current survey instrument: its concepts, definitions, and data collection procedures and protocols.

STRUCTURE OF THE SURVEY INSTRUMENT

The CPS interview is divided into three basic parts: (1) household and demographic information, (2) labor force information, and (3) supplement information in months that include supplements. Household and demographic information historically was called "control card information" because, in the paper-and-pencil environment, this information was collected on a separate cardboard form. With electronic data collection, this distinction is no longer apparent to the respondent or the interviewer. The order in which interviewers attempt to collect information is as follows: (1) housing unit data, (2) demographic data, (3) labor force data, (4) more demographic data, (5) supplement data, and finally (6) more housing unit data.
Only the concepts and definitions of the household, demographic, and labor force data are discussed below. (For more information about supplements to the CPS see Chapter 10.) CONCEPTS AND DEFINITIONS Household and Demographic Information Upon contacting a household, interviewers proceed with the interview unless there is a clear indication that the case is a definite noninterview. (Chapter 7 discusses the interview process and explains refusals and other types of noninterviews.) When interviewing a household for the first time, interviewers collect information about the housing unit and all individuals who usually live at the address. Housing unit information. Upon first contact with a housing unit, interviewers collect information on the housing unit’s physical address, its mailing address, the year it was constructed, the type of structure (single or multiple family), whether it is renter- or owner-occupied, whether other units or persons are part of the sample unit, whether the housing unit has a telephone and, if so, the telephone number. Household roster. After collecting or updating the housing unit data, the interviewer either creates or updates a list of all individuals living in the unit and determines whether or not they are members of the household. This list is referred to as the household roster. Household respondent. One person may provide all of the CPS data for the entire sample unit, provided that the person is a household member 15 years of age or older who is knowledgeable about the household. The person who responds for the household is called the household respondent. Information collected from the household respondent for other members of the household is referred to as proxy response. Reference person. 
To create the household roster, the interviewer asks the household respondent to give ‘‘the names of all persons living or staying’’ in the housing unit, and to ‘‘start with the name of the person or one of the persons who owns or rents’’ the unit. The person whose name the interviewer enters on line one (presumably one of the individuals who owns or rents the unit) becomes the reference person. Note that the household respondent and the reference person are not necessarily the same. For example, if you are the household respondent and you give your name ‘‘first’’ when asked to report the household roster, then you also are the reference person. If, on the other hand, you are the household respondent and you give your spouse’s name ‘‘first’’ when asked to report the household roster, then your spouse is the reference person. (Sometimes the reference person is referred to as the ‘‘householder.’’)

Household. A household is defined as all individuals (related family members and all unrelated individuals) whose usual place of residence at the time of the interview is the sample unit. Individuals who are temporarily absent and who have no other usual address are still classified as household members even though they are not present in the household during the survey week. College students comprise the bulk of such absent household members, but persons away on business or vacation are also included. (Not included are persons in institutions or the military.) Once household/nonhousehold membership has been established for all persons on the roster, the interviewer proceeds to collect all other demographic data for household members only.

Relationship to reference person. The interviewer shows a flash card with relationship categories (e.g., spouse, child, grandchild, parent, brother/sister) to the household respondent and asks him/her to report each household member’s relationship to the reference person (the person listed on line one).
Relationship data also are used to define families, subfamilies, and individuals whose usual place of residence is elsewhere. A family is defined as a group of two or more individuals residing together who are related by birth, marriage, or adoption; all such individuals are considered members of one family. Families are further classified either as married-couple families or as families maintained by women or men without spouses. Subfamilies are defined as families that live in housing units where none of the members of the family are related to the reference person. It is also possible that a household contains unrelated individuals; that is, people who are not living with any relatives. An unrelated individual may be part of a household containing one or more families or other unrelated individuals, may live alone, or may reside in group quarters, such as a rooming house.

Additional demographic information. In addition to asking for relationship data, the interviewer asks for other demographic data for each household member, including birth date, marital status, which household member (if any) is the person’s parent (entered as the parent’s line number), which household member is the person’s spouse (if this cannot be determined from the relationship data), Armed Forces status, level of education, race, ethnicity, nativity, and social security number (for those 15 years of age or older in selected months). Total household income also is collected. The social security number for each household member 15 years of age or older is collected so that it is possible to match the CPS data to other government data in order to evaluate the accuracy, consistency, and comparability of various statistics derived from CPS data. The following terms are used to define an individual’s marital status at the time of the interview: married spouse present, married spouse absent, widowed, divorced, separated, or never married.
The term ‘‘married spouse present’’ applies to a husband and wife who both live at the same address, even though one may be temporarily absent due to business, vacation, a visit away from home, a hospital stay, etc. The term ‘‘married spouse absent’’ applies to individuals who live apart for reasons such as marital problems, as well as husbands and wives who are living apart because one or the other is employed elsewhere, is on duty with the Armed Forces, or for any other reason. The information collected during the interview is used to create three marital status categories: single never married, married spouse present, and other marital status. The latter category includes those who were classified as widowed; divorced; separated; or married, spouse absent.

Starting in January 1992, educational attainment for each person in the household age 15 or older was obtained through a question asking about the highest grade or degree completed. In January 1996, additional questions were added for several educational attainment categories to ascertain the total number of years of school or credit years completed. (Prior to January 1992, the questions referred to the highest grade or year of school each person had attended and, then, whether they had completed that grade or year.)

The household respondent identifies his/her own race as well as that of other household members by selecting a category from a flash card presented by the interviewer. The categories on the flash card are: White, Black, American Indian, Aleut, Eskimo, Asian or Pacific Islander, or Other.

The ‘‘nativity’’ items ask about a person’s country of birth, as well as that of the individual’s parents, and whether they are American citizens and, if so, whether by birth or naturalization. Persons born outside of the 50 states also are asked their date of immigration to the United States. The nativity items were added in April of 1993.
The additional demographic data collected are used to classify individuals into racial groups, education categories, and groups based on ethnic origin. Currently, the racial categories for CPS data are White, Black, and Other, where the Other group includes American Indians, Alaskan Natives, and Asians and Pacific Islanders. The data about individuals’ ethnic origin primarily are collected for the purpose of identifying Hispanics, where Hispanics are those who identify their origin or descent as Mexican, Puerto Rican, Cuban, Central or South American, or some other Hispanic descent. It should be noted that race and ethnicity are distinct categories. Thus, individuals of Hispanic origin may be of any race.

Labor Force Information

Labor force information is obtained after the household and demographic information has been collected. One of the primary purposes of the labor force information is to classify individuals as employed, unemployed, or not in the labor force. Other information collected includes hours worked, occupation, and industry and related aspects of the working population. It should be noted that the major labor force categories are defined hierarchically and, thus, are mutually exclusive. Employed supersedes unemployed, which supersedes not in the labor force. For example, individuals who are classified as employed, even if they worked less than full time, are not asked the questions about having looked for work, and hence cannot be classified as unemployed. Similarly, an individual who is classified as unemployed is not asked the questions used to determine one’s primary nonlabor market activity. For instance, retired persons who are currently working are classified as employed even though they have retired from previous jobs. Consequently, they are not asked the questions about their previous employment, nor can they be classified as retired.
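The hierarchical precedence among the major labor force categories can be illustrated with a short sketch. This is purely illustrative: the function name and its boolean inputs are simplifications assumed for the example, not part of actual CPS processing.

```python
def classify_labor_force(worked_for_pay_or_profit: bool,
                         on_layoff_expecting_recall: bool,
                         actively_searched_past_4_weeks: bool,
                         available_for_work: bool) -> str:
    """Illustrative sketch of the CPS hierarchy: employed supersedes
    unemployed, which supersedes not in the labor force."""
    if worked_for_pay_or_profit:
        # Any work at all during the reference week means employed,
        # even for a retired person working part time.
        return "employed"
    if on_layoff_expecting_recall or (
            actively_searched_past_4_weeks and available_for_work):
        return "unemployed"
    return "not in labor force"
```

Because the check for work comes first, a person who worked is never reached by the job search branch, mirroring the skip pattern described above: the employed are simply not asked the questions that could classify them as unemployed or retired.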
The current concepts and definitions underlying the collection and estimation of the labor force data are presented below.

Reference week. The CPS labor force questions ask about labor market activities for 1 week each month. This week is referred to as the ‘‘reference week.’’ The reference week is defined as the 7-day period, Sunday through Saturday, that includes the 12th of the month.

Civilian noninstitutional population. In the CPS, labor force data are restricted to persons 16 years of age and older, who currently reside in 1 of the 50 states or the District of Columbia, who do not reside in institutions (e.g., penal and mental facilities, homes for the aged), and who are not on active duty in the Armed Forces.

Employed persons. Employed persons are those who, during the reference week, (a) did any work at all (for at least 1 hour) as paid employees; worked in their own businesses, professions, or on their own farms; or worked 15 hours or more as unpaid workers in an enterprise operated by a family member or (b) were not working, but had a job or business from which they were temporarily absent because of vacation, illness, bad weather, childcare problems, maternity or paternity leave, labor-management dispute, job training, or other family or personal reasons, whether or not they were paid for the time off or were seeking other jobs. Each employed person is counted only once, even if he or she holds more than one job. (See the discussion of multiple jobholders below.) Employed citizens of foreign countries who are temporarily in the United States but not living on the premises of an embassy are included. Excluded are persons whose only activity consisted of work around their own house (painting, repairing, cleaning, or other home-related housework) or volunteer work for religious, charitable, or other organizations. The initial survey question, asked only once for each household, inquires whether anyone in the household has a business or a farm.
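The reference week defined earlier in this section is mechanical enough to compute directly. The following sketch is illustrative only (the function name is an assumption, not CPS production code); it finds the Sunday-through-Saturday week containing the 12th of a given month:

```python
import datetime

def reference_week(year: int, month: int):
    """Return (sunday, saturday) for the CPS reference week: the
    Sunday-through-Saturday period that includes the 12th of the month."""
    the_12th = datetime.date(year, month, 12)
    # weekday() is Monday=0 ... Sunday=6, so this gives days since Sunday.
    days_since_sunday = (the_12th.weekday() + 1) % 7
    sunday = the_12th - datetime.timedelta(days=days_since_sunday)
    return sunday, sunday + datetime.timedelta(days=6)

# For example, the January 1994 reference week ran from
# Sunday, January 9 through Saturday, January 15.
```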
Subsequent questions are asked for each household member to determine whether any of them did any work for pay (or for profit if there is a household business) during the reference week. If no work for pay or profit was performed and a family business exists, respondents are asked whether they did any unpaid work in the family business or farm.

Multiple jobholders. These are employed persons who, during the reference week, had two or more jobs as wage and salary workers; were self-employed and also held one or more wage and salary jobs; or worked as unpaid family workers and also held one or more wage and salary jobs. A person employed only in private households (cleaner, gardener, babysitter, etc.) who worked for two or more employers during the reference week is not counted as a multiple jobholder, since working for several employers is considered an inherent characteristic of private household work. Also excluded are self-employed persons with multiple unincorporated businesses and persons with multiple jobs as unpaid family workers. Since January 1994, CPS respondents have been asked questions each month to identify multiple jobholders. First, all employed persons are asked ‘‘Last week, did you have more than one job (or business, if one exists), including part-time, evening, or weekend work?’’ Those who answer ‘‘yes’’ are then asked, ‘‘Altogether, how many jobs (or businesses) did you have?’’ Prior to 1994, this information had been available only through periodic CPS supplements.

Hours of work. Beginning with the CPS redesign in January 1994, both actual and usual hours of work have been collected. Prior to the redesign, only actual hours were requested for all employed individuals. Published data on hours of work relate to the actual number of hours spent ‘‘at work’’ during the reference week.
For example, persons who normally work 40 hours a week but were off on the Memorial Day holiday would be reported as working 32 hours, even though they were paid for the holiday. For persons working in more than one job, the published figures relate to the number of hours worked at all jobs during the week. Data on persons ‘‘at work’’ exclude employed persons who were absent from their jobs during the entire reference week for reasons such as vacation, illness, or industrial dispute. Data also are available on usual hours worked by all employed persons, including those who were absent from their jobs during the reference week.

At work part time for economic reasons. Sometimes referred to as involuntary part time, this category refers to individuals who gave an economic reason for working 1 to 34 hours during the reference week. Economic reasons include slack work or unfavorable business conditions, inability to find full-time work, and seasonal declines in demand. Those who usually work part time also must indicate that they want and are available to work full time to be classified as being part time for economic reasons.

At work part time for noneconomic reasons. This group includes those persons who usually work part time and were at work 1 to 34 hours during the reference week for a noneconomic reason. Noneconomic reasons include illness or other medical limitation, childcare problems or other family or personal obligations, school or training, retirement or social security limits on earnings, and being in a job where full-time work is less than 35 hours. The group also includes those who gave an economic reason for usually working 1 to 34 hours but said they do not want to work full time or were unavailable for such work.

Usual full- or part-time status. In order to differentiate a person’s normal schedule from his/her activity during the reference week, persons also are classified according to their usual full- or part-time status.
In this context, full-time workers are those who usually work 35 hours or more (at all jobs combined). This group includes some individuals who worked less than 35 hours in the reference week, for either economic or noneconomic reasons, as well as those who are temporarily absent from work. Similarly, part-time workers are those who usually work less than 35 hours per week (at all jobs), regardless of the number of hours worked in the reference week. This may include some individuals who actually worked more than 34 hours in the reference week, as well as those who were temporarily absent from work. The full-time labor force includes all employed persons who usually work full time and unemployed persons who are either looking for full-time work or are on layoff from full-time jobs. The part-time labor force consists of employed persons who usually work part time and unemployed persons who are seeking or are on layoff from part-time jobs. Prior to 1994, persons who worked full time during the reference week were not asked about their usual hours. Rather, it was assumed that they usually worked full time, and hence they were classified as full-time workers.

Occupation, industry, and class-of-worker. For the employed, this information applies to the job held in the reference week. A person with two or more jobs is classified according to the job at which he or she worked the greatest number of hours. The unemployed are classified according to their last jobs. The occupational and industrial classification of CPS data is based on the coding systems used in the 1990 census. A list of these codes can be found in Alphabetical Index of Industries and Occupations (Bureau of the Census, January 1992). The class-of-worker classification assigns workers to one of the following categories: wage and salary workers, self-employed workers, and unpaid family workers.
Wage and salary workers are those who receive wages, salary, commissions, tips, or pay in kind from a private employer or from a government unit. The class-of-worker question also includes separate response categories for ‘‘private for profit company’’ and ‘‘nonprofit organization’’ to further classify private wage and salary workers (this distinction has been in place since January 1994). Self-employed persons are those who work for profit or fees in their own businesses, professions, trades, or farms. Only the unincorporated self-employed are included in the self-employed category, since those whose businesses are incorporated technically are wage and salary workers because they are paid employees of a corporation. Unpaid family workers are persons working without pay for 15 hours a week or more on a farm or in a business operated by a member of the household to whom they are related by birth or marriage.

Occupation, industry, and class-of-worker on second job. The occupation, industry, and class-of-worker information for individuals’ second jobs is collected in order to obtain a more accurate measure of multiple jobholders, to obtain more detailed information about their employment characteristics, and to provide information necessary for comparing estimates of the number of employees in the CPS and in BLS’s establishment survey (the Current Employment Statistics survey; for an explanation of this survey, see BLS Handbook of Methods, April 1997). For the majority of multiple jobholders, occupation, industry, and class-of-worker data for their second jobs are collected only from a quarter of the sample: those in their fourth or eighth monthly interviews. However, for those classified as ‘‘self-employed unincorporated’’ on their main jobs, class-of-worker of the second job is collected each month. This is done because, according to the official definition, individuals who are self-employed unincorporated on both of their jobs are not considered multiple jobholders.
The questions used to determine whether an individual is employed or not, along with the questions an employed person typically will receive, are presented in Figure 5–1. A copy of the entire questionnaire can be obtained from the Internet. The address for this file is: http://www.bls.census.gov/cps/bqestair.htm.

Earnings. Information on what people earn at their main jobs is collected only for those who are receiving their fourth or eighth monthly interviews. This means that earnings questions are asked of only one-fourth of the survey respondents. Respondents are asked to report their usual earnings before taxes and other deductions and to include any overtime pay, commissions, or tips usually received. The term ‘‘usual’’ is as perceived by the respondent. If the respondent asks for a definition of usual, however, interviewers are instructed to define the term as more than half the weeks worked during the past 4 or 5 months. Respondents may report earnings in the time period they prefer—for example, hourly, weekly, biweekly, monthly, or annually. (Allowing respondents to report in a periodicity with which they were most comfortable was a feature added in the 1994 redesign.) Based on additional information collected during the interview, earnings reported on a basis other than weekly are converted to a weekly amount in later processing. Data are collected for wage and salary workers (excluding the self-employed who respond that their businesses were incorporated). These earnings data are used to construct estimates of the distribution of usual weekly earnings and median earnings. Individuals who do not report their earnings on an hourly basis are asked if they are, in fact, paid at an hourly rate and, if so, what the hourly rate is. The earnings of those who reported hourly and those who are paid at an hourly rate are used to analyze the characteristics of hourly workers, for example, those who are paid the minimum wage.

Unemployed persons.
All persons who were not employed during the reference week but were available for work (except for temporary illness) and had made specific efforts to find employment some time during the 4-week period ending with the reference week are classified as unemployed. Individuals who were waiting to be recalled to a job from which they had been laid off need not have been looking for work to be classified as unemployed. A relatively minor change was incorporated into the definition of unemployment with the implementation of the January 1994 redesign. Under the former definition, persons who volunteered that they were waiting to start a job within 30 days (a very small group numerically) were classified as unemployed, whether or not they were actively looking for work. Under the new definition, by contrast, people waiting to start a new job must have actively looked for a job within the last 4 weeks in order to be counted as unemployed. Otherwise, they are classified as not in the labor force. As the definition indicates, there are two ways people may be classified as unemployed. They are either looking for work (job seekers) or they have been temporarily separated from a job (persons on layoff). Job seekers must have engaged in active job search during the above-mentioned 4-week period in order to be classified as unemployed. (Active methods are defined as job search methods that have the potential to result in a job offer without any further action on the part of the job seeker.) Examples of active job search methods include going to an employer directly or to a public or private employment agency, seeking assistance from friends or relatives, placing or answering ads, or using some other active method. Examples of the ‘‘other active’’ category include being on a union or professional register, obtaining assistance from a community organization, or waiting at a designated labor pickup point.
Passive methods, which do not qualify as job search, include reading (as opposed to answering or placing) ‘‘help wanted’’ ads and taking a job training course. The response categories for active and passive methods are clearly delineated in separately labeled columns on the interviewers’ computer screens. Job search methods are identified by the following questions: ‘‘Have you been doing anything to find work during the last 4 weeks?’’ and ‘‘What are all of the things you have done to find work during the last 4 weeks?’’ To ensure that respondents report all of the methods of job search used, interviewers ask ‘‘Anything else?’’ after the initial or a subsequent job search method is reported.

Persons ‘‘on layoff’’ are defined as those who have been separated from a job to which they are waiting to be recalled (i.e., their layoff status is temporary). In order to measure layoffs accurately, the questionnaire determines whether people reported to be on layoff did in fact have an expectation of recall; that is, whether they had been given a specific date to return to work or, at least, had been given an indication that they would be recalled within the next 6 months. As previously mentioned, persons on layoff need not be actively seeking work to be classified as unemployed.

Reason for unemployment. Unemployed individuals are categorized according to their status at the time they became unemployed.
The categories are:

(1) Job losers: a group composed of (a) persons on temporary layoff from a job to which they expect to be recalled and (b) permanent job losers, whose employment ended involuntarily and who began looking for work;
(2) Job leavers: persons who quit or otherwise terminated their employment voluntarily and began looking for work;
(3) Persons who completed temporary jobs: persons who began looking for work after their jobs ended;
(4) Reentrants: persons who previously worked but were out of the labor force prior to beginning their job search;
(5) New entrants: persons who never worked before and who are entering the labor force for the first time.

Each of these five categories of unemployed can be expressed as a proportion of the entire civilian labor force or as a proportion of the total unemployed. Prior to 1994, new entrants were defined as job seekers who had never worked at a full-time job lasting 2 weeks or longer; reentrants were defined as job seekers who had held a full-time job for at least 2 weeks and had then spent some time out of the labor force prior to their most recent period of job search. These definitions have been modified to encompass any type of job, not just a full-time job of at least 2 weeks duration. Thus, new entrants are now defined as job seekers who have never worked at all, and reentrants are job seekers who have worked before but not immediately prior to their current job search.

Duration of unemployment. The duration of unemployment is expressed in weeks. For individuals who are classified as unemployed because they are looking for work, the duration of unemployment is the length of time (through the current reference week) that they have been looking for work. For persons on layoff, the duration of unemployment is the number of full weeks (through the reference week) they have been on layoff. The questions used to classify an individual as unemployed can be found in Figure 5–1.

Not in the labor force.
Included in this group are all persons in the civilian noninstitutional population who are neither employed nor unemployed. Information is collected on their desire for and availability to take a job at the time of the CPS interview, job search activity in the prior year, and reason for not looking in the 4-week period prior to the survey week. This group includes discouraged workers, defined as persons not in the labor force who want and are available for a job and who have looked for work sometime in the past 12 months (or since the end of their last job if they held one within the past 12 months), but are not currently looking because they believe there are no jobs available or there are none for which they would qualify. (Specifically, the main reason identified by discouraged workers for not recently looking for work is one of the following: believes no work available in line of work or area; could not find any work; lacks necessary schooling, training, skills, or experience; employers think too young or too old; or other types of discrimination.)

Data on a larger group of persons outside the labor force, one that includes discouraged workers as well as persons who desire work but give other reasons for not searching (such as childcare problems, family responsibilities, school, or transportation problems), are also published regularly. This group is made up of persons who want a job, are available for work, and have looked for work within the past year. This group is generally described as having some marginal attachment to the labor force. Prior to January 1994, questions about the desire for work among those who were not in the labor force were asked only of a quarter of the sample. Since 1994, these questions have been asked of the full CPS sample. Consequently, since 1994, estimates of the number of discouraged workers as well as those with a marginal attachment to the labor force are published monthly rather than just quarterly.
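The distinction drawn above between discouraged workers and the broader marginally attached group can be summarized in a short sketch. The function and its inputs are illustrative assumptions for exposition, not actual CPS processing code.

```python
def classify_nilf_attachment(wants_job: bool,
                             available_for_work: bool,
                             searched_past_12_months: bool,
                             reason_is_discouragement: bool) -> str:
    """Illustrative sketch for persons not in the labor force (NILF):
    separate discouraged workers from the larger marginally attached group."""
    if wants_job and available_for_work and searched_past_12_months:
        if reason_is_discouragement:
            # Not looking because they believe no jobs are available,
            # or none for which they would qualify.
            return "discouraged worker"
        # Other reasons for not searching, e.g., childcare problems,
        # family responsibilities, school, or transportation problems.
        return "marginally attached"
    return "other not in labor force"
```

Note that every discouraged worker also satisfies the marginal attachment criteria; discouragement is distinguished only by the reason given for not currently looking.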
Additional questions relating to individuals’ job histories and whether they intend to seek work continue to be asked only of persons not in the labor force who are in the sample for either their fourth or eighth month. Data based on these questions are tabulated only on a quarterly basis.

Estimates of the number of employed and unemployed are used to construct a variety of measures. These measures include:

• Labor force: The labor force consists of all persons 16 years of age or older classified as employed or unemployed in accordance with the criteria described above.
• Unemployment rate: The unemployment rate represents the number of unemployed as a percentage of the labor force.
• Labor force participation rate: The labor force participation rate is the proportion of the age-eligible population that is in the labor force.
• Employment-population ratio: The employment-population ratio represents the proportion of the age-eligible population that is employed.

Figure 5–1. Questions for Employed and Unemployed

1. Does anyone in this household have a business or a farm?
2. LAST WEEK, did you do ANY work for (either) pay (or profit)? Parenthetical filled in if there is a business or farm in the household. If 1 is ‘‘yes’’ and 2 is ‘‘no,’’ ask 3. If 1 is ‘‘no’’ and 2 is ‘‘no,’’ ask 4.
3. LAST WEEK, did you do any unpaid work in the family business or farm? If 2 and 3 are both ‘‘no,’’ ask 4.
4. LAST WEEK, (in addition to the business,) did you have a job, either full or part time? Include any job from which you were temporarily absent. Parenthetical filled in if there is a business or farm in the household. If 4 is ‘‘no,’’ ask 5.
5. LAST WEEK, were you on layoff from a job? If 5 is ‘‘yes,’’ ask 6. If 5 is ‘‘no,’’ ask 8.
6. Has your employer given you a date to return to work? If ‘‘no,’’ ask 7.
7. Have you been given any indication that you will be recalled to work within the next 6 months? If ‘‘no,’’ ask 8.
8.
Have you been doing anything to find work during the last 4 weeks? If ‘‘yes,’’ ask 9.
9. What are all of the things you have done to find work during the last 4 weeks?

Individuals are classified as employed if they say ‘‘yes’’ to questions 2, 3 (and work 15 hours or more in the reference week or receive profits from the business/farm), or 4. Individuals who are available to work are classified as unemployed if they say ‘‘yes’’ to 5 and either 6 or 7, or if they say ‘‘yes’’ to 8 and provide in 9 a job search method that could have brought them into contact with a potential employer.

REFERENCES

U.S. Department of Commerce, U.S. Census Bureau (1992), Alphabetical Index of Industries and Occupations, January 1992.

U.S. Department of Labor, Bureau of Labor Statistics (1997), BLS Handbook of Methods, Bulletin 2490, April 1997.

Chapter 6. Design of the Current Population Survey Instrument

INTRODUCTION

Chapter 5 describes the concepts and definitions underpinning the Current Population Survey (CPS) data collection instrument. The current survey instrument is the result of an 8-year research and development effort to redesign the data collection process and to implement previously recommended changes in the underlying labor force concepts. The changes described here were introduced in January 1994. Of particular interest in the redesign were the wording of the labor force portion of the questionnaire and the data collection methods. For virtually every labor force concept, the current questionnaire wording is different from what was used previously. Data collection also has been redesigned so that the instrument is fully automated and is administered either on a laptop computer or from a centralized telephone facility.

MOTIVATION FOR THE REDESIGN OF THE QUESTIONNAIRE FOR COLLECTING LABOR FORCE DATA

The CPS produces some of the most important data used to develop economic and social policy in the United States. Although the U.S.
economy and society have undergone major shifts in recent decades, the survey questionnaire had remained unchanged since 1967. The growth in the number of service-sector jobs and the decline in the number of factory jobs have been two key developments. Other changes include the more prominent role of women in the workforce and the growing popularity of alternative work schedules. The 1994 revisions were designed to accommodate these changes. At the same time, the redesign took advantage of major advances in survey research methods and data collection technology. Furthermore, recommendations for changes in the CPS had been proposed in the late 1970s and 1980s, primarily by the Presidentially appointed National Commission on Employment and Unemployment Statistics (commonly referred to as the Levitan Commission). No changes were implemented at that time, however, due to the lack of funding for the large overlap sample necessary to assess the effect of the redesign. In the mid-1980s, funding for an overlap became available. Spurred by all of these developments, the BLS and the Census Bureau decided to redesign the CPS questionnaire.

OBJECTIVES OF THE REDESIGN

There were five main objectives in redesigning the CPS questionnaire: (1) to better operationalize existing definitions and reduce reliance on volunteered responses; (2) to reduce the potential for response error in the questionnaire-respondent-interviewer interaction and, hence, improve measurement of CPS concepts; (3) to implement minor definitional changes within the labor force classifications; (4) to expand the labor force data available and improve longitudinal measures; and (5) to exploit the capabilities of computer assisted interviewing for improving data quality and reducing respondent burden (see Copeland and Rothgeb (1990) for a fuller discussion).

Enhanced Accuracy

In redesigning the CPS questionnaire, the BLS and Census Bureau attempted to develop questions that would lessen the potential for response error.
Among the approaches used were: (1) shorter, clearer question wording; (2) splitting complex questions into two or more separate questions; (3) building concept definitions into question wording; (4) reducing reliance on volunteered information; (5) explicit and implicit strategies for the respondent to provide numeric data on hours, earnings, etc.; and (6) the use of revised precoded response categories for open-ended questions (Copeland and Rothgeb, 1990).

Definitional Changes

The labor force definitions used in the CPS have undergone only minor modifications since the survey’s inception in 1940, and with only one exception, the definitional changes and refinements made in 1994 were uniformly small. The one major definitional change dealt with the concept of discouraged workers; that is, persons outside the labor force who are not looking for work because they believe that there are no jobs available for them. As was noted in Chapter 5, discouraged workers are similar to the unemployed in that they are not working and want a job. Since they are not conducting an active job search, however, they do not satisfy a key element necessary to be classified as unemployed. The former measurement of discouraged workers was criticized by the Levitan Commission as too arbitrary and subjective. It was deemed arbitrary that assumptions about a person’s availability for work were made from responses to a question on why the respondent was not currently looking for work. It was considered too subjective that the measurement was based on a person’s stated desire for a job, regardless of whether the individual had ever looked for work. A new, more precise measurement of discouraged workers was introduced that specifically required that a person had searched for a job during the prior 12 months and was available for work.
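The revised criteria can be sketched as a small classification routine. This is a hypothetical illustration of the logic just described; the variable names and category labels are invented and are not the actual CPS items, codes, or processing logic.

```python
def classify_nonparticipant(wants_job: bool, available: bool,
                            searched_past_year: bool, reason: str) -> str:
    """Sketch of the revised (1994) treatment of persons not in the labor
    force. Names are illustrative, not actual CPS variables."""
    if wants_job and available and searched_past_year:
        if reason == "believes no jobs available":
            return "discouraged worker"
        # Other reasons (child care, school, transportation, etc.) still
        # indicate some marginal attachment to the labor force.
        return "marginally attached"
    return "other not in labor force"
```

Under the former definition, availability was inferred rather than asked directly and no recent job search was required; the search-within-the-past-year and current-availability conditions above are the two criteria the redesign added.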
The new questions also enable estimation of the number of persons outside the labor force who, although they cannot be precisely defined as discouraged, satisfy many of the same criteria as discouraged workers and thus show some marginal attachment to the labor force. Other minor changes were made to fine-tune the definitions of unemployment, categories of unemployed persons, and persons who were employed part time for economic reasons.

New Labor Force Information Introduced

With the revised questionnaire, several types of labor force data became available regularly for the first time. For example, information is now available each month on employed persons who have more than one job. Also, by collecting information separately on the number of hours multiple jobholders work on their main job and on their secondary jobs, estimates can be made of the number of workers who combine two or more part-time jobs into a full-time workweek and of the number of full- and part-time jobs in the economy. The inclusion of the multiple jobholding question also improves the accuracy of answers to the questions on hours worked and facilitates comparisons of employment estimates from the CPS with those from the Current Employment Statistics (CES) program, the survey of nonfarm business establishments (for a discussion of the CES survey, see the BLS Handbook of Methods, Bureau of Labor Statistics, April 1997). In addition, beginning in 1994, monthly data on the number of hours usually worked per week and data on the number of discouraged workers are collected from the entire CPS sample rather than from the one-quarter sample of respondents in their fourth or eighth monthly interviews.

Computer Technology

A key feature of the redesigned CPS is that the new questionnaire was designed for a computer assisted interview. Prior to the redesign, CPS data were primarily collected using a paper-and-pencil form.
In an automated environment, most interviewers now use laptop computers on which the questionnaire has been programmed. This mode of data collection is known as computer assisted personal interviewing (CAPI). Interviewers ask the survey questions as they appear on the screen of the laptop and then type the responses directly into the computer. A portion of sample households—currently about 18 percent—is interviewed via computer assisted telephone interviewing (CATI) from three centralized telephone centers located in Hagerstown, MD; Tucson, AZ; and Jeffersonville, IN. Automated data collection methods allow greater flexibility in questionnaire design than paper-and-pencil data collection methods. Complicated skips, respondent-specific question wording, and carry-over of data from one interview to the next are all possible in an automated environment. For example, automated data collection permits capabilities such as (1) the use of dependent interviewing, that is, the carrying over of information from the previous month for industry, occupation, and duration of unemployment data, and (2) the use of respondent-specific question wording based on the person’s name, age, and sex, answers to prior questions, household characteristics, etc. Furthermore, by automatically bringing up the next question on the interviewer’s screen, computerization reduces the probability of an interviewer asking the wrong set of questions. The computerized questionnaire also permits the inclusion of several built-in editing features, including automatic checks for internal consistency, checks for unlikely responses, and verification of answers. With these built-in editing features, errors can be caught and corrected during the interview itself.

Evaluation and Selection of Revised Questions

Planning for the revised CPS questionnaire began in 1986, when BLS and the Census Bureau convened a task force to identify areas for improvement.
Studies employing methods from the cognitive sciences were conducted to test possible solutions to the problems identified. These studies included interviewer focus groups, respondent focus groups, respondent debriefings, a test of interviewers’ knowledge of concepts, in-depth cognitive laboratory interviews, response categorization research, and a study of respondents’ comprehension of alternative versions of labor force questions (Campanelli, Martin, and Rothgeb, 1991; Edwards, Levine, and Cohany, 1989; Fracasso, 1989; Gaertner, Cantor, and Gay, 1989; Martin, 1987; Palmisano, 1989). In addition to the qualitative research mentioned above, the revised questionnaire, developed jointly by Census Bureau and BLS staff, used information collected in a large two-phase test of question wording. During Phase I, two alternative questionnaires were tested, with the then-official questionnaire as the control. During Phase II, one alternative questionnaire was tested with the control. The questionnaires were tested using computer assisted telephone interviewing and a random digit dialing sample (CATI/RDD). During these tests, interviews were conducted from the centralized telephone interviewing facilities of the Census Bureau. Both quantitative and qualitative information was used in the two phases to select questions, identify problems, and suggest solutions. Analyses were based on information from item response distributions, respondent and interviewer debriefing data, and behavior coding of interviewer/respondent interactions.

Item Response Analysis

The primary use of item response analysis was to determine whether different questionnaires produced different response patterns, which may, in turn, have affected the labor force estimates. Unedited data were used for this analysis. Statistical tests were conducted to ascertain whether response patterns for different questionnaire versions differed significantly.
The statistical tests were adjusted to take into consideration the use of a nonrandom clustered sample, repeated measures over time, and multiple persons in a household. Response distributions were analyzed for all items on the questionnaires. The response distribution analysis indicated the degree to which new measurement processes produced different patterns of responses. Data gathered using the other methods outlined above also aided in the interpretation of the response differences observed. (Response distributions were calculated on the basis of persons who responded to the item, excluding those for whom a ‘‘don’t know’’ or ‘‘refused’’ was obtained.)

Respondent Debriefings

At the end of the interview, respondent debriefing questions were administered to a sample of respondents to measure respondent comprehension and response formulation. Question-specific probes were used to ascertain whether certain words, phrases, or concepts were understood by respondents in the manner intended (Esposito et al., 1992). From these data, indicators of how respondents interpret and answer the questions and some measures of response accuracy were obtained. The debriefing questions were tailored to the respondent and depended on the path the interview had taken. Two forms of respondent debriefing questions were administered: probing questions and vignette classification. For example, those who did not indicate that they had done any work in the main survey were asked the direct probe ‘‘LAST WEEK did you do any work at all, even for as little as 1 hour?’’ An example of the vignettes respondents received is ‘‘Last week, Amy spent 20 hours at home doing the accounting for her husband’s business.
She did not receive a paycheck.’’ Individuals were asked to classify the person in the vignette as working or not working based on the wording of the question they received in the main survey (e.g., ‘‘Would you report her as working last week, not counting work around the house?’’ if the respondent received the unrevised questionnaire, or ‘‘Would you report her as working for pay or profit last week?’’ if the respondent received the current, revised questionnaire) (Martin and Polivka, 1995).

Behavior Coding

Behavior coding entails monitoring or audiotaping interviews and recording significant interviewer and respondent behaviors (e.g., minor/major changes in question wording, probing behavior, inadequate answers, requests for clarification). During early stages of testing, behavior coding data were useful in identifying problems with proposed questions. For example, if interviewers frequently reworded a question, this may have indicated that the question was too difficult to ask as worded; respondents’ requests for clarification may have indicated that they were experiencing comprehension difficulties; and interruptions by respondents may have indicated that a question was too lengthy (Esposito et al., 1992). During later stages of testing, the objective of behavior coding was to determine whether the revised questionnaire improved the quality of interviewer/respondent interactions, as measured by accurate reading of the questions and adequate responses by respondents. Additionally, results from behavior coding helped identify areas of the questionnaire that would benefit from enhancements to interviewer training.

Interviewer Debriefings

The primary objective of interviewer debriefing was to identify areas of the revised questionnaire or interviewer procedures that were problematic for interviewers or respondents. The information collected during the debriefings was useful in identifying which questions needed revision.
The information was also useful in modifying initial interviewer training and the interviewer manual. A secondary objective of interviewer debriefing was to obtain information about the questionnaire, interviewer behavior, or respondent behavior that might help explain differences observed in the labor force estimates from the different measurement processes. Two different techniques were used to debrief interviewers. The first was the use of focus groups at the centralized telephone interviewing facilities and in geographically dispersed regional offices. The focus groups were conducted after interviewers had at least 3 to 4 months’ experience using the revised CPS instrument. Approximately 8 to 10 interviewers were selected for each focus group. Interviewers were selected to represent different levels of experience and ability. The second technique was the use of a self-administered standardized interviewer debriefing questionnaire. Once problematic areas of the revised questionnaire were identified through the focus groups, a standardized debriefing questionnaire was developed and administered to all interviewers.

HIGHLIGHTS OF THE QUESTIONNAIRE REVISION

A copy of the questionnaire can be obtained from the Internet. The address for this file is: http://www.bls.census.gov/cps/bqestair.htm

General

Definition of reference week. In the interviewer debriefings that were conducted in 13 different geographic areas during 1988, interviewers reported that the former question 19 (Q19, the major activity question), ‘‘What were you doing most of LAST WEEK, working or something else?’’ was unwieldy and sometimes misunderstood by respondents. In addition to not always understanding the intent of the question, respondents were unsure what was meant by the time period ‘‘last week’’ (BLS, 1988).
A respondent debriefing conducted in 1988 found that only 17 percent of respondents had definitions of ‘‘last week’’ that matched the CPS definition of Sunday through Saturday of the reference week. The majority (54 percent) of respondents defined ‘‘last week’’ as Monday through Friday (Campanelli et al., 1991). In the revised questionnaire, an introductory statement was added with the reference period clearly stated. The introductory statement reads as follows: ‘‘I am going to ask a few questions about work-related activities LAST WEEK. By last week I mean the week beginning on Sunday, August 9 and ending Saturday, August 15.’’ This statement makes the reference period more explicit to respondents. Additionally, the former Q19 has been deleted from the questionnaire. In the past, Q19 had served as a preamble to the labor force questions, but in the revised questionnaire the survey content is defined in the introductory statement, which also defines the reference week. Direct question on presence of business. The definition of employed persons includes those who work without pay for at least 15 hours per week in a family business. In the former questionnaire, there was no direct question on the presence of a business in the household. Such a question is included in the revised questionnaire. This question is asked only once for the entire household prior to the labor force questions. The question reads as follows: ‘‘Does anyone in this household have a business or a farm?’’ With this question it can be determined whether a business exists and who in the household owns the business. The primary purpose of this question is to screen for households that may have unpaid family workers, not to obtain an estimate of household businesses. (See Rothgeb et al. (1992), Copeland and Rothgeb (1990), and Martin (1987) for a fuller discussion of the need for a direct question on presence of a business.) 
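The screening role of the household business question can be sketched as follows. This is a minimal illustration with invented field names, assuming the flow described above; the actual instrument logic is more elaborate.

```python
def is_unpaid_family_worker(worked_for_pay_last_week: bool,
                            household_has_business: bool,
                            did_unpaid_work_in_business: bool,
                            unpaid_hours_last_week: int) -> bool:
    """Sketch (invented names): persons who did unpaid work in a family
    business count as employed only if they worked 15 or more hours in
    the reference week. The business question is asked once per household
    and screens in the households whose members receive the unpaid-work
    questions."""
    if worked_for_pay_last_week:
        return False  # already classified as employed via the 'at work' question
    if not household_has_business:
        return False  # unpaid family work questions are not asked
    return did_unpaid_work_in_business and unpaid_hours_last_week >= 15
```

The design choice worth noting is that the household-level question is a screener: its purpose is to route the unpaid-work questions correctly, not to estimate the number of household businesses.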
For households that have a family business, direct questions are asked about unpaid work in the family business of all persons who were not reported as working last week. BLS produces monthly estimates of unpaid family workers who work 15 or more hours per week.

Employment-Related Revisions

Revised ‘‘At Work’’ question. Having a direct question on the presence of a family business not only improved the estimates of unpaid family workers, but also permitted a revision of the ‘‘at work’’ question. In the former questionnaire, the ‘‘at work’’ question read: ‘‘LAST WEEK, did you do any work at all, not counting work around the house?’’ In the revised questionnaire, the wording reads, ‘‘LAST WEEK did you do ANY work for (either) pay (or profit)?’’ (The parentheticals in the question are read only when there is a business or farm in the household.) The revised wording ‘‘work for pay (or profit)’’ better captures the concept of work that BLS is attempting to measure. (See Martin (1987) or Martin and Polivka (1995) for a fuller discussion of problems with the concept of ‘‘work.’’)

Direct question on multiple jobholding. In the former questionnaire, the actual hours question read: ‘‘How many hours did you work last week at all jobs?’’ During the interviewer debriefings conducted in 1988, it was reported that respondents do not always hear the last phrase ‘‘at all jobs.’’ Some respondents who work at two jobs may have reported hours for only one job (BLS, 1988). In the revised questionnaire, a question is included at the beginning of the hours series to determine whether the person is a multiple jobholder. A followup question also asks for the number of jobs the multiple jobholder has. Multiple jobholders are asked about their hours on their main job and other job(s) separately to avoid the problem of multiple jobholders not hearing the phrase ‘‘at all jobs.’’ These new questions also allow monthly estimates of multiple jobholders to be produced.

Hours series.
The old question on ‘‘hours worked’’ read: ‘‘How many hours did you work last week at all jobs?’’ If a person reported 35-48 hours worked, additional followup probes were asked to determine whether the person worked any extra hours or took any time off. Interviewers were instructed to correct the original report of actual hours, if necessary, based on responses to the probes. The hours data are important because they are used to determine the sizes of the full-time and part-time labor forces. It is unknown whether respondents reported exact actual hours, usual hours, or some approximation of actual hours. In the revised questionnaire, a revised hours series was adopted. An anchor-recall estimation strategy was used to obtain a better measure of actual hours and to address the issue of work schedules more completely. For multiple jobholders, it also provides separate data on hours worked at a main job and other jobs. The revised questionnaire first asks about the number of hours a person usually works at the job. Then, separate questions are asked to determine whether a person worked extra hours or fewer hours, and finally a question is asked on the number of actual hours worked last week. It also should be noted that the new hours series allows monthly estimates of usual hours worked to be produced for all employed persons. In the former questionnaire, usual hours were obtained only in the outgoing rotation for employed private wage and salary workers and were available only on a quarterly basis.

Industry and occupation – dependent interviewing. Prior to the revision, CPS industry and occupation (I&O) data were not always consistent from month to month for the same person in the same job. These inconsistencies arose, in part, because the household respondent frequently varies from 1 month to the next. Furthermore, it is sometimes difficult for a respondent to describe an occupation consistently from month to month.
Moreover, distinctions at the three-digit occupation and industry level, that is, at the most detailed classification level, can be very subtle. To obtain more consistent data and make full use of the automated interviewing environment, dependent interviewing for the I&O questions was implemented in the revised questionnaire for month-in-sample 2-4 households and month-in-sample 6-8 households. Dependent interviewing uses information collected during the previous month’s interview in the current month’s interview. (Different variations of dependent interviewing were evaluated during testing. See Rothgeb et al. (1991) for more detail.) In the revised CPS, respondents are provided with the name of their employer as of the previous month and asked if they still work for that employer. If they answer ‘‘no,’’ respondents are asked the independent questions on industry and occupation. If they answer ‘‘yes,’’ respondents are asked ‘‘Have the usual activities and duties of your job changed since last month?’’ If individuals say ‘‘yes,’’ that their duties have changed, they are then asked the independent questions on occupation, activities or duties, and class-of-worker. If their duties have not changed, individuals are asked to verify the previous month’s description through the question ‘‘Last month, you were reported as (previous month’s occupation or kind of work performed) and your usual activities were (previous month’s duties). Is this an accurate description of your current job?’’ If they answer ‘‘yes,’’ the previous month’s occupation and class-of-worker are brought forward and no coding is required. If they answer ‘‘no,’’ persons are asked the independent questions on occupation, activities or duties, and class-of-worker. This redesign permits a direct inquiry about job change before the previous month’s information is provided to the respondent.

Earnings.
The earnings series in the revised questionnaire is considerably different from that in the former questionnaire. In the former questionnaire, persons were asked whether they were paid by the hour, and if so, what the hourly wage was. All wage and salary workers were then asked for their usual weekly earnings. In the former version, earnings could be reported as weekly figures only, even though that may not have been the easiest way for the respondent to recall and report earnings. Data from early tests indicated that only a small proportion (14 percent, n=853) of nonhourly wage workers were paid at a weekly rate and that less than 25 percent (n=1,623) of nonhourly wage workers found it easiest to report earnings as a weekly amount. In the revised questionnaire, the earnings series is designed to first request the periodicity for which the respondent finds it easiest to report earnings and then request an earnings amount in the specified periodicity, as displayed below. The wording of questions requesting an earnings amount is tailored to the periodicity identified earlier by the respondent. (Because data on weekly earnings are published quarterly by BLS, earnings data provided by respondents in periodicities other than weekly are converted to a weekly earnings estimate later during processing operations.)

Revised Earnings Series (Selected items)

1. For your (MAIN) job, what is the easiest way for you to report your total earnings BEFORE taxes or other deductions: hourly, weekly, annually, or on some other basis?

2. Do you usually receive overtime pay, tips, or commissions (at your MAIN job)?

3. (Including overtime pay, tips and commissions,) What are your usual (weekly, monthly, annual, etc.) earnings on this job, before taxes or other deductions?

As can be seen from the revised questions presented above, other revisions to the earnings series include a specific question to determine whether a person usually receives overtime pay, tips, or commissions.
If so, a preamble precedes the earnings questions that reminds respondents to include overtime pay, tips, and commissions when reporting earnings. If a respondent reports that it is easiest to report earnings on an hourly basis, then a separate question is asked regarding the amount of overtime pay, tips, and commissions usually received, if applicable. An additional question is asked of persons who do not report that it is easiest to report their earnings hourly. The question determines whether they are paid at an hourly rate and is displayed below. This information, which allows studies of the effect of the minimum wage, is used to identify hourly wage workers.

‘‘Even though you told me it is easier to report your earnings annually, are you PAID AT AN HOURLY RATE on this job?’’

Unemployment-Related Revisions

Persons on layoff – direct question. Previous research (Rothgeb, 1982; Palmisano, 1989) demonstrated that the former question on layoff status—‘‘Did you have a job or business from which you were temporarily absent or on layoff LAST WEEK?’’—was long, awkwardly worded, and frequently misunderstood by respondents. Some respondents heard only part of the question, while others thought that they were being asked whether they had a business. In an effort to reduce response error, the revised questionnaire includes two separate direct questions about layoff and temporary absences. The layoff question is: ‘‘LAST WEEK, were you on layoff from a job?’’ Questions asked later screen out those persons who do not meet the criteria for layoff status.

Persons on layoff – expectation of recall. The official definition of layoff includes the criterion of an expectation of being recalled to the job. In the former questionnaire, persons reported to be on layoff were never directly asked whether they expected to be recalled.
In an effort to better capture the existing definition, persons reported to be on layoff in the revised questionnaire are asked ‘‘Has your employer given you a date to return to work?’’ Persons who respond that their employers have not given them a date to return are asked ‘‘Have you been given any indication that you will be recalled to work within the next 6 months?’’ If the response is positive, their availability is determined by the question, ‘‘Could you have returned to work LAST WEEK if you had been recalled?’’ Persons who do not meet the criteria for layoff are asked the job search questions so they still have an opportunity to be classified as unemployed.

Job search methods. The concept of unemployment requires, among other criteria, an active job search during the past 4 weeks. In the former questionnaire, the following question was asked to determine whether a person conducted an active job search: ‘‘What has ... been doing in the last 4 weeks to find work?’’ checked with—

• public employment agency
• private employment agency
• employer directly
• friends and relatives
• placed or answered ads
• nothing
• other

Interviewers were instructed to code all passive job search methods into the ‘‘nothing’’ category. This included such activities as looking at newspaper ads, attending job training courses, and practicing typing. Only active job search methods for which no appropriate response category existed were to be coded as ‘‘other.’’ In the revised questionnaire, several additional response categories were added, and the response options were reordered and reformatted to more clearly represent the distinction between active job search methods and passive methods. The revisions to the job search methods question grew out of concern that interviewers were confused by the precoded response categories. This was evident even before the analysis of the CATI/RDD test.
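The active/passive distinction that the precoded categories are meant to capture can be sketched as a simple lookup. The category labels below are paraphrased from the text and are not the instrument's actual precodes.

```python
# Active methods could bring the job seeker into contact with a potential
# employer; passive methods could not. Labels paraphrased, not actual codes.
ACTIVE_METHODS = {
    "contacted public employment agency",
    "contacted private employment agency",
    "contacted employer directly",
    "contacted friends or relatives",
    "placed or answered ads",
}
PASSIVE_METHODS = {
    "looked at newspaper ads",
    "attended job training course",
    "practiced typing",
}

def conducted_active_search(reported_methods: set) -> bool:
    """Classification as unemployed requires at least one active job
    search method during the past 4 weeks; passive methods alone do not
    qualify."""
    return bool(reported_methods & ACTIVE_METHODS)
```

Framing the rule this way shows why miscoding a passive response into ‘‘other’’ rather than ‘‘nothing’’ matters: it can incorrectly convert a passive-only searcher into an active one.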
Martin (1987) conducted an examination of verbatim entries for the ‘‘other’’ category and found that many of the ‘‘other’’ responses should have been included in the ‘‘nothing’’ category instead. The analysis also revealed responses coded as ‘‘other’’ that were too vague to determine whether or not an active job search method had been undertaken. Fracasso (1989) also concluded that the then-current set of response categories was not adequate for accurate classification of active and passive job search methods. During development of the revised questionnaire, two additional passive categories were included: (1) ‘‘looked at ads’’ and (2) ‘‘attended job training programs/courses,’’ and two additional active categories were included: (1) ‘‘contacted school/university employment center’’ and (2) ‘‘checked union/professional registers.’’ Later research also demonstrated that interviewers had difficulty coding relatively common responses such as ‘‘sent out resumes’’ and ‘‘went on interviews’’; thus, the response categories were further expanded to reflect these common job search methods.

Duration of job search and layoff. The duration of unemployment is an important labor market indicator published monthly by BLS. In the former questionnaire, this information was collected by the question: ‘‘How many weeks have you been looking for work?’’ This wording forced people to report in a periodicity that may not have been meaningful to them, especially for the longer-term unemployed. Also, asking for the number of weeks (rather than months) may have led respondents to underestimate the duration. In the revised questionnaire, the question reads: ‘‘As of the end of LAST WEEK, how long had you been looking for work?’’ Respondents can select the periodicity themselves, and interviewers are able to record the duration in weeks, months, or years.
To avoid clustering of answers around whole months, the revised questionnaire also asks persons who report duration in whole months (between 1 and 4 months) a followup question to obtain an estimated duration in weeks: ‘‘We would like to have that in weeks, if possible. Exactly how many weeks had you been looking for work?’’ The purpose of this is to lead people to report the exact number of weeks instead of multiplying their monthly estimates by four, as was done in an earlier test and may have been done in the former questionnaire. As mentioned earlier, the CATI/CAPI technology makes it possible to automatically update duration of job search and layoff for persons who are unemployed in consecutive months. For persons reported to be looking for work for 2 consecutive months or longer, the previous month’s duration is updated without re-asking the duration questions. For persons on layoff for at least 2 consecutive months, the duration of layoff is also automatically updated. This revision was made to reduce respondent burden and enhance the longitudinal capability of the CPS. This revision also will produce more consistent month-to-month estimates of duration. Previous research indicates that only about 25 percent of those unemployed in consecutive months who received the former questionnaire (where duration was collected independently each month) increased their reported durations by 4 weeks, plus or minus a week (Polivka and Rothgeb, 1993; Polivka and Miller, 1995). A very small bias is introduced when a person has a brief (less than 3 or 4 weeks) period of employment in between surveys. However, testing revealed that only 3.2 percent of those who had been looking for work in consecutive months said that they had worked in the interlude between the surveys. Furthermore, of those who had worked, none indicated that they had worked for 2 weeks or more.
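The duration handling described above can be sketched in two small steps: converting a duration reported in the respondent's chosen periodicity to weeks, and advancing it automatically for persons unemployed in consecutive months. This is illustrative only; the actual processing rules differ in detail, and the conversion factor is an assumption.

```python
def duration_in_weeks(value: float, unit: str) -> float:
    """Convert a reported duration to weeks. The 52/12 months factor is an
    illustrative average; the followup question on exact weeks (for reports
    of 1 to 4 whole months) avoids relying on such a conversion."""
    factor = {"weeks": 1, "months": 52 / 12, "years": 52}
    return value * factor[unit]

def update_duration(previous_weeks: float,
                    weeks_between_interviews: float = 4) -> float:
    """Dependent interviewing: for persons looking for work (or on layoff)
    in consecutive months, last month's duration is carried forward and
    advanced rather than re-asked."""
    return previous_weeks + weeks_between_interviews

# A respondent reporting '2 years' is recorded as 104 weeks; if still
# unemployed at the next interview, the stored duration is advanced
# automatically instead of being collected again.
```

Carrying the duration forward is what makes the month-to-month estimates consistent: the increment between interviews is applied mechanically rather than depending on a fresh, possibly rounded, respondent report.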
Revisions Included in ‘‘Not in the Labor Force’’ Related Questions Response options of retired, disabled, and unable to work at key labor force items. In the former questionnaire, when individuals reported they were retired in response to any of the labor force items, the interviewer was required to continue asking whether they worked last week, were absent from a job, were looking for work, and, in the outgoing rotation, when they last worked and their job histories. Interviewers commented that elderly respondents frequently complained that they had to respond to questions that seemed to have no relevance to their own situations. In an attempt to reduce respondent burden, a response category of ‘‘retired’’ was added to each of the key labor force status questions in the revised questionnaire. If individuals 50 years of age or older volunteer that they are retired, they are immediately asked a question inquiring whether they want a job. If they indicate that they want to work, they are then asked questions about looking for work and the interview proceeds as usual. If they do not want to work, the interview is concluded and they are classified as not in the labor force - retired. (If they are in the outgoing rotation, an additional question is asked to determine whether they worked within the last 12 months. If so, the industry and occupation questions are asked about the last job held.) A similar change has been made in the revised questionnaire to reduce the burden for individuals reported to be ‘‘unable to work’’ or ‘‘disabled.’’ (Individuals who may be ‘‘unable to work’’ for a temporary period of time may not consider themselves as ‘‘disabled’’ so both response options are provided.) If a person is reported to be ‘‘disabled’’ or ‘‘unable to work’’ at any of the key labor force classification items, a followup question is asked to determine whether he/she can do any gainful work during the next 6 months. 
Different versions of the followup probe are used depending on whether the person is disabled or unable to work. Dependent interviewing for persons reported to be retired, disabled, or unable to work. The revised questionnaire also is designed to use dependent interviewing for persons reported to be retired, disabled, or unable to work. An automated questionnaire increases the ease with which information from the previous month’s interview can be used during the current month’s interview. Once it is reported that the person did not work during the current month’s reference week, the previous month’s status of retired (if a person is 50 years of age or older), disabled, or unable to work is verified, and the regular series of labor force questions is not asked. This revision reduces respondent and interviewer burden. Discouraged workers. The implementation of the Levitan Commission’s recommendations on discouraged workers resulted in one of the major definitional changes in the 1994 redesign. The Levitan Commission criticized the former definition because it was based on a subjective desire for work and questionable inferences about an individual’s availability to take a job. As a result of the redesign, two requirements were added: For persons to qualify as discouraged, they must have engaged in some job search within the past year (or since they last worked if they worked within the past year), and they must currently be available to take a job. (Formerly, availability was inferred from responses to other questions; now there is a direct question.) Data on a larger group of persons outside the labor force (one that includes discouraged workers as well as persons who desire work, but give other reasons for not searching, such as child care problems, family responsibilities, school, or transportation problems) also are published regularly. This group is made up of persons who want a job, are available for work, and have looked for work within the past year. 
This group is generally described as having some marginal attachment to the labor force. Also beginning in 1994, questions on this subject are asked of the full CPS sample rather than being limited to a quarter of the sample, permitting estimates of the number of discouraged workers to be published monthly rather than quarterly. From data available during the tests of the revised questionnaire, it is clear that improvements in data quality for labor force classification have been obtained as a result of the redesign of the CPS questionnaire, and in general, measurement error has been reduced. Data from respondent debriefings, interviewer debriefings, and response analysis demonstrated that the revised questions are more clearly understood by respondents and that the potential for labor force misclassification is reduced. Results from these tests formed the basis for the design of the final revised version of the questionnaire. This revised version was tested in a separate year-and-a-half parallel survey prior to implementation as the official survey in January 1994. In addition, from January 1994 through May 1994, the unrevised procedures were used with the parallel survey sample. These parallel surveys were conducted to assess the effect of the redesign on national labor force estimates. Estimates derived from the initial year and a half of the parallel survey indicated that the redesign might increase the unemployment rate by 0.5 percentage points. However, subsequent analysis using the entire parallel survey indicates that the redesign did not have a statistically significant effect on the unemployment rate. (Analysis of the effect of the redesign on the unemployment rate and other labor force estimates can be found in Cohany, Polivka, and Rothgeb (1994). Analysis of the effect of the redesign on the unemployment rate, along with a wide variety of other labor force estimates using data from the entire parallel survey, can be found in Polivka and Miller (1995).)
CONTINUOUS TESTING AND IMPROVEMENTS OF THE CURRENT POPULATION SURVEY AND ITS SUPPLEMENTS Experience gained during the redesign of the CPS has demonstrated the importance of testing questions and monitoring data quality. This experience, along with contemporaneous advances in research on questionnaire design, also has helped inform the development of methods for testing new or improved questions for the basic CPS and its periodic supplements (Martin, 1987; Bischoping, 1989; Oksenberg, Cannell, and Kalton, 1991; Campanelli, Martin, and Rothgeb, 1991; Esposito et al., 1992; Forsyth and Lessler, 1991). Methods to continuously test questions and assess data quality are discussed in Chapter 15. It is important to note, however, that despite the benefits of adding new questions and improving existing ones, changes in the CPS should be approached cautiously and their effects measured and evaluated. When possible, methods to bridge differences caused by changes, or techniques to avoid the disruption of historical series, should be included in the testing of new or revised questions. REFERENCES Bischoping, K. (1989), An Evaluation of Interviewer Debriefings in Survey Pretests, in C. Cannell et al. (eds.), New Techniques for Pretesting Survey Questions, Chapter 2. Campanelli, P.C., Martin, E.A., and Rothgeb, J.M. (1991), The Use of Interviewer Debriefing Studies as a Way to Study Response Error in Survey Data, The Statistician, vol. 40, pp. 253-264. Cohany, S.R., Polivka, A.E., and Rothgeb, J.M. (1994), Revisions in the Current Population Survey Effective January 1994, Employment and Earnings, February 1994, vol. 41, no. 2, pp. 13-37. Copeland, K. and Rothgeb, J.M. (1990), Testing Alternative Questionnaires for the Current Population Survey, Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 63-71. Edwards,
S.W., Levine, R., and Cohany, S.R. (1989), Procedures for Validating Reports of Hours Worked and for Classifying Discrepancies Between Reports and Validation Totals, Proceedings of the Section on Survey Research Methods, American Statistical Association. Esposito, J.L., Rothgeb, J.M., Polivka, A.E., Hess, J., and Campanelli, P.C. (1992), Methodologies for Evaluating Survey Questions: Some Lessons From the Redesign of the Current Population Survey, paper presented at the International Conference on Social Science Methodology, Trento, Italy, June 1992. Forsyth, B.H. and Lessler, J.T. (1991), Cognitive Laboratory Methods: A Taxonomy, in P.P. Biemer, R.M. Groves, L.E. Lyberg, N.A. Mathiowetz, and S. Sudman (eds.), Measurement Errors in Surveys, New York: Wiley, pp. 393-418. Fracasso, M.P. (1989), Categorization of Responses to the Open-Ended Labor Force Questions in the Current Population Survey, Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 481-485. Gaertner, G., Cantor, D., and Gay, N. (1989), Tests of Alternative Questions for Measuring Industry and Occupation in the CPS, Proceedings of the Section on Survey Research Methods, American Statistical Association. Martin, E.A. (1987), Some Conceptual Problems in the Current Population Survey, Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 420-424. Martin, E.A. and Polivka, A.E. (1995), Diagnostics for Redesigning Survey Questionnaires: Measuring Work in the Current Population Survey, Public Opinion Quarterly, Winter 1995, vol. 59, no. 4, pp. 547-567. Oksenberg, L., Cannell, C., and Kalton, G. (1991), New Strategies for Pretesting Survey Questions, Journal of Official Statistics, vol. 7, no. 3, pp. 349-365. Palmisano, M. (1989), Respondents’ Understanding of Key Labor Force Concepts Used in the CPS, Proceedings of the Section on Survey Research Methods, American Statistical Association. Polivka, A.E. and Miller, S.M.
(1998), The CPS After the Redesign: Refocusing the Economic Lens, in Labor Statistics Measurement Issues, edited by J. Haltiwanger et al., pp. 249-286. Polivka, A.E. and Rothgeb, J.M. (1993), Overhauling the Current Population Survey: Redesigning the Questionnaire, Monthly Labor Review, September 1993, vol. 116, no. 9, pp. 10-28. Rothgeb, J.M. (1982), Summary Report of July Followup of the Unemployed, internal memorandum, December 20, 1982. Rothgeb, J.M., Polivka, A.E., Creighton, K.P., and Cohany, S.R. (1992), Development of the Proposed Revised Current Population Survey Questionnaire, Proceedings of the Section on Survey Research Methods, American Statistical Association. U.S. Department of Labor, Bureau of Labor Statistics (1988), Response Errors on Labor Force Questions Based on Consultations With Current Population Survey Interviewers in the United States, paper prepared for the OECD Working Party on Employment and Unemployment Statistics. U.S. Department of Labor, Bureau of Labor Statistics (1997), BLS Handbook of Methods, Bulletin 2490, April 1997. Chapter 7. Conducting the Interviews INTRODUCTION Each month during interview week, field representatives (FRs) and computer-assisted telephone interviewing (CATI) interviewers attempt to contact and interview a responsible person living in each sample unit to complete a Current Population Survey (CPS) interview. Typically, the week containing the 19th of the month is the interview week. The week containing the 12th is the reference week (i.e., the week about which the labor force questions are asked). In December, the week containing the 12th is used as the interview week, provided the reference week (in this case the week containing the 5th) falls entirely within the month of December. As outlined in Chapter 3, households are in sample for 8 months. Each month, one-eighth of the households are in sample for the first time (month-in-sample 1 (MIS-1)), one-eighth for the second time, etc.
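The calendar rule just described can be sketched in Python. The function names and the Sunday-through-Saturday week convention are ours for illustration; this is not an official Census Bureau algorithm.

```python
import datetime

def week_containing(d):
    """Return the (Sunday, Saturday) pair for the calendar week containing
    date d, treating weeks as running Sunday through Saturday."""
    # Python's weekday(): Monday=0 ... Sunday=6; shift so Sunday starts the week.
    offset = (d.weekday() + 1) % 7  # days elapsed since Sunday
    sunday = d - datetime.timedelta(days=offset)
    return sunday, sunday + datetime.timedelta(days=6)

def cps_weeks(year, month):
    """Sketch of the CPS rule: the reference week contains the 12th and the
    interview week contains the 19th.  In December, if the week containing
    the 5th falls entirely within December, that week becomes the reference
    week and the week containing the 12th becomes the interview week."""
    reference = week_containing(datetime.date(year, month, 12))
    interview = week_containing(datetime.date(year, month, 19))
    if month == 12:
        early = week_containing(datetime.date(year, 12, 5))
        if early[0].month == 12:  # Sunday in December => whole week is in December
            reference = early
            interview = week_containing(datetime.date(year, 12, 12))
    return reference, interview
```

For example, in March 2001 the 12th fell on a Monday, so the reference week runs Sunday, March 11 through Saturday, March 17, and the interview week begins Sunday, March 18.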
Because households rotate through the eight months-in-sample, different types of interviews (due to differing MIS) are conducted by each FR within his/her assignment. An introductory letter is sent to each household in sample prior to its first and fifth month interviews. The letter describes the CPS, announces the forthcoming visit, and provides respondents with information regarding their rights under the Privacy Act, the voluntary nature of the survey, and the guarantees of confidentiality for the information they provide. Figure 7–1 shows the introductory letter sent to sample units in the area administered by the Chicago Regional Office. A personal visit interview is required for all first month-in-sample households because the CPS sample is strictly a sample of addresses: the Census Bureau has no way of knowing who the occupants of the sample household are, much less whether the household is occupied or eligible for interview. (Note: In some MIS-1 households, telephone interviews are conducted. This occurs when, during the initial personal contact, the respondent requests a telephone interview.) NONINTERVIEWS AND HOUSEHOLD ELIGIBILITY The FR’s first task is to establish the eligibility of the sample address for the CPS. There are many reasons an address may not be eligible for interview. The address may have been converted to a permanent business, condemned, or demolished, or it may fall outside the boundaries of the segment for which it was selected. Regardless of the reason, such sample addresses are classified as Type C noninterviews. Type C units have no chance of becoming eligible for a CPS interview in future months, because the condition is considered permanent. These addresses are stricken from the roster of sample addresses and are never visited again for the CPS. All households classified as Type C undergo a full supervisory review of the circumstances surrounding the case before such a determination is made final.
Other sample addresses are also ineligible for a CPS interview. These are units that are intended for occupancy but are not occupied by any eligible individuals. A unit may be ineligible because it is vacant (either for sale or for rent), is occupied entirely by individuals who are not eligible for a CPS labor force interview (individuals with a usual residence elsewhere (URE), in the Armed Forces, or children under the age of 15), or is temporarily not occupied for other reasons. Such units are classified as Type B noninterviews. These noninterviews are considered unavoidable. Type B noninterview units have a chance of becoming eligible for interview in future months, because the condition is considered temporary (e.g., a vacant unit could become occupied). Therefore, Type B units are reassigned to FRs in subsequent months. These sample addresses remain in sample for the entire 8 months that households are eligible for interview. Each succeeding month, an FR visits the unit to determine whether the unit has changed status, and either continues the Type B classification, revises the noninterview classification, or conducts an interview, as applicable. Some of these Type B households are found to be eligible for the Housing Vacancy Survey (HVS). See Chapter 11 for a description of the HVS. Additionally, one final set of households is not interviewed for the CPS; these are called Type A households. These are households that the FR has determined to be eligible for a CPS interview, but for which no useable data were collected. To be eligible, the unit must be occupied by at least one person eligible for an interview (an individual who is a civilian, at least 15 years old, and does not have a usual residence elsewhere). Even though such households are eligible for interview, they are not interviewed because the household members refuse, are absent during the interviewing period, or are unavailable for other reasons.
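The individual eligibility test described above, and the way it determines whether an occupied unit is a Type B noninterview, can be sketched as follows. The field names are illustrative, not actual CPS variable names.

```python
def eligible_for_cps_interview(person):
    """Sketch of the eligibility rule stated in the text: a household
    member is eligible for the CPS labor force interview if a civilian,
    at least 15 years old, and without a usual residence elsewhere (URE).
    The dict keys here are our own illustrative names."""
    return (not person["armed_forces"]
            and person["age"] >= 15
            and not person["usual_residence_elsewhere"])

def occupied_unit_is_type_b(members):
    """A unit occupied entirely by ineligible individuals (all URE,
    Armed Forces, or under age 15) is a Type B noninterview."""
    return not any(eligible_for_cps_interview(m) for m in members)
```

A unit with one eligible civilian adult is interviewable (and becomes a Type A noninterview only if no useable data can be collected), while a unit occupied solely by, say, a 10-year-old child and Armed Forces members would be classified Type B.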
All Type A cases are subject to full supervisory review before such a determination is made final. Every effort is made to keep such noninterviews to a minimum. All Type A cases remain in sample and are assigned for interview in all succeeding months. Even in cases of confirmed refusals (cases that still refuse to be interviewed despite supervisory attempts to convert the case), the FR must verify that the same household still resides at that address before submitting a Type A noninterview.
Figure 7–1. Introductory Letter
Figure 7–2 shows how the three types of noninterviews are classified and the various reasons for reaching such results. Even if a unit is designated as a noninterview, FRs are responsible for collecting information about the unit. Figure 7–3 lists the main housing unit items that are collected for noninterviews and summarizes each item briefly.
Figure 7–2. Noninterviews: Types A, B, and C
Note: See the CPS Interviewing Manual for more details regarding the answer categories under each type of noninterview. See Figure 7–3 for a list of the main housing unit items asked for noninterviews and a brief description of what each item asks.
Type A: 1. No one home; 2. Temporarily absent; 3. Refusal; 4. Other occupied.
Type B: 1. Vacant regular; 2. Temporarily occupied by persons with usual residence elsewhere; 3. Vacant -- storage of household furniture; 4. Unfit or to be demolished; 5. Under construction, not ready; 6. Converted to temporary business or storage; 7. Unoccupied tent site or trailer site; 8. Permit granted, construction not started; 9. Other Type B -- specify.
Type C: 1. Demolished; 2. House or trailer moved; 3. Outside segment; 4. Converted to permanent business or storage; 5. Merged; 6. Condemned; 8. Unused line of listing sheet; 9. Other -- specify.
INITIAL INTERVIEW If the unit is not classified as a noninterview, the FR initiates the CPS interview. The FR attempts to interview a knowledgeable adult household member (known as the household respondent).
The FRs are trained to ask the questions exactly as they are worded on the computer screen. The interview begins with the verification of the unit’s address and confirmation of its eligibility for a CPS interview; Part 1 of Figure 7–4 shows the household items asked at the beginning of the interview. Once eligibility is established, the interview moves into the demographic portion of the instrument. The primary task of this portion of the interview is to establish the household’s roster (the listing of all household residents at the time of the interview). At this point in the interview, the main concern is to establish each individual’s usual place of residence. (These rules are summarized in Figure 7–5.) For all individuals residing in the household without a usual residence elsewhere, a number of personal and family demographic characteristics are collected. Part 1 of Figure 7–6 shows the demographic items asked in MIS-1 households. These characteristics are the relationship to the reference person (the person who owns or rents the home), parent or spouse pointers (if applicable), age, sex, marital status, educational attainment, veteran’s status, current Armed Forces status, race, and origin. As discussed in Figure 7–7, these characteristics are collected in an interactive format that includes a number of consistency edits embedded in the interview itself. The goal is to collect as consistent a set of demographic characteristics as possible. The final step in this portion of the interview is to verify the accuracy of the roster. To this end, a series of questions is asked to ensure that all household members have been accounted for. Before moving on to the labor force portion of the interview, the FR is prompted to review the roster and all data collected up to this point, and is given a chance to correct any incorrect or inconsistent information. The instrument then begins the labor force portion of the interview.
In a household’s initial interview, a few additional characteristics are collected after completion of the labor force portion of the interview. This information includes a question on family income and data on each household member’s country of birth (along with the countries of birth of his/her father and mother) and, for the foreign born, data on year of entry into the United States and citizenship status. See Part 2 of Figure 7–6 for a list of these items. After completing the household roster, the FR collects the labor force data described in Chapter 6. The labor force data are collected for all civilian adults (age 15 and older) who do not have a usual residence elsewhere. To the greatest extent possible, the FR attempts to collect this information from each eligible individual him/herself. In the interest of timeliness and efficiency, however, a household respondent (any knowledgeable adult household member) generally is used to collect the data. Just over one-half of the CPS labor force data are collected by self-response. The bulk of the remainder is collected by proxy from the household respondent. Additionally, in certain limited situations, collection of the data from a nonhousehold member is allowed. All such cases receive direct supervisory review before the data are accepted into the CPS processing system.
Figure 7–3. Noninterviews: Main Housing Unit Items Asked for Types A, B, and C
Note: This list of items is not all inclusive. The list covers only the main data items and does not include related items used to arrive at the final response (e.g., probes and verification screens). See the CPS Interviewing Manual for illustrations of the actual instrument screens for all CPS items.
Housing Unit Items for Type A Cases:
1. TYPEA -- Which specific kind of Type A the case is.
2. ABMAIL -- What the property’s mailing address is.
3. PROPER -- If there is any other building in the property (occupied or vacant).
4. ACCES-scr -- If access to the household is direct or through another unit; this item is answered by the interviewer based on observation.
5. LIVQRT -- What type of housing unit it is (house/apt., mobile home or trailer, etc.); this item is answered by the interviewer based on observation.
6. INOTES-1 -- If the interviewer wants to make any notes about the case that might help with the next interview.
Housing Unit Items for Type B Cases:
1. TYPEB -- Which specific kind of Type B the case is.
2. ABMAIL -- What the property’s mailing address is.
3. BUILD -- If there are any other units (occupied or vacant) in the building.
4. FLOOR -- If there are any occupied or vacant living quarters besides this one on this floor.
5. PROPER -- If there is any other building in the property (occupied or vacant).
6. ACCES-scr -- If access to the household is direct or through another unit; this item is answered by the interviewer based on observation.
7. LIVQRT -- What type of housing unit it is (house/apt., mobile home or trailer, etc.); this item is answered by the interviewer based on observation.
8. SEASON -- If the unit is intended for occupancy year round, by migratory workers, or seasonally.
9. BCINFO -- What the name, title, and phone number are of the contact who provided the Type B or C information; or if the information was obtained by interviewer observation.
10. INOTES-1 -- If the interviewer wants to make any notes about the case that might help with the next interview.
Housing Unit Items for Type C Cases:
1. TYPEC -- Which specific kind of Type C the case is.
2. PROPER -- If there is any other building in the property (occupied or vacant).
3. ACCES-scr -- If access to the household is direct or through another unit; this item is answered by the interviewer based on observation.
4. LIVQRT -- What type of housing unit it is (house/apt., mobile home or trailer, etc.); this item is answered by the interviewer based on observation.
5. BCINFO -- What the name, title, and phone number are of the contact who provided the Type B or C information; or if the information was obtained by interviewer observation.
6. INOTES-1 -- If the interviewer wants to make any notes about the case that might help with the next interview.
SUBSEQUENT MONTHS’ INTERVIEWS For households in sample for the second, third, and fourth months, the FR has the option of conducting the interview over the telephone. Use of this interviewing mode must be approved by the respondent. Such approval is obtained at the end of the first month’s interview, upon completion of the labor force and any supplemental questions. The telephone is the preferred method for collecting the data; it is much more time and cost efficient. Approximately 85 percent of interviews in these 3 months in sample (MIS) are obtained via the telephone. See Part 2 of Figure 7–4 for the questions asked to determine household eligibility and obtain consent for the telephone interview. A personal visit interview attempt is required for the fifth-month interview. This interview follows the sample unit’s 8-month dormant period and is used to reestablish rapport with the household. After this initial attempt, a telephone interview may be conducted provided the original household still occupies the sample unit. Fifth-month households are more likely than households in any other MIS to be replacement households. A replacement household is one in which all the previous month’s residents have moved out and been replaced by an entirely different group of residents. This can and does occur in any MIS except MIS 1. As with their MIS 2, 3, and 4 counterparts, households in their sixth, seventh, and eighth MIS are eligible for telephone interviewing. Once again, about 85 percent of these cases are collected via the telephone. The first thing the FR does in subsequent interviews is update the household roster. The instrument presents a screen (or a series of screens for MIS 5 interviews) that verifies the accuracy of the roster.
Figure 7–4. Interviews: Main Housing Unit Items Asked in MIS 1 and Replacement Households
Note: This list of items is not all inclusive. The list covers only the main data items and does not include related items used to arrive at the final response (e.g., probes and verification screens). See the CPS Interviewing Manual for illustrations of the actual instrument screens for all CPS items.
Part 1. Items Asked at the Beginning of the Interview:
1. INTRO-b -- If the interviewer wants to classify the case as a noninterview.
2. NONTYP -- What type of noninterview the case is (A, B, or C); asked depending on the answer to INTRO-b.
3. VERADD -- What the street address is (as verification).
4. MAILAD -- What the mailing address is (as verification).
5. STRBLT -- If the structure was originally built before or after 4/1/90.
6. BUILD -- If there are any other units (occupied or vacant) in the building.
7. FLOOR -- If there are any occupied or vacant living quarters besides the sample unit on the same floor.
8. PROPER -- If there is any other building in the property (occupied or vacant).
9. TENUR-scrn -- If the unit is owned, rented, or occupied without paid rent.
10. ACCES-scr -- If access to the household is direct or through another unit; this item is answered by the interviewer (not read to the respondent).
11. MERGUA -- If the sample unit has merged with another unit.
12. LIVQRT -- What type of housing unit it is (house/apt., mobile home or trailer, etc.); this item is answered by the interviewer (not read to the respondent).
13. LIVEAT -- If all persons in the household live or eat together.
14. HHLIV -- If any other household on the property lives or eats with the interviewed household.
Part 2. Items Asked at the End of the Interview:
15. TELHH-scrn -- If there is a telephone in the unit.
16. TELAV-scrn -- If there is a telephone elsewhere on which people in this household can be contacted; asked depending on the answer to TELHH-scrn.
17. TELWHR-scr -- If there is a telephone elsewhere, where the phone is located; asked depending on the answer to TELAV-scrn.
18. TELIN-scrn -- If a telephone interview is acceptable.
19. TELPHN -- What the phone number is and whether it is a home or office phone.
20. BSTTM-scrn -- When the best time is to contact the respondent.
21. NOSUN-scrn -- If a Sunday interview is acceptable.
22. THANKYOU -- If there is any reason why the interviewer will not be able to interview the household next month.
23. INOTES-1 -- If the interviewer wants to make any notes about the case that might help with the next interview; also asks for a list of names/ages of ALL additional persons if there are more than 16 household members.
Since households in MIS 5 are returning to sample after an 8-month hiatus, additional probing questions are asked to correctly establish the household’s current roster and to update some characteristics. See Figure 7–8 for a list of major items asked in MIS 5 interviews. If there are any changes, the instrument goes through the steps necessary to add or delete individuals. Once all the additions/deletions are completed, the instrument then prompts the FR/interviewer to correct or update any relationship items (e.g., relationship to reference person, marital status, and parent and spouse pointers) that may be subject to change. After completion of the appropriate corrections, the instrument then takes the interview to any items, such as educational attainment, that require periodic updating. The labor force interview in MIS 2, 3, 5, 6, and 7 collects the same information as the MIS 1 interview. MIS 4 and 8 interviews are different in several respects. Additional information collected in these interviews includes a battery of questions for employed wage and salary workers on their usual weekly earnings at their only or main job. For all individuals who are multiple jobholders, information is collected on the industry and occupation of their second job.
For individuals who are not in the labor force, we obtain additional information on their previous labor force attachment. Dependent interviewing is another enhancement contributed to the subsequent months’ interviewing by the computerization of the labor force interview. Information collected in the previous month’s interview is imported into the current interview to ease response burden and improve the quality of the labor force data. This change is most noticeable in the collection of main job industry and occupation data. By importing the previous month’s job description into the current month’s interview, we can ascertain whether an individual has the same job as he/she had the preceding month. Not only does this enhance analysis of month-to-month job mobility, it also frees the FR/interviewer from re-entering the detailed industry and occupation descriptions. This speeds the flow through the labor force interview. Other information collected using dependent interviewing is the duration of unemployment (either job search or layoff duration) and the not-in-labor-force subgroups of retired and disabled. Dependent interviewing is not used in MIS 5 interviews, nor is it used for any of the data collected solely in MIS 4 and 8.
Another recent development in the computerization of the CPS is the use of centralized facilities for computer-assisted telephone interviewing (CATI). The CPS has been experimenting with the use of CATI since 1983 and has gradually been introducing it into CPS operations. The first use of CATI in production was the Tri-Cities Test, which started in April 1987. Since that time, more and more cases have been sent to our CATI facilities for interviewing. Currently, about 9,200 cases are sent to the facilities each month. The facilities generally interview about 88-90 percent of the cases assigned to them. The net result is that about 15 percent of all CPS interviews are completed at a CATI facility. Three facilities are in use: Hagerstown, MD; Tucson, AZ; and Jeffersonville, IN. During the time of the initial phase-in of CATI data collection, and continuing to this day, there are controlled selection criteria (see Chapter 4) that allow the analysis of any effects of the CATI collection methodology. See Chapter 16 for a discussion of CATI effects on the labor force data.
One of the main reasons for using CATI is to ease the recruiting and hiring effort in hard-to-enumerate areas. It is much easier to hire an individual to work in the CATI facilities than it is to hire individuals to work as FRs in most major metropolitan areas. This is especially true in most large cities. Most of the cases sent to CATI are from the major metropolitan areas. CATI is not used in most rural areas because the sample sizes in these areas are small enough to keep a single FR busy the entire interview week without causing any undue hardship in completing the assignment. A concerted effort is made to hire some Spanish-speaking interviewers in the Tucson Telephone Center. This allows us to send Spanish-speaking cases to this facility for interviewing. For the reasons stated above, no MIS 1 or 5 cases are sent to the facilities.
As stated above, the facilities complete all but 10-12 percent of the cases sent to them. These uncompleted cases are recycled back to the field for followup and final determination. For this reason, the CATI facilities generally cease interviewing on Wednesday of interview week. This allows the field staff 3 to 4 days to check the data on the case and complete any required interviewing or classify the case as a noninterview. The field staff is highly successful in completing these cases, generally interviewing about 80-85 percent of them. The cases that are sent to the CATI facilities are selected by the supervisors in each of the regional offices. The FRs provide information on a household’s probable acceptance of a CATI interview, but ultimately it is the supervisor’s decision. This decision is based on the FR’s input and the need to balance workloads and meet specific goals on the number of cases sent to CATI.
Figure 7–5. Summary Table for Determining Who Is To Be Included As a Member of the Household
(Include as member of household: Yes/No)
A. PERSONS STAYING IN SAMPLE UNIT AT TIME OF INTERVIEW
Person is member of family, lodger, servant, visitor, etc.:
1. Ordinarily stays here all the time (sleeps here) -- Yes
2. Here temporarily - no living quarters held for person elsewhere -- Yes
3. Here temporarily - living quarters held for person elsewhere -- No
Person is in Armed Forces:
1. Stationed in this locality, usually sleeps here -- Yes
2. Temporarily here on leave - stationed elsewhere -- No
Person is a student - here temporarily attending school - living quarters held for person elsewhere:
1. Not married or not living with immediate family -- No
2. Married and living with immediate family -- Yes
3. Student nurse living at school -- Yes
B. ABSENT PERSON WHO USUALLY LIVES HERE IN SAMPLE UNIT
Person is inmate of institutional special place - absent because inmate in a specified institution, regardless of whether or not living quarters held for person here -- No
Person is temporarily absent on vacation, in general hospital, etc. (including veterans’ facilities that are general hospitals) - living quarters held here for person -- Yes
Person is absent in connection with job:
1. Living quarters held here for person - temporarily absent while ‘‘on the road’’ in connection with job (e.g., traveling salesperson, railroad conductor, bus driver) -- Yes
2. Living quarters held here and elsewhere for person but comes here infrequently (e.g., construction engineer) -- No
3. Living quarters held here at home for unmarried college student working away from home during summer school vacation -- Yes
Person is in Armed Forces - was member of this household at time of induction but currently stationed elsewhere -- No
Person is a student in school - away temporarily attending school - living quarters held for person here:
1. Not married or not living with immediate family -- Yes
2. Married and living with immediate family -- No
3. Attending school overseas -- No
4. Student nurse living at school -- No
C. EXCEPTIONS AND DOUBTFUL CASES
Person with two concurrent residences - determine length of time person has maintained two concurrent residences:
1. Has slept greater part of that time in another locality -- No
2. Has slept greater part of that time in sample unit -- Yes
Citizen of foreign country temporarily in the United States:
1. Living on premises of an Embassy, Ministry, Legation, Chancellery, or Consulate -- No
2. Not living on premises of an Embassy, Ministry, etc.:
a. Living here and no usual place of residence elsewhere in the United States -- Yes
b. Visiting or traveling in the United States -- No
Figure 7–6. Interviews: Main Demographic Items Asked in MIS 1 and Replacement Households
Note: This list of items is not all inclusive. The list covers only the main data items and does not include related items used to arrive at the final response (e.g., probes and verification screens). See the CPS Interviewing Manual for illustrations of the actual instrument screens for all CPS items.
Part 1. Items Asked at Beginning of Interview:
1. HHRESP -- What the line number is of the household respondent.
2. RPNAME -- What the name is of the reference person (i.e., the person who owns/rents the home, whose name should appear on line number 1 of the household roster).
3. NEXTNM -- What the name is of the next person in the household (line numbers 2 through a maximum of 16).
4. VERURE -- If the sample unit is the person’s usual place of residence.
5. HHMEM-scrn -- If the person has his/her usual place of residence elsewhere; asked only when the sample unit is not the person’s usual place of residence.
6. SEX-scrn -- What the person’s sex is; this item is answered by the interviewer (not read to the respondent).
7. MCHILD -- If the household roster (displayed on the screen) is missing any babies or small children.
Remaining Part 1 items: 8. MAWAY; 9. MLODGE; 10. MELSE; 11. RRP-nscr; 12. VR-NONREL; 13. SBFAMILY; 14-21. PAREN-scrn, BMON-scrn, BDAY-scrn, BYEAR-scrn, AGEVR, MARIT-scrn, SPOUS-scrn, AFEVE-scrn, AFWHE-scrn, AFNOW-scrn; 22. EDUCA-scrn; 23. RACE-scrn; 24. ORIGI-scrn; 25. SSN-scrn; 26. CHANGE.
If the household roster (displayed on the screen) is missing usual residents temporarily away from the unit (e.g., traveling, at school, in a hospital). If the household roster (displayed on the screen) is missing any lodgers, boarders, or live-in employees. If the household roster (displayed on the screen) is missing anyone else staying in the unit. How is the person related to the reference person; the interviewer shows the respondent a flashcard from which he/she chooses the appropriate relationship category. If the person is related to anyone else in the household; asked only when the person is not related to the reference person. Who on the household roster (displayed on the screen) is the person related to; asked depending on answer to VR-NONREL. What is the parent’s line number. What is the month of birth. What is the day of birth. What is the year of birth. How many years old is the person (as verification). What is the person’s marital status; asked only of persons 15+ years old. What is the spouse’s line number; asked only of persons 15+ years old. If the person ever served on active duty in the U.S. Armed Forces; asked only of persons 17+ years old. When did the person serve; asked only of persons 17+ years old who have served in the U.S. Armed Forces. If the person is now in the U.S. Armed Forces; asked only of persons 17+ years old who have served in the U.S. Armed Forces. Interviewers will continue to ask this item each month as long as the answer is ‘‘yes.’’ What is the highest level of school completed or highest degree received; asked only of persons 15+ years old. This item is asked for the first time in MIS 1, and then verified in MIS 5 and in specific months (i.e., February, July, and October). What is the person’s race; the interviewer shows the respondent a flashcard from which he/she chooses the appropriate race category. 
What is the person’s origin; the interviewer shows the respondent a flashcard from which he/she chooses the appropriate origin category. What is the person’s social security number; asked only of persons 15+ years old. This item is asked only from December through March, regardless of month in sample. If there has been any change in the household roster (displayed with full demographics) since last month, particularly in the marital status. Part 2. Items Asked at the End of the Interview Item Name Item Asks 27 28 29 30 NAT1 MNAT1 FNAT1 CITZN-scr 31 32 33 CITYA-scr CITYB-scr INUSY-scr 34 FAMIN-scrn What is the person’s country of birth. What is his/her mother’s country of birth. What is his/her father’s country of birth. If the person is a citizen of the U.S.; asked only when neither the person nor both of his/her parents were born in the U.S. or U.S. territory. If the person was born a citizen of the U.S.; asked when the answer to CITZN-scr is yes. If the person became a citizen of the U.S. through naturalization; asked when the answer to CITYA-scr is no. When did the person come to live in the U.S.; asked of U.S. citizens born outside of the 50 states (e.g., Puerto Ricans, U.S. Virgin Islanders, etc.) and of non-U.S. citizens. What is the household’s total combined income during the past 12 months; the interviewer shows the respondent a flashcard from which he/she chooses the appropriate income category. 7– 8 Figure 7– 7. Demographic Edits in the CPS Instrument Note: The following list of edits is not all inclusive; only the major edits are described. The demographic edits in the CPS instrument take place while the interviewer is creating or updating the roster. After the roster is in place, the interviewer may still make changes to the roster (e.g., add/delete persons, change variables) at the Change screen. 
However, the instrument does not include any more demographic edits past the Change screen, because inconsistencies between the data collected at that point and the data collected earlier in the interview are difficult to resolve without risking endless loops.

Education Edits

1. The instrument will force interviewers to probe if the education level is inconsistent with the person's age; interviewers will probe for the correct response if the education entry fails any of the following range checks:
   • If 19 years old, the person should have an education level below the level of a master's degree (EDUCA-scrn < 44).
   • If 16-18 years old, the person should have an education level below the level of a bachelor's degree (EDUCA-scrn < 43).
   • If younger than 15 years old, the person should have an education below college level (EDUCA-scrn < 40).
2. The instrument will force the interviewer to probe before it allows him/her to lower an education level reported in a previous month in sample.

Veterans' Edit

1. The instrument will display only the answer categories that apply (i.e., periods of service in the Armed Forces), based on the person's age. For example, the instrument will not display certain answer categories for a 40 year old veteran (e.g., World War I, World War II, Korean War), but it will display them for a 99 year old veteran.

Nativity Edits

1. The instrument will force the interviewer to probe if the person's year of entry into the U.S. is earlier than his/her year of birth.

Spouse Line Number Edits

1. If the household roster does not include a spouse for the reference person, the instrument will set the reference person's SPOUSE line number equal to zero. It will also omit the first answer category (i.e., married, spouse present) when it asks for the marital status of the reference person.
2. The instrument will not ask SPOUSE line number for both spouses in a married couple. Once it obtains the SPOUSE line number for the first spouse on the roster, it will fill the second spouse's SPOUSE line number with the line number of the first spouse. Likewise, the instrument will not ask marital status for both spouses. Once it obtains the marital status for the first spouse on the roster, it will set the second spouse's marital status equal to that of his/her spouse.
3. Before assigning SPOUSE line numbers, the instrument will verify that there are opposite sex entries for each spouse. If both spouses are of the same sex, the interviewer will be prompted to fix whichever one is incorrect.
4. For each household member with a spouse, the instrument will ensure that his/her SPOUSE line number is not equal to his/her own line number, nor to his/her own PARENT line number (if any). In both cases, the instrument will not allow the interviewer to make the wrong entry and will display a message telling the interviewer to ''TRY AGAIN.''

Parent Line Number Edits

1. The instrument will never ask for the reference person's PARENT line number. It will set the reference person's PARENT line number equal to the line number of whoever on the roster was reported as the reference person's parent (i.e., an entry of 24 at RRP-nscr), or equal to zero if no one on the roster fits that criterion.
2. Likewise, for each individual reported as the reference person's child (an entry of 22 at RRP-nscr), the instrument will set his/her PARENT line number equal to the reference person's line number, without asking for each individual's PARENT line number.
3. The instrument will not allow more than two parents for the reference person.
4. If the individual is the reference person's brother or sister (i.e., an entry of 25 at RRP-nscr), the instrument will set his/her PARENT line number equal to the reference person's PARENT line number. However, the instrument will not do so without first verifying that the parent that both siblings have in common is indeed the one whose line number appears in the reference person's PARENT line number (since not all siblings have both parents in common).
5. For each household member, the instrument will ensure that his/her PARENT line number is not equal to his/her own line number. In such a case, the instrument will not allow the interviewer to make the wrong entry and will display a message telling the interviewer to ''TRY AGAIN.''

Figure 7–8. Interviews: Main Items (Housing Unit and Demographic) Asked in MIS 5 Cases

Note: This list of items is not all inclusive. The list covers only the main data items and does not include related items used to arrive at the final response (e.g., probes and verification screens). See the CPS Interviewing Manual for illustrations of the actual instrument screens for all CPS items.

Housing Unit Items

1. HHNUM-vr: If household is a replacement household.
2. VERADD: What is the street address (as verification).
3. CHNGPH: If the current phone number needs updating.
4. MAILAD: What is the mailing address (as verification).
5. TENUR-scrn: If the unit is owned, rented, or occupied without paid rent.
6. TELHH-scrn: If there is a telephone in the unit.
7. TELIN-scrn: If a telephone interview is acceptable.
8. TELPHN: What is the phone number and whether it is a home or office phone.
9. BSTTM-scrn: When is the best time to contact the respondent.
10. NOSUN-scrn: If a Sunday interview is acceptable.
11. THANKYOU: If there is any reason why the interviewer will not be able to interview the household next month.
12. INOTES-1: If the interviewer wants to make any notes about the case that might help with the next interview; also asks for a list of names/ages of ALL additional persons if there are more than 16 household members.
Demographic Items

13. RESP1: If the respondent is different from the previous interview.
14. STLLIV: If all persons listed are still living in the unit.
15. NEWLIV: If anyone else is staying in the unit now.
16. MCHILD: If the household roster (displayed on the screen) is missing any babies or small children.
17. MAWAY: If the household roster (displayed on the screen) is missing usual residents temporarily away from the unit (e.g., traveling, at school, in hospital).
18. MLODGE: If the household roster (displayed on the screen) is missing any lodgers, boarders, or live-in employees.
19. MELSE: If the household roster (displayed on the screen) is missing anyone else staying in the unit.
20. EDUCA-scrn: What is the highest level of school completed or highest degree received; asked for the first time in MIS 1, and then verified in MIS 5 and in specific months (i.e., February, July, and October).
21. CHANGE: If, since last month, there has been any change in the household roster (displayed with full demographics), particularly in the marital status.

of cases sent to the facilities. The selection of cases to send to CATI also must take into consideration statistical aspects of the selection. With this constraint in mind, selection of cases is controlled by a series of random group numbers assigned at the time of sample selection that either allow or disallow the assignment of a case to CATI. This control procedure allows the Census Bureau to continue to analyze statistical differences that may exist between data collected in CATI mode and data collected in CAPI mode.

Figures 7–9 and 7–10 show the results of a typical month's (December 1996) CPS interviewing. Figure 7–9 lists the outcomes of all the households in the CPS sample. The expectations for normal monthly interviewing are a Type A rate around 6.0 percent with an overall noninterview rate in the 20-22 percent range. In December 1996, the Type A rate was 5.81 percent.
For the April 1996 - March 1997 period, the CPS Type A rate was 6.36 percent. The months of January, February, and March 1996 were not used due to the effects of the Federal government shutdowns. The overall noninterview rate for December 1996 was 20.34 percent, compared to the 12-month average of 20.75 percent. Figure 7–10 shows the rates of personal and telephone interviewing in December 1996. It is highly consistent with the usual monthly results for personal and telephone interviews.

Figure 7–9. Interviewing Results (December 1996)

Total Assigned Workload . . . . . . . . . . . . . . . . 60,234
Interviewed Cases . . . . . . . . . . . . . . . . . . . 47,981
Response Rate . . . . . . . . . . . . . . . . . . . . . 94.19%
Interviewed CAPI . . . . . . . . . . . . . . . . . . . 41,464
CAPI Partial Interviews . . . . . . . . . . . . . . . . 108
Assigned CATI . . . . . . . . . . . . . . . . . . . . . 7,112
Interviewed CATI . . . . . . . . . . . . . . . . . . . 6,517
CATI Partial Interviews . . . . . . . . . . . . . . . . 58
CATI Recycles to Region . . . . . . . . . . . . . . . . 595
Noninterviews . . . . . . . . . . . . . . . . . . . . . 12,253
NI Rate . . . . . . . . . . . . . . . . . . . . . . . . 20.34%
Type A Noninterviews . . . . . . . . . . . . . . . . . 2,960
Type A Rate . . . . . . . . . . . . . . . . . . . . . . 5.81%
  No One Home . . . . . . . . . . . . . . . . . . . . . 638
  Temporarily Absent . . . . . . . . . . . . . . . . . 292
  Refused . . . . . . . . . . . . . . . . . . . . . . . 1,928
  Other Occupied . . . . . . . . . . . . . . . . . . . 99
  Accessed Instr. - No Progress . . . . . . . . . . . . 3
Type B Noninterviews . . . . . . . . . . . . . . . . . 8,921
Type B Rate . . . . . . . . . . . . . . . . . . . . . . 14.90%
  Armed Forces Occupied or < age 15 . . . . . . . . . . 93
  Temp. Occupied With Persons With URE . . . . . . . . 1,254
  Vacant Regular (REG) . . . . . . . . . . . . . . . . 6,005
  Vacant HHLD Furniture Storage . . . . . . . . . . . . 374
  Unfit, to be Demolished . . . . . . . . . . . . . . . 424
  Under Construction, Not Ready . . . . . . . . . . . . 259
  Converted to Temp. Business or Storage . . . . . . . 186
  Unoccupied Tent or Trailer Site . . . . . . . . . . . 274
  Permit Granted, Construction Not Started . . . . . . 37
  Other Type B . . . . . . . . . . . . . . . . . . . . 15
Type C Noninterviews . . . . . . . . . . . . . . . . . 372
Type C Rate . . . . . . . . . . . . . . . . . . . . . . 0.62%
  Demolished . . . . . . . . . . . . . . . . . . . . . 39
  House or Trailer Moved . . . . . . . . . . . . . . . 36
  Outside Segment . . . . . . . . . . . . . . . . . . . 13
  Converted to Perm. Business or Storage . . . . . . . 26
  Merged . . . . . . . . . . . . . . . . . . . . . . . 33
  Condemned . . . . . . . . . . . . . . . . . . . . . . 1
  Built After April 1, 1980 . . . . . . . . . . . . . . 76
  Unused Serial No./Listing Sheet Line . . . . . . . . 73
  Other Type C . . . . . . . . . . . . . . . . . . . . 75

Figure 7–10. Telephone Interview Rates (December 1996)

                    Total     Telephone          Personal          Indeterminate
                    Number    Number   Percent   Number   Percent  Number  Percent
Total . . . . . .   47,981    34,285   71.5      13,548   28.2     148     0.3
MIS 1 & 5 . . . .   11,710     2,677   22.9       8,968   76.6      65     0.6
MIS 2-4, 6-8 . .    36,271    31,608   87.1       4,580   12.6      83     0.2

Chapter 8. Transmitting the Interview Results

INTRODUCTION

With the advent of the completely electronic interviewing environment, transmission of the interview results took on a heightened importance to the field representative (FR). Until recently, the FR dropped the completed interviews in the mail and waited for the U.S. Postal Service to complete the task. Now, the data must be prepared for transmission and sent out via computer. This chapter provides a summary of these procedures and how the data are prepared for production processing.

The system for transmission of data is centralized at headquarters. All data transfers must pass through headquarters even if that is not the final destination of the information. The system was designed this way for ease of management and to ensure uniformity of procedures within a given survey and between different surveys. The transmission system was designed to satisfy the following requirements:

• Provide minimal user intervention.
• Upload and/or download in one transmission.
• Transmit all surveys in one transmission.
• Transmit software upgrades with data.
• Maintain integrity of software and assignment.
• Prevent unauthorized access.
• Handle mail messages.

Another important aspect of the system is that the headquarters central database system cannot initiate transmissions. Either the FR or the regional offices (ROs) must initiate any transmissions.
Computers in the field are not connected to the headquarters computers at all times. Instead, the field computers contact the headquarters computers, and all data exchanges take place at the time of such call-ins. The central database system contains a group of servers that store messages and case information required by the FRs or the ROs. When an interviewer calls in, the transfer of data from the FR's computer to the headquarters computers is completed first, and then any outgoing data are transferred to the FR's computer.

A major concern with the use of an electronic method of transmitting interview data is the need for complete security of these data. Both the Census Bureau and the Bureau of Labor Statistics (BLS) are required to honor the pledge of confidentiality given to all Current Population Survey (CPS) respondents. The system was designed to safeguard this pledge. All transmissions between the headquarters central database and the FRs' computers are compacted and encrypted for this reason. All transmissions between headquarters, the ROs, the Centralized Telephone Facilities, and Jeffersonville are over secure telecommunications lines that are leased by the Census Bureau and are accessible only by Census Bureau employees through their computers.

TRANSMISSION OF INTERVIEW DATA

Telecommunications exchanges between an FR and the central database system usually take place once per day during the interview period. Additional transmissions may be made at any time as needed. Each transmission is a batch process in which all relevant files are automatically transmitted and received. Each FR is expected to make a telecommunications transmission at the end of every work day during the interview period. This is usually accomplished by a preset transmission; that is, each evening the FR sets the FR's computer up to transmit the completed work and then hooks the computer up to a modem within the FR's residence.
During the night, at a preset time, the computer automatically dials into the headquarters central database and transmits the completed cases. At the same time, the central database returns any messages and other data to complete the FR's assignment.

It is also possible for an FR to make an immediate transmission at any time of the day. The results of such a transmission are identical to a preset transmission, both in the types and directions of the various data transfers, but an immediate transmission allows the FR instant access to the central database as necessary. This type of procedure is used primarily around the time of closeout, when an FR might have one or two straggler cases that need to be received by headquarters before field staff can close out the month's workload and production processing can begin. The RO staff may also perform a daily data transmission, sending in cases that required supervisory review or were completed at the RO.

Centralized Telephone Facility Transmission

Most of the cases sent to a Centralized Telephone Facility are successfully completed as computer assisted telephone interviews (CATI). Those that cannot be completed from the telephone center are transferred to an FR prior to the end of the interview period. These cases are called ''CATI recycles.'' Each telephone facility makes daily transmissions of completed cases and recycles to the headquarters database. All the completed cases are batched for further processing. Each recycled case is transmitted directly to the computer of the FR who is assigned the case. Case notes that include the reason for recycle are also transmitted to the FR to assist in follow-up. CATI recycles are placed on the server when a daily transmission is performed for the region. Daily transmissions are performed automatically for each region every hour during the CPS interview week. This provides cases to the FRs that are reassigned or recycled. The RO staff also monitor the progress of the CATI recycled cases via the Recycle Report.
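The call-in sequence described above, in which the FR initiates contact, uploads completed work first, and only then receives queued messages and recycled cases, can be sketched as a simple store-and-forward exchange. This is an illustrative sketch only; the class and field names are hypothetical and do not reflect the actual CPS transmission software.

```python
from dataclasses import dataclass, field

@dataclass
class CentralDatabase:
    """Headquarters store-and-forward server: it never initiates contact."""
    received_cases: list = field(default_factory=list)
    outbox: dict = field(default_factory=dict)  # FR id -> queued messages/recycles

    def exchange(self, fr_id, completed_cases):
        # Step 1: accept the FR's completed work first.
        self.received_cases.extend(completed_cases)
        # Step 2: then hand back anything queued for that FR
        # (messages, reassigned cases, CATI recycles).
        return self.outbox.pop(fr_id, [])

hq = CentralDatabase()
hq.outbox["FR-07"] = ["recycled case 1234 (respondent prefers personal visit)"]

# The FR's computer initiates the nightly call-in; headquarters cannot.
incoming = hq.exchange("FR-07", completed_cases=["case 1101", "case 1102"])
print(incoming)           # queued recycle delivered after the upload completes
print(hq.received_cases)  # completed work now at headquarters
```

The same exchange serves both preset (overnight) and immediate transmissions, since the upload-then-download order is identical in both cases.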
All cases that are sent to a CATI facility are also assigned to an FR by the RO staff. The RO staff keep a close eye on recycled cases to ensure that they are completed on time and to monitor the reasons for recycling so that future recycling can be minimized. The reason for recycling is also closely monitored to ensure that recycled cases are properly handled by the CATI facility and correctly identified as CATI eligible by the FR.

Transmission of Interviewed Data From the Centralized Database

Each day during the production cycle (see Figure 8-1 for an overview of the daily processing cycle), the field staff send four files containing the results of the previous day's interviewing to the production processing system at headquarters. A separate file is received from each of the CATI facilities, and all data received from the FRs are batched together and sent as a single file. At this time, cases requiring industry and occupation (I&O) coding are identified, and a file of such cases is created. This file is then used by Jeffersonville coders to assign the appropriate I&O codes. This cycle repeats itself until all data are received by headquarters, usually Tuesday or Wednesday of the week after interviewing begins. By the middle of the interview week, usually Wednesday, the CATI facilities close down, and only one file is received daily by the headquarters production processing system. This continues until field closeout day, when multiple files may be sent to expedite the preprocessing.

Data Transmissions for I&O Coding

The I&O data are not actually transmitted to Jeffersonville. Rather, the coding staff directly access the data on headquarters computers through the use of remote monitors in Jeffersonville. When a batch of data has been completely coded, that file is returned to headquarters, and the appropriate data are loaded into the headquarters database. See Chapter 9 for a complete overview of the I&O coding and processing system.
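The daily batching step, with one file per CATI facility, one combined file for all FR cases, and a split-out file of cases needing I&O coding, can be sketched as follows. The record fields here are illustrative stand-ins, not the actual CPS record layout; the rule that only new cases and cases with a changed industry/occupation go to coding is described in Chapter 9.

```python
# Minimal sketch of the daily batching step: each CATI facility sends a
# separate file, all FR (CAPI) cases arrive batched as one file, and cases
# needing industry & occupation (I&O) coding are split out for Jeffersonville.

def batch_daily_files(cati_files, capi_cases):
    """Combine the day's answer files into one batch for processing."""
    batch = list(capi_cases)
    for facility_cases in cati_files.values():
        batch.extend(facility_cases)
    return batch

def needs_io_coding(case):
    # New cases, and cases whose industry/occupation changed since last
    # month, get their verbatim descriptions coded; unchanged cases reuse
    # last month's three-digit codes (dependent interviewing).
    return case["io_changed"] or case["mis"] == 1

cati = {"Hagerstown": [{"id": 1, "mis": 3, "io_changed": False}],
        "Tucson":     [{"id": 2, "mis": 2, "io_changed": True}]}
capi = [{"id": 3, "mis": 1, "io_changed": False}]

batch = batch_daily_files(cati, capi)
io_file = [c["id"] for c in batch if needs_io_coding(c)]
print(io_file)  # ids of cases sent for I&O coding
```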
Once these transmission operations have been completed, final production processing begins. Chapter 9 provides a description of the processing operation.

Figure 8–1. Overview of CPS Monthly Operations

Week 1 (week containing the 5th): Field reps pick up the computerized questionnaire via modem.
Week 2 (week containing the 12th): Field reps and CATI interviewers complete practice interviews and home studies. Field reps pick up assignments via modem (Saturday).
Week 3 (week containing the 19th): CATI interviewing runs Sunday through Wednesday; CATI facilities transmit completed work to headquarters nightly, and CATI interviewing ends Wednesday.* CAPI interviewing begins the same Sunday; FRs transmit completed work to headquarters nightly, and ROs monitor interviewing assignments. Processing begins: files received overnight are checked in by ROs and headquarters daily, and headquarters performs initial processing and sends eligible cases to Jeffersonville for industry and occupation coding.
Week 4 (week containing the 26th): All assignments completed except for telephone holds (Sunday); all interviews completed and ROs complete final field closeout (Tuesday). Jeffersonville sends back completed I&O cases; initial process closeout and J'ville closeout follow.
Week 5: Headquarters performs final computer processing (edits, recodes, weighting); the Census Bureau delivers data to BLS; BLS analyzes the data; results are released to the public at 8:30 a.m. by BLS.

* CATI interviewing is extended in March and certain other months.

Chapter 9. Data Preparation

INTRODUCTION

For the Current Population Survey (CPS), the goal of post-data-collection processing is to transform a raw data file, as collected by interviewers, into a microdata file that can be used to produce estimates. Several processes are needed for this to be accomplished. The raw data files must be read and processed. Textual industry and occupation responses are coded. Even though some editing takes place in the instrument at the time of interview (see Chapter 7), further editing is required once all the data are received. Editing and imputations are performed to improve the consistency and completeness of the microdata. New data items are created based upon responses to multiple questions. All of this is in preparation for weighting and estimation (see Chapter 10).

INDUSTRY AND OCCUPATION (I&O) CODING

The industry and occupation coding operation for a typical month requires ten coders for a period of just over 1 week to code data from 27,000 individuals.1 At other times, these same coders are available for similar activities on other surveys, where their skills can be maintained. The volume of codes has decreased significantly with the introduction of dependent interviewing for I&O codes (see Chapter 6). Only the new monthly CPS cases, as well as those persons whose industry or occupation has changed since the previous month of interviewing, are sent to Jeffersonville to be coded. For those persons whose industry and occupation have not changed, the three-digit codes are brought forward from the previous month of interviewing and require no further coding.

A computer assisted industry and occupation coding system is used by the Jeffersonville I&O coders. Files of all eligible I&O cases are sent to this system each day. Each coder works at a computer terminal where the computer screen displays the industry and occupation descriptions that were captured by the field representatives at the time of the interview. The coder then enters the three-digit numeric industry and occupation codes used in the 1990 census that represent the industry and occupation descriptions.

A substantial effort is directed at supervision and control of the quality of this operation. The supervisor is able to turn the dependent verification setting ''on'' or ''off'' at any time during the coding operation. The ''on'' mode means that a particular coder's work is verified by a second coder.
In addition, a 10 percent sample of each month's cases is selected to go through a quality assurance system to evaluate the work of each coder. The selected cases are verified by another coder after the current monthly processing has been completed. After this operation is complete, the batch of records is electronically returned to headquarters to be used in the monthly production processing.

DAILY PROCESSING

For a typical month, computer assisted telephone interviewing (CATI) starts on Sunday of the week containing the 19th of the month and continues through Wednesday of the same week. The answer files representing these interviews are received on a daily basis from Monday through Thursday of this interview week. Separate files are received for each of the three CATI facilities: Hagerstown, Tucson, and Jeffersonville. Computer assisted personal interviewing (CAPI) also begins on the same Sunday and continues through Monday of the following week. The CAPI answer files are received for each day of interviewing, until all the interviewers and regional offices have transmitted the workload for the month. This is generally completed by Wednesday of the following week.

These answer files are read, and various computer checks are performed to ensure the data can be accepted into the CPS processing system. These checks include, but are not limited to, ensuring the successful transmission and receipt of the files, item range checks, and rejection of invalid cases. Files containing records needing three-digit industry and occupation (I&O) codes are electronically sent to Jeffersonville for assignment of these codes. Once Jeffersonville has completed the I&O coding, the files are electronically transferred back to headquarters, where the codes are placed on the CPS production file.
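The acceptance checks applied as answer files are read in, verifying required fields, applying item range checks, and rejecting invalid cases, can be sketched as below. The field names and the particular ranges are illustrative assumptions, not the actual CPS record layout or edit specifications.

```python
# Sketch of answer-file acceptance checks: verify identifier fields,
# apply simple item range checks, and reject invalid cases.

def check_case(case):
    """Return a list of problems; an empty list means the case is accepted."""
    problems = []
    if "id" not in case or "state" not in case:
        problems.append("missing identifier fields")
    mis = case.get("mis")
    if mis is not None and not 1 <= mis <= 8:        # month-in-sample range
        problems.append(f"MIS out of range: {mis}")
    age = case.get("age")
    if age is not None and not 0 <= age <= 130:      # item range check
        problems.append(f"age out of range: {age}")
    return problems

incoming = [
    {"id": 1, "state": "MD", "mis": 4, "age": 37},
    {"id": 2, "state": "CA", "mis": 9, "age": 41},   # invalid month in sample
]
accepted = [c for c in incoming if not check_case(c)]
rejected = [c["id"] for c in incoming if check_case(c)]
print([c["id"] for c in accepted], rejected)
```

In the real system these checks also cover successful transmission and receipt of the files themselves, not just item values.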
When all of the expected data for the month are accounted for and all of Jeffersonville's I&O coding files have been returned and placed on the appropriate records on the data file, editing and imputation are performed.

EDITS AND IMPUTATIONS

The CPS suffers from two sources of nonresponse. The larger results from noninterview households. We compensate for this data loss in the weighting, where essentially the weights of noninterviewed households are distributed among interviewed households (see Chapter 10). The second source of data loss is item nonresponse. Item nonresponse occurs when a respondent either does not know the answer to a question or refuses to provide the answer. Item nonresponse in the CPS is modest (see Chapter 16). We compensate for item nonresponse in the CPS by using 1 of 3 imputation methods. Before the edits are applied, the daily data files are merged, and the combined file is sorted by state and by PSU within state. This sort ensures that allocated values come from geographically related records; that is, missing values for records in Maryland will not receive values from records in California. This is an important distinction, since many labor force and industry and occupation characteristics are geographically clustered. The edits effectively blank all entries in inappropriate questions and ensure that all appropriate questions have valid entries. For the most part, illogical entries or out-of-range entries have been eliminated since the use of electronic instruments; however, the edits still address these possibilities due to data transmission problems and occasional instrument malfunctions.

1 Because of the CPS sample increase in July 2001 from the State Children's Health Insurance Program, the number of cases for I&O coding has increased to about 30,000.
The main purpose of the edits, however, is to assign values to questions where the response was ''Don't know'' or ''Refused.'' This is accomplished by using 1 of 3 imputation techniques described below. Before discussing these imputation techniques, it is important to note that the edits are run in a deliberate and logical sequence. That is, demographic variables are edited first because several of those variables are used to allocate missing values in the other modules. The labor force module is edited next, since labor force status and related items are used to impute missing values for industry and occupation codes, and so forth. The three imputation methods used by the CPS edits are described below.

1. Relational imputation infers the missing value from other characteristics on the person's record or within the household. For instance, if race is missing, it is assigned based on the race of another household member or, failing that, taken from the previous record on the file. Similarly, if relationship is missing, it is assigned by looking at the age and sex of the person in conjunction with the known relationship of other household members. Missing occupation codes are sometimes assigned by viewing the industry codes and vice versa. This technique is used exclusively in the demographic and industry and occupation edits. If missing values cannot be assigned using this technique, they are assigned using 1 of the 2 following methods.

2. Longitudinal edits are used primarily in the labor force edits. If a question is blank and the record is in the overlap sample, the edit looks at last month's data to determine whether there was a nonallocated entry for that item. If so, last month's entry is assigned; otherwise, the item is assigned a value using the appropriate hot deck, as described next.

3. This imputation method is commonly referred to as ''hot deck'' allocation. This method assigns a missing value from a record with similar characteristics.
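As a rough illustration, hot-deck allocation can be sketched in a few lines of code. The cell definitions, item, and values below are hypothetical and much simpler than the production CPS matrices:

```python
# Minimal sketch of hot-deck imputation. Cells here are defined by
# Black/non-Black, male/female, and an age split; real CPS hot decks
# are item-specific and larger.

def cell(person):
    """Map a person to a hot-deck cell: race group, sex, and age group."""
    race = "Black" if person["race"] == "Black" else "non-Black"
    age = "16-25" if person["age"] <= 25 else "25+"
    return (race, person["sex"], age)

# A hot deck starts the month with the ending values from the prior month.
hot_deck = {("non-Black", "male", "25+"): "employed",
            ("Black", "female", "16-25"): "unemployed"}

def edit_item(person, item):
    """Donate a valid value to the deck, or receive one if the item is missing."""
    key = cell(person)
    if person.get(item) is not None:
        hot_deck[key] = person[item]   # donor: refresh the cell value
    else:
        person[item] = hot_deck[key]   # recipient: impute from the cell
    return person

donor = {"race": "White", "sex": "male", "age": 40, "status": "retired"}
recipient = {"race": "White", "sex": "male", "age": 64, "status": None}
edit_item(donor, "status")
edit_item(recipient, "status")
print(recipient["status"])  # "retired" -- received from the preceding donor
```

Each record that passes through either refreshes its cell (a donor) or draws from it (a recipient), so the deck always holds the most recent valid value for each cell.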
Hot decks are always defined by age, race, and sex. Other characteristics used in hot decks vary depending on the nature of the question being referenced. For instance, most labor force questions use only age, race, sex, and occasionally another labor force item such as full- or part-time status. This means the number of cells in labor force item hot decks is relatively small, perhaps fewer than 100. On the other hand, the weekly earnings hot deck is defined by age, race, sex, usual hours, occupation, and educational attainment. This hot deck has several thousand cells. All CPS items that require imputation for missing values have an associated hot deck. The initial values for the hot decks are the ending values from the preceding month. As a record passes through the edits, it will either donate a value to each hot deck in its path or receive a value from the hot deck. For instance, in a hypothetical case, the hot deck for question X is defined by the characteristics Black/non-Black, male/female, and age 16-25/25+. Further assume a record has the values White, male, and age 64. When this record reaches question X, the edits determine whether it has a valid entry. If so, that record's value for question X replaces the value in the hot deck cell reserved for non-Black, male, and age 25+. Similarly, if the record were missing a value for item X, it would be assigned the value in the hot deck cell designated for non-Black, male, and age 25+. As stated above, the various edits are logically sequenced, in accordance with the needs of subsequent edits. The edits and codes, in order of sequence, are:

1. Household edits and codes. This processing step performs edits and creates recodes for items pertaining to the household. It classifies households as interviews or noninterviews and edits items appropriately. Hot deck allocations defined by geography are used in this edit.

2. Demographic edits and codes.
This processing step ensures consistency among all demographic variables for all individuals within a household. It ensures that all interviewed households have one and only one reference person and that entries for marital status, spouse, and parents are all consistent. It also creates families based upon these characteristics. It uses longitudinal editing, hot deck allocation defined by related demographic characteristics, and relational imputation. Demographic-related recodes are created for both individual and family characteristics.

3. Labor force edits and codes. This processing step first establishes an edited Major Labor Force Recode (MLR), which classifies adults as employed, unemployed, or not in the labor force. Based upon MLR, the labor force items related to each series of classification are edited. This edit uses longitudinal editing and hot deck allocation matrices. The hot decks are defined by age, race, and sex and, on occasion, by a related labor force characteristic.

4. Industry and occupation (I&O) edits and codes. This processing step assigns three-digit industry and occupation codes to those I&O-eligible persons for whom the I&O coders were unable to assign a code. It also ensures consistency, wherever feasible, among industry, occupation, and class of worker. I&O-related recodes are also created. This edit uses relational allocation and hot deck allocation. The hot decks are defined by employment status, age, sex, race, and educational attainment.

5. Earnings edits and codes. This processing step performs edits on the earnings series of items for earnings-eligible individuals. A usual weekly earnings recode is created to allow earnings amounts to be in a comparable form for all eligible individuals. There is no longitudinal editing because this series of questions is asked only of MIS 4 and 8 households. Hot deck allocation is used here.
The hot deck for weekly earnings is defined by age, race, sex, major occupation recode, educational attainment, and usual hours worked. Additional earnings recodes are created.

6. School enrollment edits and codes. School enrollment items are edited for individuals 16-24 years old. Hot deck allocation based on age, race, and sex is used.

Chapter 10. Estimation Procedures for Labor Force Data

INTRODUCTION

The Current Population Survey (CPS) is a multistage probability sample of housing units in the United States. It produces monthly labor force and related estimates for the total U.S. civilian noninstitutional population and for various age, sex, race, and ethnic groups. In addition, estimates for a number of other population subdomains of the nation (e.g., families, veterans, persons with earnings, households) are produced on either a monthly or quarterly basis. Each month a sample of eight panels (called rotation groups) is interviewed, with demographic data collected for all occupants of the sample housing units. Labor force data are collected for persons 15 years and older. Each rotation group is itself a representative sample of the U.S. population. The labor force estimates are derived through a number of steps in the estimation procedure. The weighting procedures of the supplements are discussed in Chapter 11. The supplements tend to have higher nonresponse rates. In addition, many of the supplements apply to specific demographic subpopulations and thus differ in coverage from the basic CPS universe. In order to produce national and state estimates from survey data, a weight for each person in the sample is developed through the following steps:

• Preparation of simple unbiased estimates from baseweights and special weights derived from CPS sampling probabilities.
• Adjustment for nonresponse.
• First-stage ratio adjustment to reduce variances due to the sampling of PSUs.
• Second-stage ratio adjustment to reduce variances by controlling CPS estimates of population to independent estimates of the current population.
• Composite estimation, which uses survey data from previous months to reduce the variances.
• Seasonal adjustment for key labor force statistics.

In addition to estimates of basic labor force characteristics, several other types of estimates are also produced on either a monthly or a quarterly basis. These include:

• Household-level estimates and estimates of married couples living in the same household using household and family weights.
• Estimates of earnings, union affiliation, and industry and occupation of second jobs collected from respondents in the quarter sample using outgoing rotation group weights.
• Estimates of labor force status by age for veterans and nonveterans using veterans' weights.
• Estimates of monthly gross flows using longitudinal weights.

The additional estimation procedures provide highly accurate estimates for particular subdomains of the civilian noninstitutional population. The processes described in this chapter have remained essentially unchanged since January 1978. Seasonal adjustment for selected labor force categories has been a part of the estimation procedure since June 1975. Some of these processes have been slightly modified after each decennial census, when the CPS sample is restratified and a new set of sample PSUs is identified. Modifications have been made in the noninterview adjustment and in the source and scope of the independent estimates of current population used in the second-stage ratio adjustment. In January 1998, a new compositing procedure was introduced. (See Appendix I.)

UNBIASED ESTIMATION PROCEDURE

A probability sample is defined as a sample in which each sample unit has a known nonzero probability of selection. With probability samples, unbiased estimators can be obtained. These are estimates that, on average over repeated samples, yield the population values.
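To illustrate, an unbiased estimate of a population total weights each sampled value by the reciprocal of its selection probability. A minimal sketch with made-up values and probabilities:

```python
# Sketch of the reciprocal-probability ("unbiased") estimator.
# Values and selection probabilities are illustrative only.

sample = [
    {"y": 1, "p": 0.001},  # y = 1 if the person has the characteristic
    {"y": 0, "p": 0.001},
    {"y": 1, "p": 0.002},
]

# Estimated population total: sum of y_i / p_i over the sample.
total = sum(unit["y"] / unit["p"] for unit in sample)
print(total)  # 1500.0
```

In a self-weighting sample every p would be equal, so the estimate reduces to the sample total times a single constant.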
An unbiased estimator of the population total for any characteristic investigated in the survey may be obtained by multiplying the value of that characteristic for each sample unit (person or household) by the reciprocal of the probability with which that unit was selected and summing the products over all units in the sample (Hansen, 1953). By starting with unbiased estimates from a probability sample, various kinds of estimation and adjustment procedures (such as for noninterview) can be applied with reasonable assurance that the overall accuracy of the estimates will be improved. In the CPS sample for any given month, not all units respond, and this nonresponse is a potential source of bias. This nonresponse averages about 6 to 7 percent. Other factors, such as occasional errors occurring in the sample selection procedure and households or persons missed by interviewers, can also introduce bias. These missing households or persons can be considered as having zero probability of selection. These two exceptions notwithstanding, the probability of selecting each unit in the CPS is known, and every attempt is made to keep departures from true probability sampling to a minimum. If all units in a sample have the same probability of selection, the sample is called self-weighting, and unbiased estimators can be computed by multiplying sample totals by the reciprocal of this probability. Most of the state samples in the CPS come close to being self-weighting.

Basic Weighting

The sample designated for the 792-area design was selected with probabilities equal to the inverse of the required state sampling intervals shown in Table 3–1. These sampling intervals are called the basic weights (or baseweights). Almost all sample persons within the same state have the same probability of selection.
Exceptions include sample persons in New York and California, where households in New York City and the Los Angeles-Long Beach metropolitan area are selected with higher probability than those in the remainder of these two states. As the first step in the estimation procedure, raw counts from the sample housing units are multiplied by the baseweights. Every person in the same housing unit receives the same baseweight.

Effect of Sample Reductions on Basic Weights

As time goes on, the number of households and the population as a whole increase, and continued use of the original sampling interval leads to a larger sample size (with an increase in costs). The sampling interval is regularly adjusted (the probability of selection is reduced) in order to maintain a fairly constant sample size and cost (see Appendix C).

Special Weighting Adjustments

As discussed in Chapter 3, some ultimate sampling units (USUs) are subsampled in the field because their observed size is much larger than the expected four housing units. During the estimation procedure, housing units in these USUs must receive special weighting factors to account for the change in their probability of selection. For example, an area sample USU expected to have 4 housing units (HUs) but found at the time of interview to contain 36 HUs could be subsampled at the rate of 1 in 3 to reduce the interviewer's workload. Each of the 12 designated housing units in this case would be given a special weighting factor of 3. In order to limit the effect of this adjustment on the variance of sample estimates, these special weighting factors are limited to a maximum value of 4. At this stage of the CPS estimation process, the special weighting factors are multiplied by the baseweights. The resulting weights are then used to produce ''unbiased'' estimates. Although this estimate is commonly called ''unbiased,'' it still includes some negligible bias because the size of the special weighting factor is limited to 4.
The purpose of this limitation is to achieve a compromise between an increase in the bias and an increase in the variance.

ADJUSTMENT FOR NONRESPONSE

Nonresponse arises when households or other units of observation that have been selected for inclusion in a survey fail to yield all or some of the data that were to be collected. This failure to obtain complete results from all the units selected can arise from several different sources, depending upon the survey situation. There are two major types of nonresponse: item nonresponse and complete (or unit) nonresponse. Unit nonresponse refers to the failure to collect any survey data from an occupied sample unit. For example, data may not be obtained from an eligible household in the survey because of a respondent's absence, impassable roads, refusal to participate in the interview, or unavailability of the respondents for other reasons. This type of nonresponse in the CPS is called a Type A noninterview. Historically, between 4 and 5 percent of the eligible units in a given month were Type A noninterviews. Recently, the Type A rate has risen to between 6 and 7 percent (see Chapter 16). Item nonresponse occurs when a cooperating unit fails or refuses to provide some specific items of information. Procedures for dealing with this type of nonresponse are discussed in Chapter 9. In the CPS estimation process, the weights for all interviewed households are adjusted to account for occupied sample households for which no information was obtained because of unit nonresponse (Type A noninterviews). This noninterview adjustment is made separately for similar sample areas that are usually, but not necessarily, contained within the same state. Increasing the weights of interviewed sample units to account for eligible sample units that are not interviewed assumes that the interviewed units are similar to the noninterviewed units with regard to their demographic and socioeconomic characteristics. This may or may not be true.
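The basic mechanics of such an adjustment can be sketched as follows. This is a toy example with made-up weights for a single adjustment cell, not the actual CPS cell structure:

```python
# Minimal sketch of a noninterview weight adjustment within one cell:
# the weight of the noninterviewed workload is spread over the
# interviewed households. All weights here are illustrative.

interviewed = [120.0, 118.0, 122.0]  # weights of interviewed households
noninterviewed = [119.0, 121.0]      # weights of Type A noninterviews

Z = sum(interviewed)                 # weighted interviewed count
N = sum(noninterviewed)              # weighted noninterviewed count
factor = (Z + N) / Z                 # noninterview adjustment factor

adjusted = [w * factor for w in interviewed]
# The adjusted weights of the interviewed households now carry the
# full weighted workload of the cell (interviews plus noninterviews).
print(round(sum(adjusted), 1))  # 600.0
```

The adjustment preserves the cell's weighted total while assuming the interviewed households can stand in for the noninterviewed ones.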
Nonresponse bias results when the nonresponding units differ in relevant respects from those that respond to the survey or to the particular items.

Noninterview Clusters and Noninterview Adjustment Cells

To reduce the size of the bias, the noninterview adjustment is performed within sample PSUs that are similar in metropolitan statistical area (MSA) status and MSA size. These PSUs are grouped together to form noninterview clusters. In general, PSUs belonging to MSAs of the same (or similar) size in the same state belong to the same noninterview cluster. PSUs classified as MSA are assigned to MSA clusters. Likewise, non-MSA PSUs are assigned to non-MSA clusters. Within each cluster, there is a further breakdown into two noninterview adjustment cells (also called residence cells). Each MSA cluster is split into ''central city'' and ''not central city'' cells. Each non-MSA cluster is divided into ''urban'' and ''rural'' cells, making a total of 254 adjustment cells (or 127 noninterview clusters).

Computing Noninterview Adjustment Factors

Weighted counts of interviewed and noninterviewed households are tabulated separately for each noninterview adjustment cell. The basic weight multiplied by any special weighting factor is used as the weight for this purpose. The noninterview factor F_ij is computed as:

F_ij = (Z_ij + N_ij) / Z_ij

where

Z_ij = the weighted count of interviewed households in cell j of cluster i, and
N_ij = the weighted count of Type A noninterviewed households in cell j of cluster i.

These factors are applied to data for each interviewed person except in cells where either of the following situations occurs:

• The computed factor is greater than or equal to 2.0.
• There are fewer than 50 unweighted interviewed households in the cell.

If either of these situations occurs, the weighted counts are combined for the residence cells within the noninterview cluster. A common adjustment factor is computed and applied to weights for interviewed persons within the cluster. Generally, fewer than 10 noninterview clusters require this type of collapsing in a given month.

Weights After the Noninterview Adjustment

At the completion of the noninterview adjustment procedure, the weight for each interviewed person is:

(baseweight) x (special weighting factor) x (noninterview adjustment factor)

At this point, records for all individuals in the same household have the same weight, since the adjustments discussed so far depend only on household characteristics.

RATIO ESTIMATION

Distributions of demographic characteristics derived from the CPS sample in any month will be somewhat different from the true distributions, even for such basic characteristics as age, race, sex, and Hispanic1 ethnicity. These particular population characteristics are closely correlated with labor force status and other characteristics estimated from the sample. Therefore, the variance of sample estimates based on these characteristics can be reduced when, by the use of appropriate weighting adjustments, the sample population distribution is brought as closely into agreement as possible with the known distribution of the entire population with respect to these characteristics. This is accomplished by means of ratio adjustments. There are two ratio adjustments in the CPS estimation process: the first-stage ratio adjustment and the second-stage ratio adjustment. In the first-stage ratio adjustment, weights are adjusted so that the distribution of Black and non-Black census population from the sample PSUs in a state corresponds to the Black/non-Black population distribution from the census for all PSUs in the state. In the second-stage ratio adjustment, weights are adjusted so that aggregated CPS sample estimates match independent estimates of population in various age/sex/race and age/sex/ethnicity cells at the national level. Adjustments are also made so that the estimated state populations from CPS match independent state population estimates.

1 Hispanics may be of any race.

FIRST-STAGE RATIO ADJUSTMENT

Purpose of the First-Stage Ratio Adjustment

The purpose of the first-stage ratio adjustment is to reduce the contribution to the variance of sample state-level estimates arising from the sampling of PSUs. That is, the variance that would still be associated with the state-level estimates even if we included in the survey all households in every sample PSU. This is called the between-PSU variance. For some states, the between-PSU variance makes up a relatively large proportion of the total variance, while the relative contribution of the between-PSU variance at the national level is generally quite small. As can be seen in Table 14–3, the first-stage ratio adjustment causes a rise in the national relative variance factor, but the second-stage ratio adjustment decreases the relative variance factor below that of the noninterview adjustment. Further research into the effect of the combined first- and second-stage ratio adjustment is needed to determine whether the first-stage ratio adjustment is in fact meeting its purpose.

There are several factors to be considered in determining what information to use in applying the first-stage adjustment. The information must be available for each PSU, correlated with as many of the statistics of importance published from the CPS as possible, and reasonably stable over time so that the gain from the ratio adjustment procedure does not deteriorate. The basic labor force categories (unemployed, nonagricultural employed, etc.) could be considered. However, this information could badly fail the stability criterion.
The distribution of population by race (Black/non-Black) satisfies all three criteria. By using the Black/non-Black categories, the first-stage ratio adjustment compensates for the fact that the racial composition of an NSR (nonself-representing) sample PSU could differ substantially from the racial composition of the stratum it is representing. This adjustment is not necessary for SR (self-representing) PSUs since they represent only themselves.

Computing First-Stage Ratio Adjustment Factors

The first-stage adjustment factors are based on 1990 census data and are applied only to sample data for the NSR PSUs. Factors are computed for the two race categories (Black, non-Black) for each state containing NSR PSUs. The following formula is used to compute the first-stage adjustment factors for each state:

FS_sj = [ Σ(i=1,...,n) C_sij ] / [ Σ(k=1,...,m) (1/π_sk) C_skj ]

where

FS_sj = the first-stage factor for state s and race cell j (j = Black, non-Black),
C_sij = the 1990 16+ census population for NSR PSU i (sample or nonsample) in state s, race cell j,
C_skj = the 1990 16+ census population for NSR sample PSU k in state s, race cell j,
π_sk = the 1990 probability of selection for sample PSU k in state s,
n = the total number of NSR PSUs (sample and nonsample) in state s, and
m = the number of sample NSR PSUs in state s.

The estimate in the denominator of each of the ratios is obtained by multiplying the 1990 census population in the appropriate race cell for each NSR sample PSU by the inverse of the probability of selection for that PSU and summing over all NSR sample PSUs in the state. The Black and non-Black cells are collapsed within a state when a cell meets one of the following criteria:

• The factor (FS_sj) is greater than 1.3.
• The factor is less than 1/1.3 = .769230.
• There are fewer than 4 NSR sample PSUs in the state.
• There are fewer than ten expected interviews in a race cell in the state.

Race cells are collapsed within 23 states, resulting in first-stage factors of 1.0 for these states. Eight states are self-representing and have first-stage factors of 1.0 by definition. (See Table 10–1.)

Weights After First-Stage Ratio Adjustment

At the completion of the first-stage ratio adjustment, the weight for each responding person is the product of:

(baseweight) x (special weighting adjustment factor) x (noninterview adjustment factor) x (first-stage ratio adjustment factor).

The weight after the first-stage adjustment is called the first-stage weight. Records for all individuals in the same household should have the same first-stage weight, except for mixed-race households having some members who are Black and others who are non-Black.

Table 10–1: State First-Stage Factors for CPS 1990 Design (September - December 1995)

State                   Black        Non-Black
Alabama                 0.92986976   1.02360321
Alaska                  1.00000000   1.00000000**
Arizona                 1.00000000   1.00000000**
Arkansas                1.03854268   0.99625041
California              0.92824011   1.01550280
Colorado                1.00000000   1.00000000**
Connecticut             1.00000000   1.00000000*
Delaware                1.00000000   1.00000000*
District of Columbia    1.00000000   1.00000000*
Florida                 1.07779736   1.00025192
Georgia                 1.06981965   0.98237807
Hawaii                  1.00000000   1.00000000**
Idaho                   1.00000000   1.00000000**
Illinois                1.01947743   1.00254363
Indiana                 1.16715920   0.99747055
Iowa                    1.00000000   1.00000000**
Kansas                  1.00000000   1.00000000**
Kentucky                1.09352656   0.99897341
Louisiana               1.04956759   0.98344772
Maine                   1.00000000   1.00000000**
Maryland                1.00000000   1.00000000**
Massachusetts           1.00000000   1.00000000*
Michigan                0.92441097   0.99798724
Minnesota               1.00000000   1.00000000**
Mississippi             0.98243024   1.00997154
Missouri                1.00000000   1.00000000**
Montana                 1.00000000   1.00000000**
Nebraska                1.00000000   1.00000000**
Nevada                  1.00000000   1.00000000**
New Hampshire           1.00000000   1.00000000*
New Jersey              1.00000000   1.00000000*
New Mexico              1.00000000   1.00000000**
New York                0.83647167   1.00163779
North Carolina          1.07378643   0.98057928
North Dakota            1.00000000   1.00000000**
Ohio                    0.97362223   1.00000085
Oklahoma                1.01775196   1.00400963
Oregon                  1.00000000   1.00000000**
Pennsylvania            1.17560284   0.99367587
Rhode Island            1.00000000   1.00000000*
South Carolina          0.93971915   1.05454366
South Dakota            1.00000000   1.00000000**
Tennessee               1.08638935   0.98680199
Texas                   1.23277658   0.97475648
Utah                    1.00000000   1.00000000**
Vermont                 1.00000000   1.00000000*
Virginia                1.00000000   1.00000000**
Washington              1.00000000   1.00000000**
West Virginia           1.22587622   0.99959222
Wisconsin               1.00000000   1.00000000**
Wyoming                 1.00000000   1.00000000**

* These states contain only SR PSUs in the 1990 sample design and have an implied first-stage factor of 1.000000.
** Race cells were collapsed.

SECOND-STAGE RATIO ADJUSTMENT

The second-stage ratio adjustment decreases the error in the great majority of sample estimates. Chapter 14 illustrates the amount of reduction in variance for key labor force estimates.
The procedure is also believed to reduce the bias due to coverage errors (see Chapter 15). The procedure adjusts the weights for sample records within each rotation group to control the sample estimates for a number of geographic and demographic subgroups of the population, ensuring that these sample-based estimates of population match independent population controls in each of these categories. These independent population controls are updated each month. Three sets of controls are used:

• The civilian noninstitutional population 16 years of age and older for the 50 states and the District of Columbia.
• Total national civilian noninstitutional population for 14 Hispanic and 5 non-Hispanic age-sex categories (see Table 10–2).
• Total national civilian noninstitutional population for 66 white, 42 Black, and 10 ''Other'' age-sex categories (see Table 10–3).

The adjustment is done separately for each of the eight rotation groups that comprise a monthly sample. Adjusting the weights to match one set of controls can cause differences in other controls, so an iterative process is used to control all variables simultaneously. Successive iterations begin with the weights as adjusted by all previous iterations. A total of six iterations is performed, which results in virtual consistency between the sample estimates and population controls. The three-way (state, Hispanic/sex/age, race/sex/age) raking ratio estimator is also known as iterative proportional fitting. In addition to reducing the error in many CPS estimates and converging to the population controls within six iterations for most items, the raking ratio estimator has another desirable property. When it converges, this estimator minimizes the statistic

Σ_i W_2i ln(W_2i / W_1i),

where W_2i = the final weight for the ith sample record, and W_1i = the weight for the ith record after the first-stage adjustment.
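Iterative proportional fitting can be sketched on a toy two-margin table. The counts and controls below are made up, and the example rakes only two margins rather than the three used in production:

```python
# Toy sketch of raking ratio estimation (iterative proportional fitting)
# on a 2x2 table. The CPS rakes three margins (state, Hispanic/sex/age,
# race/sex/age) within each rotation group; this is a two-margin example.

weights = [[10.0, 20.0],        # sample-weighted counts: rows and columns
           [30.0, 40.0]]        # stand in for two demographic margins
row_controls = [40.0, 60.0]     # independent population controls for rows
col_controls = [35.0, 65.0]     # independent population controls for columns

for _ in range(6):              # six iterations, as in the CPS procedure
    # Scale each row to match its row control.
    for i in range(2):
        r = row_controls[i] / (weights[i][0] + weights[i][1])
        weights[i] = [w * r for w in weights[i]]
    # Scale each column to match its column control.
    for j in range(2):
        c = col_controls[j] / (weights[0][j] + weights[1][j])
        weights[0][j] *= c
        weights[1][j] *= c

print(round(weights[0][0] + weights[0][1], 2))  # 40.0 (row control met)
```

Each pass nudges the table toward one set of controls at a time; after a few iterations both margins are matched to within rounding error, which mirrors the "virtual consistency" reached after six CPS iterations.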
Thus, the raking ratio estimator adjusts the weights of the records so that the sample estimates converge to the population controls while, in some sense, minimally affecting the first-stage weights. Ireland and Kullback (1968) provide more details on the properties of raking ratio estimation.

Table 10–2. Initial Cells for Hispanic/Non-Hispanic by Age and Sex

Hispanic¹ ages (separate male and female cells, except as noted): 0-5, 6-13, 14*, 15*, 16-19, 20-29, 30-49, 50+
Non-Hispanic ages: 0-5*, 6-13*, 14*, 15*, 16+*

* No distinction is made between sexes.
¹ Hispanics may be of any race.

Table 10–3. Initial Cells for Black/White/Other by Age, Race, and Sex

Black ages (separate male and female cells): 0-1, 2-3, 4-5, 6-7, 8-9, 10-11, 12-13, 14, 15, 16-17, 18-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65+
White ages (separate male and female cells): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10-11, 12-13, 14, 15, 16, 17, 18, 19, 20-24, 25-26, 27-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-62, 63-64, 65-67, 68-69, 70-74, 75+
Other¹ ages (separate male and female cells, except as noted): 0-5, 6-13, 14*, 15*, 16-44, 45+

* No distinction is made between sexes.
¹ Other includes American Indian, Eskimo, Aleut, Asian, and Pacific Islander.

Sources of Independent Controls

The independent population controls used in the second-stage ratio adjustment are prepared by projecting forward the population figures derived from the 1990 decennial census, using information from a variety of other data sources that account for births, deaths, and net migration. Subtracting estimated numbers of resident Armed Forces personnel and institutionalized persons from the resident population gives the civilian noninstitutional population. Estimates of net census undercount, determined from the Post-Enumeration Survey, are added to the population projections. Note that, prepared in this manner, the controls are themselves estimates. However, they are derived independently of the CPS and provide useful information for adjusting the sample estimates. See Appendix D for more details on the sources and derivation of the independent controls.

Computing Initial Second-Stage Ratio Adjustment Factors

As mentioned before, the second-stage adjustment involves a three-way rake:

• State
• Hispanic/sex/age
• Race/sex/age

In order to prevent the second-stage adjustment from increasing the variance of the sample estimates, estimation cells that have zero estimates (i.e., no sample respondents) or extremely large or small adjustment factors are identified and combined (or collapsed) with others. No collapsing is done for the state rake. Prior to iteration 1 for the Hispanic/sex/age and race/sex/age rakes, initial adjustment factors are computed by rotation group for each of the cells shown in Tables 10–2 and 10–3. For any particular cell j and rotation group k, the initial factor is computed as

Fjk = Cj / Ejk

where

Cj = the population control for cell j (divided by 8, because the raking is done separately for each rotation group), and
Ejk = the CPS estimate for cell j and rotation group k after the first-stage ratio adjustment.

These initial factors are not used in the estimates; they are used only to determine whether any cells need to be collapsed. A cell is combined with adjacent (next higher or next lower) age cells in the same race/sex or ethnicity/sex category if:

• It contains no sample respondents (i.e., Ejk = 0),
• Its initial factor is less than or equal to 0.6, or
• Its initial factor is greater than or equal to 2.0.

Collapsing continues until none of the three criteria is met or all available cells have been collapsed. Once cells are collapsed, the cell definitions are maintained through all six iterations of the raking procedure. In a typical month, approximately 10 cells require collapsing.

Raking

Each iteration consists of three rakes in sequence: state, ethnicity (Hispanic/non-Hispanic), and race. For each rake, an adjustment factor is computed for each cell and applied to the estimate for that cell. The factor is the population control divided by the current estimate for the particular cell. The three steps are repeated through six iterations. The following simplified example begins after one state rake and shows the raking for two cells in an ethnicity rake and two cells in a race rake. Age/sex cells and one race cell (see Tables 10–2 and 10–3) have been collapsed here for simplification.
Example of Raking Ratio Adjustment: Raking Estimates by Ethnicity and Race

Es = estimate from CPS sample after the state rake
Ee = estimate from CPS sample after the ethnicity rake
Er = estimate from CPS sample after the race rake
Fe = ratio adjustment factor for ethnicity
Fr = ratio adjustment factor for race

Estimates after the state rake, with population controls:

                     Non-Black   Black   Population controls
Non-Hispanic            650       180          1050
Hispanic                150        20           250
Population controls    1000       300          1300

Iteration 1, ethnicity rake:

Non-Hispanic row: Fe = 1050/(650+180) = 1.265, so Ee = 650 x 1.265 = 822 (Non-Black) and Ee = 180 x 1.265 = 228 (Black).
Hispanic row: Fe = 250/(150+20) = 1.471, so Ee = 150 x 1.471 = 221 (Non-Black) and Ee = 20 x 1.471 = 29 (Black).

Iteration 1, race rake:

Non-Black column: Fr = 1000/(822+221) = 0.959, so Er = 822 x 0.959 = 788 (Non-Hispanic) and Er = 221 x 0.959 = 212 (Hispanic).
Black column: Fr = 300/(228+29) = 1.167, so Er = 228 x 1.167 = 266 (Non-Hispanic) and Er = 29 x 1.167 = 34 (Hispanic).

Iteration 2: repeat the steps above, beginning with the cell estimates at the end of iteration 1.
. . .
Iteration 6.

Note that matching the estimates to the controls for the race rake causes the cells to differ slightly from the controls for the ethnicity rake, or for whichever rake came before. With each rake, these differences decrease as the cells are matched to the controls for the most recent rake. For the most part, after six iterations the estimates for each cell have converged to the population controls. Thus, the weight for each record after the second-stage ratio adjustment procedure can be thought of as the weight for the record after the first-stage ratio adjustment multiplied by a series of 18 adjustment factors (six iterations of three rakes). The product of these 18 adjustment factors is called the second-stage ratio adjustment factor.
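The two-way raking step illustrated in the example can be sketched in a few lines of code. This is an illustrative sketch only, not the production CPS weighting system; the `rake` function and cell layout are hypothetical, but the numbers are those of the worked example (rows are ethnicity, columns are race).

```python
# Minimal sketch of raking ratio adjustment (iterative proportional
# fitting) on the 2x2 example: rows = (Non-Hispanic, Hispanic),
# columns = (Non-Black, Black). Illustrative only.

def rake(cells, row_controls, col_controls, iterations=6):
    """Each iteration rakes the rows to the row controls, then the
    columns to the column controls; the factor for a margin is the
    population control divided by the current estimate."""
    for _ in range(iterations):
        for r, control in enumerate(row_controls):       # ethnicity rake
            factor = control / sum(cells[r])
            cells[r] = [e * factor for e in cells[r]]
        for c, control in enumerate(col_controls):       # race rake
            factor = control / sum(row[c] for row in cells)
            for row in cells:
                row[c] *= factor
    return cells

# Estimates after the state rake, and the controls from the example:
cells = [[650.0, 180.0], [150.0, 20.0]]
raked = rake(cells, row_controls=[1050, 250], col_controls=[1000, 300])

# After six iterations both margins agree with the controls almost exactly.
print([round(sum(row)) for row in raked])                      # rows
print([round(sum(row[c] for row in raked)) for c in (0, 1)])   # columns
```

Running this reproduces the convergence the text describes: the row margins come out at (approximately) 1050 and 250 and the column margins at 1000 and 300.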
Weight After the Second-Stage Ratio Adjustment

At the completion of the second-stage ratio adjustment, the record for each person has a weight reflecting the product of:

(baseweight) x (special weighting adjustment factor) x (noninterview adjustment factor) x (first-stage ratio adjustment factor) x (second-stage ratio adjustment factor)

The weight after the second-stage ratio adjustment is also called the final weight. The estimates produced using the final weights are often called the first- and second-stage combined (FSC) estimates. The second-stage ratio adjustment factors also provide estimates of CPS coverage (see Chapter 16).

COMPOSITE ESTIMATOR²

Once each record has a final weight, an estimate of level for any given set of characteristics identifiable in the CPS can be computed by summing the final weights for all the sample cases that have that set of characteristics. The process for producing this type of estimate has been variously referred to as a Horvitz-Thompson estimator, a two-stage ratio estimator, or a simple weighted estimator. But the estimator actually used for the derivation of most official CPS labor force estimates that are based upon information collected every month from the full sample (in contrast to information collected in periodic supplements or from partial samples) is a composite estimator. Composite estimation for the CPS modifies the aggregated FSC estimates without adjusting the weights of individual sample records. This is a disadvantage for data users, since composite estimates for a particular month cannot be produced from a microdata file for only that month. However, research is being conducted into the possibility of using a composite estimator from which composite estimates could be produced from a single month's microdata file (Lent, Miller, and Cantwell, 1994).

In general, a composite estimate is a weighted average of several estimates. The composite estimate from the CPS has historically combined two estimates. The first of these is the FSC estimate described above. The second consists of the composite estimate for the preceding month plus an estimate of the change from the preceding to the current month. The estimate of change is based upon data from the part of the sample that is common to the two months (about 75 percent). The high month-to-month correlation between estimates from the same sample units tends to reduce the variance of the estimate of month-to-month change. Although the average improvements in variance from the use of the composite estimator are greatest for estimates of month-to-month change, improvements are also realized for estimates of change over other intervals of time and for estimates of levels in a given month (Breau and Ernst, 1983).

Prior to 1985, the two estimators described in the preceding paragraph were the only terms in the CPS composite estimator and were given equal weight. Since 1985, the weights for the two estimators have been unequal, and a third term has been included: an estimate of the net difference between the incoming and continuing parts of the current month's sample. The formula for the composite estimate of a labor force level Y′t is

Y′t = (1 − K)Ŷt + K(Y′t−1 + Δt) + Aβ̂t

where

Ŷt = Σ xt,i, summed over i = 1 to 8,

Δt = (4/3) Σ (xt,i − xt−1,i−1), summed over i in S,

β̂t = Σ xt,i, summed over i not in S, minus (1/3) Σ xt,i, summed over i in S,

i = 1, 2, ..., 8 (month in sample),

xt,i = the sum of weights, after the second-stage ratio adjustment, of respondents in month t and month-in-sample i with the characteristic of interest, and

S = {2, 3, 4, 6, 7, 8}.

² See Appendix I for update.
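The composite computation can be sketched numerically. The following is an illustrative sketch with K = 0.4 and A = 0.2; the rotation-group totals are made-up numbers, and the function and variable names are hypothetical. Here x[i] stands for xt,i, the sum of second-stage weights for month-in-sample i.

```python
# Illustrative sketch of the CPS composite estimator
# Y't = (1-K)*Yhat_t + K*(Y't-1 + Delta_t) + A*Beta_t, with K=0.4, A=0.2.
# Rotation-group totals below are hypothetical.

K, A = 0.4, 0.2
S = {2, 3, 4, 6, 7, 8}            # continuing months in sample

def composite(x_t, x_prev, y_prev):
    """x_t, x_prev: dicts mapping month-in-sample (1..8) to the weighted
    estimate for the current and prior month; y_prev is last month's
    composite estimate Y'_{t-1}."""
    y_hat = sum(x_t.values())                                  # FSC estimate
    delta = (4 / 3) * sum(x_t[i] - x_prev[i - 1] for i in S)   # common groups
    beta = sum(x_t[i] for i in (1, 5)) - (1 / 3) * sum(x_t[i] for i in S)
    return (1 - K) * y_hat + K * (y_prev + delta) + A * beta

# Hypothetical rotation-group estimates (thousands of persons):
x_prev = {i: 1000.0 for i in range(1, 9)}
x_t = {i: 1010.0 for i in range(1, 9)}
print(composite(x_t, x_prev, y_prev=8000.0))
```

With identical totals in every rotation group, the incoming/continuing difference β̂t is zero and the composite level equals the FSC estimate of about 8,080.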
In the formula above, Ŷt is the current month's FSC estimate, Δt is the estimate of change from the rotation groups common to months t and t−1, and β̂t is the estimate of the net difference between the incoming and continuing parts of the current month's sample. The third term, β̂t, was added primarily because it was found to keep composited and uncomposited estimates closer together while leaving intact the variance reduction advantages of the composite estimator. It is also often characterized as an adjustment term for the bias associated with rotation groups, or time in sample, in the CPS (see Chapter 15). Historically, the incoming parts of the sample (the rotation groups in their first or fifth month-in-sample) have tended to have higher labor force participation and unemployment rates than the continuing parts (Bailar, 1975). The third term offsets, to some degree, the effect of this bias on the estimate of change in the second term. If there were no such bias, the expected value of β̂t would be zero. The values of the constants K (K = 0.4) and A (A = 0.2) were chosen to be approximately optimal for reducing the variances of estimates of labor force characteristics. They reflect a compromise of optimal values across a variety of characteristics and types of estimates (Kostanich and Bettin, 1986). The compositing formula is applied only to estimated levels. Other types of composite estimates (rates, percents, means, medians) are computed from the component composited levels (e.g., an unemployment rate is computed by taking the composited unemployment level as a percentage of the composited labor force level).

SEASONAL ADJUSTMENT

A time series is a sequence of observations (measurements or estimates) of a particular measurable phenomenon over time. The use and interpretation of many time series, including many of the monthly labor force time series that are based on the CPS, are enhanced by the process of seasonal adjustment.
The objective of seasonal adjustment is to measure and remove from time series the effects of normal seasonality caused by such things as weather, holidays, and school schedules. Seasonality can account for much of the observed month-to-month change in estimates, such as those for employment and unemployment, and can obscure the underlying movements associated with long-term cycles and trends that are of great economic significance for most users. For example, the unadjusted CPS levels of employment and unemployment in June are consistently much higher than those for May because of the influx of students into the labor force. If the only change that occurred in the unadjusted estimates between May and June approximated the normal seasonal change, then the seasonally adjusted estimates for the 2 months should be about the same, indicating that essentially no change occurred in the underlying business cycle and trend even though there may have been a large change in the unadjusted data. Changes that do occur in the seasonally adjusted series reflect changes not associated with normal seasonal change and should provide information about the direction and magnitude of changes in the behavior of trend and business cycle effects. They may, however, also reflect the effects of sampling error and other irregularities, which are not removed by the seasonal adjustment process. Change in the seasonally adjusted series can and occasionally should be in a direction opposite the movement in the unadjusted series. Refinements of the methods used for seasonal adjustment have been under development for decades. The procedure used since 1980 for the seasonal adjustment of the labor force series is the X-11 ARIMA program, developed by Statistics Canada in the late 1970s as an extension and improvement of the widely used X-11 method developed at the U.S. Census Bureau in the 1960s. 
The X-11 approach to seasonal adjustment is univariate and nonparametric and involves the iterative application of a set of moving averages that can be summarized as one lengthy weighted average (Dagum, 1983). Nonlinearity is introduced by a set of rules and procedures for identifying and reducing the effect of ‘‘extreme values.’’ In most uses of X-11 seasonal adjustment, including that for CPS-based labor force series, the seasonality is estimated as evolving rather than fixed over time. A detailed description of the program is given in Scott and Smith (1974). The current official practice for the seasonal adjustment of the labor force series involves the running of all directly adjusted series through X-11 ARIMA twice each year, after receipt of June and December data, with 6 months of projected factors drawn from each run and used in the subsequent 6 months, and historical revisions drawn from the end-of-year run. This practice allows, among other things, the publication of the seasonal factors prior to their use. Seasonally adjusted estimates of many labor force series, including the levels of the civilian labor force, employment, and unemployment and all unemployment rates, are derived indirectly by arithmetically combining or aggregating the series directly adjusted with X-11 ARIMA. For example, the overall unemployment rate is computed using 12 directly adjusted components—unemployment, agricultural employment, and nonagricultural employment by sex and by age (16-19 and 20+). The principal reason for doing such indirect adjustment is that it ensures that the major seasonally adjusted totals will be arithmetically consistent with at least one set of components. If the totals were directly adjusted along with the components, such consistency would generally not occur because X-11 is not a sum- or ratio-preserving procedure. 
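The indirect (aggregative) adjustment just described can be illustrated with a toy computation. The component levels below are hypothetical, arranged as four sex/age cells per series to mirror the twelve components mentioned above; this sketch only shows the arithmetic of combining directly adjusted components, not the X-11 ARIMA adjustment itself.

```python
# Sketch of indirect seasonal adjustment: the aggregate unemployment rate
# is computed from directly adjusted component levels rather than being
# adjusted directly. All figures are hypothetical (thousands), with four
# sex/age cells per series (men 20+, women 20+, men 16-19, women 16-19).

unemployment = [2500.0, 2100.0, 900.0, 800.0]
agricultural_employment = [1800.0, 600.0, 200.0, 100.0]
nonagricultural_employment = [60000.0, 55000.0, 3500.0, 3200.0]

unemp = sum(unemployment)
employed = sum(agricultural_employment) + sum(nonagricultural_employment)
labor_force = unemp + employed

# The aggregate rate, as a percentage, is arithmetically consistent with
# its seasonally adjusted components by construction:
rate = 100 * unemp / labor_force
print(round(rate, 1))
```

Because the rate is formed from the adjusted components, the published unemployment level, employment level, and labor force total are guaranteed to be arithmetically consistent, which is the property the text identifies as the principal reason for indirect adjustment.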
It is not generally appropriate to apply factors computed for an aggregate series to the components of the aggregate, because the components tend to have significantly different patterns of seasonal variation. For up-to-date information and a more thorough discussion of the seasonal adjustment of the labor force series, see the January issues of Employment and Earnings (U.S. Department of Labor).

MONTHLY STATE AND SUBSTATE ESTIMATES

Employment and unemployment estimates for states and local areas are key indicators of local economic conditions. Under a Federal-state cooperative program, monthly estimates of the civilian labor force and unemployment are prepared for some 6,950 areas, including all states, metropolitan areas, counties, cities of 25,000 population or more, and all cities and towns in New England. The Bureau of Labor Statistics is responsible for the concepts, definitions, technical procedures, validation, and publication of the estimates, which are prepared by state employment security agencies. The state and area estimates are used by a wide variety of customers. Federal programs use the data as the basis for allocations to states and areas, as well as for eligibility determinations for assistance. State and local governments use the estimates for planning and budgetary purposes and to determine the need for local employment and training services. Private industry and individuals use the data to compare and assess labor market developments in states and substate areas. The underlying concepts and definitions of all labor force data developed for state and substate areas are consistent with those of the Current Population Survey (CPS). Annual average estimates for all states are derived directly from the CPS.
In addition, monthly estimates for 11 large states (California, Florida, Illinois, Massachusetts, Michigan, New Jersey, New York, North Carolina, Ohio, Pennsylvania, and Texas) and two substate areas (New York City and the Los Angeles-Long Beach metropolitan area) also come directly from the CPS; these areas have samples large enough to meet the BLS standard of reliability. For the remaining 39 states and the District of Columbia, where the CPS sample is not large enough to produce reliable monthly estimates, data are produced using signal-plus-noise time series models. These models combine current and historical data from the CPS, the Current Employment Statistics (CES) program, and state unemployment insurance (UI) systems to produce estimates that reflect each state's individual economy. Estimates for substate labor market areas (other than the two direct-use CPS areas) are produced through a building-block approach that uses data from several sources, including the CPS, CES, state unemployment insurance systems, and the decennial census, to create estimates that are then adjusted to the state model-based measures of employment and unemployment. Below the labor market area level, estimates are prepared for all counties and cities with populations of 25,000 or more using disaggregation techniques based on decennial census data, annual population estimates, and current UI statistics.

Estimates for States

The employment and unemployment estimates for 11 large states (California, Florida, Illinois, Massachusetts, Michigan, New Jersey, New York, North Carolina, Ohio, Pennsylvania, and Texas) are sufficiently reliable to be taken directly from the Current Population Survey (CPS) on a monthly basis. These states are termed ‘‘direct-use’’ states.
For the 39 smaller states and the District of Columbia (called ‘‘nondirect-use’’ states), models based on a signal-plus-noise approach (Bell and Hillmer, 1990; Tiller, 1992) are used to develop employment and unemployment estimates.³ The model of the signal is a time series model of the true labor force which consists of three components: a variable coefficient regression, a flexible trend, and a flexible seasonal component. The regression components are based on historical and current relationships found within each state's economy as reflected in the different sources of data that are available for each state—the CPS, the Current Employment Statistics (CES) survey, and the unemployment insurance (UI) system. The noise component of the models explicitly accounts for autocorrelation in the CPS sampling error and for changes in the average magnitude of that error. While all the state models have important components in common, they differ somewhat from one another to better reflect individual state labor force characteristics. See Appendix E for more details on the models. Two models—one for the employment-to-population ratio and one for the unemployment rate—are used for each state. The employment-to-population ratio, rather than the employment level, and the unemployment rate, rather than the unemployment level, are estimated primarily because the hyperparameters used in the state-space models are easier to estimate and compare across states for ratios than for levels. The employment-to-population ratio models estimate the monthly CPS employment-to-population ratio using the state's monthly CPS data and employment data from the CES. The models also include trend and seasonal components to account for seasonal patterns in the CPS not captured by the CES series.

³ Because of the January 1996 sample reduction, estimates for all states are based on a model.
The seasonal component accounts for the seasonality in the CPS not explained by the CES, while the trend component adjusts for long-run systematic differences between the two series (Evans, Tiller, and Zimmerman, 1993). The unemployment rate models estimate the monthly CPS unemployment rate using the state's monthly unemployment insurance claims data, along with trend and seasonal components (Zimmerman, Evans, and Tiller, 1993). Once the estimates are developed from the models, levels are calculated for employment, unemployment, and the labor force. State estimates are seasonally adjusted twice a year using the X-11 ARIMA seasonal adjustment procedure, and new seasonal adjustment factors are created. The entire series is used as input to the seasonal adjustment process; however, for programming purposes, only the last 5 years of data are replaced.

Benchmarking and Population Controls

Once each year, when new population controls are available from the Census Bureau, CPS labor force estimates for all states and the District of Columbia are adjusted to these controls. The model-based estimates for the 39 states and the District of Columbia are reestimated incorporating the recontrolled CPS labor force estimates (see above), and these monthly estimates are adjusted, or benchmarked, to the newly population-controlled annual average CPS estimates. The benchmarking technique uses a procedure called the Denton method (Denton, 1971), which adjusts the annual average of the model-based employment and unemployment levels to equal the corresponding CPS annual averages while preserving, as much as possible, the original monthly seasonal pattern of the model-based estimates. Substate estimates for previous years are also revised on an annual basis. The updates incorporate any changes in the inputs, such as revisions to establishment-based employment estimates, corrections in claims data, and updated historical relationships.
The revised estimates are then readjusted to the latest (benchmarked) state estimates of employment and unemployment.

PRODUCING OTHER LABOR FORCE ESTIMATES

In addition to the basic weighting to produce estimates for persons, several ‘‘special-purpose’’ weighting procedures are performed each month. These include:

• Weighting to produce estimates for households and families.
• Weighting to produce estimates from data based on only 2 of the 8 rotation groups (outgoing rotation weighting for the quarter sample data).
• Weighting to produce labor force estimates for veterans and nonveterans (veterans' weighting).
• Weighting to produce estimates from longitudinally linked files (longitudinal weighting).

Most of these special weights are based on the final weight after the second-stage ratio adjustment. Some also make use of composited estimates. In addition, consecutive monthly estimates are often averaged to produce quarterly or annual average estimates. Each of these procedures is described in more detail below.

Family Weight

Family weights are used to produce statistics on families and family composition. They also provide the basis for household weights. The family weight is derived from the final weight of the reference person in each household. In most households, it is exactly the reference person's weight. However, when the reference person is a married man, for purposes of family weights, he is given the same weight as his wife. This is done so that weighted tabulations of CPS data by sex and marital status show an equal number of married women and married men with their spouses present. If the CPS final weights were used for this tabulation (without any further adjustment), the estimated numbers of married women and married men would not be equal, since the second-stage ratio adjustment tends to increase the weights of males more than the weights of females.
The wife's weight is usually used as the family weight because CPS coverage ratios for women tend to be higher, and subject to less month-to-month variability, than those for men.

Household Weight

The same household weight is assigned to every person in the same household and is equal to the family weight of the household reference person. The household weight is used to produce estimates at the household level, such as the number of households headed by a female or the number of occupied housing units.

Outgoing Rotation Weights (Quarter Sample Data)

Some items in the CPS questionnaire are asked only in households due to rotate out of the sample, temporarily or permanently, after the current month. These are the households in the rotation groups in their fourth or eighth month-in-sample, sometimes referred to as the ‘‘outgoing’’ rotation groups. Items asked in the outgoing rotations include those on discouraged workers (through 1993), earnings (since 1979), union affiliation (since 1983), and the industry and occupation of second jobs of multiple jobholders (beginning in 1994). Because the data are collected from only one-fourth of the sample each month, these estimates are averaged over 3 months to improve their reliability and are published quarterly. Since 1979, most CPS files have included separate weights for the outgoing rotations. These weights were generally referred to as ‘‘earnings weights’’ on files through 1993 and are generally called ‘‘outgoing rotation weights’’ on files for 1994 and subsequent years. In addition to the ratio adjustment to independent population controls (in the second stage), these weights also reflect additional constraints that force them to sum to the composited estimates of employment, unemployment, and not in labor force each month. An individual's outgoing rotation weight is approximately four times his or her final weight.
To compute the outgoing rotation adjustment factors, the final CPS weights of the appropriate records in the two outgoing rotation groups are tallied. CPS composited estimates from the full sample for the labor force categories of employed wage and salary workers, other employed, unemployed, and not in labor force, by age, race, and sex, are used as the controls. The adjustment factor for a particular cell is the ratio of the control total to the weighted tally from the outgoing rotation groups. Collapsing is performed if:

• A cell has no sample respondents,
• A composited control total is less than or equal to zero, or
• The corresponding adjustment factor is less than or equal to 2.0 or greater than or equal to 8.0 (i.e., one-half or twice the normal adjustment factor of 4).

If a cell requires collapsing, it is collapsed with an adjacent cell (the next higher or lower age group) in the same sex, race, and labor force category. The adjustment factors are recomputed after all collapsing has been performed. The outgoing rotation weights are obtained by multiplying the outgoing ratio adjustment factors by the final CPS weights. For consistency, an outgoing rotation group weight equal to four times the basic CPS family weight is assigned to all persons in the two outgoing rotation groups who were not eligible for this special weighting (military personnel and persons aged 15 and younger). Production of monthly, quarterly, and annual estimates using the quarter sample data and the associated weights is completely parallel to the production of uncomposited, simple weighted estimates from the full sample—the weights are summed and divided by the number of months used. The composite estimator is not applicable for these estimates because there is no overlap between the quarter samples in consecutive months.
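The outgoing rotation factor and its collapsing test can be sketched as follows. This is an illustrative sketch only; the function names and the cell figures are hypothetical, but the rule mirrors the criteria above: the nominal factor is about 4 (the outgoing groups hold roughly a quarter of the sample), and a cell is collapsed when the factor strays to half or twice that value.

```python
# Sketch of the outgoing-rotation adjustment: the factor for a cell is the
# composited full-sample control divided by the weighted tally from the two
# outgoing rotation groups. Figures are hypothetical.

def outgoing_factor(composited_control, outgoing_tally):
    if outgoing_tally == 0:
        return None                      # no sample respondents in the cell
    return composited_control / outgoing_tally

def needs_collapsing(composited_control, outgoing_tally):
    """Collapse if the cell is empty, the control is nonpositive, or the
    factor is <= 2.0 or >= 8.0 (half or twice the nominal factor of 4)."""
    f = outgoing_factor(composited_control, outgoing_tally)
    return f is None or composited_control <= 0 or f <= 2.0 or f >= 8.0

# A cell whose outgoing groups hold about a quarter of its weight:
print(outgoing_factor(4000.0, 1020.0))    # about 3.92
print(needs_collapsing(4000.0, 1020.0))   # False
print(needs_collapsing(4000.0, 450.0))    # factor about 8.9 -> True
```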
Because the outgoing rotations are all independent samples within any consecutive 12-month period, averaging of these estimates on a quarterly and annual basis realizes relative reductions in variance greater than those achieved by averaging full sample estimates. Family Outgoing Rotation Weight The family outgoing rotation weight is analogous to the family weight computed for the full sample, except that outgoing rotation weights are used, rather than the final weights from the second-stage ratio adjustment. Veterans’ Weights Since 1986, CPS interviewers have collected data on veteran status from all respondents. Veterans’ weights are calculated for all CPS respondents based on their veteran status. This information is used to produce tabulations of employment status for veterans and nonveterans. The process begins with the final weights after the second-stage ratio adjustment. Each respondent is classified as a veteran or a nonveteran. Veterans’ records are classified into six cells: (1) male, pre-Vietnam veteran; (2) female, pre-Vietnam veteran; (3) male, Vietnam veteran; (4) female, Vietnam veteran; (5) male, other veteran; and (6) female, other veteran. The cell definitions change throughout the decade as the minimum age for nonpeacetime veterans increases. The final weights for CPS veterans are tallied into sex/age/type-of-veteran cells using the classifications described above. Separate ratio adjustment factors are computed for each cell, using independently established monthly counts of veterans provided by the Department of Veterans Affairs. The ratio adjustment factor is the ratio of the independent control total to the sample estimate. The final weight for each veteran is multiplied by the appropriate adjustment factor to produce the veteran’s weight. 
To compute veterans’ weights for nonveterans, a table of composited estimates is produced from the CPS data by sex, race (White/non-White), labor force status (unemployed, employed, and not in the labor force), and age. The veterans’ weights produced in the previous step are tallied into the same cells. The estimated number of veterans is then subtracted from the corresponding cell entry of the composited table to produce the nonveteran control totals. The final weights for CPS nonveterans are tallied into the same sex/race/labor force status/age cells. Separate ratio adjustment factors are computed for each cell, using the nonveteran controls derived above. The factor is the ratio of the nonveteran control total to the sample estimate. The final weight for each nonveteran is multiplied by the appropriate factor to produce the nonveteran’s weight. A table of labor force estimates by age for veterans and nonveterans is published each month. Longitudinal Weights For many years, the month-to-month overlap of 75 percent of the sample households has been used as the basis for estimating monthly ‘‘gross flow’’ statistics. The difference or change between consecutive months for any given level or ‘‘stock’’ estimate is an estimate of net change that reflects a combination of underlying flows in and out of the group represented. For example, the month-to-month change in the employment level is the number of people who went from not being employed in the first month to being employed in the second month minus the number who made the opposite transition. The gross flow statistics provide estimates of these underlying flows and can provide useful insights to analysts beyond those available in the stock data. The estimation of monthly gross flows, and any other longitudinal use of the CPS, begins with a longitudinal matching of the microdata (or person-level) records within the rotation groups common to the months of interest.
Each matched record brings together all the information collected in those months for a particular individual. The CPS matching procedure uses the household identifier and person line number as the keys for matching. Prior to 1994, it was also necessary to check other information and characteristics, such as age and sex, for consistency to verify that the match based on the keys was almost certainly a valid match. Beginning with 1994 data, the simple match on the keys provides an essentially certain match. Because the CPS does not follow movers (rather, the sample addresses remain in sample according to the rotation pattern), and because not all households are successfully interviewed every month they are in sample, it is not possible to match interview information for all persons in the common rotation groups across the months of interest. The highest percentage of matching success is generally achieved in the matching of consecutive months, where between 90 and 95 percent of the potentially matchable records (or about 67 to 71 percent of the full sample) can usually be matched. The use of CATI and CAPI since 1994 has also introduced dependent interviewing, which has eliminated much of the erratic difference in response between pairs of months. On most CPS files for 1994 forward, there is a longitudinal weight that allows users to estimate gross labor force flows by simply summing the longitudinal weights after matching. These longitudinal weights reflect the technique that had been used prior to 1994 to inflate the gross flow estimates to appropriate population levels. That technique simply inflates all estimates or final weights by the ratio of the current month’s population controls to the sum of the final weights for the current month in the matched cases, by sex.
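The matching and summing just described can be sketched as follows. This is an illustrative join on the (household identifier, person line number) keys, not the production matcher; the record fields and function name are hypothetical:

```python
def gross_flows(month1, month2):
    """Estimate gross labor force flows between two consecutive months.

    month1, month2: dicts mapping (household_id, line_number) to a record
    dict with 'lf_status'; month-2 records also carry 'long_weight', the
    longitudinal weight.
    Returns {(status_in_month1, status_in_month2): weighted count}."""
    flows = {}
    for key, rec1 in month1.items():
        rec2 = month2.get(key)   # match on the household id / line number keys
        if rec2 is None:         # mover or noninterview: no matched record
            continue
        cell = (rec1["lf_status"], rec2["lf_status"])
        flows[cell] = flows.get(cell, 0.0) + rec2["long_weight"]
    return flows
```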
Although the technique does provide estimates consistent with the population levels for the stock data in the current month, it does not force consistency with labor force stock levels in either the current or the previous month. Nor does it control for the effects of the bias and sample variation associated with the exclusion of movers, differential noninterview in the matched months, the potential compounding of classification errors in flow data, and the particular rotations that are common to the matched months. There have been a number of proposals for improving the estimation of gross labor force flows, but none have yet been adopted in official practice. See Proceedings of the Conference on Gross Flows in Labor Force Statistics (U.S. Department of Commerce and U.S. Department of Labor, 1985) for information on some of these proposals and for more complete information on gross labor force flow data and longitudinal uses of the CPS. Averaging Monthly Estimates CPS estimates are frequently averaged over a number of months. The most commonly computed averages are (1) quarterly, which provide four estimates per year by grouping the months of the calendar year in nonoverlapping intervals of three, and (2) annual, combining all 12 months of the calendar year. Quarterly and annual averages of uncomposited data can be computed by summing the weights for all of the months contributing to each average and dividing by the number of months involved. Averages for calculated cells, such as rates, percents, means, and medians, are computed from the averages for the component levels, not by averaging the monthly values (e.g., a quarterly average unemployment rate is computed by taking the quarterly average unemployment level as a percentage of the quarterly average labor force level, not by averaging the three monthly unemployment rates).
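The averaging rule above, that rates are formed from averaged levels rather than averaged rates, can be sketched directly (the function names and the level figures below are illustrative, not published estimates):

```python
def quarterly_average(levels):
    """Average a list of monthly level estimates (e.g., three months)."""
    return sum(levels) / len(levels)

def quarterly_unemployment_rate(unemployed, labor_force):
    """Quarterly rate, in percent, from averaged levels."""
    return 100.0 * quarterly_average(unemployed) / quarterly_average(labor_force)

# Made-up monthly levels, in thousands:
unemp = [6000.0, 6300.0, 6600.0]
lf = [140000.0, 141000.0, 142000.0]
rate = quarterly_unemployment_rate(unemp, lf)

# This generally differs (slightly) from averaging the monthly rates:
avg_of_rates = sum(100.0 * u / l for u, l in zip(unemp, lf)) / 3
```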
Although such averaging multiplies the number of interviews contributing to the resulting estimates by a factor approximately equal to the number of months involved in the average, the sampling variance for the average estimate is reduced by a factor substantially less than that number of months. This is primarily because the CPS rotation pattern and resulting month-to-month overlap in sample units ensure that estimates from the individual months are not independent. The reduction in sampling error associated with the averaging of CPS estimates over adjacent months was studied using 12 months of data collected beginning January 1987 (Fisher and McGuinness, 1993). That study showed that characteristics for which the month-to-month correlation is low, such as unemployment, are helped considerably by such averaging, while characteristics for which the correlation is high, such as employment, benefit less from averaging. For unemployment, variances of national estimates were reduced by about one-half for quarterly averages and about one-fifth for annual averages.

REFERENCES

Bailar, B. (1975), ‘‘The Effects of Rotation Group Bias on Estimates from Panel Surveys,’’ Journal of the American Statistical Association, Vol. 70, pp. 23-30.

Bell, W.R., and Hillmer, S.C. (1990), ‘‘The Time Series Approach to Estimation for Repeated Surveys,’’ Survey Methodology, 16, pp. 195-215.

Breau, P., and Ernst, L. (1983), ‘‘Alternative Estimators to the Current Composite Estimator,’’ Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 397-402.

Copeland, K.R., Peitzmeier, F.K., and Hoy, C.E. (1986), ‘‘An Alternative Method of Controlling Current Population Survey Estimates to Population Counts,’’ Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 332-339.

Dagum, E.B. (1983), The X-11 ARIMA Seasonal Adjustment Method, Statistics Canada, Catalog No. 12-564E.

Denton, F.T.
(1971), ‘‘Adjustment of Monthly or Quarterly Series to Annual Totals: An Approach Based on Quadratic Minimization,’’ Journal of the American Statistical Association, 64, pp. 99-102.

Evans, T.D., Tiller, R.B., and Zimmerman, T.S. (1993), ‘‘Time Series Models for State Labor Force Estimates,’’ Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 358-363.

Fisher, R., and McGuinness, R. (1993), ‘‘Correlations and Adjustment Factors for CPS (VAR80 1),’’ Internal Memorandum for Documentation, January 6, Demographic Statistical Methods Division, U.S. Census Bureau.

Hansen, M.H., Hurwitz, W.N., and Madow, W.G. (1953), Sample Survey Methods and Theory, Vol. II, New York: John Wiley and Sons.

Harvey, A.C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge: Cambridge University Press.

Ireland, C.T., and Kullback, S. (1968), ‘‘Contingency Tables With Given Marginals,’’ Biometrika, 55, pp. 179-187.

Kostanich, D., and Bettin, P. (1986), ‘‘Choosing a Composite Estimator for CPS,’’ prepared for presentation at the International Symposium on Panel Surveys, Washington, DC.

Lent, J., Miller, S., and Cantwell, P. (1994), ‘‘Composite Weights for the Current Population Survey,’’ Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 867-872.

Oh, H.L., and Scheuren, F. (1978), ‘‘Some Unresolved Application Issues in Raking Estimation,’’ Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 723-728.

Proceedings of the Conference on Gross Flows in Labor Force Statistics (1985), U.S. Department of Commerce and U.S. Department of Labor.

Scott, J.J., and Smith, T.M.F. (1974), ‘‘Analysis of Repeated Surveys Using Time Series Methods,’’ Journal of the American Statistical Association, 69, pp. 674-678.
Thompson, J.H. (1981), ‘‘Convergence Properties of the Iterative 1980 Census Estimator,’’ Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 182-185.

Tiller, R.B. (1989), ‘‘A Kalman Filter Approach to Labor Force Estimation Using Survey Data,’’ Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 16-25.

Tiller, R.B. (1992), ‘‘Time Series Modeling of Sample Survey Data From the U.S. Current Population Survey,’’ Journal of Official Statistics, 8, pp. 149-166.

U.S. Department of Labor, Bureau of Labor Statistics (published monthly), Employment and Earnings, Washington, DC: Government Printing Office.

Young, A.H. (1968), ‘‘Linear Approximations to the Census and BLS Seasonal Adjustment Methods,’’ Journal of the American Statistical Association, 63, pp. 445-471.

Zimmerman, T.S., Evans, T.D., and Tiller, R.B. (1994), ‘‘State Unemployment Rate Time Series,’’ Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 1077-1082.

Chapter 11. Current Population Survey Supplemental Inquiries

INTRODUCTION

In addition to providing data on the labor force status of the population, the Current Population Survey (CPS) is used to collect data for a variety of studies on the entire U.S. population and specific population subsets. Such studies keep the nation informed of the economic and social well-being of its people and are conducted for use by Federal and state agencies, private foundations, and other organizations. Supplemental inquiries take advantage of several special features of the CPS: large sample size and general-purpose design; highly skilled, experienced interviewing and field staff; and generalized processing systems that can easily accommodate the inclusion of additional questions. Some CPS supplemental inquiries are conducted annually, others every other year, and still others on a one-time basis.
The frequency and recurrence of a supplement depend on what best meets the needs of the supplement’s sponsor. It is important that any supplemental inquiry meet the strict criteria discussed in the next section. Producing supplemental data from the CPS involves more than just including additional questions. Separate data processing is required to edit responses for consistency and to impute missing values. A different weighting method is often necessary because the supplement targets a different universe from the basic CPS. A supplement can also engender a different level of response or cooperation from respondents.

CRITERIA FOR SUPPLEMENTAL INQUIRIES

As a basis for undertaking supplements for federal agencies or other sponsors, a number of criteria that determine the acceptability of such work have been developed and refined over the years. These criteria were developed by the Census Bureau in consultation with the Bureau of Labor Statistics (BLS). The staff of the Census Bureau, working with the sponsoring agency, develops the survey design, including the methodology, questionnaires, pretesting options, interviewer instructions, and processing requirements. The Census Bureau provides a written description of the statistical properties associated with each supplement. The same standards of quality that apply to the basic CPS apply to the supplements. The following criteria are considered before undertaking a supplement:

1. The subject matter of the inquiry must be in the public interest.

2. The inquiry must not have an adverse effect on the CPS or other Census Bureau programs. The questions must not cause respondents to question the importance of the survey and result in losses of response or quality. It is essential that the image of the Census Bureau as the objective fact finder for the Nation not be damaged.
Other important functions of the Census Bureau, such as the decennial censuses or the economic censuses, must not be affected in terms of quality or response rates or in congressional acceptance and approval of these programs.

3. The subject matter must be compatible with the basic CPS survey. This is to avoid introducing a concept that could affect the accuracy of responses to the basic CPS information. For example, a series of questions incorporating a revised labor force concept that could inadvertently affect responses to the standard labor force items would not be included.

4. The inquiry must not slow down the work of the basic survey or impose a response burden that may affect future participation in the basic CPS. In general, the supplemental inquiry must not add more than 10 minutes of interview time per respondent or 25 minutes per household. All conflicts or competition for the use of Census Bureau staff or facilities that arise in dealing with the supplemental inquiry are resolved by giving the basic CPS first priority. The Census Bureau will not jeopardize the schedule for completing the CPS or other Census Bureau work to favor completing a supplemental inquiry within some specified time frame.

5. The subject matter must not be overly sensitive. This criterion is imprecise, and its interpretation has changed over time. For example, the subject of birth expectations, once considered sensitive, has been included as a CPS supplemental inquiry.

6. It must be possible to meet the objectives of the inquiry through the survey method. That is, it must be possible to translate the survey objectives into meaningful questions, and the respondent must be able to supply the information required to answer the questions.

7. If the supplemental information is to be collected during the CPS interview, the inquiry must be suitable for the personal visit/telephone procedures used in the CPS.

8.
All data must be collected in accordance with the Census Bureau’s enabling legislation, which, in part, ensures that no information will be released that can identify an individual. Requests for name, address, social security number, or other information that can directly identify an individual will not be included. In addition, information that could be used to indirectly identify an individual with a high probability of success (e.g., small geographic areas in conjunction with income or age) will be suppressed.

9. The cost of supplements must be borne by the sponsor, regardless of the nature of the request or the relationship of the sponsor to the ongoing CPS.

The questionnaires developed for the supplement are subject to the Census Bureau’s pretesting policy. This policy was established in conjunction with other sponsoring agencies to encourage questionnaire research aimed at improving data quality. Even if a proposed inquiry is compatible with the criteria given in this section, the Census Bureau does not make the final decision regarding the appropriateness or desirability of the supplemental survey. The Office of Management and Budget, through its Statistical Policy Division, reviews the proposal to make certain it meets government-wide standards regarding the need for the data and the appropriateness of the design and ensures that the survey instruments, strategy, and response burden are acceptable.

RECENT SUPPLEMENTAL INQUIRIES

The scope and type of CPS supplemental inquiries vary considerably from month to month and from year to year. Generally, in any given month, a respondent who is selected for the supplement is asked the additional questions that comprise the supplemental inquiry after completing the regular part of the CPS. Table 11-1 summarizes CPS supplemental inquiries that were conducted between September 1994 and December 2001. The Housing Vacancy Supplement (HVS) is unusual in that it is the only supplement that is conducted every month.
This supplement collects additional information (e.g., number of rooms, plumbing, and rental/sales price) on housing units identified as vacant in the basic CPS. Probably the most widely used supplement is the Annual Demographic Survey (ADS), which is conducted every March. This supplement collects data on work experience, several sources of income, migration, household composition, health insurance coverage, and receipt of noncash benefits. The basic CPS weighting is not always appropriate for supplements, since supplements tend to have higher nonresponse rates. In addition, supplement universes are generally different from the basic CPS universe. Thus, some supplements require weighting procedures different from the basic CPS. These variations are described for two of the major supplements, the Housing Vacancy Survey and the Annual Demographic Survey, in the following sections.

Housing Vacancy Survey Supplement

Description of supplement

The HVS is a monthly supplement to the CPS sponsored by the Census Bureau. The supplement is administered when the CPS encounters a unit in sample that is intended for year-round or seasonal occupancy and is currently vacant or occupied by persons with a usual residence elsewhere. The interviewer asks a reliable respondent (e.g., the owner, a rental agent, or a knowledgeable neighbor) questions on year built; number of rooms, bedrooms, and bathrooms; how long the housing unit has been vacant; the vacancy status (for rent, for sale, etc.); and, when applicable, the selling price or rent amount. The purpose of the HVS is to provide current information on the rental and homeowner vacancy rates, home ownership rates, and characteristics of units available for occupancy in the United States as a whole, in geographic regions, and inside and outside metropolitan areas. The rental vacancy rate is a component of the index of leading economic indicators, which is used to gauge the current economic climate.
Although the survey is performed monthly, data for the nation and for the Northeast, South, Midwest, and West regions are released quarterly and annually. The data released annually include information for states and large metropolitan areas.

Calculation of vacancy rates

The HVS collects data on year-round and seasonal vacant units. Vacant year-round units are those intended for occupancy at any time of the year, even though they may not be in use year-round. In resort areas, a housing unit that is intended for occupancy on a year-round basis is considered a year-round unit; those intended for occupancy only during certain seasons of the year are considered seasonal. Also, vacant housing units held for occupancy by migratory workers employed in farm work during the crop season are classified as seasonal. The rental and homeowner vacancy rates are the most prominent HVS statistics. The vacancy rates are determined using information collected by both the HVS and the CPS, since the formulas use both vacant and occupied housing units. The rental vacancy rate is calculated as the ratio of vacant year-round units for rent to the sum of renter-occupied units, vacant year-round units rented but awaiting occupancy, and vacant year-round units for rent.

Table 11-1. Current Population Survey Supplements, September 1994 - December 2001
(Title | Month | Purpose | Sponsor)

Housing Vacancy | Monthly | Provide quarterly data on vacancy rates and characteristics of vacant units. | Census

Health/Pension | September 1994 | Provide information on health/pension coverage for persons 40 years of age and older. Information includes benefit coverage by former as well as current employer and reasons for noncoverage, as appropriate. Amount, cost, employer contribution, and duration of benefits are also measured. Periodicity: As requested. | PWBA

Lead Paint Hazards Awareness | December 1994, June 1997 | Provide information on the current awareness of the health hazards associated with lead-based paint. Periodicity: As requested. | HUD

Contingent Workers | February 1995, 1997, 1999, 2001 | Provide information on the type of employment arrangement workers have on their current job and other characteristics of the current job such as earnings, benefits, longevity, etc., along with their satisfaction with and expectations for their current jobs. Periodicity: Biennial. | BLS

Annual Demographic Supplement | March 1995-2001 | Data concerning work experience, several sources of income, migration, household composition, health insurance coverage, and receipt of noncash benefits. Periodicity: Annual. | Census/BLS

Food Security | April 1995, September 1996, April 1997, August 1998, April 1999, September 2000, April 2001, December 2001 | Data that will measure hunger and food security. It will provide data on food expenditures, access to food, and food quality and safety. | FNS

Race and Ethnicity | May 1995, July 2000 | Use alternative measurement methods to evaluate how best to collect these types of data. | BLS/Census

Marital History | June 1995 | Information from ever-married persons on marital history. Periodicity: Quinquennial. | Census/BLS

Fertility | June 1995, 1998, 2000 | Data on the number of children that women aged 15-44 have ever had and the children’s characteristics. Periodicity: Quinquennial. | Census/BLS

Educational Attainment | July 1995 | Tested several methods of collecting these data. Tested both the current method (highest grade completed or degree received) and the old method (highest grade attended and grade completed). | BLS/Census

Veterans | August 1995, August 1997, September 1999, August 2001 | Data for veterans of the United States on Vietnam-theatre status, service-connected income, effect of a service-connected disability on current labor force participation, and participation in veterans’ programs. Periodicity: Biennial. | BLS

School Enrollment | October 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001 | Provide information on school enrollment, junior or regular college attendance, and high school graduation. Periodicity: Annual. | BLS/Census/NCES

Tobacco Use | September 1995, January 1996, May 1996, September 1998, January 1999, May 1999, January 2000, May 2000, June 2001, November 2001 | Data for the 15+ population on current and former use of tobacco products; restrictions on smoking in the workplace for employed persons; and personal attitudes toward smoking. The 1998/1999 supplements repeat those of 1995/1996. Periodicity: As requested. | NCI

Displaced Workers | February 1996, 1998, 2000 | Provide data on workers who lost a job in the last 5 years because of plant closing, shift elimination, or other work-related reason. Periodicity: Biennial. | BLS

Job Tenure/Occupational Mobility | February 1996, 1998, 2000 | Collect data that will measure an individual’s tenure with his/her current employer and in his/her current occupation. Periodicity: As requested. | BLS

Child Support | April 1996, 1998, 2000 | Identify households with absent parents and provide data on child support arrangements, visitation rights of the absent parent, amount and frequency of actual versus awarded child support, and health insurance coverage. Data are also provided on why child support was not received and/or awarded. April data will be matched to March data. | OCSE

Voting and Registration | November 1994, 1996, 1998, 2000 | Provide demographic information on persons who registered and did not register to vote. Also measures the number of persons who voted and reasons for not registering. Periodicity: Biennial. | Census

Work Schedule/Home-Based Work | May 1997, 2001 | Provide information about multiple job holdings and work schedules and telecommuters who work at a specific remote site. | BLS

Telephone Availability | March, July, and November 1994-2001 | Collect data on whether there is a phone in the housing unit on which to contact the person and whether a telephone interview is acceptable. | FCC

Computer Use/Internet Use | November 1994, October 1997, August 2000, September 2001 | Obtain information about household access to computers and the use of the internet or worldwide web. | NTIA

The homeowner vacancy rate is calculated as the ratio of vacant year-round units for sale to the sum of owner-occupied units, vacant year-round units sold but awaiting occupancy, and vacant year-round units for sale.

Weighting procedure

Since the HVS universe differs from the CPS universe, the HVS records require a different weighting procedure from the CPS records. The HVS records are weighted by the CPS basic weight, the CPS special weighting factor, and two HVS adjustments. (Refer to Chapter 10 for a description of the two CPS weighting adjustments.) The two HVS adjustments are referred to as the HVS first-stage ratio adjustment and the HVS second-stage ratio adjustment. The HVS first-stage ratio adjustment is comparable to the CPS first-stage ratio adjustment in that it reduces the contribution to variance from the sampling of PSUs. The adjustment factors are based on 1990 census data. For each state, they are calculated as the ratio of the state-level census count of vacant year-round housing units in all NSR PSUs to the state-level estimate, using census counts, of vacant year-round housing units in NSR PSUs. The first-stage adjustment factors are applied to both vacant year-round and seasonal housing units in NSR PSUs. The HVS second-stage ratio adjustment, which applies to vacant year-round and seasonal housing units in SR and NSR PSUs, is calculated as the ratio of the weighted CPS interviewed housing units after the CPS second-stage ratio adjustment to the weighted CPS interviewed housing units after the CPS first-stage ratio adjustment.
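The rental and homeowner vacancy rate formulas given earlier can be written directly. A minimal sketch, where all inputs are weighted unit counts and the function and parameter names are illustrative:

```python
def rental_vacancy_rate(for_rent, renter_occupied, rented_awaiting):
    """Vacant year-round units for rent as a percentage of the rental
    inventory: renter-occupied units, units rented but awaiting occupancy,
    and units for rent."""
    return 100.0 * for_rent / (renter_occupied + rented_awaiting + for_rent)

def homeowner_vacancy_rate(for_sale, owner_occupied, sold_awaiting):
    """Vacant year-round units for sale as a percentage of the homeowner
    inventory: owner-occupied units, units sold but awaiting occupancy,
    and units for sale."""
    return 100.0 * for_sale / (owner_occupied + sold_awaiting + for_sale)
```

Note that, as described above, the vacant-unit counts would carry HVS final weights while the occupied-unit counts would carry CPS household weights.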
The cells for the HVS second-stage adjustment are calculated within each month-in-sample by census region and type of area (metropolitan/nonmetropolitan, central city/balance of MSA, and urban/rural). This adjustment is made to all eligible HVS records. The final weight for each HVS record is determined by calculating the product of the CPS basic weight, the CPS special weighting factor, the HVS first-stage ratio adjustment, and the HVS second-stage ratio adjustment. Note that the occupied units in the denominator of the vacancy rate formulas use a different final weight since the data come from the CPS. The final weight applied to the renter and owner-occupied units is the CPS household weight. (Refer to Chapter 10 for a description of the CPS household weight.) Annual Demographic Supplement (March Supplement) Description of supplement The ADS is sponsored by the Census Bureau and the Bureau of Labor Statistics (BLS). The Census Bureau has collected data in the ADS since 1947. From 1947 to 1955, the ADS took place in April and from 1956 to 2001 the ADS took place in March. In 2002, a sample increase was implemented requiring more time for data collection. Thus, additional ADS interviews are now taking place in February and April. However, even with this sample increase, most of the data collection still occurs in March. The supplement collects data on family characteristics, household composition, marital status, migration, income from all sources, information on weeks worked, time spent looking for work or on layoff from a job, occupation and industry classification of the job held longest during the year, health insurance coverage, and receipt of noncash benefits. A major reason for conducting the ADS around the month of March is to obtain better income data. 
It was thought that since March is the month before the deadline for filing federal income tax returns, respondents were likely to have recently prepared tax returns or to be in the midst of preparing them and could report their income more accurately than at any other time of the year. The ADS sample consists of the March CPS sample, plus additional CPS households identified in the prior November and following April CPS samples. Starting in 2002, the eligible ADS sample households are:

1. The entire March CPS sample.

2. Hispanic households, identified in November (from all month-in-sample (MIS) rotation groups) and in April (MIS 1, 5).

3. Non-Hispanic non-White households, identified in November (MIS 1, 5-8) and April (MIS 1, 5).

4. Non-Hispanic White households with children 18 years or younger, identified in November (MIS 1, 5-8) and April (MIS 1, 5).

Prior to 2002, only the November CPS households containing at least one person of Hispanic origin were added to the ADS. The added households in 2002, along with a general sample increase in selected states (see Appendix J), are collectively known as the State Children’s Health Insurance Program (SCHIP) sample expansion. The added households improve the reliability of the ADS estimates for Hispanic households, non-Hispanic non-White households, and non-Hispanic White households with children 18 years or younger. Because of the characteristics of CPS sample rotation (see Chapter 3), the additional sample households from the November and April CPS rotation groups are completely different from those in the March CPS. The additional sample cases increase the effective sample size of the ADS compared to the March CPS sample alone. The ADS sample includes 18 MIS rotation groups for Hispanic households, 15 MIS rotation groups for non-Hispanic non-White households, 15 MIS rotation groups for non-Hispanic White households with children 18 years or younger, and 8 MIS rotation groups for all other households. Table 11-2.
Summary of ADS Interview Month

November CPS, Hispanic¹ households (MIS 1-8): ADS interview in March.
November CPS, non-Hispanic³ households (MIS 1, 5): ADS interview in February; (MIS 6-8): ADS interview in February or April; (MIS 2-4): NI².
March CPS (MIS 1-8): ADS interview in March.
April CPS (MIS 1, 5), nonmover and mover households: ADS interview in April.

¹ Hispanics may be of any race.
² NI - Not interviewed for the ADS.
³ The non-Hispanic group includes both non-Hispanic non-Whites and non-Hispanic Whites with children ≤ 18 years old.

The March and April ADS-eligible cases are administered the ADS questionnaire in those respective months (see Table 11-2). The April cases are classified as ‘‘split path’’ cases because they receive the ADS, while other households receive the supplement scheduled for April. The November eligible Hispanic households are administered the ADS questionnaire in March, regardless of rotation group (MIS 1-8). (Note that November MIS 5-8 households have already completed all 8 months of interviewing for the CPS before March, and the November MIS 1-4 households have an extra contact scheduled for the ADS before the 5th interview of the CPS later in the year.) The November eligible non-Hispanic households (in MIS 1, 5-8) are administered the ADS questionnaire in either February or April. November ADS-eligible cases in MIS 1 and 5 are interviewed for the CPS in February (in MIS 4 and 8, respectively), so the ADS questionnaire is administered in February. (These are also split path cases, since other households in the rotation groups get the regular supplement scheduled for February.) The November MIS 6-8 eligible cases are split between the February and April CPS interviewing months. Mover households, defined as households with a different reference person than at the November CPS interview, are identified at the time of the ADS interview. Mover households identified from the November eligible sample are removed from the ADS sample.
Mover households identified in the March and April eligible samples receive the ADS questionnaire. The ADS sample universe is slightly different from that of the CPS: the CPS completely excludes military personnel, while the ADS includes military personnel who live in households with at least one civilian adult. These differences require the ADS to have a different weighting procedure from the regular CPS.

Weighting procedure

Prior to weighting, missing supplement items are assigned values based on hot deck imputation, a system that uses a statistical matching process. Values are imputed even when all the supplement data are missing. Thus, there is no separate adjustment for households that respond to the basic survey but not to the supplement. The ADS records are weighted by the CPS basic weight, the CPS special weighting factor, the CPS noninterview adjustment, the CPS first-stage ratio adjustment, and the CPS second-stage ratio adjustment procedure. (Refer to Chapter 10 for a description of these adjustments.) There is also an additional ADS noninterview adjustment for the November ADS sample, a SCHIP adjustment factor, a family equalization adjustment, and an adjustment for Armed Forces members. The November eligible sample, the March eligible sample, and the April eligible sample are weighted separately up to the second-stage weighting adjustment. The samples are then combined so that one second-stage adjustment procedure is performed. The flowchart in Figure 11-1 illustrates the weighting process for the ADS sample. Households from the November eligible sample. The households from the November eligible sample start with their basic CPS weight as calculated in November, modified by their November CPS special weighting factor and November CPS noninterview adjustment. At this point, a second noninterview adjustment is made for those November eligible households that are still occupied but for which an interview could not be obtained in the February, March, or April CPS.
Then, the November ADS sample weights are adjusted by the CPS first-stage adjustment ratio, and finally the weights are adjusted by the SCHIP adjustment factor. The ADS noninterview adjustment for the November eligible sample. The second noninterview adjustment is applied to the November eligible sample households to reflect noninterviews of occupied housing units that occur in the February, March, or April CPS. If a noninterviewed household is actually a Mover household, it would not be eligible for interview. Since the mover status of noninterviewed households is not known, we assume that the proportion of mover households is the same for interviewed and noninterviewed households. This is reflected in the noninterview adjustment. With this exception, the noninterview adjustment procedure is the same as described in Chapter 10. The weights of the interviewed households are adjusted by the noninterview factor as described below. At this point, the noninterviews and the November Mover households are removed from any further weighting of the ADS. The noninterview adjustment factor, Fij, is computed as follows:

    Fij = (Zij + Nij + Bij) / (Zij + Bij)

where:

    Zij = the weighted number of November eligible sample households interviewed in the February, March, or April CPS in cell j of cluster i;
    Nij = the weighted number of November eligible sample occupied, noninterviewed housing units in the February, March, or April CPS in cell j of cluster i;
    Bij = the weighted number of November eligible sample Mover households identified in the February, March, or April CPS in cell j of cluster i.

The weighted counts used in this formula are those after the November CPS noninterview adjustment is applied. The clusters refer to the variously defined regions that compose the United States. These include clusters for the Northeast, Midwest, South, and West, as well as clusters for particular cities or smaller areas. Within each of these clusters is a pair of residence cells.
These could be (1) Central City and balance of MSA, (2) MSA and non-MSA, or (3) Urban and Rural, depending on the type of cluster. SCHIP adjustment factor for the November eligible sample. The SCHIP adjustment factor is applied to November eligible Nonmover households that contain residents who are Hispanic, non-Hispanic non-White, or non-Hispanic White with children 18 years or younger to compensate for the increased sample in these demographic categories. Hispanic households receive a SCHIP adjustment factor of 8/18, and non-Hispanic non-White households and non-Hispanic White households with children 18 years or younger receive a SCHIP adjustment factor of 8/15. (See Table 11-3.) After this adjustment is applied, the November ADS sample is ready to be combined with the March and April eligible samples for the application of the second-stage ratio adjustment. Households from the April eligible sample. The households from the April eligible sample start with their basic CPS weight as calculated in April, modified by their April CPS special weighting factor, the April CPS noninterview adjustment, and the SCHIP adjustment factor. After the SCHIP adjustment factor is applied, the April eligible sample is ready to be combined with the November and March eligible samples for the application of the second-stage ratio adjustment. SCHIP adjustment factor for the April eligible sample. The SCHIP adjustment factor is applied to April eligible households that contain residents who are Hispanic, non-Hispanic non-White, or non-Hispanic White with children 18 years or younger, regardless of mover status, to compensate for the increased sample in these demographic categories. Hispanic households receive a SCHIP adjustment factor of 8/18, and non-Hispanic non-White households and non-Hispanic White households with children 18 years or younger receive a SCHIP adjustment factor of 8/15. Table 11-3 summarizes these weight adjustments. Households from the March eligible sample.
The March eligible sample households start with their basic CPS weight, modified by their CPS special weighting factor, the March CPS noninterview adjustment, the March CPS first-stage ratio adjustment (as described in Chapter 10), and the SCHIP adjustment factor. SCHIP adjustment factor for the March eligible sample. The SCHIP adjustment factor is applied to March eligible Nonmover households that contain residents who are Hispanic, non-Hispanic non-White, or non-Hispanic White with children 18 years or younger to compensate for the increased sample in these demographic categories. Hispanic households receive a SCHIP adjustment factor of 8/18, and non-Hispanic non-White households and non-Hispanic White households with children 18 years or younger receive a SCHIP adjustment factor of 8/15. Mover households and all other households receive a SCHIP adjustment factor of 1. Table 11-3 summarizes these weight adjustments. Combined household eligible sample from the November, March, and April CPS. At this point, the eligible samples from November, March, and April are combined. The remaining adjustments are applied to this combined sample file. ADS second-stage ratio adjustment. The second-stage ratio adjustment adjusts the ADS estimates so that they agree with independent age, sex, race, and ethnicity population controls, as described in Chapter 10. The same procedure used for the CPS is used for the ADS. Additional ADS weighting. After the ADS weight (the weight through the second-stage procedure) is determined, the next step is to determine the final ADS weight. Two more weighting adjustments are applied to the ADS sample cases. The first is family equalization; without this adjustment there would be more married men than married women. Weights, mostly of males, are adjusted to give a husband and wife the same weight, while maintaining the overall age/race/sex/Hispanic control totals. The second adjustment is for members of the Armed Forces.

Table 11-3. Summary of SCHIP Adjustment Factor

                                            Month in sample
CPS month/household type               1     2     3     4     5     6     7     8
November, Hispanic¹, Nonmover         8/18  8/18  8/18  8/18  8/18  8/18  8/18  8/18
November, non-Hispanic³, Nonmover     8/15  0²    0²    0²    8/15  8/15  8/15  8/15
November, Mover                       0²    0²    0²    0²    0²    0²    0²    0²
March, Hispanic¹, Nonmover            8/18  8/18  8/18  8/18  8/18  8/18  8/18  8/18
March, non-Hispanic³, Nonmover        8/15  8/15  8/15  8/15  8/15  8/15  8/15  8/15
March, Mover and all other households 1     1     1     1     1     1     1     1
April, Hispanic¹                      8/18  0²    0²    0²    8/18  0²    0²    0²
April, non-Hispanic³                  8/15  0²    0²    0²    8/15  0²    0²    0²

¹ Hispanics may be of any race.
² A zero weight indicates the cases are ineligible for the ADS.
³ The non-Hispanic group includes both non-Hispanic non-Whites and non-Hispanic Whites with children ≤ 18 years old.

Family equalization. The family equalization procedure categorizes adults (at least 15 years old) into seven groups based on sex and household composition:

1. Female partners in female/female unmarried partner households
2. All other civilian females
3. Married males, spouse present
4. Male partners in male/female unmarried partner households
5. Other civilian male heads
6. Male partners in male/male unmarried partner households
7. All other civilian males

Three different methods, dependent on the household composition, are used to assign the ADS weight to other members of the household. The methods are (1) assigning the weight of the householder to the spouse or partner, (2) averaging the weights of the householder and partner, or (3) computing a ratio adjustment factor and multiplying the factor by the ADS weight. Armed forces. Male and female members of the Armed Forces living off post or living with their families on post are included in the ADS as long as there is at least one civilian adult living in the same household, whereas the CPS excludes all Armed Forces members. Households with no civilian adults, i.e., households composed entirely of Armed Forces members, are excluded from the ADS.
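Taken together, the November-sample steps described above form a chain of multiplicative weight adjustments. The sketch below is illustrative only: the base weight, first-stage factor, and cell counts are hypothetical numbers, not CPS values.

```python
# Sketch of the November eligible sample weighting chain (hypothetical numbers).
from fractions import Fraction

def noninterview_factor(z_ij, n_ij, b_ij):
    """F_ij = (Z_ij + N_ij + B_ij) / (Z_ij + B_ij): Z = weighted interviewed
    households, N = weighted occupied noninterviews, B = weighted Movers,
    all in cell j of cluster i."""
    return (z_ij + n_ij + b_ij) / (z_ij + b_ij)

def schip_factor(domain):
    """SCHIP adjustment factor for a November Nonmover household."""
    return {
        "hispanic": Fraction(8, 18),                  # 18 rotation groups
        "non_hispanic_non_white": Fraction(8, 15),    # 15 rotation groups
        "non_hispanic_white_with_children": Fraction(8, 15),
    }[domain]

# A hypothetical Hispanic Nonmover household in a cell with weighted counts
# Z = 900, N = 60, B = 40 (after the November CPS noninterview adjustment):
weight = 1500.0                                   # basic weight x special factor
weight *= noninterview_factor(900.0, 60.0, 40.0)  # ADS noninterview adjustment
weight *= 1.02                                    # first-stage ratio (hypothetical)
weight *= schip_factor("hispanic")                # 8/18
# The household then enters the combined file for the second-stage adjustment.
```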
The weights assigned to Armed Forces members included in the ADS are the same weights civilians receive through the SCHIP adjustment. Control totals used in the second-stage factor do not include Armed Forces members, so Armed Forces members do not go through the second-stage ratio adjustment. During family equalization, a male Armed Forces member with a spouse or partner is reassigned the weight of his spouse/partner.

SUMMARY

Although this discussion focuses on only two CPS supplements, the HVS and the ADS, every supplement has its own unique objectives. The additional questions, edits, and imputations are tailored to each supplement’s data needs. For many supplements this also means altering the weighting procedure to reflect a different universe, account for a modified sample, or adjust for a higher rate of nonresponse. The weighting revisions discussed here for the HVS and ADS are only indicative of the types of modifications that might be used for a supplement.

Chapter 12. Data Products From the Current Population Survey

INTRODUCTION

Information collected in the Current Population Survey (CPS) is made available by both the Bureau of Labor Statistics and the Census Bureau through broad publication programs that include news releases, periodicals, and reports. CPS-based information is also available on magnetic tape, CD-ROM, and computer diskette and can be obtained online through the Internet. This chapter lists many of the different types of products currently available from the survey, describes the forms in which they are available, and indicates how they can be obtained. It is not intended to be an exhaustive reference for all information available from the CPS. Furthermore, given the rapid ongoing improvements in computer technology, it can be expected that greater numbers of CPS-based products will be electronically accessible in the future.
BUREAU OF LABOR STATISTICS

Each month, employment and unemployment data are published initially in The Employment Situation news release about 2 weeks after data collection is completed. The release includes a narrative summary and analysis of the major employment and unemployment developments together with tables containing statistics for the principal data series. The news release is also available electronically on the Internet and can be accessed at http://stats.bls.gov:80/newsrels.htm. Subsequently, more detailed statistics are published in Employment and Earnings, a monthly periodical. The detailed tables provide information on the labor force, employment, and unemployment by a number of characteristics, such as age, sex, race, marital status, industry, and occupation. Estimates of the labor force status and detailed characteristics of selected population groups not published on a monthly basis, such as Vietnam-era veterans and Hispanics,¹ are published every quarter. Data are also published quarterly on usual median weekly earnings classified by a variety of characteristics. In addition, the January issue of Employment and Earnings provides annual averages on employment and earnings by detailed occupational categories, union affiliation, and employee absences. About 25,000 of the monthly labor force data series plus quarterly and annual averages are maintained in LABSTAT, the BLS public database, on the Internet. They can be accessed from http://stats.bls.gov/datahome.htm. In most cases, these data are available from the inception of the series through the current month. Approximately 250 of the most important estimates from the CPS are presented monthly and quarterly on a seasonally adjusted basis. The CPS is also used for a program of special inquiries to obtain detailed information from particular segments or for particular characteristics of the population and labor force. About four such special surveys are made each year.

¹ Hispanics may be of any race.
The inquiries are repeated annually in the same month for some topics, including the earnings and total incomes of individuals and families (published by the Census Bureau); the extent of work experience of the population during the calendar year; the marital and family characteristics of workers; the employment of school-age youth, high school graduates and dropouts, and recent college graduates; and the educational attainment of workers. Surveys are also made periodically on subjects such as contingent workers, job tenure, displaced workers, and disabled veterans. Generally, the persons who provide information for the monthly CPS questions also answer the supplemental questions. Occasionally, the kind of information sought in the special surveys requires the respondent to be the person about whom the questions are asked. The results of these special surveys are first published as news releases and subsequently in the Monthly Labor Review or BLS reports. In addition to the regularly tabulated statistics described above, special data can be generated through the use of the CPS individual (micro) record files. These files contain records of the responses to the survey questionnaire for all individuals in the survey. While the microdata can be used simply to create additional cross-sectional detail, an important feature of their use is the ability to match the records of specific individuals at different points in time during their participation in the survey. (The actual identities of these individuals are protected on all versions of the files made available to noncensus staff.) By matching these records, data files can be created that lend themselves to some limited longitudinal analysis and the investigation of short-run labor market dynamics. An example is the statistics on gross labor force flows, which indicate how many persons move among the labor force status categories each month.
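As an illustration of the record-matching idea behind gross flow statistics, the sketch below matches person records across two adjacent months and tabulates status transitions (E = employed, U = unemployed, N = not in labor force). The identifiers and statuses are invented for the example.

```python
# Matching individuals' records across two months to tabulate gross labor
# force flows (hypothetical identifiers and statuses).
from collections import Counter

last_month = {"p01": "E", "p02": "U", "p03": "N", "p04": "E"}
this_month = {"p01": "E", "p02": "E", "p03": "U", "p05": "N"}

# Only persons present in both months can be matched; p04 and p05 drop out.
flows = Counter(
    (last_month[pid], this_month[pid])
    for pid in last_month.keys() & this_month.keys()
)
# flows[("U", "E")] counts moves from unemployment into employment, etc.
```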
Microdata files are available for all months since January 1976 and for various months in prior years. These data are made available on magnetic tape, CD-ROM, or diskette. Annual averages from the CPS for the four census regions and nine divisions, the 50 states and the District of Columbia, 50 large metropolitan areas, and 17 central cities are published annually in Geographic Profile of Employment and Unemployment. Data are provided on the employed and unemployed by selected demographic and economic characteristics. Table 12-1 provides a summary of the CPS data products available from BLS.

U.S. CENSUS BUREAU

The U.S. Census Bureau has been analyzing data from the Current Population Survey and reporting the results to the public for over five decades. The reports provide information on a recurring basis about a wide variety of social, demographic, and economic topics. In addition, special reports on many subjects have also been produced. Most of these reports have appeared in one of three series issued by the Census Bureau: P-20, Population Characteristics; P-23, Special Studies; and P-60, Consumer Income. Many of the reports are based on data collected as part of the March demographic supplement to the CPS. However, other reports use data from supplements collected in other months (as noted in the listing below). A full inventory of these reports, as well as other related products, is documented in Subject Index to Current Population Reports and Other Population Report Series, CPR P23-192, which is available from the Government Printing Office or the Census Bureau. Most reports have been issued in paper form; more recently, some have been made available on the Internet (http://www.census.gov). Generally, reports are announced by press release and are released to the public via the Census Bureau Public Information Office.

Census Bureau Report Series

P-20, Population Characteristics.
Regularly recurring reports in this series include topics such as geographic mobility, educational attainment, school enrollment (October supplement), marital status, households and families, Hispanic origin, the Black population, fertility (June supplement), voter registration and participation (November supplement), and the foreign-born population. P-23, Special Studies. Information pertaining to special topics, including one-time data collections, as well as research on methods and concepts, is produced in this series. Examples of topics include computer ownership and usage, child support and alimony, ancestry, language, and marriage and divorce trends. P-60, Consumer Income. Regularly recurring reports in this series include information concerning families, individuals, and households at various income and poverty levels, shown by a variety of demographic characteristics. Other reports focus on health insurance coverage and other noncash benefits. In addition to the population data routinely reported from the CPS, Housing Vacancy Survey (HVS) data are collected from a sample of vacant housing units in the CPS sample. Using these data, quarterly and annual statistics are produced on rental vacancy rates and home ownership rates for the United States, the four census regions, locations inside and outside metropolitan areas (MAs), the 50 states and the District of Columbia, and the 75 largest MAs. Information is also made available on national home ownership rates by age of householder, family type, race, and Hispanic origin. A press release is issued each quarter, and quarterly and annual data tables are posted on the Internet.

Supplement Data Files

Public use microdata files containing supplement data are available from the Census Bureau. These files contain the full battery of basic labor force and demographic data along with the supplement data.
A standard documentation package containing a record layout, a source and accuracy statement, and other relevant information is included with each file. (The actual identities of the individuals surveyed are protected on all versions of the files made available to noncensus staff.) These files can be purchased through the Customer Services Branch of the Census Bureau and are available in either tape or CD-ROM format. The CPS homepage is another source for obtaining these files. Eventually, we plan to add most historical files to the site along with all current and future files.

Table 12-1. Bureau of Labor Statistics Data Products From the Current Population Survey

Product | Description | Periodicity | Source | Cost

News Releases
College Enrollment and Work Activity of High School Graduates | An analysis of the college enrollment and work activity of the prior year’s high school graduates by a variety of characteristics | Annual | October CPS supplement | Free (1)
Contingent and Alternative Employment Arrangements | An analysis of workers with ‘‘contingent’’ employment arrangements (lasting less than 1 year) and alternative arrangements including temporary and contract employment by a variety of characteristics | Biennial | January CPS supplement | Free (1)
Displaced Workers | An analysis of workers who lost jobs in the prior 3 years due to plant or business closings, position abolishment, or other reasons by a variety of characteristics | Biennial | February CPS supplement | Free (1)
Employment Situation of Vietnam-Era Veterans | An analysis of the work activity and disability status of persons who served in the Armed Forces during the Vietnam era | Biennial | September CPS supplement | Free (1)
Job Tenure of American Workers | An analysis of employee tenure by industry and a variety of demographic characteristics | Biennial | February CPS supplement | Free (1)
State and Regional Unemployment | An analysis of state and regional employment and unemployment | Annual | CPS annual averages | Free (1)
The Employment Situation | Seasonally adjusted and unadjusted data on the Nation’s employed and unemployed workers by a variety of characteristics | Monthly (2) | Monthly CPS | Free (1)
Union Membership | An analysis of the union affiliation and earnings of the Nation’s employed workers by a variety of characteristics | Annual | Monthly CPS; outgoing rotation groups | Free (1)
Usual Weekly Earnings of Wage and Salary Workers | Median usual weekly earnings of full- and part-time wage and salary workers by a variety of characteristics | Quarterly (3) | Monthly CPS; outgoing rotation groups | Free (1)
Work Experience of the Population | An examination of the employment and unemployment experience of the population during the entire preceding calendar year by a variety of characteristics | Annual | March CPS supplement | Free (1)

Periodicals
Employment and Earnings | A monthly periodical providing data on employment, unemployment, hours, and earnings for the Nation, states, and metropolitan areas | Monthly (3) | CPS; other surveys and programs | $35.00 domestic; $43.75 foreign
Monthly Labor Review | A monthly periodical containing analytical articles on employment, unemployment, and other economic indicators, book reviews, and numerous tables of current labor statistics | Monthly | CPS; other surveys and programs | $29.00 domestic; $36.25 foreign

Other Publications
A Profile of the Working Poor | An annual report on workers whose families are in poverty by work experience and various characteristics | Annual | March CPS supplement | Free
Geographic Profile of Employment and Unemployment | An annual publication of employment and unemployment data for regions, states, and metropolitan areas by a variety of characteristics | Annual | CPS annual averages | $9.00
Issues in Labor Statistics | Brief analysis of important and timely labor market issues | Occasional | CPS; other surveys and programs | Free

Microdata Files (Product | Periodicity | Source | Cost)
Job Tenure and Occupational Mobility | Biennial | February CPS supplement | (4)
Displaced Workers | Biennial | February CPS supplement | (4)
Contingent Work | Biennial | January CPS supplement | (4)
Annual Demographic Survey | Annual | March CPS supplement | (4)
Veterans | Biennial | September CPS supplement | (4)
School Enrollment | Annual | October CPS supplement | (4)
Work Schedules/Home-Based Work | Occasional | May CPS supplement | (4)
Annual Earnings (outgoing rotation groups) | Annual | Annual CPS | (4)

Time Series (Macro) Files (Product | Periodicity | Source | Cost)
National Labor Force Data | Monthly | Labstat (5) | (1)
Regional Data | Monthly | Labstat (5) | (1)

Unpublished Tabulations (Product | Periodicity | Source | Cost)
National Labor Force Data | Monthly | Microfiche | Free
Regional, State, and Area Data | Monthly | Microfiche | Free

(1) Accessible from the Internet (http://stats.bls.gov:80/newsrels.htm).
(2) About 3 weeks following the period of reference.
(3) About 5 weeks after the period of reference.
(4) Diskettes ($80); cartridges ($165-$195); tapes ($215-$265); and CD-ROMs ($150).
(5) Electronic access via the Internet (http://stats.bls.gov).
Note: Prices noted above are subject to change.

Chapter 13. Overview of Data Quality Concepts

INTRODUCTION

It is far easier to put out a figure than to accompany it with a wise and reasoned account of its liability to systematic and fluctuating errors. Yet if the figure is … to serve as the basis of an important decision, the accompanying account may be more important than the figure itself. (John W. Tukey, 1949, p. 9)

The quality of any estimate based on sample survey data can and should be examined from two perspectives. The first is based on the mathematics of statistical science, and the second stems from the fact that survey measurement is a production process conducted by human beings.
From both perspectives, survey estimates are subject to error, and to avoid misusing or reading too much into the data, we should use them only after their potential error of both sorts has been examined relative to the particular use at hand. In this chapter, we give an overview of these two perspectives on data quality, discuss their relationship to each other from a conceptual viewpoint, and define a number of technical terms. The definitions and discussion are applicable to all sample surveys, not just the Current Population Survey (CPS). Succeeding chapters go into greater detail about the specifics as they relate to the CPS.

QUALITY MEASURES IN STATISTICAL SCIENCE

The statistical theory of finite population sampling is based on the concept of repeated sampling under fixed conditions. First, a particular method of selecting a sample and aggregating the data from the sample units into an estimate of the population parameter is specified. The method for sample selection is referred to as the sample design (or just the design). The procedure for producing the estimate is characterized by a mathematical function known as an estimator. After the design and estimator have been determined, a sample is selected and an estimate of the parameter is computed. The difference between the value of the estimate and the population parameter is referred to here as the sampling error, and it will vary from sample to sample (Särndal, Swensson, and Wretman, 1992, p. 16). Properties of the sample design-estimator methodology are determined by looking at the distribution of estimates that results from taking all possible samples that could be selected using the specified methodology. The expected (or mean) value of the squared sampling errors over all possible samples is known as the mean squared error. The mean squared error is generally accepted as the standard overall measure of the quality of a proposed design-estimator methodology.
The mean value of the individual estimates is referred to as the expected value of the estimator. The difference between the expected value of a particular estimator and the value of the population parameter is known as sampling bias. When the bias of the estimator is zero, the estimator is said to be unbiased. The mean value of the squared difference between the individual estimates and the expected value of the estimator is known as the sampling variance of the estimator. The variance measures the magnitude of the variation of the individual estimates about their expected value, while the mean squared error measures the magnitude of the variation of the estimates about the value of the population parameter of interest. It can be shown that the mean squared error can be decomposed into the sum of the variance and the square of the bias. Thus, for an unbiased estimator, the variance and the mean squared error are equal. Quality measures of a design-estimator methodology expressed in this way, that is, based on mathematical expectation assuming repeated sampling, are inherently grounded on the assumption that the process is correct and constant across sample repetitions. Unless the measurement process is uniform across sample repetitions, the mean squared error is not by itself a full measure of the quality of the survey results. The assumptions associated with being able to compute any mathematical expectation are extremely rigorous and rarely practical in the context of most surveys.
For example, the basic formulation for computing the true mean squared error requires that there be a perfect list of all units in the universe population of interest, that all units selected for a sample provide all the requested data, that every interviewer be a clone of an ideal interviewer who follows a predefined script exactly and interacts with all varieties of respondents in precisely the same way, and that all respondents comprehend the questions in the same way and have the same ability to recall from memory the specifics needed to answer the questions. Recognizing the practical limitations of these assumptions, sampling theorists continue to explore the implications of alternative assumptions which can be expressed in terms of mathematical models. Thus, the mathematical expression for variance has been decomposed in various ways to yield expressions for statistical properties that include not only sampling variance, but also simple response variance (a measure of the variability among the possible responses of a particular respondent over repeated administrations of the same question) (Hansen, Hurwitz, and Bershad, 1961) and correlated response variance, one form of which is interviewer variance (a measure of the variability among responses obtained by different interviewers over repeated administrations). Similarly, when a particular design-estimator fails over repeated sampling to include a particular set of population units in the sampling frame or to ensure that all units provide the required data, bias can be viewed as having components such as coverage bias, unit nonresponse bias, or item nonresponse bias (Groves, 1989). For example, a survey administered solely by telephone could result in coverage bias for estimates relating to the total population if the nontelephone households were different from the telephone households with respect to the characteristic being measured (which almost always occurs).
One common theme of these types of models is the decomposition of total mean squared error into one set of components resulting from the fact that estimates are based on a sample of units rather than the entire population (sampling error) and another set of components due to alternative specifications of procedures for conducting the sample survey (nonsampling error). (Since nonsampling error is defined negatively, it ends up being a catch-all term for all errors other than sampling error.) Conceptually, nonsampling error in the context of statistical science has both variance and bias components. However, when total mean squared error is decomposed mathematically to include a sampling error term and one or more other ‘‘nonsampling error’’ terms, it is often difficult to categorize such terms as either variance or bias. The term nonsampling error is used rather loosely in the survey literature to denote mean squared error, variance, or bias in the precise mathematical sense and to imply error in the more general sense of process mistakes (see next section). Some nonsampling error components which are conceptually known to exist have yet to be expressed in practical mathematical models. Two examples are the bias associated with the use of a particular set of interviewers and the variance associated with the selection of one of the numerous possible sets of questions. In addition, the estimation of many nonsampling errors, as well as sampling bias, is extremely expensive and difficult or even impossible in practice. The estimation of bias, for example, requires knowledge of the truth, which may sometimes be verifiable from records (e.g., number of hours paid for by the employer) but often is not verifiable (e.g., number of hours actually worked). As a consequence, survey organizations typically concentrate on estimating the one component of total mean squared error for which practical methods have been developed: variance.
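The decomposition stated earlier, mean squared error = variance + (bias)^2, can be checked with a small simulation over repeated samples. Everything below is invented for the demonstration: a synthetic population and a deliberately biased estimator of its mean.

```python
# Numerical check that MSE = variance + bias^2 over repeated samples,
# using a synthetic population and a deliberately biased estimator.
import random

random.seed(63)
population = [random.gauss(50.0, 10.0) for _ in range(100_000)]
true_value = sum(population) / len(population)

def biased_mean(sample):
    return sum(sample) / len(sample) + 1.0    # known bias of +1

estimates = [biased_mean(random.sample(population, 400)) for _ in range(2000)]
expected_value = sum(estimates) / len(estimates)

bias = expected_value - true_value
variance = sum((e - expected_value) ** 2 for e in estimates) / len(estimates)
mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
# mse agrees with variance + bias**2 (up to floating-point rounding)
```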
Since a survey is generally conducted only once with one specific sample of units, it is impossible to compute the actual sampling variance. In simple cases, it is frequently possible to construct an unbiased estimator of variance. In the case of complex surveys like the CPS, estimators have been developed which typically rely on the proposition — usually well-grounded — that the variability among estimates based on various subsamples of the one actual sample is a good proxy for the variability among all the possible samples like the one at hand. In the case of the CPS, 160 subsamples or replicates are used in variance estimation for the 1990 design. (For more specifics, see Chapter 14.) It is important to note that the estimates of variance resulting from the use of this and similar methods are not merely estimates of sampling variance. The variance estimates include the effects of some nonsampling errors, such as response variance and intra-interviewer correlation. On the other hand, users should be aware that for some statistics these estimates of standard error might be significant underestimates of total error, an important consideration when making inferences based on survey data. To draw conclusions from survey data, samplers rely on the theory of finite population sampling from a repeated sampling perspective: If the specified sample design-estimator methodology were implemented repeatedly and the sample size were sufficiently large, the probability distribution of the estimates would be very close to a normal distribution. Thus, one could safely expect 90 percent of the estimates to be within 1.6 standard errors of the mean of all possible sample estimates (the standard error is the square root of the estimate of variance) (Gonzalez et al., 1975; Moore, 1997). However, one cannot claim that the probability is .90 that the true population value falls in a particular interval.
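The repeated-sampling argument above can be illustrated with a small simulation. The sketch below is purely hypothetical (a made-up 0/1 population and simple random sampling rather than the CPS design): it draws many samples, treats the dispersion of the estimates as the standard error, and checks that roughly 90 percent of the estimates fall within 1.645 standard errors of the mean of all the estimates.

```python
# Hedged illustration of the repeated-sampling perspective; not CPS code.
import random
import statistics

random.seed(12345)
# Synthetic population: a 6-percent characteristic among 100,000 units.
population = [1] * 6_000 + [0] * 94_000

# Draw many simple random samples and compute the estimate from each.
estimates = [statistics.mean(random.sample(population, 1_000))
             for _ in range(2_000)]

center = statistics.mean(estimates)
se = statistics.pstdev(estimates)            # empirical standard error
coverage = sum(abs(e - center) <= 1.645 * se for e in estimates) / len(estimates)
print(round(coverage, 2))                    # close to 0.90
```

Because the sampling distribution of a proportion from samples of 1,000 is nearly normal, the empirical coverage lands near the nominal 90 percent.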
In the case of a biased estimator due to nonresponse, undercoverage, or other types of nonsampling error, confidence intervals may not cover the population parameter at the desired 90 percent rate. In such cases, a standard error estimator may indirectly account for some elements of nonsampling error in addition to sampling error and lead to confidence intervals having greater than the nominal 90 percent coverage. On the other hand, if the bias is substantial, confidence intervals can have less than the desired coverage.

QUALITY MEASURES IN STATISTICAL PROCESS MONITORING

The process of conducting a survey includes numerous steps or components, such as defining concepts, translating concepts into questions, selecting a sample of units from what may be an imperfect list of population units, hiring and training interviewers to ask persons in the sample unit the questions, coding responses to questions into predefined categories, and creating estimates which take into account the fact that not everyone in the population of interest had a chance to be in the sample and not all of those in the sample elected to provide responses. It is a process where the possibility exists at each step of making a mistake in process specification and of deviating during implementation from the predefined specifications. For example, we now recognize that the initial labor force question used in the CPS for many years (‘‘What were you doing most of last week--...’’) was problematic to many respondents (see Chapter 6). Moreover, many interviewers tailored their presentation of the question to particular respondents, for example, saying ‘‘What were you doing most of last week—working, going to school, etc.?’’ if the respondent was of school age. Having a problematic question is a mistake in process specification; varying question wording in a way not prespecified is a mistake in process implementation.
Errors or mistakes in process contribute to nonsampling error in that they would contaminate results even if the whole population were surveyed. Parts of the overall survey process which are known to be prone to deviations from the prescribed process specifications, and thus could be potential sources of nonsampling error in the CPS, are discussed in Chapter 15, along with the procedures put in place to limit their occurrence. A variety of quality measures have been developed to describe what happens during the survey process. These measures are vital to managers and staff working on a survey because they reflect process quality, but they can also be useful to users of the various products of the survey process — both individual responses and their aggregations into statistics — since they can aid in determining a particular product’s potential limitations and whether it is appropriate for the task at hand. Chapter 16 contains a discussion of quality indicators and, in a few cases, their potential relationship to nonsampling errors.

SUMMARY

The quality of estimates made from any survey, including the CPS, is a function of innumerable decisions made by designers and implementers. As a general rule of thumb, designers make decisions aimed at minimizing mean squared error within given cost constraints. Practically speaking, statisticians are often compelled to make decisions on sample designs and estimators based on variance alone; however, in the case of the CPS, the availability of external population estimates and data on rotation group bias makes it possible to do more than that. Designers of questions and data collection procedures tend to focus on limiting bias and assume the specification of exact question wording and ordering will naturally limit the introduction of variance. Whatever the theoretical focus of the designers, the accomplishment of the goal is heavily dependent upon those responsible for implementing the design.
Implementers of specified survey procedures, like interviewers and respondents, are presumably concentrating on doing the best job they know how to do given time and knowledge constraints. Consequently, process monitoring through quality indicators, such as coverage and response rates, is necessary to determine when additional training or revisions in process specification are needed. Continuing process improvement must be a vital component of survey management if the quality goals set by designers are to be achieved.

REFERENCES

Gonzalez, M. E., Ogus, J. L., Shapiro, G., and Tepping, B. J. (1975), ‘‘Standards for Discussion and Presentation of Errors in Survey and Census Data,’’ Journal of the American Statistical Association, 70, No. 351, Part II, 5-23.

Groves, R. M. (1989), Survey Errors and Survey Costs, New York: John Wiley & Sons.

Hansen, M. H., Hurwitz, W. N., and Bershad, M. A. (1961), ‘‘Measurement Errors in Censuses and Surveys,’’ Bulletin of the International Statistical Institute, 38(2), pp. 359-374.

Moore, D. S. (1997), Statistics: Concepts and Controversies, 4th Edition, New York: W. H. Freeman.

Särndal, C., Swensson, B., and Wretman, J. (1992), Model Assisted Survey Sampling, New York: Springer-Verlag.

Tukey, J. W. (1949), ‘‘Memorandum on Statistics in the Federal Government,’’ American Statistician, 3, No. 1, pp. 6-17; No. 2, pp. 12-16.

Chapter 14. Estimation of Variance

INTRODUCTION

The following two objectives are considered in estimating variances of the major statistics of interest for the Current Population Survey (CPS):

1. Estimate the variance of the survey estimates for use in various statistical analyses.

2. Evaluate the effect of each of the stages of sampling and estimation on the overall precision of the survey estimates for the evaluation of the survey design.
CPS variance estimates take into account the magnitude of the sampling error as well as the effects of some nonsampling errors, such as response variance and intra-interviewer correlation. Certain aspects of the CPS sample design, such as the use of one sample PSU per nonself-representing stratum and the use of systematic sampling within PSUs, make it impossible to obtain a completely unbiased estimate of the total variance. The use of ratio adjustments in the estimation procedure also contributes to this problem. Although imperfect, the current variance estimation procedure is accurate enough for all practical uses of the data, and it reflects the effects of the separate stages of sample selection and estimation on the total variance. Variance estimates of selected characteristics, and tables which also show the effects of estimation steps on variances, are presented at the end of this chapter.

VARIANCE ESTIMATES BY THE REPLICATION METHOD

Replication methods are able to provide satisfactory estimates of variance for a wide variety of designs using probability sampling, even when complex estimation procedures are used. These methods require that the sample selection, the collection of data, and the estimation procedures be independently carried through (replicated) several times. The dispersion of the resulting estimates can be used to measure the variance of the full sample.

The Method

Obviously, one would not consider repeating the entire Current Population Survey several times each month simply to obtain variance estimates. A practical alternative is to draw a set of random subsamples from the full sample surveyed each month, using the same principles of selection as are used for the full sample, and to apply the regular CPS estimation procedures to these subsamples. We refer to these subsamples as replicates.
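The idea of measuring variance from the dispersion of subsample estimates can be sketched with the classical random group method, which is a simpler relative of the half-sample and successive difference estimators described later. Everything below is hypothetical (made-up responses, simple random grouping), not the CPS procedure.

```python
# Hedged toy of the replication idea: split one sample into G random groups,
# estimate the mean from each group, and use the dispersion of those
# estimates to estimate the variance of the full-sample mean.
import random
import statistics

random.seed(7)
sample = [random.gauss(40.0, 12.0) for _ in range(2_000)]   # hypothetical responses

G = 20
random.shuffle(sample)
groups = [sample[g::G] for g in range(G)]                   # 20 groups of 100
group_means = [statistics.mean(g) for g in groups]

theta = statistics.mean(group_means)                        # full-sample estimate
# Random group variance estimator: dispersion of the group estimates.
var_hat = sum((m - theta) ** 2 for m in group_means) / (G * (G - 1))
print(round(theta, 2), round(var_hat, 4))
```

With equal-sized groups, `theta` equals the full-sample mean, and `var_hat` is centered on the true variance of that mean (here 12²/2000 = 0.072), though it is itself a noisy estimate.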
The number of replicates to use is based on a trade-off between cost and the reliability of the variance estimator; that is, increasing the number of replicates decreases the variance of the variance estimator. Prior to the design introduced after the 1970 census, variance estimates were computed using 40 replicates. The replicates were subjected to only the second-stage ratio adjustment for the same age-sex-race categories used for the full sample at the time. The noninterview and first-stage ratio adjustments were not replicated. Even with these simplifications, limited computer capacity allowed for the computation of variances for only 14 characteristics. For the 1970 design, an adaptation of the Keyfitz method of calculating variances was used. These variance estimates were derived using the Taylor approximation, dropping terms with derivatives higher than the first. By 1980, improvements in computer memory capacity allowed for the calculation of variance estimates for many characteristics with replication of all stages of the weighting through compositing. Note that the seasonal adjustment has not been replicated. A study from an earlier design indicated that the seasonal adjustment of CPS estimates had relatively little impact on the variances; however, it is not known what impact this adjustment would have on the current design variances. Starting with the 1980 design, variances were computed using a modified balanced half-sample approach. The sample was divided to form 48 replicates that retained all the features of the sample design, for example, the stratification and the within-PSU sample selection. For total variance, a pseudo-first-stage design was imposed on the CPS by dividing large self-representing (SR) PSUs into smaller areas called Standard Error Computation Units (SECUs) and combining small nonself-representing (NSR) PSUs into paired strata or pseudostrata. One NSR PSU was selected randomly from each pseudostratum for each replicate. 
Forming these pseudostrata was necessary since the first stage of the sample design has only one NSR PSU per stratum in sample. However, pairing the original strata for variance estimation purposes creates an upward bias in the variance estimator. For self-representing PSUs, each SECU was divided into two panels, and one panel was selected for each replicate. One column of a 48 by 48 Hadamard orthogonal matrix was assigned to each SECU or pseudostratum. The unbiased weights were multiplied by replicate factors of 1.5 for the selected panel and 0.5 for the other panel in the SR SECU or NSR pseudostratum (Fay, Dippo, and Morganstein, 1984). Thus the full sample was included in each replicate, but differing weights for the half-samples were determined by the matrix. These 48 replicates were processed through all stages of the CPS weighting through compositing. The estimated variance for the characteristic of interest was computed by summing a squared difference between each replicate estimate (Ŷr) and the full sample estimate (Ŷ0). The complete formula¹ is

Var(Ŷ0) = (4/48) Σ_{r=1}^{48} (Ŷr − Ŷ0)².

Due to costs and computer limitations, variance estimates were calculated for only 13 months (January 1987 through January 1988) and for about 600 estimates at the national level. Replication estimates of variances at the subnational level were not reliable because of the small number of SECUs available (Lent, 1991). Based on the 13 months of variance estimates, generalized sampling errors (explained later) were calculated. (See Wolter 1985 or Fay 1984, 1989 for more details on half-sample replication for variance estimation.)

METHOD FOR ESTIMATING VARIANCE FOR 1990 DESIGN

The general goal of the current variance estimation methodology, the method in use since July 1995, is to produce consistent variances and covariances for each month over the entire life of the design.
Periodic maintenance reductions in the sample size and the continuous addition of new construction to the sample complicated the strategy needed to achieve this goal. However, research has shown that variance estimates are not adversely affected as long as the cumulative effect of the reductions is less than 20 percent of the original sample size (Kostanich, 1996). Assigning all future new construction sample to replicates when the variance subsamples are originally defined provides the basis for consistency over time in the variance estimates. The current approach to estimating the 1990 design variances is called successive difference replication. The theoretical basis for the successive difference method was discussed by Wolter (1984) and extended by Fay and Train (1995) to produce the successive difference replication method used for the CPS. The following is a description of the application of this method. Successive USUs² (ultimate sampling units) formed from adjacent hit strings (see Chapter 3) are paired in the order of their selection to take advantage of the systematic nature of the CPS within-PSU sampling scheme. Each USU usually occurs in two consecutive pairs; for example, (USU1, USU2), (USU2, USU3), (USU3, USU4), etc. A pair then is similar to a SECU in the 1980 design variance methodology. For each USU within a PSU, two pairs (or SECUs) of neighboring USUs are defined based on the order of selection — one with the USU selected before and one with the USU selected after it. This procedure allows USUs adjacent in the sort order to be assigned to the same SECU, thus better reflecting the systematic sampling in our variance estimator. Also, the large increase in the number of SECUs and in the number of replicates (160 vs. 48) over the 1980 design increases the precision of the variance estimator.
Replicate Factors for Total Variance

Total variance is composed of two types of variance: the variance due to sampling of housing units within PSUs (within-PSU variance) and the variance due to the selection of a subset of all NSR PSUs (between-PSU variance). Replicate factors are calculated using a 160 by 160³ Hadamard orthogonal matrix. To produce estimates of total variance, replicates are formed differently for SR and NSR sample. Note that between-PSU variance cannot be estimated directly using this methodology; rather, it is the difference between the estimates of total variance and within-PSU variance. NSR strata are combined into pseudostrata within each state, and one NSR PSU from the pseudostratum is randomly assigned to each panel of the replicate, as in the 1980 design variance methodology. Replicate factors of 1.5 or 0.5 adjust the weights for the NSR panels. These factors are assigned based on a single row from the Hadamard matrix and are further adjusted to account for the unequal sizes of the original strata within the pseudostratum (Wolter, 1985). In most cases these pseudostrata consist of a pair of strata, except where an odd number of strata within a state requires that a triplet be formed. In this case, two rows from the Hadamard matrix are assigned to the pseudostratum, resulting in replicate factors of about 0.5, 1.7, and 0.8; or 1.5, 0.3, and 1.2 for the three PSUs. All USUs in a pseudostratum are assigned the same row number(s).

For SR sample, two rows of the Hadamard matrix are assigned to each pair of USUs, creating replicate factors f_{i,r}, for r = 1, ..., 160:

f_{i,r} = 1 + 2^(−3/2) a_{i+1,r} − 2^(−3/2) a_{i+2,r}

where a_{i,r} equals a number in the Hadamard matrix (+1 or −1) for the ith USU in the systematic sample. This formula yields replicate factors of approximately 1.7, 1.0, or 0.3.

As in the 1980 methodology, the unbiased weights (baseweight x special weighting factor) are multiplied by the replicate factors to produce unbiased replicate weights. These unbiased replicate weights are further adjusted through the noninterview adjustment, the first-stage ratio adjustment, the second-stage ratio adjustment, and compositing, just as the full sample is weighted. A variance estimator for the characteristic of interest is a sum of squared differences between each replicate estimate (Ŷr) and the full sample estimate (Ŷ0). The formula is

Var(Ŷ0) = (4/160) Σ_{r=1}^{160} (Ŷr − Ŷ0)².

Note that the replicate factors 1.7, 1.0, and 0.3 for the self-representing portion of the sample were specifically constructed to yield the ‘‘4’’ in the above formula, in order that the formula remains consistent between SR and NSR areas (Fay and Train, 1995).

Replicate Factors for Within-PSU Variance

The above variance estimator can also be used for within-PSU variance. The same replicate factors used for total variance are applied to SR sample. For NSR sample, alternate row assignments are made for USUs to form pairs of USUs in the same manner that was used for the SR assignments. Thus, for within-PSU variance, all USUs (both SR and NSR) have replicate factors of approximately 1.7, 1.0, or 0.3. The successive difference replication method is used to calculate total national variances and within-PSU variances for some states and metropolitan areas. Improved reliability for within-PSU subnational variance estimates is expected due to the increased number of SECUs over the 1980 design.

¹ Usually, balanced half-sample replication uses replicate factors of 2 and 0 with the formula Var(Ŷ0) = (1/k) Σ_{r=1}^{k} (Ŷr − Ŷ0)², where k is the number of replicates. The factor of 4 in our variance estimator is the result of using replicate factors of 1.5 and 0.5.
² An ultimate sampling unit is usually a group of four neighboring housing units.
³ Rows 1 and 81 have been dropped from the matrix.
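The successive difference replicate factors and the variance formula above can be sketched in a few lines. This is a simplified toy (an 8 by 8 Sylvester-type Hadamard matrix instead of 160 by 160, a cyclic row assignment, and made-up weights and responses), not the production CPS weighting system.

```python
# Toy sketch of successive difference replication (SDR); hedged as above.

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    H = [[1]]
    while len(H) < n:
        H = ([row + row for row in H] +
             [row + [-x for x in row] for row in H])
    return H

def sdr_factors(n_units, H):
    """Replicate factors f_ir = 1 + 2**-1.5 * a_(i+1,r) - 2**-1.5 * a_(i+2,r).

    Rows of H are assigned cyclically here for illustration; each factor
    comes out near 1.7, 1.0, or 0.3."""
    R, c = len(H), 2 ** -1.5
    return [[1 + c * H[(i + 1) % R][r] - c * H[(i + 2) % R][r]
             for r in range(R)]
            for i in range(n_units)]

def sdr_variance(weights, values, factors):
    """Var(Y0) = (4/R) * sum over r of (Yr - Y0)**2, over replicate estimates."""
    R = len(factors[0])
    y0 = sum(w * y for w, y in zip(weights, values))
    reps = [sum(w * f[r] * y for w, f, y in zip(weights, factors, values))
            for r in range(R)]
    return 4.0 / R * sum((yr - y0) ** 2 for yr in reps)

# Hypothetical USU-level weights and a 0/1 characteristic:
weights = [25.0] * 6
values = [1, 0, 1, 1, 0, 1]
print(sdr_variance(weights, values, sdr_factors(len(values), hadamard(8))))
```

Because each Hadamard entry is +1 or −1, every factor equals 1 ± 2·2^(−3/2) or exactly 1, i.e., approximately 1.707, 1.0, or 0.293, matching the values quoted above.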
The reliability of between-PSU variance estimates for subnational variance estimates (e.g., state estimates) and the reliability of national estimates of variance should be about the same as for the 1980 design. For more detailed information regarding the formation of replicates, see the internal Census Bureau memorandum (Gunlicks, 1996).

VARIANCES FOR STATE AND LOCAL AREA ESTIMATES

For estimates at the national level, total variances are estimated from the sample data by the successive difference replication method previously described. For local areas that are coextensive with one or more sample PSUs, a variance estimator can be derived using the methods of variance estimation used for the SR portion of the national sample. It is anticipated that these estimates of variance will be more reliable than the 1980 procedure for areas with comparable sample sizes due to the change in methodology. However, estimates for states and areas which have substantial contributions from NSR sample areas have variance estimation problems that are much more difficult to resolve.

Complicating Factors for State Variances

Most states contain a small number of NSR sample PSUs. Pairing them into pseudostrata reduces the number of NSR SECUs still further and increases reliability problems. Also, the component of variance resulting from sampling PSUs can be more important for state estimates than for national estimates in states where the proportion of the population in NSR strata is larger than the national average. Further, creating pseudostrata for variance estimation purposes introduces a between-stratum variance component that is not in the sample design, causing overestimation of the true variance. The between-PSU variance, which includes the between-stratum component, is relatively small at the national level for most characteristics, but it can be much larger at the state level (Gunlicks, 1993; Corteville, 1996).
Thus, this additional component should be accounted for when estimating state variances. Research is in progress to produce improved state and local variances utilizing the within-PSU variances obtained from successive difference replication and perhaps some modeling techniques. These variances will be provided as they become available.

GENERALIZING VARIANCES

With some exceptions, the standard errors provided with published reports and public data files are based on generalized variance functions (GVFs). The GVF is a simple model that expresses the variance as a function of the expected value of the survey estimate. The parameters of the model are estimated using the direct replicate variances discussed above. These models provide a relatively easy way to obtain an approximate standard error for numerous characteristics.

Why Generalized Standard Errors Are Used

It would be possible to compute and show an estimate of the standard error based on the survey data for each estimate in a report, but there are a number of reasons why this is not done. A presentation of the individual standard errors would be of limited use, since one could not possibly predict all of the combinations of results that may be of interest to data users. Also, for estimates of differences and ratios that users may compute, the published standard errors would not account for the correlation between the estimates. Most importantly, variance estimates are based on sample data and have variances of their own. The variance estimate for a survey estimate for a particular month generally has less precision than the survey estimate itself. This means that the estimates of variance for the same characteristic may vary considerably from month to month, or for related characteristics (that might actually have nearly the same level of precision) in a given month.
Therefore, some method of stabilizing these estimates of variance, for example, by generalization or by averaging over time, is needed to improve their reliability. Experience has shown that certain groups of CPS estimates have a similar relationship between their variance and expected value. Modeling or generalization may provide more stable variance estimates by taking advantage of these similarities.

The Generalization Method

The GVF that is used to estimate the variance of an estimated population total, X, is of the form

Var(X̂) = aX² + bX     (14.1)

where a and b are two parameters estimated using least squares regression. The rationale for this form of the GVF model is the assumption that the variance of X̂ can be expressed as the product of the variance from a simple random sample for a binomial random variable and a ‘‘design effect.’’ The design effect (deff) accounts for the effect of a complex sample design relative to a simple random sample. Defining P = X/N as the proportion of the population having the characteristic X, where N is the population size, and Q = 1 − P, the variance of the estimated total X̂, based on a sample of n individuals from the population, is

Var(X̂) = (N²PQ/n)(deff)     (14.2)

In generalizing variances, all estimates that follow a common model such as (14.1) (usually the same characteristics for selected demographic or geographic subgroups) are grouped together. This should give us estimates in the same group that have similar design effects. These design effects incorporate the effect of the estimation procedures, particularly the second stage, as well as the effect of the sample design. In practice, the characteristics should be clustered similarly by PSU, by USU, and among persons within housing units.
For example, estimates of total persons classified by a characteristic of the housing unit or of the household, such as the total urban population, number of recent migrants, or persons of Hispanic⁴ origin, would tend to have fairly large design effects. The reason for this is that these characteristics usually appear among all persons in the sample household and often among all households in the USU as well. On the other hand, lower design effects would result for estimates of labor force status, education, marital status, or detailed age categories, since these characteristics tend to vary among members of the same household and among households within a USU. Notice also that for many subpopulations of interest, N is a control total used in the second-stage ratio adjustment. In these subpopulations, as X approaches N, the variance of X approaches zero, since the second-stage ratio adjustment guarantees that these sample population estimates match independent population controls⁵ (Chapter 10). The GVF model satisfies this condition. This generalized variance model has been used since 1947 for the CPS and its supplements, although alternatives have been suggested and investigated from time to time (Valliant, 1987). The model has been used to estimate standard errors of means or totals. Variances of estimates based on continuous variables (e.g., aggregate expenditures, amount of income, etc.) would likely fit a different functional form better.

The parameters, a and b, are estimated by use of the model for the relative variance (Vx²), the variance divided by the square of the expected value of the estimate. Substituting P = X/N and Q = 1 − X/N into (14.2), the variance can be written as

Var(X̂) = −(deff)(N/n)(X²/N) + (deff)(N/n)X

so the relative variance is

Vx² = a + b/X.

Letting

a = −b/N and b = (deff)(N/n)

gives the functional form Var(X̂) = aX² + bX. We choose a = −b/N, where N is a control total, so that the variance will be zero when X = N. The a and b parameters are estimated by fitting this model to a group of related estimates and their estimated relative variances. The relative variances are calculated using the successive difference replication method. The model fitting technique is an iterative weighted least squares procedure, where the weight is the inverse of the square of the predicted relative variance. The use of these weights prevents items with large relative variances from unduly influencing the estimates of the a and b parameters.

⁴ Hispanics may be of any race.
⁵ The variance estimator assumes no variance on control totals, even though they are estimates.

Usually at least a year’s worth of data are used in this model fitting process, and each group of items should comprise at least 20 characteristics with their relative variances, although occasionally fewer characteristics are used. Direct estimates of relative variances are required for estimates covering a wide range, so that observations are available to ensure a good fit of the model at high, low, and intermediate levels of the estimates. It must be remembered that, by using a model to estimate the relative variance of an estimate in this way, we introduce some error, since the model may substantially and erroneously modify some legitimately extreme values. Generalized variances are computed for estimates of month-to-month change as well as for estimates of monthly levels. Periodically, the a and b parameters are updated to reflect changes in the levels of the population totals or changes in the ratio (N/n) which result from sample reductions. This can be done without recomputing direct estimates of variances as long as the sample design and estimation procedures are essentially unchanged (Kostanich, 1996).

How the Relative Variance Function is Used

After the parameters a and b of expression (14.1) are determined, it is a simple matter to construct a table of standard errors of estimates for publication with a report.
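Once a and b are in hand, computing a generalized standard error is mechanical. The sketch below uses the published 1995 parameters for unemployment, Total or White (a = −0.000015749, b = 2464.91, from Table 14–1), reproduces the 6-million worked example given in the Example that follows, and checks numerically that the modeled variance vanishes at the implied control total N = −b/a. The function name and the checks are illustrative, not part of the published methodology.

```python
import math

# Published 1995 GVF parameters for unemployment, Total or White (Table 14-1):
a, b = -0.000015749, 2464.91

def gvf_se(x):
    """Generalized standard error sqrt(a*x**2 + b*x), floored at zero."""
    return math.sqrt(max(a * x ** 2 + b * x, 0.0))

# Worked example from the text: 6 million unemployed men.
X = 6_000_000
se = gvf_se(X)
lo, hi = X - 1.6 * se, X + 1.6 * se      # 1.6 is the rounded 1.645 for 90%
print(round(se, -3), round(lo, -4), round(hi, -4))
# standard error about 119,000; interval about 5,810,000 to 6,190,000

# The model's variance hits (numerically) zero at the implied control total:
N = -b / a
assert gvf_se(N) < 1.0
```

The final assertion restates the property derived above: with a = −b/N, the quadratic aX² + bX factors as bX(1 − X/N), which is zero at X = N.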
In practice, such tables show the standard errors that are appropriate for specific estimates, and the user is instructed to interpolate for figures not explicitly shown in the table. However, many reports present a list of the parameters, enabling data users to compute generalized variance estimates directly. A good example is a recent monthly issue of Employment and Earnings (U.S. Department of Labor), from which the following table was taken.

Table 14–1. Parameters for Computation of Standard Errors for Estimates of Monthly Levels¹

Characteristic                        a               b
Unemployment:
  Total or White ...........   −0.000015749     2464.91
  Black ....................    0.000191460     2621.89
  Hispanic origin² .........   −0.000098631     2704.53

¹ Parameters reflect variances for 1995. See Appendix H for variance estimates since January 1996.
² Hispanics may be of any race.

Example: The approximate standard error, s_X̂, of an estimated monthly level X̂ can be obtained with a and b from the above table and the formula

s_X̂ = √(aX̂² + bX̂).

Assume that in a given month there are an estimated 6 million unemployed men in the civilian labor force (X̂ = 6,000,000). From Table 14–1, a = −0.000015749 and b = 2464.91, so

s_X̂ = √((−0.000015749)(6,000,000)² + (2464.91)(6,000,000)) ≈ 119,000.

An approximate 90-percent confidence interval for the monthly estimate of unemployed men is between 5,810,000 and 6,190,000 [or 6,000,000 ± 1.6(119,000)].

VARIANCE ESTIMATES TO DETERMINE OPTIMUM SURVEY DESIGN

The sample design and the estimation procedures used in the CPS have changed many times since the start of the survey. These changes result from a continuing program of research and development with the objective of optimizing the use of the techniques and resources which are available at a given time. Changes also occur when reliability requirements are revised.
Estimates of the components of variance attributable to each of the several stages of sampling, and the effect of the separate steps in the estimation procedure on the variance of the estimate, are needed to develop an efficient sample design. The tables which follow show variance estimates computed, using replication methods, by type (total and within-PSU) and by stage of estimation. Averages over 6 months have been used to improve the reliability of the estimated monthly variances. The 6-month period, July-December 1995, was used for estimation because the sample design was unchanged throughout the period. Appendix H provides similar information on variances and components of variance after the January 1996 sample reduction.

Variance Components Due to Stages of Sampling

Table 14–2 indicates, for the first- and second-stage combined (FSC) estimate, how the several stages of sampling affect the total variance of each of the given characteristics. The FSC estimate is the estimate after the first- and second-stage ratio adjustments are applied. Within-PSU variance and total variance are computed as described earlier in this chapter. Between-PSU variance is estimated by subtracting the within-PSU variance from the total variance. Due to variation of the variance estimates, the between-PSU variance is sometimes negative. The two far-right columns of Table 14–2 show the percentage of within-PSU variance and the percentage of between-PSU variance in the total variance estimate. For all characteristics shown in Table 14–2, the proportion of the total variance due to sampling housing units within PSUs (within-PSU variance) is larger than that due to sampling a subset of NSR PSUs (between-PSU variance). In fact, for most of the characteristics shown, the within-PSU component accounts for over 90 percent of the total variance. For civilian labor force and not in labor force characteristics, almost all of the variance is due to sampling housing units within PSUs.
Table 14–2. Components of Variance for FSC Monthly Estimates
[Monthly averages: July - Dec. 1995]

Civilian-noninstitutional          FSC¹       Standard   Coefficient     Percent of total variance
population 16 years old            estimate   error      of variation    Within      Between
and over                           (x 10⁶)    (x 10⁵)    (percent)

Unemployed                         7.30       1.38       1.89            98.4        1.6
  White                            5.31       1.19       2.24            98.1        1.9
  Black                            1.57       0.68       4.32            100.4       -0.4
  Hispanic origin²                 1.15       0.63       5.51            94.2        5.8
  Teenage, 16-19                   1.35       0.57       4.25            98.7        1.3

Employed - Agriculture             3.47       1.49       4.30            63.2        36.8
  White                            3.23       1.43       4.42            66.1        33.9
  Black                            0.10       0.19       18.52           105.9       -5.9
  Hispanic origin²                 0.60       0.73       12.27           83.4        16.6
  Teenage, 16-19                   0.29       0.28       9.50            97.0        3.0

Employed - Nonagriculture          122.34     3.34       0.27            92.8        7.2
  White                            104.05     3.05       0.29            94.4        5.6
  Black                            13.29      1.30       0.98            100.3       -0.3
  Hispanic origin²                 10.72      1.43       1.34            86.3        13.7
  Teenage, 16-19                   6.40       0.96       1.50            108.1       -8.1

Civilian labor force               133.11     2.89       0.22            99.0        1.0
  White                            112.58     2.61       0.23            99.3        0.7
  Black                            14.97      1.18       0.79            99.9        0.1
  Hispanic origin²                 12.47      1.09       0.87            99.2        0.8
  Teenage, 16-19                   8.04       0.98       1.22            103.5       -3.5

Not in labor force                 65.96      2.89       0.44            99.0        1.0
  White                            54.67      2.61       0.48            99.3        0.7
  Black                            8.37       1.18       1.41            99.9        0.1
  Hispanic origin²                 6.31       1.09       1.73            99.2        0.8
  Teenage, 16-19                   6.61       1.00       1.52            101.5       -1.5

¹ First- and second-stage combined (FSC).
² Hispanics may be of any race.

For the total and White employed in agriculture, the within-PSU component still accounts for 60 to 70 percent of the total variance, while the between-PSU component accounts for the remaining 30 to 40 percent. The between-PSU component is also larger (between 10 and 20 percent) for employed in agriculture, Hispanic origin, and employed in nonagriculture, Hispanic origin than for other characteristics shown in the table.
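The decomposition behind Table 14–2 is simple arithmetic once the total and within-PSU variances are in hand; a brief sketch using the total-unemployed row (the 98.4-percent within share is taken from the table for illustration, since the replicate estimates themselves are not reproduced here):

```python
# Total unemployed (Table 14-2): FSC estimate 7.30 x 10^6,
# published standard error 1.38 x 10^5.
fsc_estimate = 7.30e6
total_se = 1.38e5

total_var = total_se ** 2
cv_percent = 100 * total_se / fsc_estimate   # coefficient of variation

# Within-PSU variance comes from the replication estimator; here it is
# backed out of the published 98.4-percent within share.
within_var = 0.984 * total_var
between_var = total_var - within_var         # estimated by subtraction

print(round(cv_percent, 2))  # 1.89, matching the table
```

Because the between-PSU component is obtained by subtraction of two noisy estimates, it can come out negative, as it does for several rows of the table.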
Because the FSC estimate of total civilian labor force and total not in labor force must add to the independent population controls (which are assumed to have no variance), the standard errors and variance components for these estimated totals are the same. They are not the same for the teenage category because the estimates do not completely converge to the control totals during the second-stage ratio adjustment.

TOTAL VARIANCES AS AFFECTED BY ESTIMATION

Table 14–3 shows how the separate estimation steps affect the variance of estimated levels by presenting ratios of relative variances. It is more instructive to compare ratios of relative variances than the variances themselves, since the various stages of estimation can affect both the level of an estimate and its variance (Hanson, 1978; Train, Cahoon, and Makens, 1978). The unbiased estimate uses the baseweight with special weighting factors applied. The noninterview estimate includes the baseweights, the special weighting adjustment, and the noninterview adjustment. The first- and second-stage ratio adjustments are used in the estimate for the first- and second-stage combined (FSC). In Table 14–3, the figures for unemployed show, for example, that the relative variance of the FSC estimate of level is 3.590 x 10⁻⁴ (equal to the square of the coefficient of variation in Table 14–2). The relative variance of the unbiased estimate for this characteristic would be 1.06 times as large. If the noninterview stage of estimation is also included, the relative variance is only 1.05 times the size of the relative variance for the FSC estimate of level. Including the first stage of estimation raises the relative variance factor to 1.13. The relative variance for total unemployed after applying the second stage without the first stage is about the same as the relative variance that results from applying the first- and second-stage adjustments.
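The relative variance used throughout this comparison is the squared coefficient of variation; a short sketch (rounding in the published standard error and level leaves the result slightly below the quoted 3.590 x 10⁻⁴, which was computed from unrounded inputs):

```python
def relative_variance(variance, level):
    """Relative variance of an estimated level: Var(X) / X^2, i.e., the squared CV."""
    return variance / level ** 2

# Total unemployed: FSC level 7.30e6 and standard error 1.38e5 from Table 14-2.
relvar_fsc = relative_variance(1.38e5 ** 2, 7.30e6)   # about 3.57e-4

# Table 14-3 reports a factor of 1.06 for the unbiased estimator, so its
# relative variance would be about 1.06 times as large.
relvar_unbiased = 1.06 * relvar_fsc
```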
The relative variance factors shown in the last column of Table 14–3 illustrate that the first-stage ratio adjustment has little effect on the variance of national-level characteristics in the context of the overall estimation process. The first-stage adjustment was implemented in order to reduce variances of state-level estimates; its actual effect on those state-level estimates has not yet been examined. The second-stage adjustment, however, appears to greatly reduce the total variance, as intended.

Table 14–3. Effects of Weighting Stages on Monthly Relative Variance Factors
[Monthly averages: July - Dec. 1995]

Civilian-noninstitutional          Relative variance          Relative variance factor¹
population 16 years old            of FSC estimate      Unbiased    with      with       with
and over                           of level (x 10⁻⁴)    estimator   NI²       NI & FS³   NI & SS⁴

Unemployed                         3.590                1.06        1.05      1.13       1.00
  White                            5.004                1.11        1.11      1.16       1.00
  Black                            18.626               1.30        1.28      1.28       1.00
  Hispanic origin⁵                 30.347               1.14        1.14      1.19       1.00
  Teenage, 16-19                   18.092               1.13        1.13      1.14       0.99

Employed - Agriculture             18.460               0.98        0.97      1.04       0.96
  White                            19.514               0.98        0.97      1.03       0.96
  Black                            342.808              1.03        1.02      1.04       1.01
  Hispanic origin⁵                 150.548              1.01        0.98      1.06       0.95
  Teenage, 16-19                   90.235               1.08        1.08      1.10       1.00

Employed - Nonagriculture          0.075                4.71        4.47      6.85       0.96
  White                            0.086                5.54        5.44      7.55       0.97
  Black                            0.957                6.73        6.60      6.61       1.00
  Hispanic origin⁵                 1.783                3.19        3.18      3.27       0.95
  Teenage, 16-19                   2.264                2.03        2.00      2.05       1.00

Civilian labor force               0.047                7.28        6.84      10.98      0.99
  White                            0.054                8.96        8.74      12.33      0.99
  Black                            0.623                9.87        9.62      9.64       1.01
  Hispanic origin⁵                 0.764                7.03        6.95      7.41       0.99
  Teenage, 16-19                   1.497                2.69        2.64      2.74       0.99

Not in labor force                 0.192                2.79        2.67      4.15       0.99
  White                            0.228                3.22        3.16      4.50       0.99
  Black                            1.994                4.21        4.11      3.88       1.01
  Hispanic origin⁵                 2.984                2.99        2.92      3.09       0.99
  Teenage, 16-19                   2.312                2.04        2.03      2.23       1.00

¹ Relative variance factor is the ratio of the relative variance of the specified level to the relative variance of the FSC level.
² NI = Noninterview.
³ FS = First-stage.
⁴ SS = Second-stage.
⁵ Hispanics may be of any race.

This is
especially true for characteristics that comprise high proportions of age, sex, or race/ethnicity subclasses, such as White, Black, or Hispanic persons in the civilian labor force or employed in nonagricultural industries. Without the second stage, the relative variances of these characteristics would be 5, 6, or even 11 to 12 times as large. For smaller groups, such as unemployed and employed in agriculture, the second-stage adjustment does not have as dramatic an effect.

After the second-stage ratio adjustment, a composite estimator is used to improve estimates of month-to-month change by taking advantage of the 75 percent of the total sample that continues from the previous month (see Chapter 10). Table 14–4 compares the variance and relative variance of the composited estimates of level to those of the FSC estimates. For example, the estimated variance of the composited estimate of unemployed persons of Hispanic origin is 3.659 x 10⁹. The variance factor for this characteristic is 0.92, implying that the variance of the composited estimate is 92 percent of the variance of the estimate after the first and second stages. The relative variance, which takes into account the estimate of the number of people with this characteristic, is 0.94 times the size of the relative variance of the FSC estimate. The two factors are similar for most characteristics, indicating that compositing tends to have a small effect on the level of most estimates.

DESIGN EFFECTS

Table 14–5 shows the design effects for the total variance for selected labor force characteristics. A design effect (deff) is the ratio of the variance from a complex sample design or a sophisticated estimation method to the variance under a simple random sample design. The design effects in this table were computed by solving equation 14.2 for deff and replacing N/n in the formula with an estimate of the national sampling interval. Estimates of P and Q were obtained from the 6 months of data.
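The computation just described can be sketched in code. Equation 14.2 itself is not reproduced in this excerpt, so the sketch assumes the common form Var(X̂) = deff · SI · X̂ · (1 − P), where SI is the national sampling interval; the SI value below (about 2,060) is an illustrative assumption, not a figure from this chapter:

```python
def design_effect(variance, level, population, sampling_interval):
    """Solve an assumed form of equation 14.2, Var = deff * SI * X * (1 - P),
    for deff, where P = X / N and SI approximates N / n."""
    p = level / population
    srs_variance = sampling_interval * level * (1 - p)
    return variance / srs_variance

# Total unemployed: FSC variance (1.38e5)^2 and level 7.30e6 (Table 14-2).
# N is approximated by civilian labor force + not in labor force from that table.
n_pop = 133.11e6 + 65.96e6
si = 2060  # assumed national sampling interval, for illustration only

deff = design_effect(1.38e5 ** 2, 7.30e6, n_pop, si)
# deff comes out near the 1.314 reported for unemployed after the
# first and second stages
```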
For unemployed, the design effect is 1.314 for the uncomposited (FSC) estimate and 1.229 for the composited estimate. This means that, for the same number of sample cases, the design of the CPS (including the sample selection, weighting, and compositing) increases the variance by about 23 percent over the variance of an unbiased estimate based on a simple random sample. On the other hand, for the civilian labor force the design of the CPS decreases the variance by about 24 percent. Note that the design effects for composited estimates are generally lower than those for the FSC estimates, indicating again the tendency of compositing to reduce the variance of most estimates.

Table 14–4. Effect of Compositing on Monthly Variance and Relative Variance Factors
[Monthly averages: July - Dec. 1995]

Civilian-noninstitutional          Variance of composited    Variance    Relative variance
population 16 years old            estimate of level         factor¹     factor²
and over                           (x 10⁹)

Unemployed                         17.690                    0.93        0.94
  White                            13.003                    0.93        0.94
  Black                            4.448                     0.97        1.00
  Hispanic origin³                 3.659                     0.92        0.94
  Teenage, 16-19                   3.279                     1.00        1.01

Employed - Agriculture             20.789                    0.93        0.94
  White                            18.781                    0.93        0.93
  Black                            0.302                     0.85        0.95
  Hispanic origin³                 4.916                     0.91        0.88
  Teenage, 16-19                   0.713                     0.92        0.93

Employed - Nonagriculture          94.359                    0.85        0.84
  White                            79.145                    0.85        0.85
  Black                            14.489                    0.85        0.85
  Hispanic origin³                 18.466                    0.90        0.90
  Teenage, 16-19                   8.508                     0.92        0.92

Civilian labor force               69.497                    0.83        0.83
  White                            58.329                    0.85        0.85
  Black                            12.037                    0.86        0.87
  Hispanic origin³                 10.448                    0.88        0.88
  Teenage, 16-19                   8.607                     0.89        0.89

Not in labor force                 69.497                    0.83        0.83
  White                            58.329                    0.85        0.85
  Black                            12.037                    0.86        0.85
  Hispanic origin³                 10.448                    0.88        0.88
  Teenage, 16-19                   9.004                     0.89        0.88

¹ Variance factor is the ratio of the variance of a composited estimate to the variance of an FSC estimate.
² Relative variance factor is the ratio of the relative variance of a composited estimate to the relative variance of an FSC estimate.
³ Hispanics may be of any race.

Note: Composite formula constants used: K = 0.4 and A = 0.2.

Table 14–5. Design Effects for Total Monthly Variances
[Monthly averages: July - Dec. 1995]

Civilian-noninstitutional          After first and      After
population 16 years old            second stages        compositing
and over

Unemployed                         1.314                1.229
  White                            1.318                1.227
  Black                            1.424                1.403
  Hispanic origin¹                 1.689                1.574
  Teenage, 16-19                   1.183                1.191

Employed - Agriculture             3.153                2.958
  White                            3.091                2.875
  Black                            1.680                1.516
  Hispanic origin¹                 4.370                3.903
  Teenage, 16-19                   1.278                1.188

Employed - Nonagriculture          1.144                0.966
  White                            0.904                0.770
  Black                            0.659                0.564
  Hispanic origin¹                 0.976                0.880
  Teenage, 16-19                   0.723                0.664

Civilian labor force               0.913                0.760
  White                            0.672                0.576
  Black                            0.487                0.421
  Hispanic origin¹                 0.491                0.433
  Teenage, 16-19                   0.606                0.540

Not in labor force                 0.913                0.760
  White                            0.829                0.709
  Black                            0.841                0.723
  Hispanic origin¹                 0.939                0.824
  Teenage, 16-19                   0.763                0.679

¹ Hispanics may be of any race.

REFERENCES

Corteville, J. (1996), "State Between-PSU Variances and Other Useful Information for the CPS 1990 Sample Design (VAR90-16)," Memorandum for Documentation, March 6, Demographic Statistical Methods Division, U.S. Census Bureau.

Fay, R. E. (1984), "Some Properties of Estimates of Variance Based on Replication Methods," Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 495-500.

Fay, R. E. (1989), "Theory and Application of Replicate Weighting for Variance Calculations," Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 212-217.

Fay, R., Dippo, C., and Morganstein, D. (1984), "Computing Variances From Complex Samples With Replicate Weights," Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 489-494.

Fay, R., and Train, G.
(1995), "Aspects of Survey and Model-Based Postcensal Estimation of Income and Poverty Characteristics for States and Counties," Proceedings of the Section on Government Statistics, American Statistical Association, pp. 154-159.

Gunlicks, C. (1993), "Overview of 1990 CPS PSU Stratification (S_S90_DC-11)," Memorandum for Documentation, February 24, Demographic Statistical Methods Division, U.S. Census Bureau.

Gunlicks, C. (1996), "1990 Replicate Variance System (VAR90-20)," Internal Memorandum for Documentation, June 4, Demographic Statistical Methods Division, U.S. Census Bureau.

Hanson, R. H. (1978), The Current Population Survey: Design and Methodology, Technical Paper 40, Washington, DC: Government Printing Office.

Kostanich, D. (1996), "Proposal for Assigning Variance Codes for the 1990 CPS Design (VAR90-22)," Internal Memorandum to BLS/Census Variance Estimation Subcommittee, June 17, Demographic Statistical Methods Division, U.S. Census Bureau.

Kostanich, D. (1996), "Revised Standard Error Parameters and Tables for Labor Force Estimates: 1994-1995 (VAR80-6)," Internal Memorandum for Documentation, February 23, Demographic Statistical Methods Division, U.S. Census Bureau.

Lent, J. (1991), "Variance Estimation for Current Population Survey Small Area Estimates," Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 11-20.

Train, G., Cahoon, L., and Makens, P. (1978), "The Current Population Survey Variances, Inter-Relationships, and Design Effects," Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 443-448.

U.S. Department of Labor, Bureau of Labor Statistics, Employment and Earnings, 42, p. 152, Washington, DC: Government Printing Office.

Valliant, R. (1987), "Generalized Variance Functions in Stratified Two-Stage Sampling," Journal of the American Statistical Association, 82, pp. 499-508.

Wolter, K.
(1984), "An Investigation of Some Estimators of Variance for Systematic Sampling," Journal of the American Statistical Association, 79, pp. 781-790.

Wolter, K. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

Chapter 15. Sources and Controls on Nonsampling Error

INTRODUCTION

Nonsampling error can enter the survey process at any point or stage, and many of these errors are not readily identifiable. Nevertheless, the presence of these errors can affect both the bias and variance components of the total survey error. The effect of nonsampling error on the estimates is difficult to measure accurately. For this reason, the most appropriate strategy is to examine the potential sources of nonsampling error and take steps to prevent these errors from entering the survey process. This chapter discusses the various sources of nonsampling error and the measures taken to control their presence in the Current Population Survey (CPS).

Many sources of nonsampling error were studied and well charted in the pre-January 1994 CPS (e.g., see Brooks and Bailar, 1978; McCarthy, 1978), but the full extent of nonsampling error in the post-January 1994 CPS is not nearly so well examined. It is also unclear how much previous research on nonsampling error in the CPS is applicable to the redesigned instrument with computerized data collection. Although the intent of the new questionnaire and automation was to reduce nonsampling error, new sources of error may emanate from these methodological advances. Some potential sources of nonsampling error can be investigated with the available CPS and reinterview data, but only about 2 years of data were available for analyses and comparisons at the time of this writing, and this may not be sufficient for a full analysis. Nevertheless, it is clear that there are two main types of nonsampling error in the CPS. The first is error imported from other frames or sources of information, such as Decennial Census omissions.
The second type is considered preventable: for example, a sample that is not completely representative of the intended population, within-household omissions, respondents not providing true answers to a questionnaire item, or errors produced during the processing of the survey data. Chapter 16 discusses the presence of CPS nonsampling error, but not the effect of the error on the estimates. The present chapter, however, focuses on the sources of error and the operational efforts used to control the occurrence of error in the survey processes. Each section discusses a procedure aimed at reducing coverage, nonresponse, response, or data processing errors. Despite the effort to treat each control as a separate entity, the controls nonetheless affect the survey in a general way. For example, training, even if focused on a specific problem, can have broad-reaching effects on the total survey.

An understanding of the sources and causes of nonsampling error is necessary to design procedures to control the effects of these errors on the estimates. An important source of nonsampling error that is difficult to understand and control arises out of the interaction of the people involved in the survey. The methods by which administrators, analysts, programmers, and field personnel interact and exchange information are a potential source of nonsampling error. One attempt to reduce this source of nonsampling error is demonstrated by the use of Deming's quality management procedures in the 1990 sample redesign (Waite, 1991). This approach emphasizes the relationship between process design and quality improvement. It is a nonbureaucratic approach that has all involved persons working together toward a common goal. Each person is accountable and responsible for verifying the consistency, reliability, and accuracy of his or her output before it is used as input to a succeeding process or stage. The quality of all previous processes affects the quality of all succeeding processes.
Communication across all stages is enhanced by the inclusion of the entire staff involved in the operations, planning, and implementation of the processes up front, even before the writing of specifications. Included in the notions of interaction and exchange of information across all personnel levels and processes are feedback, continuous improvement, the introduction of automation and computerization, and training. Feedback, when accompanied by corrective measures, is vital and must flow in all directions between the field representative, the regional office, headquarters, and the data preparation office in Jeffersonville, Indiana. (Focus groups are an example of an effective source of feedback.) Continuous improvement in every process and stage of the CPS is a requirement, not an option. Also, automation and computerization are major factors in the control of nonsampling error, greatly improving uniformity of operations, as well as data access and control. The degree of automation is continuously increasing, thus affecting the amount and allocation of responsibilities. Finally, training is a continuing task in each regional office, the Jeffersonville office, and headquarters, and it is a major factor in controlling the occurrence of nonsampling error. (See Appendix F for a more detailed discussion on training procedures in the regional offices.)

This chapter deals with many sources and many controls for nonsampling error. It covers coverage error, error from nonresponse, response error, and processing errors. Many of these types of errors interact, and they can occur anywhere in the survey process. Although it is believed that some of the nonsampling errors are small, their full effects on the survey estimates are unknown. The CPS attempts to prevent such errors from entering the survey process or tries to keep them as small as possible.

SOURCES OF COVERAGE ERROR

Coverage error exists when a survey does not completely represent the population of interest.
When conducting a sample survey, a primary goal is to give every unit (for example, person or housing unit) in the target universe a known probability of being selected into the sample. When this occurs, the survey is said to have 100 percent coverage. This is rarely the case, however, since errors can enter the system during almost any phase of the survey process. A bias in the survey estimates results if characteristics of units erroneously included or excluded from the survey differ from those correctly included in the survey. Historically in the CPS, the net effect of coverage errors has been a loss in population (resulting from undercoverage). The primary sources of CPS coverage error are: 1. Frame omission. Erroneous frame omission of housing units occurs when the list of addresses is incomplete. This can occur in all of the four sampling frames (i.e., unit, permit, area, and group quarters; see Chapter 3). Since these erroneously omitted housing units cannot be sampled, undercoverage of the target population results. Reasons for frame omissions are: • Census Address List: The census address list may be incomplete or some units are not locatable. This occurs when the census lister fails to canvass an area thoroughly or misses units in multiunit structures. • New Construction: Some areas of the country are not covered by building permit offices and are not recanvassed (unit frame) to pick up new housing. New housing units in these areas have zero probability of selection. • Mobile Homes: Mobile homes that move into areas that are not recanvassed (unit frame) also have a zero probability of selection. 2. Misclassification of housing units. Erroneous frame inclusions of housing units occur when housing units outside an appropriate area boundary are misclassified as inside the boundary. Other erroneous inclusions occur when a single unit is recorded as two units through faulty application of the housing unit definition. 
Since erroneously included housing units can be sampled, and if the errors are not detected and corrected, overcoverage of the target population results. Misclassification also occurs when an occupied housing unit is treated as vacant. 3. Within-housing unit omissions. Undercoverage of persons can arise from failure to list all usual residents of a housing unit on the questionnaire and from misclassifying a household member as a nonmember. 4. Within-housing unit inclusions. Overcoverage can occur because of the erroneous inclusion of persons on the roster for a housing unit. For instance, persons with a usual residence elsewhere are treated as members of the sample housing unit. Other sources of coverage error are omission of homeless persons from the frame, unlocatable addresses from building permit lists, conversion of nonresidential to residential units, and missed housing units due to the start dates for the sampling of permits.1 Newly constructed group quarters are another possible source of undercoverage. If noticed on the permit lists, the general rule is to purposely remove these group quarters; if not noticed and the field representative discovers a newly constructed group quarters, the case is stricken from the roster of sample addresses and never visited again for CPS. For example, dormitories are group quarters in the 1990 decennial census and are included in the group quarters frame. They are picked up in the other frames, not in the permit frame. The sampling frames are designed to limit missed housing units in the census frame from affecting coverage of the CPS frames (see Chapter 3). Through various studies, it has been determined that the number of housing unit misses is small; this is because of the way the sampling frames are designed and because of the routines in place to ensure that the listing of addresses is performed correctly. 
A measure of potential undercoverage is the coverage ratio, which reflects the completeness of coverage given frame omissions, misclassification of housing units, and within-housing unit omissions and inclusions. For example, the coverage ratio for the total population at least 16 years of age is approximately 92 percent (January 1997). In general, coverage is low for Blacks (about 83 percent), with Black males of ages 25-34 having the lowest coverage. Females have higher coverage. See Chapter 16 for further discussion of coverage error.

CONTROLLING COVERAGE ERROR

This section focuses on those processes during the sample design and sample preparation stages that aim to control housing unit omissions. Other sources of undercoverage can be viewed as response error and are described in a later section of this chapter.

1 Sampling of permits began with those issued in 1989; the month varied depending on the size of structure and region. Housing units whose permits were issued before the start month in 1989 and not built by the time of the census may be missed.

Sample Design

The CPS sample selection is coordinated with the samples of nine other demographic surveys. These surveys undergo the same verification processes, so the discussion in the rest of this section is in no way unique to the CPS. Sampling verification consists of two phases, testing and production; they differ in that the intent of the testing phase is to design the processes better to improve quality, while the intent of the production phase is to verify or inspect quality to improve the processes. Both phases are coordinated across the various surveys. Sampling intervals, expected sample sizes, and the premise that a housing unit is selected for only one survey in 10 years are continually verified. The testing phase tests and verifies the various sampling programs before any sample is selected, ensuring that the programs work individually and collectively.
Smaller data sets with unusual observations test the performance of the system in extreme situations. For example, unusual circumstances encountered in the 1990 decennial census are used to test the 1990 CPS sampling procedures. These circumstances include PSUs with no sample, crews of maritime vessels, group quarters, and Indian Reservations. Programs are then written to verify that procedures are in place to handle the problems. Also, actual data from the 1990 decennial census are used to test and verify the sampling processes. The production phase of sampling verification treats output verification, especially expected sample sizes, focusing on whether the system ran successfully on actual data. The following are illustrations and do not represent any particular priority or chronology:

1. PSU probabilities of selection are verified; for example, their sum should be one.

2. Edits are done on file contents, checking for blank fields, out-of-range data, etc.

3. When applicable, the files are coded to check for consistency; for example, the number of PSUs ‘‘in’’ should equal the number of PSUs ‘‘out’’ at PSU selection.

4. Information is centralized; for example, the sampling rates for all states and across all surveys are located in one parameter file.

5. As a form of overall consistency verification, the output at several stages of sampling is compared to that of the former CPS design.

Sample Preparation

Sample preparation activities, in contrast to those of sample design, are ongoing. The listing sheet review and check-in of sample are monthly production processes, and the listing check is an ongoing field process.

Listing sheet review. The listing sheet review is a check on each month’s listings to keep the nonsampling error as low as possible. Its aim is to ensure that the interviews will be conducted at the correct units and that all units have one and only one chance of selection. This review also plays a role in the verification of the sampling, subsampling, and relisting. At present, the timing of the review is such that not all errors detected in the listing sheet review can be corrected in time for incorporation into the CPS. However, automation of the review is a major advancement toward improving the timing of the process and, thus, toward a more accurate CPS frame. Listing sheet reviews for the unit, permit, area, and group quarters samples are performed in varying degrees by both the regional offices and the Jeffersonville office. The fact that these reviews are performed by different parts of the Census Bureau organization is, in itself, a means of controlling nonsampling error.

In terms of the reviews performed by the regional offices, of utmost importance is the review of field representative explanations made in the remarks column or footnotes section of the listing sheet. Other processes that the regional offices follow to verify and correct the listing sheets for samples from each type of frame follow. After a field representative makes an initial visit either to a multiunit address from a unit frame or to a sample from a permit frame, the Unit/Permit Listing Sheet is reviewed. This review occurs the first time the basic address or sample from a permit frame is in sample for any survey. However, if major changes occur at an address on a subsequent visit, a review of the revised listing is done. If there is evidence that the field representative encountered a special or unusual situation, the materials are compared to the instructions in the Listing and Coverage Manual for Field Representatives (see Chapter 4) to ensure that the situation was handled correctly. Depending on whether it is a permit frame sample or a multiunit address from a unit frame, the following are verified:

1. Were correct entries made on the listing sheet when either fewer or more units were listed than expected?

2. Was CPS the first survey to interview at the multiunit address?

3. Were new listing sheets prepared?

4. Were no units listed?

5. Was an extra unit discovered?

6. Was a sample unit added without a serial number?

7. Did the unit designation change?

8. Was a sample unit demolished or condemned?

Errors and omissions are corrected and brought to the attention of the field representative. This may involve contacting the field representative for more information. The regional office review of Area and Group Quarters Listing Sheets is performed on newly listed or updated samples from area or group quarters frames. Basically, this review is for completeness of identification. As stated previously, listing sheet reviews are also performed by the Jeffersonville office. The following are activities involved in the reviews for the unit sample:

1. Use the mailout control to check in listing sheets from the regional offices and follow up any that are missing.

2. If subsampling was done, review for accuracy.

3. If any large multiunit addresses were relisted, review for accuracy.

4. If there were any address changes, key the changes into the address updates system for future samples at the same basic address.

5. Keep tallies by regional office and survey of when any of situations 2 through 4 above were encountered; provide headquarters with these tallies on a quarterly basis.

In terms of the review of listing sheets for permit frame samples, Jeffersonville checks whether the number of units is more than expected and whether there are any changes to the address. If there are more units than expected, Jeffersonville determines whether the additional units already had a chance of selection. If so, the regional office is contacted with instructions for updating the listing sheet. If there are any address changes, Jeffersonville provides updates to headquarters for potential future samples at the same basic address.
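Verification checks like the production-phase illustrations under Sample Design (selection probabilities summing to one, edits for blank or out-of-range fields) reduce to simple programmatic assertions over the sampling files. A schematic sketch, with invented field names and ranges:

```python
def verify_psu_probabilities(probabilities, tol=1e-9):
    """Selection probabilities within a stratum should sum to one."""
    return abs(sum(probabilities) - 1.0) <= tol

def edit_record(record, valid_ranges):
    """Flag blank fields and out-of-range values; returns a list of problems.
    Field names and ranges are illustrative, not actual CPS file layouts."""
    problems = []
    for field, (low, high) in valid_ranges.items():
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"{field}: blank")
        elif not (low <= value <= high):
            problems.append(f"{field}: out of range ({value})")
    return problems

# Hypothetical stratum with three PSUs.
assert verify_psu_probabilities([0.5, 0.3, 0.2])

# Hypothetical record edit: an implausible age should be flagged.
print(edit_record({"age": 217, "state": 24}, {"age": (0, 115), "state": (1, 56)}))
```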
For the area and group quarters sample, the listing sheets remain in the regional offices. However, Jeffersonville receives a summary of the sampling that was done by each regional office, ensuring that, given the sampling rate, the regional office sampled the appropriate number of units. If not, the regional office is contacted and corrections are made. Across all frame types, the majority of listing sheets are found to be free of inaccuracies during the regional office and Jeffersonville reviews. When errors or omissions are detected, they usually occur in batches, signifying a misinterpretation of instructions by a regional office or a newly hired field representative. Because of these low error rates, the intent and value of the listing sheet review are being revisited.

Check-in of sample. Depending on the sampling frame and mode of interview, the monthly process of check-in of sample can occur many times as the sample cases progress through the regional offices, the Jeffersonville office, and headquarters. Check-in of sample describes the processes by which the regional offices verify that the field representatives receive all the sample cases that they are supposed to and that all are completed and returned. Since the CPS is now conducted entirely by either computer assisted personal or telephone interview, sample cases can be tracked more closely than was ever possible before. Systems are in place to control and verify the sample count. The regional offices use these systems to control the sample during the period when it is active, and all discrepancies must be resolved before the office is able to certify that the workload in the database is correct. Since these systems aid in the balancing of the workload, time and effort can then be spent elsewhere.

Listing check.
The purpose of the regional offices’ listing check is to provide assurance that quality work is done in the listing operation and to provide feedback to the field representatives about their performance, with possible retraining when a field representative misunderstands or misapplies a listing concept. In the field, a sample of each field representative’s work is checked to see whether he/she began the listing of units at the correct location, listed the units correctly, stayed within the appropriate geographic boundaries, and included all the units. Since the listing check is not a part of the monthly production cycle and can occur either before or after an interview, its intent is not to capture and correct data. Rather, it is an attempt, over time, to counteract any misunderstanding of listing and coverage procedures. Since January 1993, the listing check has been a separate operation from reinterview; it is generally combined with some other personal visit field operation, such as observation. To avoid a bias in the designation of the units to be interviewed, the listing is performed without any knowledge of which units will be in the interview or reinterview. A listing check is performed only on area frame samples for field representatives who list or update. These field representatives are randomly checked once during the fiscal year, and, depending on how close assignments are to each other, at least two assignments are checked to better evaluate the field representative’s listing skills. The check of the unit frame sample was dropped when address samples were converted to samples from the unit frame in the 1990 Sample Redesign; this was done because little of the unit frame sample requires listing. Also, because permit and group quarters samples constitute such a small part of the universe, no formal listing check is done for samples from these frames.

SOURCES OF ERROR DUE TO NONRESPONSE

There are a variety of sources of nonresponse error in the CPS.
Noninterviews could include housing units that are vacant, not yet built, demolished, or not residential living units; however, these types of noninterviews are not considered to be nonresponses since they are out-of-scope and not eligible for interview. (See Chapter 7 for a description of the types of noninterviews.) Nonresponse error does occur when households that are eligible for interview are not interviewed for some reason: a respondent refuses to participate in the survey, is incapable of completing the interview, or is not available or not contacted by the interviewer during the survey period, perhaps due to work schedules or vacation. These household noninterviews are called Type A noninterviews. There are additional kinds of nonresponse in the CPS. Individual persons within the household may refuse to be interviewed, resulting in person nonresponse. Person nonresponse has not been much of a problem in the CPS because any responsible adult in the household is able to report for other persons in the household as a proxy reporter. Panel nonresponse exists when persons who live in the same household during the entire time they are in the CPS sample do not agree to be interviewed in each of the 8 months; thus, panel nonresponse can be important if the CPS data are used longitudinally. Finally, some respondents who complete the CPS interview may be unable or unwilling to answer specific questions, resulting in item nonresponse. Imputation procedures are implemented for item nonresponse. However, because there is no way of ensuring that the errors of item imputation will balance out, even on an expected basis, item nonresponse also introduces potential bias into the estimates. For a sense of the magnitude of the error due to nonresponse, the household noninterview rate was 6.34 percent in January 1997.
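A household (Type A) noninterview rate such as the one just cited is, in essence, eligible noninterviews divided by eligible households, with out-of-scope cases excluded from the denominator. A sketch with hypothetical counts:

```python
def type_a_rate(interviews, type_a_noninterviews):
    """Type A noninterview rate: eligible noninterviews over eligible households.
    Out-of-scope units (vacant, demolished, etc.) are excluded entirely."""
    eligible = interviews + type_a_noninterviews
    return 100.0 * type_a_noninterviews / eligible

# Hypothetical monthly workload: 46,850 interviews and 3,150 Type A noninterviews.
print(f"{type_a_rate(46850, 3150):.2f} percent")  # → 6.30 percent
```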
For item nonresponse, allocation rates (January 1997, weighted data) were 0.3 percent for labor force status, 1.4 percent for race, and 1.7 percent for occupation. (See Chapter 16 for discussion of various quality indicators of nonresponse error.)

CONTROLLING ERROR DUE TO NONRESPONSE

Field Representative Guidelines2

Response/nonresponse rate guidelines have been developed for field representatives to help ensure the quality of the data collected. Maintaining high response rates is of primary importance, and the guidelines have been developed with this in mind. These guidelines, when used in conjunction with other sources of information, are intended to assist supervisors in identifying field representatives needing performance improvement. A field representative whose response rate, household noninterview rate (Type A), or minutes per case falls below the fully acceptable range based on one quarter’s work is considered in need of additional training and development, and the CPS supervisor takes appropriate remedial action. National and regional response performance data are also provided to permit the regional office staff to judge whether their activities are in need of additional attention.

2 See Appendix F for a detailed discussion, especially in terms of the performance evaluation system.

Summary Sheets

Another way to control nonresponse error is the production and review of summary sheets. Although produced by headquarters after the release of the monthly data products, they are used to detect changes in historical response patterns. Also, since they are distributed throughout headquarters and the regional offices, other indications of data quality and consistency can be focused upon.
Contents of selected summary report tables are as follows: noninterview rates by regional office; monthly comparisons to the prior year; noninterview-to-interview conversion rates; resolution status of computer assisted telephone interview cases; interview status by month-in-sample; daily transmittals; and coverage ratios.

Headquarters and Regional Offices Working as a Team

As detailed in a Methods and Performance Evaluation Memorandum (Reeder, 1997), the Census Bureau and the Bureau of Labor Statistics formed an interagency work group to examine CPS nonresponse in detail. One goal was to share possible reasons and solutions for the declining CPS response rates. A list of 31 questions was prepared for the regional offices to aid in the understanding of CPS field operations, to solicit and share the regional offices’ views on the causes of the increasing nonresponse rates, and to evaluate methods to decrease these rates. All of the answers provide insight into CPS operations that may affect nonresponse and followup procedures for household noninterviews; a listing of a few will suffice:

• The majority of regional offices responded that there is written documentation of the followup process for CPS household noninterviews;

• The standard process was that a field representative must let the regional office know about a possible household noninterview as soon as possible;

• Most regions attempt to convert confirmed refusals under certain circumstances;

• All regions provide monthly feedback to their CPS field representatives on their household noninterview rates; and

• About half of the offices responded that they provide specific regionally based training/activities for field representatives on converting or avoiding household noninterviews.

The response to one question deserves special attention: most offices use letters in a consistent manner for followup of noninterviews.
Most regional offices also include informational brochures tailored to the respondent with the letters.

SOURCES OF RESPONSE ERROR

The survey interviewer asks a question and collects a response from the respondent. Response error exists if the response is not the true answer. Reasons for response error include:

• The respondent misinterprets the question, tries to look better by inflating or deflating the response, does not know the true answer and guesses (for example, recall effects), or chooses a response randomly;

• The interviewer reads the question incorrectly, does not follow the appropriate skip pattern, misunderstands or misapplies the questionnaire, or records the wrong answer;

• The data collection modes (for example, personal visit and telephone) elicit different responses, as does the frequency of interview. Studies have shown that response rates generally decline as the length of time in the survey increases; and

• The questionnaire does not elicit correct responses, due to a nonuser-friendly format, complicated skip patterns, or difficult coding procedures.

Thus, response error can arise from many sources. The survey instrument, the mode of data collection, the interviewer, and the respondent are the focus of this section3, as well as their interactions, as discussed in The Reinterview Program. In terms of magnitude, measures of response error are obtainable through the reinterview program; specifically, Chapter 16 discusses the index of inconsistency. It is the ratio of the estimated simple response variance to the estimated total variance, which arises from sampling and simple response variance. When identical responses are obtained from trial to trial, both the simple response variance and the index of inconsistency are zero. Theoretically, the index has a range of 0 to 100. For example, the index of inconsistency for the labor force characteristic of unemployed for February through July 1994 is considered moderate at 30.8.
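For a dichotomous item (for example, unemployed versus not unemployed), one standard form of the index of inconsistency compares the observed interview-reinterview disagreement rate with the disagreement expected from the two marginal proportions. A hedged sketch (the 2 x 2 counts below are invented, not CPS reinterview data):

```python
def index_of_inconsistency(a, b, c, d):
    """Index of inconsistency for a dichotomous item, on a 0-100 scale.
    a: yes/yes, b: yes/no, c: no/yes, d: no/no counts from a 2x2
    original-interview-versus-reinterview table."""
    n = a + b + c + d
    g = (b + c) / n               # gross difference rate (observed disagreement)
    p1 = (a + b) / n              # proportion 'yes' in the original interview
    p2 = (a + c) / n              # proportion 'yes' in the reinterview
    denom = p1 * (1 - p2) + p2 * (1 - p1)  # expected disagreement from the margins
    return 100.0 * g / denom

# Invented counts: 60 consistent 'yes', 8 and 12 switchers, 920 consistent 'no'.
print(round(index_of_inconsistency(60, 8, 12, 920), 1))  # → 15.4
```

With identical responses in both trials (b = c = 0), the gross difference rate and hence the index are zero, matching the description above.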
CONTROLLING RESPONSE ERROR

The Survey Instrument4

The survey instrument involves the CPS questionnaire, the computer software that runs the questionnaire, and the mode by which the data are collected. The modes are personal visits or telephone calls made by field representatives and telephone calls made by interviewers at centralized telephone centers. Regardless of the mode, the questionnaire and the software are basically the same (see Chapter 7).

3 Most discussion in this section is applicable whether the interview is conducted via computer assisted personal interview, telephone interview by the field representative or a centralized telephone facility, or computer assisted telephone interview.

4 Many of the topics in this section are presented in more detail in Chapter 6.

Software. Changes in the early 1990s resulted in the use of computer assisted interviewing technology in the CPS. Designed in an automated environment, this technology allows very complex skip patterns and other procedures that combine data collection, data input, and a degree of in-interview consistency editing into a single operation. The technology provides automatic selection of questions for each interview. The screens display response options, if applicable, and information about what to do next. The interviewer does not have to navigate skip patterns manually, with the attendant possibility of error. Appropriate proper names, pronouns, verbs, and reference dates are automatically filled into the text of the questions. If there is a refusal to answer a demographic item, that item is not asked again in later interviews; rather, it is longitudinally allocated. This illustrates the trade-off between nonsampling error existing for the item and the possibility of a total noninterview. The instrument provides opportunities for the field representative to review and correct any incorrect or inconsistent information before the next series of questions is asked, especially in terms of the household roster.
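The automatic question selection and text fills described above can be pictured as a routing table: each question supplies its text (with fills substituted) and a rule that chooses the next question from the answer. The question names, wording, and routing below are invented for illustration and are not the actual CPS instrument:

```python
def fill(text, household):
    """Substitute proper names (and, in a full instrument, dates and verbs)
    into question text."""
    return text.format(**household)

# Invented question set with schematic routing rules.
QUESTIONS = {
    "WORK": {
        "text": "Last week, did {name} do any work for pay?",
        "next": lambda ans: "HOURS" if ans == "yes" else "LOOKING",
    },
    "HOURS": {"text": "How many hours did {name} work last week?",
              "next": lambda ans: None},
    "LOOKING": {"text": "Has {name} been looking for work?",
                "next": lambda ans: None},
}

def route(start, answers, household):
    """Walk the path the instrument would select, given scripted answers."""
    asked, q = [], start
    while q is not None:
        asked.append(fill(QUESTIONS[q]["text"], household))
        q = QUESTIONS[q]["next"](answers.get(q))
    return asked

path = route("WORK", {"WORK": "no"}, {"name": "Pat"})
print(path)  # the 'no' branch skips HOURS and goes to LOOKING
```

Because the routing rules live in the instrument rather than in the interviewer's head, a wrong skip simply cannot be taken, which is the point made in the text above.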
In later months, the instrument passes industry and occupation information forward to be verified and corrected. In addition to reducing response and interviewer burden, this also avoids erratic variations in industry and occupation codes between pairs of months for people who have not changed jobs but describe their industry and occupation differently in the 2 months.

The questionnaire. Beginning in January 1994, the CPS used a questionnaire in which the wording differed from what was used previously for almost every labor force concept. One of the objectives in redesigning the CPS questionnaire was to reduce the potential for response error in the questionnaire-respondent-interviewer interaction and to improve measurement of CPS concepts. The approaches used to lessen the potential for response error (i.e., to enhance accuracy) were: shorter and clearer question wording, splitting complex questions into two or more questions, building concept definitions into question wording, reducing reliance on volunteered information, explicit and implicit strategies for the respondent to provide numeric data, and the use of precoded response categories for open-ended questions. Although they were also used with the pencil-and-paper questionnaire, interviewer notes recorded at the end of the interview are critical to obtaining reliable and accurate responses.

Mode of data collection. As stated in Chapters 7 and 16, the first and fifth months’ interviews are done in person whenever possible, while the remaining interviews may be conducted via telephone either by the field interviewer or by an interviewer from a centralized telephone facility. Although each mode has its own set of performance guidelines that must be adhered to, similarities do exist.
The controls detailed in The Interviewer, below, and in Controlling Error Due to Nonresponse, although mainly directed at personal visits, are basically valid for the calls made from the centralized facility, where the supervisor can listen in.

Continuous testing and improvements. Experience gained during the CPS redesign and research on questionnaire design has assisted in the development of methods for testing new or revised questions for the CPS. In addition to reviewing new questions to ensure that they will not jeopardize the collection of basic labor force information and to determine whether the questions are appropriate additions to a household survey about the labor force, the wording of new questions is tested to gauge whether respondents are correctly interpreting the questions. Chapter 6 provides an extensive listing of the various methods of testing, and the Census Bureau has developed a set of protocols for pretesting demographic surveys. There is also interest in improving existing questions. The ‘‘don’t know’’ and refusal rates for specific questions are monitored; inconsistencies caused by instrument-directed paths through the survey or instrument-assigned classifications are looked for during the estimation process; and interviewer notes recorded at the conclusion of the interview are reviewed. Also, focus groups with CPS interviewers and supervisors are periodically conducted. Despite the benefits of adding new questions and improving existing ones, changes in the CPS are approached cautiously until the effects are measured and evaluated. When possible, methods to bridge differences caused by changes, or techniques to avoid the disruption of historical series, are included in the testing.
In fact, for the 18 months prior to the implementation of the 1994 redesigned CPS, a parallel survey was conducted using the new methodology; for the first 5 months after the redesigned survey was introduced, a parallel survey was conducted using the unrevised procedures. Results from the parallel survey were used to anticipate the effect the changes would have on the survey estimates, nonresponse rates, etc. (Kostanich and Cahoon, 1994; Polivka, 1994; Thompson, 1994). Comparable testing and parallel surveys were used for previous revisions to the questionnaire.

The Interviewer

Interviewer training, observation, monitoring, and evaluation are all methods used to control nonsampling error arising from both the frame and data collection. For further discussion of field representative guidelines, see this chapter’s Controlling Error Due to Nonresponse section and Appendix F. Group training and home study are continuing tasks in each regional office to control various nonsampling errors, and they are tailored to the types of duties and length of service of the interviewer. Observations, including those of the listing check, are extensions of classroom training and provide on-the-job training and on-the-job evaluation. Field observation is one of the methods used by the supervisor to check and improve the performance of the field representative. It provides a uniform method for assessing the field representative’s attitudes toward the job and use of the computer and for evaluating the field representative’s ability to apply CPS concepts and procedures during actual work situations. There are three types of observations: initial, general performance review, and special needs.
Across all types, the observer stresses good interviewing techniques: asking questions as worded and in the order presented on the questionnaire, adhering to instructions on the instrument and in the manuals, knowing how to probe, recording answers in the correct manner and in adequate detail, developing and maintaining good rapport with the respondent conducive to an exchange of information, avoiding questions or probes that suggest a desired answer to the respondent, and determining the most appropriate time and place for the interview. The emphasis is on correcting habits that interfere with the collection of reliable statistics.

The Respondent: Self Versus Proxy

The CPS Interviewing Manual (see Chapter 4) states that any household member 15 years of age or older is technically eligible to act as a respondent for the household. The field representative attempts to collect the labor force data from each eligible individual; however, in the interests of timeliness and efficiency, any knowledgeable adult household member can provide the information. Also, the survey instrument is structured so that every effort is made to interview the same respondent every month. Just over one-half of the CPS labor force data are collected by self-response, and most of the remainder is collected by proxy from a household respondent. The use of a nonhousehold member as a household respondent is allowed only in certain limited situations; for example, the household may consist of a single person whose physical or mental health does not permit a personal interview. There has been a substantial amount of research into self versus proxy reporting, including research involving CPS respondents. Much of the research indicates that self-reporting is more reliable than proxy reporting, particularly when there are motivational reasons for self and proxy respondents to report differently.
For example, parents may intentionally ‘‘paint a more favorable picture’’ of their children than fact supports. However, there are some circumstances in which proxy reporting is more accurate, such as in responses to certain sensitive questions.

Interviewer/Respondent Interaction

Rapport with the respondent is a means of improving data quality. This is especially true for personal visits, which are required for months-in-sample one and five whenever possible. By showing a sincere understanding of and interest in the respondent, a friendly atmosphere is created in which the respondent can talk honestly and openly. Interviewers are trained to ask questions exactly as worded and to ask every question. If the respondent misunderstands or misinterprets a question, the question is repeated as worded and the respondent is given another chance to answer; probing techniques are used if a relevant response is still not obtained. The respondent should be left with a friendly feeling toward the interviewer and the Census Bureau, clearing the way for future contacts.

The Reinterview Program - Quality Control and Response Error5

One of the objectives of the reinterview program is to evaluate individual field representative performance; thus, it is a significant part of quality control. It checks a sample of the work of a field representative and identifies and measures aspects of the field procedures that may need improvement. It is also critical in the identification and prevention of data falsification. The reinterview program also provides a measure of response error. Responses from first and second interviews at selected households are compared, and differences are identified and analyzed. This helps to evaluate the accuracy of the original survey results; as a byproduct, instructions, training, and procedures are also evaluated.

SOURCES OF MISCELLANEOUS ERRORS

Data processing errors are one focus of this final section.
Their sources can include data entry, industry and occupation coding, and the methodologies for edits, imputations, and weighting. Also, the CPS population controls are not error-free; a number of approximations and assumptions are used in their derivation. Other sources are composite estimation and modeling errors, which may arise from, for example, seasonally adjusted series for selected labor force data and monthly model-based state labor force estimates.

CONTROLLING MISCELLANEOUS ERRORS

Parallel Survey

Among the specific processes used to control nonsampling error in each of these sources, the parallel survey mentioned under Controlling Response Error crosses several sources. The 18 months of the parallel survey were used to test the survey instrument and the development of the processing system. It ensured that the edits, imputations, and weighting programs were working correctly. It often uncovered related problems, such as incorrect skip patterns. (The Bureau of Labor Statistics was involved extensively in the review of the edits.)

5 Appendix G provides an overview of the design and methodology of the entire Reinterview Program.

Industry and Occupation Coding Verification

To be accepted into the CPS processing system, files containing records needing three-digit industry and occupation codes are electronically sent to the Jeffersonville office for the assignment of these codes (see Chapter 9). Once completed and transmitted back to headquarters, the remainder of the production processing, including edits, weighting, microdata file creation, and tabulations, can begin. Using on-line industry and occupation reference materials, the Jeffersonville coder enters three-digit numeric industry and occupation codes that represent the description for each case on the file. If the coder cannot determine the proper code, the case is assigned a referral code and is later coded by a referral coder.
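The assign-or-refer workflow just described can be sketched as follows. The lookup table, codes, and referral sentinel below are invented for illustration; actual coding relies on detailed on-line reference materials and trained coders:

```python
REFERRAL = "999"  # invented sentinel; real referral cases go to a referral coder

# Invented fragment of an industry lookup table.
INDUSTRY_CODES = {
    "elementary school": "842",
    "grocery store": "601",
}

def assign_code(description):
    """Return a three-digit code, or the referral sentinel if undetermined."""
    return INDUSTRY_CODES.get(description.strip().lower(), REFERRAL)

batch = ["Grocery store", "widget refurbishing"]
coded = [assign_code(d) for d in batch]
referrals = [d for d, c in zip(batch, coded) if c == REFERRAL]
print(coded, referrals)  # → ['601', '999'] ['widget refurbishing']
```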
A substantial effort is directed at supervising and controlling the quality of this operation. The supervisor is able to turn the dependent verification setting on or off at any time during the coding operation. (In the on mode, a particular coder's work is verified by a second coder.) Additionally, a 10 percent sample of each month's cases is selected to go through a quality assurance system to evaluate each coder's work. The selected cases are verified by another coder after the current monthly processing has been completed. Upon completion of the coding and possible dependent verification of a particular batch, all cases to which a coder assigned at least one referral code must be reviewed and coded by a referral coder.

Edits, Imputation, and Weighting

As detailed in Chapter 9, there are six edit modules: household, demographic, industry and occupation, labor force, earnings, and school enrollment. Each module establishes consistency between logically related items; assigns missing values using relational imputation, longitudinal editing, or cross-sectional imputation; and deletes inappropriate entries. Each module also sets a flag for each edit step that can potentially affect the unedited data. Consistency is one of the checks used to control nonsampling error. Are the data logically correct, and are the data consistent within the month? For example, if a respondent says that he/she is a doctor, is he/she old enough to have achieved this occupation? Are the data consistent with the previous month and across the last 12 months? The imputation rates should normally stay about the same as the previous month's, taking seasonal patterns into account. Are the universes verified, and how consistent are the interview rates? In terms of the weighting, a check is made for zero or very large weights. If such outliers are detected, verification and possible correction follow.
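The zero-or-large-weight check described above might look like the following minimal sketch. The ten-times-median cutoff and the example weights are assumptions for illustration, not the actual CPS tolerances.

```python
# Minimal sketch of an outlier-weight screen: flag nonpositive weights and
# weights far above the median. The "factor" cutoff is an assumed
# illustration, not the production CPS rule.
from statistics import median

def flag_outlier_weights(weights, factor=10.0):
    """Return indices of weights that are nonpositive or unusually large."""
    cutoff = factor * median(weights)
    return [i for i, w in enumerate(weights) if w <= 0 or w > cutoff]

weights = [1500.0, 1620.0, 0.0, 1580.0, 24000.0, 1490.0]
print(flag_outlier_weights(weights))  # flags the zero and the very large weight
```

Flagged cases would then be routed to the verification and possible correction the text describes.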
Another method of validating the weighting is to look at coverage ratios, which should fall within certain historical bounds. A key working document used by headquarters for all of these checks is a three-sheet tabulation of monthly summary statistics that highlights the six edit modules and the weighting program for the past 12 months.

Modeling Errors

Extensive verification was done at the time of program development to ensure that the estimation programs were working correctly. Also, as part of the routine production processing at headquarters and the routine check-in of the data by the Bureau of Labor Statistics, both agencies compute identical tables in composited and uncomposited modes. These tables are then checked cell for cell to ensure that the uncomposited cells and the composited cells are identical across the two agencies. If so, and they always have been, the data are deemed to have been computed correctly. It may be hypothesized that although modeling may indeed reduce some sampling and nonsampling errors, other nonsampling error may be introduced through misspecification of the model. Regardless, this final section focuses on a few of the methods of nonsampling error reduction, as applied to the seasonal adjustment programs, the monthly model-based state labor force estimates, and composite estimation for selected labor force data. (See Chapter 10 for a discussion of these procedures.) Changes that occur in a seasonally adjusted series reflect changes other than those arising from normal seasonal change; they are believed to provide information about the direction and magnitude of changes in trend and business cycle effects. They may, however, also reflect the effects of sampling and nonsampling errors, which are not removed by the seasonal adjustment process. Research into the sources of these irregularities, specifically nonsampling error, can then lead to controlling their effects and even their removal.
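The cell-for-cell comparison of the two agencies' tabulations described above can be sketched as below. The table keys and counts are toy values; the real tabulations cover far more cells.

```python
# Sketch of a cell-for-cell tabulation check: report every cell on which
# two independently computed tables disagree. The example cells and counts
# are illustrative only.
def mismatched_cells(table_a: dict, table_b: dict):
    """Return the cells on which two tabulations disagree (empty if identical)."""
    return sorted(k for k in table_a.keys() | table_b.keys()
                  if table_a.get(k) != table_b.get(k))

# toy composited tabulations computed by the two agencies
census = {("employed", "total"): 131463, ("unemployed", "total"): 6739}
bls = {("employed", "total"): 131463, ("unemployed", "total"): 6739}
print(mismatched_cells(census, bls))  # [] -> data deemed computed correctly
```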
The seasonal adjustment programs contain built-in checks that verify that the data are well fit and that the modeling assumptions are reasonable. These diagnostic measures are a routine part of the output. The processes for controlling nonsampling error during the production of monthly model-based state labor force estimates are very similar to those used for the seasonal adjustment programs. Built-in checks exist in the programs and, again, a wide range of diagnostics is produced that indicate the degree of deviation from the assumptions. The CPS composite estimator is used for the derivation of most official CPS labor force estimates that are based upon information collected every month from the full sample (i.e., not collected in periodic supplements or from partial samples). As detailed in Chapter 10, it modifies the aggregated estimates without adjusting the weights of individual sample records by combining the simple weighted estimate with the composite estimate for the preceding month plus an estimate of the change from the preceding to the current month.

CPS Population Controls

Total and state-level CPS population controls are developed by the Census Bureau independently from the collection and processing of CPS data. These monthly independent projections of the population are used for the iterative, second-stage weighting of the CPS data. All of the estimates start with the last census, with administrative records and projection techniques providing updates. (See Appendix D for a detailed discussion of the methodologies.) As a means of controlling nonsampling error throughout these processes, numerous internal consistency checks are performed in the programming. For example, input files containing age and sex detail are compared to independent files that give age and sex totals. Second, internal redundancy is intentionally built into the programs that process the files, as well as into files that contain overlapping/redundant data. Third, a clerical review (i.e., two-person review) of all handwritten input data is performed. A final and extremely important means of assuring that quality data are input into the CPS population controls is the continuous research into improvements in methods of making population estimates and projections.

REFERENCES

Brooks, C.A. and Bailar, B.A. (1978), An Error Profile: Employment as Measured by the Current Population Survey, Statistical Policy Working Paper 3, Washington, D.C.: Subcommittee on Nonsampling Errors, Federal Committee on Statistical Methodology, U.S. Department of Commerce.

Kostanich, D.L. and Cahoon, L.S. (1994), Effect of Design Differences Between the Parallel Survey and the New CPS, CPS Bridge Team Technical Report 3, March 4, 1994.

McCarthy, P.J. (1978), Some Sources of Error in Labor Force Estimates from the Current Population Survey, Background Paper No. 15, Washington, D.C.: National Commission on Employment and Unemployment Statistics.

Polivka, A.E. (1994), Comparisons of Labor Force Estimates from the Parallel Survey and the CPS during 1993: Major Labor Force Estimates, CPS Overlap Analysis Team Technical Report 1, March 18, 1994.

Reeder, J.E. (1997), Regional Response to Questions on CPS Type A Rates, Bureau of the Census, CPS Office Memorandum No. 97-07, Methods and Performance Evaluation Memorandum No. 97-03, January 31, 1997.

Thompson, J. (1994), Mode Effects Analysis of Major Labor Force Estimates, CPS Overlap Analysis Team Technical Report 3, April 14, 1994.

Waite, P.J. (1991), Continuing the Project Development Plan, Bureau of the Census internal memorandum, June 7, 1991.

Chapter 16.
Quality Indicators of Nonsampling Errors

(Updated coverage ratios, nonresponse rates, and other measures of quality can be found by clicking on ''Quality Measures'' at www.bls.census.gov/cps)

INTRODUCTION

Chapter 15 contains a description of the different sources of nonsampling error in the CPS and the procedures intended to limit those errors. In the present chapter, several important indicators of potential nonsampling error are described: coverage ratios, response variance, nonresponse rates, mode of interview, time-in-sample biases, and proxy reporting rates. It is important to emphasize that, unlike sampling error, these indicators show only the presence of potential nonsampling error, not the actual degree of nonsampling error present. Nonetheless, these indicators of nonsampling error are regularly used to monitor and evaluate data quality. For example, surveys with high nonresponse rates are judged to be of low quality, but the actual nonsampling error of concern is not the nonresponse rate itself but rather nonresponse bias, that is, how the respondents differ from the nonrespondents on the variables of interest. Although it is possible for a survey with a lower nonresponse rate to have greater nonresponse bias than a survey with a higher nonresponse rate (if the nonrespondents differ to a greater degree from the respondents in the survey with the lower nonresponse rate), one would generally expect that greater nonresponse indicates a greater potential for bias. Unfortunately, while it is relatively easy to measure nonresponse rates, it is extremely difficult to measure or even estimate nonresponse bias. Thus, these indicators are simply a measurement of the potential presence of nonsampling errors. We are not able to quantify the effect the nonsampling error has on the estimates, and we do not know the combined effect of all sources of nonsampling error.

COVERAGE ERRORS

When conducting a sample survey, the primary goal is to give every person in the target universe a known probability of selection into the sample. When this occurs, the survey is said to have 100 percent coverage. This is rarely the case, however. Errors can enter the system during almost any phase of the survey process, from frame creation to interviewing. A bias in the survey estimates results when the characteristics of persons erroneously included in or excluded from the survey differ from those of persons correctly included in the survey. Historically in the CPS, the net effect of coverage errors has been a loss in population (resulting from undercoverage).

Coverage Ratios

One way to estimate the coverage error present in a survey is to compute a coverage ratio: the ratio of the estimated number of persons in a specific demographic group from the survey to an independent population total for that group. The CPS coverage ratios are computed by dividing a CPS estimate using the weights after the first-stage ratio adjustment by the independent population controls used to perform the second-stage ratio adjustment. (See Chapter 9 for more information on the computation of weights.) Population controls are not error free; a number of approximations or assumptions are required in deriving them. See Appendix D for details on how the controls are computed. Chapter 15 highlighted potential error sources in the population controls. Undercoverage exists when the coverage ratio is less than 1.0, and overcoverage exists when the ratio is greater than 1.0. Table 16-1 shows the average monthly coverage ratios for January 1996 to April 1996. It is estimated that the CPS had a 92.6 percent coverage rate for the first 4 months of 1996. This implies that the survey is missing approximately 7.4 persons per 100. This rate has been fairly consistent throughout the 1990s. It is difficult to compare this coverage rate with those of prior decades because of sample design changes and changes in the method that produces the population totals, but the total coverage rate was estimated at 96.3 percent during the mid-1970s (Hanson, 1978). In terms of race, Whites have the highest coverage ratio (93.8 percent) while Blacks have the lowest (83.9 percent). Whites and Other1 races are significantly different from Blacks, but not significantly different from each other. Black males age 20-29 have the lowest coverage ratio (66.2 percent) of any race/age group. Females across all races except Other race have higher coverage ratios than males. Hispanics2 also have relatively low coverage rates. Historically, Hispanics and Blacks have had lower coverage rates than Whites for each age group, particularly the 20-29 age group. This is through no fault of the interviewers or the CPS process. These lower coverage rates for minorities affect labor force estimates because persons missed by the CPS are almost surely, on the average, quite different from those included. The persons that are missed are accounted for in the CPS, but they are given the same labor force characteristics as those who are included in the CPS. This produces bias in CPS estimates.

1 Other includes American Indian, Eskimo, Aleut, Asian, and Pacific Islander.
2 Hispanics may be of any race.

Table 16-1. CPS Coverage Ratios by Age, Sex, Race, and Ethnicity
[Average for January 1996-April 1996. Group totals: Total1 0.9255; White 0.9376; Black 0.8392; Other2 race 0.9255; Hispanic3 0.8331]

Age and sex       White     Black     Other race   Hispanic
Male              0.9163    0.7727    0.9119       0.7841
  16-19           0.8974    0.8328    0.8754       0.8693
  20-29           0.8406    0.6616    0.8615       0.7485
  30-39           0.9091    0.7104    0.8623       0.7790
  40-59           0.9357    0.8383    0.9616       0.7904
  60-69           0.9705    0.8735    1.0110       0.7899
  70+             0.9672    0.8644    1.0236       0.7982
Female            0.9575    0.8931    0.9380       0.8820
  16-19           0.9385    0.8676    0.9071       0.8658
  20-29           0.9130    0.8170    0.8966       0.9015
  30-39           0.9501    0.8650    0.9023       0.8920
  40-59           0.9613    0.9374    0.9743       0.8766
  60-69           0.9681    0.9541    1.0185       0.8310
  70+             1.0173    0.9820    1.0025       0.8684

1 The estimated standard errors on the Total, White, Black, Other race, and Hispanic groups are 0.0058, 0.0064, 0.0180, 0.0322, and 0.0198, respectively.
2 Other includes American Indian, Eskimo, Aleut, Asian, and Pacific Islander.
3 Hispanics may be of any race.

Figure 16-1. Average Yearly Noninterview and Refusal Rates for the CPS, 1964-1996

NONRESPONSE

As noted in Chapter 15, there are a variety of sources of nonresponse in the CPS, such as unit or household nonresponse, panel nonresponse, and item nonresponse. Unit nonresponse, referred to as Type A noninterviews, represents households that are eligible for interview but were not interviewed for some reason. Type A noninterviews occur because a respondent refuses to participate in the survey, is too ill or otherwise incapable of completing the interview, or is not available or not contacted by the interviewer during the survey period, perhaps because of work schedules or vacation. Because the CPS is a panel survey, households that respond 1 month may not respond during a following month. Thus, there is also panel nonresponse in the CPS, which can become particularly important if CPS data are used longitudinally. Finally, some respondents who complete the CPS interview may be unable or unwilling to answer specific questions, resulting in some level of item nonresponse.

Type A Nonresponse

Type A noninterview rate.
The Type A noninterview rate is calculated by dividing the total number of Type A households (refusals, temporarily absent, noncontacts, and other noninterviews) by the total number of eligible households (which includes Type As and interviewed households). As can be seen in Figure 16-1, the noninterview rate for the CPS has remained relatively stable at around 4 to 5 percent for most of the past 35 years; however, there have been some changes during this time. It is readily apparent from Figure 16-1 that there was a major change in the CPS nonresponse rate in January 1994, which reflects the launching of the redesigned survey using computer-assisted survey collection procedures. This rise is discussed below. The end of 1995 and the beginning of 1996 also show a jump in Type A noninterview rates, chiefly because of disruptions in data collection caused by shutdowns of the Federal Government (see Butani, Kojetin, and Cahoon, 1996). The relative stability of the overall noninterview rate from 1960 to 1994 masks some underlying changes that have occurred. Specifically, the refusal portion of the noninterview rate has increased over this same period, with the bulk of the increase in refusals taking place from the early 1960s to the mid-1970s. In the late 1970s, there was a leveling off, so that refusal rates were fairly constant until 1994. This increase in refusals was offset by a corresponding decrease in the rate of noncontacts and other noninterviews. There also appears to be seasonal variation in both the overall noninterview rates and the refusal rates (see Figure 16-2). During the year, the noninterview and refusal rates have tended to increase after January until they reached a peak in March or April, at the time of the income supplement. At this point, there was a steep drop in noninterview and refusal rates that extended below the starting point in January until they bottomed out in July or August.
The rates then increased and approached the initial level. This pattern has been evident in most recent years and appears to be similar for 1994.

Effect of the transition to a redesigned survey with computer-assisted data collection on noninterview rates. With the transition to the redesigned CPS questionnaire using computerized data collection in January 1994, there was a noticeable increase in the Type A nonresponse rates, as can be seen in Figures 16-1 and 16-2. As part of this transition, there were several procedural changes in the collection of data for the CPS, and the adjustment to these new procedures may account for this increase. For example, the CAPI instrument now required the interviewers to go through the entire interview, while previously some interviewers may have conducted shortened interviews with reluctant respondents, obtaining answers to only a couple of critical questions. Another change in the data collection procedures was an increased reliance on centralized telephone interviewing in the CPS. Households not interviewed by the CATI centers by Wednesday night of interview week are recycled back to the field representatives on Thursday morning. These recycled CATI cases can present difficulties for the field representatives because only a few days remain to make contact before the end of the interviewing period. As depicted in Figure 16-2, there has been greater variability in the monthly Type A nonresponse rates in the CPS since the transition in January 1994. In addition, because of events such as the Federal Government shutdowns that affected nonresponse rates, it remains to be seen at what level the Type A nonresponse rates will stabilize for the new CPS. The annual overall Type A rate, the refusal rate, and the noncontact rate (which includes temporarily absent households and other noncontacts) are shown in Table 16-2 for the period 1993-1996. Data beyond 1996 were not available at the time of this writing. Table 16-2.
Components of Type A Nonresponse Rates, Annual Averages for 1993-1996
[Percent distribution]

Nonresponse rate    1993    1994    1995    1996
Overall Type A      4.69    6.19    6.86    6.63
Noncontact          1.77    2.30    2.41    2.28
Refusal             2.85    3.54    3.89    4.09
Other                .13     .32     .34     .25

Figure 16-2. Monthly Noninterview and Refusal Rates for the CPS, 1984-1996
Note: The break in the refusal rate indicates the government furlough.

Panel nonresponse. Households are selected into the CPS sample for a total of 8 months in a 4-8-4 pattern, as described in Chapter 3. Many families in these households may not be in the CPS the entire 8 months because of moving (movers are not followed, but the new household members are interviewed). Those who live in the same household during the entire time they are in the CPS sample may not agree to be interviewed each month. Table 16-3 shows the percentage of households that were interviewed 0, 1, 2, … 8 times during the 8 months that they were eligible for interview during the period January 1994 to October 1995. These households represent seven rotation groups (see Chapter 3) that completed all of their rotations in the sample during this period. The vast majority of households, 82 percent, had completed interviews each month, and only 2 percent never participated. The remaining 16 percent participated to some degree (for further information, see Harris-Kojetin and Tucker, 1997).

Table 16-3. Percentage of Households1 by Number of Completed Interviews During the 8 Months in the Sample
[January 1994-October 1995]

Number of completed interviews    Percent
0                                   2.0
1                                    .5
2                                    .5
3                                    .6
4                                   2.0
5                                   1.2
6                                   2.5
7                                   8.9
8                                  82.0

1 Includes only households in the sample all 8 months with only interviewed and Type A nonresponse interview status for all 8 months, i.e., households that were out of scope (e.g., vacant) for any month in sample were not included in these tabulations. Movers were not included in this tabulation.

Effect of Type A Noninterviews on Labor Force Classification. Although the CPS has monthly measures of Type A nonresponse, the total effect of nonresponse on labor force estimates produced from the CPS cannot be calculated from CPS data alone. It is the nature of nonresponse that we do not know what we would like to know from the nonrespondents, and therefore the actual degree of bias due to nonresponse is unknown. Nonetheless, because the CPS is a panel survey, information is often available at some point in time from households that were nonrespondents at another point. Some assessment can be made of the effect of nonresponse on labor force classification by using data from adjacent months and examining the month-to-month flows of persons from labor force categories to nonresponse, as well as from nonresponse to labor force categories. Comparisons can then be made for labor force status between households that responded in both months and households that responded in 1 month but failed to respond in the other. However, the labor force status of persons in households that were nonrespondents for both months is unknown.

Table 16-4. Labor Force Status by Interview/Noninterview Status in Previous and Current Month
[Average January-June 19971 percent distribution]

1st month labor force status    Interview in 2nd month    Nonresponse in 2nd month    Difference
Civilian labor force                  65.98                     68.51                   2.53**
Employed                              62.45                     63.80                   1.35**
Unemployment rate                      5.35                      6.87                   1.52**

2nd month labor force status    Interview in 1st month    Nonresponse in 1st month    Difference
Civilian labor force                  65.79                     67.41                   1.62**
Employed                              62.39                     63.74                   1.35**
Unemployment rate                      5.18                      5.48                    .30*

** p < .01   * p < .05
1 From Tucker and Harris-Kojetin (1997).

Monthly labor force data were used for each consecutive pair of months from January through June 1997, separately for households that responded in both months of a pair and for households that responded in only 1 month and were nonrespondents in the other (see Tucker and Harris-Kojetin, 1997). The top half of Table 16-4 shows the labor force classification in the first month for persons in households that were respondents in the second month compared with persons in households that were noninterviews in the second month. Persons from households that became nonrespondents had significantly higher rates of labor force participation, employment, and unemployment than persons from households that were respondents in both months. The bottom half of Table 16-4 shows the labor force classification in the second month for persons in households that were respondents in the previous month compared with persons in households that were noninterviews in the previous month. The pattern of differences is similar, but the magnitude of the differences is less.
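The group comparison behind Table 16-4 can be sketched with toy person records. The status labels and the example data below are illustrative only, not CPS definitions or results.

```python
# Sketch of comparing labor force rates between persons whose households
# responded in the adjacent month and those whose households did not.
# The records and status labels are toy illustrations.
def labor_force_rates(persons):
    """Return (participation rate, unemployment rate) in percent."""
    in_lf = [p for p in persons if p["status"] in ("employed", "unemployed")]
    unemployed = [p for p in in_lf if p["status"] == "unemployed"]
    return (100.0 * len(in_lf) / len(persons),
            100.0 * len(unemployed) / len(in_lf))

# toy first-month records, split by second-month interview status
responded = [{"status": s} for s in
             ["employed", "employed", "unemployed", "nilf"]]
nonresponded = [{"status": s} for s in
                ["employed", "unemployed", "unemployed", "nilf"]]
# the two groups' rates would then be compared, as in the top half of Table 16-4
print(labor_force_rates(responded), labor_force_rates(nonresponded))
```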
Because the overall Type A noninterview rate is quite small, the effect of the nonresponding households on the overall unemployment rate is also relatively small. However, it is important to remember that the labor force characteristics of the persons in households that never responded are not measured or included here. In addition, other nonsampling errors are also likely present, such as those due to repeated interviewing or month-in-sample effects (described later in this chapter).

Item Nonresponse

Another component of nonresponse is item nonresponse. Respondents may refuse or may be unable to answer certain items but still respond to most of the CPS questions, leaving only a few items missing. To examine the prevalence of item nonresponse in the CPS, counts were made of the refusals and ''don't know'' responses from 10 demographic items and 94 labor force items (because of skip patterns, only some of these questions were asked of each respondent) for all interviewed cases from January to December 1994. The average levels of item-missing data for most of the CPS labor force and demographic items are shown in Table 16-5. It is likely that the bias due to imputation for the demographic and labor force items is quite small, but there may be a concern for earnings data.

Table 16-5. Percentage of CPS Items With Missing Data

Item series                 Percent
Demographic                   1.54
Labor force                   1.46
Industry and occupation       3.76
Earnings                     12.44
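The tally behind a table like Table 16-5 can be sketched as follows. The response codes and the example answers are illustrative stand-ins for the instrument's actual codes.

```python
# Sketch of an item nonresponse tally: refusals and don't-know responses
# as a share of the items actually asked. Codes are illustrative stand-ins.
MISSING = {"refused", "dont_know"}

def item_missing_rate(responses):
    """Percent of asked items with a refusal or don't-know response.

    Items skipped by the questionnaire's skip patterns are passed as None
    and excluded, mirroring the 'only some questions asked' caveat above.
    """
    asked = [r for r in responses if r is not None]
    missing = sum(1 for r in asked if r in MISSING)
    return 100.0 * missing / len(asked)

answers = ["employed", "refused", None, "35 hours", "dont_know"]
print(item_missing_rate(answers))  # 2 of the 4 asked items are missing
```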
RESPONSE VARIANCE

Estimates of the Simple Response Variance Component

To obtain an unbiased estimate of the simple response variance, it is necessary to have at least two independent measurements of the characteristic for each person in a subsample of the entire sample, using the identical measurement procedure on each trial. It is also necessary that responses to a second interview not be affected by the response obtained in the first interview. Two difficulties occur in every attempt to measure the simple response variance by a reinterview. The first is lack of independence between the original interview and the reinterview; a person visited twice within a short period and asked the same questions may tend to remember his/her original responses and repeat them. A second difficulty is that the data collection methods used in the original interview and in the reinterview are seldom the same. Both of these difficulties exist in the CPS reinterview program data used to measure simple response variance. For example, the interviews in the response variance reinterview sample are conducted by telephone by senior interviewers or supervisory personnel. Also, some characteristics of the reinterview survey process itself introduce unknown biases into the estimates (e.g., a high noninterview rate).

It is useful to consider a schematic representation of the results of the original CPS interview and the reinterview in estimating, say, the number of persons reported as unemployed (see Table 16-6).

Table 16-6. Comparison of Responses to the Original Interview and the Reinterview

                                          Reinterview
Original interview           Unemployed   Not unemployed       Total
Unemployed . . . . . . .         a              b               a+b
Not unemployed . . . . .         c              d               c+d
Total . . . . . . . . . .       a+c            b+d         n = a+b+c+d

In this schematic representation, the number of identical persons having a difference between the original interview and the reinterview for the characteristic unemployed is given by (b + c). If these responses are two independent measurements for identical persons using identical measurement procedures, then an estimate of the simple response variance component is given by (b + c)/2n, and

    E[(b + c)/2n] = σR²

as in formula 14.3 (Hansen, Hurwitz, and Pritzker, 1964).³ We know that (b + c)/2n, using CPS reinterview data, underestimates σR², chiefly because conditioning sometimes takes place between the two responses for a specific person, resulting in correlated responses, so that fewer differences (i.e., entries in the b and c cells) occur than would be expected if the responses were in fact independent.

³ The expression (b + c)/n is referred to as the gross difference rate; thus, the simple response variance is estimated as one-half the gross difference rate.

Besides its effect on the total variance, the simple response variance is potentially useful for evaluating the precision of the survey measurement methods. It can be used to evaluate the underlying difficulty in assigning individuals to some category of a distribution on the basis of responses to the survey instrument. As such, it is an aid in determining whether a concept is sufficiently measurable by a household survey technique and whether the resulting survey data fulfill their intended purpose. For characteristics having ordered categories (e.g., number of hours worked last week), the simple response variance is helpful in determining whether the detail of the classification is too fine. To provide a basis for this evaluation, the estimated simple response variance is expressed, not in absolute terms, but as a proportion of the estimated total population variance. This ratio is called the index of inconsistency and has a theoretical range of 0.0 to 100.0 when expressed as a percentage (Hansen, Hurwitz, and Pritzker, 1964). The denominator of this ratio, the population variance, is the sum of the simple response variance, σR², and the population sampling variance, σS². When identical responses are obtained from trial to trial, the simple response variance is zero and the index of inconsistency has a value of zero. As the variability in the classification of an individual over repeated trials increases (and the measurement procedures become less reliable), the value of the index increases until, at the limit, the responses are so variable that the simple response variance equals the total population variance. At the limit, if a single response from each of N individuals is required, equivalent information is obtained if one individual is randomly selected and interviewed N times, independently.

Two important inferences can be made from the index of inconsistency for a characteristic. One is to compare it to the value of the index that could be obtained for the characteristic by the best (or preferred) set of measurement procedures that could be devised. The second is to consider whether the precision of the measurement procedures indicated by the level of the index is still adequate to serve the purposes for which the survey is intended. In the CPS, the index is more commonly used for the latter purpose. As a result, the index is used primarily to monitor the measurement procedures over time. Substantial changes in the indices that persist for several months result in a review of field procedures to determine and remedy the cause. Table 16-7 provides estimates of the index of inconsistency, shown as percentages, for selected labor force characteristics for the period February through July 1994.
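These reinterview measures can be sketched in code. In the notation of Table 16-6, the gross difference rate is (b + c)/n, the estimated simple response variance is half of it, and, for a dichotomous item, the index of inconsistency divides the gross difference rate by an estimate of the total population variance; the estimator of that variance used here, p1(1-p2) + p2(1-p1), is one common choice, and the cell counts are hypothetical, not CPS data:

```python
def reinterview_measures(a, b, c, d):
    """Consistency measures for a 2x2 original/reinterview table.

    Cells follow the layout of Table 16-6: a and d are agreements,
    b and c are disagreements between the two interviews.
    """
    n = a + b + c + d
    gdr = (b + c) / n                 # gross difference rate
    srv = gdr / 2.0                   # estimated simple response variance
    p1 = (a + b) / n                  # proportion with the characteristic, original
    p2 = (a + c) / n                  # proportion with the characteristic, reinterview
    # One common estimator of the total population variance for a 0/1 item:
    total_var = p1 * (1.0 - p2) + p2 * (1.0 - p1)
    index = 100.0 * gdr / total_var   # index of inconsistency, in percent
    return gdr, srv, index

# Hypothetical counts: 70 classified unemployed both times, 18 not
# unemployed both times, and 12 discordant cases.
gdr, srv, index = reinterview_measures(a=70, b=5, c=7, d=18)
```

With these counts the gross difference rate is 0.12, the estimated simple response variance is 0.06, and the index of inconsistency is about 33 percent.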
Table 16-7. Index of Inconsistency for Selected Labor Force Characteristics: February-July 1994

                                        Estimate of index    90 percent
Labor force characteristic              of inconsistency     confidence limits
Working, full time . . . . . . . . .           9.4            8.4 to 10.6
Working, part time . . . . . . . . .          22.6           20.3 to 25.0
With a job, not at work . . . . . . .         30.1           24.5 to 37.0
Unemployed . . . . . . . . . . . . . .        30.8           25.9 to 36.5
Not in labor force . . . . . . . . . .         8.1            7.1 to 9.2

MODE OF INTERVIEW

Incidence of Telephone Interviewing

As described in Chapter 6, the first and fifth months' interviews are typically done in person, while the remaining interviews may be done over the telephone, either by the field interviewer or by an interviewer from a centralized telephone facility. Although the first and fifth interviews are supposed to be done in person, the entire interview may not be completed in person, and the field interviewer may call the respondent back to obtain missing information. The CPS CAPI instrument records whether the last contact with the household was by telephone or personal visit. The percentage of CAPI cases from each month-in-sample that were completed by telephone is shown in Table 16-8. Overall, about 66 percent of the cases are done by telephone, with almost 85 percent of the cases in month-in-sample groups 2-4 and 6-8 done by telephone. Furthermore, a substantial percentage of cases in months 1 and 5 is obtained by telephone, despite standing instructions to the field interviewers to conduct personal visit interviews.

Table 16-8. Percentage of Households With Completed Interviews With Data Collected by Telephone (CAPI Cases Only)
[Percent]

                          Last contact with             Majority of data
                          household was by telephone    collected by telephone
Month-in-sample           (average, Jan.-Dec. 1995)     (June 1996)
1 . . . . . . . . . . .              9.6                     15
2 . . . . . . . . . . .             80.2                     79
3 . . . . . . . . . . .             83.1                     83
4 . . . . . . . . . . .             83.6                     83
5 . . . . . . . . . . .             23.5                     34
6 . . . . . . . . . . .             83.1                     82
7 . . . . . . . . . . .             84.1                     84
8 . . . . . . . . . . .             84.5                     85

Because the indicator variable in the CPS instrument reflects only the last contact with the household, it may not be the best indicator of how most of the data were gathered from a household. For example, an interviewer may obtain information for several members of the household during the first month's personal visit, but may make a telephone call back to obtain the labor force data for the last household member, resulting in the interview being recorded as a telephone interview. In June 1996, an additional item was added to the CPS instrument (for that month only) that asked interviewers whether the majority of the data for each completed case was obtained by telephone or by personal visit. The results using this indicator are presented in the second column of Table 16-8. It was expected that in MIS 1 and 5 interviewers would have reported that the majority of the data was collected through personal visit more often than was revealed by the last contact with the household. This would seem likely because, as noted above, the last contact with a household may be a telephone call back to obtain missing information not collected at the initial personal visit. However, for MIS 1 and 5, a higher percentage of cases was reported with the majority of the data collected by telephone than with the last contact by telephone. The explanation for this pattern of results is not clear at the present time.
Effects of Centralized Telephone Interviewing

With the implementation of the revised CPS in January 1994, there was an increased reliance on the Census Bureau's centralized telephone centers for conducting CPS interviews. As noted in Chapter 4, only cases in CATI-eligible PSUs can be sent to the CATI centers for interviewing. Furthermore, all cases within eligible PSUs were randomly assigned to the CATI panel or the control (CAPI) panels, so that meaningful comparisons could be made of the effects of centralized interviewing on estimates from the CPS. All cases were interviewed by CAPI in MIS 1 and 5. In months 2-4 and 6-8, most, but not all, of the cases in the CATI panel were interviewed in the centralized facilities, because some cases were still interviewed by field staff in CAPI for a variety of reasons. Nonetheless, to preserve the integrity of the research design, the results shown in Table 16-9 reflect the panel that each case was assigned to, which was also the actual mode of interview for the vast majority of cases.

Table 16-9 shows the results of the comparisons of the CATI test and control panels. Significant differences were found between the CATI test and control panels in MIS 2-4 and 6-8 for all groups on the unemployment rate. There were no significant differences in the unemployment rate in MIS 1 and 5 (and none would be expected given random assignment); however, the observed differences were not exactly zero. These initial differences may have had some slight effect on the size of the differences observed in MIS 2-4 and 6-8 for some of the groups. There were no significant differences for the civilian labor force participation rate or the employment to population ratio. Although there are differences in unemployment rates between the centralized CATI and CAPI interviews, as noted above,4 the interpretation of the findings is not as clear.
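The p-values in Table 16-9 compare each CATI-CAPI difference with its standard error. A minimal sketch of such a two-sided test under a normal approximation (the normal reference is an assumption here, not a statement of the exact CPS testing procedure; the 0.66 and 0.21 are the total population unemployment rate difference and standard error for MIS 2-4 and 6-8 reported in Table 16-9):

```python
import math

def two_sided_p(difference, std_error):
    """Two-sided p-value for a difference measured against its standard
    error, using a standard normal reference distribution."""
    z = abs(difference / std_error)
    return math.erfc(z / math.sqrt(2.0))   # equals 2 * (1 - Phi(z))

# Total population unemployment rate, MIS 2-4 and 6-8 (Table 16-9):
p = two_sided_p(0.66, 0.21)
```

A difference of 0.66 with a standard error of 0.21 gives z of about 3.1 and p of about .002, consistent with the .00 shown in the table.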
One major difference between CATI and CAPI interviews is that CAPI interviews are likely to be done by the same interviewer in all 8 MIS, while CATI interviews involve a definite change in interviewers from MIS 1 to 2 and from MIS 5 to 6. In fact, cases in the CATI facilities often have a different interviewer each month. However, it is not clear whether the results from the centralized CATI or the CAPI interviews are more accurate.

4 These findings were also similar to those from a test conducted using the parallel survey (see Thompson, 1994).

Table 16-9. Effect of Centralized Telephone Interviewing on Selected Labor Force Characteristics
[Average January 1996 - December 1996]

                                       MIS 1 and 5                        MIS 2-4 and 6-8
Employment status              CATI   Control   Differ-     CATI   Control   Differ-   Standard
and sex                        test   (CAPI)    ence        test   (CAPI)    ence      error     P-value

TOTAL POPULATION, 16 YEARS OLD AND OVER
Civilian labor force . . .    66.89    67.20    -0.31      66.47    66.44     0.03      1.09       .98
Employment to
  population ratio . . . .    63.20    63.60    -0.40      62.79    63.20    -0.41      1.06       .70
Unemployment rate . . . . .    5.52     5.36     0.16       5.53     4.87     0.66      0.21       .00

MALES, 16 YEARS OLD AND OVER
Civilian labor force . . .    75.33    76.35    -1.02      74.96    75.31    -0.35      1.71       .84
Employment to
  population ratio . . . .    71.19    72.23    -1.03      70.85    71.63    -0.79      1.66       .64
Unemployment rate . . . . .    5.49     5.40     0.09       5.49     4.88     0.61      0.28       .03

FEMALES, 16 YEARS OLD AND OVER
Civilian labor force . . .    59.50    59.19     0.31      59.04    58.68     0.36      1.18       .76
Employment to
  population ratio . . . .    56.20    56.05     0.14      55.75    55.83    -0.08      1.12       .94
Unemployment rate . . . . .    5.55     5.30     0.25       5.57     4.85     0.72      0.21       .00

WHITE, 16 YEARS OLD AND OVER
Civilian labor force . . .    67.50    67.59    -0.08      67.16    66.89     0.28      1.35       .84
Employment to
  population ratio . . . .    64.22    64.37    -0.15      63.94    63.97    -0.02      1.33       .99
Unemployment rate . . . . .    4.86     4.76     0.11       4.79     4.37     0.43      0.16       .01

WHITE MALES, 16 YEARS OLD AND OVER
Civilian labor force . . .    76.44    77.38    -0.93      76.14    76.31    -0.17      1.98       .93
Employment to
  population ratio . . . .    72.67    73.74    -1.07      72.52    72.98    -0.46      1.94       .81
Unemployment rate . . . . .    4.94     4.70     0.24       4.76     4.37     0.39      0.23       .10

WHITE FEMALES, 16 YEARS OLD AND OVER
Civilian labor force . . .    59.42    58.80     0.62      59.07    58.43     0.64      1.35       .63
Employment to
  population ratio . . . .    56.59    55.97     0.62      56.22    55.88     0.34      1.29       .79
Unemployment rate . . . . .    4.78     4.82    -0.05       4.83     4.36     0.47      0.20       .02

BLACKS, 16 YEARS OLD AND OVER
Civilian labor force . . .    63.30    65.44    -2.14      62.12    64.22    -2.10      4.22       .62
Employment to
  population ratio . . . .    56.94    58.85    -1.91      55.55    58.44    -2.89      3.99       .47
Unemployment rate . . . . .   10.04    10.06     -.02      10.59     9.01     1.58      0.79       .04

BLACK MALES, 16 YEARS OLD AND OVER
Civilian labor force . . .    67.23    70.24    -3.01      66.45    69.21    -2.75      5.47       .61
Employment to
  population ratio . . . .    60.33    61.76    -1.43      58.73    62.07    -3.34      4.97       .50
Unemployment rate . . . . .   10.26    12.08    -1.82      11.62    10.31     1.30      0.72       .07

BLACK FEMALES, 16 YEARS OLD AND OVER
Civilian labor force . . .    60.55    62.06    -1.51      59.03    60.75    -1.72      4.75       .72
Employment to
  population ratio . . . .    54.57    56.81    -2.24      53.27    55.90    -2.63      4.51       .56
Unemployment rate . . . . .    9.87     8.46     1.41       9.75     7.97     1.79      0.83       .03

TIME IN SAMPLE

The rotation pattern of the CPS sample was described in detail in Chapter 3, and the use of composite estimation in the CPS was discussed briefly in Chapter 9. The effects of interviewing the same respondents for the CPS several times have been discussed for a long time (e.g., Bailar, 1975; Brooks and Bailar, 1978; Hansen, Hurwitz, Nisselson, and Steinberg, 1955; McCarthy, 1978; Williams and Mallows, 1970). It is possible to measure the effect of the time spent in the sample on labor force estimates from the CPS by creating a month-in-sample index, which shows the relationships among all the month-in-sample groups. This index is the ratio of the estimate based on the sample units in a particular month-in-sample group to the average estimate from all eight month-in-sample groups combined, multiplied by 100. If an equal percentage of people with the characteristic were present in each month-in-sample group, then the index for each group would be 100. Table 16-10 shows indices for each group by the number of months they have been in the sample. The indices are based on CPS labor force data from September to December 1995. For the percentage of the total population that is unemployed, the index of 108.62 for the first-month households indicates that the estimate from households in sample the first month is about 1.0862 times as great as the average over all eight month-in-sample groups; the index of 99.57 for unemployed for the second-month households indicates that it is about the same as the average over all month-in-sample groups.

It should not be assumed that estimates from one of the month-in-sample groups should be taken as the standard in the sense that it would provide unbiased estimates, while the estimates for the other seven would be biased. It is far more likely that the expected value of each group is biased, but to varying degrees. Total CPS estimates, which combine the data from all month-in-sample groups (see Chapter 9), are subject to biases that are functions of the biases of the individual groups.
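The month-in-sample index just defined is simple to compute from the eight rotation group estimates. A sketch using illustrative figures, not actual CPS estimates:

```python
def month_in_sample_indexes(estimates):
    """Month-in-sample index: each rotation group's estimate expressed
    as a percentage of the average over all eight month-in-sample groups."""
    if len(estimates) != 8:
        raise ValueError("expected one estimate per month-in-sample group")
    average = sum(estimates) / 8.0
    return [100.0 * e / average for e in estimates]

# Illustrative unemployment estimates for month-in-sample groups 1-8:
indexes = month_in_sample_indexes([7.6, 7.0, 6.9, 7.2, 7.0, 7.0, 6.9, 6.5])
```

By construction the eight indexes average 100; a first-month index above 100 indicates that first-month households yield a higher estimate than the eight groups combined.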
Table 16-10. Month-In-Sample Bias Indexes (and Standard Errors) in the CPS for Selected Labor Force Characteristics
[Average September - December 1995]

                                              Month-in-sample
Employment status and sex        1        2        3        4        5        6        7        8

TOTAL POPULATION, 16 YEARS OLD AND OVER
Civilian labor force . . .    101.34   100.26    99.93   100.32    99.5     99.5     99.68    99.48
  Standard error . . . . .     (.27)    (.26)    (.25)    (.25)    (.28)    (.24)    (.29)    (.30)
Unemployment level . . . .    108.62    99.57    98.27   103.33    99.19    99.39    98.74    92.89
  Standard error . . . . .    (2.46)   (2.59)   (2.28)   (2.59)   (2.65)   (2.42)   (2.46)   (2.66)

MALES, 16 YEARS OLD AND OVER
Civilian labor force . . .    100.98   100.26    99.87   100.34    99.98    99.55    99.62    99.40
  Standard error . . . . .     (.31)    (.31)    (.31)    (.26)    (.34)    (.29)    (.32)    (.34)
Unemployment level . . . .    107.65    98.13    97.19   102.94   103.60    99.29    95.60    95.59
  Standard error . . . . .    (3.45)   (3.53)   (3.03)   (3.41)   (3.64)   (3.22)   (3.33)   (3.73)

FEMALES, 16 YEARS OLD AND OVER
Civilian labor force . . .    101.75   100.26   100.00   100.29    98.94    99.44    99.76    99.56
  Standard error . . . . .     (.42)    (.38)    (.38)    (.41)    (.39)    (.37)    (.45)    (.47)
Unemployment level . . . .    110.09   101.46    99.62   103.53    93.80    99.79   102.25    89.47
  Standard error . . . . .    (3.65)   (3.44)   (3.17)   (3.42)   (3.43)   (3.61)   (3.42)   (3.16)

BLACK, 16 YEARS OLD AND OVER
Civilian labor force . . .    102.66   102.82   100.04    98.95    99.19    99.03    99.11    98.21
  Standard error . . . . .    (1.05)    (.97)    (.99)    (.90)   (1.07)   (1.06)    (.90)   (1.04)
Unemployment level . . . .    116.50   104.13   103.44   106.24    94.56    85.48    92.83    96.82
  Standard error . . . . .    (5.84)   (5.41)   (5.29)   (6.17)   (5.46)   (5.14)   (4.94)   (6.00)

BLACK MALES, 16 YEARS OLD AND OVER
Civilian labor force . . .    101.17   101.40   100.00   100.88   100.31    98.73    99.99    97.52
  Standard error . . . . .    (1.45)   (1.34)   (1.16)   (1.17)   (1.29)   (1.28)   (1.29)   (1.52)
Unemployment level . . . .    114.46   100.86   100.47   109.39    97.38    80.47    93.32   103.65
  Standard error . . . . .    (9.48)   (8.33)   (7.49)   (8.97)   (8.74)   (7.19)   (6.97)   (8.95)

BLACK FEMALES, 16 YEARS OLD AND OVER
Civilian labor force . . .    104.00   104.13   100.06    97.17    98.16    99.32    98.31    98.86
  Standard error . . . . .    (1.40)   (1.31)   (1.40)   (1.25)   (1.45)   (1.38)   (1.27)   (1.30)
Unemployment level . . . .    117.30   107.78   106.40   104.24    91.10    90.86    92.62    89.71
  Standard error . . . . .    (8.72)   (7.00)   (7.30)   (7.92)   (6.76)   (6.86)   (7.42)   (7.49)

Since the expected values vary appreciably among some of the groups, it follows that the CPS survey conditions must differ in one or more significant ways when applied separately to these subgroups. One way in which the survey conditions differ among rotation groups is reflected in the noninterview rates. The interviewers, being unfamiliar with households in sample for the first time, are likely to be less successful in calling when a responsible household member is available. Thus, noninterview rates generally start above average with first-month families, decrease with more time in sample, go up a little for households in sample the fifth month (after the 8 months the household was not in sample), and then decrease again in the final months. (This pattern may also reflect the prevalence of personal visit interviews conducted in each of the months in sample, as noted above.) An index of the noninterview rate, constructed in a manner similar to the month-in-sample group index, can be seen in Table 16-11.

Table 16-11. Month-In-Sample Indexes in the CPS for Type A Noninterview Rates: January-December 1994

Month-in-sample         Noninterview rate index
1 . . . . . . . . . .           134.0
2 . . . . . . . . . .            93.1
3 . . . . . . . . . .            89.3
4 . . . . . . . . . .            90.0
5 . . . . . . . . . .           117.9
6 . . . . . . . . . .            95.6
7 . . . . . . . . . .            92.4
8 . . . . . . . . . .            88.2

This noninterview rate index clearly shows that the greatest proportion of noninterviews occurs for cases that are in the sample for the first month, and that the second largest proportion of noninterviews is due to cases returning to the sample in the fifth month. These noninterview changes are unlikely to be distributed proportionately among the various labor force categories (see Table 16-4). Consequently, it is reasonable to assume that the representation of labor force categories will differ somewhat among the month-in-sample groups and that these differences can affect the expected values of these estimates, which would imply that at least some of the month-in-sample bias can be attributed to actual differences in response probabilities among the month-in-sample groups (Williams and Mallows, 1970). Although the individual probabilities are not known, they can be estimated by nonresponse rates. However, other factors are also likely to affect the month-in-sample patterns.

PROXY REPORTING

Like many household surveys, the CPS seeks information about all persons in the household, whether they are available for interview or not. CPS field representatives accept reports from responsible adults in the household (see Chapter 6 for a discussion of respondent rules) who provide information about all household members. Respondents who provide labor force information about other household members are called proxy reporters. Because some household members may not know or be able to provide accurate information about the labor force status and activities of other household members, nonsampling error may occur because of the use of proxy reporters. The level of proxy reporting in the CPS had generally been around 50 percent in the past and continues to be so in the revised CPS. As can be seen in Table 16-12, the month-in-sample has very little effect on the level of proxy reporting.

Table 16-12. Percentage of CPS Labor Force Reports Provided by Proxy Reporters

                                          Percent reporting
All interviews
  Proxy reports . . . . . . . . . . .          49.68
  Both self and proxy . . . . . . . .           0.28
MIS 1 and 5
  Proxy reports . . . . . . . . . . .          50.13
  Both self and proxy . . . . . . . .           0.11
MIS 2-4 and 6-8
  Proxy reports . . . . . . . . . . .          49.53
  Both self and proxy . . . . . . . .           0.35
Thus, whether the interview is more likely to be a personal visit (for MIS 1 and 5) or a telephone interview (MIS 2-4 and 6-8) has very little effect. Although one can make overall comparisons of the data given by self-reporters and proxy reporters, there is an inherent bias in such comparisons because proxy reporters were the people more likely to be found at home when the field representative called or visited. For this reason, household members who report for themselves and those reported on by proxy tend to differ systematically on important labor force and demographic characteristics. In order to compare the data given by self- and proxy reporters, systematic studies must be conducted that control the assignment of proxy reporting status. Such studies have not been carried out.

SUMMARY

This chapter contains a description of several quality indicators in the CPS, namely, coverage, noninterview rates, telephone interview rates, and proxy reporting rates. These rates can be used to monitor the processes of conducting the survey, and they indicate the potential for some nonsampling error to enter into the process. This chapter also includes information on the potential effects of nonresponse, centralized telephone interviewing, and the month-in-sample groups on CPS estimates. This research comes close to identifying the presence and effects of nonsampling error in the CPS, but the results are far from conclusive. The full extent of nonsampling error in the CPS is unknown. Because of the number and scope of changes that have occurred in the questionnaire as well as in data collection, it is also unclear how much previous research on nonsampling error in the CPS is applicable to the redesigned instrument with computerized data collection. Furthermore, new sources of nonsampling error may have been introduced by the change in data collection methodology and by changes in the questions.
Although some potential sources of nonsampling error can be investigated with the currently available CPS data and reinterview data, the best source of information on nonsampling error is often an external source, a special study, or an experiment. At the time of this writing, about 3 years of data are available for internal analyses and comparison.

REFERENCES

Bailar, B.A. (1975), "The Effects of Rotation Group Bias on Estimates from Panel Surveys," Journal of the American Statistical Association, 70, 23-30.

Brooks, C.A. and Bailar, B.A. (1978), Statistical Policy Working Paper 3: An Error Profile: Employment as Measured by the Current Population Survey, Subcommittee on Nonsampling Errors, Federal Committee on Statistical Methodology, U.S. Department of Commerce, Washington, DC.

Butani, S., Kojetin, B.A., and Cahoon, L. (1996), "Federal Government Shutdown: Options for Current Population Survey (CPS) Data Collection," Paper presented at the Joint Statistical Meetings of the American Statistical Association, Chicago, Illinois.

Hansen, M.H., Hurwitz, W.N., Nisselson, H., and Steinberg, J. (1955), "The Redesign of the Current Population Survey," Journal of the American Statistical Association, 50, 701-719.

Hansen, M.H., Hurwitz, W.N., and Pritzker, L. (1964), "Response Variance and Its Estimation," Journal of the American Statistical Association, 59, 1016-1041.

Hanson, R.H. (1978), The Current Population Survey: Design and Methodology, Technical Paper 40, Washington, DC: Government Printing Office.

Harris-Kojetin, B.A. and Tucker, C. (1997), "Longitudinal Nonresponse in the Current Population Survey," Paper presented at the 8th International Workshop on Household Survey Nonresponse, Mannheim, Germany.

McCarthy, P.J. (1978), Some Sources of Error in Labor Force Estimates From the Current Population Survey, Background Paper No. 15, National Commission on Employment and Unemployment Statistics, Washington, DC.

Thompson, J. (1994), "Mode Effects Analysis of Labor Force Estimates," CPS Overlap Analysis Team Technical Report 3, Bureau of Labor Statistics and U.S. Census Bureau, Washington, DC.

Tucker, C. and Harris-Kojetin, B.A. (1997), "The Impact of Nonresponse on the Unemployment Rate in the Current Population Survey," Paper presented at the 8th International Workshop on Household Survey Nonresponse, Mannheim, Germany.

Williams, W.H. and Mallows, C.L. (1970), "Systematic Biases in Panel Surveys Due to Differential Nonresponse," Journal of the American Statistical Association, 65, 1338-1349.

Appendix A. Maximizing Primary Sampling Unit (PSU) Overlap

INTRODUCTION

The sampling of primary sampling units (PSUs) for the redesigned CPS was accomplished using linear programming techniques that maximized the overlap with the PSUs from the former 1980s design while maintaining the correct overall probabilities of selection for a probability proportional to size (pps) sampling scheme. This strategy enables the fullest use of field staff who are already hired and trained and results in substantial cost savings. Procedures are already in place in the previously existing PSUs for identifying and sampling new construction housing units. In addition to the cost savings, a large PSU overlap prevents overly large breaks in data series that can occur when many PSUs are changed with the introduction of a new design.

There were 379 nonself-representing (NSR) PSUs in the 1980 CPS design. (There were some small rotating PSUs, but only those still in sample at the end of the 1980 design are counted.) The maximizing procedure retained 249 (66 percent) of these NSR PSUs for the 1990 design. Had selection for the 1990 design been independent of the 1980 design, only 149 (39 percent) would have been retained.
Methods of solving the problem of maximizing PSU overlap, while maintaining the correct unconditional design probabilities, have evolved over several decades. A limited solution was presented by Keyfitz (1951) for one-PSU-per-stratum pps designs similar to the CPS. The definitions of individual PSUs were held constant, and the strata were assumed identical. Only the PSU probabilities of selection were allowed to differ between the former and new designs, thus responding to relative PSU population size changes over a decade. Raj (1968) demonstrated that the problem solved by Keyfitz could be reformulated as a transportation problem, which is commonly solved using linear programming techniques. Causey, Cox, and Ernst (1985) generalized the transportation problem approach to allow PSU redefinitions, changes in stratum composition, and more than one sample PSU per stratum. The generalization assumes the former design sampling to be independent within strata, which does not hold for the CPS, since the 1980s design itself had PSU overlap maximized from the earlier 1970s design. This and other practical considerations are addressed by various authors, including Ernst (1986, 1990).

To illustrate the concepts, consider the following example adapted from Causey et al. (1985). The PSUs and strata do not change across designs, and a particular stratum has three PSUs with probabilities p1. = .36, p2. = .24, and p3. = .4 for the former design. (Here pi. denotes the former design probability of PSU i and p.j the new design probability of PSU j; as shown below, these are the row and column totals of a table of joint probabilities.) Over the decade, PSUs 1 and 2 have grown in size relative to PSU 3. Were sampling done anew without regard for the former design, then p.1 = .5, p.2 = .3, and p.3 = .2 would be the probabilities for the new design. To obtain the maximum overlap of PSUs between the former and new designs, without restraint, we would arbitrarily choose for the new design any PSU that had been sampled for the former design.
The implication of the arbitrary strategy for the example would be that PSU 3, which has been losing in relative size and is now the least populous PSU in the stratum, would retain the highest probability of selection (.4 from the former design). In general, the arbitrary strategy introduces bias by favoring declining PSUs over other PSUs, thus causing growing PSUs to be underrepresented. Keyfitz’s method of maximizing PSU overlap while maintaining the new design probabilities is intuitive. If PSU 1 was actually sampled for the former design, include it for the new design since the design probability has increased over the decade from p1.=.36 to p.1=.5. (We say the conditional probability of choosing PSU 1 for the new design, given that PSU 1 was sampled for the former design, is 1.0 or certainty.) Similarly, the design probability of PSU 2 has increased from p2.=.24 to p.2 =.3, so if it was actually sampled for the former design then include it for the new design. (The conditional probability of choosing PSU 2 for the new design, given that PSU 2 was sampled for the former design, is 1.0 or certainty.) On the other hand, the design probability of selection for PSU 3 has decreased from p3.=.4 to p.3=.2. If PSU 3 was the one selected for the former design, the probability can be cut down to size by giving it a 1/2 chance of remaining for the new design and a 1/2 chance of being replaced. (The conditional probability of choosing PSU 3 for the new design, given that PSU 3 was sampled for the former design, is .5 or 1/2.) The overall probability of PSU 3 being in the sample for the new design is .4 x 1/2 = .2, which is exactly the probability required for the new design. The following tables of conditional and joint probabilities have been partially completed. (A joint probability pij is the probability that the ith PSU was sampled for the former design and the jth PSU for the new design.) The question marks (?) 
indicate that if PSU 3 was sampled for the former design, but not conditionally reselected for the new design, then one of the other PSUs needs to be selected. A–2 Conditional probabilities Joint probabilities pij New PSU j Former PSU i i=1 p1.=.36 i=2 p2.=.24 i=3 p3.=.4 j=1 1.0 0 ? p.1=.5 j=2 0 1.0 ? p.2=.3 New PSU j j=3 0 0 .5 p.3=.2 The joint probability table is very easy to construct. The rows must add to the former design sampling probabilities and the columns must add to the new design probabilities. Also, the pij entries must all add to 1.0. The joint probabilities shown are the products of the initial former design probabilities times the conditional probabilities (diagonal entries: .36 x 1.0 = .36 for PSU 1; .24 x 1.0 = .24 for PSU 2; .4 x .5 = .2 for PSU 3). For the PSU 1 row, the offdiagonal joint elements are zero; that is, never choose a Former PSU i i=1 p1.=.36 i=2 p2.=.24 i=3 p3.=.4 j=1 .36 0 ? p.1=.5 different PSU for the new design if PSU 1 was sampled for the former design. However, in the PSU 1 column something is missing. There is a .36 chance that PSU 1 was sampled for the former design, then carried over to the new design, but to give PSU 1 the desired .5 unconditional probability for the new design, it needs an extra .14 in its column. Similarly, PSU 2 needs an extra .06 in its column. The completed tables follow. Conditional probabilities Former PSU i i=1 p1.=.36 i=2 p2.=.24 i=3 p3.=.4 j=1 1.0 0 .35 new p.1=.5 New PSU j j=2 0 1.0 .15 p.2=.3 Joint probabilities pij j=3 0 0 .5 p.3=.2 The rows and columns of the joint probability table now add properly. Any conditional probability can be derived by dividing the corresponding joint probability by the former design probability of the row (pij/pi.). For example, .35=.14/.4, .15=.06/.4, and .5=.2/.4 in the PSU 3 row. Any joint probability can be derived by multiplying the corresponding joint probability by the former design probability of the row. 
Continuing with the PSU 3 row, .14 = .35 x .4, .06 = .15 x .4, and .2 = .5 x .4. The diagonal sum of .8 on the joint probability table is the unconditional probability of including the same PSU in the sample under both the former and new designs. It is as large as possible given that additivity to both the former design probabilities (rows) and the unconditional new design probabilities (columns) must be preserved. That is, the diagonal sum has been maximized subject to row and column constraints. A mathematical description of the problem is:

maximize \sum_{i=1}^{3} p_{ii}

subject to \sum_{j=1}^{3} p_{ij} = p_{i.} for i = 1, 2, 3

and \sum_{i=1}^{3} p_{ij} = p_{.j} for j = 1, 2, 3

This is a special case of the maximizing/minimizing problem that is called a transportation problem. The p_{ij} of the general transportation problem can be any nonnegative variables, and the p_{i.} and p_{.j} are prespecified constants:

maximize \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} p_{ij}

subject to \sum_{j=1}^{m} p_{ij} = p_{i.} for i = 1, ..., n

\sum_{i=1}^{n} p_{ij} = p_{.j} for j = 1, ..., m

and \sum_{i=1}^{n} p_{i.} = \sum_{j=1}^{m} p_{.j}

It is this generalized form of the transportation problem that has been adapted for the CPS. For the 1990s design, PSUs were redefined. In nonself-representing areas, new 1990s design strata were formed. A new stratum can be formed from pieces of several already existing strata. Even a simple example indicates the type of difficulties that may arise when the stratum structure is changed. Suppose a stratum is unchanged, except that a county from a previously existing PSU in a different previously existing stratum has been added to one of its PSUs. The new design unconditional probabilities of selection are no more difficult to calculate than before, but several factors complicate maximizing the overlap. Now two previously existing strata contribute to the new stratum.
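For a stratum in which one PSU is selected under each design, the Keyfitz procedure described above can be computed directly. The following is a minimal sketch in Python (the function name and data layout are ours, not the Census Bureau's production code); it reproduces the completed joint probability table for the three-PSU example:

```python
def keyfitz_joint(p_old, p_new):
    """Joint former/new selection probabilities maximizing overlap for
    a one-PSU-per-stratum design, in the spirit of Keyfitz (1951).
    Returns J with J[i][j] = P(PSU i sampled under the former design
    and PSU j sampled under the new design)."""
    n = len(p_old)
    # Probability mass gained by PSUs whose selection probability grew.
    increase = [max(p_new[j] - p_old[j], 0.0) for j in range(n)]
    total_increase = sum(increase)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Retain PSU i with conditional probability min(1, new/old),
        # so the diagonal (overlap) entry is min(p_old[i], p_new[i]).
        J[i][i] = min(p_old[i], p_new[i])
        excess = p_old[i] - J[i][i]  # mass that must move elsewhere
        for j in range(n):
            if j != i and excess > 0.0:
                # Redistribute the excess proportionally to the PSUs
                # whose probability increased.
                J[i][j] = excess * increase[j] / total_increase
    return J

# The example from the text: p1.=.36, p2.=.24, p3.=.4 become .5, .3, .2.
J = keyfitz_joint([.36, .24, .4], [.5, .3, .2])
overlap = J[0][0] + J[1][1] + J[2][2]
```

The computed PSU 3 row is (.14, .06, .2), matching the completed joint probability table, and the diagonal sum is .8, the upper bound given by the sum of min(pi., p.i) over the three PSUs.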
For the unchanged PSUs, there is still a 1-1 matchup between the former design and the new design, but the changed PSU is linked to two previously existing PSUs, each of which had a chance of selection. Note that the added county had no independent chance of selection in the former design, but it certainly contributed to the chance of selection of its previously existing PSU. Joint former/new design probabilities are determined that maximize the overlap between designs, taking such difficulties into account and maintaining the proper probability structure. A few numerical examples illustrate how this and other simple changes in the stratum PSU structure are dealt with.

ADDING A COUNTY TO A PSU

Suppose a stratum changes between the former and new designs only by adding a county to one of its three PSUs. To simplify notation, let the stratum in the former design consist of PSUs a, b, and c_old with probabilities of selection p(a)=.36, p(b)=.24, and p(c_old)=.4. In the new design, the stratum consists of PSUs a, b, and c_new with probabilities of selection p(a)=.5, p(b)=.3, and p(c_new)=.2. In response to a declining population, county c_add has been added to c_old to create c_new. Suppose c_add was part of former design PSU d, and let p(d)=.2 be the former design probability of selection for PSU d in its stratum. Also suppose c_add accounts for one-fourth of the new design selection size of c_new (c_add has size .05). The Former Design Probabilities column in the table below gives all of the former design selection possibilities and their probabilities, which sum to 1: either a, b, or c_old was sampled; PSU d may or may not have been sampled. The joint probability entries are constrained to sum to the probabilities of selection for both the former design (rows) and the new design PSUs (columns). The computer programs that solve the general transportation problem ensure that the maximum sum of joint overlap probabilities is achieved (.77). The following reasoning can be used to arrive at the solution.

a only: Select PSU a with conditional probability 1.0, since the former design probability .288 is less than the new probability .5. Enter .288 as the joint probability.

a and d: We want to select PSU a, if possible, since its relative size (.5) is much larger than the size of the c_add county of PSU d (.05). Since the sum of former design probabilities (.288 + .072) is less than the new design probability .5, PSU a can be selected with conditional probability 1.0.

b only: Select PSU b, since .192 is less than the new probability .3.

b and d: Select PSU b, since its relative size (.3) is much larger than the c_add county size (.05), and .192 + .048 < .3.

c_old and d: Select PSU c_new. This possibility takes precedence over "c_old only," since there is more overlap.

c_old only: It is advantageous to select c_new, but an extra joint probability of .12 = .2 - .08 is all that can be allowed (the column must add to .2). The corresponding conditional probability is .375 = .12/.32. PSU c_old comprises three-fourths of PSU c_new, and that factor is entered in parentheses after the joint probability. The entries in the PSU a and b columns ensure the proper additivity of the joint probabilities, and the (0) after those joint probabilities indicates no overlap.

Former design probabilities

p(a only) = .36(1 - .2) = .288
p(a and d) = .36(.2) = .072
p(b only) = .24(1 - .2) = .192
p(b and d) = .24(.2) = .048
p(c_old and d) = .4(.2) = .08
p(c_old only) = .4(1 - .2) = .32

Conditional probabilities

                          New PSU
Former design outcome    a        b       c_new
a only                  1.0       0        0
a and d                 1.0       0        0
b only                   0       1.0       0
b and d                  0       1.0       0
c_old and d              0        0       1.0
c_old only             .4375    .1875     .375

Joint probabilities

                          New PSU
Former design outcome    a          b          c_new
a only                .288 (1)      0           0
a and d               .072 (1)      0           0
b only                   0       .192 (1)       0
b and d                  0       .048 (1)       0
c_old and d              0          0        .08 (1)
c_old only             .14 (0)    .06 (0)    .12 (3/4)
                      p(a)=.5    p(b)=.3    p(c_new)=.2

overlap sum = .288 + .072 + .192 + .048 + .08 + .12(3/4) = .77

DROPPING A COUNTY FROM A PSU

Suppose a stratum changes between the former and new designs only by dropping a county from one of its three PSUs. Let the stratum in the former design consist of PSUs a, b, and c_old with probabilities of selection p(a)=.36, p(b)=.24, and p(c_old)=.4. In the new design, the stratum consists of PSUs a, b, and c_new with probabilities of selection p(a)=.5, p(b)=.3, and p(c_new)=.2. County c_drop has been dropped from c_old to create c_new. Note that dropping a county explains part of the former-to-new design decrease in probability for PSU c, but the solution for maximum overlap is really the same as for the first example in this appendix (see the tables below). The (1) after the .2 joint probability entry means that (1) if c_old was sampled for the stratum in the former design and (2) c_new is selected for the stratum in the new design, then (3) all of c_new overlaps with the former design.

Conditional probabilities

                           New PSU
Former PSU              a       b      c_new
a (p(a)=.36)           1.0      0       0
b (p(b)=.24)            0      1.0      0
c_old (p(c_old)=.4)    .35     .15     .5

Joint probabilities

                           New PSU
Former PSU               a         b        c_new
a (p(a)=.36)          .36 (1)      0          0
b (p(b)=.24)             0      .24 (1)       0
c_old (p(c_old)=.4)   .14 (0)   .06 (0)    .2 (1)
                      p(a)=.5   p(b)=.3   p(c_new)=.2

overlap sum = .36 + .24 + .2 = .8

DROPPING AND ADDING PSUs

Suppose one PSU is dropped from a stratum and another added. Let the stratum in the former design consist of PSUs a, b, and c with probabilities of selection p(a)=.36, p(b)=.24, and p(c)=.4.
In the new design, the stratum consists of PSUs b, c, and d (drop a; add d) with probabilities of selection p(b)=.3, p(c)=.2, and p(d)=.5. In the former design, p(d)=.5 in its stratum. The Former Design Probabilities column in the table below gives all of the former design selection possibilities and their probabilities, which sum to 1: either a, b, or c was sampled in the previously existing stratum; PSU d may or may not have been sampled. The joint probability entries are constrained to sum to the probabilities of selection for both the former design (rows) and the new design PSUs (columns). The computer programs that solve the general transportation problem ensure that the maximum sum of joint overlap probabilities is achieved (.82). The following reasoning can be used to arrive at the solution.

Former design probabilities

p(d only) = .5(1 - .24 - .4) = .18
p(b only) = .24(1 - .5) = .12
p(c only) = .4(1 - .5) = .2
p(d and c) = .5(.4) = .2
p(d and b) = .5(.24) = .12
p(none) = (1 - .5)(1 - .24 - .4) = .18

d only: Select PSU d with conditional probability 1.0, since the former design probability .18 is less than the new probability .5.

b only: Select PSU b with conditional probability 1.0, since the former design probability .12 is less than the new probability .3.

c only: Select PSU c with conditional probability 1.0, since the former design probability .2 does not exceed the new probability .2.

d and c: Select PSU d with conditional probability 1.0. PSU c cannot be specified, since its joint probability column already adds to the new probability .2.

d and b: Select PSU d with conditional probability 1.0. PSU b could have been specified, but selecting PSU d provides more overlap with the former design.

none: It is possible (probability .18) that none of the new stratum PSUs were sampled for the former design. Selecting PSU b with conditional probability 1.0 ensures that all the joint probability columns sum to the new design PSU probabilities.
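The solution reached by this reasoning can be checked arithmetically: every row and column margin must be reproduced, and the overlap counts only the joint mass that coincides with the former design. A minimal sketch (the dictionary layout is ours; the values are those of the dropping-and-adding example):

```python
# Former design outcome probabilities (rows) and new design PSU
# probabilities (columns) for the dropping-and-adding example.
rows = {"d only": .18, "b only": .12, "c only": .2,
        "d and c": .2, "d and b": .12, "none": .18}
cols = {"d": .5, "b": .3, "c": .2}

# Nonzero joint probabilities; the second element flags whether the
# cell represents overlap with the former design (1) or not (0).
joint = {("d only", "d"): (.18, 1),
         ("b only", "b"): (.12, 1),
         ("c only", "c"): (.2, 1),
         ("d and c", "d"): (.2, 1),
         ("d and b", "d"): (.12, 1),
         ("none", "b"): (.18, 0)}

# The joint entries must reproduce both sets of margins.
for r, p in rows.items():
    row_sum = sum(v for (ri, _), (v, _f) in joint.items() if ri == r)
    assert abs(row_sum - p) < 1e-9
for c, p in cols.items():
    col_sum = sum(v for (_, ci), (v, _f) in joint.items() if ci == c)
    assert abs(col_sum - p) < 1e-9

# Overlap: joint mass in cells that coincide with the former design.
overlap = sum(v * f for v, f in joint.values())
```

The overlap works out to .18 + .12 + .2 + .2 + .12 = .82, the maximum reported in the text.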
The (0) after the joint probability indicates that there is no overlap.

Conditional probabilities

                          New PSU
Former design outcome    d       b       c
d only                  1.0      0       0
b only                   0      1.0      0
c only                   0       0      1.0
d and c                 1.0      0       0
d and b                 1.0      0       0
none                     0      1.0      0

Joint probabilities

                          New PSU
Former design outcome    d          b         c
d only                 .18 (1)      0         0
b only                    0      .12 (1)      0
c only                    0         0       .2 (1)
d and c                 .2 (1)      0         0
d and b                .12 (1)      0         0
none                      0      .18 (0)      0
                       p(d)=.5   p(b)=.3   p(c)=.2

overlap sum = .18 + .12 + .2 + .2 + .12 = .82

REFERENCES

Causey, B. D., Cox, L. H., and Ernst, L. R. (1985), "Applications of Transportation Theory to Statistical Problems," Journal of the American Statistical Association, 80, 903-909.

Ernst, L. (1986), "Maximizing the Overlap Between Surveys When Information Is Incomplete," European Journal of Operational Research, 27, 192-200.

Ernst, L. (1990), "Formulas for Approximating Overlap Probability When Maximizing Overlap," Statistical Research Division Technical Note Series, SRD Research Report Number Census/SRD/TN-90/01, U.S. Census Bureau.

Gunlicks, C. (1992), "PSU Stratification and Selection Documentation: Comparison of the Overlap Program and the Expected Number of 1980 PSUs Selected Without the Overlap Program in the 1990 Redesign PSU Selection" (S-S90 DC-6), Memorandum to documentation, July 14, 1992.

Keyfitz, N. (1951), "Sampling With Probabilities Proportional to Size: Adjustment for Changes in Probabilities," Journal of the American Statistical Association, 46, 105-109.

Perkins, W. M. (1970), "1970 CPS Redesign: Proposed Method for Deriving Sample PSU Selection Probabilities Within 1970 NSR Strata," Memorandum to J. Waksberg, U.S. Census Bureau, August 5, 1970.

Perkins, W. M. (1971), "1970 CPS Redesign: Illustration of Computation of PSU Probabilities Within a Given 1970 Stratum," Memorandum to J. Waksberg, U.S. Census Bureau, February 19, 1971.

Raj, D. (1968), Sampling Theory, New York: McGraw-Hill.

Roebuck, M. (1992), "1990 Redesign: PSU Selection Specifications" (Document #2.4-R-7), Memorandum to B. Walter through G. Shapiro and L. Cahoon, September 25, 1992.

Appendix B.
Sample Preparation Materials

INTRODUCTION

Despite the conversion of CPS to an automated interviewing environment, a number of paper materials are still in use. These materials include segment folders, listing sheets, maps, etc. Regional office staff and field representatives use these materials as aids in identifying and locating selected sample housing units. This appendix provides illustrations and explanations of these materials by frame (unit, area, group quarters, and permit). The information provided here should be used in conjunction with the information in Chapter 4 of this document.

UNIT FRAME MATERIALS

Segment Folders (BC-1669 (CPS)), Illustration 1

A field representative receives a segment folder for each segment assigned for listing, updating, or interviewing. Segment folders were designed to:
1. Hold all materials needed to complete an assignment.
2. Provide instructions on listing, updating, and sampling.

The segment folder cover provides the following information about a segment:
1. Identifying information about the segment, such as the regional office code, the field primary sampling unit (PSU), place name, sample designations for the segment, and basic geography.
2. Instructions for listing and updating. Part I of the segment folder contains "Field Representative Listing and Updating Instructions." This information is provided by the regional office and varies by segment type.
3. Sampling instructions. Part II on the segment folder is used only for area and group quarters segments. The regional offices use this section, "Regional Office Sampling Instructions," to record Start-with/Take-everys as described in Chapter 4.
4. Instructions for precanvassing and determining year built in area segments. Part III on the segment folder is reserved for stamps that provide the field representatives with information on whether or not to precanvass and whether or not to determine the year a structure was built.
5.
Helpful information about the segment entered by the regional office or the field representative in the Remarks Section.

A segment folder for a unit segment may contain some or all of the following: Unit/Permit Listing Sheets (Form 11-3), Incomplete Address Locator Actions (Form BC-1718 (ADP)), 1990 Combined Reference File (CRF) listings, and 1990 Census Maps (county or census spotted). A segment folder for an area segment may contain some or all of the following: Area Segment Listing Sheets (Form 11-5), Group Quarters Listing Sheets (Form 11-1), Area Segment Maps, Segment Locator Maps, County Maps, and Map Legends. A segment folder for a group quarters segment may contain some or all of the following: Group Quarters Listing Sheets (Form 11-1), Incomplete Address Locator Actions (Form BC-1718 (ADP)), 1990 Census Maps (county or census spotted), and 1990 Combined Reference File listings. A segment folder for a permit segment may contain some or all of the following: Unit/Permit Listing Sheets (Form 11-3), photocopies of the listed Permit Address List (PAL) (Form 11-193A), and Permit Sketch Maps (Form 11-187).

[Illustration 1. Segment Folder, BC-1669(CPS)]

Unit/Permit Listing Sheets (Form 11-3), Illustrations 2–4

Unit/Permit Listing Sheets are provided for all multiunit addresses in unit segments. For each sample housing unit at a multiunit address, the field representative receives a preprinted, computer-generated listing. This listing aids the field representative in locating the sample units and, in most cases, eliminates the need to relist the entire multiunit address. Listings for multiunit addresses are sorted into two categories: small and large. The type of multiunit listing is identified below the listing sheet title. The differences between these listings are identified in Table B–1.

Table B–1. Listings for Multiunit Addresses

Listing type               Expected number of units   Listing characteristics                                        Illustration
Small                      2-9                        Lists all expected units; could include some units with        2
                                                      missing and/or duplicate unit designations as recorded
                                                      in the 1990 census
Large                      10 or more                 Lists sample units only; on rare occasions, lists both         3
                                                      sample and nonsample units
Large (Special Exception)  10 or more                 No unit designations are listed, and the Footnotes Section     4
                                                      instructs the field representative to "List all units at
                                                      the basic address."

The field representative also receives blank Unit/Permit Listing Sheets for use when an address needs relisting or when the field representative finds more units than expected, causing him/her to run out of lines on the preprinted listing sheet. Single-unit addresses in unit segments are not listed on Unit/Permit Listing Sheets.

Incomplete Address Locator Actions (Form BC-1718(ADP)), Illustration 5

The Incomplete Address Locator Actions form is used when the information obtained from the 1990 census is in some way incomplete (i.e., missing a house number, unit designation, etc.). The form provides the field representative with additional information that can be used to locate the incomplete address. The information on the Incomplete Address Locator Actions includes:
1. The address as it appeared in the Combined Reference File and possibly a complete address resulting from research conducted by Census Bureau staff;
2. A list of additional materials provided to the field representative to aid in locating the address; and
3. An outline of the actions the field representative should take to locate the address.

After locating the address, the field representative completes or corrects the basic address on the Incomplete Address Locator Actions form.
1990 Combined Reference File Listings

For each assigned segment that has at least one incomplete basic address, the field representative receives a computer-generated printout of addresses from the 1990 Combined Reference File. Each printout contains one or more incomplete addresses along with complete addresses in the same block. The field representative uses the Combined Reference File listing by finding the incomplete basic address on the listing. Then the field representative uses the addresses listed directly before and after it to physically locate the incomplete address.

1990 Census Maps (County or Census Spotted)

For each assigned segment that has at least one incomplete basic address, the field representative receives a 1990 county or census-spotted map. For each incomplete basic address, the following control numbers are entered in red at the top of the map: field PSU number, sample designation, segment number, and serial number. The map spot of the incomplete address is circled in red. The field representative uses the surrounding map spots to locate the incomplete address.

AREA FRAME MATERIALS

Segment Folder (BC-1669(CPS)), Illustration 1

See the description and illustration in the Unit Frame Materials above.

Area Segment Listing Sheets (Form 11-5), Illustration 6

Area Segment Listing Sheets are used to record information about housing units within the assigned block and to record information on the year a structure was built (when necessary). A field representative receives several Area Segment Listing Sheets for a block, since the actual number of housing units in that block is unknown until the listing is complete. Only one of the listing sheets has the heading items filled in by the regional office. The heading items include: regional office code, field PSU, segment number, and survey acronym.

Group Quarters Listing Sheets (Form 11-1), Illustration 10

See the description and illustration provided in the Group Quarters Frame Materials below. In area segments, field representatives receive blank group quarters listing sheets since, at the time of the initial listing, it is not known whether or not the block contains any group quarters.

Area Segment Map, Illustration 7

An Area Segment Map is a large-scale map of the area segment, but it does not show roads or features outside of the segment boundaries. Within the segment, the map includes such features as highways, streets, roads, trails, walkways, railroads, bodies of water, military installations, and parks. The area segment map is used for:
1. Determining the exact location and boundaries of the area segment, and
2. Mapspotting the locations of housing units when listing the area segment.

[Illustration 2. Unit/Permit Listing Sheet (Unit Segment); Illustration 3. Unit/Permit Listing Sheet (Multiunit Structure); Illustration 4. Unit/Permit Listing Sheet (Large Special Exception); Illustration 5. Incomplete Address Locator Actions, BC-1718 (ADP); Illustration 6. Area Segment Listing Sheet]

Segment Locator Map, Illustration 8

Segment Locator Maps are used to aid in the location of the area segment. The map identifies street patterns and names surrounding the exterior of the area segment. The location of the area segment is a shaded area, centered on the segment locator map.

Incomplete Address Locator Actions (Form BC-1718(ADP)), Illustration 5

See the description and illustration in the Unit Frame Materials above.

1990 Census Maps (County or Census Spotted)

See the description in the Unit Frame Materials above.

1990 Combined Reference File Listings

See the description in the Unit Frame Materials above.

County Map

Field representatives receive county maps for each field PSU in which they work. The map is divided into blocks or grids. Each segment locator map is one or more blocks or grids from the county map.
Map Legend, Illustration 9

Map Legends are used to help in the identification of boundaries and features from the symbols and names printed on the area segment map and the segment locator map.

GROUP QUARTERS FRAME MATERIALS

Segment Folder (BC-1669(CPS)), Illustration 1

See the description and illustration in the Unit Frame Materials above.

Group Quarters Listing Sheets (Form 11-1), Illustration 10

The Group Quarters Listing Sheet is used to record the name, type, and address of the group quarters and the name and telephone number of the contact person at the group quarters, and to list the eligible units within the group quarters. In group quarters segments, the field representative receives group quarters listing sheets with certain information preprinted by the computer. This information includes: group quarters name and address, group quarters type, group quarters ID number (from the 1990 census), regional office code, field PSU, segment number, etc.

Incomplete Address Locator Actions, 1990 Census Maps, and 1990 Combined Reference File Listings: See the description in the Unit Frame Materials above.

PERMIT FRAME MATERIALS

Segment Folder (BC-1669(CPS)), Illustration 1

See the description and illustration in the Unit Frame Materials above.

Unit/Permit Listing Sheets (Form 11-3), Illustration 11

For each permit address in sample for the segment, the field representative receives a Unit/Permit Listing Sheet with heading information, sample designations, and serial numbers preprinted on it. The field representative also receives blank Unit/Permit Listing Sheets in case there are not enough lines on the preprinted listing sheet(s) to list all units at a multiunit address.

Permit Address List (Form 11-193A), Illustration 12

The Permit Address List (PAL) form is computer generated for each of the building permit offices in sample. It is used to obtain necessary address information for individual permits issued.
The PALs for Survey of Construction (SOC) building permit offices and non-SOC building permit offices are identical with one exception: for SOC building permit offices, the sample permit numbers, the date issued, and the number of housing units for each sample permit are preprinted on the PALs. The field representatives obtain the address information for these sample permits only. For non-SOC building permit offices, the PALs do not contain any preprinted permit numbers. Field representatives list all residential permits issued by the building permit office for a specific month.

[Illustration 7. Area Segment Map; Illustration 8. Segment Locator Map; Illustration 9. Map Legend; Illustration 10. Group Quarters Listing Sheet; Illustration 11. Unit/Permit Listing Sheet; Illustration 12. Permit Address List; Illustration 13. Permit Sketch Map]

Permit Sketch Map (Form 11-187), Illustration 13

When completing a PAL form, if the address given on a building permit is incomplete, the field representative attempts to obtain a complete address. If a complete address cannot be obtained, the field representative visits the new construction site and draws a Permit Sketch Map. The map shows the location of the construction site, streets in the vicinity of the site, and the directions and distance from the construction site to the nearest town.

Appendix C. Maintaining the Desired Sample Size

INTRODUCTION

The Current Population Survey (CPS) sample is continually updated to include housing units built after the most recent census. If the same sampling rates were used throughout the decade, the growth of the U.S. housing inventory would lead to increases in the CPS sample size and, consequently, to increases in cost. To avoid exceeding the budget, the sampling rate is periodically reduced to maintain the desired sample size. Referred to as maintenance reductions, these changes in the sampling rate are implemented in a way that retains the desired set of reliability requirements.

These maintenance reductions are different from changes to the base CPS sample size resulting from modifications in CPS funding levels. The methodology for designing and implementing that type of sample size change is generally dictated by new requirements specified by the Bureau of Labor Statistics (BLS). For example, the sample reduction implemented in January 1996 was due to a reduction in CPS funding, and new design requirements were specified (see Appendix H). As a result of the January 1996 reduction, however, any impending maintenance reduction to the 1990 design sample because of sample growth was preempted.

MAINTENANCE REDUCTIONS

Developing the Reduction Plan

The CPS sample size for the United States is projected forward for about a year using linear regression based on previous CPS monthly sample sizes. The future CPS sample size must be predicted because CPS maintenance reductions are introduced gradually over 16 months and operational lead time is needed to prevent dropped cases from being interviewed. The growth is examined in all states and major substate areas to determine whether or not it is uniform. The states with faster growth are candidates for a possible maintenance reduction. The remaining sample must be sufficient to maintain the individual state and national reliability requirements. Generally, the sample in a state is reduced by the same proportion in all frames in all primary sampling units (PSUs) to maintain the self-weighting nature of the state design.

Implementing the Reduction Plan

Reduction groups. The CPS sample size is reduced by deleting one or more subsamples of ultimate sampling units (USUs) from both the old construction and permit frames. The original sample of USUs is partitioned into 101 subsamples called reduction groups; each is representative of the overall sample. The decision to use 101 subsamples is somewhat arbitrary. A useful attribute of the number used is that it is relatively prime to the number of rotation groups (eight), so that reductions have a uniform effect across rotations. A number larger than 101 would allow greater flexibility in pinpointing the proportion of sample to reduce. However, a large number of reduction groups can lead to imbalances in the distribution of sample cuts across PSUs, since small PSUs may not have enough sample to have all reduction groups represented. That is, when a large number of reduction groups is randomly assigned, a smaller PSU may or may not experience reductions, and quite possibly may undergo too much reduction. All USUs in a hit string have the same reduction group number (see Chapter 3).

For the unit, area, and group quarters (GQ) frames, hit strings are sorted and then sequentially assigned a reduction group code from 1 through 101. The sort sequence is:
1. State or substate.
2. Metropolitan statistical area/nonmetropolitan statistical area status.
3. Self-representing/nonself-representing status.
4. Stratification PSU.
5. Final hit number, which defines the original order of selection.

For the permit frame, a random start is generated for each stratification PSU, and permit frame hits are assigned a reduction group code from 1 through 101 following a specific, nonsequential pattern. This method of assigning the reduction group code is used to improve the balancing of reduction groups, given the small permit sample sizes in some PSUs and the uncertainty about which PSUs will actually have permit samples over the life of the design. The sort sequence is:
1. Stratification PSU.
2. Final hit number.

The state or national sample can be reduced by deleting USUs from both the old construction and permit frames in one or more reduction groups.
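The sort-then-assign step for the unit, area, and GQ frames amounts to ordering the hit strings and cycling the codes 1 through 101 through them. A minimal sketch follows; the record layout and field names are hypothetical stand-ins, not actual CPS file fields, and only the sort-then-cycle logic reflects the text:

```python
def assign_reduction_groups(hit_strings, n_groups=101):
    """Sort hit strings by the stated sequence, then assign reduction
    group codes 1..n_groups sequentially, recycling as needed."""
    ordered = sorted(hit_strings,
                     key=lambda h: (h["state"], h["msa_status"],
                                    h["sr_status"], h["strat_psu"],
                                    h["final_hit"]))
    for k, h in enumerate(ordered):
        h["reduction_group"] = k % n_groups + 1
    return ordered

# Hypothetical hit strings in one state; deleting reduction group 1
# afterward would drop the USUs in that subsample.
hits = [{"state": "24", "msa_status": 1, "sr_status": 0,
         "strat_psu": psu, "final_hit": hit}
        for psu, hit in [(102, 1), (101, 2), (101, 1)]]
ordered = assign_reduction_groups(hits)
kept = [h for h in ordered if h["reduction_group"] != 1]
```

Because each reduction group is a systematic subsample spread across the whole sort order, deleting one group removes roughly 1/101 of the sample from every part of the state design.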
If there are k reduction groups in sample, the sample may be reduced by 1/k by deleting one of the k reduction groups. For the first reduction applied to redesigned samples, each reduction group represents roughly 1 percent of the sample. Reduction group numbers are chosen for deletion in a specific sequence designed to maintain the nature of the systematic sample to the extent possible. For example, suppose a state has an overall state sampling interval of 500 at the start of the 1990 design. Suppose the original selection probability of 1 in 500 is modified by deleting 5 of the 101 reduction groups. The resulting overall state sampling interval (SI) is

SI = 500 x 101 / (101 - 5) = 526.0417

This makes the resulting overall selection probability in the state 1 in 526.0417. In the subsequent maintenance reduction, the state has 96 reduction groups remaining. A reduction of 1 in 96 can be accomplished by deleting 1 of the remaining 96 reduction groups. The resulting overall state sampling interval is the new basic weight for the remaining uncut sample.

Introducing the reduction. A maintenance reduction is implemented only when a new sample designation is introduced, and it is gradually phased in with each incoming rotation group to minimize the effect on survey estimates and reliability and to prevent sudden changes to interviewer workloads. The basic weight applied to each incoming rotation group reflects the reduction. Once this basic weight is assigned, it does not change until future sample changes are made. In all, it takes 16 months for a maintenance sample reduction and new basic weights to be fully reflected in all eight rotation groups interviewed for a particular month. During the phase-in period, rotation groups have different basic weights; consequently, the average weight over all eight rotation groups changes each month. After the phase-in period, all eight rotation groups have the same basic weight.
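The sampling-interval arithmetic generalizes to any sequence of cuts: each deletion of d of the k remaining reduction groups multiplies the interval (and hence the basic weight) by k/(k - d). A sketch using the numbers from the example (the function name is ours):

```python
def reduced_sampling_interval(si, groups_remaining, groups_cut):
    """Overall sampling interval after deleting reduction groups:
    multiply the current interval by k / (k - d), where k groups
    remain in sample and d of them are deleted."""
    return si * groups_remaining / (groups_remaining - groups_cut)

# Start of the 1990 design: interval 500; delete 5 of 101 groups.
si = reduced_sampling_interval(500, 101, 5)    # 500 x 101/96
# A subsequent cut of 1 of the remaining 96 groups.
si_next = reduced_sampling_interval(si, 96, 1)  # 500 x 101/95
```

The first call reproduces the 526.0417 interval from the text; the second shows how a later 1-in-96 maintenance cut would compound it.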
Maintenance Reductions and GVFs

National generalized variance parameters are not updated for maintenance reductions because the purpose of these reductions is to maintain the sample size. However, the national generalized variance parameters should be updated to reflect changes in the sample design or the base sample size. For example, the parameters were updated for the January 1996 reduction since survey design parameters changed. (See Chapter 14 for more details.)

Appendix D. Derivation of Independent Population Controls

INTRODUCTION

Each month, for the purpose of the iterative, second-stage weighting of the Current Population Survey (CPS) data, independent projections of the eligible population are produced by the Census Bureau's Population Division. While the CPS provides the fundamental reason for the existence of these monthly projections, other survey-based programs sponsored by the Bureau of the Census, the Bureau of Labor Statistics, the National Center for Health Statistics, and other agencies use these independent projections to calibrate surveys. These projections consist of the adjusted civilian noninstitutional population of the United States, distributed by demographic characteristics in three ways: (1) age, sex, and race; (2) age, sex, and Hispanic origin (Hispanics may be of any race); and (3) state of residence, restricted to the population 16 years of age and over. They are produced in association with the Population Division's population estimates and projections programs, which provide published estimates and long-term projections of the population of the United States, the 50 states and the District of Columbia, the counties, and subcounty jurisdictions.

Organization of This Appendix

The CPS population controls, like other population estimates and projections produced by the Census Bureau, are based on a demographic framework of population accounting. Under this framework, time series of population estimates and projections are anchored by decennial census enumerations, with populations for dates between two previous censuses, since the last census, or in the future derived by the estimation or projection of population change. The method by which population change is estimated depends on the defining demographic and geographic characteristics of the population. The issue of adjustment for net underenumeration in the census is handled outside of this demographic framework and is applied to the CPS control series as a final step in its production. This appendix seeks first to present this framework systematically, defining terminology and concepts, and then to describe data sources and their application within the framework.

The first subsection is operational and is devoted to the organization of the appendix and a glossary of terms. The second and third subsections deal with two broad conceptual definitions. The second subsection distinguishes population estimates from population projections and describes how monthly population controls for surveys "fit" into this distinction. The third subsection defines the "CPS control universe," the set of inclusion and classification rules specifying the population to be projected for the purpose of weighting CPS data. The fourth subsection, "Calculation of Population Projections for the CPS Control Universe," comprises the bulk of the appendix. It provides the mathematical framework and data sources for updating the total population from a census date via the demographic components of change and describes how each of the requisite inputs to the mathematical framework is measured. This is presented separately for the total population of the United States, its distribution by demographic characteristics, and its distribution by state of residence.
The fifth subsection, ‘‘Adjustment of CPS Controls for Net Underenumeration in the 1990 Census,’’ describes the application of data from the 1990 Post-Enumeration Survey (PES) to include an estimate of net underenumeration in the 1990 census. The final two sections are, once again, operational. A subsection, ‘‘The Monthly and Annual Revision Process for Independent Population Controls,’’ describes the protocols for incorporating new information in the series through revision. We conclude with a brief section, ‘‘Procedural Revisions,’’ containing an overview of various technical problems for which solutions are being sought.

Terminology Used in This Appendix

The following is an alphabetical list of terms used in this appendix, essential to understanding the derivation of census-based population controls for surveys, but not necessarily prevalent in the literature on survey methodology.

An adjusted (as opposed to unadjusted) population estimate or projection includes an allowance for net undercoverage (underenumeration) in a census. In this appendix (as in other Census Bureau publications), the term, when applied to series based on the 1990 census, refers to incorporation of the results of the 1990 Post-Enumeration Survey (PES).

The population base (or base population) for a population estimate or projection is the population count or estimate to which some measurement or assumption of population change is added to yield the population estimate or projection.

A census-level population estimate or projection is one that can be linked to a census count through an estimate or projection of population change; specifically, it does not include an adjustment for underenumeration in the census.

The civilian population is the portion of the resident population not in the active-duty military.
Active-duty military, used in this context, refers to persons defined to be on active duty by one of the five branches of the Armed Forces, as well as persons in the National Guard or the reserves actively participating in certain training programs.

Components of population change are any subdivisions of the numerical population change over a time interval. The demographic components often cited in the literature consist of births, deaths, and net migration. In this appendix, reference is also made to changes in the institutional and active-duty Armed Forces populations, which affect the CPS control universe.

The CPS control universe is characterized by three attributes: (1) restriction to the civilian noninstitutional population, (2) modification of census data by age, race, and sex (MARS), and (3) adjustment for net census underenumeration. Each of the three defining concepts appears separately in this glossary.

The Demographic Analysis population (‘‘DA’’ population) is a distribution of population by characteristics—in the present case age, sex, and race—developed by observation of births, deaths by age, and international migration by age, over a period of time for each sex and race category. It is derived independently of census data. Thus, the number of persons x years of age of a given sex and race is determined by the number of births x years before the reference date, as well as the number of deaths and net migration of persons born the same year during the x-year interval from birth date to reference date. In this text, we also refer to the ‘‘DA/MARS population.’’ This is a hybrid distribution based on demographic analysis for age, sex, and three race groups, but on the census for Hispanic origin detail within these groups. ‘‘MARS’’ refers to the fact that the census detail is based on modified age and race, as defined elsewhere in this glossary.

Emigration is the permanent departure of a person from a country of residence, in this case the United States.
In the present context, it refers to the departure of persons legally resident in the United States, but not confined to legal permanent residents (immigrants). Departures of undocumented residents are excluded, because they are included in the definition of net undocumented migration (elsewhere in this glossary).

Immigration is the acquisition, by a foreign national, of legal permanent resident status in the United States. Hence, an immigrant may or may not reside in the United States prior to immigration. This definition is consistent with the parlance of the Immigration and Naturalization Service (INS), but differs from popular usage, which associates immigration with an actual change of residence.

The institutional population refers to a population universe consisting of inmates of CPS-defined institutions, such as prisons, nursing homes, juvenile detention facilities, or residential mental hospitals.

Internal consistency of a population time series occurs when the difference between any two population estimates in the series can be construed as population change between the reference dates of the two population estimates.

International migration is generally the change of a person’s residence from one country to another. In the present context, this concept includes change of residence into the United States by non-U.S. citizens and by previous residents of Puerto Rico or the U.S. outlying areas, as well as the change of residence out of the United States by persons intending to live permanently abroad, in Puerto Rico, or the outlying areas.

Legal permanent residents are persons whose right to reside in the United States is legally defined either by immigration or by U.S. citizenship. In the present context, the term generally refers to noncitizen immigrants.

Modified age, race, and sex, abbreviated MARS, describes the census population after the definition of age and race has been aligned with other administrative sources (or with OMB Directive 15, in the case of race).

The natural increase of a population over a time interval is the number of births minus the number of deaths during the interval.

The Post-Enumeration Survey is a survey of households conducted after the 1990 population census in a sample of blocks to estimate the net underenumeration in the 1990 census.

Population estimates are population figures that do not arise directly from a census count, but are determined from available (e.g., administrative) data rather than from assumptions or modeling. Population estimates discussed in this appendix stipulate an enumerated base population, coupled with estimates of population change from the date of the base population to the date of the estimate.

Population projections are population figures relying on modeled or assumed values for some or all of their components, therefore not entirely calculated from actual data. Projections discussed in this appendix stipulate a base population that may be an estimate or count, and components of population change from the reference date of the base population to the reference date of the projection.

The reference date of an estimate or projection is the date to which the population figure applies. The CPS control reference date, in particular, is the first day of the month in which the CPS data are collected.

The resident population of the United States is the population usually resident in the 50 states and District of Columbia. For the census date, this population matches the total population in census decennial publications, although applications to the CPS series beginning in 1995 include some count resolution corrections to the 1990 census not included in original census publications.
A population universe is a set of rules or criteria defining inclusion and exclusion of persons from a population, as well as rules of classification for specified geographic or demographic characteristics within the population. Examples include the resident population universe and the civilian population universe, both of which are defined elsewhere in this glossary. Frequent reference is made to the ‘‘CPS control universe,’’ defined separately in this glossary.

Undocumented migration (more precisely, net undocumented migration) is the net increase of population brought about by the residential migration of persons in and out of the country who have no legal basis for residence in the United States.

CPS Population Controls: Estimates or Projections?

Throughout this appendix, the independent population controls for the CPS will be cited as population projections. Throughout much of the scientific literature on population, demographers take care to distinguish the concepts of ‘‘estimate’’ and ‘‘projection’’; yet, the CPS population controls lie perilously close to a somewhat ragged line of distinction. Generally, population estimates relate to past dates and, in order to be called ‘‘estimates,’’ must be supported by a reasonably complete data series. If the estimating procedure involves the computation of population change from a base date to a reference date, the inputs to the computation must have a solid basis in data. Projections, on the other hand, allow the replacement of unavailable data with assumptions. The reference date for CPS population controls is indeed in the past, relative to their date of production. However, the past is so recent—3 to 4 weeks prior to production—that for all intents and purposes, CPS controls are projections. No data relating to population change are available for the month prior to the reference date; very little data are available for 3 to 4 months prior to the reference date; no data for state-level geography are available past July 1 of the year prior to the current year. Hence, we refer to CPS controls as projections, despite their frequent description as estimates in the literature.

Be this as it may, CPS controls are founded primarily on a population census and on administrative data. Analysis of coverage rates has shown that even without adjustment for underenumeration in the controls, the lowest coverage rates of the CPS often coincide with those age-sex-race-Hispanic segments of the population having the lowest coverage in the census. Therefore, the contribution of independent population controls to the weighting of CPS data is highly productive, despite the use of modeling for missing data.

THE POPULATION UNIVERSE FOR CPS CONTROLS

The concept of ‘‘population universe,’’ as used in this appendix, comprises not only the rules specifying what persons are included in the population under consideration, but also the rules specifying their geographic locations and their relevant characteristics, such as age, sex, race, and Hispanic origin. In this section we consider three population universes: the resident population universe defined by the 1990 census, the resident population universe used for official population estimates, and the CPS control universe that relates directly to the calculation of CPS population controls. These three universes are distinct from one another; each one is derived from the one that precedes it; hence the necessity of considering all three. The primacy of the decennial census in the definition of these three universes is a consequence of its importance within the Federal statistical system. This, in turn, is a result of the extensive detail that it provides on the geographic distribution and characteristics of the U.S. population, as well as the legitimacy accorded it by the U.S. Constitution. Later in this appendix, we cite another population universe—the Demographic Analysis population—that may be technically more appropriate as a base for estimates and projections than the 1990 census in some regards, mainly in the continuity of the age distribution and the conformity of racial categories with other administrative data sources. Because the census defines the universe for estimates, the use of the DA population in the process is intended to ensure the continuity of census-related biases over time in the estimates, not to eliminate them.

The universe defined by the 1990 census is the U.S. resident population, including persons resident in the 50 states and the District of Columbia. This definition excludes residents of the Commonwealth of Puerto Rico and residents of the outlying areas under U.S. sovereignty or jurisdiction (principally American Samoa, Guam, the Virgin Islands of the United States, and the Commonwealth of the Northern Mariana Islands). The definition of residence conforms to the criterion used in the census, which defines a resident of a specified area as a person ‘‘usually resident’’ in the area. For the most part, ‘‘usual residence’’ is defined subjectively by the census respondent; it is not defined by de facto presence in a particular house or dwelling, nor is it defined by any de jure (legal) basis for a respondent’s presence. There are two exceptions to the subjective character of the residence definition.

1. Persons living in military barracks, prisons, and some other types of residential group-quarters facilities are generally reported by the administration of their facility, and their residence is generally the location of the facility. Naval personnel aboard ships reside at the home port of the ship, unless the ship is deployed to the overseas fleets, in which case they are not U.S. residents.
Exceptions may occur in the case of Armed Forces personnel who have an official duty location different from the location of their barracks, in which case their place of residence is their duty location.

2. Students residing in dormitories report their own residences, but are instructed on the census form to report their dormitories, rather than their parental homes, as their residences.

The universe of interest to the official estimates of population that the Census Bureau produces is the resident population of the United States, as it would be counted by the last census (1990) if this census had been held on the reference date of the estimates. However, this universe is distinct from the census universe in two regards, neither of which affects the total population.

1. The estimates differ from the census in their definition of race, with categories redefined to be consistent with other administrative-data sources. Specifically, persons enumerated in the 1990 census who gave responses to the census race question not codable to White, Black, American Indian, Eskimo, Aleut, or any of several Asian and Pacific Islander categories were assigned a race within one of these categories. Most of these persons (9.8 million nationally) were persons of Hispanic origin who gave a Hispanic origin response to the race question. Publications from the 1990 census do not incorporate this modification, while all official population estimates incorporate it.

2. Respondents’ ages are modified to compensate for the aging of census respondents from the census date (April 1, 1990) to the actual date that the census form was completed. This modification was necessitated by the fact that no printed instruction on the census form required respondents to state their age as of the census date. They therefore tended to state their age at the time of completion of the form, or possibly their next anticipated birthday.
This modification is not included in census publications, which tabulate respondents’ age as stated. It is applied to all population estimates and projections, including controls for the CPS. The universe underlying the CPS sample is confined to the civilian noninstitutional population. Thus, independent population controls are sought for this population, which defines the CPS control universe. The previously mentioned alternative definitions of race and age, associated with the universe for official estimates, are carried over to the CPS control universe. Four additional changes are incorporated, which distinguish the CPS control universe from the universe for official estimates. 1. The CPS control universe excludes active-duty Armed Forces personnel, including those stationed within the United States, while these are included in the resident population. ‘‘Active duty’’ is taken to refer to personnel reported in military strength statistics of the Departments of Defense (Army, Navy, Marines, and Air Force) and Transportation (Coast Guard), including reserve forces on 3- and 6-months’ active duty for training, National Guard reserve forces on active duty for 4 months or more, and students at military academies. Reserve forces not active by this definition are included in the CPS control universe, if they meet the other inclusion criteria. 2. The CPS control universe excludes persons residing in institutions, such as nursing homes, correctional facilities, juvenile detention facilities, and long-term mental health care facilities. The resident population, on the other hand, includes the institutional population. 3. The CPS control universe, like the resident population base for population estimates, includes students residing in dormitories. Unlike the population estimates base, however, it accepts as state of residence a family home address within the United States, in preference to the address of the dormitory. 
This affects estimation procedures at the state level, but not at the national level.

4. An added difference between the official estimates and the CPS population controls after January 1, 1994, is that the former do not include adjustment for undercoverage in the 1990 census based on the Post-Enumeration Survey (PES), while the latter do. Details of this adjustment are given later in this appendix.2

2 Whether the presence of an adjustment for census undercount implies an actual change in universe, as opposed to the correction of a measurement problem, is a legitimate subject for dispute. In the context of the census itself, adjustment might be viewed as a measurement issue relative to the population of a ‘‘true’’ universe of U.S. residents. It is treated here as a technical attribute of the universe, since unadjusted population estimates are specifically adapted to remain consistent with the census in regard to coverage. Hence ‘‘census-level’’ and ‘‘adjusted’’ define distinct population universes.

CALCULATION OF POPULATION PROJECTIONS FOR THE CPS UNIVERSE

The present section is concerned with the methodology for computing population projections for the CPS control universe used in the actual calibration of the survey. Three subsections are devoted to the calculation of (i) the total population, (ii) its distribution by age, sex, race, and Hispanic origin, and (iii) its distribution by state of residence.

Total Population

Projections of the population to the CPS control reference date (the first of each month) are determined by a base population (either estimated or enumerated) and a projection of population change. The latter is generally an amalgam of components measured from administrative data and components determined by projection. The ‘‘balancing equation of population change’’ used to produce estimates and projections of the U.S.
population states the population at some reference date as the sum of the base population and the net of various components of population change. The components of change are the sources of increase or decrease in the population from the reference date of the base population to the date of the estimate. The exact specification of the balancing equation depends on the population universe, and the extent to which components of population change are disaggregated. For the resident population universe, the equation in its simplest form is given by

Rt1 = Rt0 + B - D + NM    (1)

where:

Rt1 = resident population at time t1 (the reference date),
Rt0 = resident population at time t0 (the base date),
B = births to U.S. resident women, from time t0 to time t1,
D = deaths of U.S. residents, from time t0 to time t1,
NM = net migration to the United States (migration into the United States minus migration out of the United States), from time t0 to time t1.

It is essential that the components of change in a balancing equation represent the same population universe as the base population. If we derive the current population controls for the CPS independently, from a base population in the CPS universe, civilian noninstitutional deaths would replace resident deaths, and ‘‘net migration’’ would be confined to civilian migrants but would incorporate net recruits to the Armed Forces and net admissions to the institutional population. The equation would thus appear as follows:

CNPt1 = CNPt0 + B - CND + (NCM - NRAF - NEI)    (2)

where:

CNPt1 = civilian noninstitutional population at time t1,
CNPt0 = civilian noninstitutional population at time t0,
B = births to the U.S. resident population, time t0 to time t1,
CND = deaths of the civilian noninstitutional population, time t0 to time t1,
NCM = net civilian migration, time t0 to time t1,
NRAF = net recruits (inductions minus discharges) to the U.S. Armed Forces from the domestic civilian population, from time t0 to time t1,
NEI = net admissions (enrollments minus discharges) to institutions, from time t0 to time t1.

In the actual estimation process, it is not necessary to directly measure the variables CND, NRAF, and NEI in the above equation. Rather, they are derived from the following equations:

CND = D - DRAF - DI    (3)

where D = deaths of the resident population, DRAF = deaths of the resident Armed Forces, and DI = deaths of inmates of institutions (the institutional population).

NRAF = WWAFt1 - WWAFt0 + DRAF + DAFO - NRO    (4)

where WWAFt1 and WWAFt0 represent the U.S. Armed Forces worldwide at times t1 and t0, DRAF represents deaths of the Armed Forces residing in the United States during the interval, DAFO represents deaths of the Armed Forces residing overseas, and NRO represents net recruits (inductions minus discharges) to the Armed Forces from the overseas civilian population.

NEI = CIPt1 - CIPt0 + DI    (5)

where CIPt1 and CIPt0 represent the civilian institutional population at times t1 and t0, and DI represents deaths of inmates of institutions during the interval.

The appearance of DI in both equations (3) and (5) with opposite sign, and DRAF in equations (3) and (4) with opposite sign, ensures that they will cancel each other when applied to equation (2). This fact obviates the need to measure institutional inmate deaths or deaths of the Armed Forces residing in the United States. A further disaggregation, which facilitates the description of how the components on the right side of equation (2) are measured, is as follows:

NCM = IM + CCM    (6)

where IM = net international migration, and CCM = net movement of civilian citizens to the United States from abroad, in the time interval (t0, t1).

At this point, we restate equation (2) to incorporate equations (3), (4), (5), and (6), and simplify.
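The claimed cancellation can be verified numerically. Below is a minimal sketch with entirely hypothetical component values: CND, NRAF, and NEI are built from equations (3) through (6) and applied through equation (2), and the result is compared with the simplified form given in equation (7); DI and DRAF drop out, so the two totals agree for any values of those two variables.

```python
# Numerical check that equations (3)-(6), substituted into equation (2),
# reproduce equation (7): DI and DRAF cancel. All values are hypothetical.

# Base population and components of change (time t0 to t1)
CNP_t0 = 250_000_000   # civilian noninstitutional population at t0
B = 330_000            # resident births
D = 190_000            # resident deaths
IM = 70_000            # net international migration
CCM = 5_000            # net movement of civilian citizens from abroad
WWAF_t0, WWAF_t1 = 1_450_000, 1_440_000   # worldwide Armed Forces
DRAF = 120             # deaths of resident Armed Forces
DAFO = 30              # deaths of Armed Forces residing overseas
NRO = 400              # net recruits from the overseas civilian population
CIP_t0, CIP_t1 = 3_900_000, 3_910_000     # civilian institutional population
DI = 25_000            # deaths of institutional inmates

# Equations (3)-(6)
CND = D - DRAF - DI                              # (3)
NRAF = WWAF_t1 - WWAF_t0 + DRAF + DAFO - NRO     # (4)
NEI = CIP_t1 - CIP_t0 + DI                       # (5)
NCM = IM + CCM                                   # (6)

# Equation (2)
cnp_eq2 = CNP_t0 + B - CND + (NCM - NRAF - NEI)

# Equation (7): DI and DRAF no longer appear
cnp_eq7 = (CNP_t0 + B - D + (IM + CCM)
           - (WWAF_t1 - WWAF_t0 + DAFO - NRO)
           - (CIP_t1 - CIP_t0))

assert cnp_eq2 == cnp_eq7
```

Changing DI or DRAF alters CND, NRAF, and NEI individually, but leaves both totals unchanged, which is the point of the derivation.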
The resulting equation is the operational equation for population change in the civilian noninstitutional population universe.

CNPt1 = CNPt0 + B - D + (IM + CCM) - (WWAFt1 - WWAFt0 + DAFO - NRO) - (CIPt1 - CIPt0)    (7)

Equation (7) identifies all the components that must be separately measured or projected to update the total civilian noninstitutional population from a base date to a later reference date. Aside from forming the procedural basis for all estimates and projections of total population (in whatever universe), balancing equations (1), (2), and (7) also define a recursive concept of internal consistency of a population series within a population universe. We consider a time series of population estimates or projections to be internally consistent if any population figure in the series can be derived from any earlier population figure as the sum of the earlier population and the components of population change for the interval between the two reference dates.

Measuring the Components of the Balancing Equation

A balancing equation such as equation (7) requires a base population, whether estimated or enumerated, and information on the various components of population change. Because a principal objective of the population estimating procedure is to uphold the integrity of the population universe, the various inputs to the balancing equation need to agree as much as possible with respect to geographic scope, treatment of underenumeration of population or underregistration of events, and residential definition; this is a major aspect of the way they are measured. A description of the individual inputs to equation (7), which applies to the CPS control universe, and their data sources follows.

The census base population (CNPt0).
While any base population (CNPt0 in equation (7)), whether a count or an estimate, can give rise to estimates and projections for later dates, the original base population for all unadjusted postcensal estimates and projections is the enumerated population from the last census. Although survey controls are currently adjusted for underenumeration in the 1990 census, the census population is itself not adjusted for underenumeration in the census; rather, an adjustment difference is added in a later step. After the 2000 census, it is expected that official estimates will be adjusted for undercoverage, so that the adjustment can be incorporated in the base population. However, the census base population does include minor revisions arising from count resolution corrections. As these corrections arise entirely from the tabulation of data from the enumerated population, they are unrelated to adjustment for underenumeration. As of the end of 1994, count resolution corrections incorporated in the estimates base population amounted to a gain of 8,418 persons from the originally published count.

In order to apply equation (7), which is derived from the balancing equation for the civilian noninstitutional population, the resident base population enumerated by the census must be transformed into a base population consistent with this universe. The transformation is given as follows:

CNP0 = R0 - RAF0 - CIP0    (8)

where:

R0 = the enumerated resident population,
RAF0 = the Armed Forces resident in the United States on the census date,
CIP0 = the civilian institutional population residing in the United States on the census date.

This is consistent with previous notation, but with the stipulation that t0 = 0, since the base date is the census date, the starting point of the series. The transformation can also be used to adapt any postcensal estimate of resident population to the CPS universe for use as a population base in equation (7).
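Equations (8) and (7), and the internal-consistency property defined earlier, can be sketched together in a few lines. All names and figures below are hypothetical illustrations, not CPS or census data: equation (8) derives a civilian noninstitutional base from a resident population, equation (7) carries it forward, and the final assertion shows internal consistency, since summing the flow components and chaining the stock components (WWAF, CIP) over two intervals reproduces the stepwise result.

```python
# Sketch of equations (8) and (7). All figures are hypothetical.
from dataclasses import dataclass

def census_base(r0: int, raf0: int, cip0: int) -> int:
    """Equation (8): CNP0 = R0 - RAF0 - CIP0."""
    return r0 - raf0 - cip0

@dataclass
class Change:
    B: int      # resident births over the interval
    D: int      # resident deaths over the interval
    IM: int     # net international migration
    CCM: int    # net movement of civilian citizens from abroad
    DAFO: int   # deaths of Armed Forces residing overseas
    NRO: int    # net recruits from the overseas civilian population
    WWAF0: int  # worldwide Armed Forces, start of interval
    WWAF1: int  # worldwide Armed Forces, end of interval
    CIP0: int   # civilian institutional population, start of interval
    CIP1: int   # civilian institutional population, end of interval

def update_cnp(cnp0: int, c: Change) -> int:
    """Equation (7): update the civilian noninstitutional population."""
    return (cnp0 + c.B - c.D + (c.IM + c.CCM)
            - (c.WWAF1 - c.WWAF0 + c.DAFO - c.NRO)
            - (c.CIP1 - c.CIP0))

base = census_base(r0=255_300_000, raf0=1_400_000, cip0=3_900_000)

month1 = Change(B=330_000, D=190_000, IM=70_000, CCM=5_000, DAFO=30, NRO=400,
                WWAF0=1_450_000, WWAF1=1_440_000, CIP0=3_900_000, CIP1=3_910_000)
month2 = Change(B=300_000, D=200_000, IM=65_000, CCM=4_000, DAFO=25, NRO=350,
                WWAF0=1_440_000, WWAF1=1_435_000, CIP0=3_910_000, CIP1=3_905_000)

stepwise = update_cnp(update_cnp(base, month1), month2)

# Internal consistency: sum the flows, chain the stocks, update once.
combined = Change(B=month1.B + month2.B, D=month1.D + month2.D,
                  IM=month1.IM + month2.IM, CCM=month1.CCM + month2.CCM,
                  DAFO=month1.DAFO + month2.DAFO, NRO=month1.NRO + month2.NRO,
                  WWAF0=month1.WWAF0, WWAF1=month2.WWAF1,
                  CIP0=month1.CIP0, CIP1=month2.CIP1)
assert update_cnp(base, combined) == stepwise
```

Because the worldwide Armed Forces and institutional populations enter as end-of-interval minus start-of-interval stocks, chaining intervals simply telescopes those differences, which is what makes the series recursive in the sense described above.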
In practice, this transformation of a postcensal resident estimate is what occurs when the monthly CPS population controls are produced. Adjustment for underenumeration, while essential to the CPS universe, is handled independently of the process of producing projections; it is added after the population is updated. Hence, adjustment will be discussed later in this appendix.

Births and deaths (B, D).

In estimating total births and deaths of the resident population (B and D in equation (7)), we assume the population universe for vital statistics to match the population universe for the census. If we define the vital statistics universe to be the population subject to the natural risk of giving birth or dying, and having the event recorded by vital registration systems, the assumption implies the match of this universe with the census-level resident population universe. We relax this assumption in the estimation of some characteristic detail, as will be discussed later in the appendix.

The numbers of births and deaths of U.S. residents are supplied by the National Center for Health Statistics (NCHS). These are based on reports to NCHS from individual state and local registries; the fundamental unit of reporting is the individual birth or death certificate. For years in which reporting is considered final by NCHS, the birth and death statistics are considered final; these generally cover the period from the census until the end of the calendar year 3 years prior to the year of the CPS series. For example, the last final birth and death statistics available in time for the 1998 CPS control series were for calendar year 1995. Final birth and death data are summarized in NCHS publications (see Ventura et al., 1997a; Anderson et al., 1997). Monthly births and deaths for calendar year(s) up to 2 years before the CPS reference dates (e.g., 1996 for 1998 controls) are based on provisional estimates by NCHS (Ventura et al., 1997b).
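The vintage schedule just described lends itself to a simple rule. The sketch below is ours, not an official procedure; the function name is invented, and it collapses the ‘‘year(s)’’ of provisional data to a single year for simplicity, following the 1998 example in the text.

```python
# Rough rule for which vital-statistics source feeds a given calendar year,
# following the schedule in the text (1998 controls: final through 1995,
# provisional for 1996, projected thereafter). Simplified sketch: the
# "year(s)" of provisional data are collapsed to a single year.

def vital_statistics_source(data_year: int, cps_series_year: int) -> str:
    """Classify the source of birth/death figures for one calendar year."""
    if data_year <= cps_series_year - 3:
        return "final"        # NCHS registration data, reporting complete
    if data_year == cps_series_year - 2:
        return "provisional"  # NCHS provisional estimates
    return "projected"        # rates applied to current population estimates

assert vital_statistics_source(1995, 1998) == "final"
assert vital_statistics_source(1996, 1998) == "provisional"
assert vital_statistics_source(1997, 1998) == "projected"
```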
For the year before the CPS reference date through the last month before the CPS reference date, births and deaths are projected from the population by age according to current estimates, age-specific rates of fertility and mortality, and the seasonal distributions of births and deaths observed in the preceding year.

Various concerns exist relative to the consistency of the NCHS vital statistics universe with the census resident universe and the CPS control universe derived from it.

1. Births and deaths can be missed entirely by the registration system. In the accounting of the national population without adjustment for underenumeration, underregistration of vital events is tacitly assumed to match underenumeration in the census. While we have found no formal investigation of the underregistration of deaths, we surmise that it is likely to be far less than the rate of underenumeration in the census because of the requisite role of local public agencies in the processing of bodies for cremation or burial. A test of the completeness of birth registration conducted by the Census Bureau in 1964-1968, applied to the distribution of 1992 births by race and whether born in hospital, implied an underregistration of 0.7 percent, far less than most estimates of underenumeration in recent censuses, but probably greater than the underregistration of deaths.

2. Birth and death statistics obtained from NCHS exclude those events occurring to U.S. residents while outside the United States. The resulting slight downward biases in the numbers of births and deaths would be partially compensatory.

These two sources of bias, which are assumed to be minor, affect the accounting of total population. Other, more serious comparability problems exist in the consistency of NCHS vital statistics with the CPS control universe, with respect to race, Hispanic origin, and place of residence within the United States. These will be discussed in later sections of this appendix.
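The projection step at the start of this subsection, applying age-specific rates to current population estimates and spreading the result over months with the preceding year's seasonal pattern, can be sketched as follows. Every number here (age groups, fertility rates, seasonal shares) is hypothetical and chosen only to illustrate the mechanics for births; deaths would be handled analogously with age-specific mortality rates.

```python
# Sketch of projecting monthly births for the pre-data period: apply
# age-specific fertility rates to the current female population estimate,
# then spread the annual total across months using last year's observed
# seasonal distribution. All rates, populations, and shares are hypothetical.

# Hypothetical female population by age group, with annual fertility rates
# (births per woman per year) for the same groups.
female_pop = {"15-24": 18_000_000, "25-34": 20_500_000, "35-44": 21_000_000}
fertility_rate = {"15-24": 0.070, "25-34": 0.105, "35-44": 0.020}

# Hypothetical share of last year's births observed in each month.
seasonal_share = {m: 1 / 12 for m in range(1, 13)}   # flat, for illustration
seasonal_share[7], seasonal_share[8] = 0.090, 0.092  # a modest summer peak
total = sum(seasonal_share.values())
seasonal_share = {m: s / total for m, s in seasonal_share.items()}  # renormalize

annual_births = sum(female_pop[g] * fertility_rate[g] for g in female_pop)

def projected_births(month: int) -> float:
    """Projected births for one month of the pre-data period."""
    return annual_births * seasonal_share[month]

# The monthly projections sum back to the annual total by construction.
assert abs(sum(projected_births(m) for m in range(1, 13)) - annual_births) < 1e-3
```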
International migration (IM).

The objective of the current procedures to estimate international migration (IM in equation (7)) is to transform the number of legal immigrants—for whom records of migration are available—into an estimate of the net of persons who become ‘‘usual residents’’ of the United States according to the census residency definition, and those who cease to be usual residents because they move out of the United States. This objective is met primarily by the availability of administrative data from Federal agencies. However, it is beset by five fundamental problems.

1. A substantial number of foreign-born persons increase the resident population each year, either by overstaying legal, nonimmigrant visas or by entering the country without inspection. Those who have not returned to their country of origin (either voluntarily or by force of law) by a CPS date are eligible for interview by the CPS and must be included in the CPS controls.

2. Because the geographic limits of the census universe do not include Puerto Rico or outlying areas under U.S. jurisdiction, persons who enter or leave the country from or to these areas must be treated as international migrants. However, migrants to or from these areas are generally U.S. citizens and need not produce any administrative record of their moves.

3. Legal residents of the United States departing for residence abroad are not required to provide any administrative record of departure. Hence, there exist no current data on emigration.

4. Some persons who enter the country legally, but temporarily (e.g., foreign students, scholars, and members of certain professions), are eligible for enumeration in the census as usual residents and should be included in population estimates. While administrative records exist for the arrival of such persons, there is no adequate source of data for their departures or their lengths of stay.

5.
The census definition of usual residence is subjective and depends on the interpretation of census respondents. Immigration data generally assume a legal, rather than a subjective, concept of residence (e.g., legal permanent resident, nonresident alien) that does not match the census concept. Thus, an alien could be legally resident in the United States, but perceive his or her usual residence to be elsewhere when responding to a census or survey. For purposes of population accounting, migration to the United States is assumed to include persons arriving in the United States who, based on their arrival status, appear to be new census-defined residents, meaning they would report their usual residence in a census as being inside the United States. The accounting should thus include all legal permanent residents, refugees, and undocumented migrants who do not return and are not deported. It should also include arrivals (such as foreign students and foreign scholars) who generally maintain steady residence in the United States for the duration of their stay and would, therefore, be enumerated in a census. Tourists and business travelers from abroad are assumed not to be U.S. residents and are, therefore, excluded from the international migration tally.

The core data source for the estimation of international migration is the Immigration and Naturalization Service (INS) public use immigrant file, which is issued yearly (with the year defined by the Federal fiscal year). This file contains records for citizens of foreign countries immigrating or establishing legal permanent residence in the United States and accounts for the majority of international migration to the United States. Data from the immigrant file are summarized by the INS in its statistical yearbooks (for example, U.S. Immigration and Naturalization Service, 1997). However, the file contains no information on emigration, undocumented immigration, or nonimmigrant moves into the country.
Moreover, immigrants do not always change residence at time of immigration; they may in fact already reside in the United States. INS immigrant data are, therefore, partitioned into four categories based on INS class of admission, and each category is treated differently. The first category consists of all persons classed as ‘‘new arrivals,’’ meaning their last legal entry into the U.S. coincided with their immigration or acquisition of legal permanent resident status. These persons are simply included in the migration component for the month and year of their immigration. The second category consists of persons already resident in the United States who are adjusting their status from nonimmigrant (e.g., temporary resident) to immigrant and who are not refugees. They include a large component of persons who are nonimmigrant spouses or children of U.S. citizens or legal permanent residents. We cannot account for these persons at their time of arrival through current data. We could account for these persons retrospectively at the earlier year of their arrival from INS records. However, this would result in a serious downward bias for the most recent years, since we would miss similar persons, currently entering the United States, who will immigrate in future years. For this reason, we accept the number of adjustees in this category as a proxy for the number of future adjustees physically entering the country in the current year. This assumption is robust over time provided the number of persons in this category remains stable from year to year.³ A third category consists of persons who entered the country as refugees and are currently adjusting their status to immigrant. The assumption of year-to-year stability of the flow, made for the previous category, would generally be inadequate for refugees because they tend to enter the country in ‘‘waves’’ depending on emigration policies or political upheavals in foreign countries.
Hence, the time series of their entry would be poorly reflected by the time series of their conversion to immigrant status. The Office of Refugee Resettlement (ORR) maintains monthly data on the arrival of refugees (as well as some special-status entrants from Haiti and Cuba) by country of citizenship. Refugees adjusting to immigrant status are thus excluded from the accounting of immigrants but included as refugees at their time of arrival based on the ORR series. The fourth and final category consists of persons living in the country illegally—either as nonimmigrant visa overstayers or ‘‘entered without inspection’’—who adjust to legal immigrant status by proving a history of continuous residence in the United States. Since fiscal year 1989, the largest portion of these persons has been admitted under the Immigration Reform and Control Act of 1986 (IRCA), which required proof of permanent residence since 1982. Generally, these persons should qualify as census-defined residents from the date of their arrival, since they do not generally expect to return to their country of origin. Because their arrival occurred before the 1990 census and current undocumented migration is separately accounted for, immigrating IRCA adjustees are excluded from the accounting of migration altogether for post-1990 estimates.

The separate accounting of net undocumented migration cannot be based on registration data because, by definition, it occurs without registration. Research conducted at the Census Bureau (Robinson, 1994) has produced an allowance of 225,000 net migration per year, based on observation of the late 1980s, with the universe confined (appropriately) to those counted in the 1990 census. This number is a consensus estimate, based partly on a specific derivation and partly on a review of other estimates for the same period. The derivation is based on a residual method, according to which the number of foreign-born persons enumerated in 1990 by period of arrival is compared to net legal migration measured for the same period. Currently, all Census Bureau estimates and projections of the population after 1990 incorporate this annual allowance as a constant.

The remaining category of foreign-born migration to the United States, not included in the immigrant data, is the flow of legal temporary residents: persons who reside in the country long enough to consider themselves ‘‘usual residents’’ while they are here, but who do not have immigrant visas. These include foreign students, scholars, some business persons (those who establish residence), and some professionals who are provided a special allowance to work. INS data on nonimmigrants would provide records of admission for temporary visa holders; however, there would be no reliable data source for their departures. The stock of this category of foreign-born persons enumerated in the 1990 census was estimated at 488,000, based on various characteristics measured by the census ‘‘long form’’ (1 in 6 households). The migration flow is currently assumed to maintain this stock at 488,000; that is, net migration (arrivals minus departures) equals the estimated number of deaths occurring to the group while in the United States.

³ In fiscal year 1995, the assumption of a stable flow of nonimmigrants adjusting to immigrant status was seriously undermined by a legal change. For the first time, persons qualifying for legal immigrant status were allowed to adjust without leaving the country. As a result, a rapid increase in applications to INS resulted in an increase in the backlog of unprocessed applications. It was, therefore, necessary to adapt the accounting for the increase in applications, rather than assuming a one-for-one relationship of current adjustees to future adjustees currently entering the country.
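The constant-stock assumption for legal temporary residents can be made concrete with a minimal sketch: holding the stock fixed in the balancing equation stock_t1 = stock_t0 + net migration − deaths forces net migration to equal deaths. The death figure below is illustrative, not an actual estimate.

```python
TEMP_RESIDENT_STOCK = 488_000  # stock assumed constant after the 1990 census

def implied_net_migration(deaths_in_period):
    """Net migration (arrivals minus departures) implied by a constant
    stock: stock_t1 - stock_t0 + deaths, with stock_t1 == stock_t0."""
    stock_t0 = TEMP_RESIDENT_STOCK
    stock_t1 = TEMP_RESIDENT_STOCK  # constant-stock assumption
    return stock_t1 - stock_t0 + deaths_in_period

net = implied_net_migration(deaths_in_period=600)  # hypothetical deaths
```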
The overall net migration of this group is rather trivial (less than a thousand per year); far more important is the implied distribution of migration by age, which will be discussed later in this appendix. A major migratory movement across the U.S. frontier, not directly measurable with administrative data, is the emigration of legal permanent residents of the United States. The current projection is a constant 222,000 per year, of whom 195,000 are foreign-born (Ahmed and Robinson, 1994). Like the allowance for undocumented immigration, the method of derivation is a residual method. In this case, the comparison is between foreign-born persons enumerated in the 1980 census and foreign-born persons enumerated in the 1990 census who gave a year of arrival prior to 1980. In theory, the latter number would be smaller, with the difference attributed to either death or emigration. The number of deaths was estimated through life tables, leaving emigration as a residual. This analysis was carried out by country of birth, facilitating detailed analysis of possible biases arising from differential reporting in the two censuses. The annual estimate of 195,000 was projected forward as a constant, for purposes of producing post-1990 estimates and projections. The remaining annual emigrant allowance of 27,000 is based on an estimate of native-born emigrants carried out during the early 1980s, which employed data on U.S.-born persons enumerated in foreign censuses, as well as historical evidence from the period before 1968, when permanent departures from the United States were registered by INS. A final class of international migration that can only be estimated roughly, as a net flow, is the movement of persons between the United States and Puerto Rico and the outlying areas. No definitive administrative data exist for measuring the volume, direction, or balance of these migration flows. The migratory balance from all areas except Puerto Rico is assumed to be zero.
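The residual derivation behind the 195,000 foreign-born emigration figure can be sketched as follows; the inputs are hypothetical stand-ins, not the actual census tabulations, and the real analysis was carried out by country of birth.

```python
def annual_residual_emigration(fb_1980, fb_1990_pre1980, expected_deaths,
                               years=10):
    """Residual method: the foreign-born counted in 1980, minus life-table
    deaths over the decade, should equal the foreign-born counted in 1990
    who arrived before 1980; the shortfall is attributed to emigration."""
    expected_survivors = fb_1980 - expected_deaths
    decade_emigration = expected_survivors - fb_1990_pre1980
    return decade_emigration / years

# Hypothetical: 14.1 million foreign-born in 1980, 11.0 million pre-1980
# arrivals counted in 1990, 1.2 million expected deaths in between.
annual = annual_residual_emigration(14_100_000, 11_000_000, 1_200_000)
```

The Puerto Rico allowance described below rests on the same residual logic, with natural increase compared to intercensal population change.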
For Puerto Rico, we assume an allowance of roughly 7,000 net migration (arrivals minus departures) per year. This projection is based on an imputation of net out-migration from Puerto Rico during the 1980s, derived from a comparison of the island’s natural increase (births minus deaths) with the population change between the 1980 and 1990 censuses. There is considerable variation in the currency of data on international migration, depending on the source. The INS public use immigrant file is generally available during the summer following the fiscal year of its currency, so projections for control of the CPS have access to final immigration data through September, 2 years prior to the reference date of the projections (e.g., 1995 for 1997 controls). INS provides a provisional monthly series for the period from October through June of the following year, that is, through the middle of the calendar year before the CPS reference date. From that point until the CPS reference date itself, the immigrant series is projected. The Office of Refugee Resettlement data on refugees follow roughly the same timetable, except that the preliminary series is ongoing and is generally current through 4 to 5 months before the CPS reference date. Net undocumented immigration, emigration of legal residents, and net migration from Puerto Rico are projected as constants from the last decennial census, as no current data are available. The determination of the base series for the projection requires considerable research, so the introduction of data for a new decade generally does not occur until about the middle of the following decade.

Net migration of Federally affiliated civilian U.S. citizens (CCM). While the approach to estimating international migration is to measure the migration flows for various types of migration, no data exist on flows for civilian U.S.
citizens between the United States and abroad (except with respect to Puerto Rico, which is treated here as international); this is the variable CCM in equation (7). Because U.S. citizens are generally not required to report changes of address to or from other countries, there is no registration system from which to obtain data. Fortunately, there are data sources for measuring the current number (or stock) of some classes of civilian U.S. citizens residing overseas, including persons affiliated with the Federal government, and civilian dependents of Department of Defense employees, both military and civilian. The estimation procedure used for the movement of U.S. citizens thus relies on an analysis of the change in the stock of overseas U.S. citizens over time. We assume the net migration of non-Federally affiliated U.S. citizens to be zero. Imputation of migration from stock data on the overseas population is by the following formula:

CCM = -(OCP_{t1} - OCP_{t0} - OB + OD)   (9)

where OCP_{t1} and OCP_{t0} represent the overseas civilian citizen population (the portion that can be estimated) at the beginning of two consecutive months, OB is the number of births, and OD the number of deaths of this population during the month. This equation derives from the balancing equation of population change and states that the net migration to the overseas civilian population equals its total change minus its natural increase. Federally affiliated civilian U.S. citizens residing overseas for whom we have data, or the basis for an estimate, comprise three categories of persons: dependents of Department of Defense (DOD) personnel, civilian Federal employees, and dependents of non-DOD civilian Federal employees. The number of civilian dependents of DOD military and civilian personnel overseas is published quarterly by DOD; the number of Federal employees—total and DOD civilian—located overseas at the beginning of each month is published by the Office of Personnel Management (OPM).
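Equation (9), together with the interpolation of quarterly stocks to months, can be sketched as below. The infant mortality rate is an illustrative stand-in for the life-table rate actually applied, and all inputs are hypothetical.

```python
def monthly_from_quarterly(quarter_start, quarter_end, month_index):
    """Linearly interpolate a quarterly stock to month month_index (0-3)."""
    return quarter_start + (quarter_end - quarter_start) * month_index / 3

def ccm(ocp_t0, ocp_t1, overseas_births, infant_mortality_rate=0.007):
    """Equation (9): CCM = -(OCP_t1 - OCP_t0 - OB + OD), with deaths at
    ages 1 and over assumed nil and infant deaths imputed from births."""
    overseas_deaths = round(overseas_births * infant_mortality_rate)
    return -(ocp_t1 - ocp_t0 - overseas_births + overseas_deaths)

# Hypothetical month: the overseas stock falls by 2,000 while 1,000 births
# (and 7 imputed infant deaths) occur, implying net movement of civilian
# citizens back to the United States.
net_to_us = ccm(ocp_t0=500_000, ocp_t1=498_000, overseas_births=1_000)
```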
We estimate the number of dependents of non-DOD civilian Federal employees by assuming the ratio of dependents to employees for non-DOD personnel to match the ratio for civilian DOD personnel. Quarterly data (e.g., DOD dependents) are converted to monthly by linear interpolation. Estimates of the numbers of births and deaths of this population (OB and OD in equation (9)) depend almost entirely on a single statistic, published quarterly by DOD: the number of births occurring in United States military hospitals overseas. This number (apportioned to months by linear interpolation) is adopted as an estimate of overseas births, on the reasoning that most births to Federally affiliated citizens or their dependents would occur in military hospitals. We estimate deaths by assuming deaths of persons 1 year of age or older to be nil; we estimate infant deaths by applying a life table mortality rate to the number of births. The likely underestimates of births and of deaths would tend to offset each other. While this method of estimating the natural increase of overseas civilian citizens is very approximate, the numbers involved are very small, so the error is unlikely to have a serious effect on estimates of civilian citizen migration. Data from DOD and OPM used to estimate civilian citizen migration are generally available for reference dates until 6 to 9 months prior to the CPS reference date. For the months not covered by data, the estimates rely on projected levels of the components of the overseas population and overseas births.

Net recruits to the Armed Forces from the civilian population (NRAF). The net recruits to the worldwide U.S. Armed Forces from the U.S. resident civilian population is given by the expression (WWAF_{t1} - WWAF_{t0} + DRAF + DAFO - NRO) in equation (4). The first two terms represent the change in the number of Armed Forces personnel worldwide. The third and fourth represent deaths of the Armed Forces in the United States and overseas, respectively.
The fifth term represents net recruits to the Armed Forces from the overseas civilian population. While this procedure is indirect, it allows us to rely on data sources that are consistent with our estimates of the Armed Forces population on the base date. Most of the information required to estimate the components of this expression is supplied directly by the Department of Defense, generally through a date 1 month prior to the CPS control reference date; the last month is projected, for various detailed subcomponents of military strength. The military personnel strength of the worldwide Armed Forces, by branch of service, is supplied by personnel offices in the Army, Navy, Marines, and Air Force, and the Defense Manpower Data Center supplies total strength figures for the Coast Guard. Participants in various reserve forces training programs (all reserve forces on 3- and 6-month active duty for training, National Guard reserve forces on active duty for 4 months or more, and students at military academies) are treated as active-duty military for purposes of estimating this component and all other applications related to the CPS control universe, although the Department of Defense would not consider them to be on active duty. The last three components of net recruits to the Armed Forces from the U.S. civilian population—deaths of the resident Armed Forces (DRAF), deaths of the Armed Forces overseas (DAFO), and net recruits to the Armed Forces from overseas (NRO)—are usually very small and require indirect inference. Four of the five branches of the Armed Forces (those in DOD) supply monthly statistics on the number of deaths within each service. Normally, deaths are apportioned to the domestic and overseas components of the Armed Forces in proportion to the numbers of domestic and overseas military personnel.
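The net-recruits expression and the pro rata apportionment of deaths can be sketched with hypothetical strengths and death counts:

```python
def apportion_deaths(total_deaths, domestic_strength, overseas_strength):
    """Split service deaths pro rata between the domestic and overseas
    components of the Armed Forces; returns (DRAF, DAFO)."""
    domestic_share = domestic_strength / (domestic_strength + overseas_strength)
    draf = total_deaths * domestic_share
    return draf, total_deaths - draf

def net_recruits(wwaf_t0, wwaf_t1, total_deaths, domestic, overseas, nro):
    """NRAF = (WWAF_t1 - WWAF_t0) + DRAF + DAFO - NRO, from equation (4)."""
    draf, dafo = apportion_deaths(total_deaths, domestic, overseas)
    return (wwaf_t1 - wwaf_t0) + draf + dafo - nro

# Hypothetical month: worldwide strength falls by 1,000, with 80 deaths
# and 20 net recruits from overseas.
nraf = net_recruits(1_500_000, 1_499_000, 80, 1_200_000, 300_000, 20)
```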
If a major fatal incident occurs for which an account of the number of deaths is available, these deaths are assigned to the domestic or overseas component, as appropriate, before application of the pro rata assignment. Lastly, the number of net recruits to the Armed Forces from the overseas population (NRO) is computed annually, for years running from July 1 to July 1, as the difference between successive numbers of Armed Forces personnel giving a ‘‘home of record’’ outside the 50 states and District of Columbia. To complete the monthly series, the ratio of persons with home of record outside the U.S. to the worldwide military is interpolated linearly, or extrapolated as a constant from the last July 1 to the CPS reference date. These monthly ratios are applied to monthly worldwide Armed Forces strengths; successive differences of the resulting estimates of persons with home of record outside the U.S. yield the series for net recruits to the Armed Forces from overseas.

Change in the institutional population (CIP_{t1} - CIP_{t0}). The change in the civilian population residing in institutions is measured by a report of selected group quarters facilities carried out by the Census Bureau, in conjunction with state governments, through the Federal State Cooperative Program for Population Estimates (FSCPE). Information is collected on the type of facility and the number of inhabitants, and these data are tabulated by institutional and noninstitutional status. The change in the institutional population from the census date to the reference date of the population estimates, measured by the group quarters report, is used to update the institutional population enumerated by the census. Certain facilities, such as military stockades and resident hospitals on military bases, are excluded from the institutional segment, as most of their inhabitants would be military personnel; hence, the resulting institutional population is assumed to be civilian.
The last available group quarters data refer to July 1, 2 years prior to the reference year of the CPS series (July 1, 1995, for 1997 CPS controls). The institutional population from this date forward must be projected. Census-based institutional participation rates are computed for the civilian population by type of institution, age, sex, race, and Hispanic origin. These are used to produce estimates for the group quarters report dates, which are then proportionally adjusted to sum to the estimated totals from the report. Participation rates are recomputed, and the resulting rates, when applied to the monthly estimate series for the civilian population by characteristic, produce projections of the civilian institutional population. These estimates produce the last two terms on the right side of equation (7).

Population by Age, Sex, Race, and Hispanic Origin

The CPS second-stage weighting process requires the independent population controls to be disaggregated by age, sex, race, and Hispanic origin. The weighting process, as currently constituted, requires cross-categories of age group by sex and by race, and age group by sex and by Hispanic origin, with the number of age groups varying by race and Hispanic origin. Three categories of race (White, Black, and all Other⁴) and two classes of ethnic origin (Hispanic, non-Hispanic) are required, with no cross-classifications of race and Hispanic origin.

⁴ Other includes American Indian, Eskimo, Aleut, Asian, and Pacific Islander.

Beginning in 1993, the population projection program adopted the full cross-classification of age by sex and by race and by Hispanic origin into its
monthly series, with 101 single-year age categories (single years from 0 to 99, and 100 and over), two sex categories (male, female), four race categories (White; Black; American Indian, Eskimo, and Aleut; and Asian and Pacific Islander), and two Hispanic origin categories (not Hispanic, Hispanic).⁵ The resulting matrix has 1,616 cells (101 x 2 x 4 x 2), which are then aggregated to the distributions used as controls for the CPS. In discussing the distribution of the projected population by characteristic, we will stipulate the existence of a base population and components of change, each having a distribution by age, sex, race, and Hispanic origin. In the case of the components of change, ‘‘age’’ is understood to be age at last birthday as of the estimate date. The present section on the method of updating the base population with components of change is followed by a section on how the base population and component distributions are measured.

⁵ Throughout this appendix, ‘‘American Indian’’ refers to the aggregate of American Indian, Eskimo, and Aleut; ‘‘API’’ refers to Asian and Pacific Islander.

Update of the Population by Sex, Race, and Hispanic Origin

The full cross-classification of all variables except age amounts to 16 cells, defined by variables not naturally changing over time (2 values of sex by 4 of race by 2 of Hispanic origin). The projection of the population of each of these cells follows the same logic as the projection of the total population and involves the same balancing equations. The procedure, therefore, requires a distribution by sex, race, and Hispanic origin to be available for each component on the right side of equation (7). Similarly, the derivation of the civilian noninstitutional base population from the census resident population follows equation (8).

Update of the Age Distribution: The Inflation-Deflation Method

Having produced population series for 16 cells of sex, race, and Hispanic origin, it remains to distribute each cell to 101 age groups. Age differs fundamentally from other demographic characteristics because it changes over time. The balancing equations described in earlier sections rely on the premise that the defining characteristics of the population being estimated remain fixed; the procedure must therefore be adapted to allow for the fact that age does not meet this criterion. We present the method of estimating the age distribution through the equation for the resident population, analogous to equation (1) for the total population, for reasons discussed at the end of this section. The mathematical logic producing the civilian noninstitutional population (CNP) by age from the resident population, the active-duty resident Armed Forces, and the civilian institutional population is precisely the same for any age group as for the total population, so we omit its discussion. Of course, it is necessary to know the age distribution of the resident Armed Forces population and the civilian institutionalized population in order to derive the CNP by age from the resident population by age.

Under ideal circumstances, a very standard demographic procedure—the method of cohort components—would be employed to update the population age distribution. The cohort component method yields the population of a given birth cohort (persons born in a given year) from the same cohort in an earlier year, incremented or decremented by the appropriate components of change. To take a simple case, we assume the time interval to be 1 year, from the beginning of year t to the beginning of year t+1, and the age distribution being updated is a distribution of the resident population by single year of age. We can derive the number of persons age x (for most values of x) by the equation

R_{x,t+1} = R_{x-1,t} - D_{x,t} + NM_{x,t}   (10)

where:
R_{x,t+1} = resident population aged x (last birthday), at time t+1
R_{x-1,t} = resident population aged x-1, at time t
D_{x,t} = deaths during the time interval from t to t+1, of persons who would have been age x at time t+1
NM_{x,t} = net migration during the time interval from t to t+1, of persons who would be age x at time t+1

For the special case where x=0, equation (10) becomes

R_{0,t+1} = B_t - D_{0,t} + NM_{0,t}   (11)

with variable definitions analogous to equation (10), except that B_t equals live births during the time interval from t to t+1. For the special case where the population estimated comprises an ‘‘open’’ age category, like 100 and over, the equation becomes

R_{X,t+1} = R_{x-1,t} + R_{X,t} - D_{X,t} + NM_{X,t}   (12)

where R_{X,t+1} and R_{X,t} designate persons age x and older, at time t+1 and time t, respectively, and D_{X,t} and NM_{X,t} designate deaths and net migration, respectively, from t to t+1 of persons who would be age x and older at time t+1. Here, R_{x-1,t} is defined as in equation (10).

A fundamental property of this method is its tracking of a birth cohort, or persons born in a given year, to a later date through aging; hence it optimizes the comparability of the population of birth cohorts over time. But, in producing a time series of projections or estimates of a population age distribution, the objective is rather to maintain consistency from month to month in the size of age groups, or the number of persons in a specific, unchanging age range. As a result, the cohort component method of updating age distributions is highly sensitive to discontinuities from age group to age group in the base population. Unfortunately, two attributes of the census base population render the assumption of continuity across age groups untenable. First, underenumeration in the census is highly age-specific. This problem is most egregious for age categories of Black males in the age range from 25 to 40, which represent a ‘‘trough’’ in census coverage.
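The cohort-component update in equations (10) through (12) can be sketched as a single pass over a toy single-year-of-age distribution; the top age is an open-ended category, and all counts are illustrative.

```python
def cohort_component_update(pop, births, deaths, net_mig, top_age):
    """One-year update of a single-year-of-age population. deaths[x] and
    net_mig[x] refer to persons who would be age x at time t+1; top_age
    is an open-ended 'top_age and over' category."""
    new = {0: births - deaths[0] + net_mig[0]}                     # eq. (11)
    for x in range(1, top_age):
        new[x] = pop[x - 1] - deaths[x] + net_mig[x]               # eq. (10)
    # eq. (12): survivors aging into the open group plus those already in it
    new[top_age] = (pop[top_age - 1] + pop[top_age]
                    - deaths[top_age] + net_mig[top_age])
    return new

pop = {0: 100, 1: 90, 2: 200}                  # age 2 means "2 and over"
deaths = {0: 1, 1: 2, 2: 5}
migration = {0: 0, 1: 1, 2: 3}
updated = cohort_component_update(pop, births=110, deaths=deaths,
                                  net_mig=migration, top_age=2)
```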
Census coverage is lowest near the middle of that age range, as is evident from the age pattern of census-based sex ratios within the range. Consequently, consecutive years of age are inconsistent with respect to the proportion of persons missed by the census. The second troublesome attribute of the census base population is the presence of distinct patterns of age reporting preference, as manifested by a general tendency to underreport birth years ending in one and overreport birth years ending in zero. Because the census date falls in the first half of a decade year, this results in an exaggeration of the number of persons with age ending in nine, and an understatement of the number with age ending in eight. Were the cohort component method applied to this distribution without adaptation, these perturbations in the age distribution would progress up the age distribution with the passage of time, impairing the consistency of age groups in the time series. The solution to this problem, first adopted in the 1970s, is an adaptation of the cohort-component method that seeks to prevent the aging of spurious elements of the base population age structure. The resulting method is known as ‘‘inflation-deflation.’’ In principle, the method depends on the existence of an alternative base population distribution by age, sex, race, and origin, for the census date, which is free of age-specific underenumeration and inconsistent age reports, but as consistent as possible with the census in every other aspect of the population universe. Under ideal circumstances, the alternative base distribution would be the population that would have been enumerated by the census, if the census had been free of undercount and age misreporting, although this requirement is neither realistic nor essential. It is essential, however, that single-year-of-age categories be consistent with one another, in the sense that they share the same biases.
The correctness of the overall level of the alternative base population (absence of biases in the total population of all ages) is unimportant. Once such a base population distribution is available, it can be updated, recursively, from the census date to the estimates reference date using the cohort component logic of equations (10), (11), and (12) without concern for bias arising from discontinuities in the population by age. To update a census-level population from one date to another, ‘‘inflation-deflation factors’’ are computed as the ratio of each single-year age group in the census base population to the same age group in the alternative base population. If the alternative population represents a higher degree of coverage than the census base population, the factors can be expected to have values somewhat less than one. If the alternative population is free of age misreporting, then overreported ages in the census may have factors greater than one, while underreported ages will have factors less than one. If, as in the present application, the calculations are done within categories of sex, race, and Hispanic origin, the factors can take on unnatural values if there is relative bias between the census and the alternative population in the reporting of these variables. Such is the case with the current application for ‘‘all other races,’’ the racial category combining American Indian, Eskimo, Aleut, Asian, and Pacific Islanders. This group tends to be more heavily reported in the census than in other administrative data sources. The census-level population at the base date is ‘‘inflated,’’ through multiplication by the reciprocal of the factors, to be consistent with the alternative population base. 
The resulting distribution is updated via cohort component logic (equations (10) through (12)), and the factors are multiplied by the same age groups (not the same birth cohorts) on the reference date, thereby deflating the estimates back to ‘‘census level.’’ Because the inflation-deflation factors are held constant with respect to age rather than birth cohort, those aspects of the base age distribution that should not increase in age from year to year are embodied in the inflation-deflation factors, and those aspects that should indeed ‘‘age’’ over time are embodied in the alternative base distribution. A mathematical representation of this procedure is given by the following equation, analogous to equation (10) for the cohort component method:

R'_{x,t+1} = (R_{x,0} / R^A_{x,0}) [ (R^A_{x-1,0} / R_{x-1,0}) R'_{x-1,t} - D^A_{x,t} + NM^A_{x,t} ]   (13)

where R'_{x,t+1} is the census-level estimate of the population age x at time t+1, R'_{x-1,t} is the same for age x-1 at time t, R^A_{x,0} is the resident population age x in the alternative distribution at the census date, R^A_{x-1,0} is the same for age x-1, and D^A_{x,t} and NM^A_{x,t} are deaths and net migration for the interval beginning at time t, respectively, consistent with the alternative population, age x at time t+1. Note that R_{x,0} / R^A_{x,0} is the inflation-deflation factor for age x. A similar adaptation can be applied to equations (11) and (12) to obtain expressions for estimates for age 0 and the open age category, respectively. The inflation-deflation procedure does not preserve exact additivity of the age groups to the external sex-race-origin total R_t. It is therefore necessary, in order to preserve the balancing equation for the population totals, to introduce a final proportional adjustment. This can be expressed by the formula

R_{x,t} = R'_{x,t} × R_t / Σ_{y=0}^{100} R'_{y,t}   (14)

which effectively ensures the additivity of the resident population estimates R_{x,t} by age (x) to the external total R_t.
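A minimal sketch of the inflation-deflation step in equations (13) and (14) follows, with toy numbers chosen so the arithmetic is exact; the actual computation runs within each of the 16 sex-race-origin cells over 101 single years of age.

```python
def inflation_deflation_step(census_base, alt_base, pop, deaths_a, mig_a,
                             external_total):
    """census_base and alt_base are census and alternative base-date
    populations by age; pop is the census-level series at time t; deaths
    and migration are alternative-consistent, indexed by age at t+1."""
    factor = {x: census_base[x] / alt_base[x] for x in census_base}
    # eq. (13): inflate the cohort at age x-1, age it with alternative-
    # consistent deaths and migration, then deflate at age x
    primed = {x: factor[x] * (pop[x - 1] / factor[x - 1]
                              - deaths_a[x] + mig_a[x])
              for x in census_base if x - 1 in census_base}
    # eq. (14): proportional adjustment so the ages add to the external total
    scale = external_total / sum(primed.values())
    return {x: v * scale for x, v in primed.items()}

census = {0: 50, 1: 100, 2: 100}
alt = {0: 100, 1: 100, 2: 200}          # inflation-deflation factors: 0.5, 1.0, 0.5
pop = {0: 40, 1: 50}
result = inflation_deflation_step(census, alt, pop,
                                  deaths_a={1: 1, 2: 2},
                                  mig_a={1: 0, 2: 0},
                                  external_total=206)
```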
The alternative base population for the census date used in the Bureau’s population estimates program, including the survey control projections, is known as the Demographic Analysis (DA) resident population. Originally devised to measure underenumeration in the census, this population is developed, up to age 65, from an historical series of births, adjusted for underregistration. This series is updated to a population on April 1, 1990, by sex and race using cumulative data on deaths and net migration from date of birth to the census date. For ages 65 and over, the population is based on medicare enrollees, as this population is considered more complete than the census enumeration with respect to coverage and age reporting. This population is assumed to possess internal consistency with respect to age superior to that of the census because births and deaths, which form the mainstay of the estimates of age categories, are based on administrative data series rather than subjective reporting of age from an underenumerated population. Some adaptation of the administrative series was necessary to standardize the registration universe for the time series of births and deaths, which changed as various states were added to the central registration network. For a description of this method, see Fay, Passel, and Robinson (1988). One might reasonably ask why the results of the Post-Enumeration Survey (PES), used to adjust the CPS controls for underenumeration, were not adopted as the alternative population. The reason is that the PES age distribution was based on application of adjustment factors to the census population defined only for large age categories. This method does not address census-based inconsistencies among single years of age within the categories; moreover, it introduces serious new inconsistencies between ages close to the edges of the categories. Hence, its use as an alternative base for inflation-deflation would be inappropriate.
This fact does not impair the appropriateness of PES-based adjustment as a means of addressing undercoverage in the completed survey controls, since the results may be superior in their treatment of other dimensions of the population distribution, such as state of residence, race, origin, or large aggregates of age. Although the DA base population is relatively appropriate for the inflation-deflation procedure, it is not without technical limitations.

1. Because of its dependency on historical vital statistics data, it can only be generated for three categories of race: White, Black, and all other. The distribution of the third category to American Indian and API assumes the distribution of the census base for each age-sex group. Therefore, spurious disturbances in the census age distributions that differ for these two groups remain uncorrected in the DA base population.

2. For the same reason, the two categories of Hispanic origin cannot be generated directly in the DA base. The practical solution was to assume that the proportion Hispanic in each age, sex, and race category matched the 1990 census base population. Because of this assumption, no distinctly Hispanic properties of the age discontinuity in the census population could be reflected in the DA base employed in the estimates.

3. The reliance by the DA procedure on medicare enrollee data for the population 65 years of age and over, while defensible as a basis for adjusting the elderly population, leaves open the possibility of inconsistency in the age distribution around age 65.

4. The DA procedure assumes internally consistent registration of deaths and an accounting of net migration in the historical series, although adaptation had to be made for the changing number of states reporting registered births and deaths to the Federal government. Deviations from this assumption could affect the internal consistency of the age distribution, especially for the older ages under 65.

5.
Because the DA population is based primarily on vital registration data, race reporting does not always match race reporting in the census. In particular, births and deaths reported to NCHS are less likely to be coded American Indian and Asian and Pacific Islander than respondents in the census. While this has no direct effect on estimated distributions of population by race (inflation-deflation only affects age within sex-race totals), the population defining the ‘‘ageable’’ attributes of the base age distribution for these groups is smaller (roughly 15 percent for most ages) than the base used to compute sex-race-origin-specific population totals.

6. The inflation-deflation method, as currently implemented, assumes current deaths and all the components of migration to match in the DA universe and the census base universe by sex, race, and origin. This amounts to saying that D^A_{x,t} and NM^A_{x,t} in equation (13) sum to D and NM in equation (1), respectively. In the case of births, we assume births adjusted for underregistration to be consistent with the DA population, and births without adjustment to be consistent with the census-level population. In principle, the application of components of change to a DA-consistent population in equation (13) roughly parallels the derivation of the DA population.6 However, the components used to update the census-level population in equations (1) or (7) should theoretically exclude the birth, death, or migration of persons who would not be enumerated, if the census-level character of the estimates universe is to be maintained.

6 One deviation from this rule occurs in the treatment of births by race. The DA population defines race of child by race of father; the postcensal update currently assigns race of child by race of mother. This practice will likely be modified in the near future. A second deviation occurs in the accounting of undocumented immigration and the emigration of legal residents. As of 1995, the postcensal estimates had incorporated revised assumptions, which had not been reflected in the DA base population; moreover, the revised assumptions were conspicuously ‘‘census-level,’’ rather than ‘‘DA-level’’ assumptions.

Finally, we note the DA population can be estimated only for the resident population universe, so the process of estimating age distributions must be carried out on the resident population. The accounting of total population, on the other hand, is for the civilian noninstitutional universe. The resident universe must then be adapted to the civilian noninstitutional population by subtracting resident Armed Forces and civilian institutional population estimates and projections by age, sex, and race, from the resident population, as of the estimate or projection reference date. Fortunately, our sources of information for the institutional and Armed Forces populations yield age distributions for any date for which a total population is available; hence, this adaptation is uncomplicated.

Distribution of Census Base and Components of Change by Age, Sex, Race, and Hispanic Origin

Our discussion until now has addressed the method of estimating population detail in the CPS control universe, and has assumed the existence of various data inputs. The current section is concerned with the origin of the inputs.

Modification of the census race and age distributions. The distribution of sex, race, and Hispanic origin in the census base population is determined by the census, but with some adaptation. The 1990 census included questions on sex, race, and Hispanic origin, as was the case in 1980.
The race question elicited responses of White, Black, American Indian, Eskimo, Aleut, and ten Asian and Pacific Islander categories, including ‘‘other API.’’ It also allowed respondents to write in a race category for ‘‘other race.’’ The Hispanic origin question elicited responses of ‘‘No (not Spanish/Hispanic),’’ and a series of ‘‘Yes’’ answers, including Mexican, Puerto Rican, Cuban, and other Spanish/Hispanic, with the last category offering a possibility of write-in. While the initial census edit process interpreted the write-in responses to the race question, it left a substantial number of responses not directly codable to any White, Black, American Indian, Eskimo, Aleut, or Asian and Pacific Islander group. In 1980 this ‘‘other or not specified’’ category comprised 6.7 million persons; in 1990, it comprised 9.8 million persons. The overwhelming majority, in both cases, consisted of persons who gave a positive response to the Hispanic origin question, and/or a Hispanic origin response to the race question. However, the Hispanic component of the ‘‘other or not specified’’ race category comprised somewhat less than half of the total Hispanic population (based on the Hispanic origin response), with most of the balance classified as White. The existence of a residual race category with these characteristics was inconsistent with other data systems essential to the estimation process, namely vital statistics and Armed Forces strength statistics. It also stood at variance with Office of Management and Budget Directive 15, a 1977 ruling intended to standardize race and ethnicity reporting across Federal data systems.
This directive specified an exhaustive distribution of race that could either be cross-classified with Hispanic origin, into a 4-by-2 matrix, or combined into a single-dimensional variable with five categories: American Indian, Eskimo, and Aleut (one category), Asian and Pacific Islander (one category), non-Hispanic White, non-Hispanic Black, and Hispanic origin. It was thus necessary to adapt the census distribution in such a way as to eliminate the residual race category. The Bureau opted to follow the 4-by-2 cross-classification in 1990. Prior to 1990, the American Indian and Asian and Pacific Islander categories were combined into a single category in all Census Bureau population estimates, and the race distribution included persons of Hispanic origin separately, with race unspecified. In 1990-based estimates, these two racial categories were separated, and the procedure was expanded to estimate the full race-Hispanic cross-classification, although at present this disaggregation is recombined for the purpose of weighting the CPS. The modification of the 1990 race distribution occurred at the level of individual microdata records. Persons of unspecified race were assigned to a racial category using a pool of ‘‘race donors.’’ This pool was derived from persons with a specified race response, and the identical response to the Hispanic origin question (e.g., Mexican, Puerto Rican, Cuban, other Hispanic, or not Hispanic). Thus, a respondent’s race (if initially unspecified) was, in expectation, determined by the racial distribution of persons with the same Hispanic origin response who resided in the vicinity of the respondent. The distribution by age from the 1990 census also required modification. The 1990 census form asked respondents to identify their ages and their years of birth.
While the form provided explicit direction to include only those persons who were alive and in the household on the census date, there was no parallel instruction to state the ages of household members as age on April 1, 1990. It was apparent that many respondents reported age at time of completion of the form, time of interview by an enumerator, or time of their next anticipated birthday. Any of these could occur several months after the April 1 reference date. As a result, age was biased upward, a fact most noticeable through gross understatement of the under-1-year-of-age category. While this distribution was published in most 1990 census publications and releases, it was considered inadequate for postcensal estimates and survey controls. The resulting modification was based on reported year of birth. Age was respecified by year of birth, with allocation to first quarter (persons aged 1990 minus year of birth) and last three quarters (aged 1989 minus year of birth) based on a historical series of births by month derived from birth registration data. This methodology is detailed in U.S. Bureau of the Census (1991). As was the case with race, the recoding of age occurred directly on the individual records, after preliminary edits to eliminate reports of year of birth or age inconsistent with other variables (such as household relationship). The tabulations required for the base distribution for estimates could thus be obtained by simple frequency distributions from the individual-record file. There was also a trivial modification of the distribution by sex that came about because the assignment of a very small percentage of census respondents of unknown sex was linked to respondent’s age.
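The age respecification just described can be sketched in a few lines. In this hypothetical illustration, a person's age on April 1, 1990, is derived from reported year of birth, with birth quarter allocated probabilistically; the 0.24 first-quarter share is an invented stand-in for the registration-based births-by-month series actually used.

```python
import random

def modified_age(year_of_birth, p_first_quarter=0.24, rng=random.Random(1990)):
    """Return age on April 1, 1990, imputed from reported year of birth.

    Persons born January-March have already had their 1990 birthday on the
    census date (age = 1990 - year of birth); persons born April-December
    have not (age = 1989 - year of birth).  The quarter is drawn from an
    assumed first-quarter probability, not the actual registration series.
    """
    if rng.random() < p_first_quarter:
        return 1990 - year_of_birth
    return 1989 - year_of_birth

# Everyone reporting a 1960 year of birth is either 29 or 30 on April 1, 1990.
ages = [modified_age(1960) for _ in range(10000)]
```

In production the allocation was carried out on the individual census records, so the frequency distribution of the resulting file directly yields the modified age distribution.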
The modification of the age distribution will be discussed under ‘‘Distribution of Population Projections by Age.’’ The resulting distribution is known in the Census Bureau estimates literature as ‘‘MARS,’’ an acronym for ‘‘modified age, race, and sex.’’ We note parenthetically that the Demographic Analysis (DA) population, used as the alternative base for the inflation-deflation method, did not require modifying with respect to age and race because it was derived primarily from vital statistics and immigration. However, the historical distribution of international migrants by race was based on the MARS distribution by race within country of birth. More importantly, the need to expand the race detail of the original DA population required introducing the MARS distribution to the split between American Indians and API, as well as any indirect effect of MARS on the distribution by Hispanic origin within race.

Distribution of births and deaths by sex, race, and Hispanic origin. The principal source of information on sex and race for births and deaths is the coding of sex and race on birth and death certificates (race of mother, in the case of births). These results are coded by NCHS on detail files of individual birth and death records. Hispanic origin is also coded on most vital records, but a substantial number of events did not receive a code, in some cases because they occurred in states that do not code Hispanic origin on birth and death certificates. For births, the unknown category was small enough to be distributed to Hispanic and not Hispanic in proportion to the MARS distribution of persons aged under 1 year. For deaths, the number of unknowns was sufficiently large (in some years, exceeding the number of deaths known to be Hispanic) to discourage the use of the NCHS distribution. Life tables were applied to a projected Hispanic origin population by age and sex; the resulting distribution was aggregated to produce totals for Hispanic and not Hispanic by sex.
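The proportional allocation of events with unknown Hispanic origin, illustrated here for births, can be sketched as follows; all counts are hypothetical and do not reproduce any NCHS or MARS figures.

```python
# Allocate births with unknown Hispanic origin in proportion to the MARS
# distribution of the population under 1 year of age (hypothetical counts).

births_known   = {"hispanic": 600_000, "not_hispanic": 3_200_000}
births_unknown = 50_000
mars_under1    = {"hispanic": 700_000, "not_hispanic": 3_300_000}

mars_total = sum(mars_under1.values())
allocated = {
    origin: births_known[origin] + births_unknown * mars_under1[origin] / mars_total
    for origin in births_known
}
# The allocation preserves the overall total: known plus unknown events.
```

For deaths, as the text notes, the unknown category was too large for this simple proration, and life-table rates applied to a projected Hispanic population were used instead.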
While the resulting distributions of births and deaths resemble closely the MARS distribution of the base population, a few discrepancies remained. Some are of sufficient concern to prompt adaptations of the distributions.

1. NCHS final data through 1992 included, for both births and deaths, a small category of ‘‘other races,’’ not coded to one of the categories in the MARS (OMB) distribution. The practice to date has been to include these in the Asian and Pacific Islander category, although a closer examination of the characteristics of these events in 1994 suggested some should be coded elsewhere in the distribution, principally Hispanic origin, with race other than API.

2. The reporting of American Indian and Asian and Pacific Islander categories in the census, hence also in the MARS base population, has tended to exceed the reporting of these categories in birth and death data. This fact is symptomized by unrealistically low fertility and mortality rates when MARS-consistent population estimates are used as denominators. This fact has not been addressed in estimates and projections to date, but is currently under review for possible future adaptation.

3. Persons of Hispanic origin and race other than White are substantially more numerous in the MARS base population than in the vital statistics universe. This can best be explained by the procedure used to code race in the base population. The initial edit of write-in responses to the census race question was done independently of the Hispanic origin response. Consequently, a considerable number of write-in responses allocated to non-White racial categories were of Hispanic origin. While the subsequent allocation of the ‘‘other (unspecified) race’’ category in the MARS procedure did indeed reflect Hispanic origin responses, it was possible for ‘‘race donors’’ selected for this procedure to have been persons of previously assigned race.
Because the census (MARS) base population defines the population universe for estimates and projections, the non-White race categories of the Hispanic origin population have been estimated using a race-independent assumption for Hispanic age-specific fertility and mortality rates, applied to MARS-consistent projected populations. The NCHS race totals for Hispanic and non-Hispanic combined have been considered sufficiently close to consistency with MARS to warrant their adoption for the population estimates; hence, the non-Hispanic component of each racial group is determined by subtraction of Hispanic from the total. Thus, this adaptation will have no direct effect on second-stage controls to the CPS, as long as the control procedure does not depend on the cross-classification of race with Hispanic origin. A further issue that has not been fully addressed is the possible effect on the race distribution of the practice of identifying the race of a child by the race of its mother. Research is currently in progress to define race and Hispanic origin of child by a method that recognizes the race and origin of both parents, using census results on the reported race of children of biracial parentage.

Deaths by age. The distribution of deaths by age presents yet another need for adaptation of the population universe. However, its use for census-based estimates requires the application of projected rates to the oldest age groups, rather than a simple decrement of the population by deaths in an age group. Because a cohort-component method (or in the present case, inflation-deflation) is required to estimate the age distribution of the population, the procedure is extremely sensitive to the reporting of age on death certificates and in the Demographic Analysis base population among elderly persons. We recall that medicare enrollees, not vital statistics, form the basis for the DA age distribution of the elderly.
Because deaths of the oldest age groups may represent a substantial portion of the living population at the beginning of an age interval, relatively small biases caused by differential age reporting can produce large cumulative biases in surviving populations over a few years’ time. This, in fact, occurred for estimates produced during the 1980s. It was the practice then to decrement the population at all ages using data on numeric deaths. Because there were not enough deaths to match the extreme elderly population, the population in the extreme elderly groups grew rapidly in the estimates. This was most noticeable in the population over 100 years of age; while this is of no direct concern to the CPS, which does not disaggregate age above 85, the situation provoked legitimate concern that an overestimate of the population 85 and over was also occurring, albeit of lesser proportion. At the opposite extreme, the application of numeric deaths for the oldest age groups could produce negative populations, if the number of deaths in a cohort exceeded the living population at the beginning of the interval. This problem was solved in 1992 by developing a schedule of age-specific death rates using life tables for racial and Hispanic origin categories. When life table death rates are applied to a population, the fact that proportions of the living population in each age group must die each year ensures the timely demise of the oldest birth cohorts. Of course, the life tables are themselves reliant on the assumed compatibility of the numerators and denominators of the death rates. However, the fact that the population estimates underlying the rates had incorporated earlier death rates was sufficient to limit any cumulative error from differential age reporting.
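The contrast between decrementing by numeric deaths and applying life-table death rates can be shown with a toy extreme-elderly cohort; both figures below are invented to exaggerate the effect of age misreporting.

```python
# Hypothetical extreme-elderly cohort: reported deaths exceed the living
# population because age is overstated on some death certificates.

pop = 50.0             # living population at the start of the age interval
reported_deaths = 60.0 # numeric deaths, inflated by differential age reporting

# Decrement by numeric deaths: the survivor count can go negative.
survivors_numeric = pop - reported_deaths

# Apply a life-table death rate instead: a proportion of the cohort dies,
# so survivors are always nonnegative, and the cohort is extinguished over
# successive years rather than artificially preserved or driven negative.
qx = 0.45  # assumed life-table probability of dying within the year
survivors_rate = pop * (1.0 - qx)
```

This is why, as described above, the 1992 revision replaced numeric deaths with life-table rates at the oldest ages: a rate removes at most the whole cohort, never more, while still guaranteeing its timely demise.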
The current practice is to incorporate numeric deaths from NCHS through age 69, apply life table-based death rates to ages 70 and over by single year of age, then proportionately adjust deaths at ages 70 and over to sum to NCHS-based totals by sex and race. Thus, the only cumulative bias entering the series affecting the extreme elderly would arise from differential reporting of age as under or over 70 years, which is deemed to be of minor concern.

International migration by age, sex, race, and Hispanic origin. Legal international migration to the U.S. is based on actual records of immigration from INS or refugee arrivals from ORR, so the statistics include direct reporting of sex and country of birth or citizenship as well as age. Race and Hispanic origin are not reported. The procedure to obtain the race and Hispanic origin variables depends on a distribution from the 1990 census sample edited detail file (1 in 6 households or ‘‘long form’’ sample) of foreign-born persons arrived since the beginning of 1985, by sex, race (MARS categories), Hispanic origin, and country of birth. This distribution is used to distribute male and female immigrants and refugees by race and Hispanic origin. This procedure is very robust for most country-of-birth categories, since migrants from most foreign countries are heavily concentrated in a single race-origin category (the notable exception being Canada). For the remaining components of international migration, indirect methods are required to produce the distribution by age, sex, race, and origin.

1. For annual emigration of legal residents (222,000 total), the foreign-born and native-born were distributed separately. For emigration of the foreign-born, the age-sex distribution is a byproduct of the determination of the annual allowance of 195,000 persons.
Because emigration is computed as a residual of the foreign-born from two censuses (with adjustment for age-specific mortality), their age and sex distribution can rest on the censuses themselves. This logic also yields country of birth, so race and origin are imputed in the same manner as for legal immigrants and refugees, previously described. The 27,000 annual allowance of native-born emigrants is assigned a distribution by all characteristics matching the native-born population from the 1990 census.

2. For net undocumented migration, we rely primarily on a distribution supplied by INS, giving the distribution by age, sex, country of birth, and period of arrival, for persons who legalized their residence under provisions of the Immigration Reform and Control Act (IRCA). Age is back-dated from date of legalization to date of arrival, since undocumented migrants are assumed to be residents of the United States from the time they arrive. An allowance is made, based on a life table application, for persons who died before they could be legalized. Once again, country of birth was used to assign race and Hispanic origin based on data from the 1990 census on the race and Hispanic origin of foreign-born migrants by country of birth. This assumption is subject to two foreseeable biases: (1) it takes no account of undocumented migrants who never legalized their status, and who may have different characteristics than those who legalized; and (2) it takes no account of persons who entered and departed prior to legalizing, who would (if nothing else) tend to make the migration flow more youthful since such persons are older upon departure than upon entry.

3. The distribution by age and sex of the net migration flow from Puerto Rico arises from the method of imputation. The method of cohort survival was used to estimate the net migration out of Puerto Rico between the 1980 and 1990 census dates. This method yields an age distribution as a byproduct.
Race and Hispanic origin were not asked in the Puerto Rican census questionnaire and could not be considered in the census survival method. The distribution within each age-sex category is imputed from the one-way flow of persons from Puerto Rico to the United States from 1985 to 1990, based on the 1990 census of the 50 states and the District of Columbia.

4. As previously noted, we assume the overall net migration of legal temporary residents other than refugees to be constant and of a magnitude and distribution necessary to maintain a constant stock of legal temporary residents in the United States. The consideration of this component in the accounting system is of virtually no importance to population totals, but is quite important to the age distribution. Were it not considered, the effect would be to age temporary residents enumerated in the census through the distribution; whereas, in reality, they are more likely replaced by other temporary residents of similar age. This stock distribution of 488,000 persons enumerated in the 1990 census is derived by identifying characteristics, measured by the census ‘‘long form’’ (20 percent sample) data, resembling those that qualify for various nonimmigrant visa categories. The largest such category is foreign students. We estimate an annual distribution of net migration by computing the difference of two distributions of population stock. The first is simply the distribution of the 488,000 persons from the 1990 census. The second is the distribution of the same population after the effects of becoming 1 year older, including losses to mortality, estimated from a life table. Thus, the migration distribution effectively negates the effects of cumulative aging of these persons in the cohort-component and inflation-deflation methods.

Migration of Armed Forces and civilian citizens by age, sex, race, and Hispanic origin.
Estimation of demographic detail for the remaining components of change requires estimation of the detail of the civilian citizen population residing overseas, as well as the Armed Forces residing overseas and Armed Forces residing in the United States. The first two are necessary to assign demographic detail to the net migration of civilian citizens; the third is required to assign detail to the effects of Armed Forces recruitment on the civilian population. Distributions of the Armed Forces by branch of service, age, sex, race/origin, and location inside or outside the United States are provided by the Department of Defense, Defense Manpower Data Center. The location- and service-specific totals closely resemble those provided by the individual branches of the services for all services except the Navy. For the Navy, it is necessary to adapt the location distribution of persons residing on board ships to conform to census definitions, which is accomplished through a special tabulation (also provided by Defense Manpower Data Center) of persons assigned to sea duty. These are prorated to overseas and U.S. residence, based on the distribution of the total population afloat by physical location, supplied by the Navy. In order to incorporate the resulting Armed Forces distributions in estimates, the race-origin distribution must also be adapted. The Armed Forces ‘‘race-ethnic’’ categories supplied by the Defense Manpower Data Center treat race and Hispanic origin as a single variable, with the Hispanic component of Black and White included under ‘‘Hispanic,’’ and the Hispanic components of American Indian and API assumed to be nonexistent. There also remains a residual ‘‘other race’’ category. The method of converting this distribution to consistency with MARS employs, for each age-sex category, the 1990 census MARS distribution of the total population (military and civilian) to supply all MARS information missing in the Armed Forces race-ethnic categories.
Hispanic origin is thus prorated to White and Black according to the total MARS population of each age-sex category in 1990; American Indian and API are similarly prorated to Hispanic and non-Hispanic. As a final step, the small residual category is distributed as a simple pro rata of the resulting Armed Forces distribution, for each age-sex group. Fortunately, this adaptation is very robust; it requires imputing the distributions of only very small race and origin categories. The overseas population of Armed Forces dependents is distributed by age and sex, based on an overseas census of U.S. military dependents conducted in connection with the 1970 census. While this source obviously makes no claim to currency, it reflects an age distribution uniquely weighted in favor of young adult females and children. Race and origin are based on the distribution of overseas Armed Forces, with cross-categories of age-sex with race-origin determined by the marginals. Detail for civilian Federal employees overseas has been provided by the Office of Personnel Management for decennial dates, and assumed constant; their dependents are assumed to have the same distribution as Armed Forces dependents. Having determined these distributions for populations, the imputation of migration of civilian citizens previously described (equation (9)) can be carried out specifically for all characteristics. The only variation occurs in the case of age, where it is necessary to carry out the imputation by birth cohort, rather than by age group. The two very small components of deaths of the Armed Forces overseas and net recruits to the Armed Forces from the overseas population, previously described, are also assigned the same level of demographic detail. Deaths of the overseas Armed Forces are assigned the same demographic characteristics as the overseas Armed Forces.
Net recruits from overseas are assigned race and origin, based on an aggregate of overseas censuses, in which Puerto Rico dominates numerically. The age and sex distribution follows the worldwide Armed Forces.

The civilian institutional population by age, sex, race, and origin. The fundamental source for the distribution of the civilian institutional population by age, sex, race, and origin is a 1990 census/MARS tabulation of institutional population by age, sex, race, origin, and type of institution. The last variable has four values: nursing home, correctional facility, juvenile facility, and a small residual. From this table, participation rates are computed for the civilian population. These rates are applied in each year to the current estimated distribution of the civilian population. As previously observed, the institutional population total is updated annually, until July 1 of the year 2 years prior to the CPS reference date, by the results of a special report on selected group quarters facilities. The report provides an empirical update of the group quarters populations by type, including these four institutional types. The demographic detail can then be proportionately adjusted to sum to the type-specific totals. For dates beyond the last group quarters survey, the detailed participation rates are recomputed for the last available date, and these are simply applied to the estimated or projected civilian population distribution.

Population Controls for States

The second-stage weighting procedure for the CPS requires a distribution of the national civilian noninstitutional population ages 16 and over by state. This distribution is determined by a linear extrapolation through two population estimates for each state and the District of Columbia. The reference dates are July 1 of the years that are 2 years and 1 year prior to the reference date for the CPS population controls.
The extrapolated state distribution is forced to sum to the national total population ages 16 and over by proportional adjustment. For example, the state distribution for June 1, 1996, was determined by extrapolation along a straight line determined by two data points (July 1, 1994, and July 1, 1995) for each state. The resulting distribution for June 1, 1996, was then proportionately adjusted to the national total for ages 16 and over, computed for the same date. This procedure does not allow for any difference among states in the seasonality of population change, since all seasonal variation is attributable to the proportional adjustment to the national population. The procedure for state estimates (e.g., through July 1 of the year prior to the CPS control reference date), which form the basis for the extrapolation, differs from the national-level procedure in a number of ways. The primary reason for the differences is the importance of interstate migration to state estimates and the need to impute it by indirect means. Like the national procedure, the state-level procedure depends on fundamental demographic principles embodied in the balancing equation for population change; however, they are not applied to the total resident or civilian noninstitutional population but to the household population under 65 years of age. Estimates of the group quarters population under 65 and the population 65 and over employ a different logic, based (in both cases) on independent sources of data for population size, benchmarked to the results of the last census. Furthermore, while national-level population estimates are produced as of the first of each month, state estimates are produced at 1-year intervals, with July 1 reference dates.
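The extrapolate-then-adjust procedure can be sketched as follows for two hypothetical states. The state populations and the national 16-and-over control total are invented for the example; only the dates follow the text.

```python
from datetime import date

# Hypothetical July 1 estimates of the 16-and-over population for two states.
est_1994 = {"A": 4_000_000, "B": 6_000_000}  # July 1, 1994
est_1995 = {"A": 4_100_000, "B": 6_030_000}  # July 1, 1995

# Linear extrapolation through the two points to June 1, 1996.
t0, t1, target = date(1994, 7, 1), date(1995, 7, 1), date(1996, 6, 1)
frac = (target - t1).days / (t1 - t0).days  # fraction of a year past the second point

extrapolated = {s: est_1995[s] + frac * (est_1995[s] - est_1994[s]) for s in est_1995}

# Proportional adjustment to an assumed national 16-and-over total for June 1, 1996.
national_total = 10_250_000
scale = national_total / sum(extrapolated.values())
controlled = {s: v * scale for s, v in extrapolated.items()}
```

Note how the sketch reproduces the limitation stated above: any seasonal movement between July 1 and June 1 enters only through the single national scale factor, identically for every state.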
The Population 65 Years of Age and Over

The base population by state (which can be either a census enumeration or an earlier estimate) is divided into three large categories: the population 65 years of age and over, the group quarters population under 65, and the household (nongroup quarters) population under 65. The first two categories are estimated directly by stock-based methodologies; that is, population change is estimated as the change in independent estimates of population for the categories. In the case of the population ages 65 and over, the principal data source is a serial account of Medicare enrollees, available by county of residence. Because there is a direct incentive for eligible persons to enroll in the Medicare program, the coverage rate for this source is very high. To adapt this estimate to the 1990 census level, the change in the Medicare population from the census date to the estimate date is added to the 1990 census enumeration of the population 65 and over.

The Group Quarters Population Under 65 Years of Age

The group quarters population is estimated annually from the same special report on selected group quarters facilities, conducted in cooperation with the Federal and State Cooperative Program for Population Estimates (FSCPE), used to update national institutional population estimates. Because this survey is conducted at the level of the actual group quarters facility, it provides detail at any level of geography. Its results are aggregated to seven group quarters types: the four institutional types previously discussed, and three noninstitutional types: college dormitories, military barracks, and a small noninstitutional group quarters residual. Each type is disaggregated into the population over and under 65 years of age, based on 1990 census distributions of age by type. The change in the aggregate group quarters population under 65 forms the basis for the update of the group quarters population from the census to the estimate date.
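Both stock-based updates described above share one pattern: the change in an independent data series (Medicare enrollment, or the special group quarters report) is added to the census-level base. A minimal sketch, with entirely hypothetical figures:

```python
def stock_based_update(census_base, series_at_census, series_now):
    # Add the change in an independent series to the census-level base.
    return census_base + (series_now - series_at_census)

# Hypothetical Medicare-based update of the population 65 and over:
pop_65_plus = stock_based_update(31_000_000, 30_500_000, 32_100_000)

# Hypothetical group quarters update for the population under 65:
gq_under_65 = stock_based_update(1_800_000, 1_750_000, 1_820_000)
```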
The Household Population Under 65 Years of Age

The procedure for estimating the nongroup quarters population under 65 can be analogized to the national-level procedures for the total population, with the following major differences.

1. Because the population is restricted to persons under 65 years of age, the number of persons advancing to age 65 must be subtracted, along with deaths of persons under 65. This component is composed of the 64-year-old population at the beginning of each year, and is measured by projecting census-based ratios of the 64-year-old to the 65-year-old population by the national-level change, along with proration of other components of change to age 64.

2. All national-level components of population change must be distributed to state-level geography, with separation of each state total into the number of events to persons under 65 and persons 65 and over. Births and deaths are available from NCHS by state of residence and by age (for deaths). Similar data can be obtained, often on a more timely basis, from FSCPE state representatives; these data, after appropriate review, are adopted in place of NCHS data for those states. For legal immigration (including refugees), the ZIP Code of intended residence is coded by INS on the immigrant public use file; this is converted to county and state geography by a program known as ZIPCRS, developed cooperatively by the Census Bureau and the U.S. Postal Service (Sater, 1994). Net undocumented migration is distributed by state on the basis of data from INS on the geographic distribution of undocumented residents legalizing their status under the Immigration Reform and Control Act (IRCA). The emigration of legal residents is distributed on the basis of the foreign-born population enumerated in the 1990 census.

3. The population updates include a component of change for net internal migration.
This is obtained through the computation, for each county, of rates of out-migration based on the number of exemptions under 65 years of age claimed on year-to-year matched pairs of IRS tax returns. Because the IRS codes tax returns by ZIP Code, the previously mentioned ZIPCRS program is used to identify county and state of residence on the matched returns. The matching of returns allows the identification, for any pair of states or counties, of the number of "stayers" who remain within the state or county of origin and the number of "movers" who change address between the two. This identification is based on the number of exemptions on the returns and the existence or nonexistence of a change of address. Dividing the number of movers for each pair by the sum of movers and stayers for the state of origin yields a matrix of out-migration rates from each state to each of the remaining states. The validity of these interstate migration rates depends positively on the level of coverage (the proportion of the population included on two consecutive returns). It is negatively affected by any difference between tax filers and nonfilers with respect to migration behavior, since nonfilers are assumed to migrate at the same rate as filers.

The Exclusion of the Population Under 16 Years of Age, Armed Forces, and Inmates of Civilian Institutions

The next step in the state-level process is the estimation of the age-sex distribution, which allows exclusion of the population under 16. The major input to this procedure is a nationwide inquiry to state governments and some private and parochial school authorities for data on school enrollment. The number of school-aged children (exact ages 6.5 to 14.5) for each annual estimate date is projected, without migration, from the census date by the method of cohort survival. This method begins with the number of persons enumerated in the last census destined to be of school age on the estimate date, known as the school-age cohort.
For reference dates more than 6.5 years from the census, the enumerated cohort must be augmented with a count of registered births occurring between the census date and the estimate date minus 6.5 years. For example, if the estimate reference date is July 1, 1995, the enumerated cohort will consist of persons ages 1.25 to 9.25, since the census of April 1, 1990, is 5.25 years prior to the estimate date. On the other hand, if the estimate reference date is July 1, 1998, the cohort will consist of the census population under age 6.25, plus registered births occurring from April 1, 1990, through December 31, 1991. The number of deaths to this cohort of children from the census date to the estimate date (estimated from vital registration data) is then subtracted to yield a projection of the school-age population on the estimate reference date. Subtracting the resulting projection from actual estimates of the school-age population, determined by updating the census school-age population via school enrollments, yields estimates of cumulative net migration rates for school-age cohorts for each state. These estimates are used to estimate migration rates for each age group up to age 18, using historical observation of the relationship of the migration of each age group to the migration of the school-age population. Populations by age are proportionately adjusted to sum to the national total, yielding estimates of the population under age 16. These are subtracted from the total to yield the population ages 16 and over, required to produce the population of the CPS control universe. Once the resident population 16 years of age and older has been estimated for each state, the population must be restricted to the civilian noninstitutional universe. Armed Forces residing in each state are excluded, based on reports of location of duty assignment from the branches of the Armed Forces (including, for the Navy, the homeport and projected deployment status of ships).
The group quarters report is used to exclude the civilian institutional population. The group quarters data and some of the Armed Forces data (especially information on deployment of naval vessels) must be projected from the last available date to the later annual estimate dates.

Dormitory Adjustment

The final step in the production of the CPS base series for states is the adjustment of the universe from the census universe to the CPS control universe with respect to the geographic location of college students living in dormitories. The decennial census form specifically directs students living in dormitories to report the address of their dormitories, while CPS interviews identify the address of the family home. A dormitory adjustment for each state is defined as the number of students with a family home address in the state residing in dormitories (in any state) minus the number of students residing in dormitories in the state. The latter estimate is a product of the previously mentioned group quarters report, restricted to college dormitories. The former is the same group quarters dormitory estimate, adjusted by a ratio of students by family residence state to students by college enrollment state, computed from data from the National Center for Education Statistics (NCES). Adding the dormitory adjustment (which may be negative or positive) to the census-level civilian noninstitutional population ages 16 and over yields the population ages 16 and over for a universe consistent with the CPS in every regard except adjustment for underenumeration.

ADJUSTMENT OF CPS CONTROLS FOR NET UNDERENUMERATION IN THE 1990 CENSUS

Beginning in 1995, the CPS controls were adjusted for net undercoverage in the 1990 census. This adjustment was based on the results of the Post-Enumeration Survey (PES) carried out in the months following the census.
From this, census coverage ratios were obtained for various large "poststrata," or cross-categories of a few variables determined to be most related to underenumeration. The present section briefly addresses the methodology of this survey, the application of coverage ratios for poststrata to obtain numerical levels of underenumeration in the CPS universe, and the application of the resulting numerical adjustment to the CPS population controls. Important to the rationale behind the adjustment of CPS controls is the origin of the decision to adjust them for underenumeration, given that population estimates published by the Census Bureau are not. The Post-Enumeration Survey was, in its design, conceived as a method of providing an adjustment for net underenumeration in the 1990 census. In the year following the 1990 census date, estimates of net underenumeration were produced. Effective July 15, 1991, Secretary Robert Mosbacher announced a decision not to adjust the census for undercount or overcount (Federal Register, 1991). The decision cited a large body of research pointing to improvement in the estimates of the national population resulting from incorporation of the Post-Enumeration Survey, coupled with a lessening of the accuracy of estimates for some states and metropolitan areas. The decision also called for further research into the possibility of incorporating PES results in the population estimates programs. Research conducted over the following 18 months under the auspices of the Committee on Adjustment of Postcensal Estimates (CAPE), a committee appointed by the Director of the Census Bureau, provided a description of the PES methodology and a detailed analysis of the effect of adjustment on national, state, and county estimates. This research is summarized in a report, which forms the basis for all technical information regarding the PES discussed here (U.S. Census Bureau, 1992).
After extensive public hearings, including testimony by members of Congress, Barbara Bryant, then Director of the Census Bureau, decided, effective December 30, 1992, not to incorporate PES results in the population estimates programs, but offered to calibrate Federally sponsored surveys, conducted by the Census Bureau, to adjusted population data (Federal Register, 1993). The decision cited the finding by CAPE that estimates of large geographic aggregates were improved by inclusion of PES estimates, while no such improvement could be established for some states and substate areas. In the course of 1993, the Bureau of Labor Statistics opted to include the 1990 census, with adjustment based on the Post-Enumeration Survey, in the controls for the CPS, beginning January 1994. We stress that the intent of the PES from its inception was to serve as a device for adjusting the census, not for the calibration of surveys or population estimates. Its application to population controls for the CPS and other surveys was based on an analysis of its results. Essential to its usefulness is the primacy of national-level detail in the evaluation of survey controls, coupled with the apparent superior performance of the PES for large geographic aggregates.

THE POST-ENUMERATION SURVEY AND DUAL-SYSTEM ESTIMATION

The PES consisted of a reinterview, after the 1990 census, of all housing units within each of a sample of small geographic units. The geographic unit chosen was the census block, a small polygon of land surrounded by visible features, frequently a four-sided city block bounded by streets. The sampling universe of all such blocks was divided into 101 strata, based on certain variables seen to be related to the propensity for underenumeration. These consisted of geography, city size, racial and ethnic composition, and tenure of housing units (owned versus rented). The strata were sampled; the entire sample consisted of more than 5,000 blocks.
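The block-sampling design just described can be illustrated schematically. This is a hedged sketch, not the actual PES design: the stratum labels, block counts, and equal per-stratum sample sizes are all hypothetical (the real PES used 101 strata and unequal sampling rates).

```python
import random

random.seed(0)

# Hypothetical frame: 1,000 blocks, each assigned to one of 5 strata
# standing in for strata defined by geography, city size, racial and
# ethnic composition, and housing tenure.
frame = [("block%d" % i, "stratum%d" % (i % 5)) for i in range(1000)]

by_stratum = {}
for block, stratum in frame:
    by_stratum.setdefault(stratum, []).append(block)

# Draw a sample of blocks from each stratum; every housing unit in a
# sampled block is then reinterviewed.
sample = [b for blocks in by_stratum.values() for b in random.sample(blocks, 10)]
```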
The reinterview and matching procedures consisted of an interview with all households within each of the sampled blocks, followed by a comparison of the resulting PES observations with original census enumeration records of individual persons. Care was taken to ensure the independence of the reinterview process from the original enumeration, through use of different enumerators as well as independent local administration of the activity. Through the comparison of census and PES records of individual persons, matches were identified, as were nonmatches (persons enumerated in one survey but not in the other) and erroneous enumerations (persons who either should not have been enumerated, or should not have been enumerated in that block). Where necessary, follow-up interviews were conducted with census and PES households to determine the status of each individual with respect to his/her match between census and PES or his/her possible erroneous enumeration. This matching procedure yielded a determination, for each block, of the number of persons correctly enumerated by the census only, the number of persons enumerated by the PES only, and the number enumerated by both. The number of persons missed by both census and survey could then be imputed by observing, within the PES sample, the relationship of the number missed by the census to the number enumerated, and generalizing this relationship to those missed by the PES sample. The resulting estimate was called a "dual-system estimate," because of its reliance on two independent enumeration systems. In most cases, the dual-system estimate was higher than the original enumeration, implying a net positive undercount in the census. For some blocks, the dual-system estimate was lower than the census count, meaning the number of incorrect enumerations exceeded the number of persons imputed by the matching process, resulting in a net overcount (negative undercount).
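The imputation of persons missed by both systems follows the classic capture-recapture (Lincoln-Petersen) calculation. A minimal sketch with hypothetical block counts; the actual PES computations involved sampling weights and erroneous-enumeration corrections that are omitted here:

```python
def dual_system_estimate(census_correct, pes_count, matched):
    # Independence assumption: the census capture rate observed among PES
    # persons (matched / pes_count) also applies to persons the PES missed.
    return census_correct * pes_count / matched

dse = dual_system_estimate(census_correct=950, pes_count=940, matched=900)

observed = 950 + 940 - 900          # persons found by at least one system
missed_by_both = dse - observed     # persons imputed as missed by both
```

Here the dual-system estimate exceeds the census count, the net-positive-undercount case described in the text; a block with many erroneous enumerations could instead yield a negative undercount.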
Having established dual-system estimates for the sampled blocks, a new stratification, this time of individual persons rather than blocks, was defined; the resulting strata were dubbed "poststrata," as they were used for retrospective analysis of the survey results (as opposed to being part of the sampling design). These poststrata formed the basic unit for which census coverage ratios were computed. They were defined on characteristics considered relevant to the likelihood of not being enumerated in the census, including large age group, sex, race (including residence on a reservation, if American Indian), Hispanic origin, housing tenure (living in an owned versus rented dwelling), large city, urban, or rural residence, and geographic region of residence. The choice of categories of these variables was intended to balance two conflicting objectives. The first was to define strata that were homogeneous with respect to the likelihood of enumeration in the census. The second was to ensure that each stratum was large enough to avoid spurious estimates of census coverage. As a result, the categories were asymmetric with respect to most of the variables: the choice of categories of one variable depended on the value of another. Seven categories of age and sex (four age categories by two sex categories, but with males and females combined for the youngest category) were crossed with 51 categories of the other variables to produce a total of 357 poststrata. The percentage of undercount or overcount was computed for each poststratum by dividing the difference between the dual-system estimate of population and the census enumeration for the sampled population by the census enumeration. Applying these ratios to the poststrata, this time defined for the entire noninstitutional population of the United States enumerated in the 1990 census, yielded estimates of undercoverage.
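The poststratum undercount percentage just described reduces to a simple ratio that is then applied back to the full census universe. A sketch with hypothetical counts:

```python
def undercount_ratio(dse, census):
    # (dual-system estimate - census enumeration) / census enumeration
    return (dse - census) / census

# Hypothetical poststratum sample: a 3 percent net undercount.
r = undercount_ratio(dse=1_030_000, census=1_000_000)

# The ratio is applied to the full census count for the poststratum.
adjusted_universe = 2_500_000 * (1 + r)
```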
Among variables tabulated in the CPS, there was considerable variation in the adjustment factors by race, since race was an important source of heterogeneity among poststrata. Moreover, as a result of the small number of age categories distinguished in the poststrata, adjustment factors tended to vary sharply between single-year age groups near the boundaries of neighboring age categories.

Application of Coverage Ratios to the MARS Population

Undercount ratios were applied to the 1990 MARS distribution by age, sex, race, and origin. Because the poststrata included, in their definition, one racial category ("Other") that did not exist in the MARS distribution, undercount ratios for census race-origin categories were imputed to the corresponding MARS categories. These undercount ratios were implemented in the MARS microdata file, which was then tabulated to produce the resident population by age, sex, race, Hispanic origin, and state. "Difference matrices" were obtained by subtracting the unadjusted MARS distribution from the adjusted MARS distribution. This was done for both the national age-sex-race-origin distribution and the 51-state distribution of the civilian noninstitutional population ages 16 years and over. An important assumption in the definition of the difference matrix was the equality of the adjustment for the resident population and the civilian noninstitutional population. This amounted to an assumption of zero adjustment difference for the active-duty military and civilian institutional populations. The assumption was natural in the case of the institutional population, because this population was expressly excluded from the PES. In the case of the Armed Forces, the population stock is estimated, for both the Nation and the states, from data external to the census; hence, there was no basis on which to estimate a PES-consistent undercount for the Armed Forces population.
Once derived, the differences have been added to all distributions of population used for independent controls for the CPS produced since January 1, 1995. While it would have been technically more satisfying to apply undercount ratios, as defined by the PES, directly to the CPS population controls, this would have required projections of the poststrata for CPS control reference dates, for which no methodology has been developed. This adjustment of the CPS universe is the final step in the process of computing independent population controls for the second-stage weighting of the Current Population Survey.

A question that frequently arises regarding adjustment for underenumeration is how it relates to inflation-deflation, as previously discussed. We need to inflate estimated populations to an "adjusted level" in order to apply cohort-component logic to the estimation of the age distribution. Why, then, is it necessary to deflate to census level, then inflate again to adjust for underenumeration? The answer lies in the relationship of the purpose of the inflation-deflation method to the method of adjustment. In the case of inflation-deflation, the sole objective is to provide an age distribution that is unaffected by differential undercount and age reporting bias, from single year to single year of age throughout the distribution. This is necessary because cohort-component logic advances the population 1 year of age for each 1-year time interval. The Demographic Analysis population meets this objective well, as it is developed, independently of the census, from series of vital and migratory events that are designed to be continuous, based on time series of administrative data. However, the definition of poststrata for dual-system estimates in the PES is based on the application of uniform adjustment factors to large age categories, with large differences at the limits of the categories.
This would not serve the needs of inflation-deflation, even though the PES has been assumed to be a superior means of adjusting the census for underenumeration (at least for subnational geographic units).

THE MONTHLY AND ANNUAL REVISION PROCESS FOR INDEPENDENT POPULATION CONTROLS

Each month, a projection of the civilian noninstitutional population of the United States is produced for the CPS control reference date. Each projection is derived from the population at some base date and the change in the population from the base date to the reference date of the projection. In every month except January, the base date (the date after which new data on population change can be incorporated in the series) is 2 months prior to the CPS reference date. In January, the entire monthly series back to the last census is revised, meaning the base date is the date of the last census (currently April 1, 1990). Early in the decade (in the current decade, January 1, 1994), the population from the most recent decennial census is introduced for the first time as a base for the January revision. As a consequence of this policy of ongoing revision back one month, the monthly series of population figures produced each month for 1 month prior to the CPS reference date is internally consistent for the year of reference dates from December 1 to November 1; that is, the month-to-month change is determined by the measurement of population change. For actual CPS reference dates from January 1 to December 1, the series is not strictly consistent; there is a small amount of "slippage" in the month-to-month intervals, which is the difference between:

1. The population change during the 1 month immediately preceding the CPS reference date, as measured a month after the CPS control figure is produced, and

2. The population change in the month preceding the CPS date, associated with the actual production of the CPS controls.
This slippage is maintained for the purpose of preventing cumulative error: the measurement of population change in the last month at the time of measurement (2) is based entirely on projection (no data are available to measure the change), whereas some key administrative data (e.g., preliminary estimates of births and deaths) are generally available for the month before the last (1). The slippage rarely exceeds 20,000 persons for the population ages 16 and over, compared to a monthly change in this population that varies seasonally from roughly 150,000 to 260,000 persons, based on calendar year 1994. Because the numerically largest source of slippage is usually the projection of births, most of the slippage is usually confined to the population under 1 year of age and hence is of no concern to labor force applications of the survey. Monthly population controls for states, by contrast, normally contribute no slippage in the series from January to December, because they are projections from July 1 of the previous year. Generally, any slippage in the state series is a consequence of their forced summation to the national total for persons ages 16 and over. The annual revision of the population estimates to restart the national series each January affects the states as well as the Nation, as it involves not only revisions to the national-level estimates of population change, but also the distribution of these components to states, and an additional year of internal migration data. For the revision of the entire series each January, preliminary estimates of the cumulative slippage for the national population by demographic characteristic (not for the states) are produced in November and reviewed for possible effect on the population control series.
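Numerically, the slippage described above is just the revision to the most recent month's measured change once preliminary administrative data replace pure projection. All figures here are hypothetical:

```python
# Change for the month before the CPS date, as projected at the time
# the controls were actually produced:
change_at_production = 190_000

# The same interval, remeasured one month later, when preliminary
# births/deaths data have become available:
change_remeasured = 198_000

slippage = change_remeasured - change_at_production
# 8,000 persons: within the roughly 20,000 typically observed for ages 16+.
```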
Annual slippage (the spurious component of the difference between December 1 and January 1 in CPS production figures) is generally confined to less than 250,000 in a year, although larger discrepancies can occur if the revision to the series entails cumulative revision of an allowance for an unmeasured component (such as undocumented immigration), the introduction of a new census, or a redefinition of the universe (e.g., with respect to adjustment for undercoverage).

PROCEDURAL REVISIONS

The process of producing estimates and projections, like any research activity, is in a constant state of flux. Various technical issues are addressed with each cycle of estimates for possible implementation in the population estimating procedures, and these changes carry over to the CPS controls. These procedural revisions tend to be concentrated early in the decade, because of the introduction of data from a new decennial census, which can be associated with new policies regarding the population universe or estimating methods. However, they may occur at any time, depending either on new information obtained regarding the components of population change or on the availability of resources to complete methodological research.

REFERENCES

Ahmed, B. and Robinson, J. G. (1994), Estimates of Emigration of the Foreign-Born Population: 1980-1990, U.S. Census Bureau, Population Division Technical Working Paper No. 9, December 1994.

Anderson, R. N., Kochanek, K. D., and Murphy, S. L. (1997), "Report of Final Mortality Statistics, 1995," Monthly Vital Statistics Report, National Center for Health Statistics, Vol. 45, No. 11, Supplement 2.

Fay, R. E., Passel, J. S., and Robinson, J. G. (1988), The Coverage of the Population in the 1980 Census, 1980 Census of Population and Housing, U.S. Census Bureau, Vol. PHC–E–4, Washington, DC: U.S. Government Printing Office.

Federal Register, Vol. 56, No. 140, Monday, July 22, 1991.

Federal Register, Vol. 58, No. 1, Monday, January 4, 1993.

Robinson, J. G.
(1994), "Clarification and Documentation of Estimates of Emigration and Undocumented Immigration," unpublished U.S. Census Bureau memorandum.

Sater, D. K. (1994), Geographic Coding of Administrative Records: Current Research in the ZIP/SECTOR-to-County Coding Process, U.S. Census Bureau, Population Division Technical Working Paper No. 7, June 1994.

U.S. Census Bureau (1991), Age, Sex, Race, and Hispanic Origin Information From the 1990 Census: A Comparison of Census Results With Results Where Age and Race Have Been Modified, 1990 Census of Population and Housing, Publication CPH-L-74.

U.S. Census Bureau (1992), Assessment of Adjusted Versus Unadjusted 1990 Census Base for Use in Intercensal Estimates: Report of the Committee on Adjustment of Postcensal Estimates, unpublished, August 7, 1992.

U.S. Immigration and Naturalization Service (1997), Statistical Yearbook of the Immigration and Naturalization Service, 1996, Washington, DC: U.S. Government Printing Office.

Ventura, S. J., Martin, J. A., Curtin, S. C., and Mathews, T. J. (1997a), "Report of Final Natality Statistics, 1995," Monthly Vital Statistics Report, National Center for Health Statistics, Vol. 45, No. 11, Supplement.

Ventura, S. J., Peters, K. D., Martin, J. A., and Maurer, J. D. (1997b), "Births and Deaths: United States, 1996," Monthly Vital Statistics Report, National Center for Health Statistics, Vol. 46, No. 1.

SUMMARY LIST OF SOURCES FOR CPS POPULATION CONTROLS

The following is a summary list of agencies providing data used to calculate independent population controls for the CPS. Under each agency are listed the major data inputs obtained.

1. The U.S. Census Bureau (Department of Commerce)
   The 1990 census population by age, sex, race, Hispanic origin, state of residence, and household or type of group quarters residence
   Sample data on the distribution of the foreign-born population by sex, race, Hispanic origin, and country of birth
   The population of April 1, 1990, estimated by the Method of Demographic Analysis
   Decennial census data for Puerto Rico

2. National Center for Health Statistics (Department of Health and Human Services)
   Live births by age of mother, sex, race, and Hispanic origin
   Deaths by age, sex, race, and Hispanic origin

3. The U.S. Department of Defense
   Active-duty Armed Forces personnel by branch of service
   Personnel enrolled in various active-duty training programs
   Distribution of active-duty Armed Forces personnel by age, sex, race, Hispanic origin, and duty location (by state and outside the United States)
   Deaths to active-duty Armed Forces personnel
   Dependents of Armed Forces personnel overseas
   Births occurring in overseas military hospitals

4. The Immigration and Naturalization Service (Department of Justice)
   Individual records of persons immigrating to the United States, by month and year of immigration, including age, sex, country of birth, ZIP Code of intended residence, and year of arrival (if different from year of immigration)

5. Office of Refugee Resettlement (Department of Health and Human Services)
   Refugee arrivals by month, age, sex, and country of citizenship

6. Department of State
   Supplementary information on refugee arrivals

7. Office of Personnel Management
   Civilian Federal employees in overseas assignments

Appendix E. State Model-Based Labor Force Estimation

INTRODUCTION

Small samples in each state and the District of Columbia result in unacceptably high variation in the monthly CPS composite estimates of state employment and unemployment.
The table below gives the sample sizes, standard errors (STDs), and coefficients of variation (CVs) for unemployment and employment, assuming an unemployment rate of 6 percent, for the states and the Nation as a whole. These numbers are based on the current design, which was in effect from January 1996 through July 2001.¹

Table E–1. Reliability of CPS Estimators Under the Current Design

                     Number of                Unemployment rate      Employment-to-population ratio
                     households in sample     STD       CV           STD       CV
Nation . . . . . .   50,000                   0.11      1.90         0.16      0.25
States . . . . . .   546-4,055                0.94      15.70        1.25      1.98

¹ These figures do not reflect the increased reliability after the sample expansion in July 2001 due to the State Children's Health Insurance Program. (See Appendix J.)

In an effort to produce less variable labor force estimates, BLS introduced time series models to "borrow strength" over time. The models are based on a signal-plus-noise approach to small area estimation in which the monthly CPS estimates are treated as stochastically varying labor force values obscured by sampling error (Tiller, 1992). Given a model for the true labor force values and sampling error variance-covariance information, signal-extraction techniques are used to estimate the true labor force values. This approach was first suggested by Scott and Smith (1974) and has been more fully developed by Bell and Hillmer (1990), Binder and Dick (1989), and Pfeffermann (1992).

TIME SERIES COMPONENT MODELING

For each state, separate models for the CPS employment-to-population ratio and the unemployment rate are developed. In the signal-plus-noise model, the CPS labor force estimate at time t, denoted by y(t), is represented as the sum of two independent processes

    y(t) = θ(t) + e(t)    (1)

where θ(t) represents the systematic part of the true labor force value and e(t) is the sampling error. The basic objective is to produce an estimator for θ(t) that is less variable than the CPS composite estimator. The signal component, θ(t), represents the variation in the sample values due to the stochastic behavior of the population. The sampling error component, e(t), consists of variation arising from sampling only a portion of the population.

The Signal Component

In the state labor force models, the signal component is represented by an unobserved variance component model with explanatory variables (Harvey, 1989)

    θ(t) = μ(t) + η(t)

where

    μ(t) = X(t)β(t) + T(t) + S(t)    (2)

    X(t) = vector of known explanatory variables
    β(t) = random coefficient vector
    T(t) = trend component
    S(t) = periodic or seasonal component
    η(t) = residual noise.

The stochastic properties of each of these components are determined by one or more normally distributed, mutually independent white noise disturbances

    ε_j(t) ~ NID(0, σ_j²)    (3)

where j indexes the individual components.

Regression component. The regression component consists of explanatory variables, X(t), with time-varying coefficients, β(t)

    M(t) = X(t)β(t).    (4)

This component accounts for variation in the CPS estimates that can be explained by a set of observable economic variables developed from auxiliary data sources that are independent of the CPS sampling error. A common core of state-specific explanatory variables has been developed: unemployment insurance claims from the Federal-State Unemployment Insurance System (Blaustein, 1979), a nonagricultural payroll employment estimate from the BLS Current Employment Statistics (CES) program (US DOL, 1997), and intercensal population estimates developed by the Census Bureau. The explanatory variable included in the employment model is the CES estimate, adjusted for strikes, expressed as a percent of the state's population. For the unemployment rate model, the explanatory variable is the claims rate, defined as the ratio of worker claims for unemployment insurance (UI) benefits to CES employment.
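As a concrete illustration, the two regressors can be computed from state-level inputs. The figures below are invented for the example; the variable names follow the CESEP and CLRST definitions given later in this appendix.

```python
# Computing the two explanatory variables described above from hypothetical
# state-level inputs (the numbers are illustrative, not actual state data).
ces_employment = 2_500_000      # CES nonagricultural payroll employment
population = 5_000_000          # civilian noninstitutional population 16+
ui_claims = 75_000              # worker claims for UI benefits

# Employment model regressor: CES employment as a percent of population.
cesep = 100.0 * ces_employment / population        # -> 50.0
# Unemployment rate model regressor: UI claims relative to CES employment.
clrst = 100.0 * ui_claims / ces_employment         # -> 3.0
print(cesep, clrst)
```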
The regression coefficients are modeled as a random walk process

    β(t) = β(t−1) + ε_β(t),    ε_β(t) ~ NID(0, σ_β²).    (5)

The effect of ε_β(t) is to allow the coefficients to evolve slowly over time. A zero variance, σ_β² = 0, results in a fixed regression coefficient.

Time series components. The explanatory variables in the models account for an important part of the variation in the CPS estimates but, because of definitional and other differences, leave significant amounts of the nonsampling error variation in the CPS estimates unexplained (Tiller, 1989). Therefore, stochastic trend and seasonal components are added to the model to adjust the explanatory variables for differences from the CPS measure of state employment and unemployment. Since the explanatory variables account for a portion of the trend and seasonal variation in the CPS, these stochastic time series components are residuals; they do not account for all the trend and seasonal variation in the CPS.

Residual trend component. The trend component is represented by a local approximation to a linear trend with a random level, T(t), and slope, R(t)

    T(t) = T(t−1) + R(t−1) + ε_T*(t)

and

    R(t) = R(t−1) + ε_R(t)

where

    ε_T*(t) = Σ_{k=1}^{m} δ_k κ_k(t) + ε_T(t)

and

    κ_k(t) = 1 if t = t_k, 0 if t ≠ t_k.    (6)

External shocks which cause permanent shifts in the level of the trend are specified in the disturbance term associated with the trend level, ε_T*(t). The coefficient, δ_k, represents the impact of the shock at time t_k. The disturbance terms ε_T(t) and ε_R(t) are assumed to be mutually uncorrelated white noise random variates with zero means and constant variances

    ε_T(t) ~ NID(0, σ_T²)

    ε_R(t) ~ NID(0, σ_R²)    (7)

    E[ε_T(t)ε_R(t)] = 0.

The larger the variances, the greater the stochastic movements in the trend. The effect of ε_T(t) is to allow the level of the trend to shift up and down, while ε_R(t) allows the slope to change. This two-parameter trend model allows for a variety of patterns. If the variances of the two disturbances are both zero, then this component reduces to a fixed linear trend. A random walk results when the level variance is positive and the slope is identically zero; when explanatory variables are included in the model, this results in an intercept which varies over time.

Residual seasonal component. This component is the sum of up to six trigonometric terms associated with the 12-month frequency and its five harmonics

    S(t) = Σ_{j=1}^{6} S_j(t),    λ_j = 2πj/12.    (8)

Each frequency component is represented by a pair of stochastic variables

    S_j(t) = cos(λ_j)S_j(t−1) + sin(λ_j)S_j*(t−1) + ε_sj(t)    (9)

    S_j*(t) = −sin(λ_j)S_j(t−1) + cos(λ_j)S_j*(t−1) + ε_sj*(t)

where ε_sj and ε_sj* are zero mean white noise processes which are uncorrelated with each other and have a common variance, σ_s². Because the disturbances share a single variance, the change in the seasonal pattern depends upon a single parameter. If the common variance is zero, the seasonal pattern is fixed over time. The expected values of the seasonal effects add to zero over a 12-month period.

Residual noise component. The residual noise component consists of random variation unaccounted for by other components plus unusually large transitory fluctuations, or outliers

    η(t) = I(t) + O(t).    (10)

Irregular component, I(t). In some cases, after estimating the signal and sampling error components, we find that the signal still contains significant irregular movements, not attributable to sampling error, which tend to average to zero over a short period of time. In this case an additional irregular component is added to the model specification to further smooth the estimates of the signal. This component is specified as a single white noise disturbance with a zero mean and constant variance

    I(t) = ε_I(t),    ε_I(t) ~ N(0, σ_I²).    (11)

If the variance for this component is zero, then the irregular is identically zero and can be dropped from the model.

Additive outlier component, O(t).
An additive outlier represents a one-period transitory shift in the level of the observed series

    O(t) = Σ_j ω_j ξ_j(t)

where

    ξ_j(t) = 1 if t = j, 0 otherwise.    (12)

The coefficient ω_j is the change in the level of the series at time j. Time series are typically influenced by exogenous disturbances that affect specific observations. These outliers may occur because an unusually nonrepresentative sample of households was selected or because some real, nonrepeatable event occurred in the population. Irrespective of its origin, an outlier of this type affects only a single observation at a time and is unrelated to the time series model of the signal. Because an outlier represents a sudden, large, temporary change in the level of the CPS, the model may initially attempt to adjust to this break as if it were permanent. Accordingly, it is important to identify such outliers and then discount them when estimating the signal.

While the general model of the signal just described is very flexible, not all of the components discussed above are necessarily needed. The seasonal component may have fewer than six frequency components, depending upon the seasonal nature of the series being modeled. The regressor variables may be able to explain a substantial amount of variation in the observed series with fixed coefficients. On average, the CES explains 11 percent of the month-to-month variation in the CPS employment-to-population ratio (CPSEP), while the UI data account for about 15 percent of the monthly variation in the CPS unemployment rate (CPSRT). If good regressor variables are not available, the signal may be represented by just the time series components; this results in a univariate analysis.

Sampling error component.
Sampling error is defined as the difference between the population value, θ(t), and the survey estimate

    e(t) = y(t) − θ(t)    (13)

where e(t) has the following properties:²

    E[e(t)] = 0    (14)

    Var[e(t)] = σ_e²(t)    (15)

    ρ_e(l) = E{e(t)e(t−l)} / σ_e²(t).    (16)

The CPS estimates contain measurement error which is very difficult to quantify. We ignore this source of error by treating our target variable as the set of values the CPS would produce with a complete census of the population. The sampling error is, therefore, assumed to have a zero expectation. While the variances and, hence, the covariances change over time, the autocorrelations are treated as stationary (see below).

When modeling the sampling error, it is important to account for sample design features which are likely to have a major effect on the error structure of e(t). The autocorrelation structure of the CPS sampling error depends strongly upon the CPS rotating panel design and population characteristics (see Chapter 3). Another important characteristic of the state CPS estimator is its changing reliability over time. That is, the absolute size of the sampling error is not fixed but changes because of redesigns, sample size changes, and variation in labor force levels. Prior to 1985, some state estimates were produced from a national design. Special sample supplementations under the old national design also had an effect on the reliability of selected state samples in the late 1970s and early 1980s. A state-based design was phased in during 1984-85, along with improved procedures for noninterviews, ratio adjustments, and compositing. This redesign had a major effect on the reliability of many of the state CPS estimates. Even with a fixed design and sample size, the variance of the sampling error component changes because it is also a function of the level of the labor force characteristics of the population being measured.
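A short simulation can illustrate these properties: a zero-mean error with a stationary autocorrelation whose variance nevertheless changes over time. The AR(1) coefficient and the scale pattern below are illustrative assumptions, not estimates from CPS data.

```python
import numpy as np

# Illustration (not CPS data) of the sampling error properties in (14)-(16):
# an AR(1) series e*(t) has zero mean and a stationary autocorrelation, while
# a time-varying scale factor gamma(t) changes its variance over time.
rng = np.random.default_rng(42)
n = 20000
phi = 0.6                                   # hypothetical lag-1 autocorrelation
e_star = np.zeros(n)
for t in range(1, n):
    e_star[t] = phi * e_star[t-1] + rng.normal()

gamma = 1.0 + 0.5 * np.sin(np.arange(n) * 2 * np.pi / 120)  # changing reliability
e = gamma * e_star                          # heteroscedastic, autocorrelated error

mean_e = e.mean()
r1 = np.corrcoef(e_star[1:], e_star[:-1])[0, 1]   # lag-1 autocorrelation
print(round(mean_e, 3), round(r1, 3))
```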
Since the CPS variance changes across time while the autocorrelation structure is stable over time, we express e(t) in multiplicative form as

    e(t) = γ(t)e*(t).    (17)

This expression allows us to capture the autocorrelated and heteroscedastic structure of e(t). The variance inflation factor, γ(t), accounts for heteroscedasticity in the CPS and is defined by

    γ(t) = σ_e(t) / σ_e*    (18)

where σ_e(t) is the GVF estimate of the standard error for the CPS estimate and σ_e* is the standard error of the autoregressive moving average (ARMA) process, e*(t).

² In the context of time series analysis, we view the variances and covariances of the CPS estimators as properties of the sampling error component.

The CPS autocorrelation structure is captured through e*(t), which is modeled as an ARMA process

    e*(t) = φ⁻¹(L)θ(L)ε_e(t)    (19)

where ε_e(t) is a white noise disturbance and L is the lag operator, that is, L^k(X_t) = X_{t−k}, so that

    φ(L) = 1 − φ₁L − φ₂L² − ... − φ_pL^p

and

    θ(L) = 1 − θ₁L − θ₂L² − ... − θ_qL^q.    (20)

The parameters φ₁, φ₂, ..., φ_p and θ₁, θ₂, ..., θ_q are estimated from the sampling error lag correlations (Dempster and Hwang, 1990) through the autocorrelation function

    ρ_e*(l) = θ(L)θ(L⁻¹) / φ(L)φ(L⁻¹)    (21)

where we set ρ_e*(−l) = ρ_e*(l). The coefficients θ and φ are then used to compute the impulse response weights {g_k} through the generating function

    g(L) = φ⁻¹(L)θ(L).    (22)

The variance of the ARMA process e*(t) is computed as

    σ_e*² = Σ_{k=0}^{∞} g_k².    (23)

We can then compute the variance inflation factor γ(t) using CPS standard error estimates, which are obtained from the generalized variance function (see Chapter 13) for e(t).

DIAGNOSTIC TESTING

A model should adequately represent the main features of movements in the CPS. An analysis of the model's prediction errors is the primary tool for assessing goodness of fit. This is an indirect test of the model. The actual model error is the difference between the true value of the signal and the model's estimate of that value. Since we do not observe the true values, but only the CPS, which contains sampling error, we cannot compute the actual model error. The overall model, however, provides an estimate of the signal and sampling error, which sum to an estimate of the CPS. We may, therefore, use the model to predict new CPS observations. If the model's prediction errors are larger than expected or do not average to zero, this suggests that the signal and/or noise components may be misspecified. In this way, an examination of the model's errors in predicting the CPS provides an overall test of its consistency with the CPS data.

The prediction errors are computed as the difference between the current values of the CPS and the predictions of the CPS made from the model, based on data prior to the current period. Since these errors represent movements not explained by the model, they should not contain any systematic information about the behavior of the signal or noise components of the CPS. Specifically, the prediction errors, when standardized, should approximate a randomly distributed normal variate with zero mean (unbiased) and constant variance. The models are subjected to a battery of diagnostic tests to check the prediction errors for departure from these properties.

ESTIMATION

The parameters of the noise component are derived directly from design-based variance-covariance information. The state CPS variance estimates are obtained through the method of generalized variance functions (see Chapter 13). State-level autocorrelations of the sampling error are based on research conducted by Dempster and Hwang (1990), who used a variance component model to compute autocorrelations for the sampling error. After the unobserved signal and noise components are put into state-space form, the unknown parameters of the variance components of the signal are estimated by maximum likelihood using the Kalman filter (Harvey, 1989).
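As a sketch of the noise-side calculations in equations (19) through (23), the fragment below expands a small ARMA model into its impulse response weights, computes the ARMA standard error, and forms the variance inflation factor of equation (18). The AR and MA coefficients and the GVF standard errors are hypothetical, not values from the published state models.

```python
import numpy as np

# Sketch (with made-up parameter values) of equations (19)-(23): expand an
# ARMA(p,q) into its impulse response weights via g(L) = theta(L)/phi(L),
# then compute the ARMA standard error and the variance inflation factor
# gamma(t) = sigma_e(t)/sigma_e* of equation (18).
phi = [0.5, 0.2]        # AR coefficients phi_1, phi_2 (hypothetical)
theta = [0.3]           # MA coefficient theta_1 (hypothetical)

def impulse_response(phi, theta, n_weights=200):
    """g_0 = 1; g_k = sum_i phi_i * g_{k-i} - theta_k (theta_k = 0 for k > q)."""
    g = [1.0]
    for k in range(1, n_weights):
        val = sum(phi[i] * g[k - 1 - i] for i in range(min(len(phi), k)))
        if k <= len(theta):
            val -= theta[k - 1]
        g.append(val)
    return np.array(g)

g = impulse_response(phi, theta)
sigma_e_star = np.sqrt(np.sum(g ** 2))       # equation (23), truncated sum

# GVF standard errors of the CPS estimate for a few months (hypothetical):
sigma_e_t = np.array([0.90, 0.95, 1.10])
gamma_t = sigma_e_t / sigma_e_star           # equation (18)
print(gamma_t)
```

In practice, BLS derives the ARMA parameters from the design-based lag correlations rather than assuming them, as described under ESTIMATION.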
Given these parameter values, the filter calculates the expected value of the signal and the noise components at each point in time, conditional on the observed data up to that time point. As more data become available, previous estimates are updated by a process called smoothing (Maybeck, 1979). For more details, see Tiller (1989).

MONTHLY PROCESSING

State agency staff prepare their official monthly estimates using software developed by BLS that implements the Kalman filter (KF). The state model-based labor force estimates are generally released 2 weeks after the release of the national labor force estimates. The KF algorithm is particularly well suited for the preparation of current estimates as they become available each month. Since it is a recursive data processing algorithm, it does not require all previous data to be kept in storage and reprocessed every time a new sample observation becomes available. All that is required is an estimate of the state vector and its covariance matrix for the previous month.

The computer interface used by the states to make, review, and transmit their model estimates is a system of interactive programs called STARS (State Time Series Analysis and Review System). The software is interactive, querying users for their UI and CES data and then combining these data with CPS estimates to produce model-based estimates.

END-OF-THE-YEAR PROCESSING

At the end of the year, a number of revisions are made to the model estimates. The model estimates are first re-estimated to incorporate new population estimates obtained from the Census Bureau and revisions to state-supplied data. The revised estimates are then smoothed and benchmarked to CPS annual averages. Finally, the benchmarked series is seasonally adjusted using the X-11 ARIMA seasonal adjustment procedure.

Re-Estimation and Benchmarking

After the Census Bureau provides new population controls to BLS, the state CPS labor force estimates are adjusted to these controls.
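The recursion described under MONTHLY PROCESSING, in which each month needs only the previous state estimate and its covariance, can be sketched for the simplest scalar case: a random-walk signal observed with sampling noise. This is an illustrative toy model, not the STARS implementation, and all numbers are invented.

```python
# Minimal one-step Kalman filter update for a random-walk signal observed
# with noise, illustrating that only the previous month's state estimate
# and covariance are carried forward. Model and values are hypothetical.
def kf_step(x_prev, P_prev, y, q, r):
    """One predict/update cycle for x(t) = x(t-1) + w, y(t) = x(t) + v."""
    # Predict: random-walk transition, so the mean carries forward.
    x_pred = x_prev
    P_pred = P_prev + q          # q = signal disturbance variance
    # Update: blend prediction and new observation via the Kalman gain.
    K = P_pred / (P_pred + r)    # r = sampling error variance
    x_new = x_pred + K * (y - x_pred)
    P_new = (1.0 - K) * P_pred
    return x_new, P_new

# Process a year of hypothetical monthly CPS unemployment rates.
obs = [6.1, 5.9, 6.3, 6.0, 5.8, 6.2, 6.4, 6.1, 5.9, 6.0, 6.2, 6.1]
x, P = obs[0], 1.0               # crude initialization
for y in obs[1:]:
    x, P = kf_step(x, P, y, q=0.01, r=0.25)
print(round(x, 3), round(P, 4))
```

With real data the state vector also carries trend, seasonal, regression, and ARMA noise states, but the recursion has the same shape.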
Similarly, state agencies revise their CES and UI claims data as more information about the series becomes available throughout the year. Using the revised state CPS and input data, the KF produces a revised estimate for the current year. The revised estimates are smoothed, using a fixed-interval smoothing algorithm, to incorporate data accumulated during the current year.

A benchmarking process follows the smoothing of the forward filter estimates. Since the CPS state samples were designed to produce reliable annual average estimates, the annual average of the smoothed estimates is forced to equal the CPS annual average. The method used during the benchmarking procedure is called the Denton method. It forces the average of the monthly smoothed estimates to equal the CPS state annual averages while minimizing distortions in the month-to-month changes of the smoothed estimates.

Seasonal Adjustment

The model estimates are seasonally adjusted using the X-11 ARIMA seasonal adjustment procedure. Seasonal factors for the first half of the current year, January through June, are based on the historical series. Factors for July through December are calculated from a data series comprising the historical series and the forward filter estimates for the first half of the current year. The models are designed to suppress sampling error, not to decompose the series into seasonal and nonseasonal variation. This preprocessed series can then be adequately decomposed by the X-11 filters (Tiller, 1996).

RESULTS

Using the basic model structure described above, employment and unemployment models were developed for all of the states. Auxiliary variables, available by state, were used in the regression components: Current Employment Statistics survey employment (CESEM), worker claims for unemployment insurance benefits (UI), and population estimated by the Census Bureau.
The general form for each model is

    CPSEP(t) = α(t)CESEP(t) + Trend(t) + Seasonal(t) + Noise(t)    (24)

    CPSRT(t) = δ(t)CLRST(t) + Trend(t) + Seasonal(t) + Noise(t)

where:

    CESEP = 100(CESEM/POP)
    CLRST = 100(UI/CESEM)
    POP = civilian noninstitutional population 16 years and older.

The basic signal component consists of a regression component with a time-varying coefficient, a trend level and slope, six seasonal frequencies, irregular variation, and outliers. The basic noise component consists of sampling error.

Once estimated, these models were subjected to diagnostic testing. In a well-specified model, the standardized one-step-ahead prediction errors should behave approximately as white noise, that is, be uncorrelated with a zero mean and fixed variance. To be acceptable, the final model was required to show no serious departures from the white noise properties. Once satisfactory results were obtained, further decisions were based on goodness-of-fit measures, Akaike's Information Criterion (Harvey, 1989), and subject matter knowledge. Often, one or more of these components could be simplified. For the signal, the trend slope could often be dropped and the number of seasonal frequencies reduced. In many cases, the estimated variance of the irregular was close to zero, allowing it to be dropped from the model.

A graph of the CPS unemployment rate and the signal from a state unemployment rate model is on page E–5. The signal is considerably smoother than the CPS. Elimination of the sampling error from the CPS by signal extraction removed about 65 percent of the monthly variation in the series.

REFERENCES

Bell, W.R., and Hillmer, S.C. (1990), "The Time Series Approach to Estimation for Repeated Surveys," Survey Methodology, 16, 195-215.

Binder, D.A., and Dick, J.P. (1989), "Modeling and Estimation for Repeated Surveys," Survey Methodology, 15, 29-45.

Blaustein, S.J. (1979), paper prepared for the National Commission on Employment and Unemployment Statistics.
Dempster, A.P., and Hwang, J.S. (1990), "A Sampling Error Model of Statewide Labor Force Estimates From the CPS," paper prepared for the U.S. Bureau of Labor Statistics.

Evans, T., Tiller, R., and Zimmerman, T. (1993), "Time Series Models for State Labor Force Series," Proceedings of the Survey Research Methods Section, American Statistical Association.

Harvey, A.C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge: Cambridge University Press.

Maybeck, P.S. (1979), Stochastic Models, Estimation, and Control (Vol. 2), Orlando: Academic Press.

Pfeffermann, D. (1992), "Estimation and Seasonal Adjustment of Population Means Using Data From Repeated Surveys," Journal of Business and Economic Statistics, 9, 163-175.

Scott, A.J., and Smith, T.M.F. (1974), "Analysis of Repeated Surveys Using Time Series Methods," Journal of the American Statistical Association, 69, 674-678.

Scott, A.J., Smith, T.M.F., and Jones, R.G. (1977), "The Application of Time Series Methods to the Analysis of Repeated Surveys," International Statistical Review, 45, 13-28.

Thisted, R.A. (1988), Elements of Statistical Computing: Numerical Computation, New York: Chapman and Hall.

Tiller, R. (1989), "A Kalman Filter Approach to Labor Force Estimation Using Survey Data," Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 16-25.

Tiller, R. (1992), "An Application of Time Series Methods to Labor Force Estimation Using CPS Data," Journal of Official Statistics, 8, 149-166.

U.S. Department of Labor, Bureau of Labor Statistics (1997), BLS Handbook of Methods, Bulletin 2490.

Zimmerman, T., Tiller, R., and Evans, T. (1994), "State Unemployment Rate Time Series Models," Proceedings of the Survey Research Methods Section, American Statistical Association.

Appendix F.
Organization and Training of the Data Collection Staff

INTRODUCTION

The data collection staff for all Census Bureau programs is directed through 12 regional offices (ROs) and 3 telephone centers for computer-assisted telephone interviewing (CATI). The ROs collect data in two ways: CAPI (computer-assisted personal interviewing) and PAPI (paper-and-pencil interviewing). The 12 ROs report to the Chief of the Field Division, whose headquarters is located in Washington, DC. The three CATI facility managers report to the Chief of the National Processing Center (NPC).

ORGANIZATION OF REGIONAL OFFICES/CATI FACILITIES

The staffs of the ROs and CATI facilities carry out the Census Bureau's field data collection programs, both sample surveys and censuses. Currently, the ROs supervise about 6,000 part-time and intermittent field representatives (FRs) who work on continuing current programs and one-time surveys. Approximately 2,100 of these FRs work on the Current Population Survey (CPS). When a census is being taken, the field staff increases greatly. The locations of the ROs and the boundaries of their responsibilities are displayed in Figure F–1. RO areas were originally defined to evenly distribute the office workloads for all programs. Table F–1 shows the average number of CPS units assigned for interview per month in each RO.

A regional director is in charge of each RO. Program coordinators report to the director through an assistant regional director. The CPS is the responsibility of the demographic program coordinator, who has one or two CPS program supervisors on staff. The program supervisor has a staff of one or two office clerks working essentially full time. Most of the clerks are full-time civil servants who work in the RO. The typical RO employs about 100 to 250 FRs who are assigned to the CPS. Most FRs also work on other surveys. The RO usually has 15 to 18 senior field representatives (SFRs) who act as team leaders to FRs. Each team leader is assigned 6 to 10 FRs.
The primary function of the team leader is to assist the program supervisors with training and supervising the field interviewing staff. In addition, the SFRs conduct response follow-up with eligible households. Like other FRs, the SFR is a part-time or intermittent employee who works out of his or her home.

Despite the geographic dispersion of the sample areas, there is a considerable amount of personal contact between the supervisory staff and the FRs. This is accomplished mainly through the training programs and various aspects of the quality control program. For some of the outlying PSUs, it is necessary to use the telephone and written communication to keep in continual touch with all FRs. With the introduction of CAPI, the ROs also communicate with the FRs using e-mail. Assigning new functions, such as team leaders, also improves communications between the ROs and the interviewing staff. In addition to communications relating to the work content, there is a regular system for reporting progress and costs.

The CATI centers are staffed with one facility manager who directs the work of two to three supervisory survey statisticians. Each supervisory survey statistician is in charge of about 15 supervisors and between 100 and 200 interviewers.

A substantial portion of the budget for field activities is allocated to monitoring and improving the quality of the FRs' work. This includes FR group training, monthly home studies, personal observation, and reinterview. Approximately 25 percent of the CPS budget (including travel for training) was allocated to quality enhancement. The remaining 75 percent of the budget went to FR and SFR salaries, all other travel, clerical work in the ROs, recruitment, and the supervision of these activities.

TRAINING FIELD REPRESENTATIVES

Approximately 20 to 25 percent of the CPS FRs leave the staff each year. As a result, the recruitment and training of new FRs is a continuing task in each RO.
To be selected as a CPS FR, a candidate must pass the Field Employee Selection Aid test on reading, arithmetic, and map reading. The FR is required to live in the primary sampling unit (PSU) in which the work is to be performed and to have a residence telephone and, in most situations, an automobile. As a part-time or intermittent employee, the FR works 40 hours or less per week or month. In most cases, new FRs are paid at the GS-3 level and are eligible for payment at the GS-4 scale after 1 year of fully successful or better work. FRs are paid mileage for the use of their own cars while interviewing and for commuting to classroom training sites. They also receive pay for completing their home study training packages.

Figure F–1. Map

Table F–1. Average Monthly Workload by Regional Office: 2001

Regional office    Base workload    CATI workload    Total workload    Percent
Total              66,576           5,640            72,216            100.00
Boston             8,914            615              9,529             13.20
New York           2,887            361              3,248             4.50
Philadelphia       5,915            786              6,701             9.28
Detroit            5,006            286              5,292             7.33
Chicago            4,430            525              4,955             6.86
Kansas City        6,566            450              7,016             9.71
Seattle            5,147            529              5,676             7.86
Charlotte          5,310            366              5,677             7.86
Atlanta            4,853            376              5,230             7.24
Dallas             4,094            320              4,414             6.11
Denver             9,681            815              10,495            14.53
Los Angeles        3,773            212              3,984             5.52

FIELD REPRESENTATIVE TRAINING PROCEDURES

Initial training for new field representatives. Each FR, when appointed, undergoes an initial training program prior to starting his/her assignment. The initial training program consists of up to 20 hours of pre-classroom home study; 3.5 to 4.5 days of classroom training (depending upon the trainee's interview experience) conducted by the program supervisor or coordinator; and, at a minimum, an on-the-job field observation by the program supervisor or SFR during the FR's first 2 days of interviewing. The classroom training includes comprehensive instruction on the completion of the survey using the laptop computer. In classroom training, special emphasis is placed on the labor force concepts to ensure that the new FRs fully grasp these concepts before conducting interviews. In addition, a large part of the classroom training is devoted to practice interviews that reinforce the correct interpretation and classification of respondents' answers. Each FR completes a home study exercise before the second month's assignment and, during the second month's interview assignment, is observed for at least 1 full day by the program supervisor or the SFR, who gives supplementary training as needed. The FR also completes a home study exercise and a final review test prior to the third month's assignment.

Training for all field representatives. As part of each monthly assignment, FRs are required to complete a home study exercise which usually consists of questions concerning labor force concepts and survey coverage procedures. Once a year, the FRs are gathered in groups of about 12 to 15 for 1 or 2 days of refresher training. These sessions are usually conducted by program supervisors with the aid of SFRs. These group sessions cover regular CPS and supplemental survey procedures.

Training for interviewers at the CATI centers. Candidates selected to be CATI interviewers receive 3 days of training on using the computer. This initial training does not cover subject matter material. While in training, new CATI interviewers are monitored for a minimum of 5 percent of their time on the computer, compared with 2.5 percent monitoring time for experienced staff. In addition, once a CATI interviewer has been assigned to conduct interviews on the CPS, she/he receives an additional 3 1/2 days of classroom training.

FIELD REPRESENTATIVE PERFORMANCE GUIDELINES

Performance guidelines have been developed for CPS CAPI FRs for response/nonresponse and production. Response/nonresponse rate guidelines have been developed to ensure the quality of the data collected. Production guidelines have been developed to assist in holding costs within budget and to maintain an acceptable level of efficiency in the program. Both sets of guidelines are intended to help supervisors analyze the activities of individual FRs and to assist supervisors in identifying FRs who need to improve performance.

Each CPS supervisor is responsible for developing each employee to his/her fullest potential. Employee development can only be accomplished by providing meaningful feedback on a continuous basis. By acknowledging strong points and highlighting areas for improvement, the CPS supervisor can monitor an employee's progress and take appropriate steps to improve weak areas.
FR performance is measured by a combination of the following: response rates, production rates, supplement response, reinterview results, observation results, submitting accurate payrolls on time, meeting deadlines, reporting to team leaders, and attending training sessions. The most useful tool to help supervisors evaluate FR performance is the CPS 11-39 form. These reports are generated monthly and are produced using data from the ROSCO and CARMN systems. This report provides the supervisor with information on:

• Workload and number of interviews
• Response rate and adjective rating
• Noninterview results (counts)
• Production rate, adjective rating, and mileage
• Observation and reinterview results
• Meeting transmission goals
• Rate of personal visit/telephone interviews
• Supplement response rate
• Refusal/Don't Know counts and rates
• Industry and Occupation (I&O) entries that Jeffersonville could not code

EVALUATING FIELD REPRESENTATIVE PERFORMANCE

Census Bureau headquarters, located in Suitland, Maryland, provides guidelines to the ROs for developing performance standards for FR response and production rates. The ROs have the option of using the guidelines, modifying them, or establishing a completely different set of standards for their FRs. If an RO establishes its own standards, it must notify the FRs of those standards.

Maintaining high response rates is of primary importance to the Census Bureau. The response rate is defined as the proportion of all sample households eligible for interview that are actually interviewed. It is calculated by dividing the total number of interviewed households by the sum of interviewed households and the number of refusals, temporarily absent households, noncontacts, and households not interviewed for other reasons. (All of these noninterviews are referred to as Type A noninterviews.)
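The response rate calculation just described can be written out directly; the counts below are invented for illustration.

```python
# A sketch of the response rate calculation defined above, with made-up
# counts. Type A noninterviews are eligible households that were not
# interviewed (refusals, temporarily absent, noncontacts, other); ineligible
# addresses such as vacant units are excluded from the base entirely.
def response_rate(interviewed, type_a_noninterviews):
    """Interviewed households as a percent of all eligible sample households."""
    return 100.0 * interviewed / (interviewed + type_a_noninterviews)

# Hypothetical monthly assignment: 92 interviews, 8 Type A noninterviews.
rate = response_rate(interviewed=92, type_a_noninterviews=8)
print(rate)   # -> 92.0
```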
Type A cases do not include vacant units, units used for nonresidential purposes, or other addresses that are not eligible for interview.

Production guidelines. The production guidelines used in the CPS CAPI program are designed to measure the efficiency of individual FRs and of the RO field functions. Efficiency is measured by total minutes per case, which includes interview time and travel time. It is calculated by dividing total time reported on payroll documents by total workload. The standard acceptable minutes-per-case rate for FRs varies with the characteristics of the PSU. When looking at an FR's production, a program supervisor must consider extenuating circumstances, such as:

• Unusual weather conditions such as floods, hurricanes, or blizzards.
• Extreme distances between sample units, or an assignment that covers multiple PSUs.
• A large number of inherited or confirmed refusals.
• Working part of another FR's assignment.
• An inordinate number of temporarily absent cases.
• A high percentage of Type B/C noninterviews, which decrease the base of the nonresponse rate.
• Other substantial changes in normal assignment conditions.

Supplement response rate. The supplement response rate is another measure that CPS program supervisors use in measuring the performance of their FR staff.

Transmittal rates. The ROSCO system allows the supervisor to monitor transmittal rates of each CPS FR. A daily receipts report is printed each day showing the progress of each case on CPS.

Observation of field work. Field observation is one of the methods used by the supervisor to check and improve the performance of the FR staff. It provides a uniform method for assessing the FR's attitude toward the job and use of the computer, and for evaluating the extent to which FRs apply CPS concepts and procedures during actual work situations. There are three types of observations:

1. Initial observations.
2. General performance review.
3. Special needs.
Initial observations are an extension of the initial classroom training for new hires and provide on-the-job training for FRs new to the survey. They also allow the survey supervisor to assess the extent to which a new CPS CAPI FR grasps the concepts covered in initial training and are, therefore, an integral part of the initial training given to all FRs. A 2-day initial observation (N1) is scheduled during the FR's first CPS CAPI assignment. A second, 1-day initial observation (N2) is scheduled during the FR's second CPS CAPI assignment. A third, 1-day initial observation (N3) is scheduled during the FR's fourth through sixth CPS CAPI assignments.

General performance review observations are conducted at least annually and allow the supervisor to provide continuing developmental feedback to all CPS CAPI FRs. Each CPS CAPI FR is regularly observed at least once a year.

Special-needs observations are made when there is evidence that an FR is having problems or performing poorly. The need for a special-needs observation is usually detected by other checks on the FR's work. For example, special-needs observations are conducted if an FR has a high Type A noninterview rate, a high minutes-per-case rate, a failure on reinterview, an unsatisfactory rating on a previous observation, a request for help, or for other reasons related to the FR's performance.

An observer accompanies the FR for a minimum of 6 hours during an actual work assignment. The observer takes note of the FR's performance, including how the interview is conducted and how the computer is used.
The observer stresses good interviewing techniques: asking questions as worded and in the order presented on the CAPI screen, adhering to instructions in the instrument and in the manuals, knowing how to probe, recording answers correctly and in adequate detail, developing and maintaining good rapport with the respondent conducive to an exchange of information, avoiding questions or probes that suggest a desired answer to the respondent, and determining the most appropriate time and place for the interview. The observer reviews the FR's household performance and discusses the FR's strong and weak points, with an emphasis on correcting habits that interfere with the collection of reliable statistics. In addition, the FR is encouraged to ask the observer to clarify survey procedures not fully understood and to seek the observer's advice on solving other problems encountered.

Unsatisfactory performance. When the performance of an FR is at the unsatisfactory level over any period (usually 90 days), he/she may be placed in a trial period for 30 to 90 days. Depending on the circumstances, the FR will be issued a letter stating that he/she is being placed in a Performance Opportunity Period (POP) or a Performance Improvement Period (PIP). These administrative actions warn the FR that his/her work is substandard, make specific suggestions on ways to improve performance, alert the FR to actions that will be taken by the survey supervisor to assist the FR in improving his/her performance, and notify the FR that he/she is subject to separation if the work does not show improvement in the allotted time.

Appendix G. Reinterview: Design and Methodology

INTRODUCTION

A continuing program of reinterviews on subsamples of Current Population Survey (CPS) households is carried out every month. Reinterview involves a second interview in which all the labor force questions are repeated.
The reinterview program is one of the major tools for limiting the occurrence of nonsampling error and is a critical part of the CPS program. The CPS reinterview program has been in place since 1954. Reinterviewing for CPS serves two main purposes: as a quality control (QC) tool to monitor the work of the field representatives (FRs), and to evaluate data quality via the measurement of response error (RE).

Prior to the automation of CPS in January 1994, the reinterview consisted of one sample selected in two stages. The FRs were the primary sampling units, and the households within the FRs' assignments were the secondary sampling units. In 75 percent of the reinterview sample, differences between original and reinterview responses were reconciled, and the results were used both to monitor the FRs and to estimate response bias; that is, the accuracy of the original survey responses. In the remaining 25 percent of the sample, the differences were not reconciled and were used only to estimate simple response variance; that is, the consistency in response between the original interview and the reinterview. Because the one-sample approach did not provide a monthly reinterview sample that was fully representative of the original survey sample for estimating response error, the decision was made to separate the RE and QC reinterview samples beginning with the introduction of the automated system in January 1994.

As a QC tool, reinterviewing is used to deter and detect falsification. As such, it provides a means of limiting nonsampling error, as described in Chapter 15. The RE reinterview currently measures simple response variance, or reliability. The measurement of simple response variance in the reinterview assumes an independent replication of the interview. However, this assumption does not always hold, since the respondent may remember his or her interview response and repeat it in the reinterview (conditioning).
Also, the fact that the reinterview is done by telephone for personal-visit interviews may violate this assumption. In terms of the measurement of RE, errors or variations in response may affect the accuracy and reliability of the results of the survey. Responses from interviews and reinterviews are compared, and differences are identified and analyzed. (See Chapter 16 for a more detailed description of response variance.)

Sample cases for QC and RE reinterviews are selected by different methods and have somewhat different field procedures. To minimize response burden, a household is reinterviewed only once (or not at all) during its life in sample. This rule applies to both the RE and QC samples. Any household contacted for reinterview is ineligible for selection during its remaining months in sample. An analysis of respondent participation in later months of the CPS showed that the reinterview had no significant effect on the respondent's willingness to respond (Bushery, Dewey, and Weller, 1995). Sample selection for reinterview is done immediately after the monthly assignments are certified (see Chapter 4).

RESPONSE ERROR SAMPLE

The regular RE sample is selected first. It is a systematic random sample across all households eligible for interview each month. It includes households assigned to both the telephone centers and the regional offices. Only households that can be reached by telephone and for which a completed or partial interview is obtained are eligible for RE reinterview. This restriction introduces a small bias into the RE reinterview results because households without "good" telephone numbers are made ineligible for the RE reinterview. About 1 percent of CPS households are assigned for RE reinterview each month.

QUALITY CONTROL SAMPLE

The QC sample is selected next. The QC sample has the FRs in the field as its first stage of selection. The QC sample does not include interviewers at the telephone centers.
It is felt that the monitoring operation at the telephone centers sufficiently serves the QC purpose. The QC sample uses a 15-month cycle. The FRs are randomly assigned to 15 different groups. Both the frequency of selection and the number of households within assignments are based upon the length of tenure of the FRs. A falsification study determined that a relationship exists between tenure and both the frequency of falsification and the percentage of an assignment falsified (Waite, 1993 and 1997). Experienced FRs (those with at least 5 years of service) were found less likely to falsify. Also, experienced FRs who did falsify were more circumspect, falsifying fewer cases within their assignments than inexperienced FRs (those with under 5 years of service).

Because inexperienced FRs are more likely to falsify, more of them are selected for reinterview each month: three groups of inexperienced FRs to two groups of experienced FRs. On the other hand, since inexperienced FRs falsify a greater percentage of cases within their assignments, fewer of their cases are needed to detect falsification. For inexperienced FRs, five households are selected for reinterview; for experienced FRs, eight households are selected. The selection system is set up so that an FR is in reinterview at least once, but no more than four times, within a 15-month cycle.

A sample of households assigned to the telephone centers is selected for QC reinterview, but these households become eligible for reinterview only if recycled to the field and assigned to FRs already selected for reinterview. Recycled cases are included in reinterview because recycles are more difficult to interview and may be more subject to falsification. All cases, except noninterviews for occupied households, are eligible for QC reinterview: completed and partial interviews and all other types of noninterviews (vacant, demolished, etc.). FRs are evaluated on their rate of noninterviews for occupied households.
Therefore, FRs have no incentive to misclassify a case as a noninterview for an occupied household. Approximately 2 percent of CPS households are assigned for QC reinterview each month.

REINTERVIEW PROCEDURES

QC reinterviews are conducted out of the regional offices by telephone, if possible, but in some cases by personal visit.1 They are conducted on a flow basis extending through the week following the interview week, mostly by senior FRs and sometimes by program supervisors. For QC reinterviews, the reinterviewers are instructed to try to reinterview the original household respondent, but are allowed to conduct the reinterview with another eligible household respondent. The QC reinterviews are computer assisted and are a brief check to verify the original interview outcome. The reinterview asks questions to determine whether the FR conducted the original interview and followed interviewing procedures. The labor force questions are not asked.2 The QC reinterview instrument also has a special set of outcome codes to indicate whether any noninterview misclassifications occurred or any falsification is suspected.

RE reinterviews are conducted only by telephone, both from the telephone centers and out of the regional offices (ROs). Cases interviewed at the telephone centers are reinterviewed from the telephone centers, and cases interviewed in the field are reinterviewed in the field. At the telephone centers, reinterviews are conducted by the telephone center interviewing staff on a flow basis during interview week. In the field, they are conducted on a flow basis and by the same staff as the QC reinterviews.

1 Because the mail component was ineffective in detecting falsification, it was stopped as of March 1998.
2 Prior to October 2001, QC reinterviews asked the full set of labor force questions for all eligible household members.
The reinterviewers are instructed to try to reinterview the original household respondent, but are allowed to conduct the reinterview with another knowledgeable adult household member.3 All RE reinterviews are computer assisted and consist of the full set of labor force questions for all eligible household members. The RE reinterview instrument dependently verifies household membership and asks the industry and occupation questions exactly as they are asked in the original interview.4 Currently, no reconciliation is conducted.5

3 Beginning in February 1998, the RE reinterview respondent rule was made the same as the QC reinterview respondent rule.

SUMMARY

Periodic reports on the QC reinterview program are issued showing the number of FRs determined to have falsified data. Since FRs are made aware of the QC reinterview program and are also informed of the results following reinterview, falsification is not a major problem in the CPS. Only about 0.5 percent of CPS FRs are found to falsify data. These FRs either resign or are terminated. From the RE reinterview, results are also issued on a periodic basis. They contain response variance results for the basic labor force categories and for certain selected other questions.

REFERENCES

Bushery, J. M., Dewey, J. A., and Weller, G. (1995), "Reinterview's Effect on Survey Response," Proceedings of the 1995 Annual Research Conference, U.S. Census Bureau, pp. 475-485.

U.S. Census Bureau (1996), Current Population Survey Office Manual, Form CPS-256, Washington, DC: Government Printing Office, ch. 6.

U.S. Census Bureau (1996), Current Population Survey SFR Manual, Form CPS-251, Washington, DC: Government Printing Office, ch. 4.

U.S. Census Bureau (1993), "Falsification by Field Representatives 1982-1992," memorandum from Preston Jay Waite to Paula Schneider, May 10, 1993.

U.S. Census Bureau (1997), "Falsification Study Results for 1990-1997," memorandum from Preston Jay Waite to Richard L. Bitzer, May 8, 1997.
4 Prior to October 2001, RE reinterviews asked the industry and occupation questions independently.
5 Reconciled reinterview, which was used to estimate response bias and provide feedback on FR performance, was discontinued in January 1994.

Appendix H. Sample Design Changes of the Current Population Survey: January 1996

(See Appendix J for sample design changes in July 2001.)

INTRODUCTION

As of January 1996, the 1990 Current Population Survey (CPS)1 sample changed because of a funding reduction. The budget made it necessary to reduce the national sample size from roughly 56,000 eligible housing units to 50,000 eligible housing units and from 792 sample areas to 754 sample areas. The U.S. Census Bureau and the Bureau of Labor Statistics (BLS) decided to achieve the budget reduction by eliminating the CPS oversampling in seven states and two substate areas that had made it possible to produce reliable monthly estimates of unemployment and employment in those areas. In effect, this decision produced the least possible damage to the precision of national estimates for a sample reduction of this size. It confirmed the high priority attached to national statistics and to data for demographic and social subgroups, as compared with geographic detail. The four largest states had not required any oversampling for the production of monthly estimates, so no reductions were made in their sample sizes. The sample was reduced in seven states: Illinois, Massachusetts, Michigan, New Jersey, North Carolina, Ohio, and Pennsylvania; and in two substate areas: Los Angeles and New York City.

1 The CPS produced by the redesign after the 1990 census is termed the 1990 CPS.

Survey Requirements

The original survey requirements stated in Chapter 3 remain unchanged except for the reliability requirements. The new reliability requirements are:

1. The coefficient of variation (CV) on the monthly unemployment level for the Nation, given a 6 percent unemployment rate, is 1.9 percent.
2. The CV on the annual average unemployment level, given a 6 percent unemployment rate, for each of the 50 states and the District of Columbia is 8 percent or lower.

These requirements effectively eliminated substate area reliability requirements and monthly CV requirements for 11 large states: California, Florida, Illinois, Massachusetts, Michigan, New Jersey, New York, North Carolina, Ohio, Pennsylvania, and Texas. The 39 other states and the District of Columbia were not affected by this sample reduction. These states kept their original designs, which meet an 8 percent CV requirement on the annual average estimates of unemployment, assuming a 6 percent unemployment rate.

Strategy for Reducing the Sample

The sample reduction plan was designed to meet both the national and state reliability requirements. The sample sizes in all states and the District of Columbia, other than the 11 largest states, were already at the levels necessary to meet the national reliability requirements, so changes in their sizes were not required. The sample sizes in 7 of the 11 largest states and in New York City and Los Angeles were greater than necessary to meet the new requirements, and the sample sizes in these areas were reduced. The sample sizes in the four most populous states — Florida, Texas, and the parts of California and New York outside Los Angeles and New York City — were also greater than the precision requirements necessitated, but reductions in their sample sizes would have created problems in achieving the national goals. The sample sizes used in the redesign in these four areas, therefore, were retained. For the redesigned sample, an attempt was made to allocate sample proportionally to the state populations in the remaining large states and substate areas: Illinois, Massachusetts, Michigan, New Jersey, North Carolina, Ohio, Pennsylvania, Los Angeles, and New York City.
Such an approach would yield the same sampling rate in all of these areas. The use of a consistent sampling rate would tend to minimize the increase in national variance resulting from the reduction. However, it was necessary to allow a fair amount of variation in the sampling rates among these areas in order to meet the reliability requirement for annual average data for each of the 50 states (see Table H–2). This design produces variations in precision among the individual states, with expected annual average CVs on estimates of unemployment increasing to between 2.9 and 5.7 percent. The proportion of sample cut because of the reduction and the expected CVs before and after the reduction are shown in Table H–1. All 11 large states have better expected reliability than the 39 other states and the District of Columbia.

Table H–1. The Proportion of Sample Cut and Expected CVs Before and After the January 1996 Reduction

                                              Expected CV (percent)
                        Proportion of    Before reduction     After reduction
State                   sample cut       Monthly   Annual     Monthly   Annual
United States               0.107          1.8       0.9        1.9*      0.9
California                  0.024          6.2       2.9        6.3       2.9
  Los Angeles               0.059          9.2       4.1        9.5       4.3
  Remainder of state          –            7.9       3.7        7.9       3.7
Florida                       –            8.2       3.7        8.2       3.7
Illinois                    0.130          7.9       3.7        8.8       4.2
Massachusetts               0.478          7.9       3.5       11.1       4.2
Michigan                    0.320          7.8       3.6        9.7       4.5
New Jersey                  0.348          8.0       3.6        9.8       4.4
New York                    0.195          6.6       3.0        7.1       3.2
  New York City             0.348          9.1       4.1       11.5       5.2
  Remainder of state          –            8.9       4.1        8.9       4.1
North Carolina              0.458          8.0       3.9       11.1       5.7
Ohio                        0.240          7.9       3.6        9.4       4.4
Pennsylvania                0.154          8.0       3.6        8.8       4.1
Texas                       0.000          7.7       3.6        7.7       3.6

* The national CV increased after the reduction but was within the rounding error.

CHANGES TO THE CPS SAMPLE DESIGN

Implementing the January 1996 sample reduction involved modifying the original 1990 CPS state designs described in Chapter 3. Both the first-stage and second-stage sample designs were changed to realize major cost savings. This differed from the maintenance reductions described in Appendix C, which affected only the second-stage design. In addition, unlike the maintenance reductions, which are implemented over a period of 16 months, this reduction was fully implemented in January 1996. Many characteristics of the original design were retained. Some of the major characteristics are:

• State-based design.
• PSU definitions.
• Self-representing PSU requirements.
• One PSU per stratum selected to be in sample.
• Systematic sample of ultimate sampling units consisting of clusters of four housing units.
• Rotation group and reduction group assignments.

First-Stage of the Sample Design

New first-stage designs were developed by restratifying the original 1990 design strata. First, the 1990 design strata that were required to be self-representing (SR) remain SR for the new designs.2 For each state, any nonrequired SR strata and all of the nonself-representing (NSR) strata were grouped into super-strata of two strata (and one group of three if there was an odd number of strata in the state). The super-strata were formed using a linear programming technique to minimize the between-stratum variation on the 1990 census levels of unemployment and civilian labor force. Then, one stratum was selected from each super-stratum with probability proportional to the 1990 population. The sample PSU from this selected stratum remained in sample, while the sample PSU from the nonselected stratum was dropped. The first-stage probability of selection for the PSUs in the new design is the product of the probability of selecting the stratum from the super-stratum and the probability of selecting the PSU from the stratum:

P_fs = P(stratum | super-stratum) x P(PSU | stratum)
     = (N_stratum / N_super-stratum) x (N_PSU / N_stratum)          (H.1)
     = N_PSU / N_super-stratum

where P_fs is the first-stage unconditional probability of selection of the PSU from the super-stratum, P(stratum | super-stratum) is the probability of selection of the stratum from the super-stratum, P(PSU | stratum) is the probability of selection of the PSU from the stratum, and N_PSU, N_stratum, and N_super-stratum are the corresponding 1990 populations.

2 See Chapter 3, "Stratification of Primary Sampling Units," for the areas that are required to be SR.

The restratifications were developed for five states: Illinois, Michigan, North Carolina, Ohio, and Pennsylvania. Two states, Massachusetts and New Jersey, and two substate areas, Los Angeles and New York City, are entirely SR and for the most part required to be SR, so no PSUs were eliminated. The reduced CPS national design, as of January 1996, contains 754 stratification PSUs in sample, of which 428 are SR and 326 are NSR.
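Formula (H.1) telescopes to N_PSU / N_super-stratum, which can be verified numerically; the populations below are hypothetical, chosen only for illustration.

```python
def first_stage_probability(n_psu, n_stratum, n_superstratum):
    """Unconditional first-stage selection probability, formula (H.1):
    P(stratum | super-stratum) x P(PSU | stratum)."""
    p_stratum = n_stratum / n_superstratum  # stratum selected PPS from super-stratum
    p_psu = n_psu / n_stratum               # PSU selected PPS from stratum
    return p_stratum * p_psu

# Hypothetical 1990 populations: PSU 150,000; stratum 400,000; super-stratum 900,000.
p = first_stage_probability(150_000, 400_000, 900_000)

# The product telescopes to N_PSU / N_super-stratum.
assert abs(p - 150_000 / 900_000) < 1e-12
print(round(p, 4))  # 0.1667
```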
Table H–2 contains the number of strata by state for the nine states in which reductions were made.

Table H–2. January 1996 Reduction: Number and Estimated Population of Strata for 754-PSU Design for the Nine Reduced States

                          Self-representing (SR)     Nonself-representing (NSR)    State sampling
State                     Strata   Estimated pop.1   Strata   Estimated pop.1      interval
California                  21       20,866,697         5       1,405,500            2,745
  Los Angeles                1        6,672,035         0               0            1,779
  Remainder of state        20       14,194,662         5       1,405,500            3,132
Illinois                    11        6,806,874         6       1,821,548            2,227
Massachusetts               31        4,719,188         0               0            1,853
Michigan                    12        5,643,817         5       1,345,074            2,098
New Jersey                  11        6,023,359         0               0            1,841
New York                    13       12,485,029         7       1,414,548            1,970
  New York City              1        5,721,495         0               0            1,774
  Remainder of state        12        6,763,534         7       1,414,548            2,093
North Carolina              11        2,840,376        13       2,221,874            1,986
Ohio                        13        6,238,600         6       1,958,138            2,343
Pennsylvania                13        7,584,129         6       1,628,780            2,149

1 Projected 1995 estimates of civilian noninstitutional population ages 16 and over.
Note: This results in an overall national sampling interval of 2,255 based on the 1995 population distribution.

Second-Stage of the Sample Design

New second-stage designs were developed for SR PSUs but not NSR PSUs. The NSR PSUs remaining in sample were originally designed to have a manageable workload for a single field representative, and cutting sample housing units for NSR PSUs would make workloads inefficient for these PSUs. Within SR PSUs, the within-PSU sampling intervals were adjusted and subsamples of ultimate sampling units (USUs) were dropped from sample. The subsample of USUs to drop was identified by reduction groups (see Appendix C). The number of reduction groups to drop from the SR PSUs in each state was a function of the state sampling interval required to achieve the national reliability given the new first-stage designs. The reduction groups were deleted in a specified order so that the nature of the systematic sample was retained to the extent possible. Within each state, the number of reduction groups dropped was uniform for all SR PSUs. Table H–3 gives the number of reduction groups dropped from each SR PSU by state.

Table H–3. Reduction Groups Dropped From SR Areas

State                     Number of reduction groups dropped
California                              –
  Los Angeles                           5
  Remainder of state                    0
Illinois                                3
Massachusetts                          49
Michigan                               25
New Jersey                             34
New York                                –
  New York City                        35
  Remainder of state                    0
North Carolina                         41
Ohio                                   13
Pennsylvania                            6

Calculation of Sampling Intervals

The design changes do not maintain the self-weighting nature of the original designs. The stratum sampling intervals differ for the SR and NSR portions of the sample. In other words, the probability of selecting a housing unit varies by stratum. The stratum sampling interval is calculated as

SI_stratum = (SI_PSU / P_fs) x (101 / (101 - RGD))          (H.2)

where SI_stratum is the stratum sampling interval, SI_PSU is the original within-PSU sampling interval, P_fs is the first-stage probability of selection as defined for formula (H.1), and RGD is the integer number of reduction groups dropped (0 ≤ RGD ≤ 101).

Each NSR PSU has a first-stage probability of selection calculated in formula (H.1), and the number of reduction groups dropped equals zero. For NSR strata, formula (H.2) reduces to

SI_NSR stratum = SI_PSU / P_fs          (H.3)

SR strata have a first-stage probability of selection equal to one, and the number of reduction groups dropped is found in Table H–3. For SR strata, formula (H.2) reduces to

SI_SR stratum = SI_PSU x (101 / (101 - RGD))          (H.4)

The state sampling interval is a weighted average of the stratum sampling intervals using 1990 census stratum populations:

SI = [ Σ(h=1..H) N_h x SI_h ] / [ Σ(h=1..H) N_h ]          (H.5)

where SI is the state sampling interval, N_h is the 1990 census population of the hth stratum in the state, SI_h is the stratum sampling interval for the hth stratum in the state, and H is the total number of SR and NSR strata in the state. Table H–2 gives the resulting state sampling intervals. Since Massachusetts and New Jersey contain only SR strata, they continue to have a self-weighting design.

REVISIONS TO CPS ESTIMATION

Three revisions were made to CPS estimation because of the January 1996 sample reduction, but the general estimation methodology was not changed. The revisions affected only the seven states and two substate areas affected by the reduction. The stages of weighting affected were the basic weight, the adjustment for nonresponse, and the first-stage ratio adjustment (see Chapter 10).

First-Stage Ratio Adjustment Changes

The first-stage factors for CPS were recomputed in the five states where PSUs were dropped. The first-stage ratio adjustment factors were revised to account for the new first-stage designs in the following states: Illinois, Michigan, North Carolina, Ohio, and Pennsylvania. The factors did not change for California, Massachusetts, New Jersey, and New York. Table H–4 contains the revised CPS first-stage factors.

Table H–4. State First-Stage Factors for CPS January 1996 Reduction
[Beginning January 1996]

CPS State           Black         Non-Black
Illinois            0.96670       0.99477
Michigan            1.00000**     1.00000**
North Carolina      1.04952       1.00641
Ohio                0.92339       0.99953
Pennsylvania        1.17540       0.99157

** Race cells were collapsed.

Monthly State and Substate Estimate Change

After the January 1996 sample reduction, the 11 large states and 2 substate areas had insufficient sample to produce reliable monthly estimates of employment and unemployment. These estimates are now produced using the same "signal-plus-noise" time series model used for the 39 small states and the District of Columbia, discussed in Chapter 10.
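The stratum sampling-interval formulas (H.2)–(H.4) can be checked with a short numeric sketch; the interval and probability values below are hypothetical, not taken from the CPS design.

```python
def stratum_sampling_interval(si_psu, p_fs, rgd):
    """Stratum sampling interval, formula (H.2):
    SI_stratum = (SI_PSU / P_fs) x 101 / (101 - RGD),
    where RGD is the number of reduction groups dropped (out of 101)."""
    return (si_psu / p_fs) * 101 / (101 - rgd)

# NSR stratum: RGD = 0, so (H.2) reduces to SI_PSU / P_fs, formula (H.3).
assert stratum_sampling_interval(1_600, 0.8, 0) == 1_600 / 0.8

# SR stratum: P_fs = 1, so (H.2) reduces to SI_PSU x 101 / (101 - RGD),
# formula (H.4); dropping 25 reduction groups widens the interval by 101/76.
print(round(stratum_sampling_interval(1_600, 1.0, 25), 1))  # 2126.3
```

The widened interval is the inverse of the reduced housing-unit selection probability, which is why it also serves as the new basic weight.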
Basic Weight Changes

The new basic weights for the remaining sample are the stratum sampling intervals given in formula (H.2). These stratum sampling intervals are the inverse of the probability of selecting a housing unit from the stratum.

Adjustment for Nonresponse Changes

In New Jersey and North Carolina, two noninterview clusters within each state were combined due to small sample sizes. For January 1996 and subsequent months, there are 125 noninterview clusters for the Nation.

ESTIMATION OF VARIANCES

Changing the CPS design because of the January 1996 sample cut required changes to the CPS variance estimation procedures, but the variance estimation methodology did not change. These revisions affected only the seven states and two substate areas where the reduction occurred. A consequence of these changes is that consistent estimates of covariances between December 1995 (and previous months) and January 1996 (and subsequent months) cannot be produced. See Chapter 14 for a description of the CPS variance estimation methodology.

Included in this appendix are estimates of the variance properties of the CPS design following the January 1996 reduction. These estimates replace those in Chapter 14 as measures of the current CPS design's variance properties.3

3 The chapters of this document are written as of the conclusion of the implementation of the 1990 CPS design, through December 1995. Many of the appendixes are written to provide updates that began in January 1996.

Changes to the Variance Estimation Procedure

Three changes were made to the variance estimation procedure affecting replicate sample composition: new standard error computation units (SECUs) were defined (pairs of USUs), the NSR strata were combined into new pseudostrata, and associated replicate factors were calculated using new Hadamard matrix row assignments.
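The replicate machinery can be sketched generically. The sketch below builds a Sylvester Hadamard matrix and forms balanced half-sample replicates over four hypothetical SECU pairs; the CPS production system uses its own replicate factors and Hadamard row assignments, so the pair count, weights, and factors here are illustrative assumptions only.

```python
# Generic balanced half-sample (BRR) sketch: SECUs are paired, a Hadamard
# matrix row picks one USU per pair for each replicate, and the variance is
# the average squared deviation of replicate totals from the full-sample
# total. Illustrative only; not the CPS production replicate factors.
import numpy as np

def sylvester_hadamard(k):
    """2^k x 2^k Hadamard matrix via the Sylvester construction."""
    h = np.array([[1]])
    for _ in range(k):
        h = np.block([[h, h], [h, -h]])
    return h

rng = np.random.default_rng(12345)
n_pairs = 4                                    # hypothetical SECU pairs
y = rng.normal(50.0, 10.0, size=(n_pairs, 2))  # one value per USU
w = np.full((n_pairs, 2), 100.0)               # full-sample weights

H = sylvester_hadamard(2)                      # 4 replicates for 4 pairs
theta_full = (w * y).sum()

reps = []
for row in H:
    # Half-sample factors 2 and 0: keep one USU of each pair, double it.
    f = np.where(row[:, None] == np.array([1, -1]), 2.0, 0.0)
    reps.append((w * f * y).sum())

var_brr = np.mean([(t - theta_full) ** 2 for t in reps])
```

The orthogonality of the Hadamard rows is what makes the half-samples "balanced," so the replicate variance is unbiased for the paired-difference variance under this simple design.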
New SECUs were defined in all seven states and two substate areas because the second-stage reduction in SR PSUs resulted in many SECUs with only one USU. NSR strata were combined into new pseudostrata only in the five states with new first-stage designs. These changes resulted in a break in consistent replicate sample definitions between December 1995 and January 1996.

CPS Variance Estimates - 754 PSU Design

Monthly levels. The estimated variance properties of monthly levels of selected labor force characteristics from the CPS design following the January 1996 reduction are provided in Tables H–5 to H–8. These tables replace the corresponding Tables 14–2 to 14–5 in Chapter 14 as indicators of the variance properties of the current CPS design. (See the previous footnote, and see Chapter 14 for a description of the table headings.) The variance estimates in Tables H–5 to H–8 are averages over 12 months (January to December 1996), while Tables 14–2 to 14–5 are averages over 6 months, making the Appendix H tables more stable than the Chapter 14 tables.

Overall, the reduction was expected to increase CVs by about 4 percent, based on the ratio of the square roots of the national sampling rates before and after the reduction. The estimated CVs in Table H–5 are larger than those estimated prior to the reduction (Table 14–2) for all characteristics but Hispanic4 origin. The reliability of Hispanic origin characteristics is also adversely affected by the reduction; however, the variance on the Hispanic origin CVs is too large to measure this effect. The effect of the reduction on the contribution of the within- and between-PSU variances to the total variance (Table H–5) is even more difficult to assess because of the variance on the variance estimates. In general, no noticeable effect on the variance estimates is found in the contribution from the first and second stages of selection.

4 Hispanics may be of any race.
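The "about 4 percent" figure follows from the usual approximation that a CV is proportional to 1/sqrt(n), so the expected increase is the square root of the ratio of the sampling rates. A minimal check, assuming for illustration a national sample cut of roughly 7.5 percent (the actual before/after rates are not given in this appendix):

```python
# Back-of-the-envelope check of the expected CV increase from a sample cut.
# The 7.5 percent reduction is an illustrative assumption, not the CPS figure.
import math

def cv_increase(rate_before, rate_after):
    """Expected multiplicative change in CVs when the sampling rate changes,
    using CV proportional to 1/sqrt(n)."""
    return math.sqrt(rate_before / rate_after)

factor = cv_increase(1.0, 1.0 - 0.075)    # ~7.5 percent sample cut
percent_increase = (factor - 1.0) * 100   # close to 4 percent
```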
Tables H–6 and H–7 show that the sample reduction did not appear to affect the variance properties related to estimation. The new estimated design effects in Table H–8 appear to have increased slightly.

Month-to-month change in levels. The estimated variance properties of month-to-month change in levels of selected labor force characteristics from the CPS design following the January 1996 reduction are in Tables H–9 and H–10. These variance estimates are averages of 11 month-to-month change estimates.

Table H–9 indicates how the two stages of sampling affect the variance of the composited estimates of month-to-month change. The last two columns of the table show the percentage of the total variance due to sampling housing units within PSUs (within-PSU variance) and the variance due to sampling a subset of NSR PSUs (between-PSU variance). For all characteristics in the table, the largest component of the total variance is due to within-PSU sampling. The Hispanic origin characteristics and teenage unemployed have smaller within-PSU components than most of the other characteristics. Total and White employed in agriculture also have relatively small within-PSU variance components. Negative numbers in the last column of the table imply negative estimates of the between-PSU variance. Because this component is relatively small and estimated by subtraction, some negative estimates are expected. Since true variances cannot be negative, these negative estimates actually reflect the variance of the estimated variance components.

The reduction in the variance of month-to-month change estimates due to the compositing is shown in Table H–10. The variance factor is the variance of the composited estimate of month-to-month change divided by the variance of the first- and second-stage combined (FSC) estimate of month-to-month change.
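The variance factors below one in Table H–10 follow from the identity Var(Y_t - Y_{t-1}) = V_t + V_{t-1} - 2*rho*sqrt(V_t * V_{t-1}): compositing raises the month-to-month correlation rho between the estimates, which lowers the change variance even when the level variances barely move. The variances and correlations below are invented for illustration:

```python
# Why compositing reduces month-to-month change variance. The level
# variances and correlations are assumed toy values, not CPS estimates.
import math

def change_variance(v_t, v_prev, rho):
    """Variance of the difference of two correlated monthly estimates."""
    return v_t + v_prev - 2.0 * rho * math.sqrt(v_t * v_prev)

uncomposited = change_variance(1.0, 1.0, rho=0.45)
# Compositing: slightly smaller level variance, higher correlation (assumed).
composited = change_variance(0.96, 0.96, rho=0.55)

variance_factor = composited / uncomposited   # below one, as in Table H-10
```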
The factors are less than one for all but one of the characteristics shown in the table, indicating that compositing reduces the variance of most month-to-month change estimates.

Table H–5. Components of Variance for FSC Monthly Estimates
[Monthly averages: January - December 1996]

Civilian noninstitutional          FSC1      Standard  Coefficient of  Percent of total variance
population, ages 16 and older      estimate  error     variation (%)   Within    Between
                                   (x 10^6)  (x 10^5)

Unemployed                           7.31      1.48        2.02          97.4       2.6
  White                              5.34      1.26        2.36          95.2       4.8
  Black                              1.61      0.71        4.40         102.8      -2.8
  Hispanic origin2                   1.14      0.62        5.45          94.1       5.9
  Teenage, 16-19                     1.32      0.59        4.51          97.1       2.9

Employed - agriculture               3.45      1.61        4.68          70.7      29.3
  White                              3.29      1.56        4.78          69.9      30.1
  Black                              0.10      0.19       19.82          95.3       4.7
  Hispanic origin2                   0.61      0.73       12.08          91.0       9.0
  Teenage, 16-19                     0.26      0.27       11.03         104.1      -4.1

Employed - nonagriculture          123.37      3.55        0.29          90.2       9.8
  White                            104.62      3.13        0.30          91.6       8.4
  Black                             13.46      1.34        1.00         100.2      -0.2
  Hispanic origin2                  11.06      1.43        1.30          92.1       7.9
  Teenage, 16-19                     6.26      0.99        1.58         103.5      -3.5

Civilian labor force               134.13      3.18        0.24          93.9       6.1
  White                            113.25      2.73        0.24          96.1       3.9
  Black                             15.17      1.26        0.83          95.4       4.6
  Hispanic origin2                  12.80      1.14        0.89          98.0       2.0
  Teenage, 16-19                     7.84      1.00        1.30         105.0      -5.0

Not in labor force                  66.46      3.18        0.48          93.9       6.1
  White                             55.07      2.73        0.50          96.1       3.9
  Black                              8.43      1.26        1.50          95.4       4.6
  Hispanic origin2                   6.41      1.14        1.79          98.0       2.0
  Teenage, 16-19                     7.09      1.04        1.48         104.5      -4.5

1 FSC = first- and second-stage combined.
2 Hispanics may be of any race.

Table H–6. Effects of Weighting Stages on Monthly Relative Variance Factors
[Monthly averages: January - December 1996]

Civilian noninstitutional          Relative variance   Relative variance factor1
population, ages 16 and older      of FSC estimate     Unbiased    Unbiased estimator with--
                                   of level (x 10^-4)  estimator   NI2      NI & FS3  NI & SS4

Unemployed                               4.109           1.03      1.03      1.10      1.00
  White                                  5.594           1.05      1.04      1.10      1.00
  Black                                 19.392           1.16      1.16      1.20      0.99
  Hispanic origin5                      30.261           1.15      1.14      1.21      1.00
  Teenage, 16-19                        19.959           1.13      1.13      1.16      1.00

Employed - agriculture                  21.763           1.00      0.98      1.10      0.93
  White                                 22.695           1.00      0.98      1.09      0.93
  Black                                385.214           1.09      1.08      1.06      1.02
  Hispanic origin5                     144.816           1.05      1.03      1.20      0.95
  Teenage, 16-19                       106.648           1.06      1.05      1.07      0.99

Employed - nonagriculture                0.083           4.41      4.20      5.48      0.96
  White                                  0.090           5.61      5.46      6.48      0.95
  Black                                  0.998           5.88      5.76      5.54      1.01
  Hispanic origin5                       1.674           3.33      3.34      3.60      0.97
  Teenage, 16-19                         2.499           1.97      1.96      2.01      1.00

Civilian labor force                     0.056           6.23      5.88      8.24      0.99
  White                                  0.058           8.38      8.10     10.20      0.99
  Black                                  0.693           7.82      7.66      7.45      1.01
  Hispanic origin5                       0.800           6.65      6.61      7.34      1.00
  Teenage, 16-19                         1.653           2.47      2.46      2.57      1.00

Not in labor force                       0.230           2.88      2.74      3.74      0.99
  White                                  0.247           3.24      3.12      3.98      0.99
  Black                                  2.245           3.98      3.88      3.41      1.00
  Hispanic origin5                       3.193           2.57      2.52      2.76      1.00
  Teenage, 16-19                         2.154           2.06      2.03      2.26      1.00

1 Relative variance factor is the ratio of the relative variance of the specified level to the relative variance of the FSC level.
2 NI = Noninterview.
3 FS = First Stage.
4 SS = Second Stage.
5 Hispanics may be of any race.

Table H–7. Effect of Compositing on Monthly Variance and Relative Variance Factors
[Monthly averages: January - December 1996]

Civilian noninstitutional          Variance of composited       Variance   Relative variance
population, ages 16 and older      estimate of level (x 10^9)   factor1    factor2

Unemployed                                 21.063                 0.96        0.98
  White                                    15.487                 0.97        0.99
  Black                                     4.712                 0.93        0.96
  Hispanic origin3                          3.887                 0.99        1.00
  Teenage, 16-19                            3.429                 0.99        1.01

Employed - agriculture                     23.478                 0.91        0.91
  White                                    22.151                 0.90        0.91
  Black                                     0.325                 0.85        0.87
  Hispanic origin3                          4.764                 0.89        0.89
  Teenage, 16-19                            0.645                 0.88        0.89

Employed - nonagriculture                 109.120                 0.86        0.86
  White                                    85.476                 0.87        0.87
  Black                                    15.741                 0.87        0.87
  Hispanic origin3                         18.009                 0.88        0.88
  Teenage, 16-19                            8.787                 0.90        0.90

Civilian labor force                       86.616                 0.85        0.86
  White                                    64.105                 0.86        0.86
  Black                                    14.260                 0.89        0.90
  Hispanic origin3                         11.474                 0.87        0.88
  Teenage, 16-19                            9.163                 0.90        0.91

Not in labor force                         86.616                 0.85        0.85
  White                                    64.105                 0.86        0.85
  Black                                    14.260                 0.89        0.89
  Hispanic origin3                         11.474                 0.87        0.87
  Teenage, 16-19                            9.696                 0.89        0.89

1 Variance factor is the ratio of the variance of the composited level to the variance of the FSC level.
2 Relative variance factor is the ratio of the relative variance of the composited level to the relative variance of the FSC level.
3 Hispanics may be of any race.

Note: Composite formula constants used: K = 0.4 and A = 0.2.

Table H–8. Design Effects for Total Monthly Variance
[Monthly averages: January - December 1996]

Civilian noninstitutional          Design effect
population, ages 16 and older      Total variance             Within-PSU variance
                                   Not composited  Composited  Not composited

Unemployed                             1.382         1.339         1.346
  White                                1.362         1.331         1.296
  Black                                1.398         1.323         1.437
  Hispanic origin1                     1.539         1.531         1.448
  Teenage, 16-19                       1.173         1.171         1.139

Employed - agriculture                 3.390         3.076         2.396
  White                                3.362         3.048         2.350
  Black                                1.702         1.465         1.622
  Hispanic origin1                     3.916         3.482         3.564
  Teenage, 16-19                       1.243         1.097         1.294

Employed - nonagriculture              1.181         1.018         1.065
  White                                0.870         0.757         0.797
  Black                                0.639         0.557         0.640
  Hispanic origin1                     0.869         0.766         0.800
  Teenage, 16-19                       0.716         0.645         0.742

Civilian labor force                   1.014         0.863         0.952
  White                                0.674         0.576         0.648
  Black                                0.504         0.452         0.481
  Hispanic origin1                     0.485         0.425         0.476
  Teenage, 16-19                       0.598         0.542         0.628

Not in labor force                     1.014         0.863         0.952
  White                                0.832         0.710         0.799
  Black                                0.876         0.780         0.835
  Hispanic origin1                     0.938         0.816         0.919
  Teenage, 16-19                       0.702         0.625         0.734

1 Hispanics may be of any race.

Table H–9. Components of Variance for Composited Month-to-Month Change Estimates
[Change averages: January - December 1996]

Civilian noninstitutional          Estimate of  Standard error  Percent of total variance3
population, ages 16 and older      change1      of change2      Within    Between
                                   (x 10^6)     (x 10^5)

Unemployed                            0.29         1.59           98.0       2.0
  White                               0.23         1.30          102.7      -2.7
  Black                               0.08         0.80          100.3      -0.3
  Hispanic origin4                    0.06         0.67           96.9       3.1
  Teenage, 16-19                      0.11         0.73           93.7       6.3

Employed - agriculture                0.14         0.82           92.6       7.4
  White                               0.13         0.79           92.6       7.4
  Black                               0.01         0.16          103.7      -3.7
  Hispanic origin4                    0.04         0.42           88.8      11.2
  Teenage, 16-19                      0.05         0.24          101.5      -1.5

Employed - nonagriculture             0.60         2.40          102.5      -2.5
  White                               0.47         2.11          105.3      -5.3
  Black                               0.12         1.00          100.7      -0.7
  Hispanic origin4                    0.10         0.97           94.8       5.2
  Teenage, 16-19                      0.35         0.91          100.6      -0.6

Civilian labor force                  0.77         2.30          106.0      -6.0
  White                               0.60         2.00          106.0      -6.0
  Black                               0.17         0.98           98.7       1.3
  Hispanic origin4                    0.12         0.89           97.5       2.5
  Teenage, 16-19                      0.46         0.99          101.9      -1.9

Not in labor force                    0.76         2.30          106.0      -6.0
  White                               0.59         2.00          106.0      -6.0
  Black                               0.18         0.98           98.7       1.3
  Hispanic origin4                    0.10         0.89           97.5       2.5
  Teenage, 16-19                      0.45         1.00          100.9      -0.9

1 The arithmetic mean of the absolute values of the 11 month-to-month changes.
2 The square root of the arithmetic mean of the variances of the 11 month-to-month changes.
3 The percent of the estimated total variance attributed to the two stages of sampling.
4 Hispanics may be of any race.

Table H–10. Effect of Compositing on Month-to-Month Change Variance Factors
[Change averages: January - December 1996]

Civilian noninstitutional          Variance of composited        Variance
population, ages 16 and older      estimate of change1 (x 10^9)  factor2

Unemployed                                 25.124                  0.90
  White                                    16.800                  0.88
  Black                                     6.409                  0.94
  Hispanic origin3                          4.484                  0.90
  Teenage, 16-19                            5.322                  0.96

Employed - agriculture                      6.717                  0.77
  White                                     6.244                  0.77
  Black                                     0.249                  0.80
  Hispanic origin3                          1.796                  0.67
  Teenage, 16-19                            0.556                  0.82

Employed - nonagriculture                  57.684                  0.70
  White                                    44.718                  0.68
  Black                                    10.007                  0.76
  Hispanic origin3                          9.397                  0.70
  Teenage, 16-19                            8.228                  1.02

Civilian labor force                       52.997                  0.70
  White                                    39.834                  0.68
  Black                                     9.582                  0.87
  Hispanic origin3                          7.919                  0.70
  Teenage, 16-19                            9.782                  0.98

Not in labor force                         52.997                  0.70
  White                                    39.834                  0.68
  Black                                     9.582                  0.87
  Hispanic origin3                          7.919                  0.70
  Teenage, 16-19                            9.907                  0.94

1 The arithmetic mean of the 11 variance estimates.
2 Variance factor is the ratio of the variance of the composited estimate to the variance of the FSC estimate.
3 Hispanics may be of any race.

Appendix I. New Compositing Procedure

INTRODUCTION

Effective with the release of January 1998 data, BLS implemented a new composite estimation method for the Current Population Survey. The new technique provides increased operational simplicity for microdata users and allows optimization of compositing coefficients for different labor force categories.

COMPOSITE ESTIMATION IN THE CPS

As described in Chapter 10, eight panels, or rotation groups, approximately equal in size, make up each monthly CPS sample. Due to the 4-8-4 rotation pattern, six of these panels (three-quarters of the sample) continue in sample the following month, and one-half of the households in a given month's sample will be back in the sample for the same calendar month 1 year later. The sample overlap improves estimates of change over time. Through composite estimation, the positive correlation among CPS estimators for different months is increased. This increase in correlation improves the accuracy of monthly labor force estimates.
The CPS AK composite estimator for a labor force total (e.g., the number of persons unemployed) in month t is given by Y´ t ⫽ 共1⫺K兲Ŷt ⫹ K共Y´ t⫺1⫹䉭t兲 ⫹ Aˆ t In the CPS estimation procedure, described in Chapter 10, a separate weight for each person in the CPS sample is computed and ratio adjusted through a sequence of weighting steps. These adjustments are followed by a composite estimation step that improves the accuracy of current estimates by incorporating information gathered in previous months, taking advantage of the fact that 75 percent of sample households are common in each pair of consecutive months. Under the old procedure, composite estimation was performed at the macro level. The composite estimator for each tabulated cell was a function of aggregated weights for sample persons contributing to that cell in current and prior months. The different months of data were combined together using compositing coefficients. Thus, microdata users needed several months of CPS data to compute composite estimates. To ensure consistency, the same coefficients had to be used for all estimates. The values of the coefficients selected were much closer to optimal for unemployment than for employment or labor force totals. The new composite weighting method involves two steps: (1) the computation of composite estimates for the main labor force categories, classified by important demographic characteristics and (2) the adjustment of the microdata weights, through a series of ratio adjustments, to agree with these composite estimates, thus incorporating the effect of composite estimation into the microdata weights. Under this procedure, the sum of the composite weights of all sample persons in a particular labor force category equals the composite estimate of the level for that category. To produce a composite estimate for a particular month, a data user may simply access the microdata file for that month and compute a weighted sum. 
The new composite weighting approach also improves the accuracy of labor force estimates by using different compositing coefficients for different labor force categories. The weighting adjustment method assures additivity while allowing this variation in compositing coefficients. where: 8 Ŷt ⫽ 䉭t ⫽ 4 3 兺 xt,i i⫽1 共xt,i ⫺ xt⫺1,i⫺1兲 and 兺 i⑀s ˆ t⫽ 1 xt,i 兺 xt,i⫺3 兺 i⑀s i僆s i = 1,2,...,8 month in sample xt,i = sum of weights after second-stage ratio adjustment of respondents in month t, and month-in-sample i with characteristic of interest S = {2,3,4,6,7,8} sample continuing from previous month K = 0.4 for unemployed 0.7 for employed A = 0.3 for unemployed 0.4 for employed The values given above for the constant coefficients A and K are close to optimal–with respect to variance–for month-to-month change estimates of unemployment level and employment level. The coefficient K determines the weight, in the weighted average, of each of two estimators I– 2 for the current month: (1) the current month’s ratio estimator Ŷt and (2) the sum of the previous month’s composite estimator Y´ t⫺1 and an estimator 䉭t of the change since the previous month. The estimate of change is based on data from sample households in the six panels common to months t and t-1. The coefficient A determines the weight of ˆ t, an adjustment term that reduces both the variance of the composite estimator and the bias associated with time in sample. (See Breau and Ernst 1983, Bailar 1975.) Before January 1998, a single pair of values for K and A was used to produce all CPS composite estimates. Optimal values of the coefficients, however, depend on the correlation structure of the characteristic to be estimated. Research has shown, for example, higher values of K and A result in more reliable estimates for employment levels because the ratio estimators for employment are more strongly correlated across time than those for unemployment. 
The new composite weighting approach allows use of different compositing coefficients, thus improving the accuracy of labor force estimates, while ensuring the additivity of estimates. For a more detailed description of the selection of compositing parameters, see Lent et al. (1997).

COMPUTING COMPOSITE WEIGHTS

Composite weights are produced only for sample persons age 16 or older. As described in Chapter 10, the CPS estimation process begins with the computation of a ‘‘baseweight’’ for each adult in the survey. The baseweight, the inverse of the probability of selection, is adjusted for nonresponse, and two successive stages of ratio adjustments to population controls are applied. The second-stage raking procedure, performed independently for each of the eight sample rotation groups, ensures that sample weights add to independent population controls for states, as well as for age/sex/ethnicity groups and age/sex/race groups, specified at the national level.

The post-January 1998 method of computing composite weights for the CPS imitates the second-stage ratio adjustment. Sample person weights are raked to force their sums to equal control totals; composite labor force estimates are used as controls in place of independent population estimates. The composite raking process is performed separately within each of the three major labor force categories: employed, unemployed, and not in the labor force.

Adjustment of microdata weights to the composite estimates for each labor force category proceeds as follows. For simplicity, we describe the method for estimating the number of people unemployed (UE); analogous procedures are used to estimate the number of people employed and the number not in the labor force. Data from all eight rotation groups are combined for the purpose of computing composite weights.

1. For each state and the District of Columbia (51 cells) j, the direct (optimal) composite estimate of UE, comp(UEj), is computed as described above. Similarly, direct composite estimates of UE are computed for 9 national age/sex/ethnicity cells (8 Hispanic1 age/sex cells and 1 non-Hispanic cell) and 66 national age/sex/race cells (38 White age/sex cells, 24 Black age/sex cells, and 4 other age/sex cells). These computations use the cell definitions specified in the second-stage ratio adjustment (Tables 10-2 and 10-3), excluding cells containing only persons aged 15 or under. Coefficients K = 0.4 and A = 0.3 are used for all UE estimates in all categories (state, age/sex/ethnicity, and age/sex/race).

2. Sample records are classified by state. Within each state j, a simple estimate of UE, simp(UEj), is computed by adding the weights of all unemployed sample persons in the state.

3. Within each state j, the weight of each unemployed sample person in the state is multiplied by the ratio comp(UEj)/simp(UEj).

4. Sample records are cross-classified by age, sex, and ethnicity. Within each cross-classification cell, a simple estimate of UE is computed by adding the weights (as adjusted in step 3) of all unemployed sample persons in the cell.

5. Weights are adjusted within each age/sex/ethnicity cell in a manner analogous to step 3.

6. Steps 4 and 5 are repeated for age/sex/race cells.

7. Steps 2-6 are repeated five more times, for a total of six iterations.

An analogous procedure, using coefficients K = 0.7 and A = 0.4, is followed for estimating the number of people employed. For the number not in the labor force (NILF), the same raking steps are performed, but the controls are obtained as the residuals of the population controls and the direct composite estimates for employed (E) and unemployed (UE):

    NILF = Population − (E + UE)

During the computation of composite weights for persons who are unemployed, some further collapsing of cells is needed where cells contain insufficient sample.
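The repeated ratio adjustment in steps 2 through 7 is an iterative proportional fitting (raking) of person weights to the composite controls. A minimal sketch, assuming simplified records and control totals rather than the production CPS cell systems:

```python
# Hedged sketch of the composite-weight raking (steps 2-7): person weights
# are repeatedly ratio-adjusted so that cell sums match composite controls.
# The record layout and the two cell systems here are simplified stand-ins.

def rake(records, control_sets, n_iter=6):
    """records: dicts with a 'weight' key plus cell-identifying keys.
    control_sets: sequence of (cell_key, {cell_value: control_total})
    applied in turn within each of the n_iter iterations."""
    for _ in range(n_iter):
        for key, controls in control_sets:
            # Simple estimate for each cell: sum of current weights.
            sums = {}
            for r in records:
                sums[r[key]] = sums.get(r[key], 0.0) + r["weight"]
            # Ratio-adjust every record toward its cell's control total.
            for r in records:
                r["weight"] *= controls[r[key]] / sums[r[key]]
    return records

# Four hypothetical unemployed sample persons, raked to state controls
# and then to demographic controls:
people = [
    {"state": "A", "demo": "X", "weight": 1.0},
    {"state": "A", "demo": "Y", "weight": 1.0},
    {"state": "B", "demo": "X", "weight": 1.0},
    {"state": "B", "demo": "Y", "weight": 1.0},
]
rake(people, [("state", {"A": 3.0, "B": 1.0}),
              ("demo", {"X": 2.0, "Y": 2.0})])
```

After raking, the weights of the unemployed persons in each state cell sum to that cell's control, which is exactly the additivity property the composite weighting method is designed to deliver.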
Any such collapsing of cells for UE leads to a similar collapsing for NILF.

A NOTE ABOUT FINAL WEIGHTS

Prior to the introduction of composite weights, the weight after the second-stage ratio adjustment was called the final weight, as stated in Chapter 10. This is no longer true, since the weight after compositing is now the final weight. In addition, because data from all eight rotation groups are combined for the purpose of computing composite weights, summations of final weights within each panel do not match independent population controls. However, summations of final weights for the entire sample still match these independent population controls.

1Hispanics may be of any race.

REFERENCES

Bailar, B. (1975). ‘‘The Effects of Rotation Group Bias on Estimates From Panel Surveys.’’ Journal of the American Statistical Association, 70, pp. 23-30.

Breau, P., and L. Ernst (1983). ‘‘Alternative Estimators to the Current Composite Estimator.’’ Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 397-402.

Lent, J., S. Miller, P. Cantwell, and M. Duff (1999). ‘‘Effect of Composite Weights on Some Estimates From the Current Population Survey.’’ Journal of Official Statistics, to appear.

Appendix J. Changes to the Current Population Survey Sample in July 2001

INTRODUCTION

In 1999, Congress allocated $10 million annually to the Census Bureau to ‘‘make appropriate adjustments to the annual Current Population Survey . . . in order to produce statistically reliable annual state data on the number of low-income children who do not have health insurance coverage, so that real changes in the uninsured rates of children can reasonably be detected’’ (Public Law 106-113). The health insurance coverage estimates are derived from the Annual Demographic Supplement (ADS), which is discussed in Chapter 11. Briefly, the ADS is a series of questions asked each March in addition to the regular CPS questions.
Consequently, the ADS is also known as the ‘‘March Supplement.’’ To achieve the objectives of the new law, the Census Bureau added sample to the ADS in three different ways:

• General sample increase in selected states
• ‘‘Split-path’’ assignments
• Month-in-sample (MIS) 9 assignments

These changes are collectively known as the State Children’s Health Insurance Program (SCHIP) sample expansion. The procedures used to implement the SCHIP sample expansion were chosen in order to minimize the effect on the basic CPS. The general sample increase will be explained first. Later sections of this appendix provide details on the split-path and MIS 9 assignments.

GENERAL SAMPLE INCREASE IN SELECTED STATES

The first part of the SCHIP plan expanded the basic monthly CPS sample in selected states, using retired1 sample from CPS. Sample was identified using CPS sample designation, rotation group codes, and all four frames. Expanding the monthly CPS was necessary, rather than simply interviewing many more cases in March, because of the difficulty in managing a large spike in the sample size for a single month in terms of data quality and staffing.

A new sample designation code beginning with ‘‘B’’ was assigned to cases included in the general sample increase to distinguish SCHIP cases from regular CPS sample cases, which use an ‘‘A.’’ The rotation group code was not reassigned, in order to maintain the structure of the sample. In states targeted for the general sample increase, sample was added in all existing CPS sample areas.

The general sample increase differs from the other parts of the SCHIP expansion in two ways:

• It affects CPS every month of the year; and
• It is implemented differently for different states. More sample was added in some states, while other states received no general sample increase. For the other two parts of the SCHIP expansion, each state’s sample size was increased proportional to the amount of CPS sample already there.
1‘‘Retired sample’’ consists of households that finished their series of eight attempted CPS interviews.

Allocating the Sample to States (Round 1)

The estimates to be improved by the SCHIP sample expansion are 3-year averages of the percentage of low-income children without health insurance. Low-income children are defined, for SCHIP purposes, as those in households with incomes at or below 200 percent of the poverty level. To determine the states where added sample would be of the most benefit, the detectable difference of two consecutive 3-year averages was calculated (for example, comparing the 1998-2000 3-year average to the 1999-2001 3-year average). The detectable difference is the smallest percentage difference between the two estimates that is statistically significant at the 90-percent confidence level.

To reassign only portions of retired CPS sample, the natural units to use were reduction groups. The assignment of reduction group codes is explained in Appendix C. Using reduction group codes, the sample was divided into eight approximately equal parts called expansion groups. This was done to simplify the iteration process performed to reassign sample. Each expansion group contains 12 to 13 reduction groups. In most states, only some of the eight expansion groups were reassigned sample designations. Reassigning all eight expansion groups is roughly equivalent to doubling the sample size, meaning that all expired sample from the old CPS sample designations and rotation groups is brought back to be interviewed again.

The uninsured estimates used for allocating the sample can be found at http://www.census.gov/hhes/hlthins/liuc98.html.
To find the current detectable difference for each state, a variation of a formula found in the Source and Accuracy statement for the March Supplement (http://www.bls.census.gov/cps/ads/1998/ssrcacc.htm) was used:

    se_{x,p} = √[(b·f²/x) · p(100 − p)]     (1)

where:

    se_{x,p} = standard error of the proportion
    x = the total number of children meeting the low-income criteria
    p = the percentage of low-income children without health insurance
    b = a general variance function (GVF) parameter
    f² = state factor to adjust for differences in the CPS sample design between states

The same proportion (p) was used for all of the states (the 3-year national average from 1996 to 1998, 24.53 percent); this is consistent with CPS procedures to not favor one state over another based upon rates that may change.

Because the estimates are 3-year averages, some manipulation was necessary to find the standard error of the difference from one year to the next. To do so, the following formula was used:

    se(p234 − p123) = (1/3) √[se(p4)² + se(p1)²]     (2)

where p123 is the average estimate for years 1, 2, and 3, and p234 is the average for years 2, 3, and 4. Since the single-year standard error estimates were not available, the standard errors calculated using formula (1) were used for both standard errors beneath the radical.

The detectable difference was calculated by finding the coefficient of variation (by dividing by 24.53 percent), then multiplying by 1.645 to create a 90-percent confidence interval. The resulting detectable difference is proportional to the standard errors calculated.

Of the total annual funds allocated for this project, $7 million was budgeted for the general sample increase. This translated into approximately 12,000 additional assigned housing units each month.
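The detectable-difference calculation in formulas (1) and (2) can be sketched as follows. The GVF parameter b and state factor f² used here are illustrative stand-ins, not the published CPS values.

```python
# Hedged sketch of the detectable-difference calculation, formulas (1)-(2).
# The parameter values b and f2 passed in below are made up for illustration.
import math

P_NATIONAL = 24.53  # 3-year (1996-1998) national average, in percent

def se_proportion(x, p, b, f2):
    """Formula (1): GVF standard error of percentage p with base x."""
    return math.sqrt(b * f2 / x * p * (100.0 - p))

def detectable_difference(x, b, f2, p=P_NATIONAL):
    """Smallest difference between two overlapping 3-year averages that is
    significant at the 90-percent level, using formula (2) with the same
    standard error substituted for both se(p1) and se(p4)."""
    se_single = se_proportion(x, p, b, f2)
    se_diff = math.sqrt(2.0 * se_single ** 2) / 3.0   # formula (2)
    return 1.645 * se_diff / p * 100.0                # CV times 1.645, in percent

# A larger base of low-income children (larger x) shrinks the detectable
# difference in proportion to 1/sqrt(x):
dd_small = detectable_difference(50000, b=1000.0, f2=1.0)
dd_large = detectable_difference(200000, b=1000.0, f2=1.0)
```

Because the standard error scales as 1/√x, quadrupling the base x halves the detectable difference, which is the relationship the allocation formula in the next section exploits.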
The 12,000 additional housing units were allocated among states by applying the formula below over several iterations, with the goal of lowering each state’s detectable difference on the health insurance coverage rate for poor children:

    SS′ = SS × (DD/DD′)²     (3)

where:

    SS′ = new state sample size
    SS = old state sample size
    DD = old detectable difference for the state
    DD′ = new detectable difference for the state (the number being decreased with each iteration)

An increase of 12,000 housing units was reached when the state detectable differences were lowered to a maximum of 10.01 percent. Some states did not achieve this threshold, even after the sample size was doubled. For example, Connecticut started with a detectable difference of 14.51 percent. When lowering the desired detectable difference by iteration, the necessary sample size there doubled when the detectable difference reached 10.25 percent. No more sample was added there because of the decision to no more than double the existing sample in any state. The iterative process continued to lower the detectable difference (and raise the sample size) for the remainder of the states, however.

Alabama started with a detectable difference of 10.81 percent. After adding one expansion group, the detectable difference was 10.19 percent; after adding two, it was 9.67 percent. Since the limit was 10.01 percent, only two expansion groups were added for Alabama. Arizona started with a detectable difference of 8.34 percent; since that was already below the 10.01-percent limit, no sample was added there.

Round 2 of State Allocation

The proposed allocation of sample to the states was then sent to the Census Bureau’s Field Division for approval. Based upon their comments, the proposed sample increases were scaled back in Maryland, Virginia, Delaware, and the District of Columbia.
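Formula (3) reflects the usual inverse-square relationship between sampling error and sample size; a toy illustration with made-up numbers:

```python
# Toy illustration of formula (3): SS' = SS * (DD / DD')**2, which assumes
# the detectable difference shrinks with the square root of the sample size.
def new_sample_size(ss, dd_old, dd_new):
    """Sample size needed to move a state's detectable difference
    from dd_old down to the target dd_new."""
    return ss * (dd_old / dd_new) ** 2

# Halving the detectable difference requires four times the sample:
target = new_sample_size(1000.0, 12.0, 6.0)  # 4000.0
```

In the actual allocation, this relationship was applied iteratively, lowering the target DD′ step by step until the added sample summed to roughly 12,000 housing units, subject to the no-more-than-doubling rule.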
After setting limits on the amount of sample to be added to these areas, the iteration process was continued to get back to roughly 12,000 housing units added to sample. All states where four or more expansion groups had already been added were also removed from consideration; these states were prohibited from getting further sample increases because of the proportionally large increases they had already received. After these final iterations, an extra expansion group was assigned in Iowa, North Dakota, Nevada, Oklahoma, Oregon, Utah, West Virginia, and Wyoming. These were states where three or fewer expansion groups had been added in Round 1. Table J-1 shows the final allocation of the general sample increase to states.

Phase-in of the General Sample Increase

Successful implementation of the other two parts of the SCHIP expansion required that the general sample increase be fully implemented by November 2000. To provide enough time for listing and keying the new sample assignments, SCHIP interviews could not begin until September 2000 (for more details on these processes, see Chapter 4). It was advantageous to begin adding sample as soon as possible, to make the additions as gradual as possible for the field staff.

Table J-1. Size of SCHIP General Sample Increase and Effect on Detectable Difference by State

                          Original      After first iteration      After final iteration
                          detectable    Expansion    Detectable    Expansion    Detectable
                          difference    groups       difference    groups       difference
State                     (percent)     added        (percent)     added        (percent)
Alabama                   10.81         2            9.67          2            9.67
Alaska                    11.54         3            9.84          3            9.84
Arizona                   8.34          0            8.34          0            8.34
Arkansas                  8.87          0            8.87          0            8.87
California                4.00          0            4.00          0            4.00
Colorado                  12.68         5            9.95          5            9.95
Connecticut               14.51         8            10.26         8            10.26
Delaware                  12.59         5            9.88          3            10.74
District of Columbia      11.31         3            9.65          2            10.12
Florida                   5.93          0            5.93          0            5.93
Georgia                   8.88          0            8.88          0            8.88
Hawaii                    12.11         4            9.89          4            9.89
Idaho                     9.30          0            9.30          0            9.30
Illinois                  6.57          0            6.57          0            6.57
Indiana                   12.76         5            10.01         5            10.01
Iowa                      11.38         3            9.71          4            9.29
Kansas                    11.75         4            9.60          4            9.60
Kentucky                  10.79         2            9.65          2            9.65
Louisiana                 9.25          0            9.25          0            9.25
Maine                     13.64         7            9.96          7            9.96
Maryland                  14.65         8            10.36         5            11.50
Massachusetts             9.40          0            9.40          0            9.40
Michigan                  7.23          0            7.23          0            7.23
Minnesota                 11.82         4            9.65          4            9.65
Mississippi               8.92          0            8.92          0            8.92
Missouri                  11.84         4            9.67          4            9.67
Montana                   9.23          0            9.23          0            9.23
Nebraska                  11.31         3            9.64          3            9.64
Nevada                    11.71         3            9.98          4            9.56
New Hampshire             14.75         8            10.43         8            10.43
New Jersey                8.53          0            8.53          0            8.53
New Mexico                7.96          0            7.96          0            7.96
New York                  4.71          0            4.71          0            4.71
North Carolina            8.17          0            8.17          0            8.17
North Dakota              10.93         2            9.78          3            9.32
Ohio                      7.01          0            7.01          0            7.01
Oklahoma                  9.97          0            9.97          1            9.40
Oregon                    11.35         3            9.68          4            9.27
Pennsylvania              6.93          0            6.93          0            6.93
Rhode Island              14.93         8            10.56         8            10.56
South Carolina            11.19         3            9.55          3            9.55
South Dakota              11.28         3            9.62          3            9.62
Tennessee                 10.01         1            9.44          1            9.44
Texas                     4.78          0            4.78          0            4.78
Utah                      9.78          0            9.78          1            9.22
Vermont                   13.40         7            9.79          7            9.79
Virginia                  12.02         4            9.81          2            10.75
Washington                12.28         5            9.63          5            9.63
West Virginia             10.62         1            10.01         2            9.50
Wisconsin                 12.59         5            9.88          5            9.88
Wyoming                   10.44         1            9.84          2            9.33

Table J-2. Phase-in Table (First 11 Months of Sample Expansion)

Month        Old sample     New sample     Rotation group    Last month in        MIS upon return
             designation    designation    (old and new)     sample previously    to sample1
Sep. 2000    A63            B71            6                 Aug. 1995            5
             A65            B73            2                 Aug. 1996            1
Oct. 2000    A63            B71            7                 Sep. 1995            5
             A65            B73            1*                July 1996            3
             A65            B73            3                 Sep. 1996            1
Nov. 2000    A63            B71            5*                July 1995            8
             A63            B71            8                 Oct. 1995            5
             A65            B73            4                 Oct. 1996            1
Dec. 2000    A64            B72            1                 Nov. 1995            5
             A65            B73            5                 Nov. 1996            1
Jan. 2001    A64            B72            2                 Dec. 1995            5
             A65            B73            6                 Dec. 1996            1
Feb. 2001    A64            B72            3                 Jan. 1996            5
             A65            B73            7                 Jan. 1997            1
Mar. 2001    A64            B72            4                 Feb. 1996            5
             A65            B73            8                 Feb. 1997            1
Apr. 2001    A64            B72            5                 Mar. 1996            5
             A66            B74            1                 Mar. 1997            1
May 2001     A64            B72            6                 Apr. 1996            5
             A66            B74            2                 Apr. 1997            1
Jun. 2001    A64            B72            7                 May 1996             5
             A66            B74            3                 May 1997             1
Jul. 2001    A64            B72            8                 June 1996            5
             A66            B74            4                 June 1997            1
Aug. 2001    A66            B74            5                 July 1997            1

1New first month in sample.

Table J-2 shows the phase-in of the expanded sample. The sample designation/rotation groups marked with an asterisk (*) did not enter sample in MIS 1 or 5:2 B73(1) entered at MIS 3, and B71(5) entered at MIS 8. Once introduced, these cases followed the regular CPS rotation pattern. By November 2000, the phase-in of the general sample increase was completed. However, because of the 4-8-4 rotating design of the CPS sample, it was necessary to continue adding new SCHIP cases to both MIS 1 and MIS 5 assignments for the next 8 months; that is, for those 8 months, SCHIP sample was being interviewed for the first time in both MIS 1 and MIS 5, rather than just MIS 1. New SCHIP cases were added only to MIS 1 assignments after July 2001. A rotation chart for SCHIP follows (Figure J-1), for those who are more familiar with this method of visualizing survey sample. An explanation of rotation charts is found in Chapter 3.

2To maintain the known properties of the sample rotation pattern, it is desirable for CPS sample to begin interviewing in MIS 1. If sample cannot be ready by MIS 1 (for example, permit sample that does not finish processing), then it is deferred to MIS 5, thus getting four interviews in consecutive months. In this case, however, there was no choice but to add sample at other points in the CPS interview cycle, because of the short time frame available for phase-in.

Use of General Sample Increase in Official Labor Force Estimates

Although the original purpose of the SCHIP general sample increase focused on improving the reliability of
estimates of low-income uninsured children, the Bureau of Labor Statistics (BLS) saw the increase in the monthly CPS sample size as an opportunity to improve the reliability of labor force estimates, particularly at the state level. However, the added SCHIP sample initially was not used in the official CPS labor force estimates, for two reasons:

• To give Field Division time to train the new field representatives; and
• To analyze the labor force estimates both with and without the new sample, ensuring that no bias was introduced into the official time series.

In order to gauge the effect of the additional sample on estimates, CPS data were weighted both with and without the expanded sample beginning in November 2000, and the estimates calculated using the two sets of weights were compared. CPS-only weights for all SCHIP sample cases are 0, while combined CPS/SCHIP weights are the same for SCHIP sample and CPS sample cases within each Primary Sampling Unit (PSU). Weighting the CPS sample with the SCHIP sample included was done using this formula:

    BW_{CPS/SCHIP} = BW_{CPS} × RG_{CPS}/(RG_{CPS} + RG_{SCHIP})     (4)

where:

    BW_{CPS/SCHIP} = baseweight of combined CPS and SCHIP sample
    BW_{CPS} = baseweight of CPS sample alone
    RG_{SCHIP} = reduction groups added in the SCHIP expansion (as part of the appropriate expansion groups)
    RG_{CPS} = reduction groups currently in CPS sample (usually 101)

The updated baseweights (or sampling intervals) can be found in Table J-4, along with updated coefficients of variation. For more details on weighting, see Chapter 10.

Figure J-1. SCHIP Rotation Chart: September 2000 to December 2002

After analyzing several months of data, with and without the added SCHIP sample, BLS decided to include the SCHIP data in the official monthly CPS estimates, beginning with the July 2001 data release. The national unemployment rates were virtually identical when estimated from the sample with and without the additional SCHIP cases.
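Formula (4) simply scales each CPS baseweight down in proportion to the enlarged number of reduction groups, so that the bigger combined sample still represents the same population. A minimal sketch with illustrative counts (in practice RG_CPS is usually 101):

```python
# Minimal sketch of formula (4). The reduction-group counts used in the
# example are illustrative, not the values for any particular state.
def combined_baseweight(bw_cps, rg_cps, rg_schip):
    """Scale the CPS baseweight so that the enlarged CPS + SCHIP sample
    still sums to the same population total."""
    return bw_cps * rg_cps / (rg_cps + rg_schip)

# Adding 25 reduction groups to a 100-group sample cuts weights by 20 percent:
bw = combined_baseweight(100.0, 100, 25)  # 80.0
```

Setting SCHIP-only weights to zero for the CPS-only series, while scaling all weights this way for the combined series, is what allowed the two sets of estimates to be compared on the same population base.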
The SCHIP expansion did not result in systematically higher or lower labor force estimates across states, although there were a few statistically significant differences in some months in a few states. Results for a typical month are shown in Table J-3. The additional sample will slightly improve the reliability of national labor force estimates, and will more substantially improve the state estimates for states in which SCHIP sample was added (see U.S. Department of Labor, Employment & Earnings, August 2001, for more details).

Table J-4 shows the effect of the general sample increase on state sampling intervals and on the expected coefficients of variation (CVs) for the annual average estimated levels of unemployed, assuming a 6-percent unemployment rate. The CVs will be improved (made smaller) by the additional sample in some states. The first row of the table shows the effect on the monthly national CV on unemployment level, which is expected to drop from about 1.9 percent to 1.8 percent. For purposes of comparison, the state sampling intervals and CVs are included as estimated following a sample maintenance reduction that began in December 1999. (Details on how maintenance reductions are accomplished are found in Appendix C.) All Type A rates, design effects, estimated state populations, and civilian labor force levels used to estimate the new CVs are the same as those used for the maintenance reduction, so the differences in CVs shown result only from the changes in the sampling intervals due to the SCHIP general sample increase.

Summary of CPS/SCHIP Sample Design and Reliability Requirements

The current sample design, introduced in July 2001, includes about 72,000 assigned housing units from 754 sample areas (see Table J-5). Sufficient sample is allocated to maintain, at most, a 1.9 percent CV on national monthly estimates of unemployment level, assuming a 6-percent unemployment rate.
This translates into a change of 0.2 percentage point in the unemployment rate being significant at a 90-percent confidence level. For each of the 50 states and for the District of Columbia, the design maintains a CV of at most 8 percent on the annual average estimate of unemployment level, assuming a 6-percent unemployment rate. About 60,000 assigned housing units are required in order to meet the national and state reliability criteria. Due to the national reliability criterion, estimates for several large states are substantially more reliable than the state design criterion requires. Annual average unemployment estimates for California, Florida, New York, and Texas, for example, carry a CV of less than 4 percent.

Table J-3. Effect of SCHIP on National Labor Force Estimates: June 2001

Estimate/Population segment          CPS only    CPS/SCHIP combined    Difference

Labor force participation rate
  Total, 16 years and over           67.4        67.4                  0.0
  16 to 19 years                     58.1        58.5                  −0.4
  Men, 20 years and over             76.5        76.6                  −0.1
  Women, 20 years and over           60.5        60.4                  0.1
  White                              67.6        67.7                  −0.1
  Black                              66.2        66.1                  0.1
  Hispanic origin                    67.9        67.8                  0.1

Employment-population ratio
  Total, 16 years and over           64.2        64.2                  0.0
  16 to 19 years                     48.5        48.6                  −0.1
  Men, 20 years and over             73.6        73.7                  −0.1
  Women, 20 years and over           58.0        58.0                  0.0
  White                              64.8        64.9                  −0.1
  Black                              60.4        60.5                  −0.1
  Hispanic origin                    63.4        63.4                  0.0

Unemployment rate
  Total, 16 years and over           4.7         4.7                   0.0
  16 to 19 years                     16.6        16.8                  −0.2
  Men, 20 years and over             3.8         3.8                   0.0
  Women, 20 years and over           4.0         4.0                   0.0
  White                              4.1         4.1                   0.0
  Black                              8.7         8.6                   0.1
  Hispanic origin                    6.6         6.6                   0.0

Table J-4. Effect of SCHIP on State Sampling Intervals and Coefficients of Variation

                             Sampling intervals                          CVs of annual average level of
                                                                         unemployed (percent)
State                        Following      Following     With          Following     With
                             1996           1999          SCHIP         1999          SCHIP
                             reduction      maintenance   sample        maintenance   sample
                             (Appendix H)   reduction     included      reduction     included
Total                        2,255          2,366         2,128         1.9           1.8
Alabama                      2,298          2,443         1,955         7.2           6.5
Alaska                       336            336           244           7.4           6.5
Arizona                      2,238          2,238         2,238         7.0           7.0
Arkansas                     1,316          1,316         1,316         7.2           7.2
California                   2,745          3,018         3,018         2.9           2.9
  Los Angeles                1,779          1,919         1,919         4.3           4.3
  Balance                    3,132          3,438         3,438         3.7           3.7
Colorado                     1,992          2,163         1,331         7.0           5.8
Connecticut                  2,307          2,307         1,154         7.7           5.4
Delaware                     505            505           367           7.4           6.3
District of Columbia         356            356           285           8.0           7.1
Florida                      2,176          2,338         2,338         3.7           3.7
Georgia                      3,077          3,451         3,451         6.3           6.3
Hawaii                       769            769           513           7.8           6.4
Idaho                        590            608           608           6.8           6.8
Illinois                     2,227          2,227         2,227         4.2           4.2
Indiana                      3,132          3,132         1,927         7.0           5.7
Iowa                         1,582          1,582         1,055         7.2           6.2
Kansas                       1,423          1,452         968           7.3           6.2
Kentucky                     2,089          2,153         1,722         7.2           6.5
Louisiana                    2,143          2,186         2,186         7.0           7.0
Maine                        838            838           447           7.6           5.7
Maryland                     3,061          3,061         1,884         7.1           5.7
Massachusetts                1,853          1,927         1,927         5.0           5.0
Michigan                     2,098          2,181         2,181         4.5           4.5
Minnesota                    2,437          2,437         1,625         7.3           6.3
Mississippi                  1,433          1,508         1,508         7.3           7.3
Missouri                     3,132          3,132         2,088         7.1           5.9
Montana                      431            463           463           7.2           7.2
Nebraska                     910            910           662           7.2           6.3
Nevada                       961            1,067         711           7.0           5.8
New Hampshire                857            892           446           7.5           5.3
New Jersey                   1,841          1,898         1,898         4.4           4.4
New Mexico                   867            922           922           7.2           7.2
New York                     1,970          2,034         2,034         3.2           3.2
  New York City              1,774          1,951         1,951         5.2           5.2
  Balance                    2,093          2,093         2,093         4.3           4.3
North Carolina               1,986          2,145         2,145         5.6           5.6
North Dakota                 363            363           264           7.4           6.5
Ohio                         2,343          2,376         2,376         4.4           4.4
Oklahoma                     1,548          1,629         1,448         7.4           7.1
Oregon                       1,904          2,046         1,364         7.3           6.1
Pennsylvania                 2,149          2,149         2,149         4.1           4.1
Rhode Island                 687            687           344           7.8           5.5
South Carolina               2,291          2,385         1,735         7.4           6.4
South Dakota                 376            376           273           7.1           6.3
Tennessee                    3,016          3,140         2,791         7.0           6.6
Texas                        2,658          2,796         2,796         3.6           3.6
Utah                         958            1,063         945           7.1           6.8
Vermont                      410            436           232           7.3           5.4
Virginia                     3,084          3,268         2,619         6.8           6.2
Washington                   2,999          3,329         2,048         7.6           6.3
West Virginia                896            896           717           7.1           6.4
Wisconsin                    2,638          2,638         1,623         7.4           6.3
Wyoming                      253            253           202           7.3           6.7

Table J-5. CPS Sample Housing Unit Counts With and Without SCHIP

Housing unit categories          CPS only    CPS and SCHIP
Assigned housing units           60,000      72,000
  Eligible (occupied)            50,000      60,000
    Interviewed                  46,800      55,500
    Type A noninterviews         3,200       4,500

In support of the SCHIP, about 12,000 additional housing units are allocated to the District of Columbia and 31 states. These are generally the states with the smallest samples after the 60,000 housing units are allocated to satisfy the national and state reliability criteria.

In the first stage of sampling, the 754 sample areas are chosen. In the second stage, ultimate sampling unit clusters, composed of about four housing units each, are selected.3 Each month, about 72,000 housing units are assigned for data collection, of which about 60,000 are occupied and thus eligible for interview. The remainder are units found to be destroyed, vacant, converted to nonresidential use, to contain persons whose usual place of residence is elsewhere, or to be ineligible for other reasons. Of the 60,000 eligible housing units, about 7.5 percent are not interviewed in a given month due to temporary absence (such as a vacation), other failures to make contact after repeated attempts, inability of persons contacted to respond, unavailability for other reasons, and refusals to cooperate (about half of the noninterviews). Information is obtained each month for about 112,000 persons 16 years of age or older.

OTHER ASPECTS OF THE SCHIP EXPANSION

Although the general sample increase was the most difficult to implement of the three parts of the SCHIP expansion, the other two parts actually provide more sample for the March Supplement. The following paragraphs provide more details on these.

supplement for that month. Only households that would ordinarily not receive the March Supplement will receive it. This means that February households in MIS 4 and 8 will receive the March Supplement in February, and April MIS 1 and 5 households will receive it in April. Households from MIS 1 and 5 in the previous November will be screened at that time based upon whether they have children (18 or younger) or non-White members.
Those that do meet these criteria will receive the March CPS basic instrument and supplement in February, when they are MIS 4 and 8, respectively. Households with Hispanic residents will be excluded from this screening, since they will already be interviewed as part of the November Hispanic oversampling. The March supplement will take the place of the February supplement for these households, effectively lowering the February supplement’s sample size. In April, sample cases from MIS 1 and 5 will be screened in the regular CPS interview for presence of children or Hispanic or non-White inhabitants. Those who meet any of these criteria will be given the March supplement instead of the April supplement. See Chapter 11 for more details on CPS supplements. Month-in-Sample 9 Assignments CPS sample from MIS 6, 7, and 8 in November will be assigned for interview in February or April using the March CPS basic instrument and income supplement. Before interviewing, these households will be screened, using CPS data, so that each household with at least one child age 18 or younger or a non-White member will be in the sample. (Hispanics will already be interviewed as a result of the Hispanic oversampling of the March supplement; see Chapter 11.) To the extent possible, these cases will be assigned for interview through one of the telephone centers, but recycles will be interviewed in the field during the week of the appropriate month in which CPS interviews are conducted. Split-Path Assignments As part of the SCHIP expansion, some CPS sample from February and April will receive the March CPS Supplement. This will take the place of the regularly scheduled 3 In states with the SCHIP general sample increase, the size of these ultimate clusters can be between four and eight housing units. 
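The two screening rules just described can be written as simple predicates. The sketch below is illustrative only; the household fields (has_hispanic_member, has_nonwhite_member, youngest_age) are hypothetical stand-ins, not actual CPS variable names.

```python
# Illustrative sketch of the SCHIP split-path screening rules described
# above.  Field names are hypothetical, not actual CPS variables.

def february_split_path(household):
    """November MIS 1/5 screening for the February split path.

    Households with a child (18 or younger) or a non-White member receive
    the March supplement in February (when they are MIS 4/8).  Households
    with Hispanic residents are excluded here, since they are already
    covered by the November Hispanic oversampling.
    """
    if household["has_hispanic_member"]:
        return False
    return household["youngest_age"] <= 18 or household["has_nonwhite_member"]

def april_split_path(household):
    """April MIS 1/5 screening: children, Hispanic, or non-White members
    all qualify the household for the March supplement in April."""
    return (household["youngest_age"] <= 18
            or household["has_hispanic_member"]
            or household["has_nonwhite_member"])
```

Note the asymmetry: Hispanic households are excluded from the February path (already oversampled in November) but included in the April path.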
EFFECT OF SCHIP EXPANSION ON STANDARD ERRORS OF ESTIMATES OF UNINSURED LOW-INCOME CHILDREN

Table J-6 shows the effect of the SCHIP sample expansion on the standard errors of the state-level estimates of the proportions of low-income children without health insurance.

Table J-6. Standard Errors on Percentage of Uninsured, Low-Income Children Compared at Stages of the SCHIP Expansion1
[In percent]

State                   Before SCHIP   After general     After SCHIP
                        expansion      sample increase   expansion2
Alabama                     1.61            1.44             1.22
Alaska                      1.72            1.47             1.30
Arizona                     1.24            1.24             1.14
Arkansas                    1.32            1.32             1.17
California                  0.60            0.60             0.53
Colorado                    1.89            1.48             1.32
Connecticut                 2.16            1.53             1.34
Delaware                    1.88            1.60             1.41
District of Columbia        1.69            1.51             1.29
Florida                     0.88            0.88             0.80
Georgia                     1.32            1.32             1.12
Hawaii                      1.81            1.47             1.18
Idaho                       1.39            1.39             1.24
Illinois                    0.98            0.98             0.86
Indiana                     1.90            1.49             1.33
Iowa                        1.70            1.39             1.24
Kansas                      1.75            1.43             1.26
Kentucky                    1.61            1.44             1.28
Louisiana                   1.38            1.38             1.17
Maine                       2.03            1.49             1.36
Maryland                    2.19            1.71             1.48
Massachusetts               1.40            1.40             1.26
Michigan                    1.08            1.08             0.94
Minnesota                   1.76            1.44             1.25
Mississippi                 1.33            1.33             1.15
Missouri                    1.77            1.44             1.29
Montana                     1.38            1.38             1.23
Nebraska                    1.69            1.44             1.24
Nevada                      1.75            1.43             1.30
New Hampshire               2.20            1.56             1.37
New Jersey                  1.27            1.27             1.12
New Mexico                  1.19            1.19             1.09
New York                    0.70            0.70             0.62
North Carolina              1.22            1.22             1.07
North Dakota                1.63            1.39             1.22
Ohio                        1.05            1.05             0.91
Oklahoma                    1.49            1.40             1.23
Oregon                      1.69            1.38             1.24
Pennsylvania                1.03            1.03             0.92
Rhode Island                2.23            1.57             1.40
South Carolina              1.67            1.42             1.23
South Dakota                1.68            1.43             1.23
Tennessee                   1.49            1.41             1.25
Texas                       0.71            0.71             0.64
Utah                        1.46            1.38             1.20
Vermont                     2.00            1.46             1.34
Virginia                    1.79            1.60             1.39
Washington                  1.83            1.44             1.24
West Virginia               1.58            1.42             1.28
Wisconsin                   1.88            1.47             1.32
Wyoming                     1.56            1.39             1.24

1 Standard error on difference of two consecutive 3-year averages.
2 Includes MIS 9 and split-path cases.
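The declines in Table J-6 are roughly what elementary sampling theory predicts: for an estimated proportion, the standard error shrinks in proportion to 1/sqrt(n). The illustration below assumes a hypothetical uninsured rate and hypothetical sample sizes and treats the sample as a simple random sample; the actual CPS standard errors also reflect clustering, weighting, and the 3-year averaging.

```python
import math

def se_proportion(p, n):
    """Standard error of an estimated proportion from a simple random
    sample of size n.  A simplification: the CPS design is clustered and
    weighted, so its true standard errors differ from this formula."""
    return math.sqrt(p * (1.0 - p) / n)

p = 0.15                         # hypothetical uninsured rate
n_before, n_after = 900, 1900    # hypothetical sample sizes before/after expansion

ratio = se_proportion(p, n_after) / se_proportion(p, n_before)
print(round(ratio, 2))           # sqrt(900/1900), i.e. roughly a 31% reduction
```

Doubling a state's sample thus cuts its standard error by about 30 percent, consistent with the magnitude of the improvements shown in the table.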
These standard errors:

• Refer to the difference between two consecutive 3-year averages,
• Are calculated after 4 years of the SCHIP expansion (i.e., all 3 years of both estimates come from the expanded sample),
• Use the same rate of low-income children without health insurance for all states, and
• Are based on expected, not actual, sample sizes.

The values in the middle column (After General Sample Increase) are proportional to the final detectable differences found in Table J-1. The last column gives an estimated standard error for each state after the split-path and MIS 9 cases are taken into account.

REFERENCES

Cahoon, L. (2000), "CPS 1990 Design Reassignment #1 - September 2000 (SCHIP90-2, SAMADJ90-20)," Memorandum for J. Mackey Lewis, April 11, Demographic Statistical Methods Division, U.S. Census Bureau.

Rigby, B. (2000), "SCHIP CPS Sample Expansion used to Quantify Sample Expansion Among the States (SCHIP1)," Memorandum for Documentation, March 29, Demographic Statistical Methods Division, U.S. Census Bureau.

Shoemaker, H. (2000), "Effect of SCHIP/CPS General Sample Expansion on State Sampling Intervals and Coefficients of Variation (SAMADJ90-22, SCHIP90-5)," Memorandum for Distribution List, Demographic Statistical Methods Division, U.S. Census Bureau.

U.S. Department of Labor, Bureau of Labor Statistics, Employment and Earnings, Vol. 48, No. 8, Washington, DC: Government Printing Office.

U.S. Public Law 106-113.

Acronyms

ADS     Annual Demographic Supplement
API     Asian and Pacific Islander
ARMA    Autoregressive Moving Average
BLS     Bureau of Labor Statistics
BPC     Basic PSU Components
CAPE    Committee on Adjustment of Postcensal Estimates
CAPI    Computer Assisted Personal Interviewing
CATI    Computer Assisted Telephone Interviewing
CCM     Civilian U.S. Citizens Migration
CCO     CATI and CAPI Overlap
CES     Current Employment Statistics
CESEM   Current Employment Statistics Survey Employment
CNP     Civilian Noninstitutional Population
CPS     Current Population Survey
CPSEP   CPS Employment-to-Population Ratio
CPSRT   CPS Unemployment Rate
CV      Coefficient of Variation
DA      Demographic Analysis
DAFO    Deaths of the Armed Forces Overseas
DEFF    Design Effect
DOD     Department of Defense
DRAF    Deaths of the Resident Armed Forces
FOSDIC  Film Optical Sensing Device for Input to the Computer
FR      Field Representative
FSC     First- and Second-Stage Combined
FSCPE   Federal State Cooperative Program for Population Estimates
GQ      Group Quarters
GVF     Generalized Variance Function
HVS     Housing Vacancy Survey
I&O     Industry and Occupation
IM      International Migration
INS     Immigration and Naturalization Service
IRCA    Immigration Reform and Control Act
LAUS    Local Area Unemployment Statistics
MARS    Modified Age, Race, and Sex
MIS     Month-in-Sample
MLR     Major Labor Force Recode
MOS     Measure of Size
MSA     Metropolitan Statistical Area
NCEUS   National Commission on Employment and Unemployment Statistics
NCHS    National Center for Health Statistics
NHIS    National Health Interview Survey
NPC     National Processing Center
NRAF    Net Recruits to the Armed Forces
NRO     Net Recruits to the Armed Forces from Overseas
NSR     Nonself-Representing
OB      Number of Births
OD      Number of Deaths
OMB     Office of Management and Budget
ORR     Office of Refugee Resettlement
PAL     Permit Address List
PES     Post-Enumeration Survey
PIP     Performance Improvement Period
POP     Performance Opportunity Period
PSU     Primary Sampling Unit
QC      Quality Control
RDD     Random Digit Dialing
RE      Response Error
RO      Regional Office
SCHIP   State Children's Health Insurance Program
SECU    Standard Error Computation Unit
SFR     Senior Field Representative
SMSA    Standard Metropolitan Statistical Area
SOC     Survey of Construction
SR      Self-Representing
STARS   State Time Series Analysis and Review System
SW      Start-With
TE      Take-Every
UE      Number of People Unemployed
UI      Unemployment Insurance
URE     Usual Residence Elsewhere
USU     Ultimate Sampling Unit
WPA     Work Projects Administration

Index

A
Active duty military personnel D-4 Address identification 4-1–4-4 Address lists, See Permit address lists Adjusted population estimates definition D-1 ADS, See Annual Demographic Supplement Age base population by state D-18 deaths by D-15–D-16 institutional population D-18 international migration D-16–D-17 migration of Armed Forces and civilian citizens D-17 modification of census distributions D-14–D-15 population updates by D-10–D-14 Alaska addition to population estimates 2-2 American Indian Reservations 3-9 Annual Demographic Supplement 11-2–11-8, J-1–J-10 Area frame address identification 4-1–4-2 Area Segment Listing Sheets B-3, B-8 Area Segment Maps B-3, B-9–B-10 county maps B-9 description of 3-8–3-9 Group Quarters Listing Sheets B-3 listing in 4-5 Map Legends B-9, B-12 materials B-3, B-9 segment folders B-3 Segment Locator Maps B-9, B-11 Area Segment Maps B-3, B-9–B-10 ARMA, See Autoregressive moving average process Armed Forces members D-4, D-10, D-17, 11-5–11-6, 11-8 Autoregressive moving average process E-3–E-4 B Base population definition D-1 Baseweights (see also basic weights) I-2, 10-2 Basic PSU components 3-10–3-11 Basic weights (see also baseweights) H-4, 10-2 Behavior coding 6-3 Between-PSU variances 10-3 Births definition D-6 distribution by sex, race, and Hispanic origin D-15 BLS, See U.S. Bureau of Labor Statistics BPCs, See Basic PSU components BPO, See Building permit offices Building permit offices B-9, 3-9 Building Permit Survey 3-7, 3-9, 4-2 Building permits permit frame 3-9–3-10 sampling 3-2 Bureau of Labor Statistics, See U.S. Bureau of Labor Statistics Business presence 6-4 C California substate areas 3-1 CAPE, See Committee on Adjustment of Postcensal Estimates CAPI, See Computer Assisted Personal Interviewing CATI, See Computer Assisted Telephone Interviewing CATI recycles 4-7, 8-2 CCM, See Civilian U.S. 
citizens migration CCO, See Computer Assisted Telephone Interviewing and Computer Assisted Personal Interviewing Overlap Census base population D-6 Census Bureau, See U.S. Census Bureau Census Bureau report series 12-2 Census maps B-3, B-9 Census-level population definition D-2 CES, See Current Employment Statistics CESEM, See Current Employment Statistics survey employment Check-in of sample 15-4 Civilian housing 3-8 Civilian noninstitutional population D-11, 3-1, 5-3, 10-5 Civilian population definition D-2 migration by age, sex, race, and Hispanic origin D-17 Civilian U.S. citizens migration D-9 Class-of-worker classification 5-4 Cluster design changes in 2-3 Clustering, geographic 4-2, 4-4 Clustering algorithm 3-4 CNP, See Census base population; Civilian noninstitutional population Coding data 9-1 INDEX–2 Coefficients of variation calculating 3-1 changes in requirements H-1—H-2 changes resulting from SCHIP J-6–J-8 Collapsing cells 10-7, 10-12 College students state population adjustment D-20 Combined Reference File listing B-3, B-9 Committee on Adjustment of Postcensal Estimates D-20 Complete nonresponse 10-2 Components of population change definition D-2 Composite estimation introduction of 2-2 new procedure for I-1–I-2 Composite estimators 10-9–10-10 Computer Assisted Interviewing System 2-4–2-5, 4-6 Computer Assisted Personal Interviewing daily processing 9-1 description of 4-6–4-7, 6-2 effects on labor force characteristics 16-7–16-8 implementation of 2-5 nonsampling errors 16-3, 16-7–16-8 Computer Assisted Telephone Interviewing daily processing 9-1 description of 4-6–4-7, 6-2, 7-6, 7-9 effects on labor force characteristics 16-7–16-8 implementation of 2-4–2-5 nonsampling errors 16-3, 16-7–16-8 training of staff F-1–F-5 transmission of 8-1–8-3 Computer Assisted Telephone Interviewing and Computer Assisted Personal Interviewing Overlap 2-5 Computerization electronic calculation system 2-2 questionnaire development 2-5 Conditional probabilities A-1–A-4 
Consumer Income report 12-2 Control card information 5-1 Counties combining into PSUs 3-3 selection of 3-2 County maps B-9 Coverage errors controlling 15-2—15-4 coverage ratios 16-1—16-3 sources of 15-2 CPS, See Current Population Survey CPS control universe definition D-2 CPS employment-to-population ratio E-3 CPS unemployment rate E-3 CPSEP, See CPS employment-to-population ratio CPSRT, See CPS unemployment rate Current Employment Statistics E-1–E-2, 1-1, 10-11 Current Employment Statistics survey employment E-6 Current Population Survey data products from 12-1–12-4 decennial and survey boundaries F-2 design overview 3-1–3-2 history of revisions 2-1–2-5 overview 1-1 participation eligibility 1-1 reference person 1-1, 2-5 requirements H-1, 3-1 sample design H-1–H-9, 3-2–3-15 sample rotation 3-15–3-16 supplemental inquiries 11-1–11-8 CV, See Coefficients of variation D DA population, See Demographic analysis population DAFO, See Deaths of the Armed Forces overseas Data collection staff organization and training F-1–F-5 Data preparation 9-1–9-3 Data quality 13-1–13-3 Deaths defining D-6 distribution by age D-15–D-16 distribution by sex, race, and Hispanic origin D-15 Deaths of the Armed Forces overseas D-10 Deaths of the resident Armed Forces D-10 deff, See Design effects Demographic analysis population definition D-2 Demographic data for children 2-4 codes 9-2 edits 7-8, 9-2 interview information 7-7, 7-9 procedures for determining characteristics 2-4 questionnaire information 5-1–5-2 Denton Method 10-11 Department of Defense D-9, D-10 Dependent interviewing 7-5–7-6 Design effects 14-7–14-8 Direct-use states 10-11 Disabled persons questionnaire information 6-7 Discouraged workers definition 5-5–5-6, 6-1–6-2 questionnaire information 6-7–6-8 Document sensing procedures 2-1 DOD, See Department of Defense Dormitory adjustment D-20 DRAF, See Deaths of the resident Armed Forces Dual-system estimation D-20–D-22 Dwelling places 2-1 INDEX–3 E Earnings edits and codes 9-3 
information 5-4–5-5, 6-5 weights 10-12 Edits 9-2–9-3, 15-8–15-9 Educational attainment categories 5-2 Electronic calculation system 2-2 Emigration definition D-2 Employed persons definition 5-3 Employment and Earnings 12-1, 14-5 The Employment Situation 12-1 Employment statistics, See Current Employment Statistics Employment status, See also Unemployment full-time workers 5-4 layoffs 2-2, 5-5, 6-5–6-7 monthly estimates 10-10–10-11 not in the labor force 5-5–5-6 occupational classification additions 2-3 part-time workers 2-2, 5-3–5-4 seasonal adjustment E-5, 2-2, 10-10 state model-based labor force estimation E-1–E-6 Employment-population ratio definition 5-6 Employment-to-population ratio models 10-11 Enumerative Check Census 2-1 Estimates population definition D-2 Estimating variance, See Variance estimation Estimators E-1–E-6, 10-9–10-10, 13-1 Expected values 13-1 F Families definition 5-2 Family businesses presence of 6-4 Family equalization 11-5 Family weights 10-12 Federal State Cooperative Program for Population Estimates D-10, D-18 Field PSUs 3-13 Field representatives, See also Interviewers conducting interviews 7-1, 7-3–7-6 evaluating performance F-4–F-5 performance guidelines F-3 training procedures F-1–F-2 transmitting interview results 8-1–8-3 Field subsampling 3-14–3-15, 4-6 Film Optical Sensing Device for Input to the Computer 2-2–2-3 Final hit numbers 3-12 Final weight 10-9 First- and second-stage combined estimates 10-9, 14-5–14-7 First-stage weights 10-4, 11-4 FOSDIC, See Film Optical Sensing Device for Input to the Computer 4-8-4 rotation system 2-1–2-2, 3-15 Friedman-Rubin clustering algorithm 3-4 FRs, See Field representatives FSC, See First- and second-stage combined estimates FSCPE, See Federal State Cooperative Program for Population Estimates Full-time workers definition 5-4 G Generalized variance functions C-2, 14-3–14-5 Geographic clustering 4-2, 4-4 Geographic Profile of Employment and Unemployment 12-2 GQ, See Group quarters frame Group 
Quarters frame address identification 4-2 base population by state D-18 census maps B-9 Combined Reference File listing B-9 description of 3-7–3-9 Group Quarters Listing Sheets B-3, B-9, B-13, 4-5–4-6 Incomplete Address Locator Actions form B-9 listing in 4-5–4-6 materials B-9 segment folder B-9 GVFs, See Generalized variance functions H Hadamard matrix 14-2 Hawaii addition to population estimates 2-2 Hispanic origin distribution of births and deaths D-15 institutional population D-18 international migration D-16–D-17 migration of Armed Forces and civilian citizens mover and nonmover households 11-5–11-8 population updates by D-10–D-11 Hit strings 3-11–3-12 Homeowner vacancy rates 11-2 Horvitz-Thompson estimators 10-9 ‘‘Hot deck’’ allocations 9-2–9-3, 11-5 Hours of work definition 5-3 questionnaire information 6-4 Households base population by state D-18–D-19 definition 5-1–5-2 edits and codes 9-2 eligibility 7-1, 7-3 mover and nonmover households 11-5–11-8 questionnaire information 5-1–5-2 D-17 INDEX–4 Households—Con. 
respondents 5-1 rosters 5-1, 7-3, 7-6 weights 10-12 Housing units definition 3-7 frame omission 15-2 interview information 7-4–7-5, 7-9 measures 3-8 misclassification of 15-2 questionnaire information 5-1 selection of sample units 3-7 ultimate sampling units 3-2 vacancy rates 11-2–11-4 Housing Vacancy Survey 7-1, 11-2–11-4 HUs, See Housing units HVS, See Housing Vacancy Survey I IM, See International migration Immigration definition D-2 Immigration and Naturalization Service D-2, D-7, D-16 Immigration Reform and Control Act D-8, D-16 Imputation techniques 9-2–9-3, 15-8–15-9 Incomplete Address Locator Actions forms B-3, B-7, B-9 Independent population controls, See Population controls Industry and occupation data coding of 2-4, 8-2, 9-1, 9-3, 15-8 edits 9-3 questionnaire information 5-4, 6-4–6-5 verification 15-8 Inflation-deflation method D-11–D-14, 2-3 Initial adjustment factors 10-7 INS, See Immigration and Naturalization Service Institutional housing 3-8 Institutional population changes in D-10 definition D-2 population by age, sex, race, and Hispanic origin D-18 Internal consistency of a population definition D-2 International migration by age, sex, race, and Hispanic origin D-16–D-17 definition D-2 population projections D-7–D-9 Internet CPS reports 12-2 employment statistics 12-1 questionnaire revision 6-3 Interview week 7-1 Interviewers, See also Field representatives assignments 4-6–4-7 controlling response error 15-7 debriefings 6-3 interaction with respondents 15-7–15-8 Spanish-speaking 7-6 Interviews beginning questions 7-5, 7-7 dependent interviewing 7-5–7-6 ending questions 7-5, 7-7 household eligibility 7-1, 7-3 initial 7-3 noninterviews 7-1, 7-3, 7-4 reinterviews D-20–D-22, G-1–G-2, 15-8 results 7-10 telephone interviews 7-4–7-10 transmitting results 8-1–8-3 Introductory letters 7-1, 7-2 I&O, See Industry and occupation data IRCA, See Immigration Reform and Control Act Item nonresponse 9-2, 10-2, 16-5 Item response analysis 6-2–6-3 Iterative 
proportional fitting 10-5 J Job leavers 5-5 Job losers 5-5 Job seekers definition 5-5 duration of search 6-6–6-7 methods of search 6-6 Joint probabilities A-1–A-4 L Labor force adjustment for nonresponse 10-2–10-3 composite estimators 10-9–10-10 definition 5-6 edits and codes 9-2–9-3 effect of Type A noninterviews on classification 16-4–16-5 family weights 10-12 first-stage ratio adjustment 10-3–10-5 gross flow statistics 10-13–10-14 household weights 10-12 interview information 7-3 longitudinal weights 10-13–10-14 monthly estimates 10-10–10-11, 10-14 new composite estimation method I-1–I-2 outgoing rotation weights 10-12–10-13 questionnaire information 5-2–5-6, 6-2 ratio estimation 10-3 seasonal adjustment 10-10 second-stage ratio adjustment 10-5–10-9 state model-based estimation E-1–E-6 unbiased estimation procedure 10-1–10-2 veterans’ weights 10-13 Labor force participation rate definition 5-6 LABSTAT 12-1 Large special exceptions Unit/Permit Listing Sheets B-6 Layoffs 2-2, 5-5, 6-5–6-7 INDEX–5 Legal permanent residents definition D-2 Levitan Commission 2-4, 6-1, 6-7 Listing activities 4-4–4-6 Listing checks 15-4 Listing Sheets Area Segment B-3, B-8 Group Quarters B-3, B-9, B-13, 4-5–4-6 reviews 15-3–15-4 Unit/Permit B-1, B-3–B-6, B-9, B-14, 4-4–4-5 Longitudinal edits 9-2 Longitudinal weights 10-13–10-14 M Maintenance reductions C-1–C-2 Major Labor Force Recode 9-2–9-3 Map Legends B-9, B-12 March Supplement (see also Annual Demographic Supplement) 11-2–11-8 Marital status categories 5-2 MARS, See Modified age, race, and sex Maximum Overlap procedure A-1–A-4, 3-6 Mean squared error 13-1–13-2 Measure of size 3-8 Metropolitan Statistical Areas 3-3, 10-2–10-3 Microdata files 12-1–12-2, 12-4 Military housing 3-8, 3-9 MIS, See Month-in-sample MLR, See Major Labor Force Recode Modeling errors 15-9 Modified age, race, and sex application of coverage ratios D-21–D-22 definition D-2 Month-in-sample 7-1, 7-4–7-5, 11-4–11-5, 11-8, 16-7–16-9 Month-In-Sample 9 J-1, J-9–J-10 
Monthly Labor Review 12-1 MOS, See Measure of size Mover households 11-5–11-8 MSAs, See Metropolitan Statistical Areas Multiple jobholders definition 5-3 questionnaire information 6-2, 6-4 Multiunit structures Unit/Permit Listing Sheets B-5 Multiunits 4-4–4-5 N National Center for Education Statistics D-20 National Center for Health Statistics D-6 National Commission on Employment and Unemployment Statistics 2-4, 6-1, 6-7 National Health Interview Survey 3-10 National Park blocks 3-9 National Processing Center F-1 Natural increase of a population definition D-2 NCES, See National Center for Education Statistics NCEUS, See National Commission on Employment and Unemployment Statistics NCHS, See National Center for Health Statistics Net recruits to the Armed Forces D-10 Net recruits to the Armed Forces from overseas D-10 New entrants 5-5 New York substate areas 3-1 NHIS, See National Health Interview Survey Nondirect-use states 10-11 Noninstitutional housing 3-8 Noninterview adjustments 10-3, 11-4 clusters 10-2–10-3 factors 10-3 Noninterviews 7-1, 7-3–7-4, 9-1, 16-2–16-5 Nonmover households 11-5–11-8 Nonresponse H-4, 9-2, 10-2, 15-4–15-5 Nonresponse errors item nonresponse 16-5 mode of interview 16-6–16-7 proxy reporting 16-10 response variance 16-5–16-6 time in sample 16-7–16-10 Type A noninterviews 16-3–16-5 Nonsampling errors controlling response error 15-6–15-8 coverage errors 15-2–15-4, 16-1 definitions 13-2–13-3 miscellaneous errors 15-8–15-9 nonresponse errors 15-4–15-5, 16-3–16-5 overview 15-1–15-2 response errors 15-6 Nonself-representing primary sampling units 3-4–3-6, 10-4, 14-1–14-3 NPC, See National Processing Center NRAF, See Net recruits to the Armed Forces NRO, See Net recruits to the Armed Forces from overseas NSR PSUs, See Nonself-representing primary sampling units O OB, See Overseas births Occupational data coding of 2-4, 8-2, 9-1 questionnaire information 5-4, 6-4–6-5 OD, See Overseas deaths Office of Management and Budget D-2, 2-2 Office of 
Refugee Resettlement D-8 Old construction frames 3-8–3-9, 3-11–3-12 OMB, See Office of Management and Budget ORR, See Office of Refugee Resettlement Outgoing rotation weights 10-12–10-13 Overseas births D-9 Overseas deaths D-9 P PALs, See Permit Address Lists Part-time workers definition 5-4 economic reasons 5-3 monthly questions 2-2 noneconomic reasons 5-3–5-4 Performance Improvement Period F-5 Performance Opportunity Period F-5 Permit address lists forms B-9, B-15, B-17, 4-6 operation 4-2, 4-4, 4-6 Permit frames address identification 4-2, 4-4 description of 3-9–3-11 listing in 4-6 materials B-9, B-17 permit address list forms B-9, B-15, B-17 Permit Sketch Map B-16–B-17 segment folder B-9 Unit/Permit Listing Sheets B-9, B-14 Permit Sketch Maps B-16–B-17 Persons on layoff definition 5-5, 6-5–6-7 PES, See Post-Enumeration Survey PIP, See Performance Improvement Period POP, See Performance Opportunity Period Population base definition D-1 Population Characteristics report 12-2 Population controls adjustment for net underenumeration D-20 annual revision process D-22 dual-system estimation D-20–D-22 monthly revision process D-22 nonsampling error 15-9 population projection calculations D-4–D-20 the population universe D-3–D-4 procedural revisions D-22–D-23 sources D-23, 10-6 for states D-18–D-20 terminology D-1–D-3 Population data revisions 2-3 Population estimates definition D-2 Population projections calculation of D-4–D-20 definition D-2 total D-4–D-10 Population universe for CPS controls D-3–D-4 definition D-3 Post-Enumeration Survey D-1, D-2, D-20–D-22 Post-sampling codes 3-12–3-14 pps, See Probability proportional to size President’s Committee to Appraise Employment and Unemployment Statistics 2-3 Primary Sampling Units adding a county A-3 basic PSU components 3-10–3-11 changes in variance estimates H-5 definition 3-2–3-3 division of states 3-2 dropping a county A-3–A-4 dropping and adding A-4 field PSUs 3-13 increase in number of 2-1–2-4 Maximum Overlap 
procedure A-1–A-4, 3-6 noninterview clusters 10-2–10-3 nonself-representing 3-4–3-6, 10-4, 14-1–14-3 rules for defining 3-3 selection of 3-4–3-6 self-representing 3-3–3-5, 10-4, 14-1–14-3 stratification of 3-3–3-5 Probability proportional to size (see also Chapter 3) A-1 Probability samples 10-1–10-2 Projections, population calculation of D-4–D-20 definition D-2 total D-4–D-10 Proxy reporting 16-10 PSUs, See Primary Sampling Units Publications 12-1–12-4 Q QC, See Quality control Quality control reinterview program G-1–G-2, 15-8 Quality measures of statistical process 13-1–13-3 Questionnaires, See also Interviews controlling response error 15-6 demographic information 5-1–5-2 household information 5-1–5-2 labor force information 5-2–5-6 revision of 2-1, 6-1–6-8 structure of 5-1 R Race category modifications 2-1, 2-3–2-4 determination of 2-4 distribution of births and deaths D-15 institutional population D-18 international migration D-16–D-17 migration of Armed Forces and civilian citizens D-17 modification of census distributions D-14–D-15 population updates by D-10–D-11 racial categories 5-2 Raking ratio 10-5, 10-6–10-8 Random digit dialing sample 6-2 Random groups 3-12–3-13 Random starts 3-11 Ratio adjustments first-stage 10-3–10-5 second-stage 10-5–10-9 Ratio estimates revisions to 2-1–2-4 two-level, first-stage procedure 2-4 RDD, See Random digit dialing sample RE, See Response error Recycle Reports 8-2 Reduction groups C-1–C-2 Reduction plans C-1–C-2, H-1 Reentrants 5-5 Reference dates definition D-2 Reference persons 1-1, 2-5, 5-1–5-2 Reference week 5-3, 6-3–6-4, 7-1 Referral codes 15-8 Refusal rates 16-2–16-4 Regional offices operations of 4-7 organization and training of data collection staff F-1–F-5 transmitting interview results 8-1–8-3 Regression components E-1–E-2 Reinterviews D-20–D-22, G-1–G-2, 15-8 Relational imputation 9-2 Relationship information 5-2 Relative variances 14-4–14-5 Rental vacancy rates 11-2 Replicates definition 14-1 Replication 
methods 14-1–14-2 Report series 12-2 Residence cells 10-3 Resident population definition D-2–D-3 Respondents controlling response error 15-7 debriefings 6-3 interaction with interviewers 15-7–15-8 Response distribution analysis 6-2–6-3 Response error G-1, 6-1, 15-6–15-8 Response variance 13-2, 16-5–16-6 Retired persons questionnaire information 6-7 ROs, See Regional offices Rotation chart 3-15–3-16, J-4–J-5 Rotation groups 3-12, 10-1, 10-12 Rotation system 2-1–2-2, 3-2, 3-15–3-16 RS, See Random starts S Sample characteristics 3-1 Sample data revisions 2-3 Sample design changes due to funding reduction H-1–H-9 controlling coverage error 15-3 county selection 3-2–3-6 definition of the PSUs 3-2–3-3 field subsampling 3-14–3-15, 4-6 old construction frames 3-8–3-9 permit frame 3-9–3-10 post-sampling code assignment 3-12–3-14 quality measures 13-1–13-3 sample housing unit selection 3-7–3-14 sample unit selection 3-10 sampling frames 3-7–3-8, 3-10 sampling procedure 3-11–3-12 sampling sources 3-7 selection of the sample PSUs 3-4–3-6 stratification of the PSUs 3-3–3-5 variance estimates 14-5–14-6 within-PSU sample selection for old construction 3-12 within-PSU sort 3-11 Sample preparation address identification 4-1–4-4 area frame materials B-3, B-9 components of 4-1 controlling coverage error 15-3–15-4 goals of 4-1 group quarters frame materials B-9 interviewer assignments 4-6–4-7 listing activities 4-4–4-6 permit frame materials B-9, B-17 unit frame materials B-1–B-3, B-4–B-8 Sample redesign 2-5 Sample size determination of 3-1 maintenance reductions C-1–C-2 reduction in C-1–C-2, H-1, 2-3–2-5 Sample Survey of Unemployment 2-1 Sampling bias 13-1–13-2 error 13-1–13-2 frames 3-7–3-8 intervals H-3–H-4, 3-6, J-6–J-7 variance 13-1–13-2 SCHIP, See State Children’s Health Insurance Program School enrollment edits and codes 9-3 School enrollment data 2-4 Seasonal adjustment of employment E-5, 2-2, 10-10 Second jobs 5-4 Second-stage adjustment 11-4–11-8 Security 
transmitting interview results 8-1 SECUs, See Standard Error Computation Units Segment folders area frames B-3 group quarters frames B-9 permit frames B-9 unit frame B-1–B-2 Segment Locator Maps B-9, B-11 Segment number suffixes 3-13 Segment numbers 3-13 Selection method revision 2-1 Self-employed definition 5-4 Self-representing primary sampling units 3-3–3-5, 10-4, 14-1–14-3 Self-weighting samples 10-2 Senior field representatives F-1 Sex distribution of births and deaths D-15 institutional population D-18 international migration D-16–D-17 migration of Armed Forces and civilian citizens D-17 population updates by D-10–D-11 SFRs, See Senior field representatives Signal components E-1 Signal-plus-noise models E-1, 10-11 Simple response variance 16-5–16-6 Simple weighted estimators 10-9 Skeleton sampling 4-2 Sketch maps B-16–B-17 SMSAs, See Standard Metropolitan Statistical Areas SOC, See Survey of Construction Spanish-speaking interviewers 7-6 Special Studies report 12-2 Split-path J-8 SR PSUs, See Self-representing primary sampling units Staff organization and training F-1–F-5 Standard Error Computation Units H-5, 14-1–14-2 Standard Metropolitan Statistical Areas 2-3 State Children’s Health Insurance Program 1-1, 2-5, 3-1, 9-1, 11-4–11-8, E-1, H-1, J-1–J-10 STARS, See State Time Series Analysis and Review System Start-with procedure 3-14–3-15, 4-2 State sampling intervals 3-6 State supplementary samples 2-3–2-4 State Time Series Analysis and Review System E-4 State-based design 2-4 States model-based labor force estimation E-1–E-6 monthly employment and unemployment estimates 10-10–10-11 population controls for D-18–D-20 Statistical Policy Division 2-2 Statistics quality measures 13-1–13-3 Subfamilies definition 5-2 Subsampling 3-14–3-15, 4-6 Successive difference replication 14-2–14-3 Supplemental inquiries Annual Demographic Supplement 11-2–11-8 criteria for 11-1–11-2 Housing Vacancy Survey 11-2–11-4, 11-8 Survey, See Current Population Survey Survey 
instrument, See Questionnaires Survey of Construction B-9, 4-4 Survey week 2-2 SW, See Start-with procedure Systematic sampling 4-2–4-3 T Take-every procedure 3-14–3-15, 4-2 TE, See Take-every procedure Telephone interviews 7-4–7-10, 16-6–16-8 Temporary jobs 5-5 Time series component modeling E-1–E-4 Two-stage ratio estimators 10-9 Type A noninterviews 7-1, 7-3–7-4, 10-2, 16-3–16-5 Type B noninterviews 7-1, 7-3–7-4 Type C noninterviews 7-1, 7-3–7-4 U UI, See Unemployment insurance Ultimate sampling units area frames 3-8–3-9 description 3-2 hit strings 3-11–3-12 reduction groups C-1–C-2 selection of 3-7 special weighting adjustments 10-2 variance estimates 14-1–14-4 Unable to work persons questionnaire information 6-7 Unbiased estimation procedure 10-1–10-2 Unbiased estimators 13-1 Undocumented migration definition D-3 Unemployment classification of 2-2 definition 5-5 duration of 5-5 early estimates of 2-1 employment search information 2-3 household relationships and 2-3 monthly estimates 10-10–10-11 questionnaire information 6-5–6-7 reason for 5-5 seasonal adjustment E-5, 2-2, 10-10 state model-based labor force estimation E-1–E-6 variance on estimate of 3-6 Unemployment insurance E-1, E-6, 10-11 Unemployment rate definition 5-6 models 10-11 Union membership 2-4 Unit frame address identification 4-1 census maps B-3 Combined Reference File listing B-3 description of 3-8 Incomplete Address Locator Actions form B-3, B-7 listing in 4-4–4-5 materials B-1–B-8 segment folders B-1–B-2 Unit/Permit Listing Sheets B-1, B-3–B-6 Unit nonresponse 10-2, 16-3–16-5 Unit segments Unit/Permit Listing Sheets B-4 Unit/Permit Listing Sheets B-1, B-3–B-6, B-9, B-14, 4-4–4-5 Universe population definition D-3 URE, See Usual residence elsewhere U.S. Bureau of Labor Statistics E-1, 1-1, 2-2, 12-1–12-2 U.S. 
Census Bureau 1-1, 2-2, 2-5, 4-1, 12-2 Usual residence elsewhere 7-1 USUs, See Ultimate sampling units V Vacancy rates 11-2 Variance estimation changes in H-4–H-9 definitions 13-1–13-2 design effects 14-7–14-8 determining optimum survey design 14-5–14-6 generalizing 14-3–14-5 objectives of 14-1 replication methods 14-1–14-2 for state and local area estimates 14-3 successive difference replication 14-2–14-3 total variances as affected by estimation 14-6–14-7 Veterans data for females 2-4 Veterans’ weights 10-13 W Web sites CPS reports 12-2 employment statistics 12-1 questionnaire revision 6-3 Weighting samples I-2, 10-2–10-3, 11-3–11-4, 15-8–15-9 Work Projects Administration 2-1 Workers, See Employment status WPA, See Work Projects Administration X X-11 ARIMA program E-5, 10-10, 10-11