How Much Has Quality Improved at National Statistical Institutes in the Last 25 Years?
David A. Marker
Westat
DavidMarker@Westat.com
Presented at the 30th Anniversary of the Journal of Official Statistics Conference
Stockholm, Sweden
June 12, 2015

Obvious Answer: Yes
Leadership Expert Group recommendations
Biennial European Quality Conferences (Stockholm to Vienna)
STATCAP
Some change in focus from product to process (Morganstein and Marker, 1997)

On The Other Hand
Quality is defined by our customers, and they are demanding more
Timeliness expectations constantly shortening
Increased use of administrative records; measuring their quality?
Each new Director General wants to do things “their way”
– Saebo (2014) pointed out a conflict with Deming’s constancy of purpose
Big, but not representative data available quickly on the web
– BLS
Quick, cheap, possibly non-representative on-line panel “surveys”
The relative importance of accuracy and timeliness seems to be shifting
– Are we responding?
– Preliminary estimates

Key Points
Changing relative importance of accuracy and timeliness
Support for Continuous Quality Improvement cannot be a passive statement
Move emphasis from measuring to improving quality
This requires re-focusing on process improvement
Increase efforts to use admin and other Big Data, but generally not stand-alone options to well done surveys

Quality Frameworks
Brackstone (1999): Relevance, Accuracy, Timeliness, Accessibility, Interpretability, Coherence
SCB (2001): Content, Accuracy, Timeliness, Availability & Clarity, Comparability & Coherence
OECD (2011): Relevance, Accuracy, Timeliness, Accessibility, Interpretability, Coherence, Credibility

OECD (2011)
Only difference from Brackstone: addition of credibility
Confidence by users is built over time
Objectivity of the data
Perceived to be produced professionally
– In accordance with appropriate statistical standards
– Policies and practices are transparent
Data are not manipulated, nor their release timed in response to political pressure

Quality Frameworks
Brackstone (1999): Relevance, Accuracy, Timeliness, Accessibility, Interpretability, Coherence
SCB (2001): Content, Accuracy, Timeliness, Availability & Clarity, Comparability & Coherence
OECD (2011): Relevance, Accuracy, Timeliness, Accessibility, Interpretability, Coherence, Credibility
ESS (2011) / UN (2012) / ONS (2013): Relevance, Accuracy & Reliability, Timeliness & Punctuality, Availability & Clarity, Coherence & Comparability

European Statistical System
Quality not a passive statement
Statistical authorities must “systematically and regularly identify strengths and weaknesses to continuously improve process and product quality”
Not only requires an organizational structure for managing quality, but also a focus on procedures to monitor process quality
“Results are analyzed regularly and senior management is informed in order to decide [on] improving actions”

Cost
One component missing from all these is Cost
Opportunity cost measured in staff hours
– There are many more projects that every NSI can undertake than they have resources for
– Freeing up resources will allow us to improve other aspects of the NSI
So cost should really be seen as a component of quality as well

Some NSIs are Considering Cost
ONS (2013): “Cost, performance and respondent burden: These are important process quality components that are not readily covered by the output quality dimensions. There are invariably trade-offs required between all of the output quality components and cost, performance and response burden”
OECD (2011) also mentions cost “which though is not strictly speaking, a quality dimension, is still an important consideration”

Measuring vs. Improving Quality
Difference between NSIs’ focus on measuring quality of their products and services, rather than continuous improvement of quality
The internal cost (in hours, kronor, or euros) to produce a database or analysis doesn’t affect the quality to the user, thus is independent of a user’s measure of its quality
But the internal cost is vital to efforts to continuously improve quality
A serious mistake we let happen; released some of the pressure that should have remained on quality improvement

How Do We Measure and Assure Progress?
Internal and External reviews
– Identify opportunities to improve
– Opportunity to share best practices
– Labor intensive
– Internal requires a culture accepting of criticism and willing to learn/change
A System for Product Improvement, Review, and Evaluation (ASPIRE)
– External reviews
– Formal structure
– Dependent on common external reviewers
“ASPIRE is a system for assessing the risks of error
– from each potential source of error…
– rating progress that has been made to reduce this [sic] risks
– according to clearly specified evaluation criteria” (Biemer et al., 2014)

ASPIRE
8 components of error:
– Sampling
– Frame
– Nonresponse
– Measurement
– Data processing
– Modeling/estimation
– Revision
– Specification

Revision Error
Revisions are an excellent way to improve timeliness
Addresses risk from Web-based Big Data
Revisions provide a measure of “truth”
Report the expected error of the preliminary number
– Captures non-sampling error sources as well as sampling
Estimated errors are not only more accurate than others, they can be a way to educate the press and public

ASPIRE Focus on Accuracy
“Biemer and Lyberg (2003) viewed accuracy as the dimension to be optimized in a survey while the other dimensions (the so-called user dimensions) can be treated as constraints during the design and implementation phases of production” Biemer et al. (2014)
Elvers (2014) strongly disagreed with this as a general approach.
So do I
Accuracy should be one of the components of quality being optimized, not the only component
Going back to Brackstone (1999), “Accuracy is important, but without attention to other dimensions of quality, accuracy alone will not satisfy users”
Lyberg (2012) “During the last decades it has become obvious that accuracy and relevance are necessary but not sufficient when assessing survey quality”

Estimating Risk
ASPIRE provides a formal, but subjective, measure of risk
ONS in early 2000s (Linacre 2002) used informal risk for prioritization to evaluate all major statistical sources
– World Class
– Sound fundamentals but not World Class
– Needed improvements for users, or
– Information recognized as faulty for key users
Identified sources for which “ONS reputation for quality seriously at risk”
– New Earnings Survey had World Class estimates by area and industry
– Estimates were subject to influence of outliers
– No major redesign in 30 years put ONS’ reputation at risk
This process helped ONS acquire major funding to overhaul its entire IT system to eliminate many of the major risks
Eltinge (2011) suggested Total Survey Risk as an alternative to Total Survey Error

ASPIRE - Risk
ASPIRE has 5 components of risk:
– Knowledge of risks
– Communication with users and data suppliers
– Available expertise
– Compliance with standards and best practices
– Achievement towards risk mitigation or improvement plans
Nice system for focusing efforts on key products and year-to-year improvement
“Measurement error had the highest average inherent risk of any error source. It also ranked near the bottom in percent mitigated risk…Sampling error ranked the highest in percent mitigated risk”
So measurement error had lots that could go wrong, but little had been done to address this, while lots had been done to reduce sampling error
Like this focus on sampling error, could ASPIRE’s focus on accuracy over the other 7 components of error be related to its frequency of study?
Is this like the drunk searching for his car keys under the lamppost?

Know Your Users
Quality is defined in terms of its use so it is vital to understand your users and their intent for your data
Lyberg (2012) lamented how little is known about NSIs’ data users
Costa et al. (2014, same issue of JOS!) report that Spanish users don’t rate the quality components equally
The relative weights vary across different types of users
– Accuracy and reliability was most important overall, but
– Users in central administration viewed it 3rd, after Coherence and comparability, and Timeliness and punctuality
Yet another example of why focusing on accuracy as first among equals can be dangerous

Know Your Users (continued)
Not only do users focus on different components, but different statistics
– Any particular user won’t care about all of the products of an NSI
– This complicates the process of understanding the user
– Producers of main statistical products should identify their primary users and ask what quality characteristics are most important to them
Government Statistics Division of the Census Bureau organized a review by a panel of the U.S. National Academy of Sciences, on which I served
– Division estimates 14% of US GDP
– Historically the only customer the Division focused on was the Bureau of Economic Analysis (BEA)
– Wide range of other users: state and local governments, academics, others
– Division now holds a series of meetings with other users to gain their input

Documentation
Clearly doing a better job of documenting strengths and weaknesses of products
Don’t congratulate yourselves too much: there was no World Wide Web in 1990, so almost by default we have better and more accessible documentation
One of the first attempts was the Quality Profile (Brooks and Bailar, 1978)
“About the Statistics” on the Statistics Norway website provides a nice summary of many of the important definitions, sources of error, etc.
However, Saebo (2014) points out that much of the information is dated, and the lack of complaints indicates most users probably don’t use the detail that is provided
Many other NSIs probably have similar systems. Deciding how much detail to provide, and how frequently to update, are areas we can all work on
The European Commission’s Eurostat website has a similar page, “Statistics Explained”: http://ec.europa.eu/eurostat/statistics-explained/index.php/Main_Page. This is very impressive looking, but I wonder if it suffers from the same concerns that Saebo pointed out?

Organizational Leadership and Process Improvement Focus
Jan Carling was Director General of SCB in the 1990s
– Renowned for asking staff what quality improvement processes they were working on
– Personal demonstration of a focus on improving quality understood throughout the organization
– Never seen a better demonstration of how top managers can truly lead quality improvement
– Are any of our current NSI managers demonstrating this personal level of commitment?
The European Leadership Expert Group (Lyberg et al., 2001) was incredibly successful
– Every Eurostat country bought in to the need to improve quality
– Biennial meetings were held to share ideas, starting here in Stockholm in 2001
– Standardized procedures and harmonized definitions across Europe

Current Best Methods (CBMs)
Some NSIs have developed Current Best Methods (CBMs)
– Conditions will change in the future, so they need regular updating
– This is difficult; we haven’t kept this going as well as we should at Westat either
– Training new staff in well-developed CBMs can produce major quality improvements; at the very least by minimizing the re-occurrence of errors we previously produced
– Lyberg (2012) pointed out how useful CBMs can be
– Described in more detail in Morganstein and Marker (1997)
I believe the most important CBM we developed at Westat is on communication between analysts (statisticians) and programming staff
– Clearly states inputs and outputs
– Encourages building in quality checks
– Provides vital reference documentation at the end of a survey to explain what was actually done at earlier stages
– This CBM was originally developed almost 20 years ago, but I have introduced it to new parts of Westat just this year, when questions about how best to communicate were raised

Current Best Methods (continued)
Lack of recent publication of examples may indicate that the focus on process improvement has waned over the years
Morganstein and Marker (1997) laid out how one moves from
– A focus on product characteristics specified (hopefully) by users
– To the key processes impacting their quality
– How one measures whether those processes are under control
– Only then determine if they are capable of producing the outputs needed by the NSI
I urge a re-focus on continuously improving quality

Big Data Changes the Role of NSIs
Larry Brown (UPenn) at the 2015 Hansen Lecture suggested there might be an agricultural analogy to Big Data
– For the last 80 years we carefully planted the rows of crops, then knew what we harvested
– Now the crops are planted by others; there is a huge amount
– We have to harvest their crops and choose the best
– Like the distillers at Johnnie Walker, we have to produce a smooth, satisfying (unbiased) blend

Big Data Challenge to NSIs
Many politicians and scientists say that we can rely on Big Data to answer statistical questions
– Tracking the next flu epidemic
– Understanding environmental exposures by measuring blood levels (metabolomics) and genetics (genomics)
– Without careful weeding and blending the inferences will be useless and statistical reputations will be ruined
We must make efforts to use administrative and other big data, but they cannot be viewed as stand-alone options to well done surveys

Big Data Challenge to NSIs (continued)
One of the biggest impacts of Big Data has been to change the expectation on timeliness of data
Transactional data not as good on the non-timeliness components of quality, but fits the demands of a 24-hour news mentality
We need to figure out how to do surveys (and Censuses) quicker and for lower cost, or users will rely on Big Data without understanding what they are losing. Groves (20??)

Adaptive Design
One part of addressing this concern is adaptive design (add ref)
Basically the same idea as monitoring process variables that we argued for in Morganstein and Marker (1997)
With real-time monitoring of how the data collection is progressing NSIs can react quickly to improve the quality
– Response rates for certain key domains lower than expected, while higher in others, R-indicators (Schouten, Cobben, and Bethlehem, 2009)
– Can follow-up resources (e.g. interviewer hours) be moved to get enough responses for all domains?
– If these decisions are made sooner can we shorten data collection?
– If we run initial data sets through editing/imputation programs
  • Identify problems that can be addressed before later respondents
  • Shorten the time period between the end of data collection and publishing findings

Adaptive Design (continued)
To do this requires having sufficient paradata, on key steps and completely filled out
It requires flexibility in staffing, interviewers changing assignments, and cross-training data processing staff
None of these are the “traditional” model
Management will have to break down these barriers to provide the system needed to take advantage of adaptive design from measuring process variables. Only by doing this can we continuously improve.

Developing World
One clear area for increased efforts to improve quality is in developing nations
– 25 years ago almost exclusively focused in the developed world
– SRC Summer Program brought individuals to Ann Arbor for training, then sent them home
Now: STATCAP, Southern Africa Young Statisticians, UN Handbook on Household Surveys, Reference Regional Strategic Framework for Statistical Capacity Building in Africa (Sanga, Dosso, and Gui-Diby, 2011)
STATCAP is a World Bank program of grants and loans over $100 million
– Over $50 million just for Indonesia
– Goal is to change the structure of the NSIs to support higher quality work, training, IT, organization, skills, communication
We still haven’t figured out how to keep newly trained statisticians in their country when higher pay and opportunities can be found in developed nations.
But STATCAP has the potential to help, because it supports large international programs to help the entire economy

Conclusions and Recommendations
Quality of NSIs is improving throughout the world
Changing relative importance of accuracy and timeliness
Support for Continuous Quality Improvement cannot be a passive statement
Move emphasis from measuring to improving quality
This requires re-focusing on process improvement
Increase efforts to use admin and other Big Data, but generally not stand-alone options to well done surveys
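
Illustration: monitoring an R-indicator during data collection
The Adaptive Design slides mention R-indicators (Schouten, Cobben, and Bethlehem, 2009) as a process variable to watch while fieldwork is still running. The indicator is R = 1 - 2 S(rho), where S(rho) is the standard deviation of the response propensities; R = 1 means every domain responds at the same rate. The sketch below is only a minimal illustration under stated assumptions: the domain names, sample sizes, and response rates are hypothetical, observed domain response rates stand in for modeled propensities, and a simple size-weighted standard deviation replaces the design-weighted estimator used in practice.

```python
import math


def r_indicator(propensities, weights=None):
    """Return R = 1 - 2 * S(rho) for a list of response propensities.

    S(rho) is computed here as a weighted population standard deviation
    (an assumption made for simplicity; production estimators are
    design-weighted and based on modeled propensities).
    """
    if weights is None:
        weights = [1.0] * len(propensities)
    total = sum(weights)
    mean = sum(w * p for w, p in zip(weights, propensities)) / total
    variance = sum(w * (p - mean) ** 2 for w, p in zip(weights, propensities)) / total
    return 1.0 - 2.0 * math.sqrt(variance)


# Hypothetical mid-fieldwork snapshot: domain -> (sample size, response rate so far)
snapshot = {
    "urban": (400, 0.70),
    "rural": (300, 0.55),
    "young adults": (300, 0.40),
}

sizes = [n for n, _ in snapshot.values()]
rates = [rate for _, rate in snapshot.values()]

current_r = r_indicator(rates, sizes)
lagging = min(snapshot, key=lambda d: snapshot[d][1])  # domain with the lowest rate

print(f"R-indicator so far: {current_r:.3f}")
print(f"Domain to consider for extra follow-up hours: {lagging}")
```

Recomputing the indicator after each batch of returns gives a single number to track alongside the domain response rates, which is the kind of real-time signal that could inform moving interviewer hours between domains.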