How Much Has Quality Improved at National
Statistical Institutes in the Last 25 Years?
David A. Marker
Westat
DavidMarker@Westat.com
Presented at the 30th Anniversary of the Journal of Official Statistics Conference
Stockholm, Sweden
June 12, 2015
Obvious Answer: Yes
 Leadership Expert Group recommendations
 Biennial European Quality Conferences (Stockholm to
Vienna)
 STATCAP
 Some change in focus from product to process
(Morganstein and Marker, 1997)
2
On The Other Hand
 Quality is defined by our customers, and they are demanding more
 Timeliness expectations constantly shortening
 Increased use of administrative records; measuring their quality?
 Each new Director General wants to do things “their way”
– Saebo (2014) pointed out a conflict with Deming’s constancy of purpose
 Big, but not representative data available quickly on the web – BLS
 Quick, cheap, possibly non-representative on-line panel “surveys”
 The relative importance of accuracy and timeliness seems to be
shifting
– Are we responding?
– Preliminary estimates
3
Key Points
 Changing relative importance of accuracy and timeliness
 Support for Continuous Quality Improvement cannot be
a passive statement
 Move emphasis from measuring to improving quality
 This requires re-focusing on process improvement
 Increase efforts to use admin and other Big Data, but these are
generally not stand-alone alternatives to well-done surveys
4
Quality Frameworks
 Brackstone (1999): Relevance; Accuracy; Timeliness; Accessibility; Interpretability; Coherence
 SCB (2001): Content; Accuracy; Timeliness; Availability & Clarity; Comparability & Coherence
 OECD (2011): Relevance; Accuracy; Timeliness; Accessibility; Interpretability; Coherence; Credibility
5
OECD (2011)
 Only difference from Brackstone - addition of credibility:
 Confidence by users is built over time
 Objectivity of the data
 Perceived to be produced professionally
– In accordance with appropriate statistical standards
– Policies and practices are transparent
 Data are not manipulated, nor their release timed in
response to political pressure
6
Quality Frameworks
 Brackstone (1999): Relevance; Accuracy; Timeliness; Accessibility; Interpretability; Coherence
 SCB (2001): Content; Accuracy; Timeliness; Availability & Clarity; Comparability & Coherence
 OECD (2011): Relevance; Accuracy; Timeliness; Accessibility; Interpretability; Coherence; Credibility
 ESS (2011), UN (2012), ONS (2013): Relevance; Accuracy & Reliability; Timeliness & Punctuality; Availability & Clarity; Coherence & Comparability
7
European Statistical System
 Quality not a passive statement
 Statistical authorities must “systematically and regularly
identify strengths and weaknesses to continuously
improve process and product quality”
 Not only requires an organizational structure for
managing quality, but also a focus on procedures to
monitor process quality
 “Results are analyzed regularly and senior management
is informed in order to decide [on] improving actions”
8
Cost
 One component missing from all these is Cost
 Opportunity cost measured in staff hours
– There are many more projects that every NSI can
undertake than they have resources for
– Freeing up resources will allow us to improve other
aspects of the NSI
 So cost should really be seen as a component of quality
as well
9
Some NSIs are Considering Cost
 ONS (2013): “Cost, performance and respondent
burden: These are important process quality
components that are not readily covered by the output
quality dimensions. There are invariably trade-offs
required between all of the output quality components
and cost, performance and response burden”
 OECD (2011) also mentions cost “which though is not
strictly speaking, a quality dimension, is still an important
consideration”
10
Measuring vs. Improving Quality
 NSIs focus on measuring the quality of their products and
services, rather than on continuous improvement of quality
 The internal cost (in hours, kronor, or euros) to produce a
database or analysis doesn’t affect the quality to the user,
and is thus independent of a user’s measure of its quality
 But the internal cost is vital to efforts to continuously improve
quality
 We let a serious mistake happen: we released some of the
pressure that should have remained on quality improvement
11
How Do We Measure and Assure Progress?
 Internal and External reviews
– Identify opportunities to improve
– Opportunity to share best practices
– Labor intensive
– Internal requires a culture accepting of criticism and willing to learn/change
 A System for Product Improvement, Review, and Evaluation (ASPIRE)
– External reviews
– Formal structure
– Dependent on common external reviewers
 “ASPIRE is a system for assessing the risks of error
– from each potential source of error…
– rating progress that has been made to reduce this [sic] risks
– according to clearly specified evaluation criteria” (Biemer et al., 2014)
12
ASPIRE
 8 components of error:
– Sampling
– Frame
– Nonresponse
– Measurement
– Data processing
– Modeling/estimation
– Revision
– Specification
13
Revision Error
 Revisions are an excellent way to improve timeliness
 Addresses risk from Web-based Big Data
 Revisions provide a measure of “truth”
 Report the expected error of the preliminary number
– Captures non-sampling error sources as well as sampling
 Revision-based error estimates are not only more accurate than other
error estimates, they can also be a way to educate the press and public
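One way to report the expected error of a preliminary number is to summarize how far past preliminary estimates ended up from their revised values. The following is a minimal sketch under that assumption; the `history` figures and the function name `revision_summary` are invented for illustration and do not come from any NSI.

```python
from statistics import mean

# Hypothetical (preliminary, final) estimates for one series.
# The values are invented purely for illustration.
history = [
    (2.1, 2.4),
    (1.8, 1.7),
    (2.5, 2.9),
    (3.0, 2.8),
    (2.2, 2.5),
]


def revision_summary(pairs):
    """Summarize past revisions, defined here as final minus preliminary."""
    revisions = [final - prelim for prelim, final in pairs]
    return {
        # Average signed revision: shows systematic under- or over-statement.
        "mean_revision": mean(revisions),
        # Average absolute revision: the typical size of a revision.
        "mean_abs_revision": mean(abs(r) for r in revisions),
        # Root mean squared revision: penalizes large revisions more heavily.
        "rmse": mean(r * r for r in revisions) ** 0.5,
    }


summary = revision_summary(history)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Because the revised figure reflects late responses, corrected processing, and other changes, a summary like this picks up non-sampling as well as sampling error, which is the point made on the slide.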
14
ASPIRE Focus on Accuracy
 “Biemer and Lyberg (2003) viewed accuracy as the dimension to be
optimized in a survey while the other dimensions (the so-called user
dimensions) can be treated as constraints during the design and
implementation phases of production” (Biemer et al., 2014)
 Elvers (2014) strongly disagreed with this as a general approach. So do I.
 Accuracy should be one of the components of quality being optimized, not
the only component
 Going back to Brackstone (1999), “Accuracy is important, but without
attention to other dimensions of quality, accuracy alone will not satisfy
users”
 Lyberg (2012) “During the last decades it has become obvious that
accuracy and relevance are necessary but not sufficient when assessing
survey quality”
15
Estimating Risk
 ASPIRE provides a formal, but subjective, measure of risk
 ONS in early 2000s (Linacre 2002) used informal risk for prioritization to evaluate all
major statistical sources
– World Class
– Sound fundamentals but not World Class
– Needed improvements for users, or
– Information recognized as faulty for key users
 Identified sources for which “ONS reputation for quality seriously at risk”
– New Earnings Survey had World Class estimates by area and industry
– Estimates were subject to influence of outliers
– No major redesign in 30 years put ONS’ reputation at risk
 This process helped ONS acquire major funding to overhaul its entire IT system to
eliminate many of the major risks
 Eltinge (2011) suggested Total Survey Risk as an alternative to Total Survey Error
16
ASPIRE - Risk
 ASPIRE has 5 components of risk:
– Knowledge of risks
– Communication with users and data suppliers
– Available expertise
– Compliance with standards and best practices
– Achievement towards risk mitigation or improvement plans
 Nice system for focusing efforts on key products and year-to-year improvement
 “Measurement error had the highest average inherent risk of any error source. It also
ranked near the bottom in percent mitigated risk…Sampling error ranked the highest
in percent mitigated risk”
 So measurement error had lots that could go wrong, but little had been done to
address this, while lots had been done to reduce sampling error
 Like this focus on sampling error, could ASPIRE’s focus on accuracy over the other 7
components of error be related to its frequency of study?
 Is this like the drunk searching for his car keys under the lamppost?
17
Know Your Users
 Quality is defined in terms of its use so it is vital to understand your
users and their intent for your data
 Lyberg (2012) lamented how little is known about NSIs’ data users
 Costa et al. (2014, same issue of JOS!) report that Spanish users
don’t rate the quality components equally
 The relative weights vary across different types of users
– Accuracy and reliability was most important overall, but
– Users in central administration viewed it 3rd, after Coherence and
comparability, and Timeliness and punctuality
 Yet another example of why focusing on accuracy as first among
equals can be dangerous
18
Know Your Users (continued)
 Not only do users focus on different components, but also on different statistics
– Any particular user won’t care about all of the products of an NSI
– This complicates the process of understanding the user
– Producers of main statistical products should identify their primary users
and ask what quality characteristics are most important to them
 Government Statistics Division of the Census Bureau organized a
review by a panel of the U.S. National Academy of Sciences, on which I
served
– Division estimates 14% of US GDP
– Historically the only customer the Division focused on was the Bureau of
Economic Analysis (BEA)
– Wide range of other users: state and local governments, academics, others
– Division now holds a series of meetings with other users to gain their input
19
Documentation
 Clearly doing a better job of documenting strengths and weaknesses of products
 Don’t congratulate yourselves too much: there was no World Wide Web in 1990, so
almost by default we have better and more accessible documentation
 One of the first attempts was the Quality Profile (Brooks and Bailar, 1978)
 “About the Statistics” on the Statistics Norway website provides a nice summary of many
of the important definitions, sources of error, etc.
 However, Saebo (2014) points out that much of the information is dated, and the lack
of complaints indicates most users probably don’t use the detail that is provided
 Many other NSIs probably have similar systems. Deciding how much detail to
provide, and how frequently to update, are areas we can all work on
 The Eurostat (European Commission) website has a similar page “Statistics
Explained” http://ec.europa.eu/eurostat/statistics-explained/index.php/Main_Page.
This is very impressive looking, but I wonder if it suffers from the same concerns that
Saebo pointed out?
20
Organizational Leadership and Process
Improvement Focus
 Jan Carling was Director General of SCB in the 1990s
– Renowned for asking staff what quality improvement processes they were
working on
– Personal demonstration of a focus on improving quality understood throughout
the organization
– Never seen a better demonstration of how top managers can truly lead quality
improvement
– Are any of our current NSI managers demonstrating this personal level of
commitment?
 The European Leadership Expert Group (Lyberg et al., 2001) was incredibly
successful
– Every Eurostat country bought into the need to improve quality
– Biennial meetings were held to share ideas, starting here in Stockholm in 2001
– Standardized procedures and harmonized definitions across Europe
21
Current Best Methods (CBMs)
 Some NSIs have developed Current Best Methods (CBMs)
– Conditions will change in the future, so they need regular updating
– This is difficult; we haven’t kept this going as well as we should at Westat either
– Training new staff in well-developed CBMs can produce major quality
improvements; at the very least by minimizing the re-occurrence of errors we
previously produced
– Lyberg (2012) pointed out how useful CBMs can be
– Described in more detail in Morganstein and Marker (1997)
 I believe the most important CBM we developed at Westat is on
communication between analysts (statisticians) and programming staff
– Clearly states inputs and outputs
– Encourages building in quality checks
– Provides vital reference documentation at the end of a survey to explain what
was actually done at earlier stages
– This CBM was originally developed almost 20 years ago, but I have introduced it
to new parts of Westat just this year, when questions about how best to
communicate were raised
22
Current Best Methods (continued)
 Lack of recent publication of examples may indicate that the
focus on process improvement has waned over the years
 Morganstein and Marker (1997) laid out how one moves from
– A focus on product characteristics specified (hopefully) by users
– To the key processes impacting their quality
– How one measures whether those processes are under control
– Only then determine if they are capable of producing the outputs
needed by the NSI
 I urge a re-focus on continuously improving quality
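Checking whether a process is under control is commonly done with a Shewhart control chart. The sketch below uses a p-chart with three-sigma limits as one illustrative choice; it is not necessarily the procedure described in Morganstein and Marker (1997). The process variable (records failing an edit check per batch), the counts, and the function names are all hypothetical.

```python
from math import sqrt


def p_chart_limits(defects, batch_size):
    """Return (center line, lower limit, upper limit) for a p-chart.

    Assumes every batch contains the same number of records.
    """
    p_bar = sum(defects) / (len(defects) * batch_size)
    sigma = sqrt(p_bar * (1 - p_bar) / batch_size)
    lower = max(0.0, p_bar - 3 * sigma)  # a proportion cannot fall below 0
    upper = min(1.0, p_bar + 3 * sigma)  # or rise above 1
    return p_bar, lower, upper


def out_of_control(defects, batch_size):
    """Indices of batches whose failure rate falls outside the limits."""
    _, lower, upper = p_chart_limits(defects, batch_size)
    return [
        i for i, d in enumerate(defects)
        if not lower <= d / batch_size <= upper
    ]


# Hypothetical counts of records failing an edit check, 200 records per batch.
edit_failures = [12, 9, 11, 10, 30, 8, 10, 10]
batch_size = 200

flagged = out_of_control(edit_failures, batch_size)
print("Batches to investigate:", flagged)
```

Only once the flagged batches have been explained and the process is stable does it make sense to ask whether the process is capable of meeting the requirement, which mirrors the ordering in the list above.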
23
Big Data Changes the Role of NSIs
 Larry Brown (UPenn) at the 2015 Hansen Lecture suggested
there might be an agricultural analogy to Big Data
– For the last 80 years we carefully planted the rows of crops, then
knew what we harvested
– Now the crops are planted by others; there is a huge amount
– We have to harvest their crops and choose the best
– Like the distillers at Johnnie Walker, we have to produce a
smooth, satisfying (unbiased) blend
24
Big Data Challenge to NSIs
 Many politicians and scientists say that we can rely on Big
Data to answer statistical questions
– Tracking the next flu epidemic
– Understanding environmental exposures by measuring
blood levels (metabolomics) and genetics (genomics)
– Without careful weeding and blending the inferences will
be useless and statistical reputations will be ruined
 We must make efforts to use administrative and other big
data, but they cannot be viewed as stand-alone options
to well done surveys
25
Big Data Challenge to NSIs (continued)
 One of the biggest impacts of Big Data has been to
change the expectation on timeliness of data
 Transactional data are not as good on the non-timeliness
components of quality, but fit the demands of a 24-hour
news mentality
 We need to figure out how to do surveys (and Censuses)
quicker and for lower cost, or users will rely on Big Data
without understanding what they are losing. Groves
(20??)
26
Adaptive Design
 One part of addressing this concern is adaptive design (add ref)
 Basically the same idea as monitoring process variables that we
argued for in Morganstein and Marker (1997)
 With real-time monitoring of how the data collection is progressing
NSIs can react quickly to improve the quality
– Response rates for certain key domains lower than expected, while
higher in others, R-indicators (Schouten, Cobben, and Bethlehem,
2009)
– Can follow-up resources (e.g. interviewer hours) be moved to get
enough responses for all domains?
– If these decisions are made sooner can we shorten data collection?
– If we run initial data sets through editing/imputation programs
• Identify problems that can be addressed before later respondents
• Shorten the time period between the end of data collection and publishing
findings
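The monitoring described above can be sketched as a mid-collection check of response rates by domain together with an R-indicator, R = 1 - 2·S(ρ), where S(ρ) is the standard deviation of response propensities (Schouten, Cobben, and Bethlehem, 2009). This is a simplified sketch: it takes each unit's propensity to be its domain's current response rate and ignores design weights. The domain names, counts, and target rate are hypothetical.

```python
from math import sqrt

# Hypothetical mid-collection counts per domain: (sampled, responded so far).
progress = {
    "urban": (400, 260),
    "rural": (300, 150),
    "remote": (100, 30),
}


def response_rates(counts):
    """Current response rate for each domain."""
    return {d: resp / n for d, (n, resp) in counts.items()}


def r_indicator(counts):
    """R = 1 - 2 * S(rho), with domain response rates used as propensities.

    Simplification: unweighted, and the standard deviation is taken over
    all sampled units using the population (divide-by-n) form.
    """
    total = sum(n for n, _ in counts.values())
    overall = sum(resp for _, resp in counts.values()) / total
    rates = response_rates(counts)
    variance = sum(
        n * (rates[d] - overall) ** 2 for d, (n, _) in counts.items()
    ) / total
    return 1 - 2 * sqrt(variance)


def lagging_domains(counts, target):
    """Domains whose current response rate is below the target rate."""
    return sorted(d for d, r in response_rates(counts).items() if r < target)


print("R-indicator:", round(r_indicator(progress), 3))
print("Needs follow-up:", lagging_domains(progress, target=0.5))
```

An R-indicator near 1 means response is balanced across domains; a falling value, or a non-empty list of lagging domains, is the kind of signal that could prompt moving interviewer hours while collection is still under way.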
27
Adaptive Design (continued)
 Doing this requires sufficient paradata, covering the key
steps and completely filled out
 It requires flexibility in staffing, interviewers changing
assignments, and cross-training data processing staff
 None of these is part of the “traditional” model
 Management will have to break down these barriers to
provide the system needed to take advantage of
adaptive design from measuring process variables. Only
by doing this can we continuously improve.
28
Developing World
 One clear area for increased efforts to improve quality is in developing nations
– 25 years ago these efforts were almost exclusively focused on the developed world
– SRC Summer Program brought individuals to Ann Arbor for training, then sent them
home
 Now: STATCAP, Southern Africa Young Statisticians, UN Handbook on
Household Surveys, Reference Regional Strategic Framework for Statistical
Capacity Building in Africa (Sanga, Dosso, and Gui-Diby, 2011)
 STATCAP is a World Bank program of grants and loans totaling over $100 million
– Over $50 million just for Indonesia
– Goal is to change the structure of the NSIs to support higher quality work, training, IT,
organization, skills, communication
 We still haven’t figured out how to keep newly trained statisticians in their
country when higher pay and opportunities can be found in developed nations.
But STATCAP has the potential to help, because it supports large international
programs to help the entire economy
29
Conclusions and Recommendations
 Quality of NSIs is improving throughout the world
 Changing relative importance of accuracy and timeliness
 Support for Continuous Quality Improvement cannot be
a passive statement
 Move emphasis from measuring to improving quality
 This requires re-focusing on process improvement
 Increase efforts to use admin and other Big Data, but these are
generally not stand-alone alternatives to well-done surveys
30