The Data Deluge: What Does It Mean for Official Statistics?

United Nations Economic Commission for Europe
Statistical Division
Steven Vale
UNECE
steven.vale@unece.org
Contents
• What is the Data Deluge?
• Response of the official statistics community
• Industrialisation – a related challenge
• Changing roles for national statistical organisations
What is the Data Deluge?
The internet has 1,800 exabytes of data in 2011 (exa = 10^18)
50,000 exabytes by 2020
27-fold growth in the next 9 years
We live in exponential times!
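The growth figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the two volumes quoted above; the compound annual rate is derived here under the assumption of smooth year-on-year growth and is not a figure from the presentation. The variable names are illustrative.

```python
# Volumes quoted on the slide, in exabytes (1 exabyte = 10**18 bytes).
data_2011_eb = 1_800
data_2020_eb = 50_000
years = 2020 - 2011

# Overall growth factor between the two years.
growth_factor = data_2020_eb / data_2011_eb

# Implied compound annual growth rate.
# Assumption: growth is smooth and compounding, which is a simplification.
annual_rate = growth_factor ** (1 / years) - 1

print(f"Growth factor over {years} years: {growth_factor:.1f}")
print(f"Implied compound annual growth: {annual_rate:.0%}")
```

The ratio comes out a little under 28, consistent with the rounded "27-fold" on the slide.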
Are these data interesting?
• Probably 99.9% are videos, photos, audio files, text messages and other nonsense
• But that still leaves 1,800,000,000,000,000,000 bytes of potentially relevant data
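The byte count on this slide follows from the 2011 volume and the 99.9% estimate. A minimal check, using integer arithmetic so that no rounding creeps in (the variable names are illustrative, and the 99.9% share is the slide's own rough guess rather than a measured value):

```python
EXABYTE = 10**18  # bytes

total_bytes = 1_800 * EXABYTE  # 2011 volume quoted earlier in the deck

# 100% - 99.9% = 0.1%, i.e. one part in a thousand is potentially relevant.
relevant_bytes = total_bytes // 1_000

print(f"{relevant_bytes:,} bytes of potentially relevant data")
```

In other words, the remaining 0.1% is 1.8 exabytes.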
An Observation
More and more people post information about themselves on Facebook, but fewer and fewer complete statistical surveys
Should we create STATVILLE?
Private sector competitors?
• Google:
  • Real-time price indices
  • Public Data Explorer
  • First point of reference for the “data generation”
• Facebook, store cards, credit agencies, ...
  • What if they link their data?
The private sector now understands the value of data:
Can they beat us at our own game?
Official Statistics Response
• High-Level Group for Strategic Directions in Business Architecture in Statistics
• UNECE group, created by the Conference of European Statisticians in 2010
• 8 heads of national and international statistical organisations
• Develop and promote new:
  • Sources
  • Processes
  • Products
HLG-BAS Strategic Vision
• Endorsed by the Conference of European Statisticians on 14 June
We have to re-invent our products and processes and adapt to a changed world
The challenges are too big for statistical organisations to tackle on their own. We need to work together
What does this mean in practice?
• Collaboration
• Coordination
• Communication
Many international groups and projects are talking about streamlining and industrialising statistics
Industrialisation is:
• Common processes
• Common tools
• Common methodologies
• Recognising that all statistics are produced in a similar way: no domain is “special”
• Increased flexibility to adapt to new sources and produce new outputs
Changing roles in NSOs?
• One source = one output
• Data integration – multiple sources
• Process quality assurance
• More focus on analysis and interpretation
• Partnerships for dissemination
• Changing staff and cost profiles
• Changing organisational culture
Wider Definition of Admin Data?
• New sources are “non-statistical”
• But - similar issues to “traditional” administrative data sources
Whatever we call the new sources, we can’t ignore them!
Which new sources are you going to check out?
Questions?
steven.vale@unece.org
www1.unece.org/stat/platform/display/hlgbas