Big Data for Official Statistics*

Herman Smith

UNSD

10th Meeting of the Advisory Expert Group on National Accounts

13-15 April 2016, Paris

* Prepared by Ronald Jansen, UNSD

Drivers

• Availability of automatically generated data in electronic format, such as mobile phone records, social media, electronic commercial transactions, sensor networks, smart meters, GPS tracking devices, or satellite images
• Higher frequency, more granularity, wider coverage, lower cost for data collection
• Modernisation of statistical production and services

Key messages

Big Data for core national statistics – for integrated economic, social and environmental policies

Big Data for agile statistics – for emergency issues

Big Data to keep official statistics relevant – private sector moves fast

Big Data as part of modernization of statistical systems – new production processes and partnerships

Big Data to meet the data demand of the 2030 agenda – monitoring policies – “leave no one behind”

Big Data for Official Statistics

Benefits – Example of social media data

• Widespread use of social media, also in developing countries
• Timely, high frequency and wide coverage
• Great potential in tracking sentiments, such as consumer confidence
• Potential use for tracking prices and outbreaks of diseases, and useful in combination with other data, such as population census and geo-spatial data

Examples of Big Data projects

Example 1:

Telenor Big Data project on Poverty prediction (SDG 1)

Among the major mobile operators in the world

Approaching 200 million mobile subscriptions (e.g. in Bangladesh, India, Pakistan, Myanmar and Thailand)

33 000 employees

Present in markets with 1.6 billion people

A team of 9 Data scientists

Collaboration partners at leading academic research institutions

Bridge between academic research and all business units

Explore and develop new ways to utilize customer data across markets

Billions of data points collected each day

• A number: Caller
• B number: Receiving party
• Date & time
• Type: Call, SMS, Data, etc.
• Cell_ID: Location
• Data volume
• IMSI: SIM card
• TAC: Handset
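The fields listed on this slide can be read as one record per network event. The following is a minimal sketch, not Telenor's actual schema: the class name, field names, types and the sample values are all assumptions made for illustration. It shows how such records might be rolled up into simple per-SIM usage indicators of the kind used later as mobile phone features.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallDetailRecord:
    """One network event; field names are illustrative, not an operator schema."""
    a_number: str        # caller (A number)
    b_number: str        # receiving party (B number), empty for data sessions
    timestamp: datetime  # date & time of the event
    event_type: str      # "call", "sms", "data", ...
    cell_id: str         # serving cell, used as a proxy for location
    data_volume: int     # bytes transferred (0 for calls and SMS)
    imsi: str            # identifies the SIM card
    tac: str             # type allocation code, identifies the handset model


def usage_features(records):
    """Aggregate raw events into simple per-SIM usage indicators."""
    events = defaultdict(int)
    contacts = defaultdict(set)
    cells = defaultdict(set)
    volume = defaultdict(int)
    for r in records:
        events[r.imsi] += 1
        if r.b_number:  # data sessions have no receiving party
            contacts[r.imsi].add(r.b_number)
        cells[r.imsi].add(r.cell_id)
        volume[r.imsi] += r.data_volume
    return {
        imsi: {
            "events": events[imsi],
            "distinct_contacts": len(contacts[imsi]),
            "distinct_cells": len(cells[imsi]),
            "data_volume": volume[imsi],
        }
        for imsi in events
    }


# Made-up sample events, for illustration only.
SAMPLE = [
    CallDetailRecord("A1", "B1", datetime(2016, 4, 13, 9, 0), "call", "cell-001", 0, "SIM-1", "TAC-1"),
    CallDetailRecord("A1", "B2", datetime(2016, 4, 13, 9, 5), "sms", "cell-002", 0, "SIM-1", "TAC-1"),
    CallDetailRecord("A1", "", datetime(2016, 4, 13, 9, 9), "data", "cell-002", 2048, "SIM-1", "TAC-1"),
    CallDetailRecord("A2", "B1", datetime(2016, 4, 13, 10, 0), "call", "cell-001", 0, "SIM-2", "TAC-2"),
]

features = usage_features(SAMPLE)
```

Counting distinct contacts and distinct cells gives crude stand-ins for the "Social Network" and "Mobility" feature groups; a production pipeline would work on anonymised identifiers and far richer aggregates.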

Introducing mobile phone data in Poverty prediction

Survey data

• Telco surveys

• DHS

• PPI

Mobile phone data

• Basic phone usage

• Advanced phone usage

• Social Network

• Mobility

• Top-up

• Revenue

• Handset

PREDICTION

Satellite layers

Population

Aridity index

Evapotranspiration

Various animal densities

Night time lights

Elevation

Vegetation

Distance to roads/waterways

Urban/Rural

Land cover

Pregnancy data

Births

Ethnicity

Precipitation

Annual temperature

Global human settlement layer

• # poor per km²

• Prediction maps

Introducing mobile phone data in Poverty prediction

Poverty Prediction map

Methods

1. Spatial prediction
• Bayesian geostatistical modelling
• Prediction maps

2. Individual classification using machine learning methods
• RF (random forest)
• GBM (gradient boosting machine)
• SVM (support vector machine)
• Deep learning
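To make the second method concrete, here is a minimal sketch of individual classification with one of the listed model families, a linear SVM, trained by batch subgradient descent on the regularised hinge loss. It is not the project's implementation: the two features (imagined as standardised top-up amount and handset price), the labels (+1 above, -1 below a poverty threshold) and all data values are made up for illustration, and real work would rely on an established machine learning library and survey-based labels.

```python
def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Fit weights w and bias b by batch subgradient descent on the hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point lies inside the margin or is misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Synthetic, linearly separable toy data (hypothetical standardised features).
X_TRAIN = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (-1.0, -1.0), (-2.0, -1.5), (-1.5, -2.0)]
Y_TRAIN = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X_TRAIN, Y_TRAIN)
train_predictions = [predict(w, b, x) for x in X_TRAIN]
```

The same train-then-predict shape applies to the other listed model families; only the fitting step changes.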

Example 2:

National Statistical Office of Tunisia

Big Data project on Good Governance (SDG 16)

October 2015

BIG DATA and Monitoring SDG 16 in Tunisia?

SOCIAL MEDIA as a BIG data source

Kamel ABDELLAOUI, Dissemination Directorate, INS-Tunisie

Eduardo López-Mancisidor, United Nations Development Programme, Tunisia

Analyzing Social media for SDG 16: Why?

Could social media data provide similar or new insights on public opinion to potentially complement or substitute household survey data?

Social media, WHY?

• Free, public, easy access
• No privacy issues
• Express opinion

[Figure: Internet users in Tunisia (in thousands), 2000 to 2014, vertical axis from 0 to 6 000; annotated "Opinions in here"]

Analyzing Social media for SDG 16: How?

• Selecting sources
• Taxonomy of keywords
• Categorising
• Training
• Exploring
• Analysing
• Comparing
• Exporting
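The "taxonomy of keywords" and "categorising" steps can be illustrated with a small sketch. It assumes simple keyword matching; the taxonomy, the category names and the sample posts below are hypothetical and are not taken from the Tunisian project, which would also need Arabic and French keywords and a trained classifier for sentiment.

```python
import re
from collections import Counter

# Hypothetical taxonomy: category -> keywords (not the project's actual list).
TAXONOMY = {
    "corruption": {"bribe", "corruption", "fraud"},
    "justice": {"court", "judge", "trial"},
    "security": {"police", "crime", "violence"},
}


def categorise(post, taxonomy=TAXONOMY):
    """Return the sorted list of categories whose keywords occur in the post."""
    tokens = set(re.findall(r"\w+", post.lower()))
    return sorted(cat for cat, words in taxonomy.items() if tokens & words)


def volume_by_category(posts, taxonomy=TAXONOMY):
    """Count how many posts mention each category (a post may count in several)."""
    counts = Counter()
    for post in posts:
        counts.update(categorise(post, taxonomy))
    return counts


# Made-up posts, for illustration only.
SAMPLE_POSTS = [
    "The judge postponed the trial again",
    "Police asked for a bribe at the checkpoint",
    "Lovely weather in Tunis today",
]

volumes = volume_by_category(SAMPLE_POSTS)
```

Per-category counts of this kind correspond to the "Volume" output; posts matching no keyword are simply left out of every category.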

Analyzing Social media for SDG 16: Outputs

Volume

Data Sources

Sentiment

Word Cluster

Word Cloud

Example 3:

Statistics Canada linking Google Maps with the Statistical Business Register (SDG 9)

What can be gained from linking the SBR with Geo-spatial Information?

Cross-sectional views of enterprise characteristics by (sub-national) regions:

• Are there regional patterns of economic activity?
• Are larger enterprises equally spread over the country?
• Is FDI equally spread over the country?

Statistics Canada – Geolocation of SBR data

To study the potential of conducting economic analysis of small geographic areas by using Business Register (BR) microdata

Using BR data geocoded at the census subdivision (CSD) level, in combination with travel distance data generated from the Google Maps API

The identification of resource sectors is based on the aggregation of business data at the CSD level from the BR

A database was created, containing:

• BR employment data, derived from payroll deduction files
• BR revenue data, derived from the General Index of Financial Information
• The six-digit North American Industry Classification System (NAICS) code from the Business Register
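The aggregation step described above can be sketched as follows. This is an illustration under stated assumptions, not Statistics Canada's method: the record layout, the CSD identifiers and the employment figures are made up, and treating NAICS sectors 11 and 21 (the first two digits of the six-digit code) as "resource sectors" is an assumption made here for the example.

```python
from collections import defaultdict

# Assumption for this sketch: two-digit NAICS sectors counted as resource sectors.
RESOURCE_SECTORS = {"11", "21"}


def resource_share_by_csd(records):
    """Share of employment in resource sectors for each census subdivision.

    Each record is a tuple (csd_id, six_digit_naics, employment).
    """
    total = defaultdict(int)
    resource = defaultdict(int)
    for csd, naics, employment in records:
        total[csd] += employment
        if naics[:2] in RESOURCE_SECTORS:  # sector = first two digits of the code
            resource[csd] += employment
    return {csd: resource[csd] / total[csd] for csd in total if total[csd] > 0}


# Made-up business register extract, for illustration only.
SAMPLE_BR = [
    ("CSD-A", "212220", 120),
    ("CSD-A", "445110", 80),
    ("CSD-B", "722511", 50),
    ("CSD-B", "113310", 50),
    ("CSD-C", "541110", 30),
]

shares = resource_share_by_csd(SAMPLE_BR)
```

A threshold on this share could then flag resource-dependent communities; the travel distance data would enter as a separate lookup keyed by pairs of CSDs.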

Statistics Canada – employment by community & economic activity

GWG on Big Data for official statistics

United Nations Global Working Group on Big Data for Official Statistics

• Created in March 2014
• 32 members: 22 countries and 10 international agencies

Global survey on Big Data Projects

Thank you

URLs to websites

Telenor Research http://www.telenor.com/media/press-releases/2015/telenor-research-deploys-big-dataagainst-dengue/

Mexico - Business Register on Google Earth http://www3.inegi.org.mx/sistemas/mapa/denue/default.aspx

Geo-location of Business Register https://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.42/2015/Session_III_Canada_-_Geolocation_of_BR_data__room_document_.pdf

Global Survey on Big Data http://unstats.un.org/unsd/trade/events/2015/abudhabi/presentations/day1/04/UNSD%20-%20Global%20Survey%20on%20Big%20Data.pdf

Big Data Quality Framework http://unstats.un.org/unsd/trade/events/2015/abudhabi/presentations/day3/01/3_Quality_Framework_Righiv3.pdf

United Nations Statistics Division http://unstats.un.org/unsd/ http://unstats.un.org/unsd/dnss/QualityNQAF/nqaf.aspx

United Nations Statistics Division/ Trade Statistics Branch http://unstats.un.org/unsd/trade/default.asp

United Nations Statistical Commission http://unstats.un.org/unsd/statcom/commission.htm

United Nations Global Working Group on Big Data for official statistics http://unstats.un.org/unsd/bigdata/ http://unstats.un.org/unsd/trade/events/2014/Beijing/default.asp

http://unstats.un.org/unsd/trade/events/2015/abudhabi/default.asp

United Nations General Assembly Resolutions http://www.un.org/en/ga/70/resolutions.shtml

United Nations History Publications http://www.unhistory.org/publications/

United Nations Sustainable Development https://sustainabledevelopment.un.org/ https://sustainabledevelopment.un.org/topics

United Nations Global Pulse http://www.unglobalpulse.org/

Project 8 http://demandinstitute.org/projects/project-8/

United Nations Data Revolution http://www.undatarevolution.org/

United Nations Statistics Division / SDG indicators http://unstats.un.org/sdgs/

United Nations Statistics Division/ Modernization of Statistical Systems http://unstats.un.org/unsd/nationalaccount/workshops/2015/NewYork/lod.asp

World Pop http://www.worldpop.org.uk/

Data Pop http://datapopalliance.org/

Flowminder http://www.flowminder.org/

UNU-EHS http://ehs.unu.edu/

Future Earth http://www.futureearth.org/

U-Report http://ureport.ug/
