M9-822

ADVISORY EXPERT GROUP
BIG DATA
Statistics Canada
Outline



Big data and the National Accounts
Establishing the right infrastructure
Lessons learned: case studies from Statistics Canada





Traditional big data
Scanner data
Electricity consumption
Credit card and Interac
Remote sensing
Statistics Canada • Statistique Canada
Big data and the National Accounts

From a business perspective: "Big data is high volume, high
velocity, and/or high variety information assets that require new
forms of processing to enable enhanced decision making,
insight discovery and process optimization." (Gartner 2012, via Wikipedia)

From a national statistical office (NSO) perspective: "Big data is high volume, high velocity,
and/or high variety information assets that require new forms of
processing and which could reduce respondent burden, increase
quality, develop new statistical products or enhance the detail
of existing statistical products." – ????
Big data and the National Accounts
▪ Mick Couper of the University of Michigan’s Survey Research
Center cites the following limitations that NSOs will face when confronting
big data:
• lack of covariates in the datasets;
• self-selection and self-reporting biases;
• lack of stability;
• privacy issues;
• access issues;
• opportunity for mischief;
• size issues; and
• selective reporting of results (file drawer problem).
▪ You could add to that:
• Sustainability – data sources disappear, systems change, perceptions change.
Couper, Mick P., Is the Sky Falling: New Technology, Changing Media, and the Future of Surveys. (Presentation,
European Survey Research Association, 5th Conference, Ljubljana, Slovenia, July, 2013)
Big data and the National Accounts


There needs to be up-front acknowledgement that we are
trying to fit a square peg in a round hole.
The needs of business (big data to increase business
intelligence) and of national accountants (big data to produce
comprehensive macroeconomic statistics) are quite different.
Dimensions of the data   | Needs of National Accountants           | Needs of business
Scope of the dataset     | Comprehensive                           | Limited to the needs of the business
Use of the dataset       | Produce meaningful aggregate statistics | Find patterns, explore the detail
Structure of the dataset | On-going, stable, regular               | Structure can change as required by the business
Putting in place the appropriate
infrastructure
▪ To determine how best to leverage big data, NSOs
need to put in place the proper infrastructure to:
1. Obtain the data
2. Process the data
3. Evaluate the data
4. Integrate the data
Putting in place the appropriate
infrastructure – Obtaining the data



▪ Use of legislation – e.g., Section 13 of Canada’s Statistics Act states that “A person
having the custody or charge of any documents or records that are maintained in any
department or in any municipal office, corporation, business or organization, from
which information sought in respect of the objects of this Act can be obtained or that
would aid in the completion or correction of that information, shall grant access
thereto for those purposes to a person authorized by the Chief Statistician to obtain
that information or aid in the completion or correction of that information.” 1970-71-72, c. 15, s. 12.
▪ Memoranda of understanding (MOUs), which outline:
• Roles and responsibilities
• Delivery mechanism
• Uses of data
• Termination of the agreement
▪ Purchasing big data
• Many firms sell big data that can be used for business intelligence – it could also
be purchased for statistical purposes. Under what conditions and terms should
NSOs purchase big data?
Putting in place the appropriate
infrastructure – Processing the data

File transfer system - NSOs need a secure, high capacity file transfer system to
transfer data from the data provider to the NSO.

Storage and processing capacity - In most NSOs (especially NA divisions) the
processing capacity for big data does not exist.

Software - Statistics Canada is leveraging the SAS distributed computing solution
called “SAS Grid” to shorten the time needed to process and analyze its larger data
holdings. Also, the Data Analysis Resource Center at Statistics Canada maintains a
research computer with analytical software installed, offering a wide range of add-ons
that provide advanced analytical and visualization tools particular to big data
analytics.

Information management policies – Access, privacy, confidentiality, retention
Putting in place the appropriate
infrastructure – Evaluating the data
▪ Big data community of practice
• There needs to be a structure in place that allows analysts and
programs to gain knowledge and share experiences with respect to big
data, to engage with colleagues internally or externally when needed
and to report findings to senior managers when appropriate.
▪ Big data needs to be evaluated with respect to its:
• Quality
• Coverage
• Timeliness
• Detail
• Regularity
▪ In order to leverage big data we need to develop a research and
development orientation.
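
One way to make the five evaluation dimensions above operational is a simple scorecard per data source. The sketch below is illustrative only: the `SourceAssessment` class, the 1-to-5 rating scale and the pass threshold are all assumptions made for the example, not a Statistics Canada method, and the ratings are invented.

```python
from dataclasses import dataclass, fields


@dataclass
class SourceAssessment:
    """Hypothetical 1 (poor) to 5 (good) ratings on the five dimensions."""
    quality: int
    coverage: int
    timeliness: int
    detail: int
    regularity: int

    def weakest(self) -> list[str]:
        """Return the dimension(s) with the lowest rating."""
        scores = {f.name: getattr(self, f.name) for f in fields(self)}
        low = min(scores.values())
        return [name for name, score in scores.items() if score == low]

    def usable(self, minimum: int = 3) -> bool:
        """A source passes only if every dimension meets the assumed minimum."""
        return all(getattr(self, f.name) >= minimum for f in fields(self))


# Invented ratings for illustration, not actual assessment results.
example = SourceAssessment(quality=4, coverage=2, timeliness=5, detail=4, regularity=3)
print(example.weakest())  # ['coverage']
print(example.usable())   # False
```

Requiring every dimension to clear a minimum, rather than averaging, reflects the point that a timely but poorly covered source may still be unusable for comprehensive aggregates.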
Examples of big data research at Statistics Canada:
International merchandise trade statistics

Collection/access agreement: Access to detailed customs data is
governed by two memoranda of understanding: one with the
Canada Revenue Agency and one with the U.S. Census Bureau
Cost: Nil
Dimensions: 1.5 Terabytes, 60 attributes
Uses: Balance of Payments, International Merchandise Trade
Statistics
Timeliness: 35 days following the reference period
Frequency: Daily, if required
Potential uses: Creating an importer and exporter characteristics file
that can be used to analyze the entry and exit of Canadian traders
within the Canadian economy and to support studies of globalization, global
production, goods for processing and foreign affiliate statistics.






Examples of big data research at Statistics Canada:
Taxation statistics

Collection/access agreement: Access to detailed taxation statistics
is governed by a memorandum of understanding with the Canada
Revenue Agency.
Cost: Approximately $1.6 million
Dimensions: 6 Terabytes and growing
Uses: Benchmark estimates of wages and salaries; output; property
incomes, taxes, etc.
Timeliness: Earliest use – 45 days following the reference period
Frequency: Mainly annual, some monthly (goods and services
taxation statistics)
Potential uses: Creation of a National Accounts longitudinal file—a
business level micro-data file that can be used to undertake studies
such as GDP by city, GDP by firm size, productivity by firm size.






Examples of big data research at Statistics Canada:
Government finance statistics







Collection/access agreement: No formal agreement in place –
institutional understanding between Statistics Canada and the
government jurisdictions.
Cost: Nil
Dimensions: 40 million financial transactions, 200 GB
Uses: Government Finance Statistics, government sector – National
Accounts
Timeliness: Earliest is 15 days following the reference period.
Frequency: Monthly, quarterly, annual
Potential uses: Local government remains a ‘survey of
municipalities’; access to electronic files will increase our ability to
provide CMA (census metropolitan area) level data as well as more revenue and
expenditure detail. Potential data uses for the health, education and
justice programs.
Examples of big data research at Statistics Canada:
Electronic household transactions (credit and debit)







Collection/access agreement: Memorandum of understanding
outlining the roles and responsibilities of both Statistics Canada and
the data provider.
Cost: Nil
Dimensions: “Aggregated” big data - number of transactions, value
of transactions aggregated by merchant group by place of
transaction (domestic, international) by class of transactor (personal
or commercial).
Uses: Indicator for household final consumption expenditure and
international travel abroad
Timeliness: Earliest is 15 days following the reference period.
Frequency: Monthly
Potential uses: International travel services, monthly household final
consumption expenditure.
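
The "aggregated" layout described under Dimensions above (number and value of transactions by merchant group, place of transaction and class of transactor) can be sketched as a simple group-by. This is a minimal sketch under stated assumptions: the record layout, the merchant group names and the values are hypothetical, and the data provider's actual aggregation process is not documented here.

```python
from collections import defaultdict

# Hypothetical transaction records:
# (merchant_group, place_of_transaction, class_of_transactor, value)
transactions = [
    ("Grocery", "domestic", "personal", 52.10),
    ("Grocery", "domestic", "personal", 18.40),
    ("Hotels", "international", "personal", 240.00),
    ("Grocery", "domestic", "commercial", 310.00),
]


def aggregate(records):
    """Return {(merchant_group, place, transactor_class): (count, total_value)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for group, place, cls, value in records:
        cell = totals[(group, place, cls)]
        cell[0] += 1       # number of transactions in the cell
        cell[1] += value   # value of transactions in the cell
    return {key: (count, round(total, 2)) for key, (count, total) in totals.items()}


for key, (count, total) in sorted(aggregate(transactions).items()):
    print(key, count, total)
```

Only the cell counts and totals leave the provider in this arrangement, which is why the individual transactions never need to be transferred.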
Examples of big data research at Statistics Canada:
Electronic household transactions (credit and debit)
Growth rates – household final consumption expenditure and
credit transactions (domestically acquired by residents)
[Figure: % change (vertical axis, 0 to 8) by year, 2008 to 2012. Series: Credit transactions domestically acquired by residents; Total household final consumption expenditure]
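
The comparison on this slide rests on year-over-year growth rates for the two series. A minimal sketch of that calculation follows; the function name and the input levels are hypothetical and are not the values behind the chart.

```python
def growth_rates(levels):
    """Year-over-year % change for a list of annual levels."""
    return [
        round((curr / prev - 1) * 100, 1)
        for prev, curr in zip(levels, levels[1:])
    ]


# Hypothetical index levels, chosen only to illustrate the calculation.
credit_transactions = [100.0, 103.0, 108.15]
hfce = [100.0, 102.0, 106.08]

print(growth_rates(credit_transactions))
print(growth_rates(hfce))
```

Comparing the two lists year by year is what indicates whether credit transactions track household final consumption expenditure closely enough to serve as an indicator.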
Examples of big data research at Statistics Canada:
Electronic household transactions (credit and debit)
Growth rates – credit transactions and household final
consumption expenditure (accommodation services)
[Figure: % change (vertical axis, -10 to 12) by year, 2008 to 2012. Series: Credit transactions domestically acquired by residents (accommodations); Household final consumption expenditure, accommodation services]
Examples of big data research at Statistics Canada:
Scanner data: vendor specific








Collection/Access Agreement: MOU in negotiation
Cost: Current costs are nil though the long-term approach being proposed would involve a quid
pro quo agreement where CPD would provide the company their data back with value added (i.e.,
an implicit cost would be borne by the division).
Dimensions: Sales, quantities, and item descriptions of all goods sold for a given store over a
given period
Uses: Consumer prices and household expenditure weights to feed the CPI
Timeliness: TBD, though potentially as little as a one day lag (e.g., weekly data for a given week
could be delivered on the first day of the following week).
Frequency: Initial data has been provided on a weekly aggregated basis. Future work will look at
daily and / or transactional level data.
Dataset size: For one week of sales data (aggregated on the week) for one store,
• roughly 4,000 KB
• roughly 30,000 rows (i.e., unique items sold)
• implies roughly 200 MB for one year of weekly aggregated data for one store.
Potential uses moving forward: Direct input into the calculation of the CPI (potential
replacement for collected prices), studies on consumer behaviour, CPI weights, household final
consumption expenditures, retail sales.
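
The dataset-size figures above follow from simple arithmetic, sketched below. The 4,000 KB per store-week comes from the slide; the 52-week year, the decimal KB-to-MB conversion and the 500-store chain are assumptions added for illustration.

```python
KB_PER_STORE_WEEK = 4_000  # from the slide: roughly 4,000 KB per store per week
WEEKS_PER_YEAR = 52        # assumption


def annual_size_mb(stores: int = 1, kb_per_mb: int = 1_000) -> float:
    """Approximate yearly volume in MB for weekly aggregated data."""
    return stores * KB_PER_STORE_WEEK * WEEKS_PER_YEAR / kb_per_mb


print(annual_size_mb())            # 208.0, i.e. roughly 200 MB for one store
print(annual_size_mb(stores=500))  # 104000.0 MB for a hypothetical 500-store chain
```

Daily or transaction-level data, mentioned under Frequency, would scale these volumes up considerably, though by how much depends on the number of transactions per item.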
Examples of big data research at Statistics Canada
Smart meter: household electricity consumption







Collection/access agreement: Two memoranda of understanding
with two regional electricity distributors
Cost: Nil
Dimensions: Roughly 200 GB of raw hourly electricity
consumption data have been obtained, providing detailed
information on approximately 120,000 customers for the
years 2008 to 2013
Uses: Household electricity consumption
Timeliness: Earliest is 15 days following the reference period.
Frequency: Hourly
Potential uses: Household final consumption expenditure, utilities
component of monthly Gross Domestic Product.
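
As a rough sanity check on the dimensions quoted above, the sketch below estimates an upper bound on the number of hourly readings and the implied storage per reading. It assumes every one of the 120,000 customers was metered for all six years, ignores leap days and treats a gigabyte as 10^9 bytes; none of these assumptions comes from the source.

```python
CUSTOMERS = 120_000
YEARS = 6                  # 2008 to 2013 inclusive
HOURS_PER_YEAR = 24 * 365  # ignores leap days
RAW_BYTES = 200 * 10**9    # roughly 200 GB, decimal gigabytes assumed

# Upper bound: every customer has a reading for every hour of every year.
max_readings = CUSTOMERS * YEARS * HOURS_PER_YEAR
bytes_per_reading = RAW_BYTES / max_readings

print(max_readings)                 # 6307200000
print(round(bytes_per_reading, 1))  # 31.7
```

If coverage was partial, the true number of readings is lower and the raw storage per reading correspondingly higher.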
Examples of big data research at Statistics Canada
Smart meter: household electricity consumption
[Figure: Total residential consumption]
Examples of big data research at Statistics Canada
Satellite Imaging: Land Account
▪ Collection/Access Agreement: Public data
▪ Cost: Nil
▪ Dimensions: 20 GB. Although not apparent here, the “dimension” of this
type of big data (which is not really big data, strictly speaking) may
well explode in the coming years. LIDAR datasets (high-resolution
laser ranging), as well as higher-resolution (space and time) satellite data, will
require terabytes of storage and “terahertz” of processing capacity.
▪ Uses: Land accounts: Land cover / land use change 2000 and 2010 2013
▪ Timeliness: 3-year lag
▪ Frequency: Annual
▪ Potential Uses moving forward: Landscape and freshwater
ecosystem accounts
Examples of big data research at Statistics Canada
Remote sensing: land use
Examples of big data research at Statistics Canada
Water Measurement Instruments: Water Account
▪ Collection/Access Agreement: Informal agreement with Water
Survey of Canada
▪ Cost: Nil
▪ Dimensions: Original WSC data is 5 GB; derived water yield data
is 90 GB
▪ Uses: Water accounts: Water Yield
▪ Timeliness: From real-time to lag of several years
▪ Frequency: Daily
▪ Potential Uses moving forward: Freshwater ecosystem accounts
Some lessons learned so far
1. Quid pro quo – is important when trying to obtain ‘big data’. Firms are
more willing to part with their ‘big data’ if you show them how they will
receive a ‘business intelligence’ benefit on their side.
2. Cost – ‘big data’ is not always the cheapest option. It is sometimes
easier to have the firm complete the survey than to create an
infrastructure to receive and process their data. For example, the data
received from local electricity providers is equivalent to the completion
of two questions on our current survey.
3. Classification systems – ‘big data’ does not follow any standard
classification system. For example, electronic retail transactions are
classified according to merchant groups rather than industries.
4. Big data aggregates – asking firms to aggregate their ‘big data’ is an
option.
5. Data formats – Need to work with new data formats that we are often
not familiar with.
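
Lesson 3 implies that a concordance is needed before provider data can feed industry-based statistics. The sketch below shows the idea with a one-to-one lookup; the merchant group names, the industry labels and the `reclassify` helper are hypothetical, and a real concordance would often be many-to-many and need allocation shares or manual review.

```python
# Hypothetical concordance from a provider's merchant groups to industry labels.
CONCORDANCE = {
    "Grocery stores": "Food and beverage stores",
    "Hotels and motels": "Accommodation services",
}


def reclassify(values_by_merchant_group):
    """Sum values by industry; return (industry totals, unmapped merchant groups)."""
    by_industry, unmapped = {}, []
    for group, value in values_by_merchant_group.items():
        industry = CONCORDANCE.get(group)
        if industry is None:
            unmapped.append(group)  # flag for review rather than guessing
            continue
        by_industry[industry] = by_industry.get(industry, 0.0) + value
    return by_industry, unmapped


totals, unmapped = reclassify(
    {"Grocery stores": 120.0, "Hotels and motels": 80.0, "Miscellaneous": 15.0}
)
print(totals)
print(unmapped)
```

Returning the unmapped groups separately keeps the classification gap visible instead of silently dropping value from the aggregates.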
Discussion point for the AEG
• In order to exploit the potential of big data, NSOs need to make
significant investments. How can we leverage the work taking place
across various NSOs to minimize the investment and maximize the
return?
• How do we promote the development of new data products using
big data over using big data to re-construct existing data products?
Do we adjust our frameworks to accommodate big data or do we
adjust big data to accommodate our frameworks?