Developing Metadata Standards in an Integration
Project at Statistics Canada
United Nations Economic Commission for Europe
Workshop on International Collaboration for Standards-Based
Modernization (Geneva, Switzerland 5-7 May 2015)
Governing and Maintaining Statistical Standards
Judy Lee
Enterprise Statistics Division
Statistics Canada
Statistics Canada’s Corporate Business Architecture
Corporate Context
Corporate Business Architecture launched in 2010
• Objectives: Efficiencies, Quality, Robustness
• Efficient, Robust, and Responsive business architecture
• Reduce operating cost, enhance quality assurance, improve responsiveness
• Maximize re-use; eliminate re-work: metadata driven systems
Review of Statistics Canada’s Business Statistics Program
• Develop generalized model for producing business statistics
• Shared and generic corporate services and systems for collecting, processing, disseminating and storing statistical information
• Global optimum supersedes local preferences
Integrated Business Statistics Program




IBSP Objectives and Business Outcomes
• Generic model with improved quality and coherence across programs
• Robust Infrastructure; Reduce Cost and maintenance
• Flexible to respond to client needs; Reduce respondent burden
IBSP Key Components
• Maximum Use of Tax Data; Content Harmonization; Electronic Questionnaire as the primary mode; Two-phase sampling; Rolling Estimates; Common Metadata Driven Tools and Generalized Systems; Top Down Analytical Approach
IBSP Scope, Partners, and Dependencies
• Over 90 existing business surveys covering manufacturing, services, retail, agriculture, capital expenditure, energy and research and development; financial and ad hoc surveys
• Partners: 8 subject matter divisions + 8 service provider divisions
IBSP project of development and integration
IBSP Metadata Value Proposition

IBSP Metadata Objectives and Business Outcomes
• More efficient, flexible tools; reduction of manual intervention
• Coherence from questionnaire development to processing to dissemination
• More harmonized data and metadata definitions
• Ensures consistently applied standards and structure
• Shared metadata repository(s) across systems and partners
• Promotes uptake / integration of large volume of surveys
• Facilitates training, maintenance and knowledge transfer
• Aids in development of common information model and the Enterprise Architecture Integration Platform (EAIP)
IBSP Variable Naming Framework
Goals and Expected Outcomes:
Standardization, Coherence, Usability
• Consistent, coherent, and logical naming framework
• One variable to many question text/wording
• Root variables and sub-variables
• Logical generation of cell numbers from variable names

Naming Structure by Variable Type
• Statistical Variables: Anchored on Statistical Concepts to measure
• Process Control and Design Variables: Anchored on GSBPM
• Identification Variables: Anchored on level of statistical activity and “What it identifies”
• Derived Variable – A statistical variable with a formula
• Transformed Variables – Transformed Variable Code set
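The variable types listed here, together with the root/sub-variable idea and the "one variable to many question texts" goal, can be pictured as a small data model. The sketch below is illustrative only: the class names, field names, the sample sub-variable and the second wording are assumptions and not the IBSP schema. Only the type descriptions, the mnemonic rvSlsGdSrv and its first question text are taken from the slides.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class VariableType(Enum):
    """Variable types from the slide; the value records what the name is anchored on."""
    STATISTICAL = "statistical concept to measure"
    PROCESS_CONTROL_AND_DESIGN = "GSBPM"
    IDENTIFICATION = "level of statistical activity and what it identifies"
    DERIVED = "statistical variable with a formula"
    TRANSFORMED = "transformed variable code set"


@dataclass
class Variable:
    """Hypothetical record for one named variable."""
    name: str
    var_type: VariableType
    question_texts: List[str] = field(default_factory=list)     # one variable, many wordings
    sub_variables: List["Variable"] = field(default_factory=list)
    formula: Optional[str] = None                                # only meaningful for DERIVED

    def all_names(self) -> List[str]:
        """Return this variable's name followed by the names of all its sub-variables."""
        names = [self.name]
        for sub in self.sub_variables:
            names.extend(sub.all_names())
        return names


# Root variable from the slides, with one hypothetical sub-variable and
# one hypothetical alternate wording added purely for illustration.
root = Variable(
    name="rvSlsGdSrv",
    var_type=VariableType.STATISTICAL,
    question_texts=[
        "Sales of goods and services",
        "Sales of goods and services (hypothetical alternate wording)",
    ],
    sub_variables=[Variable(name="rvSlsGdSrvExample", var_type=VariableType.STATISTICAL)],
)
print(root.all_names())
```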
IBSP Variable Semantic – Deconstructing Variable
IBSP Statistical Concepts
Business Attribute
Financial
Economic
Social
Physical
Administrative
Disposition
Labour
Resource Use
Business-Activity
Asset
Capital
Expenditure
Value Added
Population
Business-performance
Equity
Supply
Business-Size
Expense
Adjustment
Business-Structure
Liability
Product
Profit-loss
Input
Output
Client of Business
Revenue
Geographic-location
Disposals
Funding Organization
Performance Use
Statistical Variable Naming Convention - Structure
Name components:
• CONCEPT (PRIME WORD)
• CLASS WORDS (Chronology, Measurement, Identification, Text)
• Modifiers (“Last”, “First”, )
• Other (“By” Classification)

ShortEnglishName             | Mnemonic   | cell_Number | Question
revenue sales goods service  | rvSlsGdSrv | F43008      | 1. Sales of goods and services
revenue rent leasing         | rvRntLse   | F45801      | 2. Rental and leasing revenue (report only if this is a secondary
revenue commission           | rvCmsn     | F45701      | 3. Commission revenue (report only if this is a secondary source
revenue subsidy              | rvSbsdy    | F47101      | 4. Subsidies (including grants, donations and fundraising)
revenue royalty              | rvRylty    | F47201      | 5. Royalties revenue
revenue dividend             | rvDvdnd    | F51101      | 6. Dividends revenue
revenue interest             | rvIntst    | F51201      | 7. Interest revenue
revenue other                | rvOth      | F51301      | 8. Other revenue
revenue description other    | rvDscOth   | F51302      | 8. Other revenue (please specify)
revenue total sum            | rvTtlSm    | F40000      | 9. Total revenue (sum of lines 1 to 8)
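The relationship between a ShortEnglishName and its Mnemonic in the rows above can be sketched as a word-by-word abbreviation lookup joined in camel case. This is only an illustration: the abbreviation table is inferred from the ten example rows on this slide, and the function name `build_mnemonic` is hypothetical. The slides do not say how IBSP actually generates mnemonics.

```python
# Abbreviations inferred from the example rows on the slide (an assumption,
# not an official IBSP abbreviation list).
ABBREVIATIONS = {
    "revenue": "rv", "sales": "sls", "goods": "gd", "service": "srv",
    "rent": "rnt", "leasing": "lse", "commission": "cmsn", "subsidy": "sbsdy",
    "royalty": "rylty", "dividend": "dvdnd", "interest": "intst",
    "other": "oth", "description": "dsc", "total": "ttl", "sum": "sm",
}


def build_mnemonic(short_english_name: str) -> str:
    """Join the abbreviation of each word: first part lower case, the rest capitalized."""
    words = short_english_name.lower().split()
    if not words:
        raise ValueError("short English name is empty")
    unknown = [w for w in words if w not in ABBREVIATIONS]
    if unknown:
        raise KeyError(f"no abbreviation known for: {unknown}")
    head, *tail = (ABBREVIATIONS[w] for w in words)
    return head + "".join(part.capitalize() for part in tail)


print(build_mnemonic("revenue sales goods service"))
print(build_mnemonic("revenue description other"))
```

Keeping the abbreviations in one table means a concept word is shortened the same way in every variable name that uses it, which is the kind of consistency the naming framework aims for.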
Statistical Variables – Examples of Cell Number Ranges
Concept                  | From   | To
Financial
  revenue                | F40000 | F59999
  expense                | F60000 | F69999
  profit-loss            | F70000 | F79999
  capital-expenditure    | F80000 | F84999
  disposal               | F85000 | F89999
Business Attribute
  administrative         | B00000 | B05999
  business activity      | B05000 | B09999
  business performance   | B10000 | B19999
  business size          | B20000 | B29999
  business structure     | B30000 | B39999
  commodity service      | B40000 | B49999
  geographic location    | B50000 | B59999
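Because each concept owns a band of cell numbers, the concept can be recovered from a cell number by a simple range lookup. The sketch below uses only the Financial ranges listed above. The function name and the choice to return `None` for anything outside those ranges are assumptions made for illustration.

```python
from typing import Optional

# Financial cell-number bands as listed on the slide: (concept, from, to), inclusive.
FINANCIAL_RANGES = [
    ("revenue", 40000, 59999),
    ("expense", 60000, 69999),
    ("profit-loss", 70000, 79999),
    ("capital-expenditure", 80000, 84999),
    ("disposal", 85000, 89999),
]


def financial_concept_for(cell_number: str) -> Optional[str]:
    """Return the Financial concept whose band contains cell_number, else None."""
    prefix, digits = cell_number[:1], cell_number[1:]
    if prefix != "F" or len(digits) != 5 or not digits.isdigit():
        return None
    value = int(digits)
    for concept, low, high in FINANCIAL_RANGES:
        if low <= value <= high:
            return concept
    return None


print(financial_concept_for("F43008"))
print(financial_concept_for("F85000"))
```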
Results to Date – Statistical Variables and Sub Variables by Concept
Concepts             | Wave 1 Variables | Wave 1 Sub Variables | Wave 1 Total | Wave 2 Variables | Wave 2 Sub Variables | Wave 2 Total | Grand Total
administrative       |               64 |                  114 |          178 |                1 |                   21 |           22 |         200
adjustment           |                  |                      |              |               56 |                  338 |          394 |         394
asset                |                  |                      |              |               12 |                    0 |           12 |          12
business-activity    |               35 |                   72 |          107 |              100 |                  160 |          260 |         367
business-performance |               10 |                   16 |           26 |                  |                      |              |          26
business-size        |               16 |                      |           16 |                9 |                   13 |           22 |          38
business-structure   |                1 |                    2 |            3 |                3 |                   75 |           78 |          81
capacity-utilization |                  |                      |              |                2 |                    4 |            6 |           6
capital-expenditure  |               36 |                  673 |          709 |               26 |                   53 |           79 |         788
client-of-business   |                  |                      |              |               11 |                  119 |          130 |         130
disposal             |                6 |                  312 |          318 |                  |                      |              |         318
disposition          |                  |                      |              |              514 |                 7787 |         8301 |        8301
equity               |                  |                      |              |                2 |                    0 |            2 |           2
expense              |              120 |                   89 |          209 |               82 |                  743 |          825 |        1034
geographic-location  |                4 |                   26 |           30 |               26 |                  727 |          753 |         783
input                |                  |                      |              |                4 |                  418 |          422 |         422
liability            |                  |                      |              |                7 |                    0 |            7 |           7
net-profit           |                  |                      |              |                1 |                    0 |            1 |           1
performance-use      |                  |                      |              |                6 |                  149 |          155 |         155
profit-loss          |                2 |                      |            2 |                  |                      |              |           2
product              |              105 |                 2369 |         2474 |               86 |                  908 |          994 |        3468
resource-use         |                  |                      |              |                3 |                    4 |            7 |           7
revenue              |              451 |                  256 |          707 |               46 |                  290 |          336 |        1043
supply               |                  |                      |              |              409 |                 4289 |         4698 |        4698
Grand Total          |              850 |                 3929 |         4779 |             1406 |                16098 |        17504 |       22283
IBSP Content Metadata Outcomes




IBSP Content
• Modular approach to Content Harmonization
IBSP Variable Naming Framework
• Statistical Variables systematically generated based on Statistical concepts
Content Metadata stored in One Relational Database
• Variable Names, Cell Numbers, Mnemonics, Question Texts, Response Sets
• Promotes coherence, searchability, harmonization and delivery
Delivery of Content Metadata:
• Automated delivery to other systems and databases: Collection, Processing, Integrated Metadatabase
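Storing content metadata in one relational database, with one variable linked to many question texts, might look roughly like the following. This is a minimal sketch using an in-memory SQLite database: the table names, column names and the helper `wordings_for` are hypothetical, response sets are left out for brevity, and the second wording is a made-up placeholder. The variable row itself is taken from the naming convention slide.

```python
import sqlite3

# Hypothetical two-table layout: one row per variable, many wordings per variable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variable (
    variable_id        INTEGER PRIMARY KEY,
    short_english_name TEXT NOT NULL,
    mnemonic           TEXT NOT NULL UNIQUE,
    cell_number        TEXT NOT NULL UNIQUE
);
CREATE TABLE question_text (
    question_id INTEGER PRIMARY KEY,
    variable_id INTEGER NOT NULL REFERENCES variable(variable_id),
    wording     TEXT NOT NULL
);
""")

cur = conn.execute(
    "INSERT INTO variable (short_english_name, mnemonic, cell_number) VALUES (?, ?, ?)",
    ("revenue sales goods service", "rvSlsGdSrv", "F43008"),
)
variable_id = cur.lastrowid
conn.executemany(
    "INSERT INTO question_text (variable_id, wording) VALUES (?, ?)",
    [
        (variable_id, "Sales of goods and services"),
        (variable_id, "Sales of goods and services (hypothetical alternate wording)"),
    ],
)


def wordings_for(connection: sqlite3.Connection, mnemonic: str) -> list:
    """Return every question wording attached to the variable with this mnemonic."""
    rows = connection.execute(
        """SELECT q.wording
           FROM question_text AS q
           JOIN variable AS v ON v.variable_id = q.variable_id
           WHERE v.mnemonic = ?
           ORDER BY q.question_id""",
        (mnemonic,),
    ).fetchall()
    return [row[0] for row in rows]


print(wordings_for(conn, "rvSlsGdSrv"))
```

A single shared store of this kind is what lets downstream systems (collection, processing, the Integrated Metadatabase) receive the same names, cell numbers and wordings automatically.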
Conclusion and Next Steps
1. Unprecedented opportunity to name variables consistently from collection to just before dissemination.
2. Naming Framework has proven to be robust and expandable
   • Possible expansion of bandwidths for future waves
   • Will accommodate a total of almost 100 economic surveys
3. Integration of naming functionality into core IBSP system for consolidation phase
4. Development, Implementation, and Governance centralized at project level.
5. Strong governance and Change Management control
6. Significant impact on usability, searchability, and interoperability