Developing Metadata Standards in an Integration Project at Statistics Canada
United Nations Economic Commission for Europe
Workshop on International Collaboration for Standards-Based Modernization (Geneva, Switzerland 5-7 May 2015)
Governing and Maintaining Statistical Standards
Judy Lee
Enterprise Statistics Division | Statistics Canada

Statistics Canada’s Corporate Business Architecture
Corporate Context
• Corporate Business Architecture launched in 2010
• Objectives: Efficiencies, Quality, Robustness
Efficient, Robust, and Responsive business architecture
• Reduce operating cost, enhance quality assurance, improve responsiveness
• Maximize re-use; eliminate re-work: metadata driven systems
Review of Statistics Canada’s Business Statistics Program
• Develop generalized model for producing business statistics
• Shared and generic corporate services and systems for collecting, processing, disseminating and storing statistical information
• Global optimum supersedes local preferences

Integrated Business Statistics Program (IBSP)
IBSP Objectives and Business Outcomes
• Generic model with improved quality and coherence across programs
• Robust infrastructure; reduce cost and maintenance
• Flexible to respond to client needs; reduce respondent burden
IBSP Key Components
• Maximum use of tax data
• Content harmonization
• Electronic questionnaire as the primary mode
• Two-phase sampling
• Rolling estimates
• Common metadata driven tools and generalized systems
• Top down analytical approach
IBSP Scope, Partners, and Dependencies
• Over 90 existing business surveys covering manufacturing, services, retail, agriculture, capital expenditure, energy and research and development; financial and ad hoc surveys
• Partners: 8 subject matter divisions + 8 service provider divisions
• IBSP project of development and integration

IBSP Metadata Value Proposition
IBSP Metadata Objectives and Business Outcomes
• More efficient, flexible tools; reduction of manual intervention
• Coherence from questionnaire development to
  processing to dissemination
• More harmonized data and metadata definitions
• Ensures consistently applied standards and structure
• Shared metadata repository(s) across systems and partners
• Promotes uptake / integration of large volume of surveys
• Facilitates training, maintenance and knowledge transfer
• Aids in development of common information model and the Enterprise Architecture Integration Platform (EAIP)

IBSP Variable Naming Framework
Goals and Expected Outcomes: Standardization, Coherence, Usability
• Consistent, coherent, and logical naming framework
• One variable to many question text/wording; root variables and sub-variables
• Logical generation of cell numbers from variable names
Naming Structure by Variable Type
• Statistical Variables: anchored on statistical concepts to measure
• Process Control and Design Variables: anchored on GSBPM
• Identification Variables: anchored on level of statistical activity and “What it identifies”
• Derived Variable: a statistical variable with a formula
• Transformed Variables: transformed variable code set

IBSP Variable Semantic - Deconstructing Variable

IBSP Statistical Concepts
Concept groups: Business Attribute, Financial, Economic, Social, Physical
Concepts: Administrative, Disposition, Labour, Resource Use, Business-Activity, Asset, Capital Expenditure, Value Added, Population, Business-performance, Equity, Supply, Business-Size, Expense, Adjustment, Business-Structure, Liability, Product, Profit-loss, Input, Output, Client of Business, Revenue, Geographic-location, Disposals, Funding, Organization, Performance Use

Statistical Variable Naming Convention - Structure
• CONCEPT (PRIME WORD)
• CLASS WORDS (Chronology, Measurement, Identification, Text)
• Modifiers (“Last”, “First”)
• Other (“By” Classification)

| ShortEnglishName | Mnemonic | cell_Number | Question |
|---|---|---|---|
| revenue sales goods service | rvSlsGdSrv | F43008 | 1. Sales of goods and services |
| revenue rent leasing | rvRntLse | F45801 | 2. Rental and leasing revenue (report only if this is a secondary ... |
| revenue commission | rvCmsn | F45701 | 3. Commission revenue (report only if this is a secondary source ... |
| revenue subsidy | rvSbsdy | F47101 | 4. Subsidies (including grants, donations and fundraising) |
| revenue royalty | rvRylty | F47201 | 5. Royalties revenue |
| revenue dividend | rvDvdnd | F51101 | 6. Dividends revenue |
| revenue interest | rvIntst | F51201 | 7. Interest revenue |
| revenue other | rvOth | F51301 | 8. Other revenue |
| revenue description other | rvDscOth | F51302 | 8. Other revenue (please specify) |
| revenue total sum | rvTtlSm | F40000 | 9. Total revenue (sum of lines 1 to 8) |

Statistical Variables - Examples of Cell Number Ranges

| Concept | From | To |
|---|---|---|
| Financial | | |
| revenue | F40000 | F59999 |
| expense | F60000 | F69999 |
| profit-loss | F70000 | F79999 |
| capital-expenditure | F80000 | F84999 |
| disposal | F85000 | F89999 |
| Business Attribute | | |
| administrative | B00000 | B05999 |
| business activity | B05000 | B09999 |
| business performance | B10000 | B19999 |
| business size | B20000 | B29999 |
| business structure | B30000 | B39999 |
| commodity service | B40000 | B49999 |
| geographic location | B50000 | B59999 |

Results to Date - Statistical Variables and Sub Variables by Concept

| Concepts | Wave 1 Variables | Wave 1 Sub Variables | Wave 1 Total | Wave 2 Variables | Wave 2 Sub Variables | Wave 2 Total | Grand Total |
|---|---|---|---|---|---|---|---|
| administrative | 64 | 114 | 178 | 1 | 21 | 22 | 200 |
| adjustment | | | | 56 | 338 | 394 | 394 |
| asset | | | | 12 | 0 | 12 | 12 |
| business-activity | 35 | 72 | 107 | 100 | 160 | 260 | 367 |
| business-performance | 10 | 16 | 26 | | | | 26 |
| business-size | 16 | | 16 | 9 | 13 | 22 | 38 |
| business-structure | 1 | 2 | 3 | 3 | 75 | 78 | 81 |
| capacity-utilization | | | | 2 | 4 | 6 | 6 |
| capital-expenditure | 36 | 673 | 709 | 26 | 53 | 79 | 788 |
| client-of-business | | | | 11 | 119 | 130 | 130 |
| Disposal | 6 | 312 | 318 | | | | 318 |
| Disposition | | | | 514 | 7787 | 8301 | 8301 |
| Equity | | | | 2 | 0 | 2 | 2 |
| expense | 120 | 89 | 209 | 82 | 743 | 825 | 1034 |
| geographic-location | 4 | 26 | 30 | 26 | 727 | 753 | 783 |
| input | | | | 4 | 418 | 422 | 422 |
| liability | | | | 7 | 0 | 7 | 7 |
| net-profit | | | | 1 | 0 | 1 | 1 |
| performance-use | | | | 6 | 149 | 155 | 155 |
| profit-loss | 2 | | 2 | | | | 2 |
| product | 105 | 2369 | 2474 | 86 | 908 | 994 | 3468 |
| resource-use | | | | 3 | 4 | 7 | 7 |
| revenue | 451 | 256 | 707 | 46 | 290 | 336 | 1043 |
| supply | | | | 409 | 4289 | 4698 | 4698 |
| Grand Total | 850 | 3929 | 4779 | 1406 | 16098 | 17504 | 22283 |

IBSP Content Metadata Outcomes
IBSP Content
• Modular approach to Content
  Harmonization
IBSP Variable Naming Framework
• Statistical variables systematically generated based on statistical concepts
Content Metadata stored in One Relational Database
• Variable names, cell numbers, mnemonics, question texts, response sets
• Promotes coherence, searchability, harmonization and delivery
Delivery of Content Metadata
• Automated delivery to other systems and databases: Collection, Processing, Integrated Metadatabase

Conclusion and Next Steps
1. Unprecedented opportunity to name variables consistently from collection to just before dissemination.
2. Naming Framework has proven to be robust and expandable
   • Possible expansion of bandwidths for future waves
   • Will accommodate a total of almost 100 economic surveys
3. Integration of naming functionality into core IBSP system for consolidation phase
4. Development, Implementation, and Governance centralized at project level.
5. Strong governance and Change Management control
6. Significant impact on usability, searchability, and interoperability
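
Illustration: composing a mnemonic and checking a cell number range

The naming convention and cell number range slides can be illustrated with a minimal Python sketch. It is not the IBSP implementation. The abbreviation table is an assumption, inferred by aligning each ShortEnglishName with its Mnemonic in the revenue example, and the function names `make_mnemonic` and `concept_for_cell` are hypothetical. Only the Financial ranges from the slide are included.

```python
from typing import Optional

# Assumed word-to-abbreviation mapping, inferred from the revenue example
# (e.g. "revenue sales goods service" -> rvSlsGdSrv). Not an official list.
ABBREVIATIONS = {
    "revenue": "rv", "sales": "sls", "goods": "gd", "service": "srv",
    "rent": "rnt", "leasing": "lse", "commission": "cmsn",
    "subsidy": "sbsdy", "royalty": "rylty", "dividend": "dvdnd",
    "interest": "intst", "other": "oth", "description": "dsc",
    "total": "ttl", "sum": "sm",
}

# Financial cell number ranges as listed on the ranges slide (prefix "F").
FINANCIAL_RANGES = {
    "revenue": (40000, 59999),
    "expense": (60000, 69999),
    "profit-loss": (70000, 79999),
    "capital-expenditure": (80000, 84999),
    "disposal": (85000, 89999),
}


def make_mnemonic(short_english_name: str) -> str:
    """Join the abbreviation of each word in camelCase, concept word first."""
    words = short_english_name.lower().split()
    if not words:
        raise ValueError("empty variable name")
    # An unknown word raises KeyError rather than guessing an abbreviation.
    parts = [ABBREVIATIONS[word] for word in words]
    return parts[0] + "".join(part.capitalize() for part in parts[1:])


def concept_for_cell(cell_number: str) -> Optional[str]:
    """Return the Financial concept whose range contains the cell number."""
    prefix, digits = cell_number[:1], cell_number[1:]
    if prefix != "F" or not digits.isdigit():
        return None
    value = int(digits)
    for concept, (low, high) in FINANCIAL_RANGES.items():
        if low <= value <= high:
            return concept
    return None


if __name__ == "__main__":
    print(make_mnemonic("revenue sales goods service"))
    print(concept_for_cell("F43008"))
```

Keeping the abbreviations in a single lookup table means a word is always shortened the same way wherever it appears in a variable name, which is the kind of consistency the framework aims for. The range check shows how a cell number can be traced back to its concept.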