Committee for the Coordination of Statistical Activities
Conference on Data Quality for International Organisations, 6 & 7 May 2010

Data Quality Management for Securities – the case of the Centralised Securities DataBase
Francis Gross, DG Statistics, External Statistics Division

Financial statistics based on micro data
• Statistics based on individual securities data
  – Micro-data generated from business processes
  – Classifications added by statisticians
• Benefits
  – Flexibility in serving event-driven policy needs, in near time
  – Ability to drill down, linking macro- to micro-issues
• Tool: the Centralised Securities Database (CSDB)
  – Holds data on nearly 7 million securities
  – Is in production, with 27 National Central Banks online
[Diagram: securities data, issuer data and holdings data are combined into macro-statistics on "who finances whom" and "how", aggregated by economic sector and country of residence.]

Centralised Securities Database (CSDB)
• The CSDB provides consistent and up-to-date reference, income, price and volume data on individual securities
• The CSDB is shared by the European System of Central Banks (ESCB)
• The CSDB is intended to be the backbone for producing consistent and harmonised securities statistics
• The CSDB plays a pivotal role in s-b-s* reporting as the reference database for the ESCB and associated institutions
  * s-b-s: security by security

Main features of the CSDB
• Multi-source system: for coverage and quality, data come from several providers (5 commercial data providers plus the National Central Banks (NCBs))
• Daily update frequency: 2 million price and 1 million reference data records per day
• Automated construction of a "golden copy": data on an instrument are grouped using algorithms and the most plausible attribute values are selected (sketched below)
• Data Quality Management (DQM) network: staff from all NCBs contribute to DQM, through human intervention and increasingly through systems, to check "raw" data and validate "golden copy" results

Data quality – two pillars
• Data quality managers face two critical problems:
  – What data is correct? Where to look for the "truth"? (verify against the prospectus, internal databases, Google…)
  – How can dubious data be identified?
• The data quality of the CSDB depends heavily on:
  – Data Quality Management: tackle the issue at the NCB, downstream, for individual securities, in a decentralised process
  – Data Source Management: tackle the issue at the source, upstream, preferably in bulk, in a centralised process
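To make the "golden copy" idea above concrete, the following is a minimal, hypothetical sketch rather than the CSDB's actual algorithm: records for the same instrument arriving from several providers are grouped, each attribute is set to its most frequent value across the feeds, and attributes on which providers disagree are flagged as dubious for the DQM network. The field names, the ISIN used, the simple majority-vote rule and the compound_golden_copy function are illustrative assumptions.

```python
from collections import Counter

# Hypothetical provider feeds: one dict of attributes per data source for the same ISIN.
# Field names and values are illustrative only.
feeds = [
    {"isin": "XS0123456789", "issuer_country": "NL", "issuer_sector": "S.11", "coupon": 4.25},
    {"isin": "XS0123456789", "issuer_country": "NL", "issuer_sector": "S.11", "coupon": 4.25},
    {"isin": "XS0123456789", "issuer_country": "DE", "issuer_sector": "S.11", "coupon": 4.25},
]

def compound_golden_copy(records):
    """Pick, per attribute, the most frequent value across providers;
    flag attributes on which the providers disagree so DQM staff can review them."""
    golden, flags = {}, []
    attributes = {key for record in records for key in record}
    for attr in attributes:
        values = [record[attr] for record in records if attr in record]
        counts = Counter(values)
        golden[attr] = counts.most_common(1)[0][0]  # "most plausible" = most frequent here
        if len(counts) > 1:
            flags.append(attr)                      # providers disagree: candidate for DQM
    return golden, flags

golden, flags = compound_golden_copy(feeds)
print(golden)  # e.g. {'isin': 'XS0123456789', 'issuer_country': 'NL', ...}
print(flags)   # ['issuer_country'] -> dubious attribute to verify, e.g. against the prospectus
```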
Metrics to steer and prioritise DQM
• Strategy: identification, quantification, prioritisation
• Problem: no access to benchmark data
• Macro metrics:
  – distribution of the different reference data attributes (e.g. price, income)
• Micro metrics:
  – inter-temporal comparison (stability index, concentration change)
  – consistency checks
  – can drill down to the level of the individual security

Example: metric for change in country / sector
• The system calculates indices of stability in country / sector for the relevant group of securities between t0 and t1
• An index below 1 for a country / sector pair shows change in the group of securities (in country, sector or both)
• Two indices are calculated for each country / sector group: one from the t0 perspective (which sees leavers) and one from the t1 perspective (which sees joiners)
• Illustration for the group NL/S.11:
  – At t0 the group contains Security 1, Security 2 and Security 3; at t1 it contains Security 2, Security 3 and Security 4
  – Security 1 leaves the sector / country group, Securities 2 and 3 stay, Security 4 joins
  – t0 perspective (Laspeyres concept): covers leavers but no joiners
  – t1 perspective (Paasche concept): covers joiners but no leavers
  – Fisher index: joins both concepts
[Screenshot: 19 instruments, of which 14 kept their issuer identifier (CH_IDENT = 1)]
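A minimal sketch of how such a stability index could be computed for a single country / sector group, assuming an unweighted, count-based definition: the t0 (Laspeyres-type) index is the share of the group's t0 members still present at t1, the t1 (Paasche-type) index is the share of its t1 members already present at t0, and the Fisher-type index is their geometric mean. The CSDB's actual metric may be weighted or defined differently; the function below is purely illustrative.

```python
from math import sqrt

def stability_indices(group_t0, group_t1):
    """Count-based stability indices for one country/sector group between t0 and t1."""
    stayers = group_t0 & group_t1
    laspeyres = len(stayers) / len(group_t0)  # t0 perspective: sees leavers, not joiners
    paasche   = len(stayers) / len(group_t1)  # t1 perspective: sees joiners, not leavers
    fisher    = sqrt(laspeyres * paasche)     # joins both concepts
    return laspeyres, paasche, fisher

# The NL/S.11 illustration above: Security 1 leaves, Securities 2 and 3 stay, Security 4 joins.
t0 = {"Security 1", "Security 2", "Security 3"}
t1 = {"Security 2", "Security 3", "Security 4"}
print(stability_indices(t0, t1))  # approx. (0.67, 0.67, 0.67): below 1, so the group changed
```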
CSDB Data Quality Management made us face a choice: either SISYPHUS or CHANGE, change beyond statistics.

Finding data quality further upstream
[Diagram: the data supply chain from prospectus to golden copy]
• Prospectus: "perfect", a public data source, but no common language; costly to use
• Commercial data sets: candidates for compounding, a costly collection; error-prone, selective, costly to produce, with duplicated efforts and proprietary formats
• Compounding, itself a costly process, produces the compound "first shot"
• After DQM and defaulting, the golden copy is still not back to "perfect", and not standard
• Duplication and non-standardisation in the very data generation process hamper the whole downstream value chain

Where to start? The first layer of data out of reality: its generation process matters.

Data capture drives IT output quality
• Once good data is in the system, processing can work well.
• Data capture from the "real" world is the key step.
• Information lost at capture is lost from the data.
• No "data cleaning" will help: the data must be captured again.
• Messy data capture at source is very expensive downstream:
  – Most applications perform badly
  – "Data cleaning" and fixing failed processes are costly for all
  – Processes and IT must be designed in complicated ways
Large-scale IT processing can be simple and cheap when data fulfils the programmers' quality assumptions. Messy data capture delivers "garbage in, garbage out".

Progress is on its way

"…a standard for reference data on securities and issuers, with the aim of making such data available to policymakers, regulators and the financial industry through an international public infrastructure." (J.C. Trichet, 23.2.09)

The industry expresses demand for a Utility
• Industry panel at a conference on 15 February 2010 in London:
  – "An international Utility for reference data has its place, but
  – keep it simple (the concept of a "Thin Utility"),
  – ask industry to design the standards (ISO does exactly that), and
  – give us the legal stick."
• A viable reference data infrastructure benefits from constructive dialogue.

From browsing to farming for data: the long way to standardisation

Climbing the stairway to action
• Build into the data ecosystem
• Design a legal framework
• Imagine solutions addressing legacy
• Accept the issue among priorities
• Build the business case with all stakeholders
• Imagine a feasible way; accept that way as useful
• Understand the dynamics of standardisation
• Understand basic data as a shared strategic resource
• Understand how basic data is generated
• Understand the role of data as a necessary infrastructure
Business leaders, policy makers, regulators and legislators now embrace the dialogue with the data community.

"Thin" Utility

"Thin Utility": a unique, shared reference frame
• Two registers: one for instruments, one for entities
• Simple and light, complete and unequivocal
• Hard focus on identification and minimal description
• The shared infrastructure of basic reference data for:
  – data users in the financial industry
  – data vendors
  – authorities
  – the public
• An internationally shared infrastructure of reference
A "Thin Utility" provides the certainty of a single source on known, bare basics.

Two reference registers: the Thin Utility's frame
• Register of entities and register of instruments, each holding:
  – unique identifier,
  – key attributes,
  – interrelations,
  – classifications,
  – electronic contact address

The Utility grows from a quickly feasible base
• For both registers:
  – begin with a feasible scope,
  – grow over time by adding instruments and entities, and by adding attribute classes,
  – driven by demand from industry and authorities, and by feasibility
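As a rough illustration of how "thin" such register records could stay while still covering a unique identifier, key attributes, interrelations and classifications, here is a hypothetical sketch of the two record types. The field names, types and the open-ended key_attributes dictionary are assumptions for illustration, not a proposed standard.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EntityRecord:
    """One entry in the register of entities (the issuer side). Illustrative fields only."""
    entity_id: str                         # unique identifier
    legal_name: str                        # key attribute
    country: str                           # classification: country of residence
    sector: str                            # classification: economic sector, e.g. "S.11"
    parent_id: Optional[str] = None        # interrelation: link to a parent entity
    contact_address: Optional[str] = None  # electronic contact address

@dataclass
class InstrumentRecord:
    """One entry in the register of instruments. Illustrative fields only."""
    instrument_id: str                     # unique identifier, e.g. an ISIN
    issuer_id: str                         # interrelation: link into the entity register
    instrument_class: str                  # classification: debt security, share, ...
    key_attributes: dict = field(default_factory=dict)  # minimal description, extendable

# The registers can start with a feasible scope and add attribute classes over time,
# which is why the description above is kept deliberately open-ended in this sketch.
issuer = EntityRecord("ENT-000001", "Example NV", "NL", "S.11")
bond = InstrumentRecord("XS0123456789", issuer.entity_id, "debt security",
                        {"currency": "EUR", "maturity": "2030-06-01"})
```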
The international aspect

Global Utility vs. national law: an option
• International community (e.g. G20):
  – discusses the new regulatory framework for financial markets
  – defines principles and goals for data
• National legislator, issues law:
  – mandates a national authority,
  – empowers it to enforce the process and to farm out operations to an international entity
  – (the EU issues specific EU law)
• National authority, executes the legal mandate:
  – farms out operations to the international operational entity
  – monitors compliance
  – enforces and applies sanctions
• Entity, complies with national law:
  – delivers and maintains data in the Utility as required, possibly using services
• International institutions, governance of the operational entity:
  – global "tour de table" (IMF, BIS, industry, etc.)
  – establishment of the international operational entity
  – seed funding of the international operational entity ?
• International operational entity (the Utility), runs the service under service agreements with national authorities:
  – collects data
  – distributes / sells data
  – certifies analysts
  – monitors compliance
  – informs national authorities
  – releases new standard items
• Standards college (ideally ISO-based), develops and maintains standards:
  – designs the initial standards
  – monitors market developments
  – steers the evolution of the standards
  – designs new standard items

Positioning and Design

Positioning in the data supply chain
[Diagram: data supply chain linking issuers, the Utility, the competitive market of data providers, policy makers, regulators, the public and data users.]
• Option: the Utility as a Tier 1 CDP
• Initially, the downstream supply chain remains untouched, except for quality: data users do not need to invest

Utility value chain: monopoly vs. competition
[Table: for each stage of the value chain – standards design, standards setting, analyst training, analyst certification, data production, and primary and secondary data distribution – the organisational options weighed are multilateral (ISO?), monopoly or competition.]
Each stage of the value chain should be given the most efficient organisation.