The Science of Finance
Case study: A DSL for normalization of financial data sets
Software Development Automation 2014
Edwin Hautus

Agenda
- Problem Domain
- First iteration
- Second iteration
- Lessons learned

Problem Domain
We are a leading global diversified provider of financial information services. We help our customers reduce risk, improve operational efficiency and benefit from enhanced transparency. Our customers include investment banks, hedge funds, asset managers, central banks, regulators, auditors, fund administrators and insurance companies.
- Founded: 2003
- Employees: 3,000+
- Customers: 3,000+
- Offices: 20+

Problem Domain
(Diagram: Input, Input, Input → Normalized Model → Database → Output)

Problem Domain: Normalization questions
- Identification: is it the same object?
- Field mapping: do these fields have the same semantics?
- Structural changes: how can the data be transformed?

Problem Domain: Workflow & Roles
(Diagram: Input Specification → Data Analyst → Normalization Specification → Developer → Software)

Problem Domain: Problem Statement
Define a framework to improve the efficiency and quality of normalization.

First iteration (2007)
- Standardize the writing of normalization specifications
- Create a domain model to describe the data structures
- Create an XML-based DSL to define mappings
- Use a runtime engine to execute the mappings

First iteration: Results
- Good improvement in efficiency and quality
- Applied in over 75 projects
- The XML DSL is too technical for data analysts
- Quite a few iterations are required to make sure mappings meet the requirements
- Mappings can become really complicated!
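The first iteration paired a domain model and an XML-based mapping DSL with a runtime engine that executes the mappings. The talk does not show the XML schema, so the sketch below invents a minimal one: a `<mapping>` element with `<field target="..." source="..."/>` children, a `load_mapping` parser built on Python's standard `xml.etree.ElementTree`, and an `execute_mapping` engine that copies source fields into target fields. All element, attribute and function names are hypothetical, and the sketch covers only plain field renaming, not transformations or structural changes.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML mapping format, invented for illustration only; the actual
# schema of the first-iteration DSL is not shown in the talk.
MAPPING_XML = """
<mapping source="Bond" target="Instrument">
  <field target="isin" source="isinCode"/>
  <field target="name" source="name"/>
</mapping>
"""


def load_mapping(xml_text):
    """Parse the XML spec into (source type, target type, [(target, source), ...])."""
    root = ET.fromstring(xml_text.strip())
    fields = [(f.get("target"), f.get("source")) for f in root.findall("field")]
    return root.get("source"), root.get("target"), fields


def execute_mapping(mapping, record):
    """Hypothetical runtime engine: copy each source field into its target field."""
    source_type, target_type, fields = mapping
    if record.get("type") != source_type:
        raise ValueError(f"expected a {source_type} record, got {record.get('type')}")
    result = {"type": target_type}
    for target, source in fields:
        result[target] = record.get(source)  # a missing source field maps to None
    return result


if __name__ == "__main__":
    # Made-up sample record; the ISIN value is a placeholder, not real data.
    bond = {"type": "Bond", "isinCode": "XS0000000000", "name": "Example bond"}
    print(execute_mapping(load_mapping(MAPPING_XML), bond))
```

Running the script prints `{'type': 'Instrument', 'isin': 'XS0000000000', 'name': 'Example bond'}`. Keeping the mapping as data rather than code means the same engine can run any mapping file, which is the appeal of the DSL-plus-runtime-engine split.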
Second iteration (2013)
- Have analysts write mappings directly
- Create a parsed language to replace XML (based on ANTLR)
- Simplify business logic
- Allow analysts to verify mappings with runnable tests before handover

Second iteration: Field name mapping
(Diagram: Bond.isinCode → Instrument.isin)

    input Bond bond
    Instrument instrument = from bond {
        isin = isinCode
    }

Second iteration: Field mapping with simple transformation
(Diagram: Sedol.cfiCode → InstrumentType.cfi)

    input Sedol sedol
    InstrumentType instrumentType = from sedol {
        cfi = cfiCode.substring(0,2) + "XXXX"
    }

Second iteration: Field mapping via map
(Diagram: Anna.debtEquityCode → Map → InstrumentType.cfi)

    input Anna anna
    InstrumentType instrumentType = from anna {
        cfi = debtEquityCodeToCfi[debtEquityCode]
    }

    Map debtEquityCodeToCfi = {
        "D" = "DBXXXX"
        "E" = "ESXXXX"
        >> "MMXXXX"
    }

Second iteration: Structural change: unfold
(Diagram: MarkitMap.lei, a Lei with field leiCode → Company.lei)

    input MarkitMap markitMap
    Company company = from markitMap {
        from lei {
            lei = leiCode
        }
    }

Second iteration: Structural change: fold
(Diagram: Company.lei → MarkitMap.lei, a Lei with field leiCode)

    input Company company
    MarkitMap markitMap = from company {
        lei = Lei {
            leiCode = lei
        }
    }

Second iteration: Structural change: fold list field
(Diagram: Instrument.sedols[] → Sedol.details[], each a SedolDetails with fields sedolCode and actionIdentifier)

    input Instrument instrument
    Sedol sedol = from instrument {
        details = SedolDetails from sedols {
            sedolCode = it
            actionIdentifier = "I"
        }
    }

Second iteration: Conditional mapping
(Diagram: Bond.isin or Bond.cusip → Instrument.identifier)

    input Bond bond
    Instrument instrument = from bond {
        if isin? {
            identifier = isin
        } else {
            identifier = cusip
        }
    }

Second iteration: Annotations
(Diagram: Bond.isinCode → Instrument.isin, Bond.name → Instrument.name)

    input Bond bond
    Instrument instrument = from bond {
        isin = isinCode
        @Manual
        name = name
    }

Second iteration: Additional language features
- Asserts
- Functions
- Imports
- Comments

Lessons learned on DSL development
- Using a syntax-based approach is worth the effort
- DSLs need to be periodically refined as you learn about the domain
- Provide an integrated testing solution

Lessons learned: Key factors in adoption of the DSL by business users
- Requires more effort from users in order to be precise
- Provides a feeling of empowerment
- An immediate feedback loop through test functionality is crucial

Mines data, pools intelligence, surfaces information, enables transparency, builds platforms, provides access, scales volume, extends networks & transforms business.

Disclaimer
The information contained in this presentation is confidential. Any unauthorised use, disclosure, reproduction or dissemination, in full or in part, in any media or by any means, without the prior written permission of Markit Group Holdings Limited or any of its affiliates ("Markit") is strictly prohibited. Opinions, statements, estimates and projections in this presentation (including other media) are solely those of the individual author(s) at the time of writing and do not necessarily reflect the opinions of Markit. Neither Markit nor the author(s) has any obligation to update this presentation in the event that any content, opinion, statement, estimate or projection (collectively, "information") changes or subsequently becomes inaccurate. Markit makes no warranty, expressed or implied, as to the accuracy, completeness or timeliness of any information in this presentation, and shall not in any way be liable to any recipient for any inaccuracies or omissions.
Without limiting the foregoing, Markit shall have no liability whatsoever to any recipient, whether in contract, in tort (including negligence), under warranty, under statute or otherwise, in respect of any loss or damage suffered by any recipient as a result of or in connection with any information provided, or any course of action determined, by it or any third party, whether or not based on any information provided. The inclusion of a link to an external website by Markit should not be understood to be an endorsement of that website or the site's owners (or their products/services). Markit is not responsible for either the content or output of external websites. Copyright ©2014, Markit Group Limited. All rights reserved and all intellectual property rights are retained by Markit.