(Edwin Hautus) A DSL for normalization of financial data sets

The Science of Finance
Case study: A DSL for normalization of financial data sets
Software Development Automation 2014
Edwin Hautus
Agenda
— Problem Domain
— First iteration
— Second iteration
— Lessons learned
Problem Domain
We are a leading global diversified provider of financial information services.
We help our customers reduce risk, improve operational efficiency and benefit from enhanced transparency.
Our customers include investment banks, hedge funds, asset managers, central banks, regulators, auditors, fund administrators and insurance companies.
Founded: 2003
Employees: 3,000+
Customers: 3,000+
Offices: 20+
[Diagram: three Input boxes, a Normalized Model, a Database, and an Output]
Normalization questions
— Identification - is it the same object?
— Field mapping - do these fields have the same semantics?
— Structural changes - how can the data be transformed?
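The three questions can be made concrete with a small Python sketch. The two feeds, their field names, and the sample ISIN are hypothetical placeholders and do not come from the presentation.

```python
# Hypothetical records from two vendors that describe the same bond.
feed_a = {"isinCode": "XS0000000001", "name": "Example Bond"}
feed_b = {"isin": "XS0000000001", "issuer": {"name": "Example Issuer"}}


def same_object(a: dict, b: dict) -> bool:
    """Identification: treat two records as one object when their ISINs match."""
    return a["isinCode"] == b["isin"]


def normalize_a(a: dict) -> dict:
    """Field mapping: isinCode in feed A has the same semantics as isin."""
    return {"isin": a["isinCode"], "name": a["name"]}


def normalize_b(b: dict) -> dict:
    """Structural change: flatten the nested issuer object into one field."""
    return {"isin": b["isin"], "issuerName": b["issuer"]["name"]}
```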
Workflow & Roles
[Diagram: Input Specification -> Data Analyst -> Normalization Specification -> Developer -> Software]
Problem Statement
Define a framework to improve efficiency and quality of normalization
First iteration
First iteration (2007)
— Standardize writing of normalization specifications
— Create a domain model to describe the data structures
— Create an XML-based DSL to define mappings
— Use a runtime engine to execute the mappings
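As a rough illustration of a declarative mapping executed by a runtime engine, here is a minimal Python sketch. The XML element and attribute names are invented for this example; the slides do not show the actual schema of the first-iteration DSL.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping document. The element and attribute names are invented
# for illustration and are not the schema used in the presentation.
MAPPING_XML = """
<mapping source="Bond" target="Instrument">
  <field target="isin" source="isinCode"/>
  <field target="name" source="name"/>
</mapping>
"""


def run_mapping(mapping_xml: str, record: dict) -> dict:
    """Interpret the mapping at runtime: copy each source field to its target."""
    root = ET.fromstring(mapping_xml)
    return {
        field.get("target"): record[field.get("source")]
        for field in root.findall("field")
    }


bond = {"isinCode": "XS0000000001", "name": "Example Bond"}
instrument = run_mapping(MAPPING_XML, bond)
```

The mapping is plain data, so it can change without touching the engine code.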
Results
— Good improvement in efficiency and quality
— Applied in over 75 projects
— The XML DSL is too technical for data analysts
— Quite a few iterations required to make sure mappings meet the requirements
— Mappings can become really complicated!
Second iteration
Second iteration (2013)
— Have analysts write mappings directly
— Create a parsed language (based on ANTLR) to replace XML
— Simplify business logic
— Allow analysts to verify mappings with runnable tests before handover
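The idea of runnable tests can be sketched in Python as pairs of input records and expected normalized output. The mapping function, the test data, and the runner below are hypothetical stand-ins, not the test facility described in the talk.

```python
def map_bond(bond: dict) -> dict:
    """Stand-in for a mapping written by an analyst."""
    return {"isin": bond["isinCode"]}


# Each case pairs an input record with the output the analyst expects.
TEST_CASES = [
    ({"isinCode": "XS0000000001"}, {"isin": "XS0000000001"}),
    ({"isinCode": "XS0000000002"}, {"isin": "XS0000000002"}),
]


def run_tests(mapping, cases):
    """Return (input, expected, actual) for every failing case."""
    failures = []
    for given, expected in cases:
        actual = mapping(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


failures = run_tests(map_bond, TEST_CASES)
```

An empty `failures` list means the mapping behaves as specified for the listed cases.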
Field name mapping
[Diagram: Bond.isinCode -> Instrument.isin]
input Bond bond
Instrument instrument =
    from bond
    {
        isin = isinCode
    }
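For readers more familiar with general-purpose code, the mapping above corresponds roughly to the following Python sketch. The dataclasses are assumptions that mirror the field names on the slide.

```python
from dataclasses import dataclass


@dataclass
class Bond:
    isinCode: str


@dataclass
class Instrument:
    isin: str


def map_bond(bond: Bond) -> Instrument:
    # Mirrors `isin = isinCode`: the value is copied, only the field name changes.
    return Instrument(isin=bond.isinCode)
```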
Field mapping with simple transformation
[Diagram: Sedol.cfiCode -> InstrumentType.cfi]
input Sedol sedol
InstrumentType instrumentType =
    from sedol
    {
        cfi = cfiCode.substring(0,2) + "XXXX"
    }
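The transformation keeps the first two characters of the CFI code and pads the rest with "X". A Python sketch follows, assuming that `substring(0,2)` has Java-like semantics (start inclusive, end exclusive); the sample code "ESVUFR" is only an illustrative input.

```python
def normalize_cfi(cfi_code: str) -> str:
    # Assumption: substring(0, 2) selects characters 0 and 1, as in Java.
    return cfi_code[:2] + "XXXX"


example = normalize_cfi("ESVUFR")
```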
Field mapping via map
[Diagram: Anna.debtEquityCode -> Map -> InstrumentType.cfi]
input Anna anna
InstrumentType instrumentType =
    from anna
    {
        cfi = debtEquityCodeToCfi[debtEquityCode]
    }
Map debtEquityCodeToCfi =
{
    "D" = "DBXXXX"
    "E" = "ESXXXX"
    >> "MMXXXX"
}
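A lookup of this kind can be sketched in Python with a dictionary. The slide does not explain the `>>` entry; the sketch assumes it is a fallback value used when no key matches.

```python
DEBT_EQUITY_CODE_TO_CFI = {
    "D": "DBXXXX",
    "E": "ESXXXX",
}
# Assumption: the `>>` entry on the slide is a default for unmatched keys.
DEFAULT_CFI = "MMXXXX"


def to_cfi(debt_equity_code: str) -> str:
    """Look up the CFI for a debt/equity code, falling back to the default."""
    return DEBT_EQUITY_CODE_TO_CFI.get(debt_equity_code, DEFAULT_CFI)
```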
Structural change: unfold
[Diagram: MarkitMap.lei (Lei, with field leiCode) -> Company.lei]
input MarkitMap markitMap
Company company =
    from markitMap
    {
        from lei
        {
            lei = leiCode
        }
    }
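Unfolding lifts a field out of a nested object into the parent. A Python sketch of the same idea follows; the dataclasses are assumptions derived from the names on the slide, and the LEI value is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class Lei:
    leiCode: str


@dataclass
class MarkitMap:
    lei: Lei


@dataclass
class Company:
    lei: str


def unfold(markit_map: MarkitMap) -> Company:
    # The nested Lei object disappears; only its leiCode survives as Company.lei.
    return Company(lei=markit_map.lei.leiCode)
```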
Structural change: fold
[Diagram: Company.lei -> MarkitMap.lei (Lei, with field leiCode)]
input Company company
MarkitMap markitMap =
    from company
    {
        lei =
        {
            leiCode = lei
        }
    }
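Folding is the reverse operation: a flat field is wrapped in a new nested object. A Python sketch, again with dataclasses that are assumptions derived from the slide and a placeholder LEI value:

```python
from dataclasses import dataclass


@dataclass
class Company:
    lei: str


@dataclass
class Lei:
    leiCode: str


@dataclass
class MarkitMap:
    lei: Lei


def fold(company: Company) -> MarkitMap:
    # A new nested Lei object is created and filled from the flat field.
    return MarkitMap(lei=Lei(leiCode=company.lei))
```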
Structural change: fold list field
[Diagram: Instrument.sedols[] -> Sedol.details[] (SedolDetails, with fields sedolCode and actionIdentifier)]
input Instrument instrument
Sedol sedol =
    from instrument
    {
        details =
            from sedols
            {
                sedolCode = it
                actionIdentifier = "I"
            }
    }
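Folding a list field turns each element of the source list into a nested object. The Python sketch below assumes that `it` refers to the current list element; the dataclasses and the sample SEDOL values are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instrument:
    sedols: List[str]


@dataclass
class SedolDetails:
    sedolCode: str
    actionIdentifier: str


@dataclass
class Sedol:
    details: List[SedolDetails]


def fold_sedols(instrument: Instrument) -> Sedol:
    # Assumption: `it` on the slide is the current element of sedols.
    return Sedol(
        details=[
            SedolDetails(sedolCode=code, actionIdentifier="I")
            for code in instrument.sedols
        ]
    )
```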
Conditional mapping
[Diagram: Bond (isin, cusip) -> ? -> Instrument.identifier]
input Bond bond
Instrument instrument =
    from bond
    {
        if isin
        {
            identifier = isin
        }
        else
        {
            identifier = cusip
        }
    }
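The conditional picks the ISIN as identifier when one is available and falls back to the CUSIP otherwise. The Python sketch below assumes that `if isin` is true when the field is present and non-empty; the slide does not define the exact semantics, and the sample values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bond:
    isin: Optional[str] = None
    cusip: Optional[str] = None


@dataclass
class Instrument:
    identifier: Optional[str]


def map_bond(bond: Bond) -> Instrument:
    # Assumption: `if isin` holds when the field is present and non-empty.
    if bond.isin:
        return Instrument(identifier=bond.isin)
    return Instrument(identifier=bond.cusip)
```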
Annotations
[Diagram: Bond (isinCode, name) -> Instrument (isin, name)]
input Bond bond
Instrument instrument =
    from bond
    {
        isin = isinCode
        @Manual
        name = name
    }
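The slide does not say what `@Manual` does. One way to think about annotations in general is as metadata attached to a field rule that the engine or tooling can inspect. The Python sketch below illustrates that reading only; `FieldRule`, `apply_rules`, and the behaviour of collecting annotated fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class FieldRule:
    target: str
    source: str
    annotations: FrozenSet[str] = frozenset()


# Rules equivalent to the slide: isin = isinCode, and an annotated name = name.
RULES = [
    FieldRule("isin", "isinCode"),
    FieldRule("name", "name", frozenset({"Manual"})),
]


def apply_rules(rules, record):
    """Apply every rule and report which target fields carry @Manual."""
    mapped = {rule.target: record[rule.source] for rule in rules}
    flagged = [rule.target for rule in rules if "Manual" in rule.annotations]
    return mapped, flagged


mapped, flagged = apply_rules(RULES, {"isinCode": "XS0000000001", "name": "Example Bond"})
```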
Additional language features
— Asserts
— Functions
— Imports
— Comments
Lessons learned
Lessons learned on DSL development
— Using a syntax-based approach is worth the effort
— DSLs need to be refined periodically as you learn about the domain
— Provide an integrated testing solution
Key factors in adoption of the DSL by business users
— Requires more effort from users in order to be precise
— Provides feeling of empowerment
— Immediate feedback loop through test functionality is crucial
Copyright ©2014, Markit Group Limited. All rights reserved and all intellectual property rights are retained by Markit.