GRCTC Working Paper
December 2014

SEMANTIC TECHNOLOGIES FOR REGULATORY INTELLIGENCE IN THE FINANCIAL INDUSTRY

Dr Tom Butler, Dr Elie Abi-Lahoud and Dr Angelina Espinoza Limon

© GRCTC

Abstract

The financial industry is undergoing major regulatory change in all jurisdictions across the globe. The growth in the number and complexity of regulations is causing major problems for the industry. Extant governance, risk and compliance (GRC) systems and traditional business intelligence (BI) systems are not serving the needs of the most systemically important sector of the global economy. Increasingly, the industry is looking to semantic technologies to understand complex regulations. This paper presents the results of an ongoing research initiative to develop semantically enabled regulatory intelligence capabilities to underpin regulatory compliance change management in the domain of anti-money laundering (AML). This ontology-based system enables innovative semantic tagging and querying of complex regulatory texts and associated rules, which have been extracted, transformed and loaded into an RDF triple store. The design research methodology by which this is achieved is described, as is the progress in both the Design and Relevance Cycles of the design science research approach being adopted. The paper ends by describing future steps and the significance of this research for regulatory intelligence and regulatory compliance change management initiatives.

Introduction

The financial crisis in 2008 had disastrous consequences for the world economy (Campbell, 2011). Regulators across the globe responded rapidly to this situation by instituting a raft of new regulations and strengthening existing regulations (Grant and Wilson, 2012).
A recent commentary in The Economist outlined the current challenges facing the financial industry: “Bankers had hoped that, after seven years of penance for their part in the financial crisis, the end of wrenching overhauls forced by fierce new regulations might be nigh. But to their dismay, the regulators’ zeal is undimmed. Far from giving banks respite, they are toughening up old rules and devising new ones, perhaps heralding a new wave of restructuring.”¹

Consider how the US banking reform legislation, the Dodd-Frank Act (2010), compares with previous legislation: it stands at 2,319 pages, whereas the Sarbanes-Oxley Act (2002) runs to just 66 pages. Moreover, Dodd-Frank is being translated into over 400 regulatory rules, which will fill over 40,000 pages of regulatory text published across various Titles of the Code of Federal Regulations (CFR). Currently only 200 of these rules have been published, and the international financial industry is in considerable confusion. Figure 1 illustrates the growth in restrictions in Titles 12 and 17 of the CFR from 2010. In 2010, before Dodd-Frank, Titles 12 and 17 of the CFR combined contained a total of 343 restrictions. By the end of 2012, the number of restrictions had increased to 3,172. As large and onerous as Dodd-Frank is, similar increases in the number and complexity of regulatory rules are to be found across all regulatory regimes, including the European Union. Thus, industry experts warn of a “looming train wreck in risk management, regulatory compliance and reporting” (Kendall, 2013). This poses significant challenges for governance, risk and compliance (GRC), both for organisations in the financial industry and for GRC vendors.

¹ http://www.economist.com/news/finance-and-economics/21620225-big-banks-prayers-halt-newregulation-have-fallen-deaf-ears-no-respite : Accessed November 2014.

Figure 1.
Financial Restrictions in the CFR 2010-2012 (Adapted from McLaughlin and Greene 2013)

There is a dearth of information systems (IS) research on GRC for financial services; however, one study of note comes from Kenneth Bamberger (2010), who drew on IS perspectives in his analysis of the failures in GRC practice and related information systems leading up to the crisis. Bamberger (2010, p. 706) concluded that GRC-related IS failures were due to “problems of translation…of both legal mandates and business understandings of risk into computer code and actionable controls.” This problem continues: Kendall (2013) reports that “most of the largest banks understand that they not only have inadequate capabilities to address the regulations that have already been imposed, but that even more regulation is inevitable.” Our ongoing research supports her conclusion that another train wreck is looming for the financial industry. Tangible evidence for this state of affairs is found in the significant fines being imposed by regulators across the industry for operational risk events such as anti-money laundering (AML) failures, LIBOR rigging, fraud and, more recently, the Forex rigging scandal in the UK.

The Governance Risk and Compliance Technology Centre (GRCTC) was founded to conduct R&D on the use of semantic technologies for governance, risk and compliance in the financial industry. In late 2012, a number of Global Systemically Important Banks (GSIBs) and a leading GRC vendor to the financial industry identified operational risk, and its sub-domain of AML risk in particular, as areas that required urgent research attention. They argued that existing business intelligence and knowledge management systems (KMS) were not addressing their concerns, and they therefore sought an innovative solution based on semantic technologies (cf. Declerck et al., 2007; Sheth, 2005), which they argued overcame the limitations of traditional IS for GRC.
There is, however, a nascent body of research on IS support for requirements engineering in legal domains. Here researchers have proposed concepts and models of regulatory texts and have created conceptual models of laws and regulations (cf. Zeni et al., 2013). However, there is little evidence that such models or their underlying concepts have been evaluated (Design Cycle) or field tested (Relevance Cycle), although the work of Zeni et al. (2013) on GaiusT looks promising.

In contrast, the field of regulatory science is well developed in the life sciences, and in the pharmaceutical industry in particular (Hamburg, 2011). Semantic technologies are being applied across the fields of medicine and pharmacology (Gomez-Perez et al., 2013), and have also been used to automate regulatory compliance in pharmaceutical manufacturing (Sesen et al., 2010). More recently, this has given rise to the concept of regulatory intelligence (Badreddin et al., 2013), as traditional business intelligence tools and techniques do not address the specific challenges posed by the regulatory domain.

This paper presents the findings of our design science research initiative on the development of semantic technologies to enable the regulatory intelligence capabilities that underpin regulatory compliance change management in the financial industry. The paper describes how regulatory ontologies are being developed at the GRCTC to enable regulatory texts to be queried in order to help GRC executives answer questions such as ‘What are the various restrictions in an individual instrument of legislation or a regulatory rule?’ Likewise, these semantic technologies can query legislation and regulatory texts to identify obligations, derogations, exemptions, exclusions, and so on. They also enable “simpler” questions based on meta-data related to regulations, such as agency, enforcement type(s), dates, etc., to be answered.
We plan to develop regulatory intelligence systems based on this research, together with a new generation of regulatory compliance knowledge management systems (RKMS), in order to inform the development of governance policies, risk management strategies and compliance reporting.

The remainder of this paper is structured as follows. The following section describes our design science research approach. The third section describes our research in progress on a regulatory compliance change management system. The final section describes ongoing R&D towards the completion of this project.

Design Science Research Approach

The motivation for, and the research object of, this research in progress have been described above. In positioning our research we look to Winter (2008, p. 471), who states that design research (DR) is aimed at “creating solutions to specific classes of relevant problems by using a rigorous construction and evaluation process.” Winter (ibid.) indicates that “design science reflects the design research process and aims at creating standards for its rigour.” We therefore classify our research-in-progress project as Design Research (DR), as it accords with Winter’s (2008) conception of this type of research. The design artefacts being produced in this study include: (a) Constructs (i.e. concepts in an ontology); (b) Relationships between, and axioms that govern, these constructs; (c) Models (in the Web Ontology Language (OWL2), represented in Protégé); and (d) Methods (an approach to the construction of concepts, relationships, axioms and models).
According to Hevner (2007), design science research should include: (a) a Design Cycle, which involves the essential activities of developing and evaluating the design artefacts and research processes; (b) a Rigor Cycle, which connects the design cycle with a knowledge base of scientific theories, experience and expertise, and meta-artefacts; and (c) a Relevance Cycle, which incorporates interactions between the environment of the problem domain and the core design activities (cf. Hevner et al., 2004). Each of these cycles was incorporated into our design science research. In the DR project described below, the Rigor Cycle was underpinned by Design Science (DS) theory based on Formalism (West, 2004), which adhered to the Bunge-Wand-Weber (BWW) Ontology (Wand and Weber, 1993, 1995, 2002), knowledge engineering principles and, in particular, the formalisms underpinning the application of the Web Ontology Language (OWL2) published by the W3C. We also align our DR with standards published by the Object Management Group (OMG), particularly the Semantics of Business Vocabulary and Business Rules (SBVR) standard and the OMG and Enterprise Data Management (EDM) Council’s Financial Industry Business Ontology (FIBO) standard (Bennett, 2011, 2013).

The Relevance Cycle in this project involves regular feedback from, and demonstrations to, GRC executives and GRC application vendors, as well as progress reports to the OMG’s Financial Domain Task Force, which consists of members of the OMG Ontology SIG, the SBVR SIG and subject matter experts from the financial industry globally. The relevance of our theoretically informed DR project is therefore ascertained. We now outline our Design Cycle activities.

Regulatory Intelligence and Regulatory Compliance Change Management Systems

Current solutions for regulatory intelligence and regulatory change management rely on highly expensive, labour-intensive analysis of legislative and regulatory text by subject matter experts.
Several aspects of these time-consuming tasks could be automated using appropriate semantic technologies. The objective of this design research project is to leverage semantic technologies to assist subject matter experts (SMEs), be they lawyers, GRC officers or banking executives, in making sense of the wide and complex spectrum of legal documents, regulatory texts and other rule-based sources, in order to achieve better regulatory change management, more effective governance policies, enhanced risk management and relevant compliance reporting. More precisely, we are combining several semantically informed techniques in a system that can answer important but technically elusive questions such as: What are the compliance imperatives (obligations, prohibitions etc.) in a regulation or rule, and where do they appear? How can semantic technologies support regulatory change management?

Our DR approach involves creating and populating the Financial Industry Regulatory Ontology (FIRO), which is expressed in OWL2 and comprises fundamental regulatory and domain concepts, using a combination of text analytics techniques and subject matter expertise. OWL, or the Web Ontology Language, is the schema language, or knowledge representation (KR) language, of the Semantic Web. The resulting knowledge base is persisted in a Resource Description Framework (RDF) triple store. RDF is the data modelling language of the Semantic Web: all Semantic Web information is stored and represented in RDF. The RDF triple store can then be queried using SPARQL to answer questions such as those described above. SPARQL, the SPARQL Protocol and RDF Query Language, is the query language of the Semantic Web. It is specifically designed to query data across various systems.

Figure 2 illustrates the four phases of our DR methodology.
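
To make the triple-based representation concrete, the following minimal Python sketch (an illustration only, not the GRCTC implementation) holds a handful of subject-predicate-object triples in memory and answers a question of the kind posed above by joining two triple patterns, much as a SPARQL basic graph pattern would. All identifiers in it (the `ex:`, `firo:` and `aml:` prefixed names, the `firo:concerns` property, and the helper functions `match` and `obligations_about`) are assumptions introduced for illustration; the actual FIRO vocabulary and triple store are not described at this level of detail in this paper.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Illustrative data only: every prefixed name below is hypothetical and
# does not reproduce the real FIRO vocabulary or any real regulation.
TRIPLES: Set[Triple] = {
    ("ex:Section_A", "rdf:type", "firo:Obligation"),
    ("ex:Section_A", "firo:concerns", "aml:CustomerDueDiligence"),
    ("ex:Section_A", "firo:concerns", "aml:RecordKeeping"),
    ("ex:Section_B", "rdf:type", "firo:Prohibition"),
    ("ex:Section_B", "firo:concerns", "aml:CustomerDueDiligence"),
    ("ex:Section_C", "rdf:type", "firo:Obligation"),
    ("ex:Section_C", "firo:concerns", "aml:Reporting"),
}


def match(
    store: Set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that agrees with the pattern; None is a wildcard."""
    for triple in store:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def obligations_about(store: Set[Triple], concept: str) -> List[str]:
    """Return subjects typed as an Obligation that also concern `concept`.

    This is a join of two triple patterns on the shared subject.
    """
    typed = {subj for subj, _, _ in match(store, p="rdf:type", o="firo:Obligation")}
    linked = {subj for subj, _, _ in match(store, p="firo:concerns", o=concept)}
    return sorted(typed & linked)


if __name__ == "__main__":
    print(obligations_about(TRIPLES, "aml:CustomerDueDiligence"))
```

Under the same assumed vocabulary, the corresponding SPARQL graph pattern would be along the lines of `?s rdf:type firo:Obligation ; firo:concerns aml:CustomerDueDiligence .`, and a deployed system would submit such a query to the triple store rather than filter triples in application code.
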
In this methodology, several innovative techniques developed by the researchers are combined with related semantic technologies to develop the regulatory intelligence tools required for a working prototype of an RCMS. The first is the ontology engineering phase. Here SMEs create a regulatory vocabulary and reuse it to capture the regulatory intent in a rulebook. The output of this stage is then used by Knowledge Engineers, or Semantic Technologies Experts (STEs) as we term them, to create a family of formal ontologies. This phase is supported by the Semantics of Business Vocabulary and Business Rules (SBVR), which is being applied using an innovative methodology developed at the GRCTC.

Figure 2. Phases of the Design Research on Enhanced BI for RCMS

The Financial Industry Regulatory Ontology (FIRO) contains the following ontology family members. FIRO-H contains high-level concepts in the regulatory domain, such as prohibitions, obligations and derogations. These are aligned with the SBVR standard grammar. FIRO-S is an ontology that captures the semantics and structure of legislative and regulatory texts according to the Akoma Ntoso standard. FIRO-AML describes the concepts that capture the semantics and axioms of the Anti-Money Laundering domain. FIRO-RCM is an operational ontology that captures the entire semantics and axioms of regulatory texts in order to guide the ontology population process.

In the second phase, legal SMEs use FIRO-RCM concepts to manually annotate a set of AML documents, which serve as training data for the automatic classification algorithms. In the third phase, several classification algorithms are executed in order to populate the ontology or, in other words, to tag the regulatory text with concepts from the ontology. This imposes a semantic structure on such texts that did not previously exist. The tagged texts are then stored in an RDF triple store.
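
The population step described above can be pictured with a deliberately simple sketch. The third phase relies on classification algorithms trained on SME annotations; as those algorithms are not detailed here, the Python below substitutes a hand-written cue-phrase rule as a stand-in, purely to show how a tagged sentence becomes a set of triples ready for storage. The cue phrases, the sample sentences (invented, not quoted from any regulation), the `ex:hasText` property and the `firo:` names are all assumptions made for illustration.

```python
import re
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical cue phrases standing in for a trained classifier.
# Order matters: prohibitions are checked first so that "must not"
# is not mistaken for the obligation cue "must".
CUES = [
    ("firo:Prohibition", re.compile(r"\b(must not|shall not|may not)\b", re.I)),
    ("firo:Obligation", re.compile(r"\b(must|shall)\b", re.I)),
]


def classify(sentence: str) -> Optional[str]:
    """Return the first matching concept label, or None if no cue is found."""
    for label, pattern in CUES:
        if pattern.search(sentence):
            return label
    return None


def tag(doc_id: str, sentences: List[str]) -> List[Triple]:
    """Turn each classified sentence into two triples: its type and its text."""
    triples: List[Triple] = []
    for index, sentence in enumerate(sentences, start=1):
        label = classify(sentence)
        if label is None:
            continue  # untagged sentences contribute nothing to the store
        node = f"{doc_id}#s{index}"
        triples.append((node, "rdf:type", label))
        triples.append((node, "ex:hasText", sentence))
    return triples


if __name__ == "__main__":
    # Invented sentences, used only to exercise the sketch.
    sample = [
        "A firm must keep records of customer identity.",
        "A firm must not open an anonymous account.",
        "This part applies to credit institutions.",
    ]
    for triple in tag("ex:sample_doc", sample):
        print(triple)
```

A trained classifier would replace `classify` without changing the shape of the output, which is the point of the sketch: whatever assigns the label, the result is a set of typed nodes that can be loaded into the triple store and queried.
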
The final phase involves the design of an application that is used to query the content of the semantically tagged regulatory texts stored in the knowledge base via a SPARQL endpoint.

Regulatory Intelligence in Action

Having developed the demonstrator in the first phase of the RCMS, we entered the DSR Relevance Cycle, in which the prototype application was demonstrated to executives from the financial industry and technology sectors. To explain this, we first illustrate the target text: the 2007 UK Money Laundering Regulation. Figure 3 shows an excerpt, Section 15 (tagged in FIRO-S), which has been tagged as an Obligation (FIRO-H) and which covers AML concepts such as Record-keeping, Customer Due Diligence and Ongoing Monitoring.

Figure 3. Example of a Structured Regulatory Document Containing Unstructured Regulatory Data

Figure 4 illustrates the application query interface. In this case several pre-structured queries are presented. These are submitted to the SPARQL endpoint, which then returns the result. The queries take several parameters, for example to list Obligations that relate to AML concepts such as Customer Due Diligence, Monitoring, Reporting, and so on. The location of the Obligation is returned by default (e.g. Section, Sub-section, Page etc.). However, the actual text, as in Section 15 above, may also be presented.

Figure 4. Querying the Semantically Enriched Text

Figure 5. Regulatory Intelligence Output in Excel

Figure 5 illustrates the result of the query in an Excel spreadsheet for further analysis. Here all the AML categories which carry an Obligation are presented, as are their Section and Sub-section. The text describing the obligation is displayed next. Feedback from the financial industry and technology sector on the first-phase prototype was extremely positive. With minimal retraining the application was applied to the US Bank Secrecy Act, 31 CFR B Chapter X, which covers AML.
We were more than pleased with the query results, which contained a remarkable number of positive and accurate results. This bodes well for the future uptake of our research by industry.

Future Work

This short working paper described the implementation of an approach to the semantic tagging of regulatory documents for the purpose of regulatory change management in the financial industry. Early results are promising. We are currently designing and developing a set of user interfaces for interactive data curation by SMEs. The aim of ongoing work is to extend the role of SMEs beyond the preparatory phase and keep them in the loop at every stage of the prototype’s execution. We have found that semantic technologies can be calibrated, through the application of domain-specific taxonomies and ontologies, to query (as opposed to simple word search) unstructured texts for specific categories of risk data. We argue that this is a significant development for regulatory intelligence, as vital risk data is often buried as unstructured facts in text entries or memo fields in databases, Excel spreadsheets and so on. This creates significant problems for risk analysis and compliance reporting. Thus, the output of our research and development enables better regulatory intelligence throughout the regulatory compliance value chain, as related working papers in this series outline.

References

Badreddin, O., Mussbacher, G., Amyot, D., Behnam, S.A., Rashidi-Tabrizi, R., Braun, E., Alhaj, M. and G. (2013). Regulation-Based Dimensional Modeling for Regulatory Intelligence. In Requirements Engineering and Law (RELAW), 2013 Sixth International Workshop on, pp. 1-10. IEEE.

Bennett, M. (2011). Semantics standardization for financial industry integration. In Collaboration Technologies and Systems (CTS), IEEE, 23-27 May 2011, 439-445.

Bennett, M. (2013). The financial industry business ontology: Best practice for big data. Journal of Banking Regulation, 14(3-4), 3-4.

Declerck, T., H.-U.
Krieger, B. Kiefer, M. Spies and C. Leibold (2007). Integration of semantic resources and tools for business intelligence. International Workshop on Semantic-Based Software Development held at OOPSLA 2007.

Gomez-Perez, A., Martinez-Romero, M., Rodriguez-Gonzalez, A., Vazquez, G. and Vazquez-Naya, J. M. (2013). Ontologies in medicinal chemistry: current status and future challenges. Current Topics in Medicinal Chemistry, 13(5), 576-590.

Grant, W. and Wilson, G. K. (Eds.) (2012). The Consequences of the Global Financial Crisis: The Rhetoric of Reform and Regulation. OUP Oxford.

Hamburg, M. A. (2011). Advancing regulatory science. Science, 331(6020), 987.

Hevner, A. R. (2007). The three cycle view of design science research. Scandinavian Journal of Information Systems, 19(2), 87-92.

Hevner, A.R., March, S.T. and Park, J. (2004). Design Science in Information Systems Research. MIS Quarterly, 28(1), 75-105.

Hindmoor, A. and McConnell, A. (2013). Why Didn't They See it Coming? Warning Signs, Acceptable Risks and the Global Financial Crisis. Political Studies. DOI: 10.1111/j.1467-9248.2012.00986.x.

Kendall, E. (2013). Semantics in Finance: Addressing Looming Train Wreck in Risk Management, Regulatory Compliance and Reporting. Semantic Technology and Business Conference, Oct 2-3, 2013. http://semtechbiznyc2013.semanticweb.com/sessionPop.cfm?confid=76&proposalid=5402

KPMG (2012). The Convergence Evolution: Global survey into the integration of governance, risk and compliance. http://www.kpmg.com/ES/es/ActualidadyNovedades/ArticulosyPublicaciones/Documents/TheConvergence-Evolution.pdf

McLaughlin, P. and Greene, R. (2013). Dodd-Frank: What It Does and Why It’s Flawed. Mercatus Center, George Mason University.

Sartor, G., P. Casanovas and M. Biasiotti (2011). Approaches to legal ontologies: theories, domains, methodologies. Springer.

Sesen, M. B., Suresh, P., Banares-Alcantara, R. and Venkatasubramanian, V. (2010).
An ontological framework for automated regulatory compliance in pharmaceutical manufacturing. Computers & Chemical Engineering, 34(7), 1155-1169.

Sheth, A. (2005). Enterprise Applications of Semantic Web: The Sweet Spot of Risk and Compliance. Invited paper: IFIP International Conference on Industrial Applications of Semantic Web (IASW2005), Jyvaskyla, Finland, August 25-27, 2005. http://www.cs.jyu.fi/ai/OntoGroup/IASW-2005/

Tudorache, T., Nyulas, C., Noy, N. F. and Musen, M. A. (2013). WebProtege: a collaborative ontology editor and knowledge acquisition tool for the Web. Semantic Web, 4(1), 89-99.

Wand, Y. and Weber, R. (1993). On the ontological expressiveness of information systems analysis and design grammars. Information Systems Journal, 3(4), 217-237.

Wand, Y. and Weber, R. (1995). On the deep structure of information systems. Information Systems Journal, 5(3), 203-223.

Wand, Y. and Weber, R. (2002). Research commentary: information systems and conceptual modeling – a research agenda. Information Systems Research, 13(4), 363-376.

West, D. (2009). Object thinking. O'Reilly Media, Inc.

Winter, R. (2008). Design science research in Europe. European Journal of Information Systems, 17(5), 470-475.

Zeni, N., Kiyavitskaya, N., Mich, L., Cordy, J. R. and Mylopoulos, J. (2013). GaiusT: supporting the extraction of rights and obligations for regulatory compliance. Requirements Engineering, 1-22. DOI 10.1007/s00766-013-0181-8.

About the Authors

Tom Butler, PhD, is Principal Investigator of the Financial Services Governance Risk and Compliance Technology Centre (GRCTC). With funding of 5 million euro from the Irish Government, the GRCTC conducts research on the design, development and implementation of semantic technologies for governance, risk and compliance (GRC) in the financial industry globally. Tom has 111 publications since joining academia in 1998.
He is currently ranked 33rd out of the top 100 Association for Information Systems (AIS) Senior Scholars and researchers globally.

Elie Abi-Lahoud, PhD, has designed innovative technologies for enterprise solutions. He plays a key role at the GRCTC, which is engaged in applying semantic technologies for GRC in financial services. In this role, Elie works with the Object Management Group (OMG), the Enterprise Data Management Council (EDMC), and thought leaders in the financial industry, on a common vocabulary capturing shared domain understanding and on improving regulation-aware decision-making.

Angelina Espinoza Limón, PhD, is a Visiting Professor at the GRCTC. A former Software Engineer, Angelina conducts research on the design, development and implementation of semantic technologies for regulatory compliance in the financial industry. Here she builds on her experience in developing RDF/RDFS and OWL ontologies and business rules for supporting semantic interoperability for the Smart Grid. Angelina has over 27 publications.