Graph Data Analytics
Resolving Complexity at an Enterprise Scale

Arka Mukherjee, Ph.D.
Global IDs
Arka.Mukherjee@globalids.com
www.globalids.com

Proprietary © 2013 Global IDs


Topics

1. The “Complex Data” Context
2. Current Challenges
3. Governance Methodology


The “Complex Data” Context

The Big Shift

The cost structure is unsustainable
The cost of managing information is going up exponentially.

The complexity growth is unmanageable
Financial Services Institutions
1. Complex data ecosystems
2. Highly dynamic
3. Limited traceability
4. Systemic risk: hard to measure

Question
How can enterprises handle the cost and complexity of managing complex data landscapes?

Global IDs Focus
To organize enterprise data landscapes

Global IDs: Product Suite

Global IDs Software Products
• Metadata Governance Suite
• Master Data Governance Suite
• Enterprise Data Governance Suite
• Big Data Governance Suite

Objective
4  Embed Analytics
3  Accelerate Integration
2  Improve Quality
1  Create Transparency

Function       #    Deliverables
Visualize      20   Dashboards and Infographics
Link           19   Graph Databases with Linked Data
Measure        18   KPIs and Trend Metrics
Analyze        17   Reporting and Ad-Hoc Analysis
Distribute     16   Data Services for Master Data
Integrate      15   Integrated Master Data
Standardize    14   Enriched Master Data
Move           13   Data Repositories in Relational Databases or Hadoop
Dashboards     12   Master Data Governance Portals
Stewardship    11   RACI Matrix of Data Stewards
Validation     10   Data Quality Metrics
Rules          9    Rules Repository
Monitor        8    Change Monitors, Impact Analysis
Model          7    Master Data Models
Search         6    Enterprise Search
Map            5    Business Ontologies
Classify       4    Business Taxonomies
Profile        3    Semantic Metadata Repository
Ingest         2    Inventory of External Data Assets
Discover       1    Comprehensive Data Asset Inventory

Legend: Under Development; Using Hadoop Stack

© Global IDs Inc. (2001-2013)


Challenges

The typical Financial Institution’s
# Databases  > 1,000
# Tables     > 200,000
# Columns    > 2,000,000

Question
How can we understand the relationships across 2,000,000 attributes?

Converging Data Variety
Data Content
• Structured
• Multi Structured
• Unstructured

Converging Data Ecosystems
Data Ecosystems
• Social Data
• Machine Data
• Enterprise Data

Current Approaches do not Scale

           # Databases
Small      > 1,000
Average    > 10,000
Large      > 100,000

A New Approach is Required

5 Utilize Graph Structures for Governance


Graph Analytics: Use Cases

Key Challenges
• Vast diversity and volume of metadata and data
• Storage and indexing of metadata to facilitate search and navigation
• Understanding the connection between different pieces of metadata (Crosswalk)

Utilize Graph Structures for Storing Complex Data

Use Case 1: Enterprise Metadata Search with Hadoop

Use Case 2: Unstructured Data Integration

Use Case 3: Cross Database Similarity Mapping

Use Case 4: Graph Analytics


Demo


Methodology

What we do
1. Scan
2. Analyze
3. Map / Organize
4. Govern

Automation

1: Scan

2: Semantic Analysis

3: Automate Semantic Mapping

4: Link the Data Landscape


Thank You!