Data Virtualization & Information As A Service (IaaS)
By Anil Allewar, Senior Solutions Architect, Synerzip
www.synerzip.com

About Me
Anil Allewar
• Senior Solutions Architect @ Synerzip
• Technology evangelist & speaker
• Core interests: JEE, EAI, EII
Confidential

Agenda
• Use cases
• What does it mean?
• Architecture explained
• Implementation Frameworks
• Demo
• Questions?

Why it makes sense?

Use Cases
[Diagram: Financial Data, OLTP Data, 3rd Party Data, Web Service 1, Web Service 2, Legacy Data and Excel files feed a Data Mart and a Data Warehouse through ETL jobs and a Custom Program]

Traditional Data Integration
[Diagram: Business Applications sit on an Enterprise Information System, which is loaded through ETL from two Source Systems]

Problems with ETL
• More than 1 copy of data for staging
• Intermediate data => Errors
• Lead time to add new source
• Domain knowledge for mapping
• Batch Process => No real time data

Problems with DBMS consolidation
• Alternate approach => Single EIS (say RDBMS)
• Extensive changes to existing apps
• Might not satisfy everyone’s requirements

Data Virtualization & Federation
• Single API to access data
• Only metadata stored at virtualization layer
• Real time access without copying/moving data
• Federate data across heterogeneous/homogeneous sources

Data Virtualization
[Diagram: Data Virtualization]

Architecture
[Diagram, top to bottom:]
• User Application
• Common Access API
• Virtual Database, hosted in the RUNTIME & QUERY ENGINE
• Translator 1 with Connector 1, Translator 2 with Connector 2
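The layering on the Architecture slide (common access API, virtual database, translators, connectors) can be illustrated with a small Python sketch. This is not Teiid code and does not use any Teiid API; the class names (RelationalSource, DocumentSource, VirtualDatabase), the sample tables and the join strategy are all hypothetical, chosen only to show one access point federating two differently shaped sources without copying data into a staging area.

```python
import sqlite3
from typing import Dict, List

Row = Dict[str, object]


class RelationalSource:
    """Hypothetical connector + translator for a relational source (in-memory SQLite)."""

    def __init__(self, table: str, columns: List[str], rows: List[tuple]) -> None:
        self.table = table
        self.conn = sqlite3.connect(":memory:")  # connector: physical access
        self.conn.row_factory = sqlite3.Row
        self.conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        marks = ", ".join("?" for _ in columns)
        self.conn.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)

    def fetch(self, columns: List[str]) -> List[Row]:
        # translator: turn the engine's request into source-specific SQL
        sql = f"SELECT {', '.join(columns)} FROM {self.table}"
        return [dict(r) for r in self.conn.execute(sql)]


class DocumentSource:
    """Hypothetical connector + translator for a document store (a list of dicts)."""

    def __init__(self, documents: List[Row]) -> None:
        self.documents = documents  # connector: handle to the documents

    def fetch(self, columns: List[str]) -> List[Row]:
        # translator: project documents into the common row format
        return [{c: d.get(c) for c in columns} for d in self.documents]


class VirtualDatabase:
    """Stores only metadata (source name -> source object); rows stay in the sources."""

    def __init__(self) -> None:
        self.sources: Dict[str, object] = {}

    def add_source(self, name: str, source: object) -> None:
        self.sources[name] = source

    def federated_view(self, left: str, left_cols: List[str],
                       right: str, right_cols: List[str], key: str) -> List[Row]:
        # Query-engine role: fetch from both sources at request time and
        # combine them with a simple in-memory hash join on `key`.
        index: Dict[object, List[Row]] = {}
        for row in self.sources[right].fetch(right_cols):
            index.setdefault(row[key], []).append(row)
        result: List[Row] = []
        for lrow in self.sources[left].fetch(left_cols):
            for rrow in index.get(lrow[key], []):
                result.append({**rrow, **lrow})
        return result


def build_demo_vdb() -> VirtualDatabase:
    """Build a virtual database over made-up sample data."""
    vdb = VirtualDatabase()
    vdb.add_source("customers", RelationalSource(
        "customers", ["customer_id", "name"], [(1, "Asha"), (2, "Ben")]))
    vdb.add_source("orders", DocumentSource([
        {"customer_id": 1, "item": "laptop"},
        {"customer_id": 1, "item": "mouse"},
        {"customer_id": 3, "item": "desk"},
    ]))
    return vdb


if __name__ == "__main__":
    view = build_demo_vdb().federated_view(
        "customers", ["customer_id", "name"],
        "orders", ["customer_id", "item"], "customer_id")
    for row in view:
        print(row)
```

The application only calls federated_view; which source is relational and which is document-based is hidden behind the two fetch methods, which is the role the slide assigns to translators and connectors. A real engine would also push filters and joins down to the sources where possible, which this sketch does not attempt.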
Vendors
• Commercial Products
  – Composite Software
    • http://www.compositesw.com/data-virtualization/
  – Denodo
    • http://www.denodo.com/en/product/overview.php?n=h
  – IBM
    • http://www-03.ibm.com/software/products/en/ibminfofedeserv
  – Informatica
    • http://www.informatica.com/us/data-virtualization/
  – Red Hat
    • http://www.redhat.com/products/jbossenterprisemiddleware/data-virtualization/
• Open Source
  – JBoss Teiid
    • http://teiid.jboss.org/

Selected Platform – JBoss Teiid
• Open Source
• JEE standards
• Supports a number of relational/NoSQL/ERP/CRM data stores
• Active & responsive community
• Add custom EIS support using JEE components
• Synerzip contribution: Defect discovery, root cause analysis, feature verification

Teiid Components
• Virtual Database – container for components used to integrate data from multiple data sources
• Source Models – structure and characteristics of physical data sources
• View Models – structure and characteristics of abstract structures you want to expose to your applications
• Teiid Designer – Eclipse based UI to dynamically discover data source objects and apply data federation
  – Generate virtual database from 1 or more sources

Teiid Components
• Translator
  – Provides abstraction layer between Teiid Query Engine and source system
  – Converts Teiid SQL commands to source specific execution commands
  – Converts result data from source system to Teiid specific format
• Resource Adapter
  – Provides connectivity to the physical data source
  – Integration provided through Java Connector Architecture (JCA) API

Teiid – Supported EIS
• Amazon SimpleDB
• Apache Accumulo
• Apache SOLR
• Cassandra
• File
• Google Spreadsheet
• JPA
• LDAP
• Excel – as file
• SalesForce
• JDBC – MS access, DB2, derby, excelodbc, greenplum, h2, hive (for
accessing Hadoop), oracle, teradata and most RDBMS
• MongoDB
• Object
• OData
• OLAP
• Web Services
• SAP Netweaver Gateway

Performance Characteristics
• Access same data using Oracle and Teiid drivers
[Chart: No. of rows vs. time (ms), no Blobs; series: Oracle-JDBC, Teiid-JDBC]
  – Retrieval times comparable when accessing tables having no Blobs

Performance Characteristics
[Chart: No. of rows vs. time (ms), Blobs; series: Oracle-JDBC, Teiid-JDBC; x-axis tick values up to 185,454 rows]
  – Teiid slower when accessing Blob data
    • Can be tuned

Demo
[Diagram: a JDBC Client calls the JDBC API of a Federated VDB inside the TEIID RUNTIME & QUERY ENGINE; the MySQL Translator with the RDBMS Resource Adapter connects to MySQL, and the MongoDB Translator with the MongoDB Resource Adapter connects to MongoDB]

Demo – Steps
• Pre-requisites
  – MySQL server 5.5+ installed
  – MongoDB 2.4.x+ installed
• Steps
  – Load the MySQL and MongoDB databases with sample data
  – Set up environment – JBoss, Eclipse
  – Create Teiid project in Eclipse using Teiid Designer
    • Import source model using JDBC
    • Create the virtual model and federate data from the source model
    • Create a virtual database (VDB) and deploy to JBoss
  – Access data using JDBC client or through browser using OData

[Screenshots: Demo – Scenario (Federated Data); Demo – Connection Profile; Demo – Source Model; Demo – Source Model Generation; Demo – Map Source To View; Demo – Association; Demo – Data Federation]

Demo – Source Code
• Source code
  – https://github.com/Synerzip/JBossTeiid
  – Contains
    • Configuration files
    • Instructions
    • “How-to” videos
    • VDBs, source models and view models

Conclusion
• Data virtualization and federation is a rapidly emerging technology that addresses the traditional BI/ETL problems: staged copies of data, lead time to add new sources, and batch-only access.
• It shortens time to market, distributes data across the enterprise as a service, and provides real time access to enterprise data.

Contact Me
• anil.allewar@synerzip.com

Questions?
Hemant Elhence
hemant@synerzip.com
469.322.0349

Synerzip in a Nutshell
1. Software product development partner for small/mid-sized technology companies
   • Exclusive focus on small/mid-sized technology companies, typically venture-backed companies in growth phase
   • By definition, all Synerzip work is the IP of its respective clients
   • Deep experience in full SDLC – design, dev, QA/testing, deployment
2. Dedicated team of high caliber software professionals for each client
   • Seamlessly extends client’s local team, offering full transparency
   • Stable teams with very low turn-over
   • NOT just “staff augmentation”, but provides full mgmt support
3. Actually reduces risk of development/delivery
   • Experienced team – uses appropriate level of engineering discipline
   • Practices Agile development – responsive, yet disciplined
4. Reduces cost – dual-shore team, 50% cost advantage
5. Offers long term flexibility – allows (facilitates) taking offshore team captive – aka “BOT” option

Our Clients

Thanks! Call Us for a Free Consultation!