Open Source Business Intelligence
Case 1 – Open Source BI in the Cloud
Presented to TDWI Colorado Chapter, February 2009
© OpenBI, LLC 2009

Discussion Topics, Part 1
• The Open Source Software Environment
  – Principles of Open Source
  – Commercial Open Source
  – Open Source Business Intelligence
• Cloud Computing Introduction
• Case Study 1 – Open Source BI on the Cloud
  – Architecture
  – ETL
  – OLAP/Reporting

Business Intelligence in the Age of Collaboration
“Billions of connected individuals can now actively participate in innovation, wealth creation, and social development in ways we once only dreamed of. And when these masses of people collaborate they collectively can advance the arts, culture, science, education, government, and the economy in surprising but ultimately profitable ways.”
Don Tapscott and Anthony D. Williams

Why Open Source Software?
Commercial Open Source is changing the rules.
• Customer Control
  – Free, global access to software
  – Removal of license fee amortization
  – Annual “proof-of-value” for vendors
• Lower Costs
  – Less than 50% of the software cost of proprietary alternatives
• Better Technology
  – Modern, open architectures
  – Global innovation engine

Free Open Source vs. Commercial Open Source
Free Open Source differs from Commercial Open Source in a number of ways.
• Free Open Source
  – Informal support and a broad range of service providers
  – Uneven velocity of change
  – Community-directed roadmap
  – Functional gaps
  – Challenging licensing provisions
• Commercial Open Source
  – Formal support with service level agreements (SLAs)
  – Indemnification
  – Professional services and partnerships
  – Product management, roadmaps, and advisory boards
  – Business-friendly subscription models
  – Reference accounts, case studies, and user groups
Source: Pentaho

Commercial Open Source Revenue Streams
There are several models for companies to generate revenue in the Open Source market.
• Subscriptions
• Cross Selling
• Education
• Consulting and Services

Open Source – Moving Up “The Stack”
[Figure: software stack, from top to bottom: ERP, CRM, BI, Database, Web App Server & Portal, Operating System]
• Open source BI (OSBI) is today where Linux, Apache Tomcat, MySQL, and other levels in the “stack” recently stood.

Sampling of OSBI Related Technologies
[Figure: logos of open source business intelligence technologies, grouped as Business Intelligence Platforms & Technologies (including the BIRT Project), Database Platforms, and Statistical Analysis/Data Mining Software*]
* WEKA is part of the Pentaho BI Suite

Pentaho and Jaspersoft: Two Popular Business Intelligence Platforms
The Pentaho suite offers a well-integrated set of tools and components deployed through a comprehensive business intelligence server.
The Jaspersoft suite offers a series of tools and libraries for integration into other applications, or for use as a standalone BI application.
• Both platforms provide:
  – ETL
  – Reporting
  – OLAP
  – Dashboards
• Each platform has factors that make it unique
Graphics Source: Pentaho

What Is “Cloud Computing?”
• “Cloud computing is on-demand access to virtualized IT resources that are housed outside of your own data center, shared by others, simple to use, paid for via subscription, and accessed over the Web.”
  – John Foley, InformationWeek, September 2008
• Seven Principles: Off-site, Virtual, On-demand, Subscription, Shared, Simple, Web-Based
• …or, as Larry Ellison describes it: “idiocy,” “crap,” “gibberish,” “crazy,” and “stupidest”

The Amazon Elastic Compute Cloud: Powered By Open Source
“If an economic downturn cools IT capital spending, some business technology managers may turn to rent-by-the-hour cloud computing resources… If they turn to Amazon EC2, they're tapping into open source Linux, Apache, and a tweaked Xen open source hypervisor that powers much of the company's cloud's operation.”
InformationWeek, November 2008

Amazon Elastic Compute Cloud (EC2)
• The best-known “cloud computer”
• Allows customization of Amazon Machine Images (AMIs) that can be started and run on demand
• Offers different instance sizes, from small 32-bit (1 CPU, 1.7 GB RAM equivalent) to extra large 64-bit (8 CPU, 15 GB RAM) or extra large 64-bit high-CPU (20 CPU, 7 GB RAM)
• Runs varied operating systems (Linux, Windows) and is charged on an hourly basis (Windows is 25–50% more expensive)
• Can attach persistent storage to an instance, charged by the GB
• Accessed via command line or web interface
• Some data charges apply for transfer in and out of Amazon
• Competitors:
  – IBM (Computing On Demand), Google (App Engine), AT&T (Synaptic), Microsoft (Azure)
  – Rackspace, Flexiscale, GoGrid

Project Summary: Danone/Nutricia
Client Profile
Nutricia, a division of Danone, specializes in baby and medical nutrition products. It provides medical nutrition for the management of conditions such as milk protein allergy, inborn errors of metabolism (e.g., PKU), pediatric epilepsy, Alzheimer’s, and more. Nutricia markets its products across 19 countries.
Project Background
The internal order management system offered limited analytical insight into products, product groupings, time, customers, or geography in the aggregate.
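To make “analysis in the aggregate” concrete, here is a minimal Python sketch that rolls order lines up along any chosen subset of the dimensions named above. The `OrderLine` fields, the `roll_up` helper, and the sample rows are hypothetical illustrations, not the client's schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderLine:
    # Hypothetical order-line fields; not the client's actual schema.
    product: str
    product_group: str
    month: str  # e.g. "2008-07"
    customer: str
    country: str
    amount: float


def roll_up(lines, dims):
    """Sum order amounts grouped by the named dimension attributes."""
    totals = defaultdict(float)
    for line in lines:
        key = tuple(getattr(line, d) for d in dims)
        totals[key] += line.amount
    return dict(totals)


# Hypothetical sample data for illustration only.
orders = [
    OrderLine("Product A", "Infant", "2008-07", "Customer 1", "US", 120.0),
    OrderLine("Product B", "Infant", "2008-07", "Customer 2", "CA", 80.0),
    OrderLine("Product C", "Medical", "2008-08", "Customer 1", "US", 50.0),
]

print(roll_up(orders, ["product_group"]))
# {('Infant',): 200.0, ('Medical',): 50.0}
print(roll_up(orders, ["country", "month"]))
# {('US', '2008-07'): 120.0, ('CA', '2008-07'): 80.0, ('US', '2008-08'): 50.0}
```

An analytical database precomputes or indexes these groupings so they need not be recalculated from raw order lines on every request; the sketch only shows the grouping logic itself.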
Scope
Build a pilot analytical database and web-based business intelligence application that lets business users see a high-level snapshot of business performance and drill into detailed order and invoice activity to reveal performance trends.

Sales Performance Dashboard
• Web-based dashboard provides quick analysis of sales activity for products, customers, and sales regions
• Allows drilling from the dashboard into interactive OLAP analysis sessions for a deeper look at sales activity

Environment Overview
• The pilot environment is hosted on an Amazon EC2 Large instance (approx. 8 GB RAM, 4 CPUs). This instance contains:
  – JBoss Application Server 4.2
  – Pentaho BI Suite 1.7
    • Includes custom web page templates, charts, and OLAP views
  – Pentaho Data Integration 3.0
    • Includes custom ETL routines to extract, transform, and load data from the operational systems
  – MySQL Database 5.0
    • Stores the BI database, Pentaho repository, and user database

Environment Overview – Diagram
[Diagram: an Amazon EC2 Large Instance hosts the JBoss Application Server (running the Pentaho BI Suite: Data Integration (ETL), Community Dashboard, and Analysis (OLAP)) and MySQL 5.0 (databases userdb, hibernate, and bi). An encrypted, secure VPN links the instance to the US Order Management and Canada Order Management systems (Order Entry); users connect through a web browser.]

Pentaho Data Integration Overview
• Graphically design data transformations and jobs
• Built in 100% Java, with cross-platform support
• Extensible architecture
  – Ability to develop and plug in custom connectors
  – SAP and other ERP connectors available
• Repository-based or file-based
  – Structured management of models, connections, logs, and more
  – Easy reuse of queries and transformation components
• Full-featured ETL
  – Over 100 pre-built objects
  – Support for all common data sources, including leading RDBMSs and a variety of flat file formats
  – Advanced data warehousing support for junk and slowly changing dimensions
  – Can run across multiple servers as a cluster
• Integration with the Pentaho Open BI Platform
  – Schedule jobs and transformations
  – Leverage Pentaho alerting and workflow
  – Pentaho Reporting and Analysis for delivering information to the enterprise
Source: Pentaho

Demo – Pentaho Data Integration

Community Dashboard Framework
• Demonstrates Pentaho community development in action
  – Started by Ingo Klose and Pedro Alves; first community releases were in 2008
• Provides a framework and templates for simple dashboard building
  – Includes basic selection/filtering objects such as text boxes, multi-select pick lists, calendar date selections, and check boxes
• Uses the Pentaho platform’s “guts” to provide data from databases, transformations, etc.
  – Allows Pentaho reports, charts, OLAP sessions, and other objects to be embedded in the dashboard
• Version 3.0 released January 2009

Navigating The Dashboard
• Navigation links move you between three dashboard pages (Product, Customer, Geography) and the OLAP solution browser.
• Basic navigation: Home and Logout are the most commonly used links here. Administrators may access the Admin link.
• Select chart options and click the Refresh button to update the charts on the right. As you change chart options, a summary of your selections appears at the bottom.
• Click an individual bar or slice to see additional detail. Click Explore… to open an OLAP session for more detailed analysis.

Drilling Into The Dashboard
• Clicking on bars and slices opens an OLAP session to drill into and explore the data.
• The context of the analysis is carried over, so additional exploration starts from what the user was viewing.
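The slides do not say how the dashboard hands its context to the OLAP session. As a hypothetical illustration only, and since Pentaho Analysis supports MDX, the Python sketch below turns a clicked member plus the remaining dashboard selections into an MDX query that drills into the clicked member's children. The `drill_mdx` helper and the cube, measure, and member names are invented for the example and are not taken from the Community Dashboard Framework.

```python
def drill_mdx(cube, measure, clicked_member, slicer_members):
    """Build an MDX query that expands the clicked member into its children
    on rows and keeps the remaining dashboard selections as the slicer.

    Assumes each slicer member comes from a different hierarchy than the
    clicked member; many MDX engines reject a query that uses the same
    hierarchy on an axis and in the WHERE clause.
    """
    query = (
        f"SELECT {{{measure}}} ON COLUMNS,\n"
        f"  NON EMPTY {{{clicked_member}.Children}} ON ROWS\n"
        f"FROM {cube}"
    )
    if slicer_members:
        # Carry the dashboard's other selections over as the query context.
        query += f"\nWHERE ({', '.join(slicer_members)})"
    return query


# Hypothetical cube, measure, and member names.
query = drill_mdx(
    "[Sales]",
    "[Measures].[Sales Amount]",
    "[Product].[All Products].[Infant]",
    ["[Time].[2008]", "[Geography].[US]"],
)
print(query)
```

Keeping the non-clicked selections in the slicer, rather than on an axis, preserves the dashboard's filters without changing the shape of the result grid the user starts exploring.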
Pentaho Analysis (OLAP) Overview
• Rich, interactive analysis
  – Web- or Excel-based access
• Standards-based architecture
  – J2EE architecture
  – JDBC and JNDI connectivity
  – SQL-based data retrieval
  – XML/A and MDX front-end support
• Embeddability and extensibility
• Performance and scalability
  – ROLAP with optimized SQL
  – Aggregate table support
  – Aggregation Designer
• Pentaho Open BI Suite integration
  – Comprehensive auditing of user activity, performance, and data access
  – Integrated security, scheduling, alerting, portal integration, and metadata
Source: Pentaho

Demo – Dashboard & OLAP

Pentaho Reports Overview
• Broad range of reporting needs
  – Simple columnar or tabular
  – Charts & graphs
• Provides an assortment of output formats
  – HTML
  – PDF
  – Microsoft Excel/OpenOffice Calc
  – RTF (Microsoft Word/OpenOffice Writer)
  – CSV
• Provides critical functionality for end users
  – Web-based access, prompting/parameterized reports
  – Scheduling, subscriptions, bursting/distribution
  – Web-based ad hoc reporting
• Provides features for developers
  – Heterogeneous data sources
    • Relational, OLAP (Mondrian), XML, Pentaho Data Integration transformations, Pentaho metadata
  – Modular report definition
    • Separates presentation from query
  – Integration points to applications and portals
    • JSP, Portlet, Web Service
  – Graphical design tools
    • Drag and drop
    • Integrated Report Design Wizard and Query Builder
    • Report object palette
    • Browse report structure, preview report
Source: Pentaho

Case Study – Conclusion
• The project took approximately six weeks, including requirements, design, build, and deployment to the cloud
• The system has been operating since July 2008
• Users within and outside of the client’s walls have secure access to performance metrics
• The client plans to invest beyond the pilot in 2009 to include more data sources, data subjects, and sales forecasts

Differing Philosophies on Open Source
“I think it addresses a niche market for high-end data analysts that want free, readily available code. We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”
Anne H. Milley, director of technology product marketing at SAS

“It’s interesting that SAS Institute feels that non-peer-reviewed software with hidden implementations of analytic methods that cannot be reproduced by others should be trusted when building aircraft engines.”
Dr. Frank Harrell, Professor of Biostatistics and Department Chair at Vanderbilt University and R Community Member

Thank You!
Kevin Haas
kevin.haas@openbi.com