Performance Management and Open Source Business Intelligence

Open Source Business Intelligence
Case 1 – Open Source BI in the Cloud
Presented to TDWI Colorado Chapter
February 2009
© OpenBI, LLC 2009
Discussion Topics, Part 1
• The Open Source Software Environment
– Principles of Open Source
– Commercial Open Source
– Open Source Business Intelligence
• Cloud Computing Introduction
• Case Study 1 – Open Source BI on the Cloud
– Architecture
– ETL
– OLAP/Reporting
Business Intelligence in the Age of Collaboration
“Billions of connected
individuals can now actively
participate in innovation, wealth
creation, and social development
in ways we once only dreamed
of.
And when these masses of
people collaborate they
collectively can advance the arts,
culture, science, education,
government, and the economy in
surprising but ultimately
profitable ways.”
Don Tapscott and
Anthony D. Williams
Why Open Source Software?
Commercial Open Source is changing the rules
• Customer Control
– Free, global access to software
– Removal of license fee amortization
– Annual “proof-of-value” for vendors
• Lower Costs
– < 50% of the software cost of proprietary
alternatives
• Better Technology
– Modern, open architectures
– Global innovation engine
Free Open Source vs. Commercial Open Source
Free Open Source differs from Commercial Open Source in a number of ways.
• Free Open Source
– Informal support and broad services providers
– Uneven velocity of change
– Community-directed roadmap
– Functional gaps
– Challenging licensing provisions
• Commercial Open Source
– Formal support with service level agreements (SLAs)
– Indemnification
– Professional services and partnerships
– Product management, roadmaps, and advisory boards
– Business-friendly subscription models
– Reference accounts, case studies, and user groups
Source: Pentaho
Commercial Open Source Revenue Streams
There are several models for companies to generate revenue in the Open
Source market.
• Subscriptions
• Cross Selling
• Education
• Consulting and Services
Open Source - Moving Up “The Stack”
ERP
CRM
BI
Database
Web App Server
& Portal
Operating System
• OSBI is today where Linux, Apache Tomcat, MySQL
and other levels in the “stack” recently stood
Sampling of OSBI Related Technologies
Open Source Business Intelligence Technologies
Business Intelligence Platforms & Technologies
BIRT Project
Database Platforms
Statistical Analysis/Data Mining Software
* WEKA is part of the Pentaho BI Suite
Pentaho and Jaspersoft
Two Popular Business Intelligence Platforms
The Pentaho suite offers
a well integrated set of
tools and components
deployed through a
comprehensive business
intelligence server.
The Jaspersoft suite offers
a series of tools and
libraries for integration into
other applications, or as a
standalone BI application.
• Both platforms
provide:
– ETL
– Reporting
– OLAP
– Dashboards
• Each platform
has factors
that make it
unique
Graphics Source: Pentaho
What Is “Cloud Computing?”
• “Cloud computing is on-demand access to
virtualized IT resources that
are housed outside of your
own data center, shared by
others, simple to use, paid
for via subscription, and
accessed over the Web.”
–John Foley, Information Week,
September 2008
Seven Principles:
Off-site
Virtual
On-demand
Subscription
Shared
Simple
Web-Based
…or, use how Larry Ellison describes it:
"idiocy," "crap," "gibberish," "crazy," and "stupidest"
The Amazon Elastic Computing Cloud
Powered By Open Source
“If an economic downturn cools IT
capital spending, some business
technology managers may turn to
rent-by-the-hour cloud computing
resources…
If they turn to Amazon EC2, they're
tapping into open source Linux,
Apache, and a tweaked Xen open
source hypervisor that powers much
of the company's cloud's operation.”
Information Week, November 2008
Amazon Elastic Computing Cloud (EC2)
• Most well known “Cloud Computer”
• Allows customization of Amazon Machine Images that can be started and run on demand
• Different instance sizes from small 32-bit (1 CPU, 1.7GB RAM equivalent) to extra large 64-bit (8 CPU, 15GB RAM) or extra large 64-bit, high CPU (20 CPU, 7GB RAM)
• Runs varied operating systems (Linux, Windows) and charged on an hourly basis (Windows is 25-50% more expensive)
• Can attach persistent storage to an instance, charged by the GB
• Accessed via command line or web interface
• Some data charges apply for transfer in and out of Amazon
• Competitors:
– IBM (Computing On Demand), Google (App Engine), AT&T (Synaptic),
Microsoft (Azure)
– Rackspace, Flexiscale, GoGrid
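Because instances are billed by the hour and persistent storage by the GB, a rough monthly estimate is simple arithmetic. The sketch below is illustrative only: the hourly rate, storage rate, and 30-day month are placeholder assumptions, not Amazon prices. Only the 25-50% Windows premium comes from the list above, and data transfer charges are left out.

```python
HOURS_PER_MONTH = 24 * 30  # simplifying assumption: a 30-day month


def monthly_cost(hourly_rate, storage_gb=0.0, storage_rate_per_gb=0.0,
                 hours=HOURS_PER_MONTH):
    """Instance hours plus persistent storage, in one currency unit."""
    return hourly_rate * hours + storage_gb * storage_rate_per_gb


def windows_rate_range(linux_hourly_rate):
    """Apply the 25-50% Windows premium quoted on the slide."""
    return linux_hourly_rate * 1.25, linux_hourly_rate * 1.50


# Placeholder inputs (hypothetical, NOT real Amazon pricing).
linux_rate = 0.40     # currency units per instance-hour
storage_gb = 100      # GB of persistent storage attached
storage_rate = 0.10   # currency units per GB per month

linux_total = monthly_cost(linux_rate, storage_gb, storage_rate)
win_low, win_high = windows_rate_range(linux_rate)

print(f"Linux:   {linux_total:.2f}")
print(f"Windows: {monthly_cost(win_low, storage_gb, storage_rate):.2f}"
      f" to {monthly_cost(win_high, storage_gb, storage_rate):.2f}")
```

Swapping in current published rates for the chosen instance size turns this into a usable first-pass budget figure.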
Project Summary: Danone/Nutricia
Client Profile
Nutricia, a division of Danone,
specializes in Baby and Medical
Nutrition products. They provide
medical nutrition for the management
of conditions such as milk protein allergy, inborn errors of
metabolism (e.g., PKU), pediatric epilepsy, Alzheimer’s &
more. Nutricia markets its products across 19 countries.
Project Background
Internal order management system was limited in providing
analytical insights on products, product groupings, time,
customer, or geographic analysis in the aggregate.
Scope
Build a pilot analytical database and web-based business
intelligence application to allow business users to see a high
level snapshot of business performance, and be able to drill
into detailed order and invoice activity to reveal performance
trends.
Sales Performance Dashboard
•Web-based dashboard provides quick analysis on sales activity
for products, customers, and sales regions
•Allows drilling from dashboard into interactive OLAP analysis
sessions for a deeper look at sales activity
Environment Overview
• The pilot environment is hosted on an Amazon
EC2 Large (Approx 8GB, 4CPU) Instance.
This instance contains:
– JBoss Web Server 4.2
– Pentaho BI Suite 1.7
• Includes custom web page templates, charts, and OLAP
views
– Pentaho Data Integration 3.0
• Includes custom ETL Routines to extract, transform and
load data from the operational systems.
– MySQL Database 5.0
• Stores BI database, Pentaho Repository, and User
Database
Environment Overview - Diagram
[Diagram components]
• Amazon EC2 Large Instance
– JBoss Application Server running the Pentaho BI Suite: Data Integration (ETL), Community Dashboard, Analysis (OLAP)
– MySQL 5.0 databases: userdb, hibernate, bi
• Outside the instance: US Order Management (Order Entry), Canada Order Management (Order Entry), Web Browser
• Connection label: Encrypted, Secure VPN
Pentaho Data Integration
Overview
• Graphically design data transformations and jobs
• Built in 100% Java, cross-platform support
• Extensible architecture
– Ability to develop and plug in custom connectors
– SAP and other ERP connectors available
• Repository-based or file-based
– Structured management of models, connections, logs and more
– Easy re-use of queries and transformation components
• Full-featured ETL
– Over 100 pre-built objects
– Support for all common data sources including leading RDBMSs and a variety of flat file formats
– Advanced data warehousing support for Junk and Slowly Changing Dimensions
– Can run across multiple servers as a cluster
• Integration with Pentaho Open BI Platform
– Schedule jobs and transformations
– Leverage Pentaho alerting and workflow
– Pentaho Reporting and Analysis for delivering information to the enterprise
Source: Pentaho
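The Slowly Changing Dimension support mentioned above is configured graphically in Pentaho Data Integration, so there is no code to show from the tool itself. As a rough illustration of the underlying technique (a Type 2 dimension, where a change expires the current row and appends a new version), here is a minimal plain-Python sketch. The column names, the `apply_scd2` helper, and the sample customer are all hypothetical and are not taken from the project.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # conventional "still current" end date


def apply_scd2(dimension, incoming, today):
    """Apply a Type 2 change: expire the current row, append a new version.

    dimension: list of dict rows with keys
               customer_id, region, valid_from, valid_to, version
    incoming:  dict with customer_id and region from the source system
    """
    current = next(
        (row for row in dimension
         if row["customer_id"] == incoming["customer_id"]
         and row["valid_to"] == OPEN_END),
        None,
    )
    if current is not None and current["region"] == incoming["region"]:
        return dimension  # nothing changed, keep history as-is

    if current is not None:
        current["valid_to"] = today  # close out the old version

    dimension.append({
        "customer_id": incoming["customer_id"],
        "region": incoming["region"],
        "valid_from": today,
        "valid_to": OPEN_END,
        "version": 1 if current is None else current["version"] + 1,
    })
    return dimension


dim = []
apply_scd2(dim, {"customer_id": 42, "region": "West"}, date(2008, 7, 1))
apply_scd2(dim, {"customer_id": 42, "region": "East"}, date(2009, 1, 15))
apply_scd2(dim, {"customer_id": 42, "region": "East"}, date(2009, 2, 1))
for row in dim:
    print(row)
```

The third call changes nothing, so the dimension ends with two rows: the expired "West" version and the current "East" version. Keeping both is what lets historical orders report against the region that applied when they were placed.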
Demo – Pentaho Data Integration
Community Dashboard Framework
• Demonstrates Pentaho community development in
action
– Started by Ingo Klose and Pedro Alves; the first community
releases were in 2008
• Provides a framework and templates for simple
dashboard building
– Includes basic selection/filtering objects, including text boxes,
multi-select pick lists, calendar date selections, check boxes,
etc.
• Uses the Pentaho platform’s “guts” to provide data
from databases, transforms, etc.
– Allows Pentaho reports, charts, OLAP sessions and other
objects to be embedded in the dashboard.
• Version 3.0 released Jan 2009
Navigating The Dashboard
• Navigation: Links move you to three Dashboard Pages (Product, Customer, Geography) and the OLAP solution browser.
• Select Chart Options, and click the Refresh button to update the charts on the right.
• As you change chart options, the summary of what you’ve selected appears at the bottom.
• Basic Navigation: Home and Logout are the most commonly used links here. Administrators may access the Admin link.
• Click on an individual bar or slice to see additional detail.
• Click Explore… to open an OLAP Session to perform more detailed analysis.
Drilling Into The Dashboard
Clicking on bars and slices opens an OLAP Session to drill into and
explore the data. The context of the analysis carries over as the
starting point for further exploration.
Pentaho Analysis (OLAP)
Overview
• Rich, interactive analysis
– Web- or Excel-based access
• Standards-based architecture
– J2EE architecture
– JDBC and JNDI connectivity
– SQL-based data retrieval
– XML/A and MDX front-end support
• Embeddability and extensibility
• Performance and scalability
– ROLAP with Optimized SQL
– Aggregate table support
– Aggregation Designer
• Pentaho Open BI Suite Integration
– Comprehensive auditing of user activity,
performance and data access
– Integrated security, scheduling, alerting, portal
integration, and metadata
Source: Pentaho
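A ROLAP engine answers analysis queries by generating SQL against relational tables, and aggregate tables let it read a small pre-summarized table instead of scanning every fact row. The sketch below illustrates that idea with Python's built-in `sqlite3` module. The table and column names are hypothetical, and this is neither the SQL Pentaho Analysis actually generates nor its schema configuration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (
        order_id INTEGER, region TEXT, product TEXT, amount REAL
    );
    INSERT INTO sales_fact VALUES
        (1, 'US', 'A', 100.0),
        (2, 'US', 'B', 50.0),
        (3, 'Canada', 'A', 75.0),
        (4, 'US', 'A', 25.0);

    -- Aggregate table: one row per region instead of one row per order.
    CREATE TABLE agg_sales_by_region AS
        SELECT region, SUM(amount) AS amount, COUNT(*) AS fact_count
        FROM sales_fact
        GROUP BY region;
""")

# The same "sales by region" question answered two ways.
from_fact = conn.execute(
    "SELECT region, SUM(amount) FROM sales_fact "
    "GROUP BY region ORDER BY region"
).fetchall()
from_agg = conn.execute(
    "SELECT region, amount FROM agg_sales_by_region ORDER BY region"
).fetchall()

print(from_fact)
print(from_agg)
```

Both queries return the same totals. The aggregate version reads two rows rather than four, a gap that widens as the fact table grows, at the cost of having to rebuild the aggregate whenever the ETL loads new facts.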
Demo – Dashboard & OLAP
Pentaho Reports
Overview
• Broad range of reporting needs
– Simple Columnar or Tabular
– Charts & Graphs
• Provides critical functionality for end users
– Web-based Access, Prompting/Parameterized Reports
– Scheduling, Subscriptions, Bursting/Distribution
– Web-based Ad Hoc
• Provide an assortment of output formats
– HTML
– PDF
– Microsoft Excel/OpenCalc
– RTF (Microsoft Word/OpenWriter)
– CSV
• Provides features for developers
– Heterogeneous Data Sources
  • Relational, OLAP (Mondrian), XML, Pentaho Data Integration Transformations, Pentaho metadata
– Modular Report Definition
  • Separates presentation from query
– Integration points to applications, portals
  • JSP, Portlet, Web Service
– Graphical Design Tools
  • Drag and Drop
  • Integrated Report Design Wizard and Query Builder
  • Report Object Palette
  • Browse Report Structure, Preview Report
Source: Pentaho
Case Study - Conclusion
• Project took approx 6 weeks, including
requirements, design, build and deploy to the
cloud
• Has been operating since July 2008
• Users within and outside of the client’s walls
can have secure access to performance
metrics
• Client plans to invest beyond the pilot in 2009
to include more data sources, data subjects,
and sales forecasts
Differing Philosophies on Open Source
“I think it addresses a niche market for high-end data
analysts that want free, readily available code. We
have customers who build engines for aircraft. I am
happy they are not using freeware when I get on a jet.”
Anne H. Milley, director of technology product marketing at
SAS
“It’s interesting that SAS Institute feels that non-peer-reviewed
software with hidden implementations of
analytic methods that cannot be reproduced by others
should be trusted when building aircraft engines.”
Dr. Frank Harrell, Professor of Biostatistics and
Department Chair at Vanderbilt University and
R Community Member
Thank You!
Kevin Haas
kevin.haas@openbi.com