Business Intelligence and Multidimensional databases

Business Intelligence &
Multi-Dimensional Databases
Nirmal Jonnalagedda
Outline
1. BI: History
2. BI: Overview
3. Common Functions of BI
4. BI: What can you do with it?
5. Multidimensional Databases
6. Contrast MDD and Relational Databases
7. When is MDD (In)appropriate?
8. MDD Features
9. Pros/Cons of MDD
BI: History



1958 - Term first used by IBM researcher Hans Peter Luhn
  - He defined intelligence as: “the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal”
BI is understood to have evolved from the decision support systems (DSS) of the 1960s
In the 1980s, DSS concepts evolved and split into
  - data warehouses, Executive Information Systems, OLAP
BI : an overview



There are many different opinions on what BI means
  - Depends on where you work
  - Generally, business analytics (BA) is considered a subset of BI
BI - the ability of an organization to take its capabilities and convert them into knowledge
Often includes the implementation of
  - Key Performance Indicators (KPIs), Trend Analysis, Predictive Modeling



What does BI provide?
  - Historical, current and predictive views of business operations
Where does BI get this information?
  - From within your business
  - Not necessarily focused on the actions of others (that is called competitive analysis)
End goal: support better decision making
  - BI is sometimes called a decision support system (DSS)

BI applications can vary widely in scope
  - Can be enterprise-wide, focusing on critical business applications
      - Monitoring the popularity of a product in a nationwide grocery chain
      - Tracking responses to mail offers and mailing only those who respond
  - Can be department- or project-specific, focused on individual decisions and how those affect an organization
      - Monitoring employee productivity and department spending
Common Functions of BI








Reporting
Online Analytical Processing
Analytics
Data, Process, Text Mining
Complex Event Processing
Business Performance Management
Benchmarking
Predictive, Prescriptive Analytics
BI : What can you do with it?






Identify cost cutting ideas and practices
Uncover new business opportunities
React to and even predict retail demand
Avoid repeating costly mistakes
 Especially useful in large enterprises with many
departments
Easily correlate and group business information and
metrics into an understandable format
Understand customer behavior
Database Evolution





Flat files
Hierarchical and Network
Relational
Distributed Relational
Multidimensional
MDDB: Why?




No single "best" data structure for all applications
within an enterprise
Organizations have abandoned the search for the holy grail of a single, globally accepted database
Instead, they select the most appropriate data structure on a case-by-case basis from a palette of standard database structures
Multidimensional Databases for OLAP?





From econometric research conducted at MIT in the
1960s, the multidimensional database has matured
into the database engine of choice for data analysis
applications
Inherent ability to integrate and analyze large
volumes of enterprise data
Offers a good conceptual fit with the way end-users
visualize business data
Most business people already think about their
businesses in multidimensional terms
Managers tend to ask questions about product sales
in different markets over specific time periods





Spreadsheets – a 2D database?
Functionalities
What about a stack of similar spreadsheets for different times?
Limitations?
We cannot easily relate data in different sheets
What is a Multi-Dimensional Database?
A multidimensional database (MDDB) is a computer
software system designed to allow for the efficient
and convenient storage and retrieval of large
volumes of data that are
(1) intimately related and
(2) stored, viewed and analyzed from different
perspectives.
These perspectives are called dimensions.
A Motivating Example
An automobile manufacturer wants to increase sales volumes by examining sales data collected throughout the organization. The evaluation would require viewing historical sales volume figures from multiple dimensions, such as
 Sales volume by model
 Sales volume by color
 Sales volume by dealer
 Sales volume over time
Contrasting Relational and Multidimensional Models
The Relational Structure
SALES VOLUMES FOR GLEASON DEALERSHIP
MODEL           COLOR    SALES VOLUME
MINI VAN        BLUE     6
MINI VAN        RED      5
MINI VAN        WHITE    4
SPORTS COUPE    BLUE     3
SPORTS COUPE    RED      5
SPORTS COUPE    WHITE    5
SEDAN           BLUE     4
SEDAN           RED      3
SEDAN           WHITE    2
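Multidimensional Structure
Sales Volumes
                    COLOR
MODEL         Blue   Red   White
Mini Van        6     5      4
Coupe           3     5      5
Sedan           4     3      2
MODEL and COLOR are the dimensions; Mini Van, Coupe, Sedan and Blue, Red, White are their positions; each cell holds the measurement (sales volume).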
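The contrast between the two structures can be sketched in a few lines of Python. This is only an illustration, not how any particular product stores data: a list of tuples stands in for the relational table and a nested list for the two-dimensional array. The names `rows`, `MODELS`, `COLORS`, `cube` and `volume` are made up for this sketch.

```python
# Relational form: one record per (model, color) combination.
rows = [
    ("MINI VAN", "BLUE", 6), ("MINI VAN", "RED", 5), ("MINI VAN", "WHITE", 4),
    ("SPORTS COUPE", "BLUE", 3), ("SPORTS COUPE", "RED", 5), ("SPORTS COUPE", "WHITE", 5),
    ("SEDAN", "BLUE", 4), ("SEDAN", "RED", 3), ("SEDAN", "WHITE", 2),
]

# Multidimensional form: each dimension has named positions, and the
# measurement sits at the intersection of one position per dimension.
MODELS = ["MINI VAN", "SPORTS COUPE", "SEDAN"]
COLORS = ["BLUE", "RED", "WHITE"]
cube = [[0] * len(COLORS) for _ in MODELS]
for model, color, vol in rows:
    cube[MODELS.index(model)][COLORS.index(color)] = vol


def volume(model, color):
    """Look up a cell by its position along each dimension."""
    return cube[MODELS.index(model)][COLORS.index(color)]


print(cube)
print(volume("SEDAN", "RED"))
```

In the array form the perspectives (MODEL, COLOR) are part of the structure itself, whereas in the list of records they are just values in fields.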
Differences between MDDB and Relational Databases

Normalized Relational: Data reorganized based on query. Perspectives are placed in the fields – tells us nothing about the contents.
MDDB: Perspectives embedded directly in the structure.

Normalized Relational: Browsing and data manipulation are not intuitive to user.
MDDB: Data retrieval and manipulation are easy.

Normalized Relational: Slows down for large datasets due to multiple JOIN operations needed.
MDDB: Fast retrieval for large datasets due to predefined structure.

Normalized Relational: Flexible. Anything an MDDB can do, can be done this way.
MDDB: Relatively inflexible. Changes in perspectives necessitate reprogramming of structure.
Contrasting Relational Model and MDD - Example 2
SALES VOLUMES FOR ALL DEALERSHIPS
MODEL           COLOR    DEALERSHIP    VOLUME
MINI VAN        BLUE     CLYDE         6
MINI VAN        BLUE     GLEASON       6
MINI VAN        BLUE     CARR          2
MINI VAN        RED      CLYDE         3
MINI VAN        RED      GLEASON       5
MINI VAN        RED      CARR          5
MINI VAN        WHITE    CLYDE         2
MINI VAN        WHITE    GLEASON       4
MINI VAN        WHITE    CARR          3
SPORTS COUPE    BLUE     CLYDE         2
SPORTS COUPE    BLUE     GLEASON       3
SPORTS COUPE    BLUE     CARR          2
SPORTS COUPE    RED      CLYDE         7
SPORTS COUPE    RED      GLEASON       5
SPORTS COUPE    RED      CARR          2
SPORTS COUPE    WHITE    CLYDE         4
SPORTS COUPE    WHITE    GLEASON       5
SPORTS COUPE    WHITE    CARR          1
SEDAN           BLUE     CLYDE         6
SEDAN           BLUE     GLEASON       4
SEDAN           BLUE     CARR          2
SEDAN           RED      CLYDE         1
SEDAN           RED      GLEASON       3
SEDAN           RED      CARR          4
SEDAN           WHITE    CLYDE         2
SEDAN           WHITE    GLEASON       2
SEDAN           WHITE    CARR          3
Multidimensional Representation
[Figure: Sales Volumes cube with three dimensions: MODEL (Mini Van, Coupe, Sedan), COLOR (Blue, Red, White), and DEALERSHIP (Carr, Gleason, Clyde)]
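As a rough sketch, the 27 relational records for all dealerships can be loaded into a three-dimensional array in Python. The nested-list layout and the names `MODELS`, `COLORS`, `DEALERS`, `VOLUMES`, `rows` and `cube` are assumptions made for illustration; the volumes are the ones listed in the relational table.

```python
from itertools import product

MODELS = ["MINI VAN", "SPORTS COUPE", "SEDAN"]
COLORS = ["BLUE", "RED", "WHITE"]
DEALERS = ["CLYDE", "GLEASON", "CARR"]

# Volumes in the same order as the relational table:
# model varies slowest, then color, then dealership.
VOLUMES = [6, 6, 2, 3, 5, 5, 2, 4, 3,
           2, 3, 2, 7, 5, 2, 4, 5, 1,
           6, 4, 2, 1, 3, 4, 2, 2, 3]

# Relational form: 3 * 3 * 3 = 27 records.
rows = [(m, c, d, v)
        for (m, c, d), v in zip(product(MODELS, COLORS, DEALERS), VOLUMES)]

# cube[i][j][k] holds the volume for MODELS[i], COLORS[j], DEALERS[k].
cube = [[[0] * len(DEALERS) for _ in COLORS] for _ in MODELS]
for m, c, d, v in rows:
    cube[MODELS.index(m)][COLORS.index(c)][DEALERS.index(d)] = v

# Fixing DEALERSHIP = GLEASON gives back the two-dimensional Gleason table.
g = DEALERS.index("GLEASON")
gleason = [[cube[i][j][g] for j in range(len(COLORS))]
           for i in range(len(MODELS))]
print(gleason)
```

The Gleason slice of the cube matches the single-dealership table from the first example, which is a quick consistency check on the data.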
Viewing Data - An Example
[Figure: Sales Volumes cube with dimensions MODEL, COLOR, and DEALERSHIP]
•Assume that each dimension has 10 positions, as shown in the cube above
•How many records would there be in a relational table?
•What are the implications for viewing data from an end-user standpoint?
Performance Advantages
What is the volume figure when car type = SEDAN, color = BLUE, and dealer = GLEASON?
RDBMS – all 1000 records might need to be searched to find the right record
MDB has more ‘knowledge’ about where the data lies
  - Maximum of 30 position searches (10 along each of the 3 dimensions)
  - Average case: about 15 position searches vs. about 500 record searches
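The counts above can be reproduced with a small simulation. It assumes the simplest possible cost model: the relational table is scanned record by record with no index, and the multidimensional store searches the 10 positions of each dimension one after another. Real systems use indexes and other optimizations, so this is a sketch of the argument rather than a benchmark. All names and the synthetic position labels are made up for illustration.

```python
from itertools import product

N = 10  # positions per dimension, as assumed in the cube example
models = [f"model{i}" for i in range(N)]
colors = [f"color{i}" for i in range(N)]
dealers = [f"dealer{i}" for i in range(N)]

# Relational form: one record per combination, 10 * 10 * 10 = 1000 records.
records = list(product(models, colors, dealers))


def rdb_comparisons(target):
    """Records examined by an unindexed linear scan until the match is found."""
    for examined, rec in enumerate(records, start=1):
        if rec == target:
            return examined
    raise KeyError(target)


def mdb_comparisons(target):
    """Positions examined when each dimension is searched separately."""
    m, c, d = target
    return (models.index(m) + 1) + (colors.index(c) + 1) + (dealers.index(d) + 1)


worst = (models[-1], colors[-1], dealers[-1])
print(len(records), rdb_comparisons(worst), mdb_comparisons(worst))

avg_rdb = sum(rdb_comparisons(r) for r in records) / len(records)
avg_mdb = sum(mdb_comparisons(r) for r in records) / len(records)
print(avg_rdb, avg_mdb)
```

Under this model the worst case is 1000 record searches against 30 position searches. The exact averages come out at 500.5 and 16.5, which the slide rounds to half of each maximum (500 and 15).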
What are the total sales across all colors and dealers when model = SEDAN?
RDBMS – all 1000 records must be searched to get the answer
MDB – sum the contents of one 10x10 ‘slice’
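The same idea can be tried on the 3x3x3 sales data from the earlier example, where the slice is 3x3 rather than 10x10. This is a sketch under the assumption that the cube is stored as a nested list in (MODEL, COLOR, DEALERSHIP) order; the variable names are illustrative.

```python
from itertools import product

MODELS = ["MINI VAN", "SPORTS COUPE", "SEDAN"]
COLORS = ["BLUE", "RED", "WHITE"]
DEALERS = ["CLYDE", "GLEASON", "CARR"]
VOLUMES = [6, 6, 2, 3, 5, 5, 2, 4, 3,
           2, 3, 2, 7, 5, 2, 4, 5, 1,
           6, 4, 2, 1, 3, 4, 2, 2, 3]

# Relational form: one (model, color, dealership, volume) record per row.
rows = [(m, c, d, v)
        for (m, c, d), v in zip(product(MODELS, COLORS, DEALERS), VOLUMES)]

# Multidimensional form: cube[model][color][dealership].
cube = [[VOLUMES[(i * 3 + j) * 3:(i * 3 + j) * 3 + 3] for j in range(3)]
        for i in range(3)]

# Relational style: every record is examined and the SEDAN ones are kept.
rdb_total = sum(v for m, c, d, v in rows if m == "SEDAN")

# Multidimensional style: the SEDAN slice is located once, then summed.
sedan_slice = cube[MODELS.index("SEDAN")]
mdb_total = sum(sum(per_color) for per_color in sedan_slice)

print(rdb_total, mdb_total)
```

Both approaches give the same total; the difference is that the slice is addressed directly instead of being filtered out of the full set of records.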




Data manipulation that requires a minute in an RDBMS may require only a few seconds in an MDB
For such analytical queries, MDBs can be an order of magnitude faster than RDBMSs
Performance benefits are greatest for queries that generate cross-tab views of data
The performance advantages offered by multidimensional technology facilitate the development of interactive decision support applications like OLAP that can be impractical in a relational environment.
Real World Benefits



Ease of data presentation and navigation
Ease of maintenance
Performance
Ease of Data Presentation and
Navigation



Intuitive spreadsheet-like data views are the natural output of MDDBs
Obtaining the same views in a relational environment requires either complex SQL or a SQL generator against the RDB to convert the table outputs into a more intuitive format
Even for end users well skilled in SQL, some forms of output, such as ranking reports (e.g. top ten, bottom 20%), are awkward to produce in basic SQL (later SQL standards added window functions such as RANK and NTILE for this)
Ease of Maintenance



Ease of maintenance because data is stored as it is
viewed
No additional overhead is required to translate user
queries into requests for data
To provide the same intuitiveness, RDBs use indexes and sophisticated joins, which require significant maintenance and storage
Performance





Multidimensional databases achieve performance
levels that are difficult to match in a relational
environment.
These high performance levels enable and encourage
OLAP applications
Performance of MDBs can be matched by RDBs through database tuning, but
  - It is not possible to tune the database for all possible ad hoc queries
  - Tuning requires the resources of an expensive DB specialist
Adding Dimensions - An Example
[Figure: three Sales Volumes cubes, each with dimensions MODEL (Mini Van, Coupe, Sedan), COLOR (Blue, Red, White), and DEALERSHIP (Carr, Gleason, Clyde), one cube each for JANUARY, FEBRUARY, and MARCH]
When is MDD (In)appropriate?
First, consider situation 1
PERSONNEL
LAST NAME    EMPLOYEE#    EMPLOYEE AGE
SMITH        01           21
REGAN        12           19
FOX          31           63
WELD         14           31
KELLY        54           27
LINK         03           56
KRANZ        41           45
LUCAS        33           41
WEISS        23           19
When is MDD (In)appropriate?
Now consider situation 2
SALES VOLUMES FOR GLEASON DEALERSHIP
MODEL           COLOR    VOLUME
MINI VAN        BLUE     6
MINI VAN        RED      5
MINI VAN        WHITE    4
SPORTS COUPE    BLUE     3
SPORTS COUPE    RED      5
SPORTS COUPE    WHITE    5
SEDAN           BLUE     4
SEDAN           RED      3
SEDAN           WHITE    2
1. Set up an MDD structure for situation 1, with LAST NAME and EMPLOYEE# as dimensions, and AGE as the measurement.
2. Set up an MDD structure for situation 2, with MODEL and COLOR as dimensions, and SALES VOLUME as the measurement.
When is MDD (In)appropriate?
MDD Structures for the Situations
Employee Age
                            EMPLOYEE #
LAST NAME    31   41   23   01   14   54   03   12   33
Smith         -    -    -   21    -    -    -    -    -
Regan         -    -    -    -    -    -    -   19    -
Fox          63    -    -    -    -    -    -    -    -
Weld          -    -    -    -   31    -    -    -    -
Kelly         -    -    -    -    -   27    -    -    -
Link          -    -    -    -    -    -   56    -    -
Kranz         -   45    -    -    -    -    -    -    -
Lucas         -    -    -    -    -    -    -    -   41
Weiss         -    -   19    -    -    -    -    -    -

Sales Volumes
                    COLOR
MODEL         Blue   Red   White
Mini Van        6     5      4
Coupe           3     5      5
Sedan           4     3      2
Note the sparseness of the Employee Age MDD representation: only 9 of its 81 cells hold a value
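The sparseness can be quantified as the fraction of cells that actually hold a value. The sketch below computes this for both situations; the dictionary-based layout and the names `personnel`, `sales` and `density` are assumptions made for illustration.

```python
# (last name, employee number, age) records from the PERSONNEL table.
personnel = [
    ("SMITH", "01", 21), ("REGAN", "12", 19), ("FOX", "31", 63),
    ("WELD", "14", 31), ("KELLY", "54", 27), ("LINK", "03", 56),
    ("KRANZ", "41", 45), ("LUCAS", "33", 41), ("WEISS", "23", 19),
]

# (model, color, volume) records from the Gleason sales table.
sales = [
    ("MINI VAN", "BLUE", 6), ("MINI VAN", "RED", 5), ("MINI VAN", "WHITE", 4),
    ("SPORTS COUPE", "BLUE", 3), ("SPORTS COUPE", "RED", 5), ("SPORTS COUPE", "WHITE", 5),
    ("SEDAN", "BLUE", 4), ("SEDAN", "RED", 3), ("SEDAN", "WHITE", 2),
]


def density(records):
    """Filled cells divided by total cells in the two-dimensional array."""
    dim1 = {r[0] for r in records}
    dim2 = {r[1] for r in records}
    filled = {(r[0], r[1]) for r in records}
    return len(filled) / (len(dim1) * len(dim2))


print(density(personnel))  # 9 filled cells out of 9 * 9
print(density(sales))      # 9 filled cells out of 3 * 3
```

The employee array is about 11% full, while the sales array is completely full, which is one way to see why only the second dataset is a natural fit for a multidimensional structure.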
When is MDD (In)appropriate?




Our sales volume dataset has a great number of
meaningful interrelationships
Interrelationships more meaningful than individual
data elements themselves.
The greater the number of inherent interrelationships
between the elements of a dataset, the more likely it
is that a study of those interrelationships will yield
business information of value to the company.
Highly interrelated dataset types should be placed in a multidimensional data structure for greatest ease of access and analysis
When is MDD (In)appropriate?



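No last name matches more than one employee number, and no employee number matches more than one last name
In contrast, there is a sales figure associated with every combination of model and color, resulting in a completely filled 3x3 matrix
For the employee data, performance suffers in the MDD: in the worst case a lookup searches 9 records in the RDB vs. 18 positions (9 last names + 9 employee numbers) in the MDB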
When is MDD (In)appropriate?



The relative performance advantages of storing
multidimensional data in a multidimensional array
increase as the size of the dataset increases
The relative performance disadvantages of storing
non-multidimensional data in a multidimensional
array increase as the size of the dataset increases.
NO inherent value of storing Non-multidimensional
data (employee data) in multidimensional arrays
When is MDD Appropriate?
The greater the number of inherent interrelationships
between the elements of a dataset, the more likely it is
that a study of those interrelationships will yield
business information of value to the company.


Most companies have limited time and resources to
devote to analyzing data
It therefore becomes critical that these highly
interrelated dataset types be placed in a
multidimensional data structure for greatest ease of
access and analysis.
When is MDD Appropriate?
Examples of applications that are suited for
multidimensional technology:






Financial Analysis and Reporting
Budgeting
Promotion Tracking
Quality Assurance and Quality Control
Product Profitability
Survey Analysis
MDD Features - Rotation
Sales Volumes
View #1 (rows: MODEL, columns: COLOR)
              Blue   Red   White
Mini Van        6     5      4
Coupe           3     5      5
Sedan           4     3      2
( ROTATE 90° )
View #2 (rows: COLOR, columns: MODEL)
          Mini Van   Coupe   Sedan
Blue          6        3       4
Red           5        5       3
White         4        5       2
•Also referred to as “data slicing.”
•Each rotation yields a different slice or two-dimensional table of data – a different face of the cube.
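For a two-dimensional array, this rotation amounts to swapping the roles of rows and columns. A minimal Python sketch, assuming View #1 is held as a nested list (the names `view1`, `view2` and `rotated_cell` are illustrative):

```python
# View #1: rows are MODEL (Mini Van, Coupe, Sedan),
# columns are COLOR (Blue, Red, White).
view1 = [
    [6, 5, 4],
    [3, 5, 5],
    [4, 3, 2],
]

# View #2 as a materialized copy: rows are COLOR, columns are MODEL.
view2 = [list(column) for column in zip(*view1)]


def rotated_cell(color_index, model_index):
    """Read View #2 without copying, by swapping the index order."""
    return view1[model_index][color_index]


for row in view2:
    print(row)
```

The `rotated_cell` function reflects the point that no rearrangement of stored data is needed: the same cells are simply addressed in a different order.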
MDD Features - Rotation
[Figure: six views (View #1 to View #6) of the Sales Volumes cube. Each successive 90° rotation changes which of the MODEL, COLOR, and DEALERSHIP dimensions face the viewer.]
MDD Features - Rotation



All six views can be obtained by simple rotation
In MDBs rotations are simple, as no rearrangement of data is required
Rotation is also referred to as “data slicing”
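One way to see why there are exactly six views: a three-dimensional cube has 3! = 6 possible orderings of its axes. The sketch below enumerates them and reads a cell through any ordering without moving data. The storage layout, the `read` helper and the ordering of the views are assumptions for illustration, and the enumeration order does not necessarily match the View # numbering in the figure.

```python
from itertools import permutations

DIMENSIONS = ("MODEL", "COLOR", "DEALERSHIP")
# Positions: MODEL = Mini Van, Coupe, Sedan; COLOR = Blue, Red, White;
# DEALERSHIP = Clyde, Gleason, Carr (order of the relational table).
VOLUMES = [6, 6, 2, 3, 5, 5, 2, 4, 3,
           2, 3, 2, 7, 5, 2, 4, 5, 1,
           6, 4, 2, 1, 3, 4, 2, 2, 3]

# Stored once, in (MODEL, COLOR, DEALERSHIP) order.
cube = [[VOLUMES[(i * 3 + j) * 3:(i * 3 + j) * 3 + 3] for j in range(3)]
        for i in range(3)]

# Every ordering of the three axes is one way of looking at the cube.
views = list(permutations(range(3)))


def read(order, position):
    """Read one cell, given a position expressed in a view's axis order."""
    stored = [0, 0, 0]
    for axis, p in zip(order, position):
        stored[axis] = p  # map the view coordinate back to storage order
    i, j, k = stored
    return cube[i][j][k]


for order in views:
    print(" x ".join(DIMENSIONS[a] for a in order))

# DEALERSHIP x MODEL x COLOR view: Gleason, Sedan, Blue.
print(read((2, 0, 1), (1, 2, 0)))
```

Only the index mapping changes between views; the stored array is never rearranged.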
MDD Features - Ranging



How do sales volumes of models painted with the new metallic blue compare with sales of normal blue models?
The user knows that only the Sports Coupe and Mini Van models have received the new paint treatment
The user also knows that only two dealers, Carr and Clyde, have an unconstrained supply of these models
MDD Features - Ranging
[Figure: Sales Volumes cube reduced to MODEL (Mini Van, Coupe), COLOR (Normal Blue, Metal Blue), and DEALERSHIP (Carr, Clyde)]
• The end user selects the desired positions along each dimension.
• Also referred to as "data dicing."
• The data is scoped down to a subset grouping.
MDD Features - Ranging




The reduced array can now be rotated and used in computations in the same way as the parent array
Referred to as “data dicing”, as data is scoped down to a subset grouping
A complex SQL query is required in an RDB
Performance is better in an MDB, as less resource-consuming searches are required
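Ranging can be sketched as selecting a list of positions along each dimension and keeping only the cells at their intersections. The source gives no sales figures for the metallic blue models, so the cell values below are placeholders generated from a counter; the `dice` function and all names are hypothetical.

```python
from itertools import product

MODELS = ["MINI VAN", "SPORTS COUPE", "SEDAN"]
COLORS = ["NORMAL BLUE", "METAL BLUE", "RED", "WHITE"]
DEALERS = ["CLYDE", "GLEASON", "CARR"]

# Placeholder volumes (NOT real data): each cell just gets a running number.
cube = {key: n for n, key in enumerate(product(MODELS, COLORS, DEALERS))}


def dice(cube, models, colors, dealers):
    """Return the sub-cube for the selected positions on each dimension."""
    return {key: cube[key] for key in product(models, colors, dealers)}


reduced = dice(
    cube,
    models=["MINI VAN", "SPORTS COUPE"],
    colors=["NORMAL BLUE", "METAL BLUE"],
    dealers=["CARR", "CLYDE"],
)
print(len(cube), len(reduced))
```

The reduced array has the same shape of keys as the parent, so the same lookups, rotations and sums apply to it unchanged.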
MDD Features - Roll-Ups & Drill-Downs
Users want different views of the same data
E.g., sales volume by model vs. sales volume by dealership
Many times views are similar
Sales volume by dealership vs. sales volume by district
There is a natural relationship between Sales Volumes at the DEALERSHIP level and Sales Volumes at the DISTRICT level
Sales Volumes for all the dealerships in a district sum to the Sales Volumes for that district
MDD Features - Roll-Ups & Drill-Downs




Multidimensional database technology is specially
designed to facilitate the handling of natural
relationships
Define two related aggregates on the same
dimension
One aggregation is dealership and the other district
District is at a higher level of aggregation than
dealership
MDD Features - Roll-Ups & Drill-Downs
ORGANIZATION DIMENSION
REGION: Midwest
  DISTRICT: Chicago
    DEALERSHIP: Clyde, Gleason
  DISTRICT: St. Louis
    DEALERSHIP: Carr, Levi
  DISTRICT: Gary
    DEALERSHIP: Lucas, Bolton
• The figure presents a definition of a hierarchy within the organization dimension.
• Aggregations are perceived as being part of the same dimension.
• Moving up and moving down levels in a hierarchy is referred to as “roll-up” and “drill-down.”
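A roll-up along such a hierarchy is a sum over the children of each node, and a drill-down returns the children's own figures. The sketch below assumes the dealership-to-district assignment suggested by the figure (Clyde and Gleason under Chicago, Carr and Levi under St. Louis, Lucas and Bolton under Gary). The totals for Clyde, Gleason and Carr are summed from the sales table earlier in the deck; the figures for Levi, Lucas and Bolton are invented placeholders, since the source gives none.

```python
# Assumed hierarchy within the organization dimension.
HIERARCHY = {
    "Midwest": {
        "Chicago": ["Clyde", "Gleason"],
        "St. Louis": ["Carr", "Levi"],
        "Gary": ["Lucas", "Bolton"],
    }
}

# Sales volume per dealership. Clyde, Gleason and Carr are totals from the
# 27-record table; Levi, Lucas and Bolton are hypothetical values.
dealership_volume = {
    "Clyde": 33, "Gleason": 37, "Carr": 24,
    "Levi": 10, "Lucas": 12, "Bolton": 8,
}


def roll_up_district(region, district):
    """Roll up: sum the dealership volumes belonging to one district."""
    return sum(dealership_volume[d] for d in HIERARCHY[region][district])


def roll_up_region(region):
    """Roll up one more level: sum the district totals of a region."""
    return sum(roll_up_district(region, dist) for dist in HIERARCHY[region])


def drill_down(region, district):
    """Drill down: show the dealership figures behind a district total."""
    return {d: dealership_volume[d] for d in HIERARCHY[region][district]}


print(roll_up_district("Midwest", "Chicago"))
print(drill_down("Midwest", "Chicago"))
print(roll_up_region("Midwest"))
```

Because district and region are defined as levels of the same dimension, no separate table is needed for each level; the higher-level figures are derived from the dealership cells.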
Queries



High degree of structure in MDB makes the query
language very simple and efficient
Query language is intuitive
Output is immediately useful to end user
Queries: Example

Display sales volume by model for each dealership
PRINT TOTAL.(SALES_VOLUME KEEP MODEL DEALERSHIP)
Queries: Example

Corresponding SQL
SELECT MODEL, DEALERSHIP, SUM(SALES_VOLUME)
FROM SALES_VOLUME
GROUP BY MODEL, DEALERSHIP
ORDER BY MODEL, DEALERSHIP
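The SQL version can be tried out with Python's built-in `sqlite3` module against the 27 sales records from the earlier example. The table layout (columns MODEL, COLOR, DEALERSHIP, SALES_VOLUME) is an assumption, since the deck does not show a schema. The multidimensional query language shown above is product-specific and is not run here; instead, a plain Python aggregation over the COLOR dimension stands in for "total, keeping MODEL and DEALERSHIP".

```python
import sqlite3
from itertools import product

MODELS = ["MINI VAN", "SPORTS COUPE", "SEDAN"]
COLORS = ["BLUE", "RED", "WHITE"]
DEALERS = ["CLYDE", "GLEASON", "CARR"]
VOLUMES = [6, 6, 2, 3, 5, 5, 2, 4, 3,
           2, 3, 2, 7, 5, 2, 4, 5, 1,
           6, 4, 2, 1, 3, 4, 2, 2, 3]
rows = [(m, c, d, v)
        for (m, c, d), v in zip(product(MODELS, COLORS, DEALERS), VOLUMES)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES_VOLUME "
    "(MODEL TEXT, COLOR TEXT, DEALERSHIP TEXT, SALES_VOLUME INTEGER)"
)
conn.executemany("INSERT INTO SALES_VOLUME VALUES (?, ?, ?, ?)", rows)

# The SQL from the slide, unchanged.
result = conn.execute(
    """
    SELECT MODEL, DEALERSHIP, SUM(SALES_VOLUME)
    FROM SALES_VOLUME
    GROUP BY MODEL, DEALERSHIP
    ORDER BY MODEL, DEALERSHIP
    """
).fetchall()
conn.close()

# The same totals computed directly: collapse the COLOR dimension.
totals = {}
for m, c, d, v in rows:
    totals[(m, d)] = totals.get((m, d), 0) + v

for model, dealership, total in result:
    print(model, dealership, total)
```

Both routes produce nine totals, one per model and dealership pair, so the comparison between the two query styles is about how the request is expressed rather than about the answer.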
Pros/Cons of MDD





Cognitive Advantages for the User
Ease of Data Presentation and Navigation, Time
dimension
Performance
Less flexible
Requires greater initial effort