2/10/2011 The Source of Business Intelligence Contributors to Effective Decision Making

The Source of Business Intelligence
Contributors to Effective Decision Making
• Specific goals
• Measures to assess progress toward goals
  • Specific – not vague
  • Able to implement
• Foundational information
  • used as basis for decision making
• Information from feedback
  • based on implemented measures
  • used to evaluate a decision made
Business Intelligence provides foundation information and feedback information
Where Does the Information for Business Intelligence Come From?
• Customer satisfaction surveys
• Market research
• Customer behavior (e.g., buying behavior)
• Transactional data
  • Holds a wealth of information about customers
Transactional Data
Definition
• “Transactional data is the information stored to track the interactions, or business transactions, carried out by an organization.”
• Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.
Online Transaction Processing
Definition
• “Online transaction processing (OLTP) systems record business interactions as they happen. They support the day-to-day operation of an organization.”
• Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.
Why Do We Need Another System for BI if the Data Is Already in the OLTP Systems?
OLTP
• Optimized for efficiently processing and storing transactions
• Highly normalized relational designs
• Not designed to efficiently deliver aggregates of large numbers of data points
  • Can slow the system down
• OLTP systems support day-to-day business operations
• Some data in OLTP systems is archived so that the system continues to operate efficiently
• Transactional data may not be all in one place (accounting system, inventory system, etc.)
BI
• Data is not “per transaction” data
  • Aggregated, summarized
  • Cumulative transaction results
• Over time – so need data that coincides with time in the past
• Need data across functional areas for BI
Aggregate
Definition
• “An aggregate is a number that is calculated from amounts in many detail records. An aggregate may be a sum of many numbers, but it can also be derived using other arithmetic operations or even from a count of the number of items in a group.”
• Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.
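The aggregate definition above can be illustrated with a minimal Python sketch. The detail records and the field names (`order_id`, `amount`) are invented for illustration and do not come from Larson's text.

```python
from statistics import mean

# Hypothetical detail records: one row per sales transaction.
detail_records = [
    {"order_id": 1, "amount": 120.0},
    {"order_id": 2, "amount": 80.0},
    {"order_id": 3, "amount": 100.0},
]

amounts = [row["amount"] for row in detail_records]

# Three aggregates calculated from the same detail records.
total_sales = sum(amounts)          # a sum of many numbers
average_sale = mean(amounts)        # derived with another arithmetic operation
order_count = len(detail_records)   # a count of the items in the group

print(total_sales, average_sale, order_count)
```

Each of the three values is a single number summarizing many detail rows, which is the kind of result a BI system is asked for far more often than any individual transaction.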
Solution
Extract data from OLTP systems and put it in a specially designed
data mart.
Data Mart
• Definition
• “A data mart is a body of historical data in an electronic repository that does not participate in the daily operations of the organization. Instead, this data is used to create business intelligence. The data in the data mart usually applies to a specific aspect of the business.”
• Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.
http://www.outcomesinc.com/odis_datamart
Data Mart vs. Data Warehouse
• Data marts are smaller development projects that focus on a particular aspect of a business
• A data warehouse is a one-stop repository for all the historical data of a business
Data Mart Features
• Built for speed of access
• Can be a relational database
• Requires fewer table joins when data is retrieved for a view that is the basis of a report
• Denormalization is OK when done to speed up retrieval
• Design is organized around fact tables and dimension tables
• Result is a “star” schema
• Multiple stars form “snowflakes”
How Does Data Get Into the Data Mart?
• Loaded at regular intervals
• Based on extraction of data from OLTP systems
  • Can be from disparate OLTP systems
• Data load = periodic copy of data from OLTP into data mart
  • Automated, generally speaking
  • Run at off-peak hours
• Once in the data mart, the data is largely static
• There is some latency in the data mart data
  • Amount of latency is planned around business intelligence needs
Data Cleansing
Data Mart = OLTP + OLTP
• Data marts often contain data from multiple OLTP systems
• Enables complex measures to be calculated and compared
• Data problems must be resolved before data is entered into the data mart
Problems
• Different data types may exist across systems for the same quantities
• Different unique identifiers for same entity
• Different time periods used
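The three problems listed above can be sketched in Python. Everything here is an assumption made for illustration: the two source systems, their field names, the `C-0042` identifier format, and the choice of a monthly period as the shared format.

```python
from datetime import date

# Hypothetical rows from two OLTP systems describing the same customer.
accounting_rows = [{"cust_id": "C-0042", "amount": "150.50", "period": "2010-11"}]
inventory_rows = [{"customer": 42, "amount": 99.5, "day": date(2010, 11, 17)}]

def clean_accounting(row):
    """Convert an accounting row to the shared format."""
    return {
        "customer_key": int(row["cust_id"].split("-")[1]),  # "C-0042" -> 42
        "amount": float(row["amount"]),                      # text -> number
        "month": row["period"],                              # already monthly
    }

def clean_inventory(row):
    """Convert an inventory row to the shared format."""
    return {
        "customer_key": row["customer"],         # already a plain integer
        "amount": float(row["amount"]),
        "month": row["day"].strftime("%Y-%m"),   # daily -> monthly period
    }

cleansed = ([clean_accounting(r) for r in accounting_rows]
            + [clean_inventory(r) for r in inventory_rows])
print(cleansed)
```

After cleansing, both rows use the same identifier, the same numeric type, and the same time period, so they can be compared or summed in the data mart.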
Data Cleansing
• Definition
• “Data cleansing removes inconsistencies and errors from transactional data so it has the consistency necessary for use in a data mart.”
• Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.
Extract – Transform – Load
ETL
Extract → Transform (Data Cleansing) → Load
ETL
Definition
• “The Extract, Transform, and Load (ETL) process:
1. Extracts data to copy from one or more OLTP systems,
2. Performs any required data cleansing to transform the data into a consistent format,
3. And loads the cleansed data by inserting it into the data mart.”
• Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.
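The three ETL steps above might look like the following minimal sketch, which uses Python's built-in `sqlite3` module with in-memory databases standing in for the OLTP system and the data mart. The table names (`orders`, `sales_fact`), the columns, and the cleansing rules (trim and normalize product names, convert text amounts to numbers, skip rows with no amount) are assumptions chosen for illustration, not a prescribed design.

```python
import sqlite3

def extract(oltp):
    """Step 1: copy the rows of interest out of the OLTP system."""
    return oltp.execute("SELECT order_id, product, amount FROM orders").fetchall()

def transform(rows):
    """Step 2: cleanse the rows into one consistent format."""
    cleaned = []
    for order_id, product, amount in rows:
        if amount is None:  # assumed rule: skip rows with a missing amount
            continue
        cleaned.append((order_id, product.strip().title(), float(amount)))
    return cleaned

def load(mart, rows):
    """Step 3: insert the cleansed rows into the data mart."""
    mart.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    mart.commit()

# Hypothetical OLTP source with inconsistent product names and text amounts.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (order_id INTEGER, product TEXT, amount TEXT)")
oltp.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, " mythic world ", "19.50"), (2, "MYTHIC WORLD", "5.50"), (3, "Mythic World", None)],
)

# Hypothetical data mart target.
mart = sqlite3.connect(":memory:")
mart.execute("CREATE TABLE sales_fact (order_id INTEGER, product TEXT, amount REAL)")

load(mart, transform(extract(oltp)))
loaded = mart.execute(
    "SELECT product, SUM(amount) FROM sales_fact GROUP BY product"
).fetchall()
print(loaded)
```

Because the product names are normalized during the transform step, the two valid orders end up grouped under a single product in the data mart.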
Data Mart Structure
• Based on four categories of data used for BI
  • Measures
  • Dimensions
  • Attributes
  • Hierarchies
Measures
Definition
• “A measure is a numeric quantity expressing some aspect of the organization’s performance. The information represented by this quantity is used to support or evaluate the decision making and performance of the organization.”
• Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.
• A measure is often also called a fact
Dimensions
Definition
• A dimension is a categorization used to spread out an aggregate measure to reveal its constituent parts.
• Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.
Dimensions
• Total Sales (a fact)
• Total Sales for 2005 (by year adds time as a dimension)
• Total Sales of Mythic World for 2005 (by product by year adds product as a second dimension)
Find the Total Sales by Product by Year and by Region
[Figure: total sales on three axes: Region (South, Central), Product (Mythic World), Year (2005)]
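Spreading the total-sales measure across one, two, and three dimensions, as in the preceding slides, can be sketched in plain Python. The sales amounts, the second product ("Other Product"), and the year 2006 are invented for illustration; only Mythic World, 2005, South, and Central appear in the slides.

```python
from collections import defaultdict

# Hypothetical sales records; the amounts are made up.
sales = [
    {"year": 2005, "product": "Mythic World", "region": "South", "amount": 100},
    {"year": 2005, "product": "Mythic World", "region": "Central", "amount": 150},
    {"year": 2005, "product": "Other Product", "region": "South", "amount": 50},
    {"year": 2006, "product": "Mythic World", "region": "South", "amount": 200},
]

def total_by(records, *dimensions):
    """Spread the total-sales measure across the given dimensions."""
    totals = defaultdict(int)
    for row in records:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["amount"]
    return dict(totals)

total_sales = total_by(sales)                           # no dimension: one grand total
by_year = total_by(sales, "year")                       # time as a dimension
by_product_year = total_by(sales, "product", "year")    # product as a second dimension
by_all = total_by(sales, "product", "year", "region")   # region as a third dimension

print(by_product_year[("Mythic World", 2005)])
```

Each added dimension splits the same total into finer pieces; summing the pieces at any level gives back the grand total.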
The Star Schema
• A relational database schema that holds measures and dimensions in a data mart
• Fact tables contain measures
• Dimension tables
  • One table per dimension
Star Schema
[Figure: star schema with 3 dimensions and 1 fact table]
Fact Table
• Has a column for the measure
  • Could be multiple measures
• Has fields for dimension keys (FKs)
• PK is a composite key made up of all the dimension FKs
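A star schema with three dimension tables and one fact table might be declared as in the sketch below, using Python's built-in `sqlite3` module. The table and column names (`dim_year`, `dim_product`, `dim_region`, `fact_sales`, `total_sales`) and the sample rows are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_year    (year_key    INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE dim_region  (region_key  INTEGER PRIMARY KEY, description TEXT);

CREATE TABLE fact_sales (
    year_key    INTEGER REFERENCES dim_year(year_key),        -- dimension FK
    product_key INTEGER REFERENCES dim_product(product_key),  -- dimension FK
    region_key  INTEGER REFERENCES dim_region(region_key),    -- dimension FK
    total_sales REAL,                                          -- the measure
    PRIMARY KEY (year_key, product_key, region_key)            -- composite PK of all FKs
);

INSERT INTO dim_year    VALUES (1, '2005');
INSERT INTO dim_product VALUES (1, 'Mythic World');
INSERT INTO dim_region  VALUES (1, 'South');
INSERT INTO fact_sales  VALUES (1, 1, 1, 100.0);
""")

# A second fact row for the same combination of dimension members
# violates the composite primary key.
try:
    db.execute("INSERT INTO fact_sales VALUES (1, 1, 1, 999.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Reading the measure back with its dimension descriptions takes one join per dimension.
rows = db.execute("""
    SELECT p.description, y.description, r.description, f.total_sales
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_year    y ON y.year_key    = f.year_key
    JOIN dim_region  r ON r.region_key  = f.region_key
""").fetchall()
print(rows, duplicate_rejected)
```

The composite primary key guarantees at most one fact row per combination of dimension members, which is what lets each row stand for one cell of the product-by-year-by-region view.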
Dimension Tables
• More than one because you usually want to be able to see the data from more than one perspective
• Field for unique identifier
• Field for description
Attributes
Definition
• “An attribute is an additional piece of information pertaining to a dimension member that is not the unique identifier or the description of the member.”
• Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.
• Often included as filtering agents
• Become additional fields in the corresponding dimension table
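Using an attribute as a filter could look like the following sketch, again with Python's built-in `sqlite3` module. The `retail_price` attribute, the table names, the helper function, and all sample values are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,  -- unique identifier
    description  TEXT,                 -- description
    retail_price REAL                  -- attribute (hypothetical)
);
CREATE TABLE fact_sales (product_key INTEGER, total_sales REAL);

INSERT INTO dim_product VALUES (1, 'Mythic World', 49.0), (2, 'Other Product', 9.0);
INSERT INTO fact_sales  VALUES (1, 250.0), (2, 50.0);
""")

def sales_for_products_priced_above(conn, min_price):
    """Total the sales measure, filtered by the retail_price attribute."""
    row = conn.execute(
        """
        SELECT SUM(f.total_sales)
        FROM fact_sales f
        JOIN dim_product p ON p.product_key = f.product_key
        WHERE p.retail_price > ?
        """,
        (min_price,),
    ).fetchone()
    return row[0]

premium_sales = sales_for_products_priced_above(db, 20.0)
all_sales = sales_for_products_priced_above(db, 0.0)
print(premium_sales, all_sales)
```

The attribute lives in the dimension table as one more column, so filtering on it needs no extra table, only a `WHERE` clause on the join that the report already performs.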
Hierarchies
Definition
• “A hierarchy is a structure made up of two or more levels of related dimensions. A dimension at an upper level of the hierarchy completely contains one or more dimensions from the next lower level of the hierarchy.”
• Larson, B. (2008). Delivering Business Intelligence with Microsoft SQL Server 2008. New York: McGraw-Hill Osborne.
• Example: Year – Quarter – Month
• Example: ProductType – ProductSubType – Product
• Enable drill-down and roll-up access to more granularity or more aggregation as needed
Structuring Hierarchies
• PK is at the lowest level of the hierarchy
• FK in fact table must point to the lowest level of the hierarchy
• Fact table will have one row = one set of measures for each unique combination of members at the lowest level of all the hierarchies
• Measures at higher levels of the hierarchy are calculated rather than stored
  • Aggregates: Sum, Average
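The idea that higher levels are calculated rather than stored can be sketched with the Year – Quarter – Month hierarchy. In this minimal Python example, the monthly figures and the function names are invented for illustration.

```python
from collections import defaultdict

# Fact values stored only at the lowest level of the time hierarchy (month).
# Keys are (year, month); the amounts are made up.
monthly_sales = {
    (2005, 1): 10.0,
    (2005, 2): 20.0,
    (2005, 3): 30.0,
    (2005, 4): 40.0,
    (2006, 1): 5.0,
}

def quarter_of(month):
    """Map a month number (1-12) to its quarter (1-4)."""
    return (month - 1) // 3 + 1

def roll_up_to_quarter(facts):
    """Calculate quarter totals from the stored month-level facts."""
    totals = defaultdict(float)
    for (year, month), amount in facts.items():
        totals[(year, quarter_of(month))] += amount
    return dict(totals)

def roll_up_to_year(facts):
    """Calculate year totals from the stored month-level facts."""
    totals = defaultdict(float)
    for (year, _month), amount in facts.items():
        totals[year] += amount
    return dict(totals)

by_quarter = roll_up_to_quarter(monthly_sales)
by_year = roll_up_to_year(monthly_sales)
print(by_quarter, by_year)
```

Rolling up moves from month to quarter to year; drilling down reverses the direction and returns to the stored month-level rows.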
Snowflake Schema
• Each level of a hierarchy is stored in a separate dimension table
• Advantages
  • Good relational design
  • Easier to maintain
• Disadvantages
  • Requires a number of table joins when performing aggregation operations
  • Performance issue in large databases
• Business intelligence → make information readily available to decision makers
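The join cost named under Disadvantages can be seen in a small sketch of the ProductType – ProductSubType – Product hierarchy stored as a snowflake, using Python's built-in `sqlite3` module. The table names and the member names other than Mythic World ('Games', 'Board Games', 'Card Games', 'Other Product') are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- One dimension table per level of the hierarchy.
CREATE TABLE dim_product_type (
    type_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE dim_product_subtype (
    subtype_key INTEGER PRIMARY KEY, description TEXT,
    type_key INTEGER REFERENCES dim_product_type(type_key));
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, description TEXT,
    subtype_key INTEGER REFERENCES dim_product_subtype(subtype_key));

-- The fact table points at the lowest level only.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    total_sales REAL);

INSERT INTO dim_product_type    VALUES (1, 'Games');
INSERT INTO dim_product_subtype VALUES (1, 'Board Games', 1), (2, 'Card Games', 1);
INSERT INTO dim_product         VALUES (1, 'Mythic World', 1), (2, 'Other Product', 2);
INSERT INTO fact_sales          VALUES (1, 250.0), (2, 50.0);
""")

# Aggregating at the top level walks the whole chain: three joins.
sales_by_type = db.execute("""
    SELECT t.description, SUM(f.total_sales)
    FROM fact_sales f
    JOIN dim_product p         ON p.product_key = f.product_key
    JOIN dim_product_subtype s ON s.subtype_key = p.subtype_key
    JOIN dim_product_type t    ON t.type_key    = s.type_key
    GROUP BY t.description
""").fetchall()
print(sales_by_type)
```

A denormalized star design would keep the subtype and type as columns of a single product dimension table, so the same total would need one join instead of three.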