Business Intelligence
Unit 1
Important Concepts
INTRODUCTION
• Organizations need business intelligence
• Business intelligence (BI) – knowledge about
your customers, competitors, business
partners, competitive environment, and
internal operations that lets you make effective,
important, and strategic business decisions
INTRODUCTION
• IT tools help process information to create
business intelligence according to:
– OLTP
– OLAP
INTRODUCTION
• Online transaction processing (OLTP) – the
gathering of input information, processing
that information, and updating existing
information to reflect the gathered and
processed information
– Databases support OLTP
– Operational databases – databases that support
OLTP
INTRODUCTION
• Online analytical processing (OLAP) – the
manipulation of information to support
decision making
– Databases can support some OLAP
– Data warehouses only support OLAP, not OLTP
– Data warehouses are special forms of databases
that support decision making
What Is a Data Warehouse?
• Data warehouse – logical collection of
information – gathered from operational
databases – used to create business intelligence
that supports business analysis activities and
decision-making tasks
“A data warehouse is simply a single, complete, and
consistent store of data obtained from a variety of
sources and made available to end users in a way
they can understand and use it in a business
context.”
-- Barry Devlin, IBM Consultant
What Is a Data Warehouse?
• Multidimensional
• Rows and columns
• Also layers
• Often called hypercubes
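The idea of rows, columns, and layers can be sketched in a few lines of Python. This is only an illustration: the cube contents, the dimension names (product, region, quarter), and the helper functions `slice_layer` and `roll_up` are assumptions made for the example, not part of any particular warehouse product.

```python
from collections import defaultdict

# Hypothetical sales cube: each cell is addressed by (product, region, quarter).
# Product and region play the role of rows and columns; quarter is the layer.
cube = {
    ("soup", "northeast", "Q1"): 120,
    ("soup", "northeast", "Q2"): 80,
    ("soup", "south", "Q1"): 60,
    ("crackers", "northeast", "Q1"): 45,
    ("crackers", "south", "Q2"): 30,
}

def slice_layer(cube, quarter):
    """Return the two-dimensional (product, region) table for one layer."""
    return {(p, r): v for (p, r, q), v in cube.items() if q == quarter}

def roll_up(cube, axis):
    """Sum cell values, keeping only the dimension at index `axis`."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[axis]] += value
    return dict(totals)

q1 = slice_layer(cube, "Q1")   # one layer of the cube
by_product = roll_up(cube, 0)  # totals per product across regions and quarters
```

Slicing fixes one dimension and returns an ordinary table; rolling up collapses the other dimensions into totals.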
Data Warehouses
• A record of an enterprise's past transactional and operational
information
• Designed to favor efficient data analysis and reporting
• Not meant for current "live" data
Data Warehouses
• Large amounts of data, sometimes subdivided into smaller
logical units (dependent data marts)
What Are Data-Mining Tools?
• Data-mining tools – software tools that you use to query
information in a data warehouse
– Query-and-reporting tools
– Intelligent agents
– Multidimensional analysis tools
– Statistical tools
Data Warehouses
Components of a data warehouse:
• Sources -> Data Source Interaction
• Data Transformation
• Data Warehouse (Data Storage)
• Reporting (Data Presentation)
• Metadata
Data Warehouses
ADVANTAGES
Complete control over the four main areas of data management systems:
• Clean data
• Query processing: multiple options
• Indexes: multiple types
• Security: data and access
Data Warehouses
DISADVANTAGES
• Adding new data sources takes time and carries a high cost
• Data owners lose control over their data, raising ownership, security, and
privacy issues
• Long initial implementation time and associated high cost
• Difficult to accommodate changes in data types and ranges, data source
schema, indexes, and queries
OLTP vs. OLAP
• OLTP: Online Transaction Processing
– Describes processing at operational sites
• OLAP: Online Analytical Processing
– Describes processing at the warehouse
OLTP Database vs. Data Warehouse
• Both are typically relational databases, which group data using
common attributes found in the data set
• Their objectives are different
OLTP database: designed for real-time business operations
Data Warehouse: designed for analysis of business measures by
categories and attributes
OLTP database
• Mostly updates
• Many small transactions
• MB to GB of data
Data Warehouse
• Mostly reads
• Queries are long and complex
• GB to TB of data
OLTP database
• Current snapshot
• Raw data
• Thousands of users (e.g., clerical users)
Data Warehouse
• History
• Summarized, reconciled data
• Hundreds of users (e.g., decision-makers, analysts)
SUMMARY
Four questions for you: match each statement to an OLTP database or a data warehouse.
Question 1
1. Designed for analysis of business measures by categories and attributes
2. Designed for real-time business operations
Answer: 1 is the data warehouse; 2 is the OLTP database.
Question 2
1. Optimized for a common set of transactions, usually adding or retrieving
a single row at a time per table.
2. Optimized for bulk loads and large, complex, unpredictable queries that
access many rows per table.
Answer: 1 is the OLTP database; 2 is the data warehouse.
Question 3
1. Optimized for validation of incoming data during transactions; uses
validation data tables.
2. Loaded with consistent, valid data; requires no real-time validation.
Answer: 1 is the OLTP database; 2 is the data warehouse.
Question 4
1. Supports few concurrent users relative to OLTP.
2. Supports thousands of concurrent users.
Answer: 1 is the data warehouse; 2 is the OLTP database.
Data, Information & Knowledge
• Data is just symbols
• Information is data that are processed to
be useful; provides answers to "who",
"what", "where", and "when" questions
• Knowledge is application of data and
information; answers "how" questions
Data
• Data is raw. It simply exists and has no
significance beyond its existence (in and of
itself). It can exist in any form, usable or
not. It does not have meaning of itself.
• In computer parlance, a spreadsheet
generally starts out by holding data.
Information
• Information is data that has been given
meaning by way of relational connection.
This "meaning" can be useful, but does
not have to be.
• In computer parlance, a relational database
makes information from the data stored
within it.
Knowledge
• Knowledge is the appropriate collection of
information, such that its intent is to be
useful.
• For example, summaries of information in a
database, or modeling and simulation tools
that exercise some type of stored knowledge.
Examples - Supermarket
• OLTP
– Event is 3 cans of soup and 1 box of crackers
bought; update database to reflect that event
• OLAP
– Last winter in all stores in the northeast, how
many customers bought soup and crackers
together?
• Data Mining
– Are there any interesting combinations of
foods that customers frequently bought
together?
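The three kinds of work in the supermarket example can be sketched with Python's built-in `sqlite3` module. The `purchases` table, its columns, and the sample baskets are assumptions made for illustration; store and date columns are left out for brevity, so the "last winter, northeast" filter is not shown.

```python
import sqlite3
from collections import Counter
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (basket_id INTEGER, item TEXT, quantity INTEGER)")

def record_purchase(basket_id, items):
    """OLTP: one small transaction that records a single checkout event."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO purchases VALUES (?, ?, ?)",
            [(basket_id, item, qty) for item, qty in items.items()],
        )

record_purchase(1, {"soup": 3, "crackers": 1})
record_purchase(2, {"soup": 1, "bread": 1})
record_purchase(3, {"soup": 2, "crackers": 2, "bread": 1})

# OLAP: a read-only question over many rows. How many baskets held both items?
both = conn.execute(
    """SELECT COUNT(*) FROM (
           SELECT basket_id FROM purchases
           WHERE item IN ('soup', 'crackers')
           GROUP BY basket_id
           HAVING COUNT(DISTINCT item) = 2
       )"""
).fetchone()[0]

# Data mining: no item names are given up front; count every pair that co-occurs.
baskets = {}
for basket_id, item in conn.execute("SELECT basket_id, item FROM purchases"):
    baskets.setdefault(basket_id, set()).add(item)
pair_counts = Counter(
    pair for items in baskets.values() for pair in combinations(sorted(items), 2)
)
```

The OLTP step writes a few rows per event, the OLAP step reads across all rows to answer a stated question, and the data-mining step searches for patterns nobody named in advance.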
Copyright © 2005 Pearson Addison-Wesley. All rights reserved.
11 Database design rules
• Rule 1: What is the nature of the application
(OLTP or OLAP)?
• Rule 2: Break your data into logical pieces,
make life simpler
• Rule 3: Do not overdo Rule 2
• Rule 4: Treat duplicate non-uniform data as
your biggest enemy
• Rule 5: Watch for data separated by
separators
Database design rules
• Rule 6: Watch for partial dependencies
• Rule 7: Choose derived columns carefully
• Rule 8: Do not be rigid about avoiding
redundancy if performance is key
• Rule 9: Multidimensional data is a different
beast altogether
• Rule 10: Centralize name-value table design
• Rule 11: For unlimited hierarchical data,
use a self-referencing PK and FK
Normal form examples
• 1NF: First name, middle name, and surname go in different columns
• 2NF: Every non-key column should depend on the
full primary key (roll no. and standard); a syllabus
column depends only on the standard
• 3NF: An average column depends on marks and
no. of subjects, which are non-key columns
• Normalization rules are important
guidelines, but treating them as set in
stone invites trouble.
Rule 1: What is the nature of the
application (OLTP or OLAP)?
• Transactional: the end user is more interested in CRUD,
i.e., creating, reading, updating, and deleting records.
Such a database is called OLTP.
• Analytical: the end user is more interested in analysis,
reporting, forecasting, etc.
– Fewer inserts and updates.
– The main intention is to fetch and analyze data as fast
as possible.
– Such a database is called OLAP.
Rule 1: What is the nature of the
application (OLTP or OLAP)?
In other words, if inserts, updates, and deletes
are more prominent, go for a normalized table
design; otherwise, create a flat, denormalized database
structure.
Rule 2: Break your data into logical
pieces, make life simpler
• This is the first rule of 1st normal form.
• If your queries use too many string-parsing
functions like substring, charindex, etc., apply this
rule.
• E.g., a query for student names containing “Koirala” but
not “Harisingh” is very complex when the whole name
sits in one field.
• A better approach is to break this field
into further logical pieces so you can write clean and
optimal queries.
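A minimal sketch of the difference, using Python's built-in `sqlite3` module. The table names (`students_flat`, `students`), column names, and sample students are hypothetical, chosen only to mirror the “Koirala”/“Harisingh” example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Before: the whole name is stuffed into one field.
conn.execute("CREATE TABLE students_flat (roll_no INTEGER PRIMARY KEY, full_name TEXT)")
# After: the name is broken into logical pieces.
conn.execute(
    "CREATE TABLE students (roll_no INTEGER PRIMARY KEY, "
    "first_name TEXT, middle_name TEXT, surname TEXT)"
)

people = [
    (1, "Shiv", "Harisingh", "Koirala"),
    (2, "Raju", "", "Koirala"),
    (3, "Sham", "Koirala", "Sharma"),
]
for roll_no, first, middle, last in people:
    full = " ".join(part for part in (first, middle, last) if part)
    conn.execute("INSERT INTO students_flat VALUES (?, ?)", (roll_no, full))
    conn.execute("INSERT INTO students VALUES (?, ?, ?, ?)", (roll_no, first, middle, last))

# Single field: pattern matching on text, which cannot tell which part matched.
flat_hits = [r[0] for r in conn.execute(
    "SELECT roll_no FROM students_flat "
    "WHERE full_name LIKE '%Koirala%' AND full_name NOT LIKE '%Harisingh%' "
    "ORDER BY roll_no"
)]

# Separate columns: a plain equality test on the piece that matters.
split_hits = [r[0] for r in conn.execute(
    "SELECT roll_no FROM students "
    "WHERE surname = 'Koirala' AND middle_name <> 'Harisingh' ORDER BY roll_no"
)]
```

With the sample rows, the pattern-matching query also returns the student whose middle name happens to be Koirala, while the column-based query returns only the student with that surname.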
Rule 3: Do not overdo Rule 2
• Is decomposition needed? It should be logical.
• It is rare that you will operate on the ISD code of a phone
number separately (unless your application demands
it). So it is wise to leave the number as it is; splitting it
can lead to more complications.
Rule 4: Treat duplicate non-uniform data as
your biggest enemy
• Focus on duplicate data and refactor it; it creates
confusion.
• For instance, “5th Standard” and “Fifth standard” mean the
same thing.
Rule 4: Treat duplicate non-uniform data as your
biggest enemy
• One solution: move the data into a separate
master table and refer to it via foreign
keys. E.g., create a new master table called “Standards” and
link to it using a simple foreign key.
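One way this could look, sketched with Python's built-in `sqlite3` module. The table layout, the `CANONICAL` cleanup mapping, and the helper `standard_id_for` are assumptions made for illustration; only the “Standards” master table and the foreign key come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.execute("CREATE TABLE standards (standard_id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.execute(
    "CREATE TABLE students (roll_no INTEGER PRIMARY KEY, name TEXT, "
    "standard_id INTEGER NOT NULL REFERENCES standards(standard_id))"
)

# Hypothetical mapping used once, while cleaning the old free-text values.
CANONICAL = {"5th standard": "5th Standard", "fifth standard": "5th Standard"}

def standard_id_for(raw_value):
    """Map a free-text standard to its single row in the master table."""
    name = CANONICAL.get(raw_value.strip().lower(), raw_value.strip())
    conn.execute("INSERT OR IGNORE INTO standards (name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT standard_id FROM standards WHERE name = ?", (name,)
    ).fetchone()[0]

for roll_no, student, raw in [(1, "Shiv", "5th Standard"), (2, "Raju", "Fifth standard")]:
    conn.execute(
        "INSERT INTO students VALUES (?, ?, ?)", (roll_no, student, standard_id_for(raw))
    )

standard_count = conn.execute("SELECT COUNT(*) FROM standards").fetchone()[0]
distinct_refs = conn.execute("SELECT COUNT(DISTINCT standard_id) FROM students").fetchone()[0]
```

Both spellings end up pointing at the same master row, so later queries compare an integer key instead of guessing at text variants.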
Rule 5: Watch for data separated by separators
• The 2nd rule of 1st normal form says to avoid repeating groups.
• Too much data is stuffed into the syllabus column. Such fields are
termed “repeating groups”. Queries that manipulate this data
are complex, and their performance
degrades.
Rule 5: Watch for data separated by separators
• Columns stuffed with separator-delimited data need
special attention. A better approach is to move
those fields to a different table and link them with keys for
better management.
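A sketch of that move, again with Python's built-in `sqlite3` module. The comma separator, the table names `standards` and `syllabus`, and the sample subjects are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source rows: the syllabus column holds a comma-separated list.
stuffed_rows = [
    (1, "5th Standard", "Maths,Science,English"),
    (2, "6th Standard", "Maths, History"),
]

conn.execute("CREATE TABLE standards (standard_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE syllabus (standard_id INTEGER REFERENCES standards(standard_id), "
    "subject TEXT, PRIMARY KEY (standard_id, subject))"
)

for standard_id, name, stuffed in stuffed_rows:
    conn.execute("INSERT INTO standards VALUES (?, ?)", (standard_id, name))
    # One row per subject replaces the repeating group.
    subjects = [s.strip() for s in stuffed.split(",") if s.strip()]
    conn.executemany(
        "INSERT INTO syllabus VALUES (?, ?)", [(standard_id, s) for s in subjects]
    )

# "Which standards teach Maths?" is now an equality test, not string parsing.
maths_standards = [r[0] for r in conn.execute(
    "SELECT s.name FROM standards s JOIN syllabus y ON y.standard_id = s.standard_id "
    "WHERE y.subject = 'Maths' ORDER BY s.standard_id"
)]
subject_rows = conn.execute("SELECT COUNT(*) FROM syllabus").fetchone()[0]
```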
Rule 6: Watch for partial dependencies
• Watch for fields that depend on only part of the primary
key.
• E.g., the primary key is created on roll number and
standard.
• The syllabus is associated with the standard in which
the student is studying and not directly with the
student.
• Move the syllabus field and attach it to the Standards
table.
• This rule is the 2nd normal form: every non-key field should
depend on the full primary key, not on part of it.
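A partial dependency can be spotted in sample data with a small check. The helper `is_determined_by` and the sample rows are hypothetical; note that sample rows can only suggest a dependency, since whether it truly holds is a question about the business rules.

```python
def is_determined_by(rows, determinant, target):
    """True if rows that agree on `determinant` always agree on `target`."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        if seen.setdefault(key, row[target]) != row[target]:
            return False
    return True

# Hypothetical table whose primary key is (roll_no, standard).
students = [
    {"roll_no": 1, "standard": "5th", "name": "Shiv", "syllabus": "Maths, Science"},
    {"roll_no": 2, "standard": "5th", "name": "Raju", "syllabus": "Maths, Science"},
    {"roll_no": 1, "standard": "6th", "name": "Sham", "syllabus": "Maths, History"},
]

# syllabus is fixed by standard alone: a partial dependency on the key.
syllabus_partial = is_determined_by(students, ["standard"], "syllabus")
# name is not fixed by either key column alone, so it needs the full key.
name_partial = (is_determined_by(students, ["standard"], "name")
                or is_determined_by(students, ["roll_no"], "name"))

# Fix: keep syllabus once per standard and drop it from the student rows.
standards = {row["standard"]: row["syllabus"] for row in students}
students_2nf = [{k: v for k, v in row.items() if k != "syllabus"} for row in students]
```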
Rule 7: Choose derived columns carefully
• OLTP applications: getting rid of derived columns
is usually a good idea.
• OLAP: with a lot of summations and calculations, these
kinds of fields are necessary to gain performance.
• The 3rd normal form: “No column should depend
on other non-primary key columns”. Look at the
situation and then decide whether to implement
the 3rd normal form.
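The trade-off can be sketched with Python's built-in `sqlite3` module. The `results` table, the view `results_with_avg`, and the stored table `results_summary` are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (roll_no INTEGER PRIMARY KEY, total_marks INTEGER, subjects INTEGER)"
)
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", [(1, 240, 3), (2, 350, 5)])

# OLTP style: no stored average; a view derives it on every read.
conn.execute(
    "CREATE VIEW results_with_avg AS "
    "SELECT roll_no, total_marks, subjects, "
    "total_marks * 1.0 / subjects AS average FROM results"
)

# OLAP style: the average is computed once at load time and stored.
conn.execute(
    "CREATE TABLE results_summary AS "
    "SELECT roll_no, total_marks * 1.0 / subjects AS average FROM results"
)

# A later correction to the source row shows the trade-off.
conn.execute("UPDATE results SET total_marks = 270 WHERE roll_no = 1")

live_avg = conn.execute(
    "SELECT average FROM results_with_avg WHERE roll_no = 1"
).fetchone()[0]
stored_avg = conn.execute(
    "SELECT average FROM results_summary WHERE roll_no = 1"
).fetchone()[0]
```

The view always reflects the current marks, while the stored column is cheaper to read but stays at its load-time value until it is refreshed.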
Rule 8: Do not be rigid about avoiding
redundancy if performance is key
• When performance is needed, think about denormalization.
• Normalization: queries must join many tables.
• Denormalization: fewer joins, which improves
performance.
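A small sketch of the same report written against a normalized and a denormalized design, using Python's built-in `sqlite3` module. All table names, column names, and rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (country_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer TEXT,
                            country_id INTEGER REFERENCES countries(country_id));
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL,
                         customer_id INTEGER REFERENCES customers(customer_id));
    INSERT INTO countries VALUES (1, 'India'), (2, 'Nepal');
    INSERT INTO customers VALUES (1, 'Shiv', 1), (2, 'Raju', 2);
    INSERT INTO orders VALUES (1, 100.0, 1), (2, 50.0, 1), (3, 75.0, 2);
""")

# Normalized: every report pays for two joins.
NORMALIZED = """
    SELECT c.country, SUM(o.amount)
    FROM orders o
    JOIN customers cu ON cu.customer_id = o.customer_id
    JOIN countries c ON c.country_id = cu.country_id
    GROUP BY c.country ORDER BY c.country
"""

# Denormalized: the joins run once at load time; names repeat on each row.
conn.execute("""
    CREATE TABLE orders_flat AS
    SELECT o.order_id, o.amount, cu.customer, c.country
    FROM orders o
    JOIN customers cu ON cu.customer_id = o.customer_id
    JOIN countries c ON c.country_id = cu.country_id
""")
DENORMALIZED = "SELECT country, SUM(amount) FROM orders_flat GROUP BY country ORDER BY country"

normalized_report = conn.execute(NORMALIZED).fetchall()
flat_report = conn.execute(DENORMALIZED).fetchall()
```

Both queries return the same totals; the flat table answers without joins at the price of storing customer and country names redundantly.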
Rule 9: Multidimensional data is a different
beast altogether
• OLAP projects mostly deal with multidimensional
data.
• E.g., to get sales per country, customer, and date,
each sales figure sits at the intersection of three
dimensions.
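One common way to store such data is a fact table linked to one table per dimension (often called a star schema). The sketch below uses Python's built-in `sqlite3` module; the table names, columns, rows, and the helper `sales_by` are assumptions made for illustration, not a design prescribed by the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_country  (country_id  INTEGER PRIMARY KEY, country  TEXT);
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE dim_date     (date_id     INTEGER PRIMARY KEY, day TEXT, year INTEGER);
    -- Each fact row is one sales figure at the intersection of three dimensions.
    CREATE TABLE fact_sales (
        country_id  INTEGER REFERENCES dim_country(country_id),
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        date_id     INTEGER REFERENCES dim_date(date_id),
        amount      REAL
    );
    INSERT INTO dim_country  VALUES (1, 'India'), (2, 'Nepal');
    INSERT INTO dim_customer VALUES (1, 'Shiv'), (2, 'Raju');
    INSERT INTO dim_date     VALUES (1, '2023-12-31', 2023), (2, '2024-01-01', 2024);
    INSERT INTO fact_sales VALUES (1, 1, 1, 100.0), (1, 2, 2, 40.0), (2, 2, 2, 60.0);
""")

def sales_by(dimension_table, key_column, label_column):
    """Total sales along one dimension. Arguments must be trusted identifiers."""
    sql = (
        f"SELECT d.{label_column}, SUM(f.amount) FROM fact_sales f "
        f"JOIN {dimension_table} d ON d.{key_column} = f.{key_column} "
        f"GROUP BY d.{label_column} ORDER BY d.{label_column}"
    )
    return conn.execute(sql).fetchall()

by_country = sales_by("dim_country", "country_id", "country")
by_year = sales_by("dim_date", "date_id", "year")
```

The same fact rows can be totaled along any dimension by joining to that dimension's table.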
Rule 10: Centralize name-value table
design
• Name-value tables have a key and some data
associated with the key.
• E.g., a currency table and a country table; each
has only a key and a value.
• For such tables, creating one central table and
differentiating the data with a type field makes
more sense.
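A minimal sketch of a central table with a type field, using Python's built-in `sqlite3` module. The table name `lookup`, its columns (`item_type`, `code`, `label`), and the helper `lookup_values` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One central lookup table; item_type says which list a row belongs to.
conn.execute(
    "CREATE TABLE lookup (item_type TEXT NOT NULL, code TEXT NOT NULL, "
    "label TEXT NOT NULL, PRIMARY KEY (item_type, code))"
)
conn.executemany(
    "INSERT INTO lookup VALUES (?, ?, ?)",
    [
        ("currency", "INR", "Indian Rupee"),
        ("currency", "USD", "US Dollar"),
        ("country", "IN", "India"),
        ("country", "US", "United States"),
    ],
)

def lookup_values(item_type):
    """Return the code-to-label mapping for one kind of lookup data."""
    rows = conn.execute(
        "SELECT code, label FROM lookup WHERE item_type = ? ORDER BY code",
        (item_type,),
    )
    return dict(rows.fetchall())

currencies = lookup_values("currency")
countries = lookup_values("country")
```

Adding a new kind of list means adding rows with a new type value rather than creating another two-column table.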
Rule 11: For unlimited hierarchical data, use a self-referencing PK and FK
• Unlimited parent-child hierarchy.
• E.g., a multi-level marketing scenario where a
salesperson can have multiple salespeople below them.
For such scenarios, a self-referencing
primary key and foreign key can model the
hierarchy.
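A sketch of a self-referencing table and a query that walks it to any depth, using Python's built-in `sqlite3` module and SQLite's recursive common table expressions. The `salespeople` table, its rows, and the helper `downline` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salespeople (person_id INTEGER PRIMARY KEY, name TEXT, "
    "parent_id INTEGER REFERENCES salespeople(person_id))"  # FK points at own PK
)
conn.executemany(
    "INSERT INTO salespeople VALUES (?, ?, ?)",
    [(1, "Asha", None), (2, "Bina", 1), (3, "Chand", 1), (4, "Dev", 2), (5, "Esha", 4)],
)

def downline(person_id):
    """Everyone below `person_id`, at any depth, with their level."""
    rows = conn.execute(
        """
        WITH RECURSIVE tree(person_id, name, level) AS (
            SELECT person_id, name, 1 FROM salespeople WHERE parent_id = ?
            UNION ALL
            SELECT s.person_id, s.name, t.level + 1
            FROM salespeople s JOIN tree t ON s.parent_id = t.person_id
        )
        SELECT name, level FROM tree ORDER BY level, name
        """,
        (person_id,),
    )
    return rows.fetchall()

under_asha = downline(1)
under_bina = downline(2)
```

Because every row points at its parent in the same table, the hierarchy can grow to any depth without schema changes.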
Business Models
• Depends on business requirements
• E.g., e-commerce business model
Data Marts
• Data warehouses can support all of an organization’s information
• Data marts hold subsets of an organization-wide data warehouse
• Data mart – subset of a data warehouse in which only a focused
portion of the data warehouse information is kept
Assignment 1
• Differentiate between OLTP and OLAP.
• Explain the design aspects of OLTP and
OLAP.
• What is BI and what are its components?
References
• OLTP vs. OLAP ppts
• Notes by Shivprasad Koirala