Introduction to Database - Gonzaga Student Web Server

Chapter 9:
Data Warehousing
Jason C. H. Chen, Ph.D.
Professor of MIS
School of Business Administration
Gonzaga University
Spokane, WA 99258
chen@jepson.gonzaga.edu
Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems
Objectives
• Definition of terms
• Reasons for information gap between information
needs and availability
• Reasons for need of data warehousing
• Describe three levels of data warehouse architectures
(ETL)
• Describe two components of star schema
• Estimate fact table size
• Design a data mart
• Develop requirements for a data mart
• OLAP, data mining and its applications
A Solution to the Information Gap
• A solution to bridging the information gap is the
data warehouse, which consolidates and integrates
information from many different sources and
arranges it in a meaningful format for
making accurate business decisions.
Two Issues to Know About Data Warehousing
• 1. A major factor driving the need for data
warehousing
– Businesses need an integrated view of company
information.
• 2. Which of the following organizational trends does
not encourage the need for data warehousing?
– a) Multiple, nonsynchronized systems
– b) Focus on customer relationship management
– c) Downsizing
– d) Focus on supplier relationship management
Answer: c) Downsizing
Need for Data Warehousing
• Integrated, company-wide view of high-quality
information (from disparate databases)
• Separation of operational and informational systems and
data (for improved performance)
Table 9-1 – Comparison of Operational and Informational Systems
DATA WAREHOUSE
FUNDAMENTALS
• Data warehouse – a logical collection of
information – gathered from many different
operational databases – that supports business
analysis activities and decision-making tasks
• The primary purpose of a data warehouse is to
aggregate information throughout an organization
into a single repository for decision-making
purposes
Definition
• Data Warehouse:
A subject-oriented, integrated, time-variant, non-updatable
collection of data used in support of management decision-making processes
– Subject-oriented: e.g. customers, patients, students, products
• DW is organized around key high-level entities of the enterprise
– Integrated: Consistent naming conventions, formats, encoding
structures; from multiple data sources
– Time-variant: Can study trends and changes
• data in the warehouse contain a time dimension so that they may be
used to study trends and changes.
– Non-updatable: Read-only, periodically refreshed
• Data Mart:
– A data warehouse that is limited in scope
– contains a subset of data warehouse information
History Leading to Data
Warehousing
• Improvement in database technologies, especially
relational DBMSs
• Advances in computer hardware, including mass
storage and parallel architectures
• Emergence of end-user computing with powerful
interfaces and tools
• Advances in middleware, enabling heterogeneous
database connectivity
• Recognition of difference between operational and
informational systems
Issues with Company-Wide View
• Inconsistent key structures
• Synonyms
• Free-form vs. structured fields
• Inconsistent data values
• Missing data
See figure 9-1 for example
Figure 9-1: Examples of heterogeneous data
Database vs. Data Warehouse
[Figure: DBMS is paired with Database; Data Mining is paired with Data Warehouse]
Data Marts and the Data Warehouse
Legacy systems feed data to the warehouse.
The warehouse feeds specialized information to departments (data marts).
[Figure labels: Legacy Systems; Operational Data Stores; ETL; Organizational Data Warehouse; Finance, Sales, Marketing, and Accounting Data Marts]
The Data Mart is More Specialized
The data mart serves the needs of one business unit, not the organization.
Organizational Data Warehouse:
• Corporate
• Highly granular data
• Normalized design
• Robust historical data
• Large data volume
• Data model driven data
• Versatile
• General purpose DBMS technologies
Data Marts:
• Departmentalized
• Summarized, aggregated data
• Star join design
• Limited historical data
• Limited data volume
• Requirements driven data
• Focused on departmental needs
• Multi-dimensional DBMS technologies
[Figure labels: Organizational Data Warehouse; ETL; Finance, Sales, Marketing, and Accounting Data Marts]
Organizational Trends
Motivating Data Warehouses
• No single system of records
• Multiple systems not synchronized
• Organizational need to analyze
activities in a balanced way
• Customer relationship management
• Supplier relationship management
Separating Operational and
Informational Systems
• Operational system – a system that is used to run
a business in real time, based on current data; also
called a system of record
• Informational system – a system designed to
support decision making based on historical point-in-time
and prediction data for complex queries or
data-mining applications
Position of the Data Warehouse Within
the Organization
DATA WAREHOUSE
FUNDAMENTALS (cont.)
• Extraction, transformation, and loading
(ETL) – a process that extracts information
from internal and external databases,
transforms the information using a common
set of enterprise definitions, and loads the
information into a data warehouse
Data Warehouse Architectures
• Independent Data Mart
• Dependent Data Mart and Operational
Data Store
• Logical Data Mart and Real-Time Data
Warehouse
• Three-Layer architecture
All involve some form of extraction, transformation and loading (ETL)
Figure 9-2: Independent data mart data warehousing architecture
• Data marts: mini-warehouses, limited in scope
• Separate ETL for each independent data mart
• Data access complexity due to multiple data marts
Figure 9-3: Dependent data mart with operational data store: a three-level architecture
• ODS provides option for obtaining current data
• Single ETL for enterprise data warehouse (EDW)
• Dependent data marts loaded from EDW
• Simpler data access
Figure 9-4: Logical data mart and real-time warehouse architecture
• ODS and data warehouse are one and the same
• Near real-time ETL for data warehouse
• Data marts are NOT separate databases, but logical views of the data warehouse
→ Easier to create new data marts
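The idea that a logical data mart is a view of the warehouse, not a separate database, can be sketched with Python's built-in sqlite3 module. The table name, columns, and rows below are hypothetical and serve only as an illustration.

```python
import sqlite3

# Hypothetical warehouse table; names and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (dept TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [("Sales", "Widget", 100.0), ("Finance", "Audit", 50.0), ("Sales", "Gadget", 75.0)],
)

# The "data mart" is only a view: no data is copied out of the warehouse.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT product, amount FROM warehouse_sales WHERE dept = 'Sales'"
)

mart_rows = conn.execute(
    "SELECT product, amount FROM sales_mart ORDER BY product"
).fetchall()
print(mart_rows)
```

Creating another logical data mart in this sketch takes one more CREATE VIEW statement, which is the sense in which new data marts are easier to create.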
The ETL Process –
another perspective and example
• Capture/Extract (E)
• Scrub or data cleansing
• Transform (T)
• Load and Index (L)
ETL = Extract, transform, and load
Capture/Extract: obtaining a snapshot of a chosen subset of
the source data for loading into the data warehouse
Static extract = capturing a snapshot of the source data at
a point in time
Incremental extract = capturing changes that have
occurred since the last static extract
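A minimal sketch of the two extract types, assuming each source row carries a last-modified timestamp. That column, the sample rows, and the function names are hypothetical; real systems may detect changes differently, for example from the DBMS log.

```python
from datetime import datetime

# Hypothetical source rows: (id, value, last_modified). Illustrative only.
source = [
    (1, "A", datetime(2024, 1, 1)),
    (2, "B", datetime(2024, 1, 15)),
    (3, "C", datetime(2024, 2, 1)),
]

def static_extract(rows):
    """Snapshot of all chosen source rows at a point in time."""
    return list(rows)

def incremental_extract(rows, last_extract_time):
    """Only the rows changed after the previous extract."""
    return [r for r in rows if r[2] > last_extract_time]

snapshot = static_extract(source)
changes = incremental_extract(source, datetime(2024, 1, 10))
print(len(snapshot), [r[0] for r in changes])
```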
Scrub/Cleanse: uses pattern recognition and AI techniques to
upgrade data quality
Fixing errors: misspellings, erroneous dates, incorrect field usage,
mismatched addresses, missing data, duplicate data, inconsistencies
Also: decoding, reformatting, time stamping, conversion, key generation,
merging, error detection/logging, locating missing data
Transform = convert data from format of operational system to format of data
warehouse
Record-level:
Selection – data partitioning
Joining – data combining
Aggregation – data summarization
Field-level:
Single-field – from one field to one field
Multi-field – from many fields to one, or one field to many
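The record-level and field-level transformations can be sketched in plain Python. The order and customer records and their field names are hypothetical, introduced only for this illustration.

```python
from collections import defaultdict

# Hypothetical operational rows; field names are assumptions.
orders = [
    {"cust_id": 1, "region": "West", "amount": 20.0},
    {"cust_id": 2, "region": "East", "amount": 35.0},
    {"cust_id": 1, "region": "West", "amount": 5.0},
]
customers = {1: {"first": "Ann", "last": "Lee"}, 2: {"first": "Bo", "last": "Kim"}}

# Record-level selection: keep one partition of the data.
west = [o for o in orders if o["region"] == "West"]

# Record-level joining: combine each order with its customer record.
joined = [{**o, **customers[o["cust_id"]]} for o in orders]

# Record-level aggregation: summarize the amount per customer.
totals = defaultdict(float)
for o in orders:
    totals[o["cust_id"]] += o["amount"]

# Multi-field transformation: many fields (first, last) to one (full name).
full_names = {cid: f"{c['first']} {c['last']}" for cid, c in customers.items()}
```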
Load/Index = place transformed data
into the warehouse and create indexes
Refresh mode: bulk rewriting of target data at periodic intervals
Update mode: only changes in source data are written to data
warehouse
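A rough sketch of the two load modes, with the warehouse target modeled as a simple list of rows. The function names and sample rows are hypothetical.

```python
def refresh_load(source_rows):
    """Refresh mode: rebuild the whole target from the source in bulk."""
    return list(source_rows)

def update_load(target, changed_rows):
    """Update mode: write only the changed source rows to the target."""
    return target + list(changed_rows)

target = refresh_load([(1, "A"), (2, "B")])
target2 = update_load(target, [(2, "B2"), (3, "C")])
```

In this sketch, update mode appends changes and does not overwrite, so earlier rows remain available.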
Information Cleansing or
Scrubbing
• An organization must maintain high-quality
data in the data warehouse
• Information cleansing or scrubbing – a
process that weeds out and fixes or discards
inconsistent, incorrect, or incomplete
information
Information Cleansing or Scrubbing (cont.)
• Standardizing customer names from operational systems
• Accurate and complete information
Representation of Data in DW
• Dimensional Modeling – a retrieval-based system that supports
high-volume query access
– Not only accommodate but also boost the processing of complex
multidimensional queries.
• Two means
– 1. Star schema – the most commonly used and the simplest style of
dimensional modeling
• Contains a fact table surrounded by and connected to several dimension
tables
• Fact table contains the descriptive attributes (numerical values) needed
to perform decision analysis and query reporting; foreign keys are
used to link to the dimension tables.
• Dimension tables contain classification and aggregation information
about the values in the fact table (i.e., attributes describing the data
contained within the fact table).
– 2. Snowflake schema – an extension of star schema where the diagram
resembles a snowflake in shape
Fact Table vs. Dimensional Table
Many-to-many relationship (M:N)
[Figure: two dimension tables, each with a primary key (pk); the fact table between them has a composite primary key (cpk) made up of foreign keys (fk) to the dimension tables]
Figure 9-5 Three-layer data architecture for a data warehouse
Data Characteristics: Status vs. Event Data
Figure 9-6: Example of DBMS log entry
Event = a database action (create/update/delete) that results from a
transaction
[Figure labels: Status, Event, Status]
Data Characteristics: Transient vs. Periodic Data
Figure 9-7: Transient operational data
With transient data, changes to existing records are written over
previous records, thus destroying the previous data content
Data Characteristics: Transient vs. Periodic Data
Figure 9-8: Periodic warehouse data
Periodic data are never physically altered or deleted once they have
been added to the store
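The contrast between transient and periodic data can be sketched as follows. The customer key, city values, and dates are hypothetical.

```python
from datetime import date

# Transient: an update overwrites the previous value, so history is lost.
transient = {"cust_1": "Spokane"}
transient["cust_1"] = "Seattle"

# Periodic: each change is appended with a date; earlier rows are kept.
periodic = [("cust_1", "Spokane", date(2024, 1, 1))]
periodic.append(("cust_1", "Seattle", date(2024, 6, 1)))

history = [city for key, city, _ in periodic if key == "cust_1"]
```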
Other Data Warehouse Changes
• New descriptive attributes
• New business activity attributes
• New classes of descriptive attributes
• Descriptive attributes become more refined
• Descriptive data are related to one another
• New source of data
Data Reconciliation
• Typical operational data is:
– Transient – not historical
– Not normalized (perhaps due to denormalization for
performance)
– Restricted in scope – not comprehensive
– Sometimes poor quality – inconsistencies and errors
• After ETL, data should be:
– Detailed – not summarized yet
– Historical – periodic
– Normalized – 3rd normal form or higher
– Comprehensive – enterprise-wide perspective
– Timely – data should be current enough to assist decision-making
– Quality controlled – accurate with full integrity
Derived Data
• Objectives
– Ease of use for decision support applications
– Fast response to predefined user queries
– Customized data for particular target audiences
– Ad-hoc query support
– Data mining capabilities
• Characteristics
– Detailed (mostly periodic) data
– Aggregate (for summary)
– Distributed (to departmental servers)
Most common data model = star schema
(also called “dimensional model”)
Figure 9-9 Components of a star schema
Fact tables contain factual
(descriptive) or quantitative
data (numerical values)
1:N relationship between
dimension tables and fact
tables
Dimension tables are denormalized
to maximize performance
Dimension tables contain descriptions
about the subjects of the business
(values in the fact table)
Excellent for ad-hoc queries, but bad for online transaction processing
Figure 9-10 Star schema example
Fact table provides statistics for sales
broken down by product, period and
store dimensions
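A star schema of this shape (a sales fact table with product, period, and store dimensions) might be sketched with sqlite3 as follows. All table names, column names, and rows are hypothetical and are not taken from Figure 9-10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);
-- Fact table: foreign keys to each dimension plus numeric measures.
CREATE TABLE sales (
    product_key  INTEGER REFERENCES product(product_key),
    period_key   INTEGER REFERENCES period(period_key),
    store_key    INTEGER REFERENCES store(store_key),
    units_sold   INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (product_key, period_key, store_key)
);
INSERT INTO product VALUES (1, 'Soap'), (2, 'Towel');
INSERT INTO period  VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO store   VALUES (1, 'Spokane');
INSERT INTO sales VALUES (1, 1, 1, 10, 20.0), (1, 2, 1, 5, 10.0), (2, 1, 1, 3, 15.0);
""")

# Sales statistics broken down by the product dimension.
by_product = conn.execute("""
    SELECT p.description, SUM(s.units_sold), SUM(s.dollars_sold)
    FROM sales s JOIN product p ON p.product_key = s.product_key
    GROUP BY p.description ORDER BY p.description
""").fetchall()
print(by_product)
```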
Figure 9-11 Star schema with sample data
Surrogate Dimension Keys
• Dimension table keys should be surrogate (non-intelligent and non-business related), because:
– Business keys may change over time
– Helps keep track of nonkey attribute values for
a given production key
– Surrogate keys are simpler and shorter
– Surrogate keys can be same length and format
for all keys
Grain of the Fact Table
• Granularity of Fact Table–what level of detail do you
want?
– Transactional grain–finest level
– Aggregated grain–more summarized
– Finer grain → better market basket analysis
capability
– Finer grain → more dimension tables, more rows
in fact table
– In Web-based commerce, finest granularity is a
click
Duration of the Database
– Natural duration–13 months or 5 quarters
– Financial institutions may need longer duration
– Older data is more difficult to source and
cleanse
Size of Fact Table
• Depends on the number of dimensions and the grain of
the fact table
• Number of rows = product of number of possible values
for each dimension associated with the fact table
• Example: assume the following for Figure 9-11:
• Total rows calculated as follows (assuming only half the
products record sales for a given month):
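The row-count rule can be worked through with made-up numbers. The dimension sizes, field count, and bytes per field below are hypothetical assumptions, not the values from Figure 9-11; only the rule (product of dimension values, with half the products recording sales in a month) follows the slide.

```python
# Hypothetical dimension sizes; these are NOT the values from Figure 9-11.
stores = 100
products = 2_000
months = 12
active_fraction = 0.5  # half the products record sales in a given month

# Rows = product of the possible values per dimension, adjusted for activity.
rows = int(stores * products * active_fraction * months)

# Rough storage estimate under two further assumptions.
fields_per_row = 6
bytes_per_field = 4
size_bytes = rows * fields_per_row * bytes_per_field

print(rows, size_bytes)
```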
Break! (Ch. 9)
Exercise #5 – a, b, c (p. 422)
With the following assumptions:
1. The length of a fiscal period is one month
2. The data mart will contain five years of historical data
3. Approximately 5 percent of the policies experience some type of
change each month
4. There are 8 fields in each record (row)
HW #3 (p. 422) – a, b, c
Assume one professor per course section
All computations for b and c should be shown to get credit.
Figure 9-12: Modeling dates
Fact tables contain time-period data
→ Date dimensions are important
Variations of the Star Schema
• Multiple Fact Tables
– Can improve performance
– Often used to store facts for different combinations of
dimensions
– Conformed dimensions
• Factless Fact Tables
– No nonkey data, but foreign keys for associated dimensions
– Used for:
• Tracking events
• Inventory coverage
Figure 9-13: Conformed dimensions
Two fact tables → two (connected) star schemas.
Conformed dimension: associated with multiple fact tables
Figure 9-14a Factless fact table showing occurrence of
an event
No data in fact
table, just keys
associating
dimension records
Fact table forms
an n-ary
relationship
between
dimensions
Normalizing Dimension Tables
• Multivalued Dimensions
– Facts qualified by a set of values for the same business
subject
– Normalization involves creating a table for an associative
entity between dimensions
• Hierarchies
– Sometimes a dimension forms a natural, fixed depth
hierarchy
– Design options
• Include all information for each level in a single denormalized table
• Normalize the dimension into a nested set of 1:M table
relationships
Figure 9-15 Multivalued dimension
Helper table is an associative entity that
implements a M:N relationship between
dimension and fact.
Figure 9-16 Fixed product hierarchy
Dimension hierarchies help to provide
levels of aggregation for users wanting
summary information in a data
warehouse.
Slowly Changing Dimensions
(SCD)
• Need to maintain knowledge of the past
• One option: for each changing attribute,
create a current value field and many old-valued fields (multivalued)
• Better option: create a new dimension table
row each time the dimension object
changes, with all dimension characteristics
at the time of change
Figure 9-18 Example of Type 2 SCD Customer dimension table
The dimension table contains several records for the same
customer. The specific customer record to use depends on the
key and the date of the fact, which should be between start
and end dates of the SCD customer record.
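The lookup described above can be sketched as follows: pick the dimension row whose start and end dates bracket the fact date. The surrogate keys, business key, cities, and dates are hypothetical and are not the rows of Figure 9-18.

```python
from datetime import date

# Hypothetical Type 2 customer dimension: one row per version of the customer.
# Columns: surrogate key, business key, city, start date, end date.
customer_dim = [
    (101, "C1", "Spokane", date(2020, 1, 1), date(2022, 6, 30)),
    (102, "C1", "Seattle", date(2022, 7, 1), date(9999, 12, 31)),
]

def lookup_surrogate_key(business_key, fact_date):
    """Return the surrogate key of the version valid on fact_date, or None."""
    for skey, bkey, _city, start, end in customer_dim:
        if bkey == business_key and start <= fact_date <= end:
            return skey
    return None

old_key = lookup_surrogate_key("C1", date(2021, 3, 15))
new_key = lookup_surrogate_key("C1", date(2023, 3, 15))
print(old_key, new_key)
```

Using a far-future end date for the current row is one common convention; a null end date is another.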
Figure 9-19 Dimension segmentation
For rapidly changing attributes (hot attributes), Type 2 SCD
approach creates too many rows and too much redundant
data. Use segmentation instead.
10 Essential Rules for Dimensional
Modeling
• Use atomic facts
• Create single-process fact tables
• Include a date dimension for each fact table
• Enforce consistent grain
• Disallow null keys in fact tables
• Honor hierarchies
• Decode dimension tables
• Use surrogate keys
• Conform dimensions
• Balance requirements with actual data
Other Data Warehouse Advances
• Columnar databases
– Issue of Big Data (huge volume, often unstructured)
– Columnar databases optimize storage for summary data of
few columns (different need than OLTP)
– Data compression
– Examples: Sybase, Vertica, Infobright
• NoSQL
– “Not only SQL”
– Deals with unstructured data
– MongoDB, CouchDB, Apache Cassandra
The User Interface
Metadata (data catalog)
• Identify subjects of the data mart
• Identify dimensions and facts
• Indicate how data is derived from enterprise data
warehouses, including derivation rules
• Indicate how data is derived from operational data store,
including derivation rules
• Identify available reports and predefined queries
• Identify data analysis techniques (e.g. drill-down)
• Identify responsible people
Online Analytical Processing (OLAP) Tools
• The use of a set of graphical tools that provides
users with multidimensional views of their data
and allows them to analyze the data using simple
windowing techniques
• Relational OLAP (ROLAP)
– Traditional relational representation
• Multidimensional OLAP (MOLAP)
– Cube structure
• OLAP Operations
– Cube slicing–come up with 2-D view of data
– Drill-down–going from summary to more detailed
views
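Cube slicing and drill-down can be sketched with a small dictionary standing in for a cube. The dimensions (product, region, month) and cell values are hypothetical.

```python
from collections import defaultdict

# Hypothetical cube cells keyed by (product, region, month).
cube = {
    ("Soap", "West", "Jan"): 10,
    ("Soap", "East", "Jan"): 4,
    ("Soap", "West", "Feb"): 6,
    ("Towel", "West", "Jan"): 3,
}

# Slice: fix one dimension (product = "Soap") to get a 2-D region x month view.
soap_slice = {(r, m): v for (p, r, m), v in cube.items() if p == "Soap"}

# Summary level: total units per product.
summary = defaultdict(int)
for (p, _r, _m), v in cube.items():
    summary[p] += v

# Drill-down: break the "Soap" summary cell into per-region detail.
soap_by_region = defaultdict(int)
for (r, _m), v in soap_slice.items():
    soap_by_region[r] += v
```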
Multidimensional Analysis
• Databases contain information in a series of
two-dimensional tables
• In a data warehouse and data mart,
information is multidimensional: it contains
layers of columns and rows
– Dimension – a particular attribute of
information
Figure 9-21: Slicing a data cube
[Figure axis labels: REGION, CUSTOMER]
Multidimensional Analysis
• Cube – common term for the representation of
multidimensional information
Figure 9-22: Example of drill-down
Starting with summary data, users can obtain details for particular
cells
[Figure panels: Summary report; Drill-down with color added]
Business Performance Mgmt (BPM)
Figure 9-25: Sample dashboard
BPM systems allow managers to measure, monitor, and manage
key activities and processes to achieve organizational goals.
Dashboards are often used to provide an information system in
support of BPM.
Charts like these are examples of data visualization, the representation
of data in graphical and multimedia formats for human analysis.
OLAP and its Applications
• What software and function enable you
to create OLAP and its applications?
• ANSWER
– Excel with Pivot Table
Multidimensional Analysis
• Data mining – the process of analyzing data to
extract information not offered by the raw data
alone
• To perform data mining users need data-mining
tools
– Data-mining tool – uses a variety of techniques to find
patterns and relationships in large volumes of information
and infers rules that predict future behavior and guide
decision making
• An example
– Grocery Store in UK (see next slide)
CRM and Data Mining (BI) Example
• A grocery store in the U.K. found the following “patterns”:
• Every Thursday afternoon
• Young fathers (why?) shopping at the store
• Two of the following are always included in their shopping list
– Diapers and
– Beer
• What other decisions should be made as a store manager (in terms
of store layout)?
• Short term vs. Long term
– This is an example of cross-selling
– Other types of promotion: up-sell, bundled-sell
• IT (e.g., BI) helps to find valuable information; decision
makers then make timely and correct decisions to improve or create
competitive advantages.
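One common way to quantify a pattern like "diapers and beer" is with the support and confidence measures from market basket analysis. The slide does not say which technique the store used, so this is only an illustrative sketch over hypothetical baskets.

```python
# Hypothetical shopping baskets; not data from the grocery store example.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(items):
    """Fraction of baskets that contain every item in `items`."""
    return sum(1 for b in baskets if items <= b) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

s = support({"diapers", "beer"})
c = confidence({"diapers"}, {"beer"})
print(s, c)
```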
More on OLTP vs. OLAP
Fig. Extra-a: A simple database with a relation between two tables.
• The figure depicts a relational database environment with two
tables.
• The first table contains information about pet owners; the second,
information about pets. The tables are related by the single column
they have in common: Owner_ID.
• By relating tables to one another, we can reduce redundancy of
data and improve database performance.
• The process of breaking tables apart and thereby reducing data
redundancy is called normalization.
OLTP vs. OLAP (cont.)
• Most relational databases which are designed to handle a high number
of reads and writes (updates and retrievals of information) are referred
to as OLTP (OnLine Transaction Processing) systems.
• OLTP systems are very efficient for high volume activities such as
cashiering, where many items are being recorded via bar code scanners
in a very short period of time.
• However, using OLTP databases for analysis is generally not very
efficient, because in order to retrieve data from multiple tables at the
same time, a query containing joins must be used.
OLTP vs. OLAP (cont.)
• In order to keep our transactional databases running quickly and smoothly,
we may wish to create a data warehouse. A data warehouse is a type of
large database (including both current and historical data) that has been
denormalized and archived.
• Denormalization is the process of intentionally combining some tables
into a single table in spite of the fact that this may introduce duplicate
data in some columns.
Fig. Extra-b: A combination of the tables into a single dataset.
• The figure depicts what our simple example data might look like if it
were in a data warehouse. When we design databases in this way, we
reduce the number of joins necessary to query related data, thereby
speeding up the process of analyzing our data.
• Databases designed in this manner are called OLAP (OnLine
Analytical Processing) systems.
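The pet owner example can be sketched with sqlite3: two normalized tables are joined once and the result is stored as a single wide table. Only Owner_ID comes from the slides; the other column names and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE owners (Owner_ID INTEGER PRIMARY KEY, Owner_Name TEXT);
CREATE TABLE pets (Pet_ID INTEGER PRIMARY KEY, Owner_ID INTEGER, Pet_Name TEXT);
INSERT INTO owners VALUES (1, 'Pat'), (2, 'Sam');
INSERT INTO pets VALUES (10, 1, 'Rex'), (11, 1, 'Tom'), (12, 2, 'Bo');
-- Denormalize: copy the joined result into one wide table.
CREATE TABLE pets_denorm AS
    SELECT p.Pet_ID, p.Pet_Name, o.Owner_ID, o.Owner_Name
    FROM pets p JOIN owners o ON o.Owner_ID = p.Owner_ID;
""")

# Analysis queries now read one table and need no join.
rows = conn.execute(
    "SELECT Pet_Name, Owner_Name FROM pets_denorm ORDER BY Pet_ID"
).fetchall()
print(rows)
```

The owner name 'Pat' now appears on two rows, which is the duplicate data that denormalization accepts in exchange for fewer joins.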
OLTP vs. OLAP (cont.)
• Transactional systems and analytical systems have conflicting
purposes when it comes to database speed and performance. For this
reason, it is difficult to design a single system which will serve both
purposes. This is why data warehouses generally contain archived
data. Archived data are data that have been copied out of a
transactional database.
• Denormalization typically takes place at the time data are copied
out of the transactional system. It is important to keep in mind that
if a copy of the data is made in the data warehouse, the data may
become out-of-synch. This happens when a copy is made in the
data warehouse and then later, a change to the original record is
made in the source database.
• Data mining activities performed on out-of-synch records may be
useless, or worse, misleading.
• An alternative archiving method would be to move the data out of
the transactional system. This ensures that data won’t get out-of-synch;
however, it also makes the data unavailable should a user of
the transactional system need to view or update it.
Data Mining
• Knowledge discovery using a blend of
statistical, AI, and computer graphics
techniques
• Goals:
– Explain observed events or conditions
– Confirm hypotheses
– Explore data for new or unexpected relationships
DATA MINING
• Data-mining software includes many forms
of AI such as neural networks and expert
systems
Data Mining Examples
• A telephone company used a data mining tool to
analyze its customer data warehouse. The data
mining tool found about 10,000 supposedly
residential customers who were spending over
$1,000 monthly on phone bills.
• After further study, the phone company
discovered that they were really small business
owners trying to avoid paying business rates.
Data Mining Examples (cont.)
• 65% of customers who did not use the credit
card in the last six months are 88% likely to
cancel their accounts.
• If age < 30 and income <= $25,000 and credit
rating < 3 and credit amount > $25,000 then
the minimum loan term is 10 years.
• 82% of customers who bought a new TV 27" or
larger are 90% likely to buy an entertainment
center within the next 4 weeks.
Sustainable Competitive Advantages
• Any sustainable competitive advantages?
• How can an organization sustain its
competitive advantage?
• Firms may create/improve their competitive
advantages only if they:
– have capacity to learn,
– employ revenue management approach,
– learning to learn and learning to
change (life-long learning
environment)
BUSINESS INTELLIGENCE
• Business intelligence – information that
people use to support their decision-making
efforts
• Principal BI enablers include:
– Technology
– People
– Culture
Working Smarter, Not Harder
• Overlapping Human/Organizational (Culture, Process)/
Technological factors in BI/KM:
[Figure labels: PEOPLE, ORGANIZATIONAL PROCESSES, TECHNOLOGY, Knowledge]
Essential Value Propositions for
a Successful Company
• Business Model
• Core Competency
• Execution
– Set corporate goals and get executive
sponsorship for the initiative
Relationship between the Organizational Knowledge
and Core Competency
[Figure labels: Core competency; A specific business context; Best Practices; IT; People; Culture; Organizational knowledge]
Can be transferred and reused efficiently and effectively across
functional areas (sharing and collaboration)
BI: Big Data And Data Warehousing
• Two paradigms in BI:
– Data Warehouse and Big Data.
– Both compete with each other for turning data into
actionable information.
• However, in recent years, the variety and
complexity of data have made the data warehouse incapable
of keeping up with changing needs.
• Big Data
– A new paradigm that the world of IT was forced to
develop, not because of the volume of the structured data
but because of the variety and the velocity.
Introduction to Big Data Analytics
• Big Data?
– Not just big!
– Volume
– Variety
– Velocity
– Structured, unstructured, or in a stream
• Two aspects for studying “Big Data”
– Storing and processing/analyzing “Big Data”
• Push computation to the data instead of pushing
data to a computing node.