CHAPTER SIX
Databases and Data Warehouses
Information Granularity
Refers to the level of detail of information
Detailed (POS transactions)
Coarse (global sales totals)

Transactional vs. Analytical Information
Transactional information comes from a single business process or event
Examples: a bank deposit, a credit card charge
Analytical information uses transactional data for the purposes of decision making
Examples: account balance trends, using credit card history to detect fraud

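To make the split concrete, here is a minimal Python sketch. The records and the `balance_trend` helper are hypothetical: each tuple is one piece of transactional information from a single event (a deposit or a card charge), and the running balance computed from them is analytical information of the account-balance-trend kind.

```python
from typing import List, Tuple

# Hypothetical transactional records: (day, amount). Deposits are
# positive, credit card charges are negative. Each comes from one event.
transactions: List[Tuple[int, float]] = [
    (1, 500.00),   # deposit
    (2, -120.50),  # credit card charge
    (5, -60.25),   # credit card charge
    (9, 250.00),   # deposit
]

def balance_trend(txns: List[Tuple[int, float]], opening: float = 0.0) -> List[Tuple[int, float]]:
    """Derive analytical information: the balance after each transaction."""
    balance = opening
    trend = []
    for day, amount in sorted(txns):
        balance += amount
        trend.append((day, round(balance, 2)))
    return trend

print(balance_trend(transactions))  # [(1, 500.0), (2, 379.5), (5, 319.25), (9, 569.25)]
```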
Information Dimensions
Information timeliness
Obsolete information is useless
Today’s information needs to be provided in real time or near real time
Information quality
Wrong information is useless
Redundant information can be the cause of errors
Information must be complete
Data inconsistency undermines data integrity

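Some of these quality problems can be checked mechanically. The sketch below assumes hypothetical customer records copied from two systems and a hypothetical list of required fields; it flags incomplete records and redundant copies that have drifted out of sync (inconsistent data).

```python
from collections import defaultdict

# Hypothetical records: customer 1 is stored twice (redundancy) with
# conflicting phone numbers; customer 2 is missing a phone number.
records = [
    {"id": 1, "name": "Ada Smith", "phone": "555-0100", "zip": "89501"},
    {"id": 1, "name": "Ada Smith", "phone": "555-0199", "zip": "89501"},
    {"id": 2, "name": "Bo Jones", "phone": None, "zip": "89406"},
]

REQUIRED = ("name", "phone", "zip")  # hypothetical completeness rule

def incomplete(recs):
    """Return ids of records missing any required field."""
    return sorted({r["id"] for r in recs if any(not r.get(f) for f in REQUIRED)})

def inconsistent(recs):
    """Return ids whose redundant copies disagree on some field."""
    seen = defaultdict(set)
    for r in recs:
        for f in REQUIRED:
            seen[(r["id"], f)].add(r.get(f))
    return sorted({key[0] for key, values in seen.items() if len(values) > 1})

print(incomplete(records))    # [2]
print(inconsistent(records))  # [1]
```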
Database Management
Characteristics
Databases are complex
Databases often spread across multiple servers and multiple physical disks
Fault tolerance is critical
Databases may be distributed

Database Vendors
The industry has consolidated
IBM (DB2 Universal)
Oracle
Microsoft (SQL Server, Access)
Sun (MySQL), which is now part of Oracle

Database Performance
The Transaction Processing Performance Council (TPC) provides standard benchmarks
TPC-C – Online transaction processing
TPC-E – Online brokerage transactions
TPC-H – Ad-hoc decision support
TPC-W – Web / e-commerce

Database Performance (TPC-C)
Multiple transaction types
Independent of software and hardware
Scalable
Based on online transaction processing (OLTP)

1960s Data Management
These are legacy systems
Characterized by traditional file processing
Data processing was sequential
Batch processing
Not possible to directly locate a particular file record
Data depended on the programs that used it (program-data dependence)

1970s Data Management
Batch processing gives way to online transaction processing
Technologies
Files stored on disk rather than tape
Any record can be located in the same amount of time
Indexed Sequential Access Method (ISAM)
Virtual Storage Access Method (VSAM)
Direct access files
Use a hashing function to derive a record’s storage location from its key

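A minimal sketch of hashed direct access, assuming a hypothetical in-memory "file" of 11 fixed slots and numeric record keys. The division-remainder method is one common hashing choice and linear probing for collisions is likewise an assumption; real ISAM/VSAM and direct access files differ in the details. The `sequential_read` helper shows the 1960s alternative, which reads records in order until it finds the one it wants.

```python
NUM_SLOTS = 11  # hypothetical file size; a prime tends to spread keys evenly

def slot_for(key: int) -> int:
    """Division-remainder hashing: map a numeric record key to a slot number."""
    return key % NUM_SLOTS

class DirectAccessFile:
    """Fixed-size 'file' of slots; each slot holds (key, record) or None."""

    def __init__(self) -> None:
        self.slots = [None] * NUM_SLOTS

    def write(self, key: int, record: str) -> None:
        i = slot_for(key)
        for _ in range(NUM_SLOTS):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, record)
                return
            i = (i + 1) % NUM_SLOTS  # collision: try the next slot
        raise RuntimeError("file is full")

    def read(self, key: int):
        i = slot_for(key)  # jump straight to the computed slot
        for _ in range(NUM_SLOTS):
            if self.slots[i] is None:
                return None  # empty slot reached: key is not stored
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % NUM_SLOTS
        return None

def sequential_read(records, key):
    """1960s-style lookup: read records in order until the key turns up."""
    for k, rec in records:
        if k == key:
            return rec
    return None

f = DirectAccessFile()
for k, rec in [(1001, "Acme"), (1012, "Globex"), (2045, "Initech")]:
    f.write(k, rec)
print(f.read(1012))  # Globex (1001 and 1012 both hash to slot 0)
```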
1980s Data Management
Databases are becoming commonplace
Personal computer databases are evolving
dBASE
R:BASE

1990s Data Management
Huge data stores and transaction processing capabilities
Distributed databases
Object-oriented databases
6 million+ transactions per second

Realities of a DBMS
Data-centric rather than application-centric
Can be a repository for all of an organization’s data
Databases tend to be centralized
Queries get data from a DBMS
SQL is the standard query language
Report generators create printed and Web-based reports
Applications interface with the DBMS

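The idea that applications and report generators get data from the DBMS through SQL can be sketched with Python's built-in `sqlite3` module. SQLite is only a convenient stand-in for an organization's DBMS, and the `product` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("A1", "Stapler", 12.99), ("B2", "Desk lamp", 34.50), ("C3", "Notebook", 3.25)],
)

# The application states *what* it wants in SQL; the DBMS decides how to find it.
rows = conn.execute(
    "SELECT name, price FROM product WHERE price > ? ORDER BY price DESC", (10,)
).fetchall()
print(rows)  # [('Desk lamp', 34.5), ('Stapler', 12.99)]
conn.close()
```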
Types of Databases
Database models include:
Hierarchical database model – a tree-based structure
Network database model – mathematically, a directed graph
Relational database model – stores information in the form of logically related two-dimensional tables
Object-oriented databases

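The difference between the hierarchical and relational models can be sketched in plain Python with hypothetical data. The tree nests orders under their customer; the relational version keeps two flat tables and relates them through a shared key value. The network and object-oriented models are not shown.

```python
# Hypothetical data: one customer with two orders.

# Hierarchical model: a tree. Each order is a child of exactly one
# customer, and you navigate from the root down.
hierarchical = {
    "customer": "Ada Smith",
    "orders": [
        {"order_id": 10, "total": 25.00},
        {"order_id": 11, "total": 40.00},
    ],
}

# Relational model: two flat two-dimensional tables, related logically
# through the shared customer_id value rather than by nesting.
customers = [(1, "Ada Smith")]              # (customer_id, name)
orders = [(10, 1, 25.00), (11, 1, 40.00)]   # (order_id, customer_id, total)

def orders_for(name):
    """Relational-style lookup: match rows across tables by key."""
    ids = {cid for cid, n in customers if n == name}
    return [oid for oid, cid, _ in orders if cid in ids]

tree_ids = [o["order_id"] for o in hierarchical["orders"]]
print(tree_ids, orders_for("Ada Smith"))  # [10, 11] [10, 11]
```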
Elements of a Database
Logical view and physical view
Users see and work with the logical view
The physical view is controlled by the database management system itself

Entities and Attributes
Relational databases store information in tables (entities), such as customer, order, and product
Tables contain fields (attributes), such as customer name and address

Keys
Each table has a primary key that uniquely identifies each record
Natural keys have some meaning (a stock symbol)
Artificial keys have no intrinsic meaning (your R number)
Foreign keys are used to link tables in one-to-many relationships

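Entities, attributes, and the three kinds of keys map directly onto table definitions. This sketch uses SQLite through Python's `sqlite3` module; the tables and rows are hypothetical, and the table is named `orders` because `ORDER` is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Artificial primary key: an integer with no intrinsic meaning.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address     TEXT
);
-- Natural primary key: the stock symbol itself identifies the row.
CREATE TABLE stock (
    symbol  TEXT PRIMARY KEY,
    company TEXT NOT NULL
);
-- Foreign key: many orders can point at one customer (one-to-many).
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total       REAL
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Ada Smith', '12 Main St')")
conn.execute("INSERT INTO stock VALUES ('IBM', 'International Business Machines')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 25.0), (11, 1, 40.0)])

# Follow the foreign key from orders back to their customer.
count = conn.execute("""
    SELECT c.name, COUNT(o.order_id)
    FROM customer c JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
""").fetchone()
print(count)  # ('Ada Smith', 2)
```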
Database Interaction
Advantages of an RDBMS (Scalability)
Databases can scale to the terabyte or petabyte range
The NSA maintains 1.9 trillion telephone call records
Large databases can span several servers and storage devices

Advantages of an RDBMS (Redundancy)
Databases can be configured to write duplicate (redundant) information
Example: Citibank
Journaling and checkpointing are supported

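Journaling and checkpointing can be illustrated with a toy key-value store. This is a hypothetical sketch, not how any particular DBMS implements it: a real system keeps the journal and checkpoint on durable storage, while here both live in memory so the recovery logic is easy to follow.

```python
import copy

class JournaledStore:
    """Toy store: journal every change, checkpoint periodically, recover by replay."""

    def __init__(self):
        self.data = {}             # current working state
        self.checkpoint_data = {}  # last saved snapshot
        self.journal = []          # changes made since that snapshot

    def set(self, key, value):
        self.journal.append((key, value))  # write the journal entry first
        self.data[key] = value             # then apply the change

    def checkpoint(self):
        self.checkpoint_data = copy.deepcopy(self.data)
        self.journal.clear()  # changes before the snapshot no longer need replay

    def crash(self):
        self.data = {}  # simulate losing the working state

    def recover(self):
        self.data = copy.deepcopy(self.checkpoint_data)
        for key, value in self.journal:  # replay changes since the checkpoint
            self.data[key] = value

s = JournaledStore()
s.set("acct-1", 100)
s.checkpoint()
s.set("acct-1", 75)
s.set("acct-2", 50)
s.crash()
s.recover()
print(s.data)  # {'acct-1': 75, 'acct-2': 50}
```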
Advantages of an RDBMS (Integrity)
Relational integrity constraints are rules that apply to the relationships between tables
Business integrity constraints enforce business rules
These are not really a part of the DBMS itself

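The two kinds of constraints can be contrasted with SQLite via Python's `sqlite3`. The foreign key is a relational integrity constraint enforced by the DBMS (SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`). The minimum-order rule is a hypothetical business rule, checked in application code to reflect the point that such rules are not really part of the DBMS; many DBMSs can also express simple rules as CHECK constraints or triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total       REAL)""")
conn.execute("INSERT INTO customer VALUES (1, 'Ada Smith')")

# Relational integrity constraint: the DBMS rejects an order for a
# customer that does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (10, 99, 25.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Business integrity constraint (hypothetical rule): orders must total
# at least $5. Enforced here by the application, not the DBMS.
MIN_ORDER = 5.00

def place_order(order_id, customer_id, total):
    if total < MIN_ORDER:
        raise ValueError(f"order total {total} is below the ${MIN_ORDER} minimum")
    conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer_id, total))

place_order(11, 1, 25.0)
print(fk_rejected)  # True
```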
Advantages of an RDBMS (Information Security)
A DBMS supports advanced access rights
By table and fields
By time of day
By location
By row information

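Real DBMSs implement these rights with features such as SQL `GRANT`/`REVOKE`, views, and row-level security policies, and the details vary by vendor. The plain-Python sketch below is only a toy model of the four kinds of rights listed above; the `POLICY` structure, location name, and rows are all hypothetical.

```python
from datetime import time

# Hypothetical policy for one user, one entry per kind of access right.
POLICY = {
    "tables": {"customer": {"name", "zip"}},                 # by table and fields
    "hours": (time(8, 0), time(18, 0)),                      # by time of day
    "locations": {"branch-reno"},                            # by location
    "row_filter": lambda row: row["zip"].startswith("895"),  # by row
}

# Hypothetical customer rows, including a field this user may not read.
CUSTOMERS = [
    {"name": "Ada Smith", "zip": "89501", "ssn": "xxx-xx-1111"},
    {"name": "Bo Jones", "zip": "10001", "ssn": "xxx-xx-2222"},
]

def query(policy, table, fields, now, location, rows):
    """Return only permitted fields of permitted rows, or raise PermissionError."""
    if location not in policy["locations"]:
        raise PermissionError("location not allowed")
    start, end = policy["hours"]
    if not start <= now <= end:
        raise PermissionError("outside permitted hours")
    if not set(fields) <= policy["tables"].get(table, set()):
        raise PermissionError("table or field not allowed")
    return [{f: r[f] for f in fields} for r in rows if policy["row_filter"](r)]

result = query(POLICY, "customer", ["name"], time(9, 30), "branch-reno", CUSTOMERS)
print(result)  # [{'name': 'Ada Smith'}]
```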
Data-driven Web Sites
Nearly all transactional Web sites rely on a database
Amazon
Your bank
Any shopping cart application
eBay or Craigslist
Facebook and YouTube

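What makes a site data-driven is that pages are built from database queries at request time rather than written by hand. A minimal sketch with Python's `sqlite3` and a hypothetical `item` table; no particular web framework is assumed, so the function just returns an HTML string.

```python
import html
import sqlite3

# Hypothetical catalog table standing in for a shopping site's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT, price REAL)")
conn.executemany("INSERT INTO item VALUES (?, ?)",
                 [("Headphones", 59.0), ("USB cable", 7.5)])

def render_catalog(db):
    """Build the page from whatever is in the database right now."""
    rows = db.execute("SELECT name, price FROM item ORDER BY name").fetchall()
    items = "".join(
        f"<li>{html.escape(name)}: ${price:.2f}</li>" for name, price in rows
    )
    return f"<ul>{items}</ul>"

page = render_catalog(conn)
print(page)  # <ul><li>Headphones: $59.00</li><li>USB cable: $7.50</li></ul>
```

Adding a row to the table changes the page on the next request without touching the page code.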
Database Integration
Databases often need to be integrated
Because of mergers and acquisitions
Because of organizational changes
Integration here means connecting to multiple databases

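One concrete way to connect to multiple databases is SQLite's `ATTACH DATABASE`, used here through Python's `sqlite3`. This is a sketch under assumptions: the two in-memory databases stand in for the customer files of two merged companies, and enterprise integrations usually rely on heavier tools.

```python
import sqlite3

# Two hypothetical customer databases from companies that merged.
conn = sqlite3.connect(":memory:")                      # company A ("main")
conn.execute("ATTACH DATABASE ':memory:' AS acquired")  # company B

conn.execute("CREATE TABLE main.customer (email TEXT, name TEXT)")
conn.execute("CREATE TABLE acquired.customer (email TEXT, name TEXT)")
conn.executemany("INSERT INTO main.customer VALUES (?, ?)",
                 [("ada@example.com", "Ada Smith"), ("bo@example.com", "Bo Jones")])
conn.executemany("INSERT INTO acquired.customer VALUES (?, ?)",
                 [("bo@example.com", "Bo Jones"), ("cy@example.com", "Cy Young")])

# One query spans both databases; UNION removes the customer they share.
combined = conn.execute("""
    SELECT email, name FROM main.customer
    UNION
    SELECT email, name FROM acquired.customer
    ORDER BY email
""").fetchall()
print(len(combined))  # 3
```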
Data Warehouses (Introduction)
Central source for clean data
May contain internal or external data
Used to spot hidden patterns in data
May be integrated with operational databases
Subsets of a data warehouse are called data marts
Data warehouses contain an analytical component

Cleansing Data
Data is often obtained from a myriad of sources
External lists
Internal databases
Other databases
This data must be cleansed and sanitized to remove redundancy, errors, etc.

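A minimal cleansing pass might standardize formats, reject records with errors, and drop duplicates. The rows below are hypothetical, and the rules are assumptions for illustration: names are normalized to title case, and a valid phone number is taken to be exactly 10 digits (North American style).

```python
import re

# Hypothetical rows from an external list and two internal databases.
raw = [
    {"name": "  ada SMITH ", "phone": "(775) 555-0100", "source": "external list"},
    {"name": "Ada Smith",    "phone": "775.555.0100",   "source": "internal db"},
    {"name": "Bo Jones",     "phone": "775-555-01",     "source": "other db"},
]

def normalize(row):
    """Standardize formats so equivalent records look identical."""
    name = " ".join(row["name"].split()).title()  # trim spaces, fix case
    digits = re.sub(r"\D", "", row["phone"])      # keep digits only
    return {"name": name, "phone": digits}

def cleanse(rows):
    clean, rejected, seen = [], [], set()
    for row in rows:
        rec = normalize(row)
        if len(rec["phone"]) != 10:  # error: malformed phone number
            rejected.append(row)
            continue
        key = (rec["name"], rec["phone"])
        if key in seen:              # redundancy: duplicate record
            continue
        seen.add(key)
        clean.append(rec)
    return clean, rejected

clean, rejected = cleanse(raw)
print(clean)          # [{'name': 'Ada Smith', 'phone': '7755550100'}]
print(len(rejected))  # 1
```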
Data Warehouses (Illustration)
Multidimensional Analysis
Data are often analyzed as three-dimensional cubes
Cubes are then ‘sliced and diced’ to look at various layers

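Slicing and dicing can be sketched with a cube stored as a Python dictionary keyed by (product, region, quarter). The sales figures are hypothetical. A slice fixes one dimension to a single value; a dice keeps a sub-cube restricted on several dimensions; `roll_up` totals along one dimension.

```python
from collections import defaultdict

# Hypothetical sales cube: (product, region, quarter) -> units sold.
sales = {
    ("laptop", "west", "Q1"): 120, ("laptop", "east", "Q1"): 90,
    ("laptop", "west", "Q2"): 150, ("laptop", "east", "Q2"): 80,
    ("phone",  "west", "Q1"): 200, ("phone",  "east", "Q1"): 210,
    ("phone",  "west", "Q2"): 180, ("phone",  "east", "Q2"): 230,
}

def slice_cube(cube, quarter):
    """Slice: fix the quarter, leaving a 2-D product x region layer."""
    return {(p, r): v for (p, r, q), v in cube.items() if q == quarter}

def dice_cube(cube, products, regions):
    """Dice: keep only chosen products and regions (a smaller sub-cube)."""
    return {k: v for k, v in cube.items() if k[0] in products and k[1] in regions}

def roll_up(cube, axis):
    """Total the cube along one dimension (0=product, 1=region, 2=quarter)."""
    totals = defaultdict(int)
    for key, v in cube.items():
        totals[key[axis]] += v
    return dict(totals)

q1 = slice_cube(sales, "Q1")
print(q1[("phone", "east")])  # 210
print(roll_up(sales, 1))      # {'west': 650, 'east': 610}
```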
Multidimensional Analysis (Illustration)
The Cost of Perfect Information
Database Design (Introduction)
In the systems process, we design before we implement
Requirements specification
Conceptual design
Logical design
Physical design

Database Design Tools
Unified Modeling Language (UML), with tools such as Visio and Rational Rose
Entity relationship diagrams describe relationships between data
Normalization eliminates redundant data

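Normalization can be sketched by splitting a hypothetical flat order list, where customer details repeat on every row, into a customer table and an order table linked by `customer_id`. This is roughly the move toward third normal form (removing the transitive dependency order → customer → address); the data and function name are illustrative.

```python
# Hypothetical unnormalized rows: (order_id, customer_id, name, address, total).
# The customer's name and address repeat on every order (redundant data).
flat = [
    (10, "C1", "Ada Smith", "12 Main St", 25.0),
    (11, "C1", "Ada Smith", "12 Main St", 40.0),
    (12, "C2", "Bo Jones", "9 Elm Ave", 15.0),
]

def normalize(rows):
    """Split into a customer table and an order table linked by customer_id."""
    customers, orders = {}, []
    for order_id, cust_id, name, address, total in rows:
        customers[cust_id] = (name, address)  # stored once per customer
        orders.append((order_id, cust_id, total))
    return customers, orders

customers, orders = normalize(flat)
print(len(customers), len(orders))  # 2 3
```

After the split, changing a customer's address means updating one row instead of every order.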
Database Management HR
Database administrators
Data managers
Programmers and systems analysts
Data security

BUSINESS INTELLIGENCE / DATA MINING
Business Intelligence (Introduction)
Simply put, it’s internal and external data used to support better decision making
It’s challenging to sift through the mountains of data
It requires cross-functional collaboration between systems
More in the next chapter, but we use ERP systems to improve business intelligence

Business Intelligence (Industries)
BI applies to all industries
Retail and sales: understanding procurement and distribution (SCM) and customers (CRM)
Banking: understanding creditworthiness and fraud behavior
Insurance: forecasting claim risk and understanding at-risk customers

Business Intelligence (Industries)
Airlines: routing planes and minimizing turnaround time (Southwest)
Marketing: demographics; selling based on known customer behavior (Harrah’s, Amazon)

Business Intelligence (Levels)
Operational: day-to-day operations (building a Dell)
Tactical: short term (Dell ordering supplies)
Strategic: long-term organizational goals
The systems that provide BI typically do so at all levels

BI Levels (Illustration)
BI and Latency
From the time of acquisition, how long does it take to analyze the data (analysis latency)?
How long does it take to make a decision based on the analysis (decision latency)?
E-transactions significantly reduce latency

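The latency arithmetic is just timestamp subtraction. The sketch below uses hypothetical timestamps for one business event; the second interval (from analysis to decision) is commonly called decision latency.

```python
from datetime import datetime

# Hypothetical timestamps for one business event.
data_acquired = datetime(2024, 3, 1, 9, 5)    # data captured and stored
analysis_done = datetime(2024, 3, 1, 13, 5)   # analysis/report available
decision_made = datetime(2024, 3, 2, 10, 0)   # someone acts on it

analysis_latency = analysis_done - data_acquired
decision_latency = decision_made - analysis_done
total_latency = decision_made - data_acquired

print(analysis_latency)  # 4:00:00
print(decision_latency)  # 20:55:00
print(total_latency)     # 1 day, 0:55:00
```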
Data Mining (Introduction)
Data contained in a data warehouse or data mart gets mined (analyzed)
Specialized tools are used to analyze data for ‘interesting nuggets’
Ways to mine
Drill down (general to specific)
Drill up (specific to general)

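Drilling down and up move between levels of a hierarchy of totals. This sketch assumes a hypothetical two-level hierarchy (region, then store) over made-up sales rows; level 0 is the grand total.

```python
from collections import defaultdict

# Hypothetical detailed sales rows: (region, store, amount).
rows = [
    ("west", "Reno", 300), ("west", "Sparks", 200),
    ("east", "Boston", 250), ("east", "Albany", 150),
]

def totals(rows, level):
    """Aggregate at a level: 0 = grand total, 1 = region, 2 = store."""
    out = defaultdict(int)
    for region, store, amount in rows:
        out[(region, store)[:level]] += amount  # key keeps the first `level` parts
    return dict(out)

def drill_down(rows, level):
    return totals(rows, min(level + 1, 2))  # general -> specific

def drill_up(rows, level):
    return totals(rows, max(level - 1, 0))  # specific -> general

print(drill_down(rows, 0))  # {('west',): 500, ('east',): 400}
print(drill_up(rows, 1))    # {(): 900}
```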
Data Mining (Clustering)
Cluster analysis groups data by trait or traits
Examples
Don’t drink the water in Fallon
Segment customers by zip code

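Cluster analysis usually means letting an algorithm find the groups. As an assumed example, the sketch below runs a tiny one-dimensional k-means (one common clustering method, not necessarily what any BI tool uses) over hypothetical annual customer spend; a zip-code segmentation instead groups directly on a known trait.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest centroid, move each
    centroid to the mean of its members, and repeat."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Empty clusters keep their old centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical annual spend per customer, in dollars.
spend = [120, 150, 130, 900, 950, 1000, 140]
centers, groups = kmeans_1d(spend, centroids=[100, 1000])
print(groups)   # [[120, 150, 130, 140], [900, 950, 1000]]
print(centers)  # [135.0, 950.0]
```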
Data Mining (Association)
Answers the question “What traits are associated with other traits?”
When I stay at Harrah’s, I gamble and I eat at the Sage Room
When I stay in Vegas, I gamble more

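Association rules are commonly scored with support (how often items occur together) and confidence (how often the consequent appears when the antecedent does). The guest records below are hypothetical, built around the Harrah’s example.

```python
# Hypothetical guest visits: the set of things each guest did.
visits = [
    {"stay_harrahs", "gamble", "eat_sage_room"},
    {"stay_harrahs", "gamble"},
    {"stay_harrahs", "eat_sage_room"},
    {"gamble"},
    {"stay_harrahs", "gamble", "eat_sage_room"},
]

def count(itemset):
    """Number of visits containing every item in the itemset."""
    return sum(itemset <= v for v in visits)

def support(itemset):
    """Fraction of all visits containing the itemset."""
    return count(itemset) / len(visits)

def confidence(antecedent, consequent):
    """Of the visits with the antecedent, the fraction that also have the consequent."""
    return count(antecedent | consequent) / count(antecedent)

print(support({"stay_harrahs", "gamble"}))       # 0.6
print(confidence({"stay_harrahs"}, {"gamble"}))  # 0.75
```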
Data Mining (Statistical Analysis)
It’s basic statistics
Analysis of variance
Correlation coefficients
Etc.

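As one example of the basic statistics involved, here is a Pearson correlation coefficient computed by hand in Python. The ad-spend and unit-sales figures are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient r, between -1 and 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly data: ad spend (thousands of dollars) vs. units sold.
ad_spend = [1, 2, 3, 4, 5]
units = [12, 15, 21, 24, 28]
r = pearson(ad_spend, units)
print(round(r, 3))  # 0.994
```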
BI Benefits
We can understand what’s happening inside and outside a department
Sales knows about product inventory levels and production schedules
Production knows about sales and sales forecasts
Finance knows about the sales forecasts too
This information is provided in near real time

Quantifying BI
Some benefits can be clearly quantified
Costs went down
Productivity increased
Inventory levels were optimized by 10%
Some are indirectly quantified
Some benefits are intangible
Sometimes, we get unexpected results