Acxiom Visit: Data Warehousing Infrastructure

Data Warehousing at Acxiom
Paul Montrose
Agenda
• Acxiom Overview
• Data warehouses
• Transactional databases
• Hybrid databases
• What’s new/future innovations
• Summary
• Questions and answers
Acxiom Overview
At Acxiom, we create and deliver
Customer and Information Management
Solutions that enable many of the largest,
most respected companies in the world to
build great relationships with their
customers. Acxiom achieves this by
blending data, technology and services to
provide the most advanced customer
information infrastructure available in the
marketplace today.
Acxiom Overview
Acxiom customizes industry-specific solutions to solve the
unique business issues of the Automotive, Financial Services,
Government Services, Healthcare, Insurance, Media, Retail,
Technology, Telecommunications, as well as Travel and
Leisure industries.
Every solution that Acxiom offers is built from our core
competencies:
• CDI/Technology
• Data
• Database
• Consulting and Analytics
• Privacy Leadership
• IT Outsourcing
Acxiom Overview
Customer and Information Management Solutions for
marketing, risk and IT help companies:
• Improve acquisition, retention, cross-sell, up-sell and
channel management
• Improve authorization, increase collections and
reduce fraud
• Increase operational efficiencies and improve end-user satisfaction
Data Warehouses
The characteristics of an Acxiom data
warehouse generally are...
• Large multi-terabyte databases
• Large periodic sequential data loads
• Denormalized database schema
• Sequential reads/full table scans
• Little or no indices
• Little or no transaction logging
• Robust periodic backup solutions
• Performance measured using megabytes/gigabytes per second (MBPS, GBPS)
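Because warehouse work is dominated by sequential reads and full table scans, throughput translates directly into elapsed time. The following is a minimal sketch of that arithmetic; the table size and throughput figures are hypothetical values chosen for illustration, not Acxiom measurements.

```python
TB = 10**12  # bytes in a terabyte (decimal)
GB = 10**9   # bytes in a gigabyte
MB = 10**6   # bytes in a megabyte


def scan_seconds(table_bytes: int, throughput_bytes_per_sec: float) -> float:
    """Time to read a table end to end at a sustained sequential throughput."""
    if throughput_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return table_bytes / throughput_bytes_per_sec


# Hypothetical example: one 5 TB table, scanned at two different rates.
slow = scan_seconds(5 * TB, 200 * MB)  # 25,000 s, roughly 6.9 hours
fast = scan_seconds(5 * TB, 2 * GB)    # 2,500 s, roughly 42 minutes

print(f"200 MBPS: {slow / 3600:.1f} hours")
print(f"2 GBPS:   {fast / 60:.1f} minutes")
```

A tenfold difference in MBPS is the difference between a scan that fits in a lunch break and one that consumes most of a working day, which is why throughput is the headline metric for this class of database.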
Data Warehouses
The processing platform is generally a large global-class server or cluster of servers running UNIX.
The database is a large vertical database: denormalized, with few tables that are very long, hold sorted data, and sometimes run to several billion rows. The data is striped across the storage in a manner that prevents physical hot spots and takes advantage of the wide bandwidth.
The storage subsystem is very fast, with wide bandwidth and high levels of redundancy, which makes it possible to move large amounts of sequential data in a very short time.
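Striping data across the storage to avoid hot spots can be illustrated with a simple round-robin mapping from logical blocks to disks. This is a sketch under stated assumptions: the function name, the stripe-unit parameter, and the round-robin scheme itself are illustrative, and the slides do not say which striping layout Acxiom actually uses.

```python
from collections import Counter


def locate_block(logical_block: int, num_disks: int,
                 blocks_per_stripe_unit: int = 1) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Hypothetical round-robin striping: consecutive stripe units go to
    consecutive disks, wrapping around after the last disk.
    """
    stripe_unit = logical_block // blocks_per_stripe_unit
    disk = stripe_unit % num_disks
    unit_on_disk = stripe_unit // num_disks
    offset = (unit_on_disk * blocks_per_stripe_unit
              + logical_block % blocks_per_stripe_unit)
    return disk, offset


# A sequential scan of 16 blocks over 4 disks touches every disk equally,
# so no single disk becomes a hot spot and all disks contribute bandwidth.
placement = [locate_block(b, num_disks=4) for b in range(16)]
per_disk = Counter(disk for disk, _ in placement)
print(per_disk)
```

With this layout a long sequential read is spread over all spindles at once, which is how striping turns many modest devices into one wide-bandwidth volume.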
Transactional Databases
The characteristics of an Acxiom transactional
database generally are...
• Small, usually no larger than a few terabytes
• Random and simultaneous inserts, updates,
deletes, and queries
• Random reads and writes
• Normalized database schema
• Transaction logging and archiving with
incremental and periodic backup solutions
• Generally sub-second response required per
transaction taking into account concurrency
• Performance measured using transactions per
second (TPS) and I/O latency
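The link between response time, concurrency, and TPS can be made concrete with Little's Law (in-flight transactions = throughput × average response time). The sketch below uses hypothetical workload numbers; the function names are illustrative and do not come from the slides.

```python
def required_concurrency(tps: float, avg_response_seconds: float) -> float:
    """Average number of transactions in flight at a given TPS (Little's Law)."""
    return tps * avg_response_seconds


def max_tps(concurrent_slots: int, avg_response_seconds: float) -> float:
    """Upper bound on TPS when only a fixed number of transactions can run at once."""
    if avg_response_seconds <= 0:
        raise ValueError("response time must be positive")
    return concurrent_slots / avg_response_seconds


# Hypothetical workload: 2,000 TPS with a 0.25 s average response time
# keeps 500 transactions in flight on average.
in_flight = required_concurrency(2000, 0.25)

# Hypothetical limit: 100 concurrent slots at 0.5 s each caps throughput at 200 TPS.
ceiling = max_tps(100, 0.5)

print(in_flight, ceiling)
```

This is why I/O latency sits next to TPS as a metric: for a fixed level of concurrency, every reduction in response time raises the achievable transaction rate.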
Transactional Databases
The processing platform is generally a medium/enterprise-class server.
The database is normalized and uses lookup tables. The data is stored randomly within a table but striped across the storage to prevent physical hot spots.
The storage subsystem is very fast, with low latency, nominal bandwidth, and high levels of redundancy, which makes it possible to move small amounts of selected data quickly.
Hybrid Databases
The characteristics of an Acxiom hybrid
database generally are...
• Medium sized, usually three to ten terabytes
• Random and simultaneous inserts, updates, deletes, and
queries
• Random and sequential reads and writes
• Loosely normalized database schema
• Indices used sparingly
• Usually a batch maintenance process
• Transaction logging and archiving with incremental and
periodic backup solutions
• Generally sub-second response required per transaction
taking into account concurrency
• Performance measured using TPS, I/O latency, and MBPS
Hybrid Databases
The processing platform is generally a medium-sized global-class server.
The database is a large vertical database: loosely normalized, with few tables that are very long, hold sorted data, and sometimes run to more than a billion rows. The data is striped across the storage in a manner that prevents physical hot spots and takes advantage of the wide bandwidth.
The storage subsystem is very fast, with wide bandwidth and high levels of redundancy, which makes it possible to move large amounts of random and sequential data in a very short time.
What’s New/Future Innovations
Grid or scale-out environments...
• Utilize low cost commodity based servers
• Low cost/no cost operating systems
• Many servers can work on one problem, with the
aggregate processing power being more than one large
server for less money
• Not locked into a single vendor or supplier
• When adding a new node, able to use current
technology at a lower price
• Need to understand and factor in peripheral costs such
as network, administration, data center, etc.
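The trade-off in the points above, cheaper aggregate processing power against added peripheral costs, can be worked through with a small cost model. Every figure below is a hypothetical placeholder, the class and field names are illustrative, and the model assumes performance scales linearly with node count, which real grids rarely achieve in full.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    servers: int
    price_per_server: int       # hardware purchase price (hypothetical)
    perf_per_server: int        # arbitrary performance units (hypothetical)
    peripheral_per_server: int  # network, administration, data center (hypothetical)

    def total_cost(self) -> int:
        return self.servers * (self.price_per_server + self.peripheral_per_server)

    def total_perf(self) -> int:
        # Assumes linear scaling across nodes, an optimistic simplification.
        return self.servers * self.perf_per_server

    def cost_per_perf(self) -> float:
        return self.total_cost() / self.total_perf()


big = Option("one large server", 1, 1_000_000, 100, 50_000)
grid = Option("16-node commodity grid", 16, 20_000, 8, 15_000)

for option in (big, grid):
    print(f"{option.name}: cost {option.total_cost():,}, "
          f"perf {option.total_perf()}, "
          f"cost per unit {option.cost_per_perf():,.0f}")
```

In this made-up scenario the grid still wins, but peripheral costs add 240,000 to 320,000 of hardware, so leaving them out would understate the grid's total by more than 40 percent.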
[Diagram: Parallel Grid and Clustered Grid configurations built from IBM pSeries servers, each running OS and DB instances]
Distributed Grid Database
• Shared-nothing environment: each partition has its own
resources, allowing the database to scale out to as many
as 999 partitions.
• Any partition can receive connections and distribute
queries among the other nodes.
• Centralized management of the partitioned environment.
• Data is equally distributed across all partitions.
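One common way to spread rows evenly across shared-nothing partitions is to hash a distribution key and take the result modulo the partition count. The sketch below assumes that approach; the slides do not state how the data is distributed, and the key format, hash function, and partition count are all illustrative choices.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition for a row by hashing its distribution key.

    Hypothetical scheme: SHA-256 is used only because it is stable across
    runs and machines; a real database uses its own internal hash function.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Distribute 10,000 made-up customer keys over 8 partitions.
keys = [f"customer-{i}" for i in range(10_000)]
counts = Counter(partition_for(k, 8) for k in keys)

for partition in sorted(counts):
    print(partition, counts[partition])
```

Because the same key always hashes to the same partition, any node that receives a query can work out which partition owns a row and forward the request there, without consulting a central directory.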
Summary
• Understand the process in which the database is
to be used and fashion a solution to meet the
requirements and customer expectations
• Even though a DBA may only be responsible for
the database, many factors such as operating
system and hardware configuration affect the
functionality of the database and thus are a
concern to the DBA. A DBA must relate the
database to its environment to achieve an
optimized solution.
• A large multi-terabyte database is not a scary
monster; it is the same as dealing with a smaller
database, with a few more zeros added.
Questions?