Teradata Technical Overview

Teradata
Leaders in Enterprise Data Warehousing
John Tulley
Vice President, Teradata Canada
Email: John.tulley@ncr.com
Office: 905-478-8997
NCR Corporate Overview
2004 Revenue
by Business Unit
• Fortune 500 company
• Global operations in more than 100
countries & territories
• 28,500 employees
Teradata
Financial
Retail
Systemedia
Customer Service
Payment & Imaging
Other
• 2004 Revenue $5.984B
• 1999-2004 >51% revenue growth
Retail
Solutions
Teradata
Data Warehouse
Financial
Solutions
Systemedia
Worldwide
Customer
Services
Top Industry Leaders Rely on Teradata
Teradata Top 10
80% of Top 10
Global Telco Firms
60% of Top 10
Most Admired
Global Companies
60% of Top 10
Global Airlines
50% of Top 10
Global Retailers
50% of the Top 10
Transportation
Logistic Firms
FORTUNE Global Rankings, July 2005
• Leading industries
> Banking
> Government
> Insurance & Healthcare
> Manufacturing
> Retail
> Telecommunications
> Transportation Logistics
> Travel
• World class customer list
> More than 800 customers
> Over 1200 installations
• Global presence
> Over 100 countries
• 4,000 world-wide professionals
dedicated to data warehousing
The Teradata Difference
What We Do….
• Enterprise data warehouse
• Windows 2003/Unix/Linux scales from Intel laptop to MPP
• Analytic capabilities transform data into information
• Extreme high availability
• Industry leader in analytical applications
• Integration with SAP, Siebel, Hyperion
• Partnerships include Accenture, BearingPoint, Capgemini, Deloitte, EDS, Lockheed Martin
• Strong customer references
All we do is Data Warehousing!
Teradata - the recognized leader in data warehousing
and high-performance decision analytics.
….Gartner ASEM
[Chart: Gartner ASEM ratings by platform, each criterion scored from Worst to Best]
Platforms compared:
• IBM S/390, OS/390, DB2 EEE
• Sun Enterprise, Solaris, Oracle
• HP HP9000, HP-UX, Oracle
• IBM SP RS/6000, AIX, DB2 EEE
• Compaq Alpha, Tru64, Oracle
• Teradata
• Generic Intel IA-32, Win2000, SQL Server
• Unisys ES7000, Win2000, SQL Server
Criteria: Data Mgmt.; Data Admin.; Scalability and Suitability; Concurrent Query Mgmt.; DW Track Record; Query Perform.
Source: Gartner ASEM Ratings 2004
Industry Leadership Recognition
• Gartner - “Dominant Lead” – 5th Consecutive Year
> “DBMS is surely the place where NCR Teradata sets the gold standard. As
in previous years, the Teradata score was 98%, leaving little scope (and need) for
improvement.”
–
Gartner's [Application Server Evaluation Model] ASEM Data Warehouse Server Update, A. Butler, K. Strange, J.
Enck, M. Chuba, November 2004
> “Teradata [database management system] DBMS capabilities remain
unchallenged by its competitors in the market.”
–
Gartner’s Magic Quadrant for Data Warehouse DBMSs, 2004, Kevin H. Strange, June 2004
> “Teradata continues to drive a strong vision.”
–
Gartner Research, MarketScope: Customer Relationship Marketing, 1Q04, G. Herschel, J. Radcliffe, Feb 2004
> Gartner Dataquest recognized Teradata as the growth leader in the RDBMS
market, with above market growth of 17.4%. 2005
> Teradata is rated “Positive” in Gartner’s MarketScope for Campaign Management,
the highest rating awarded 2005
• META Group
> “Teradata has displayed unmatched (but often copied) strength of vision
and focus in the [enterprise data warehouse] EDW market.”
–
METAspectrum Market Summary, Enterprise Data Warehouse METAspectrumSM Evaluation, 2004
Industry Awards and Recognition - 2005
BI Excellence Award
Sponsor: Gartner Group
•Continental Airlines - winner
•Cardinal Health - finalist
Technology Leadership
Award
Sponsor: Frost & Sullivan
•Teradata selected for
Leadership Award – CRM
Analytics
TDWI Best Practices
Award
•sunrise TDC Switzerland AG
– winner - Customer
Relationship Management
1to1 Impact Award
Sponsor: Peppers & Rogers
Continental Airlines recognized
as Technology Optimization winner
Editors’ Choice Awards
Sponsor: Intelligent Enterprise
•Teradata selected for the
“Dozen” Most Influential
BI Companies
•Winner, Customer Analytics category
NEXUS Awards
Sponsor: New Zealand Direct Marketing Association
• Bank of New Zealand,
silver award - data mining & analytics;
bronze award - data management
Government Agencies with Teradata Presence
• US Air Force
• US Navy
• US Transportation
Command
• Defense Commissary
Agency
• Army, Air Force
Exchange
• Intelligence
Community
• US Postal Service
• Italian Post Office
• Dept. of Justice
• Dept. of Housing and
Urban Development
• Dept. of Agriculture
• Arizona, Iowa, Florida,
Texas, Illinois, New
York, Utah, Michigan
• RAMQ – Quebec
• Australian Tax Office
• South African Tax
Office
Teradata Solutions Methodology
Project Management
Phases: Strategy, Research, Analyze, Design, Equip, Build, Integrate, Manage
[Matrix of activities under the phases]
Opportunity Assessment; Business Value; Application Requirement; System Architecture; Hardware Platform; Physical Database; Components for Testing; Help Desk
Enterprise Assessment; EDW Roadmap; Logical Model; Package Adaptation; Software Platform; ECTL Application; System Test; Capacity Planning
Information Sourcing; Data Mapping; Custom Component; Support Management
Information Exploitation; Production Install; System Performance
Infrastructure & Education; Test Plan; Operational Mentoring; Operational Applications; Initial Data; Business Continuity
Education Plan; Technical Education; Backup & Recovery; Acceptance Testing; Data Migration
User Curriculum; User Training; HW/SW Upgrade
Value Assessment; Availability SLA
Technology Neutral Services
System DBA
Solution Architect
Teradata’s success is the combination of hardware, software and methodology
Workload Complexity
Data Warehouse Needs Will Evolve
• Query complexity grows
• Workload mixture grows
• Data volume grows
• Schema complexity grows
• Depth of history grows
• Number of users grows
• Expectations grow
[Chart: Workload Complexity plotted against Data Sophistication, in five stages]
REPORTING: WHAT happened? Primarily Batch & Some Ad Hoc Reports
ANALYZING: WHY did it happen? Increase in Ad Hoc Analysis
PREDICTING: WHAT WILL happen? Analytical Modeling Grows
OPERATIONALIZING: WHAT IS happening?
ACTIVATING: MAKE it happen! Event-Based Triggering Takes Hold
Workload legend: Batch; Ad Hoc Analytics; Continuous Update/Short Queries; Event-Based Triggering
Enterprise Analytical Topologies
• Virtual, Distributed, Federated: Leave Data Where it Lies
• Data Mart Centric: Independent Data Marts
• Hub-and-Spoke Data Warehouse: Dependent Data Marts
• Enterprise Data Warehouse: Centralized Integrated Data With Direct Access
[Diagram components: Sources, ODS, Middleware, DW, Marts, Users]
Pros
• Easy to Build Organizationally
• Easy to Build Technically
• No need for ETL
• No need for separate platform
• Allows easier customization of user interfaces & reports
• Enterprise view
• Design consistency & data quality
• Data reusability
Cons
• Business Enterprise view unavailable
• Redundant data costs
• High ETL costs
• No ETL
• Meta data issues
• Network bandwidth and join complexity issues
• Only viable for low volume
• Business Enterprise view challenging
• Redundant data costs
• High DBA and operational costs
• Data latency
• ODS duplication
• Requires vision
• Requires Data Owners to willingly participate
• High App costs
• High DBA and operational costs
Typical Data Warehouse Architecture
What’s wrong with this picture?
1. There are too many copies of the data. Will they all be the same?
2. There is too much latency - too long to get the data to the people who need it. Everyone sees different, inconsistent points in time.
3. The solution is too complex. Every line on the chart represents an ETL process that requires $$ for Life Cycle Maintenance.
4. The solution is too expensive. There are numerous components that lead to increased costs. Costs often hidden in distributed organization.
[Diagram layers: Transaction Systems; Operational Data Stores; Central store, Hub, Clearing house; Data Marts]
Teradata’s Enterprise Data Warehouse
An Integrated, Centralized Data Warehouse Solution
Single version of data
[Diagram: “Enterprise” Data Warehouse tables]
ORDER: ORDER NUMBER, ORDER DATE, STATUS
ORDER ITEM BACKORDERED: QUANTITY
CUSTOMER: CUSTOMER NUMBER, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER POST, CUSTOMER ST, CUSTOMER ADDR, CUSTOMER PHONE, CUSTOMER FAX
ORDER ITEM SHIPPED: QUANTITY, SHIP DATE
ITEM: ITEM NUMBER, QUANTITY, DESCRIPTION
Data Replication
[Diagram: Data Marts tables]
PERIOD: PERIOD KEY, DATE, DAY, MONTH, YEAR, QUARTER, TRIMESTER
SALES: PERIOD KEY, PRODUCT KEY, CUSTOMER KEY, MARKET KEY, DOLLARS, UNITS
CUSTOMER: CUSTOMER KEY, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER POST, CUSTOMER ST, CUSTOMER ADDR, CUSTOMER PHONE, CUSTOMER FAX
PRODUCT: PRODUCT KEY, PRODUCT NAME, DISTRIBUTOR, PRODUCT DESCRIPTION, PRODUCT HEIGHT, PRODUCT WIDTH, PRODUCT DEPTH, PRODUCT WEIGHT
MARKET: MARKET KEY, CITY, STATE, ZIP, ZIP4, DISTRICT, REGION, COUNTRY
Logical (Views)
Application
Dimensional
Co-Located
Dependent DM
Optional
Virtual Views
Business & Technology – Consultation
Support & Education Services
Optional ELT
Enterprise, System, & Database Management
Optional
Logical Data Model
Operational
Data Store (ODS)
Optional
ETL Hub
Metadata
Data Transformation
Middleware/Enterprise Message Bus
Transactional Data
Physical
Data Base Design
Transactional Users
Decision Users
Strategic
Users
Tactical
Users
Reporting
OLAP Users
Data
Miners
Event-driven/
Closed Loop
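The idea of serving data marts as logical views over a single copy of the data can be sketched as follows. This is a minimal illustration only: SQLite (from the Python standard library) stands in for Teradata, the tables are a cut-down version of the PERIOD, PRODUCT and SALES tables on the slide, and the view name `quarterly_product_sales` and all sample rows are invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE period  (period_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sales   (period_key INTEGER, product_key INTEGER, dollars REAL, units INTEGER);

-- Invented sample rows.
INSERT INTO period  VALUES (1, 2004, 4), (2, 2005, 1);
INSERT INTO product VALUES (10, 'Widget'), (20, 'Gadget');
INSERT INTO sales   VALUES (1, 10, 100.0, 5), (1, 20, 250.0, 2), (2, 10, 75.0, 3);

-- The "mart" is a view: no second copy of the rows and no extra load job.
CREATE VIEW quarterly_product_sales AS
SELECT p.year, p.quarter, pr.product_name,
       SUM(s.dollars) AS dollars, SUM(s.units) AS units
FROM sales s
JOIN period  p  ON p.period_key   = s.period_key
JOIN product pr ON pr.product_key = s.product_key
GROUP BY p.year, p.quarter, pr.product_name;
""")

# Users query the view exactly as they would query a physical mart table.
rows = conn.execute(
    "SELECT year, quarter, product_name, dollars, units "
    "FROM quarterly_product_sales ORDER BY year, quarter, product_name"
).fetchall()

for row in rows:
    print(row)
```

Because the view reads the base tables at query time, a row added to `sales` shows up in the mart immediately, which is the "single version of data" point the slide makes.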
TERADATA is an Open System
Virtually
any application
or middleware
framework can be
integrated with
TERADATA !!!
[Diagram: applications and middleware connecting to TERADATA. Labels: Messages, JMS, JSP, IIOP, ASP, JAVA, EJB, TAP Appl, CORBA, .NET, JDBC, ODBC, OLE-DB, TERADATA Utilities, Adapter(s), Message Bus, Publish & Subscribe, WEB, Queues]
Teradata Active Data Warehouse in action
1. Continuous Transaction feeds on supplies usage
2. Conditioning & Loading of trans data
3. Stored Procedures trigger based event detection
TERADATA sends alert to Warfighter, Warfighter Support, & DOD Supplier via MSTR Narrowcaster
4. and/or DOD Vendor notified and reorders
5. Warfighter receives alert via Secure Blackberry, adjusts Battle Plans to align with rush replenishment
[Diagram labels]
Front Line; Base Supply; DOD Supplier; Warfighter Support; Secure Wireless; Secure DOD Network
Enterprise Application Integration (EAI): Web Services, WebSphere, Tibco, .NET
Strategic & Tactical Queries
Business Services: OLAP Queries, Intel Agents, Rules Engine, Event Engine
Ascential, Informatica
Information Exchange: MQ Adapter; T-Pump, MQ Adapter, Fast Export
Legacy Systems
Direct Data Access
Data Acquisition: T-Pump, MQ Adapter, Fast Load, Multi Load
Transactional Environment
TERADATA: Stored Procedures, Q Tables, UDF, Triggers
Decision Making Environment
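The event-detection step (step 3) can be pictured with a small sketch. It is plain Python standing in for the stored procedures and triggers named on the slide, not Teradata code; the class `SupplyMonitor`, the reorder points and the item name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SupplyMonitor:
    """Toy stand-in for trigger-based event detection on supply usage."""
    reorder_point: dict          # item -> minimum on-hand quantity (hypothetical values)
    on_hand: dict                # item -> current on-hand quantity
    alerts: list = field(default_factory=list)

    def record_usage(self, item: str, quantity: int) -> None:
        """Apply one usage transaction, then check the item against its reorder point."""
        before = self.on_hand[item]
        after = before - quantity
        self.on_hand[item] = after
        threshold = self.reorder_point[item]
        # Alert only when the level crosses the threshold, so later usage
        # below the threshold does not raise the same alert again.
        if before >= threshold > after:
            self.alerts.append(
                f"REORDER {item}: on hand {after}, reorder point {threshold}"
            )


monitor = SupplyMonitor(reorder_point={"fuel": 100}, on_hand={"fuel": 150})
monitor.record_usage("fuel", 30)   # 120 left, still above the reorder point
monitor.record_usage("fuel", 40)   # 80 left, crosses the reorder point
monitor.record_usage("fuel", 10)   # 70 left, already alerted
print(monitor.alerts)
```

In the flow on the slide, the alert would then be delivered to the warfighter, warfighter support and the supplier; the sketch stops at collecting the alert text.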
So what is Teradata?
What is Teradata?
• RDBMS designed to run the world’s
largest databases
• Latest Intel technology nodes
• UNIX MP-RAS, Windows 2003
• Linux in Fall 2005
• Scales linearly from Laptop to MPP
• Has a parallel aware optimizer that
allows multiple complex queries to run
concurrently
• Standard access language (SQL)
• Uses a “Shared-Nothing” architecture
• Unlimited, unconditional parallelism
• Linear Scalability allows for increased
workload without decreased throughput.
Teradata Hardware Architecture
• SMP Nodes
> Latest Intel SMP CPUs
> Configured in 2 to 8 node
cliques
> Windows, Unix or Linux
• BYNET Interconnect
> Fully scalable bandwidth
> 1 to 1024 nodes
[Diagram: BYNET Interconnect linking SMP Node1 through SMP Node4; each node runs PE and AMP virtual processors]
• Connectivity
> Fully scalable
> Channel - ESCON
> LAN, WAN
• Storage
> Independent I/O
> Scales per node
• Server Management
> One console to view
the entire system
Teradata Shared Nothing Architecture
[Diagram: several nodes, each showing processors (P), FSB, Memory and I/O]
• Similar to Large SMP, except Interconnect runs at I/O Rates and not
Memory Rates
• Longer Lifetime: I/O Interfaces have a 3-5 Year Lifetime
• Scaling Is By Increasing Link Data Rates and Parallel Links
SMP vs. MPP: The Teradata Advantage
• 2-Way SMP
> 1.8 Relative CPUs
> 4 GB Memory
> 3.2 GB/Sec BUS
> 3.2 GB/Sec Memory
> 1.5 GB/Sec I/O
• 4-Way SMP
> 3.1 Relative CPUs
> 4 GB Memory
> 3.2 GB/Sec BUS
> 3.2 GB/Sec Memory
> 1.5 GB/Sec I/O
• 2 2-Way Teradata Nodes
> 3.6 Relative CPUs
> 8 GB Memory
> 6.4 GB/Sec BUS
> 6.4 GB/Sec Memory
> 3 GB/Sec I/O
• 32 2-Way Teradata Nodes
> 57.6 Relative CPUs
> 128 GB Memory
> 102.0 GB/Sec BUS
> 102.0 GB/Sec Memory
> 48 GB/Sec I/O
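The multi-node figures above follow from multiplying the 2-way node figures by the node count. A minimal sketch of that arithmetic, assuming perfectly linear scaling as the slide claims; `NodeSpec` and `aggregate` are illustrative names, not Teradata tooling. Note that 32 × 3.2 GB/Sec works out to 102.4, where the slide lists 102.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node figures for one 2-way node, taken from the slide."""
    relative_cpus: float = 1.8
    memory_gb: float = 4.0
    bus_gb_per_sec: float = 3.2
    memory_gb_per_sec: float = 3.2
    io_gb_per_sec: float = 1.5


def aggregate(node: NodeSpec, node_count: int) -> dict:
    """Scale every per-node figure by the node count (linear-scaling assumption)."""
    return {
        "relative_cpus": round(node.relative_cpus * node_count, 1),
        "memory_gb": round(node.memory_gb * node_count, 1),
        "bus_gb_per_sec": round(node.bus_gb_per_sec * node_count, 1),
        "memory_gb_per_sec": round(node.memory_gb_per_sec * node_count, 1),
        "io_gb_per_sec": round(node.io_gb_per_sec * node_count, 1),
    }


two_nodes = aggregate(NodeSpec(), 2)
thirty_two_nodes = aggregate(NodeSpec(), 32)
print(two_nodes)
print(thirty_two_nodes)
```

By contrast, the 4-way SMP on the slide reaches only 3.1 relative CPUs with unchanged bus, memory and I/O bandwidth, which is the comparison the slide is drawing.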
Teradata Data Distribution
Dividing the Work
• Rows are distributed evenly by hash partitioning
> Done in real-time as data are loaded, appended, or changed.
> No reorgs, repartitioning, space management
• Shared nothing software:
> Each VAMP owns an equal slice of the data.
> Each VAMP works exclusively & independently on its rows
> Nothing centralized: No single point of control for any operation (I/O, Buffers, Locking, Logging, Dictionary)
[Diagram: the Prime Index value of each row passes through the Teradata Parallel Hash Function to produce a RowHash (Hash Bucket), which assigns the row and its Data Fields to one of VAMP1 … VAMPn; rows of Table A, Table B and Table C are spread across all VAMPs]
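The distribution scheme on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not Teradata's actual algorithm: MD5 stands in for the proprietary hash function, the bucket count `HASH_BUCKETS` is an invented value, and mapping a bucket to a VAMP with a simple modulo is a simplification chosen for the example.

```python
import hashlib
from collections import Counter

HASH_BUCKETS = 65536  # illustrative bucket count, an assumption of this sketch


def row_hash(index_value: str) -> int:
    """Map an index value to a hash bucket (MD5 stands in for the real hash)."""
    digest = hashlib.md5(index_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % HASH_BUCKETS


def owning_vamp(index_value: str, vamp_count: int) -> int:
    """Pick the VAMP that owns the row; equal index values always land together."""
    return row_hash(index_value) % vamp_count


def distribute(index_values, vamp_count: int) -> Counter:
    """Count how many rows each VAMP would own."""
    counts = Counter()
    for value in index_values:
        counts[owning_vamp(value, vamp_count)] += 1
    return counts


# 100,000 invented order keys spread over 8 VAMPs.
counts = distribute((f"ORDER-{n}" for n in range(100_000)), vamp_count=8)
for vamp, rows in sorted(counts.items()):
    print(f"VAMP{vamp + 1}: {rows} rows")
```

Since placement depends only on the hashed value, no VAMP needs to consult a central directory to know which rows it owns, and a well-chosen index spreads rows close to evenly without any manual partitioning.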
File System
• File system architecture is fundamentally different
> Broke all the rules
> No Pages, BufferPools, TableSpaces, Extents,...
> Data location and management are entirely automatic
> Space allocation is entirely dynamic
• Absolutely minimal labor required
> No reorgs
– Don’t even have a reorg utility
> No index rebuilds
> No re-partitioning
> No detailed space management
> Easy database and table definition
> Minimum ongoing maintenance
– All performed automatically
Self Managing Architecture
• Teradata’s self-managing philosophy provides the lowest total cost of ownership of any RDBMS
> Automatic, random and even data distribution
> Parallel-aware optimizer eliminates query tuning
> Parallel utilities with low setup and checkpoint restart
> Single operational view of entire MPP complex (AWS)
> Single point of control for the DBA (Teradata Manager)
> SQL-ready database management information (log files)
Teradata DBAs Don’t Worry About!
1. Install the Database
2. Understand, monitor and tune extensive operating system parameters
3. Understand, monitor and tune extensive database parameters
4. Determine the size and physical location and/or space allocations of tables and index partitions
5. Perform periodic table and index re-orgs
6. Manually restart multi-step load process when failure occurs
7. Ability to run queries and data maintenance 24x7
8. Sort data before loading
9. Calculate and configure fail-over plans in a clustered multiprocessing environment
10. Spend a lot of time planning and expanding the system
11. Query tuning for decision support
Teradata High Availability
• Teradata software provides high availability beyond other databases
> Compensates for hardware failures:
– Automatic failover for dynamic workload rebalancing (migrating VPROCS)
– Online, continuous backup (Fallback)
> Recycles before the operating system completes its reboot (multi-node system)
[Diagram: BYNET Interconnect linking SMP Node1 through SMP Node4; each node runs PE and AMP virtual processors]
Teradata’s Multidimensional Scalability
(It’s more than just big data)
Amount of Detailed Data
Concurrent Users
Multiple Subject Areas
Sophisticated Queries
• Simple Direct at the start
• Moderate Multi-table Join
• Regression analysis
• Query tool support
[Diagram: order data model]
ORDER: ORDER NUMBER, ORDER DATE, STATUS
ORDER ITEM BACKORDERED: QUANTITY
CUSTOMER: CUSTOMER NUMBER, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER POST, CUSTOMER ST, CUSTOMER ADDR, CUSTOMER PHONE, CUSTOMER FAX
ORDER ITEM SHIPPED: QUANTITY, SHIP DATE
ITEM: ITEM NUMBER, QUANTITY, DESCRIPTION
EDW Requires Multi-dimensional Scalability
• Data Volume (Raw, User Data)
• Mixed Workload
• Query Concurrency
• Data Freshness
• Query Freedom
• Query Complexity
• Query Data Volume
• Schema Sophistication
The Teradata Difference
“Multi-dimensional Scalability”
Teradata can Scale Simultaneously Across Multiple Dimensions. Driven by Business!
Competition Scales One Dimension at the Expense of Others. Limited by Technology!
[Chart axes: Data Volume (Raw, User Data); Mixed Workload; Query Concurrency; Data Freshness; Query Freedom; Query Complexity; Query Data Volume; Schema Sophistication]
The Teradata Difference
“Multi-dimensional Scalability”
[Chart: Teradata vs. Others along the axes below]
• Data Storage (raw, user data): 5 TB, 10 TB, 15 TB, 20 TB, 100’s TBs +
• Schema Sophistication: Simple Star; Multiple, Integrated Stars; Normalized; Multiple, Integrated Stars and Normalized
• Query Complexity: 3-5 Way Joins; 5-10 Way Joins; 15+ way Joins + OLAP operations + Aggregation + Complex “Where” constraints + Views
• # of Concurrent Queries: 1,000’s
• Query Data Volumes: MBs, GBs, TBs
• Workload Mix: Batch Reporting, Repetitive Queries; “Iterative”, Ad Hoc Queries; Data Analysis/Mining; Near Real Time Data Feeds; Active Data Warehousing
• Parallelism
State of Michigan, Department
of Community Health (DCH)
Customer Profile
Teradata Customer Since 1991
As the largest department in the State of Michigan, DCH is responsible for managing delivery of health care
services to more than 1.2 million clients and overseeing an annual budget of $9.5 billion. DCH administers many of
the state’s most critical programs, including Medicaid, WIC, and child immunizations.
Business Solutions
• Data warehouse integrates
claims/encounters; beneficiary
eligibility data; provider data; birth
records; death records; long-term
care assessments; WIC data;
immunizations; lead screening;
newborn screening; & notifiable
diseases.
• Fraud & abuse
• Contract management with health
plans
• Healthcare cost & quality
assessment
• Overpayment & COB analysis
• Program effectiveness
• Predict State’s healthcare needs
• Prioritize health initiatives
for future
Implementation Summary
• Integrated data from nine separate health-related agencies
• Managed and used by agency subject matter/programmatic
experts, not by the IT department
• Over 200 users in Medicaid and 8,000 state-wide
Realizations and ROI
• Estimated annual savings of $75 million–$100 million due to
advanced health care analysis
• Medicaid administrative costs have been reduced by 25 percent
• Recoveries for Medicaid Fraud have doubled
• Maximized Medicaid program savings while sustaining quality
care
• Warehouse helped Michigan go from “last to first” in child
immunization rates
• Track and substantiate savings in Medicaid pharmacy costs
• 2004 TDWI Best Practice Award Winner – Government and
Non-Profit Category
The New York State
Department of Health (DoH)
Teradata Customer Since 1999
Customer Profile
New York’s Medicaid program provides critical health care services to more than 3.7 million participants – 2.4
million in New York City alone. To serve this constituency, the state processes and analyzes more than 300 million
claims totaling more than $38 billion annually. It is the largest Medicaid program in the US.
Business Solutions
New York is making more rapid, informed
decisions about programs, policies, and
people across its vast Medicaid system.
• Fraud & abuse
• Tracking bio-terrorism indicators daily by
pharmaceutical purchases with acute
illness data from hospital emergency
rooms
• Determining disease patterns and trends
and the best possible treatment
• Tracking drug pattern usage to prevent
abuse
• Program effectiveness
• Service delivery effectiveness
• Enhanced audit control
• Forecasting the cost and utilization of
expensive prescription drugs
• Identification of overpayments
• Responding quickly to legislative
inquiries
Implementation Summary
• More than five years of History
• 1.3 Billion Claims
• 650 users from 17 counties, a number expected to grow to
thousands
Realizations and ROI
• First year in operation paid for entire implementation of
the DW!
• Better analysis of integrated data resulted in recoveries
in the millions!
• $16m - Coordination of Benefits, $5m - duplicate
payments, $1 million - overpayments
• $187 million saved due to better policy decisions based
on medical and pharmaceutical analysis
• Millions saved due to efficiency of analysis such as Audit
process reduced to 2 hours from 8 weeks
• 2004 NASCIO Award – Best Information Architecture
Category
Iowa Department of Revenue
Tax Compliance
• Have more accurate leads because of better information
• Experienced substantial savings; staff can
> Analyze greater volumes of data
> Manage a greater number of cases
> Exercise a higher level of control over taxpaying behavior
> Before the EDW, this additional work would have called
for a 20-25% increase of the audit staff
• Generated $69.7M in incremental collections and refund
reductions in 2003
> $30.6M through office examinations
> $17.4M in refund reductions
> $ 9.1M from tax gap revenues
> $ 7.5M in out-of-state audits of multi-state businesses
> $ 5.1M from in-state field audits
Business Benefits
The Teradata Mission
Teradata Active Data Warehousing
Strategic, tactical and event-driven decision making in a single, centralized, mission-critical, up-to-date version of the enterprise data
[Diagram: Sources feed the Active Data Warehouse, which serves tactical and strategic Users]
“Any Question, By Any User, At Any Time”
All Decision Making…from One Copy of the Data.
The Industry Leader in Data Warehousing
john.tulley@ncr.com