Practical Distributed Database Case Study: Raytheon

Handling Distributed Data
Case Study: Raytheon
Ross Scott
CS561
04/01/04
Agenda
• Description of the problem space
• Attempt 1: Distributed Database
• Attempt 2: Data Warehouse
• Attempt 3: Federation
• Q&A
Problem Space
[Figure: Program Management, Engineering, Supply Chain, Finance, Manufacturing]
Several functional disciplines that need to communicate with one another.
Problem Space: Engineering
[Figure: engineering systems: SFDM, ScanCenter, PRO/I, RDS, WebCitis, WSTR, RCT, Rasters, Optegra, ECMS, RVT, Mentor, ACMS, Mears, PRACS, PEPS, AIMS, HAWK, EDIS, PATRIOT, Portsmouth, Vendor/SCD CR DB, TENS, H/W Standardization, Classification DB, File, TeamPort, POWER (SAP), WebView, Procurement D, Sherpa Portsmouth, BEIMS, Sherpa San Diego, ClearCase, Shop Order, Patriot Obsolescence DB, WINS, AutoCAD, Doors]
Homogeneous Data Stored in Heterogeneous Systems
Problem Space
[Figure: the systems from the previous slide grouped by discipline: Program Management, Engineering, Supply Chain, Finance, Manufacturing]
Heterogeneous Data Stored in Heterogeneous Systems
National Problem Space
[Figure: several sites, each with its own Program Management, Engineering, Supply Chain, Finance, and Manufacturing functions]
Homogeneous Data Stored in Heterogeneous Systems
How Do We Handle This?
[Figure: two options: Monolithic Server vs. Distributed Database]
Monolithic Server
• What is a monolithic server?
- One single server for each primary business discipline
• Pros
- Simplified Domain Space
- Common Data Model / Common Process
- Lower system maintenance costs
• Cons
- Cost of migration
- Loss of local control
- Slow to make changes to system
Creating a monolithic server was cost-prohibitive.
Distributed Databases
• What is a distributed database?
- A collection of several different databases that looks
like a single database to the user.
• Pros
- Leverage existing investment
- Local control, enterprise visibility
• Cons
- Complex to do data mapping
- Potential network latency issues
What is the difference between a distributed database and replication?
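A minimal sketch of the definition above, assuming two hypothetical site databases (named `sudbury` and `tewksbury` here for illustration) simulated with in-memory SQLite. A facade sends the same query to every site and merges the rows, so the caller sees one database while each site keeps its own data. Nothing is copied between sites, which is the contrast with replication, where the data itself is duplicated.

```python
import sqlite3


class DistributedDatabase:
    """Presents several independent databases to the caller as one.

    Each site keeps its own data (local control); queries are fanned out
    to every site at run time and the rows are merged (no copies made).
    """

    def __init__(self, sites: dict[str, sqlite3.Connection]) -> None:
        self.sites = sites

    def query(self, sql: str, params: tuple = ()) -> list[tuple]:
        rows: list[tuple] = []
        for name, conn in self.sites.items():
            # Tag each row with its site so results stay traceable.
            for row in conn.execute(sql, params):
                rows.append((name, *row))
        return rows


def make_site(parts: list[tuple[str, str]]) -> sqlite3.Connection:
    """Create a hypothetical site database of (part_number, description) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (part_number TEXT, description TEXT)")
    conn.executemany("INSERT INTO parts VALUES (?, ?)", parts)
    return conn


ddb = DistributedDatabase({
    "sudbury": make_site([("P-100", "Bracket")]),
    "tewksbury": make_site([("P-200", "Housing"), ("P-100", "Bracket, rev B")]),
})
print(ddb.query("SELECT part_number, description FROM parts WHERE part_number = ?", ("P-100",)))
# [('sudbury', 'P-100', 'Bracket'), ('tewksbury', 'P-100', 'Bracket, rev B')]
```

Every query pays one trip per site; that cost is what shows up as network latency once the sites are far apart.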
System Requirements
• Real-time
– Manufacturing
– Work in Process
• Prototype Engineering Release Data
Keep it simple. Prove the technology.
Distributed Vaults
• Start simple by combining homogeneous data in disparate systems
[Figure: Optegra vaults at Sudbury and Tewksbury]
• Pros
- Distributed vault successful
- Each system has visibility into the other
- Both systems look like one
• Cons
- Network latency slows entire system performance
- Slowdown makes system unusable
Network latency creates too much downtime.
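Why latency made the vault unusable can be sketched with a simple model, assuming (hypothetically) that expanding an assembly needs one synchronous round trip per part to the remote vault. Total time then grows as parts × round-trip time. Below, `time.sleep` stands in for the network, and the 20-part assembly and 1 ms / 50 ms round trips are illustrative values, not measured Raytheon figures.

```python
import time


def fetch_remote_part(part_id: int, rtt_s: float) -> dict:
    """Hypothetical remote vault lookup; time.sleep stands in for one round trip."""
    time.sleep(rtt_s)
    return {"id": part_id}


def expand_structure(part_ids: list[int], rtt_s: float) -> tuple[list[dict], float]:
    """Fetch every part in an assembly one at a time, as a synchronous client would."""
    start = time.perf_counter()
    parts = [fetch_remote_part(p, rtt_s) for p in part_ids]
    return parts, time.perf_counter() - start


assembly = list(range(20))
_, lan = expand_structure(assembly, rtt_s=0.001)  # ~1 ms round trip inside one site
_, wan = expand_structure(assembly, rtt_s=0.05)   # ~50 ms round trip between sites
print(f"LAN: {lan:.2f} s, WAN: {wan:.2f} s")      # WAN takes at least 20 x 0.05 s = 1 s
```

The same code that feels instant on a LAN slows by the ratio of the round-trip times once the vaults are at different sites.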
Now What Do We Do?
Data Warehouse
Data Warehouse
• What is a data warehouse?
– “A data warehouse is a structured extensible data
environment designed for the analysis of non-volatile data,
logically and physically transformed from multiple source
applications to align with business structure, updated and
maintained for a long time period, expressed in simple
business terms, and summarized for quick analysis.”
Data Warehouse
[Figure: source systems feeding the data warehouse: Intralink, file system, Optegra, CIMMS, Sherpa EDMS, SherpaWorks, Metaphase, RDM]
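The definition's "logically and physically transformed from multiple source applications" step can be sketched as a field mapping into one common schema. This is a minimal sketch: the `WarehouseDrawing` schema and the per-source field names (`DWG_NO`, `ItemId`, and so on) are hypothetical; the slides do not show the real extract formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarehouseDrawing:
    """One row in a hypothetical common warehouse schema for drawings."""
    site_id: int
    source: str
    drawing_number: str
    revision: str
    title: str


# Hypothetical field names per source system; real extract formats will differ.
FIELD_MAPS = {
    "Optegra": {"drawing_number": "DWG_NO", "revision": "REV", "title": "DESCR"},
    "Metaphase": {"drawing_number": "ItemId", "revision": "Rev", "title": "Name"},
}


def transform(source: str, site_id: int, record: dict[str, str]) -> WarehouseDrawing:
    """Rename source-specific fields and normalize values into the common schema."""
    fields = FIELD_MAPS[source]
    return WarehouseDrawing(
        site_id=site_id,
        source=source,
        drawing_number=record[fields["drawing_number"]].strip().upper(),
        revision=record[fields["revision"]].strip().upper(),
        title=record[fields["title"]].strip(),
    )


print(transform("Optegra", 20, {"DWG_NO": " d-1234 ", "REV": "a", "DESCR": "Antenna bracket"}))
# WarehouseDrawing(site_id=20, source='Optegra', drawing_number='D-1234', revision='A', title='Antenna bracket')
```

Adding a new source system means adding one entry to the mapping rather than changing the warehouse schema.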
Data Warehouse (contd.)
[Figure: sites feeding the data warehouse: Florida (Site ID 87), Texas (Site ID 50), Arizona (Site ID 30), California (Site ID 40), Northeast (Site ID 20)]
Data Warehouse
[Figure: warehouse web interface screens: Initial Search, Detailed Page, View Drawing, Obtain an Account]
Data Warehouse
• Pros
- Leverages existing investment
- Inexpensive to stand up
- 6 months to build
- 8 hours to bring on a new site
- Network latency not as critical
• Cons
- Cannot support real-time deletes and updates
31 million unique rows loaded nightly!
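Why the warehouse cannot support real-time deletes and updates can be shown with a nightly load sketch. The slides do not say how the load works, so this assumes a hypothetical full refresh per site: tonight's extract replaces that site's rows, keyed by the Site IDs shown earlier. A row deleted in a source system stays visible in the warehouse until the next load.

```python
import sqlite3


def nightly_load(wh: sqlite3.Connection, site_id: int, rows: list[tuple[str, str]]) -> int:
    """Replace one site's snapshot with tonight's extract (full refresh).

    The delete and the inserts commit together, or roll back together on error.
    """
    with wh:
        wh.execute("DELETE FROM drawings WHERE site_id = ?", (site_id,))
        wh.executemany(
            "INSERT INTO drawings (site_id, drawing_number, title) VALUES (?, ?, ?)",
            [(site_id, number, title) for number, title in rows],
        )
    return len(rows)


def site_row_count(wh: sqlite3.Connection, site_id: int) -> int:
    return wh.execute("SELECT COUNT(*) FROM drawings WHERE site_id = ?", (site_id,)).fetchone()[0]


wh = sqlite3.connect(":memory:")
wh.execute(
    "CREATE TABLE drawings (site_id INTEGER, drawing_number TEXT, title TEXT, "
    "PRIMARY KEY (site_id, drawing_number))"
)
nightly_load(wh, 87, [("D-1", "Bracket"), ("D-2", "Housing")])  # Florida
nightly_load(wh, 50, [("T-9", "Radome")])                        # Texas

# During the day, D-2 is deleted in the Florida source system.
# The warehouse keeps showing it until the next nightly load.
print(site_row_count(wh, 87))  # 2
nightly_load(wh, 87, [("D-1", "Bracket")])
print(site_row_count(wh, 87))  # 1
print(site_row_count(wh, 50))  # 1 (other sites are untouched)
```

Loading per site is also what makes bringing on a new site cheap: it is one more extract feeding the same table.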
Federation
• What is Federation?
– The linking together of information management
systems for the purpose of distributed, collaborative,
product development.
– The creation and use of a heterogeneous network of
data and processes where portions of the network are
managed by different systems, yet the user sees a
unified whole
• What is a distributed database?
– A collection of several different databases that looks
like a single database to the user.
Federation: Data Network
• What is the definition of a Data Network?
• A network consisting of associations between data in
distributed systems
– Associations can be built using three techniques:
• Links
• Proxies
• Replicas
Federation by Link
[Figure: Corporate Systems and a Mfg Center separated by firewalls. A client requests a sourcing report or inventory; the Windchill Server (Master, DB A, File Vault) holds Part A, which links to Cost A in the ERP Server's DB, and the client is redirected to the ERP Server. Legend: Link, Proxy, Replica; Run time, Generated.]
Federation by Link
• Best use of links
– When the remote system has a Web-based UI that supports URL
references to its objects, and users are comfortable
switching between applications
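A minimal sketch of federation by link, assuming the remote ERP system exposes objects at URLs of the form `https://erp.example.com/cost?id=...` (a hypothetical URL scheme). The local part stores only the link; when a client asks for cost data, it is redirected to the remote system's own UI, and no cost data is copied locally.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class Link:
    """A reference to an object that lives in another system's web UI."""
    system: str
    base_url: str
    object_id: str

    def url(self) -> str:
        return f"{self.base_url}?{urlencode({'id': self.object_id})}"


@dataclass
class Part:
    number: str
    links: dict[str, Link]


def handle_cost_request(part: Part) -> tuple[int, dict[str, str]]:
    """Answer a client's cost request with an HTTP 302 redirect to the ERP UI."""
    link = part.links["cost"]
    return 302, {"Location": link.url()}


part_a = Part("A", {"cost": Link("ERP", "https://erp.example.com/cost", "A")})
print(handle_cost_request(part_a))
# (302, {'Location': 'https://erp.example.com/cost?id=A'})
```

This is why links suit systems with a Web UI: the redirect lands the user in the other application, which is the switch users have to be comfortable with.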
Federation by Proxy
[Figure: At the Design Center, a Structure Browser client (Design Collaborator) sends a request to the Windchill Server (Master, DB A, Part A), which holds Proxy B for Part B. At run time the server selects the part where Name = B from the remote PDM System (DB B, File Vault) across a firewall, and the returned Part B attributes are displayed. Legend: Link, Proxy, Replica; Run time, Generated.]
Federation by Proxy
• Best use of proxies
– For consistent user interface to data residing in multiple
systems
– To ensure that you are looking at the latest data
– When data is loosely coupled, infrequently accessed
– When you need to modify remote data without conflicts
– For relating dissimilar data types residing in external systems
– For composing data from multiple external systems (composite
types)
– When data is frequently accessed in a read-only mode
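A minimal sketch of federation by proxy, assuming a hypothetical remote PDM system simulated with in-memory SQLite. The local proxy stores only the part's name; attributes are fetched with a `WHERE name = ?` query each time they are read, mirroring the run-time lookup in the figure, so the caller always sees the current remote values.

```python
import sqlite3


class PartProxy:
    """Local stand-in for a part that lives in a remote PDM system.

    Only the name is stored locally; attributes are fetched from the
    remote system on every read, so they are never stale.
    """

    def __init__(self, name: str, remote: sqlite3.Connection) -> None:
        self.name = name
        self._remote = remote

    def attributes(self) -> dict[str, str]:
        row = self._remote.execute(
            "SELECT name, revision, material FROM parts WHERE name = ?", (self.name,)
        ).fetchone()
        if row is None:
            raise LookupError(f"part {self.name!r} not found in remote system")
        return dict(zip(("name", "revision", "material"), row))


# A hypothetical remote PDM system, simulated with in-memory SQLite.
pdm = sqlite3.connect(":memory:")
pdm.execute("CREATE TABLE parts (name TEXT PRIMARY KEY, revision TEXT, material TEXT)")
pdm.execute("INSERT INTO parts VALUES ('B', 'A', 'Aluminum')")

proxy_b = PartProxy("B", pdm)
print(proxy_b.attributes())  # {'name': 'B', 'revision': 'A', 'material': 'Aluminum'}
pdm.execute("UPDATE parts SET revision = 'B' WHERE name = 'B'")
print(proxy_b.attributes()["revision"])  # B (the proxy sees the remote change immediately)
```

The trade-off is that every read costs a remote round trip, which is why proxies fit loosely coupled, infrequently accessed data.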
Federation by Replica
[Figure: Structure Browser clients at a Design Center and at a Supplier each request a part and display it or stream its content file from their local Windchill Server. Part B and Doc C, including Doc C's content file, are replicated between the two servers' databases and file vaults across firewalls; the servers also hold a link to Part B and a proxy for Doc C. Legend: Link, Proxy, Replica; Run time, Generated.]
Federation by Replica
• Best use of replication
– For consistent user interface to data residing in multiple
systems
– To maximize end-user system performance
– To overcome availability and security restrictions
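A minimal sketch of federation by replica, assuming a hypothetical version-numbered source system. The replica serves every read from its local copy, which keeps reads fast and available even if the source is unreachable, and pulls newer versions only when `sync` runs. The cost is that a read can be stale between syncs.

```python
from dataclasses import dataclass, field


@dataclass
class SourceSystem:
    """Hypothetical owning system: holds the master copy of each object and its version."""
    objects: dict[str, tuple[int, bytes]] = field(default_factory=dict)

    def publish(self, key: str, content: bytes) -> None:
        version = self.objects.get(key, (0, b""))[0] + 1
        self.objects[key] = (version, content)


@dataclass
class Replica:
    """Local read-only copies; reads never contact the source system."""
    source: SourceSystem
    copies: dict[str, tuple[int, bytes]] = field(default_factory=dict)

    def read(self, key: str) -> bytes:
        return self.copies[key][1]

    def sync(self) -> list[str]:
        """Pull every object whose master version is newer than the local copy."""
        updated = []
        for key, (version, content) in self.source.objects.items():
            if self.copies.get(key, (0, b""))[0] < version:
                self.copies[key] = (version, content)
                updated.append(key)
        return updated


master = SourceSystem()
master.publish("Doc C", b"rev 1")
replica = Replica(master)
print(replica.sync())         # ['Doc C']
master.publish("Doc C", b"rev 2")
print(replica.read("Doc C"))  # b'rev 1' (stale until the next sync)
replica.sync()
print(replica.read("Doc C"))  # b'rev 2'
```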
Federation: Process Networks
– Federated data is of limited use unless the business
processes that involve the federated systems can
also interact
– Process networks should support:
• Workflow interaction
– Activity in a local workflow initiates a workflow in a remote system
– Remote approval of activities
• Change process interaction
– Local change order initiates change activity in remote system
• Federated task lists containing items from multiple
systems
• Event-based interaction
– Each system both publishes events and responds to events in other
systems (JMS, MQSeries, etc.)
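A minimal sketch of event-based interaction, assuming an in-process `EventBus` as a stand-in for a real broker; it is not the JMS or MQSeries API, and the topic name `change_order.created` is hypothetical. A local change order is published once, and each subscribed system opens its own change activity in response.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """In-process stand-in for a message broker (topic-based publish/subscribe)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Deliver the event to every handler registered for this topic.
        for handler in self._subscribers[topic]:
            handler(event)


class RemoteSystem:
    """A federated system that opens its own change activity when notified."""

    def __init__(self, name: str, bus: EventBus) -> None:
        self.name = name
        self.change_activities: list[str] = []
        bus.subscribe("change_order.created", self.on_change_order)

    def on_change_order(self, event: dict) -> None:
        self.change_activities.append(f"{self.name}: implement {event['id']} for part {event['part']}")


bus = EventBus()
erp = RemoteSystem("ERP", bus)
mfg = RemoteSystem("Manufacturing", bus)
bus.publish("change_order.created", {"id": "CO-42", "part": "A"})  # local change order
print(erp.change_activities)  # ['ERP: implement CO-42 for part A']
```

The publisher does not need to know which systems react, so a new federated system can join by subscribing.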
Questions?
How would a middleware vendor define Federation?
How would an end solution provider define Federation?
Any Questions?
Bibliography
• Distributed Databases
- Ann Tai and John Meyer, Performability Management in Distributed Database Systems: An Adaptive Concurrency Control Protocol, 4th IEEE International Workshop: Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, 1996
- Ramesh Gupta, Jayant Haritsa, Krithi Ramamritham, S. Seshadri, Commit Processing in Distributed Real-Time Database Systems, 17th IEEE Real-Time Systems Symposium, 1996
- Maitrayi Sabaratnam, Oystein Torbjornsen, Svein-Olaf Hvasshovd, Cost of Ensuring Safety in Distributed Database Management System, Pacific Rim International Symposium on Dependable Computing, 1999
- Carlos Perez Leguizamo, Shohei Kato, Kinji Mori, Autonomous Consistency Coordination Technique among Distributed Database Systems for Achieving High Reliability, Proceedings of the IEEE Seventh International Symposium on Computers and Communication, 2002
• Data Warehouses
- Eva Kühn, The Zero-Delay Data Warehouse: Mobilizing Heterogeneous Databases, VLDB, 2003
• Federation
- David Geer, Federated Approach Expands Database Access Technology, Computer, May 2003
• Network Performance
- Sujata Banerjee, Panos K. Chrysanthis, Network Latency Optimizations in Distributed Database Systems, ICDE 1998, pp. 532-540
- Victor C.S. Lee, Kam-yiu Lam, Sheung-lun Hung, C. M. Wong, Performance Studies of Concurrency Control in Distributed Real-time Database Systems on ATM Networks, Proceedings of the IEEE 7th Euromicro Workshop on Real-Time Systems, 1995