Handling Distributed Data
Case Study: Raytheon
Ross Scott
CS561
04/01/04

Agenda
• Description of the problem space
• Attempt 1: Distributed Database
• Attempt 2: Data Warehouse
• Attempt 3: Federation
• Q&A

Problem Space
[Figure: five functional disciplines: Program Management, Engineering, Supply Chain, Finance, Manufacturing]
Several functional disciplines need to communicate with one another.

Problem Space: Engineering
[Figure: engineering systems, including SFDM, ScanCenter, PRO/I, RDS, WebCitis, WSTR, RCT Rasters, Optegra, ECMS, RVT, Mentor, ACMS, Mears, PRACS, PEPS, AIMS, HAWK, EDIS, PATRIOT, Portsmouth, Vendor/SCD CR DB, TENS, Standardization File, H/W Classification DB, TeamPort, POWER (SAP), WebView, Procurement D, Sherpa Portsmouth, BEIMS, Sherpa SanDiego, ClearCase, Shop Order, Patriot Obsolescence DB, WINS, AutoCAD, Doors]
Homogeneous Data Stored in Heterogeneous Systems

Problem Space
[Figure: the same systems mapped across the functional disciplines: Program Management, Supply Chain, Finance, Manufacturing]
Heterogeneous Data Stored in Heterogeneous Systems

National Problem Space
[Figure: the five-discipline cluster (Program Management, Engineering, Supply Chain, Finance, Manufacturing) repeated across multiple sites]
Homogeneous Data Stored in Heterogeneous Systems

How Do We Handle This?
• Monolithic Server
• Distributed Database

Monolithic Server
• What is a monolithic server?
  – A single server for each primary business discipline
• Pros
  – Simplified domain space
  – Common data model / common process
  – Lower system maintenance costs
• Cons
  – Cost of migration
  – Loss of local control
  – Slow to make changes to the system
[Figure: Master Server]
The cost was too prohibitive to create a monolithic server.

Distributed Databases
• What is a distributed database?
  – A collection of several different databases that looks like a single database to the user.
• Pros
  – Leverages existing investment
  – Local control, enterprise visibility
• Cons
  – Data mapping is complex
  – Potential network latency issues
What is the difference between a distributed database and replication?

System Requirements
• Real-time
  – Manufacturing
  – Work in Process
• Prototype
  – Engineering Release Data
Keep it simple. Prove the technology.

Distributed Vaults
• Start simple by combining homogeneous data in disparate systems
[Figure: Optegra at Sudbury and Optegra at Tewksbury joined as one distributed vault]
• Pros
  – Distributed vault was successful
  – Each system has visibility into the other system
  – Both systems look like one
• Cons
  – Network latency slows entire system performance
  – Slowdown makes the system unusable
Network latency creates too much downtime.

Now What Do We Do?
• Data Warehouse

Data Warehouse
• What is a data warehouse?
  – “A data warehouse is a structured extensible data environment designed for the analysis of non-volatile data, logically and physically transformed from multiple source applications to align with business structure, updated and maintained for a long time period, expressed in simple business terms, and summarized for quick analysis.”

Data Warehouse: Source Systems
[Figure: source systems feeding the data warehouse: Intralink, FILESYSTEM, Optegra, CIMMS, Sherpa, EDMS, Sherpa Works, Metaphase, RDM]
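The distributed database and distributed vault slides describe several databases that look like one to the user, with network latency as the main drawback. The Python sketch below illustrates only the read side of that idea, under simplifying assumptions: two in-memory SQLite databases stand in for the Sudbury and Tewksbury vaults, the `parts` table and its rows are hypothetical, and latency is a fixed, made-up round-trip cost per site rather than a measurement. A real distributed DBMS would also handle query planning, distributed transactions, and failures, none of which is shown.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    conn: sqlite3.Connection
    round_trip_ms: float  # hypothetical, fixed latency to reach this site


def make_site(name, rows, round_trip_ms):
    """Create an in-memory database standing in for one site's vault."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (number TEXT, rev TEXT)")
    conn.executemany("INSERT INTO parts VALUES (?, ?)", rows)
    return Site(name, conn, round_trip_ms)


def query_all(sites, sql, params=()):
    """Run the same query at every site and merge the rows into one result.

    Also returns the simulated latency of contacting the sites one after another.
    """
    merged = []
    simulated_ms = 0.0
    for site in sites:
        simulated_ms += site.round_trip_ms
        for row in site.conn.execute(sql, params):
            merged.append((site.name, *row))
    return merged, simulated_ms


sites = [
    make_site("Sudbury", [("P-1", "A"), ("P-2", "B")], round_trip_ms=40.0),
    make_site("Tewksbury", [("P-3", "A")], round_trip_ms=40.0),
]
rows, cost_ms = query_all(
    sites, "SELECT number, rev FROM parts WHERE rev = ? ORDER BY number", ("A",)
)
print(rows)     # [('Sudbury', 'P-1', 'A'), ('Tewksbury', 'P-3', 'A')]
print(cost_ms)  # 80.0
```

Because the sites are contacted one after another, the simulated cost grows with every site added, which mirrors the observation that network latency slowed the entire system.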
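The data warehouse definition stresses data that is transformed from multiple source applications into one business-aligned structure and then kept as non-volatile snapshots. A minimal sketch of that extract-transform-load cycle follows. It assumes two hypothetical exports: the system names Optegra and Metaphase come from the source-systems slide, but their field layouts, rows, and site IDs are invented. It also uses an in-memory SQLite table as the warehouse. The nightly job rebuilds the table from scratch. That full refresh is an assumption, since the deck does not say how its nightly load worked, but it shows why a batch-loaded warehouse is not real-time.

```python
import sqlite3

# Hypothetical nightly exports. The system names appear on the slide;
# the field layouts, document numbers, and titles are invented for illustration.
OPTEGRA_ROWS = [
    {"DOC_ID": "D-100", "REV": "B", "TITLE": "Antenna bracket"},
    {"DOC_ID": "D-101", "REV": "A", "TITLE": "Radome seal"},
]
METAPHASE_ROWS = [
    ("dwg-7", "C", "Power supply chassis"),
]


def from_optegra(row):
    """Map a dict-shaped Optegra record onto the common (number, rev, title) layout."""
    return row["DOC_ID"], row["REV"], row["TITLE"]


def from_metaphase(row):
    """Map a tuple-shaped Metaphase record, normalizing the number to upper case."""
    number, rev, title = row
    return number.upper(), rev, title


def nightly_load(conn, sources):
    """Rebuild the warehouse table from every source (full refresh) and return the row count."""
    conn.execute("DROP TABLE IF EXISTS drawings")
    conn.execute(
        "CREATE TABLE drawings ("
        "site_id INTEGER, source TEXT, number TEXT, rev TEXT, title TEXT, "
        "PRIMARY KEY (site_id, number, rev))"
    )
    loaded = 0
    for site_id, source_name, rows, transform in sources:
        for row in rows:
            number, rev, title = transform(row)
            conn.execute(
                "INSERT INTO drawings VALUES (?, ?, ?, ?, ?)",
                (site_id, source_name, number, rev, title),
            )
            loaded += 1
    conn.commit()
    return loaded


conn = sqlite3.connect(":memory:")
sources = [
    (1, "Optegra", OPTEGRA_ROWS, from_optegra),        # hypothetical site ID 1
    (2, "Metaphase", METAPHASE_ROWS, from_metaphase),  # hypothetical site ID 2
]
loaded = nightly_load(conn, sources)
print(loaded)  # 3
for row in conn.execute("SELECT number, rev, source FROM drawings ORDER BY number"):
    print(row)
```

Running the load a second time replaces the snapshot instead of appending to it, so the row count stays the same.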
Data Warehouse: Participating Sites
[Figure: the data warehouse fed by Florida (Site ID 87), Texas (Site ID 50), Arizona (Site ID 30), California (Site ID 40), and Northeast (Site ID 20)]

Data Warehouse: User Interface
[Screenshots: Initial Search, Detailed Page, View Drawing, Obtain an Account]

Data Warehouse
• Pros
  – Leverages existing investment
  – Inexpensive to stand up
  – 6 months to build
  – 8 hours to bring on a new site
  – Network latency not as critical
• Cons
  – Cannot support real-time deletes and updates
31 million unique rows loaded nightly!

Federation
• What is Federation?
  – The linking together of information management systems for the purpose of distributed, collaborative product development.
  – The creation and use of a heterogeneous network of data and processes where portions of the network are managed by different systems, yet the user sees a unified whole.
• What is a distributed database?
  – A collection of several different databases that looks like a single database to the user.

Federation: Data Network
• What is the definition of a Data Network?
• A network consisting of associations between data in distributed systems
  – Associations can be built using three techniques:
    • Links
    • Proxies
    • Replicas

Federation by Link
[Diagram: Corporate Systems (Windchill server, master file vault, DB A holding Part A) and Mfg Center (ERP server, Cost A DB) separated by firewalls; Part A carries a link to Cost A, so a client request is redirected to the ERP server. Other labels: Sourcing, Report, Inventory. Legend: Link, Proxy, Replica, Run time Generated]

Federation by Link
• Best use of links
  – When the remote system has a Web-based UI that supports URL references to its objects, and users are comfortable switching between applications

Federation by Proxy
[Diagram: at the Design Center, a Design Collaborator structure browser asks the Windchill server to display attributes; Part A (DB A) holds a proxy for Part B, so at run time the server queries the remote PDM system across the firewall (where Name = B) and returns Part B's attributes. Legend: Link, Proxy, Replica, Run time Generated]

Federation by Proxy
• Best use of proxies
  – For a consistent user interface to data residing in multiple systems
  – To be assured that you are looking at the latest data
  – When data is loosely coupled and infrequently accessed
  – When you need to modify remote data without conflicts
  – For relating dissimilar data types residing in external systems
  – For composing data from multiple external systems (composite types)
  – When data is frequently accessed in a read-only mode

Federation by Replica
[Diagram: a Design Center and a Supplier, each running a Windchill server with its own file vault and structure browser; Part B and Doc C, including Doc C's content file, are replicated from the master server to the other site, so each client displays the part or streams the content file from its local server. Legend: Link, Proxy, Replica, Run time Generated]

Federation by Replica
• Best use of replication
  – For a consistent user interface to data residing in multiple systems
  – To maximize end-user system performance
  – To overcome availability and security restrictions

Federation: Process Networks
– Federated data is of limited use unless the business processes involving the federated systems can also interact.
– Process networks should support:
  • Workflow interaction
    – An activity in a local workflow initiates a workflow in a remote system
    – Remote approval of activities
  • Change process interaction
    – A local change order initiates change activity in a remote system
  • Federated task lists containing items from multiple systems
  • Event-based interaction
    – Each system both publishes events and responds to events in other systems (JMS, MQSeries, etc.)

Questions?
• How would a middleware vendor define Federation?
• How would an end solution provider define Federation?
Any Questions?
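The Data Network slide names three ways to associate data across systems: links, proxies, and replicas. The following Python sketch contrasts them on a single hypothetical remote object, Part B. Everything in it is an assumption for illustration: the `RemoteSystem`, `Link`, `Proxy`, and `Replica` classes, the example.com URL, and the in-memory `fetch` that stands in for a call across the firewall. It is not Windchill's API or how any PDM product actually implements federation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RemoteSystem:
    """A hypothetical remote PDM/ERP system: objects keyed by ID, reachable at base_url."""
    base_url: str
    objects: Dict[str, dict]

    def fetch(self, obj_id: str) -> dict:
        # Stand-in for a network call across the firewall; returns a copy.
        return dict(self.objects[obj_id])


@dataclass
class Link:
    """Keeps only a reference; the user is sent to the remote system's own UI."""
    remote: RemoteSystem
    obj_id: str

    def resolve(self) -> str:
        return f"{self.remote.base_url}/{self.obj_id}"


@dataclass
class Proxy:
    """Keeps only the remote ID; attributes are fetched live on every access."""
    remote: RemoteSystem
    obj_id: str

    def attributes(self) -> dict:
        return self.remote.fetch(self.obj_id)


@dataclass
class Replica:
    """Keeps a local copy; reads need no remote call but go stale until refreshed."""
    remote: RemoteSystem
    obj_id: str
    local_copy: dict = field(default_factory=dict)

    def refresh(self) -> None:
        self.local_copy = self.remote.fetch(self.obj_id)

    def attributes(self) -> dict:
        return self.local_copy


pdm = RemoteSystem("https://pdm.example.com/parts", {"B": {"name": "B", "rev": "A"}})
link, proxy, replica = Link(pdm, "B"), Proxy(pdm, "B"), Replica(pdm, "B")
replica.refresh()

pdm.objects["B"]["rev"] = "B"  # the remote system revises Part B

print(link.resolve())               # https://pdm.example.com/parts/B
print(proxy.attributes()["rev"])    # B (live read)
print(replica.attributes()["rev"])  # A (stale until the next refresh)
```

The trade-off lines up with the best-use slides: a proxy always shows current data at the cost of a remote call per read, while a replica serves reads locally and must be refreshed to pick up remote changes.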
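The Process Networks slide calls for event-based interaction in which each system both publishes and responds to events, with JMS or MQSeries as the transport. As a sketch of the pattern only, the Python below replaces the message broker with a hypothetical in-process `EventBus`; the topic name `change_order.created` and the `FederatedSystem` class are invented. It shows a local change order triggering change activity in a remote system, plus a federated task list assembled from both systems.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """In-process stand-in for a message broker such as JMS or MQSeries (illustration only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        # Deliver synchronously to every handler registered for this topic.
        for handler in list(self._subscribers[topic]):
            handler(payload)


class FederatedSystem:
    """A hypothetical system that both publishes events and reacts to other systems' events."""

    def __init__(self, name: str, bus: EventBus) -> None:
        self.name = name
        self.bus = bus
        self.tasks: List[str] = []

    def raise_change_order(self, part: str) -> None:
        # Local change order: record it, then announce it to the other systems.
        self.tasks.append(f"change order for {part}")
        self.bus.publish("change_order.created", {"origin": self.name, "part": part})

    def on_change_order(self, event: dict) -> None:
        # Remote systems start their own change activity; a system ignores its own events.
        if event["origin"] != self.name:
            self.tasks.append(f"change activity for {event['part']} (from {event['origin']})")


bus = EventBus()
pdm = FederatedSystem("PDM", bus)
erp = FederatedSystem("ERP", bus)
for system in (pdm, erp):
    bus.subscribe("change_order.created", system.on_change_order)

pdm.raise_change_order("Part A")

# Federated task list: items gathered from every participating system.
task_list = [f"{s.name}: {t}" for s in (pdm, erp) for t in s.tasks]
print(task_list)
# ['PDM: change order for Part A', 'ERP: change activity for Part A (from PDM)']
```

A real broker would add durable queues, delivery guarantees, and network transport; this in-process bus delivers synchronously and only to handlers registered in the same program.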