Grid Federated Data Management Projects
Data Replication
OGSA Early Adopters Workshop
30th May 2002
Steve Glanville (deglanvs@us.ibm.com)
Inderpal Narang (narang@almaden.ibm.com)
Vijayshankar Raman (shankar@almaden.ibm.com)
Steven Beckhardt (consultant)
Replication Mission
• To optimize data location so as to deliver the best possible Quality of Service
• To achieve this through the extension of standard Grid services
• To render enhanced data replication services through OGSA, making them available through compliant Grid toolkits
• To further enhance this model to cope with heterogeneous platforms and data types
Overall Project
• To develop Grid-Federated Data Management (GFDM) Services
• GFDM must handle multiple, heterogeneous data sources
• GFDM services are rendered through OGSA
• GFDM services may be a consumer of, and/or a provider for, other OGSA-rendered services
Data Replication Value Proposition
• Scalability
• Availability
• Consistency (QoS)
• Disaster Recovery
• Utility Model for DB eUtilities, e.g. Analytics eUtilities
• Collaboration

Dynamic Data Distribution is key in Grid computing; Data Replication services would be a core service to facilitate this. Data Replication includes databases, files, binaries, …
Data Replication Business Scenarios
The nomenclature used here describes a replication 'Service' as having multiple 'Subscriptions', enabling the following scenarios (a subscription sketch follows the list):

1. Replication of data-sets from place to place, selectively or in their entirety, to optimize for the collaborative resource model in, say, life-sciences research.
2. Aggregation of multiple data sources to a single target, such as might occur to populate a data warehouse. This can be used to consolidate information for reporting purposes. A common QoS is assumed.
3. Replication of a single source to multiple targets, e.g. to make multiple local copies of centralized data.
4. As 2 and 3, with transformations for different targets, to support Unit-of-Measure (UoM) or language localizations. A head office may need to convert to a single currency and UoM for consolidation purposes; equally, it may need to budget 'out' from HQ currency to local-currency systems at remote locations.
5. As 3, with transformations for different QoS, to support the needs of different target environments. Business functions such as sales analysis may be quite happy with a daily copy of sales data taken at the close of business each day; however, Disaster Recovery (DR) support might need near real-time replication to minimize data loss in the event of a catastrophic server or storage failure.
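As an illustration only, a subscription record for some of these scenarios might carry the fields sketched below in Python. The field names (sources, targets, transform, qos) and URI forms are assumptions for illustration, not a committed GFDM schema.

```python
# Hypothetical subscription descriptors for scenarios 2, 4 and 5 above.
# All field names and URI forms are illustrative, not a committed schema.

warehouse_load = {                       # scenario 2: many sources -> one target
    "sources": ["db://emea/sales", "db://apac/sales"],
    "targets": ["db://hq/warehouse"],
    "transform": None,
    "qos": {"latency": "daily"},         # a common QoS is assumed
}

localized_fanout = {                     # scenario 4: fan-out with per-target
    "sources": ["db://hq/budget"],       # currency / UoM transformations
    "targets": ["db://emea/budget", "db://apac/budget"],
    "transform": "hq_currency_to_local",
    "qos": {"latency": "daily"},
}

dr_copy = {                              # scenario 5: same data, tighter QoS
    "sources": ["db://hq/sales"],
    "targets": ["db://dr-site/sales"],
    "transform": None,
    "qos": {"latency": "near-realtime"}, # DR needs near real-time replication
}
```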
Data Replication: System Abstraction
[Diagram: deltas (∆) flow from the Source through the Capture Service to the Apply Service and on to the Target; the Replication Control Service (SHEPERD) carries the control flow.]

• Replication Control Service: SHEPERD (ScHEduler / PatrollEr / Restart Director)
• Service interface for:
  – Subscriptions (static and dynamic)
  – QoS (latency, security, …)
  – Notifications (subscriber, metadata, …)
  – Billing, auditing, …
• Capture program: the data producer
• Apply program: the data consumer
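As a sketch of how such a control interface might be rendered, here is a minimal Python abstraction; the method names are assumptions made for illustration, not the OGSA portType actually proposed.

```python
from abc import ABC, abstractmethod

class ReplicationControlService(ABC):
    """Minimal sketch of a SHEPERD-style control interface (names assumed)."""

    @abstractmethod
    def add_subscription(self, source, target, qos): ...   # static or dynamic

    @abstractmethod
    def notify(self, event): ...            # subscriber / metadata notifications

    @abstractmethod
    def heartbeat(self, service_id, progress): ...  # from Capture / Apply services

    @abstractmethod
    def restart(self, service_id, checkpoint): ...  # the Restart Director role

    @abstractmethod
    def record_usage(self, service_id, units): ...  # billing / auditing hook
```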
Replication Subscription Catalog (OGSA)
Replication Schema Objects
• Subscription pairing (source–target)
• Requestor privilege determination
• Object type (table, index, xls, txt, etc.)
• Object location
• Object security (encryption, keys, etc.)
• (Replica catalog, audit, billing)
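For concreteness, a catalog entry could be modelled as the following Python dataclass; every field name here is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CatalogEntry:
    """Hypothetical replication-catalog record (field names illustrative)."""
    subscription_id: str
    source: str                 # subscription pairing: source...
    target: str                 # ...and target
    requestor: str              # input to privilege determination
    object_type: str            # "table", "index", "xls", "txt", ...
    object_location: str
    encrypted: bool = False     # object security
    key_id: Optional[str] = None
```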
Replication Subscription Schema (Policy, Life-cycle)
Replication Schema Objects
• Subscription policy (QoS, etc.)
• Location resource quotas
• Transformation details (UoMs, currency, language)
• Replica trigger conditions (once, or time-, transaction-, or size-based)
• Replica life-cycle (once, repeat, continuous)
• (Checkpoint restart data)
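The trigger conditions and life-cycle lend themselves to a small enumeration plus predicate, sketched here under assumed names and thresholds:

```python
import time
from enum import Enum

class Trigger(Enum):
    ONCE = "once"   # fire a single time
    TIME = "time"   # fire every `interval` seconds
    TX = "tx"       # fire after `threshold` captured transactions
    SIZE = "size"   # fire after `threshold` buffered bytes

def replica_due(trigger, *, last_fired=0.0, interval=86400.0,
                pending_tx=0, pending_bytes=0, threshold=1000):
    """Illustrative predicate: is a replication cycle due for this subscription?"""
    if trigger is Trigger.ONCE:
        return last_fired == 0.0                    # has never fired
    if trigger is Trigger.TIME:
        return time.time() - last_fired >= interval
    if trigger is Trigger.TX:
        return pending_tx >= threshold
    return pending_bytes >= threshold               # Trigger.SIZE
```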
Critical Components to Support Replication
Control-Node
• SHEPERD code drop
• Catalog Schema
• Transaction table for audit / billing support
Data-Node
• Capture-Producer code drop
• Transaction table for latency
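A sketch of what the two transaction tables might hold, using SQLite from Python for concreteness; the column names are assumptions, not the actual GFDM schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")   # stand-in for each node's local store

# Control-node table: one row per applied delta, for audit / billing support.
con.execute("""CREATE TABLE audit_tx (
    subscription_id TEXT, delta_id INTEGER,
    captured_at REAL, applied_at REAL, bytes INTEGER)""")

# Data-node table: capture-side timestamps, so replication latency can be
# measured as applied_at - captured_at per delta.
con.execute("""CREATE TABLE capture_tx (
    subscription_id TEXT, delta_id INTEGER, captured_at REAL)""")

avg_latency = con.execute(
    "SELECT AVG(a.applied_at - c.captured_at) FROM audit_tx a "
    "JOIN capture_tx c USING (subscription_id, delta_id)").fetchone()[0]
```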
Replication Architecture
[Diagram: a Source feeds a Capture Service, which stages deltas (∆) in a disk buffer and streams them to one or more Apply Services, each writing to its Target. SHEPERD holds the subscriptions table (Src, Tgt, QoS, Requestor, …), initiates the Capture and Apply services, monitors QoS, and receives heartbeats. A new subscription causes SHEPERD to initiate a new subscription thread against the Source.]
Normal Replication
Role of SHEPERD (ScHEduler/PatrollEr/Restart Director)
• Manage persistent state of processes
• Initiate / Schedule / Update subscriptions
• Handle crash recovery
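These three roles can be illustrated in a few lines of Python; the 30-second timeout and the data structures are assumptions.

```python
import time

HEARTBEAT_TIMEOUT = 30.0     # seconds; an assumed operational policy

last_seen = {}               # service_id -> (timestamp, last checkpoint)

def on_heartbeat(service_id, checkpoint):
    """Persistent-state role: record liveness and the latest checkpoint."""
    last_seen[service_id] = (time.time(), checkpoint)

def patrol():
    """Patroller role: return services whose heartbeat has lapsed.  The
    Restart Director would re-initiate each from its recorded checkpoint."""
    now = time.time()
    return [sid for sid, (seen, _) in last_seen.items()
            if now - seen > HEARTBEAT_TIMEOUT]
```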
Role of the Capture Service
[Diagram: the Capture Service reads deltas (∆) from the Source, streams them to subscriptions, and sends heartbeats to SHEPERD.]

• Stream the data to be replicated, with checkpoints
• Flow control
  – eject lagging subscriptions
• Heartbeat to SHEPERD
  – operational status
  – progress of ∆ capture (e.g. log-reader position)
    – speeds up crash restart
    – avoids persistent state at the source
• Transformations, filters
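Putting these rules together, a capture loop might look like the following sketch; the MAX_LAG policy, the subscription object's methods, and the heartbeat callback are all assumptions.

```python
MAX_LAG = 1000   # assumed flow-control policy: deltas a subscriber may lag by

def capture_loop(log_reader, subscriptions, send_heartbeat):
    """Illustrative capture service: stream deltas with checkpoints, eject
    lagging subscriptions, and heartbeat progress to SHEPERD so restart
    needs no persistent state at the source."""
    position = 0                                   # log-reader position
    for delta in log_reader:
        position += 1
        for sub in list(subscriptions):
            if position - sub.acked_position > MAX_LAG:
                subscriptions.remove(sub)          # eject lagging subscription
                continue
            sub.send(delta, checkpoint=position)
        send_heartbeat(progress=position)          # e.g. log-reader position
```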
Role of the Apply Service
[Diagram: the Apply Service writes deltas (∆) to the Target, keeps its transaction table on disk, and sends heartbeats to SHEPERD.]

• Pick ∆s up from the source channel and give them to the target
• Maintain a persistent list of applied ∆s
  – detect ∆ drops
  – ensure exactly-once behavior for all 'drops'
• Near real-time capability avoids persistent queues and two-phase commit
• Heartbeat
  – operational status
  – enhanced operational services…
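In sketch form, with a set standing in for the on-disk transaction table and monotonically numbered deltas assumed:

```python
def apply_loop(channel, target, applied_ids, send_heartbeat):
    """Illustrative apply service: the persistent applied-delta list makes
    redelivery idempotent, so no persistent queue or two-phase commit is
    needed, and gaps (dropped deltas) are detectable."""
    expected = max(applied_ids, default=0) + 1
    for delta_id, payload in channel:
        if delta_id in applied_ids:
            continue                                # duplicate: already applied
        if delta_id > expected:
            raise RuntimeError(f"delta drop detected at {expected}")
        target.write(payload)
        applied_ids.add(delta_id)                   # persisted transaction table
        expected = delta_id + 1
        send_heartbeat(progress=delta_id)
```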
Crash Recovery Example
Source
• Heartbeat termination indicates failure to SHEPERD
• SHEPERD tries to re-initiate, supplying the checkpoint
  – If it cannot restart due to node failure, it attempts a restart on another local shared-data node
  – If restart is not possible, it assumes the node is in recovery and periodically checks for a heartbeat
• Reads the disk buffer to establish the restart checkpoint
• Restarts the Capture process
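The source-side sequence might be coded along these lines; node.start_capture and the backup-node list are hypothetical helpers.

```python
def recover_source(node, checkpoint, backup_nodes):
    """Illustrative source-side recovery, triggered by heartbeat loss."""
    try:
        node.start_capture(checkpoint)         # re-initiate with the checkpoint
        return node
    except ConnectionError:
        for peer in backup_nodes:              # another local shared-data node
            try:
                peer.start_capture(checkpoint)
                return peer
            except ConnectionError:
                continue
    # Restart not possible: assume the node is in recovery and leave the
    # patroller to keep checking for its heartbeat.
    return None
```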
Crash Recovery Example
Target
• Heartbeat termination indicates Apply failure to SHEPERD
• SHEPERD tries to re-initiate
  – If it cannot restart due to node failure, it attempts a restart on another local shared-data node
  – If restart is not possible, it assumes the node is in recovery and periodically checks for a heartbeat
• SHEPERD restarts the Apply process
• SHEPERD resets the Capture process to the last consistent checkpoint
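The extra step on the target side is the checkpoint rewind, sketched here with assumed helper names:

```python
def recover_target(shepherd, subscription):
    """Illustrative target-side recovery: restart Apply, then rewind Capture
    so deltas lost in flight are re-sent; the applied-delta list makes the
    resulting redelivery idempotent (see the Apply Service sketch above)."""
    apply_svc = shepherd.restart_apply(subscription)
    checkpoint = apply_svc.last_consistent_checkpoint()
    shepherd.reset_capture(subscription, checkpoint)
```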
Data Grid Engagement Model
Work with a consortium of initial contributors (IBM, Oracle…), UK e-Science, DBTF (DAIS Working Group), etc., and Globus to determine the organizational structure, methods and timescales needed to permit the inclusion and support of non-Globus-developed code in Globus.

It is clear that many organizations, from IBM to Oracle to Globus to CERN, are all thinking along the same lines in terms of increasing the common infrastructure capabilities within Globus to enable the security and privileges associated with data replication. It is time to share our ideas in this area.

Hopes:
• Share initial designs with Globus, UK Grid, and DBTF folks before the next GGF
• Meet at the next GGF, or shortly thereafter, to agree processes for code inclusion and support
• Integration into GDS, GDSF, DSR, GDTS (catalog schema) if it becomes the de facto mechanism for rendering GDS
Data Replication Milestones (Very Tentative)
• Agree the initial design by end of 5/2002
• Complete the Technology Preview by end of 7/2002
• Iterate to a reference-level implementation by end of 12/2002
• Present Data Replication enhancements to GGF at the 10/2002 conference
• Define relationships with, and dependencies on, other enhanced OGSA services by 12/2002
• Validate the enhanced design's use of DB2 and Oracle Replication Services by 12/2002; SQL Server in phase 2 (2003)