WP3
The status of the EU DataGrid's R-GMA system
Steve Fisher / RAL 24/4/2003 <s.m.fisher@rl.ac.uk>

Who we are
• Heriot-Watt, Edinburgh
  – Andrew Cooke, Werner Nutt
• IBM-UK
  – James Magowan, (Manfred Oevers), Paul Taylor
• INFN
  – Roberto Barbera, Giuseppe Save, Gennaro Tortone
• Queen Mary, University of London
  – Roney Cordenonsi, (Ari Datta)
• CCLRC/PPARC
  – Rob Byrom, Laurence Field, Steve Hicks, Manish Soni, Antony Wilson, (Xiaomei Zhu), Jason Leake
  – Linda Cornwall, Abdeslem Djaoui, Steve Fisher, Robin Middleton
• SZTAKI, Hungary
  – Peter Kacsuk, Norbert Podhorszki
• Trinity College Dublin
  – Brian Coghlan, Stuart Kenny, David O’Callaghan, (John Ryan)

GMA
• From GGF
• Very simple model
• Does not define:
  – Data model
  – How data are moved from Producer to Consumer
  – What registry looks like
[Diagram: Producer, Consumer and Registry; link labelled “execute or stream”]

R-GMA
• Use the GMA from GGF
• A relational implementation
  – Powerful data model and query language
• Applied to both information and monitoring
• Creates impression that you have one RDBMS per VO
[Diagram: Producer, Consumer and Registry; link labelled “execute or stream”]

Relational Data Model
• Not a general distributed RDBMS system, but a way to use the relational model in a distributed environment where global consistency is not important.
• Producers
  – announce: SQL “CREATE TABLE”
  – publish: SQL “INSERT”
• Consumers
  – collect: SQL “SELECT”
• Some producers, the Registry and Schema make use of RDBMS as appropriate – but what is central is the relational model.
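
The announce / publish / collect mapping above can be illustrated with a minimal sketch. It is not the R-GMA API: it assumes a local in-memory SQLite database as a stand-in for the virtual database, and the column types are guesses. The table and column names follow the CPULoad example used later in these slides.

```python
import sqlite3

# Stand-in for the virtual database: a local in-memory SQLite database.
# This is NOT the R-GMA API; it only illustrates the three SQL verbs.
conn = sqlite3.connect(":memory:")

# Producer "announce": declare the table it will publish to.
# Column types are assumptions; the slides only give column names.
conn.execute(
    "CREATE TABLE cpuLoad ("
    "country TEXT, site TEXT, facility TEXT, load REAL, timestamp TEXT)"
)

# Producer "publish": insert tuples (values copied from the CPULoad slide).
rows = [
    ("UK", "RAL", "CDF", 0.3, "19055711022002"),
    ("UK", "RAL", "ATLAS", 1.6, "19055611022002"),
    ("UK", "GLA", "CDF", 0.4, "19055811022002"),
    ("CH", "CERN", "CDF", 0.6, "19055511022002"),
]
conn.executemany("INSERT INTO cpuLoad VALUES (?, ?, ?, ?, ?)", rows)

# Consumer "collect": an ordinary SELECT over the table.
ral_rows = conn.execute(
    "SELECT facility, load FROM cpuLoad "
    "WHERE country = 'UK' AND site = 'RAL' ORDER BY facility"
).fetchall()
print(ral_rows)
```

In R-GMA itself the rows would come from several independent producers and be located through the Registry; the single local database here only shows the shape of the three statements.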

Producer Consumer
• Consumer can issue one-off queries
  – Similar to normal database query
• Consumer can also start a continuous query
  – Requests all data published which matches the query
• Can be seen as an alert mechanism

Registry choices
Registry (of Producers and Consumers)
Schema (descriptions of tables)
• Decided early to keep them separate
• In fact they have different requirements for distribution/replication
• Each implemented with one RDBMS per instance

Virtual RDBMS
• Creates impression that you have one RDBMS per VO
  – This makes it very easy to use
  – 1 integrated system
  – 1 query language
• Users like it
• But how will it fit in with GridServices?

Producers
• DataBaseProducer
  – Supports History Queries
  – Information not lost
  – Supports joins
  – Clean up strategy
• StreamProducer
  – Supports Continuous Queries
  – In memory data structure
  – Can define minimum retention period
• ResilientStreamProducer
  – Supports Continuous Queries
  – Like the StreamProducer but won’t lose data if system crashes
  – So slightly slower
• LatestProducer
  – Supports Latest Queries
  – Just holds the latest information for any “primaryish” key
  – Supports joins
• CanonicalProducer
  – Supports anything
  – Offers anything as relations

Archiver (Re-publisher)
• It is a combined Consumer-Producer
• You just have to tell it what to collect and it does so on your behalf
• Re-publishes to any kind of “Insertable” (i.e. not to the CanonicalProducer)

Canonical Producer
• Allows user defined code to be invoked to respond to SQL query
• Developed in collaboration with CrossGrid
[Diagram: User Code, CP API, Canonical Producer Servlet, Port, Files. Labels: CreateTable, Port, Protocol, Security, SQL Support, Multiple Query Support; Security, Insert, Query, Register]

Functionality - mediator
• Queries posed against a virtual data base
• The Mediator must:
  – find the right Producers
  – combine information from them
• Hidden component – but vital to R-GMA
• Can now merge information from several producers
• The final mediator will take “any” SQL statement and do the right thing

Topologies
• Normally publish via SP
• Archivers instantiated with a Producer and a Predicate
• Must avoid cycles in the graph
[Diagram: graph of nodes labelled SP, A, LP and HP]

Schema & Contributions

CPULoad (Global Schema)
Country  Site  Facility  Load  Timestamp
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
CH       CERN  ALICE     0.9   19055611022002
CH       CERN  CDF       0.6   19055511022002

CPULoad (Producer 1)
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002

CPULoad (Producer 2)
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002

CPULoad (Producer 3)
CH       CERN  ATLAS     1.6   19055611022002
CH       CERN  CDF       0.6   19055511022002

Contributions are Views

CPULoad (Producer 1)
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'RAL'

CPULoad (Producer 2)
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'GLA'

R-GMA Tools
• R-GMA CLI
  – Command Line Interface (similar to MySQL)
  – Supports single query and interactive modes
  – Can perform simple operations with Consumers, Producers and Archivers
• R-GMA Browser
  – JSP application dynamically generating web pages
  – Supports pre-defined and user-defined queries
• Pulse
  – R-GMA Java client-based GUI
  – Supports streaming and simple graphical displays

GIN and GOUT (Gadget IN and Gadget OUT)
[Diagram components: LDAP InfoProvider, GLUE Schema, GIN, CircularBuffer Producer, Archiver, Consumer (CE), Consumer (SE), Consumer (SiteInfo), Consumer API, DataBase Producer, RDBMS, GOUT, R-GMA Consumers, LDAP Server]

R-GMA – How?
• Currently based on servlet technology
  – Behind every API there is a Servlet
  – Multiple hand crafted APIs
    • Java, C++, C, Python and Perl
  – Tomcat
  – Soft state registration
  – Uniform exception handling
    • To ensure that useful messages and stack traces are preserved.

OGSIfication
• Have recently started the migration to web and grid services
  – Apache Axis
  – WSDL generated APIs
  – Will provide a wrapper for backwards compatibility

OGSIfied R-GMA
• All Grid Services
• OGSA Factories, GSH, GSR
• Registry includes HandleMapper
• SQL as Service Data Element Query Language
[Diagram components: Application, Consumer API, Producer API, Consumer Factory, Consumer Instance, Producer Factory, Producer Instance, Registry, Schema, Sensor]

OGSIfication issues
• Consider XML as internal representation of service data elements
  – Depends on other developments
• Consider XQuery as service data elements query language
  – Depends on how XQuery develops
• X-GMA ??
  – Will this be distinguishable from what is in GT3?

Resilience - Registry
• Will have one logical registry and schema per VO
• Each logical registry will have multiple physical “copies”
• Each entry in registry has 3 possible states
• Transmit new records and deleted records and checksum after records deleted locally
• Self healing even supports new registry instances
• Consumer uses any instance
• Fail over mechanism not yet implemented
• Schema more tricky
[Diagram: Producer1 and Producer2 with three registry instances]
  Registry1: info mastered by Registry1; copies of info from Registry2 and Registry3
  Registry2: info mastered by Registry2; copies of info from Registry1 and Registry3
  Registry3: info mastered by Registry3; copies of info from Registry1 and Registry2

Soft-state Registration and the Registry
• Registry records existence of Producers and Consumers
• Registry holds last contact time and ‘expiry’ time
• Producers and Consumers periodically refresh their time stamps
• Producer and Consumer servlets avoid unnecessary traffic to Registry
• Scheduled removal of entries that have timed-out

Resilience Testing
• Taking 7 components
  – Schema
  – 2 registry instances
  – Producer API
  – Consumer API
  – Producer Servlet with other APIs
  – Consumer Servlet with other APIs
• Consider each component in turn
  – Break the network and bring it back
  – Close the component down and bring it back
  – Crash the component and bring it back
• Will also consider real life scenarios

Performance
• By design:
  – Very flexible - to avoid bottlenecks
  – Powerful queries allow a single query to be made
• Performance and Optimisation
  – Will use NetLogger and profiling tools to identify possible bottlenecks
• Internally not high speed because of XML etc.

Summary
• R-GMA is a combined Grid information and monitoring system
• Supports notion of Virtual Database
• Recently deployed in the EDG development testbed
• Now focusing on reliability, stability and performance
http://hepunx.rl.ac.uk/edg/wp3/
Thanks to the EU and our national funding agencies for their support of this work

And finally GGF8…
• RGIS-RG
  – Two short sessions will be held:
    • Session 1: Database use cases and best practices in the grid environment (outside the traditional data areas)
      » Using databases to store application metadata
      » Using databases to store monitoring information
      » Using databases as a grid registry
      » Creating grid registries for locating relational and XML databases
    • Session 2: Data discovery in the grid environment
  – We will also discuss our milestones and future directions (e.g. should we include XML as well as Relational models?)
  – See http://hepunx.rl.ac.uk/ggf/rgis-rg
• A GMA BOF is planned for GGF8