GPIR
GridPort Information Repository
Tomislav Urban
Texas Advanced Computing Center

Origins
• HotPage Informational Data
  – Load, MOTD, Node Map, etc.
  – Obtained from customized data gathering scripts
  – MDS 2.0 where available
  – “Static” VO configuration data
• Identified interest in recording historical grid data in support of
  – Workflow / Decision-making
  – Job schedulers / Brokers
  – Histograms
• Sought to move towards a web services model using
  – XML schema
  – Removes the need to write customized implementations for each new resource

GridPort Information Repository (GPIR)
• Implementation of web service enabled information service
• Evolved from various HotPage, GridPort, TACC and GCE-RG information and web services projects (IAWS)
• Concept demonstrated at SC 02 for TeraGrid, PACI (NPACI/Alliance) resources
  – Called Information Archival Web Service (IAWS)
  – Based on XML documents stored on a file server
  – Thin clients (Java / Perl) pushed data into repository
  – Contained XML documents for current grid status as well as archived historical data (HotPage information, other)
  – The IAWS was conceptualized in collaboration with SDSC and NCSA

Design Philosophy
• “Aggressive Practicality”
  – Works today with what’s available today
  – Comprehensive Portal-centric data set
  – Intended to support the GridPort GCE framework and its data requirements. As a web service, it can be repurposed to any grid data needs.
• Follow Standards
  – OGSI (Grid Services)
  – Emerging Data Schema (GLUE?)
• Scalable
  – Relational Database back-end
• Extensible
  – Easy to add new XML Queries, format as needed

Architecture
[Figure: architecture diagram. Column headings: Resources, Information Providers, dB, Clients. Labels: Perl Client, Java Client, MDS, Web Scraping, Other; GPIR (edu.tacc.GPIR) with Ingester WS and Query WS; MySQL, PostgreSQL; Portals, Portlets, Other Middleware, OGSA (Future). Connection labels: SOAP-XML, HTTP, JDBC.]

Architecture
• A single GPIR instance may support multiple portals serving various VOs
[Figure: four VO Portals attached to a single GPIR instance.]

Current Data Sources
• Thin Clients
  – Java
  – Perl
• MDS
• GMS
  – http://www.tacc.utexas.edu/grid/gms
• NWS
• “Web Scraping”
  – Cron jobs run periodically on HPC resources compiling text files that are then accessed via HTTP

Data
• Load - aggregated CPU
• Jobs - individual and aggregated queue
• MOTD
• Nodes - job usage for each machine node
• NWS - based on VO and Click model
• Grid Monitoring (GMS) - Based on NCSA
• Machine Status
  – “Static” Resource data (query only)
• Extensible through the addition of XML data from any recognized source
  – Need schema
  – Need query

Web Services
• Ingester WS
  – Accepts XML documents containing updates to Grid status
• Query WS
  – Provides XML containing query specific information

Current Work
• Migration to PostgreSQL
  – Full feature set
      Transactionality
      Etc.
  – Better future J2EE support
      CMP
      CMR
• Administration Client
  – Allowing web-based administration of “static” data for all supported VOs would be a huge productivity boost

Supported VOs
• Current
  – The PACI: NPACI, Alliance
  – TACC/University of Texas:
  – TIGRE / State of Texas
      University of Texas, University of Houston
      Texas A&M, Texas Tech, Rice
      Baylor College of Medicine
  – IPG
• Planned
  – ETF

Deployment
• Code available at: http://www.tacc.utexas.edu/grid/gpir
• Consists of:
  – Web Service
  – Example Clients
  – JavaDocs
  – DDL Script for MySQL
  – XML Schema Documents (XSDs)
  – XML Document Examples

Future Directions
• Integration into GridPort 3.0
  – J2EE Implementation
  – Treat GPIR Entities as real objects rather than table rows
• Significant expansion to the data being gathered
• Administration Client
• Reporting and decision making based on historical data

Grid Services
• Intend to implement GPIR as a grid service
• Inherit OGSI Security model
• GT 3.0 GSI
• OGSI Compliance
• OGSA Compliance
• Will support W3C and GGF standards
  – Web Services
  – Grid Services

Outstanding Issues
• Inflexibility
  – Relational Database Changes
  – XML Schema Changes
  – Support for Dynamic Queries (Waiting for standards)
• Inefficiency of dynamic data storage
  – Sampling vs. Events
  – Example: The Job Table
• Data Format Standards
  – MDS/GLUE Schema
  – INCA?
• Security
  – GSI based authentication
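
The “Sampling vs. Events” issue listed under Outstanding Issues can be illustrated with a small sketch. The slides do not describe the actual layout of the GPIR job table, so everything here (the `JobSample` record, the 5-minute sampling interval, the job names and states) is a hypothetical stand-in. The sketch only shows why storing one row per job per sample grows faster than storing one row per state change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSample:
    """Hypothetical job-table row; not the real GPIR schema."""
    time: int      # minutes since monitoring started
    job_id: str
    state: str     # e.g. "queued" or "running"


def samples_to_events(samples):
    """Keep a row only when a job's state differs from its previous sample."""
    last_state = {}
    events = []
    for s in sorted(samples, key=lambda s: (s.time, s.job_id)):
        if last_state.get(s.job_id) != s.state:
            events.append(s)
            last_state[s.job_id] = s.state
    return events


def build_samples():
    """Assumed workload: two jobs sampled every 5 minutes for one hour."""
    samples = []
    for t in range(0, 60, 5):
        # Job "a" is queued at t=0 and running from t=5 onward.
        samples.append(JobSample(t, "a", "queued" if t < 5 else "running"))
        # Job "b" stays queued for the whole hour.
        samples.append(JobSample(t, "b", "queued"))
    return samples


samples = build_samples()
events = samples_to_events(samples)
print(len(samples), "sampled rows vs.", len(events), "event rows")
```

The trade-off runs both ways: an event-style table is smaller, but it depends on the data source reporting every transition, while periodic sampling works with any source that can be polled and may miss jobs that start and finish between two samples.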