GPIR GridPort Information Repository Tomislav Urban Texas Advanced Computing Center

Origins
• HotPage Informational Data
– Load, MOTD, Node Map, etc.
– Obtained from customized data gathering scripts
– MDS 2.0 where available
– “Static” VO configuration data
• Identified interest in recording historical grid data in
support of
– Workflow/ Decision-making
– Job schedulers/Brokers
– Histograms
• Sought to move towards a web services model using XML schema
– Removes the need to write customized implementations for each
new resource
GridPort Information Repository (GPIR)
• Implementation of a web-service-enabled information
service
• Evolved from various HotPage, GridPort, TACC and
GCE-RG information and web services projects (IAWS)
• Concept demonstrated at SC 02 for TeraGrid, PACI
(NPACI/Alliance) resources
– Called Information Archival Web Service (IAWS)
– Based on XML documents stored on a file server
– Thin clients (Java / Perl) pushed data into repository
– Contained XML documents for current grid status as well as
archived historical data (HotPage information, other)
– The IAWS was conceptualized in collaboration with SDSC and
NCSA
Design Philosophy
• “Aggressive Practicality”
– Works today with what’s available today
– Comprehensive Portal-centric data set
– Intended to support the GridPort GCE framework and its data
requirements.
 As a web service, can be repurposed to any grid data needs.
• Follow Standards
– OGSI (Grid Services)
– Emerging Data Schema (GLUE?)
• Scalable
– Relational Database back-end
• Extensible
– Easy to add new XML Queries, format as needed
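The "relational back-end plus easy-to-add XML queries" idea can be sketched roughly as follows. This is only an illustration, not GPIR's actual code: the table `load_sample`, the query registry `QUERIES`, the element names, and the use of in-memory SQLite in place of MySQL/PostgreSQL are all assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_db():
    # In-memory SQLite stands in for the MySQL/PostgreSQL back-end (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE load_sample (resource TEXT, vo TEXT, cpu_load REAL)"
    )
    conn.executemany(
        "INSERT INTO load_sample VALUES (?, ?, ?)",
        [("hostA", "voX", 0.75), ("hostB", "voX", 0.20), ("hostC", "voY", 0.50)],
    )
    return conn


# Hypothetical query registry: each entry pairs a SQL statement with the XML
# element names used to format its rows. Adding a query means adding an entry.
QUERIES = {
    "load": (
        "SELECT resource, cpu_load FROM load_sample WHERE vo = ? ORDER BY resource",
        "LoadReport",
        "Resource",
    ),
}


def run_query(conn, name, vo):
    """Run a registered query for one VO and format the rows as XML."""
    sql, root_tag, row_tag = QUERIES[name]
    root = ET.Element(root_tag, {"vo": vo})
    cur = conn.execute(sql, (vo,))
    columns = [d[0] for d in cur.description]
    for row in cur:
        item = ET.SubElement(root, row_tag)
        for col, value in zip(columns, row):
            item.set(col, str(value))  # one XML attribute per column
    return ET.tostring(root, encoding="unicode")
```

Calling `run_query(make_db(), "load", "voX")` returns a `LoadReport` document with one `Resource` element per matching row; the output format changes only by editing the registry entry.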
Architecture
[Figure: GPIR architecture diagram. Labels recovered from the diagram:]
• Resources and Information Providers: Perl Client, Java Client, MDS, Web Scraping, Other
• GPIR (edu.tacc.GPIR): Ingester WS, Query WS
• dB: MySQL, PostgreSQL, OGSA (Future)
• Clients: Portals, Portlets, Other Middleware
• Protocols: SOAP-XML, HTTP, JDBC
Architecture
• A single GPIR instance may support multiple portals serving
various VOs
[Figure: several VO Portals connected to a single GPIR instance]
Current Data Sources
• Thin Clients
– Java
– Perl
• MDS
• GMS
– http://www.tacc.utexas.edu/grid/gms
• NWS
• “Web Scraping”
– Cron jobs run periodically on HPC resources
compiling text files that are then accessed via HTTP
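As a rough illustration of the "web scraping" source, the sketch below fetches a text file over HTTP and turns it into a record. The `key: value` file layout, the field names, and the function names `fetch_status` and `parse_status` are assumptions for the example; the slides do not specify the format the cron jobs actually write.

```python
from urllib.request import urlopen


def fetch_status(url, timeout=10):
    """Download the text file that a cron job published on the resource."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_status(text):
    """Parse 'key: value' lines (hypothetical format) into a dict."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that do not look like 'key: value'
        record[key.strip().lower()] = value.strip()
    return record


# Example of what such a file might contain (made-up values).
SAMPLE = """\
# written by a cron job on the resource (hypothetical format)
hostname: hostA
load: 0.75
motd: Maintenance on Friday
"""
```

`parse_status(SAMPLE)` yields a dictionary keyed by `hostname`, `load`, and `motd`; in a deployment the text would come from `fetch_status` instead of a literal.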
Data
• Load - aggregated CPU
• Jobs - individual and aggregated queue
• MOTD
• Nodes - job usage for each machine node
• NWS - based on VO and Click model
• Grid Monitoring (GMS)
– Based on NCSA
• Machine Status
– “Static” Resource data (query only)
• Extensible through the addition of XML data from any
recognized source
– Need schema
– Need query
Web Services
• Ingester WS
– Accepts XML documents containing updates to Grid
status
• Query WS
– Provides XML containing query specific information
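The division of labor between the two services can be sketched in memory as below. This is a minimal illustration under stated assumptions: the class `Repository`, the element names (`GridStatus`, `Resource`, `Load`, `MOTD`, `QueryResult`), and the document layout are invented for the example, and the real services are exposed over SOAP rather than as direct method calls.

```python
import xml.etree.ElementTree as ET


class Repository:
    """Toy stand-in for the store behind the Ingester and Query services."""

    def __init__(self):
        self.current = {}  # (resource name, data kind) -> latest text value

    def ingest(self, xml_text):
        """Ingester side: accept an XML update and record each item."""
        root = ET.fromstring(xml_text)
        count = 0
        for res in root.findall("Resource"):
            name = res.get("name")
            for item in res:
                self.current[(name, item.tag)] = (item.text or "").strip()
                count += 1
        return count  # number of items recorded

    def query(self, kind):
        """Query side: return XML holding one kind of data for all resources."""
        root = ET.Element("QueryResult", {"kind": kind})
        for name, item_kind in sorted(self.current):
            if item_kind == kind:
                res = ET.SubElement(root, "Resource", {"name": name})
                ET.SubElement(res, kind).text = self.current[(name, item_kind)]
        return ET.tostring(root, encoding="unicode")


# A hypothetical update document pushed by a thin client.
UPDATE = """
<GridStatus>
  <Resource name="hostA"><Load>0.75</Load><MOTD>Maintenance on Friday</MOTD></Resource>
  <Resource name="hostB"><Load>0.20</Load></Resource>
</GridStatus>
"""
```

After `repo = Repository(); repo.ingest(UPDATE)`, a call to `repo.query("Load")` returns a `QueryResult` document listing only the load entries, one per resource.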
Current Work
• Migration to PostgreSQL
– Full feature set
 Transactionality
 Etc.
– Better future J2EE support
 CMP (container-managed persistence)
 CMR (container-managed relationships)
• Administration Client
– Allowing web-based administration of “static” data for
all supported VOs would be a huge productivity boost
Supported VOs
• Current
– The PACI: NPACI, Alliance
– TACC/University of Texas:
– TIGRE / State of Texas
 University of Texas, University of Houston,
Texas A&M, Texas Tech, Rice
 Baylor College of Medicine
– IPG

• Planned
– ETF
Deployment
• Code available at:
http://www.tacc.utexas.edu/grid/gpir
• Consists of:
– Web Service
– Example Clients
– JavaDocs
– DDL Script for MySQL
– XML Schema Documents (XSDs)
– XML Document Examples
Future Directions
• Integration into GridPort 3.0
– J2EE Implementation
– Treat GPIR Entities as real objects rather than table
rows
• Significant expansion to the data being gathered
• Administration Client
• Reporting and decision making based on
historical data
Grid Services
• Intend to implement GPIR as a grid service
• Inherit OGSI Security model
• GT 3.0 GSI
• OGSI Compliance
• OGSA Compliance
• Will support W3C and GGF standards
– Web Services
– Grid Services
Outstanding Issues
• Inflexibility
– Relational Database Changes
– XML Schema Changes
– Support for Dynamic Queries (Waiting for standards)
• Inefficiency of dynamic data storage
– Sampling vs. Events
– Example: The Job Table
• Data Format Standards
– MDS/GLUE Schema
– INCA?
• Security
– GSI based authentication
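The "Sampling vs. Events" point about the job table can be made concrete with a small comparison of how many rows each approach stores for one job. The job timeline, the time units, the sampling interval, and the function names below are assumptions chosen for illustration, not measurements from GPIR.

```python
from bisect import bisect_right


def state_at(transitions, t):
    """Return the job state in effect at time t."""
    times = [time for time, _ in transitions]
    idx = bisect_right(times, t) - 1  # last transition at or before t
    return transitions[idx][1]


def sampled_rows(transitions, interval):
    """Sampling: store one row per tick from the first to the last transition."""
    start, end = transitions[0][0], transitions[-1][0]
    return [(t, state_at(transitions, t)) for t in range(start, end + 1, interval)]


def event_rows(transitions):
    """Events: store one row per state change only."""
    return list(transitions)


# Hypothetical job: queued at t=0, starts running at t=40, finishes at t=400.
JOB = [(0, "queued"), (40, "running"), (400, "done")]

samples = sampled_rows(JOB, 10)
events = event_rows(JOB)
print(len(samples), "sampled rows vs", len(events), "event rows")
```

Most sampled rows repeat the previous state, which is the inefficiency the slide refers to; the event form stores only the changes but requires the source to report them.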