Web Portals, Expert Systems and Integration of Grid-enabled HPC Applications

Robert Allan
e-Science Centre and UK Grid Support Centre, Daresbury Laboratory
22nd October 2001
CLRC Web Portals

Portals: Original Key Challenges
• Accessibility
  – Access to Grid resources from desktop and familiar environments.
  – Increase usability of applications to widen the HPC community.
  – Development of HPCGrid, Data and Visualisation Portals.
• Resource Discovery
  – Seek out appropriate resources with the required software installed.
• Distributed Management and Accounting
  – Register users, install applications, implement charging/accounting mechanisms.
• Security
  – Who can have access to what? Authenticate users via peer review and inform local system managers. Provide X.509 certificates.
• Common interface to Data repositories
  – Common meta-data and query interface.
• Advanced Visualisation Tools
  – GUI and Visualisation for multi-dimensional data structures.
• Integrated Architecture
  – Making the Grid Persistent and Pervasive! Web Services model?

HPCGrid: What have we done?
• Installation of Globus and Condor on a small testbed at Daresbury
  – Similar testbeds at other HEC sites (Manchester, Edinburgh, RAL)
• User Guides: “Globus Guide”, “Portal Guide”, “Globus Evaluation Report”
• Architecture papers: “Accounting and Billing for the Grid”, “HPCGrid Portal Architecture”, “Integrated CLRC Portal Architecture”, “UK Skeleton Grid”
• Installation of Apache server, Perl and CGI software for Portal development
• Evaluation of HotPage and GridPort from SDSC, Java CoG kit from Argonne and GPDK from NCSA
• HPCGrid Portal: http://esc.dl.ac.uk/HPCPortal
• Web pages: http://www.e-science.clrc.ac.uk
• EPSRC has an extensive range of HPC applications
  – Will need Grid middleware beyond the “data grid” model.
  – Aim to develop domain-specific Problem Solving Environments similar to ECCE.

Daresbury Grid Testbed
• IBM PPC (AIX, MyProxy server)
• IBM PPC cluster (4xPPC, AIX, Web server)
• Beowulf1 (32xPII, Linux, PBS)
• SP (48xPower, AIX, LoadLeveler)
• Loki (64xAlpha, Linux, PBS)
• SUN cluster (2xUltra, Solaris, GIIS server)
[Figure labels: Globus, Condor, LoadLeveler]

Collaborations
• USA
  – SDSC (Reagan Moore, Mary Thomas) - SRB, HotPage and GridPort
  – Argonne (Steve Tuecke, Gregor von Laszewski) - Globus
  – NCSA and NLANR (Randy Butler, Doru Marcusiu) - GPDK, ClearingHouse and distributed resource management
  – Oak Ridge (Al Geist) - parallel applications
  – USC (Carl Kesselman) - Globus
  – Pittsburgh Supercomputing Center - parallel applications
  – Pacific Northwest National Laboratory (Karen Schuchardt) - problem solving environments (ECCE)
• Europe / Regional Centres
  – U. Portsmouth (Mark Baker) - clusters, Grid and Portals
  – U. Salford (Nick Avis) - visualisation and VR
  – U. Lecce, Italy (Giovanni Aloisio) - GRB, Web portals
  – ZIB Berlin - Meta-computing and visualisation
• + many others! e.g. EuroGrid, DataGrid, DataTAG...

Security - Based on GSI
• GSI (Globus) used for user-authenticated connections:
  – allows for transparent access to Grid resources through Globus infrastructure
  – provides feature-rich toolkit for developer to access the different platforms in an abstract way
  – API to Globus toolkit rather than specific architecture
• Single login to Portal provides access to all Grid resources
• Users must have a digital certificate signed by a known Certificate Authority (CA): ca@grid-support.ac.uk
  – Accounts for UK e-Science and GridPP users may be obtained via on-line Web forms in the future; currently Globus tools and e-mail are used

MyProxy Server
• Repository for proxy credentials
• Uses GSI delegation mechanism
• Secure server

SDSC Grid Portal Toolkit (GridPort)
• SDSC are generalising HotPage infrastructure into a re-usable toolkit
• Developing multiple components/APIs:
  – Portal services
  – Portal developer API (Perl/CGI, Java in version 2)
  – User API (www)
• Toolkit will facilitate connectivity to the computational Grid infrastructures by providing a middle-tier layer to Grid services, e.g.:
  – Globus, Legion
  – AppLeS, NWS
  – Other tools just need Perl/CGI interfaces/wrappers
• Portal Services can be based on:
  – Applications run on SDSC portals and Web servers
  – Apps run on local Web server, and use portal services
  – Sites can run own portal services by installing GridPort toolkit
  – Key advantage: automatically hook into Grid

SDSC Grid Portal Toolkit (GridPort)
• Current application development efforts include:
  – Pharmacokinetic Modeling: the development of algorithms used to model drug behaviors in human systems
  – GAMESS: Computational 3-D molecular modeling/visualization tool
  – National Biomedical Computation Resource - Cardiac Physiology
  – Tomography (Ellisman, Haddida-Hassan)
• Key goal: application development based on the GridPort toolkit is a task that can be done by general users
  – toolkit will allow users to create own web pages
  – run web pages on home system
  – facilitates development of applications by the user with simple web technologies such as HTML, JavaScript and Perl/CGI (XML)

HPCGrid uses NPACI HotPage
• HotPage initiated by Jay Boisseau (now at U. Texas) in 1997
  – before the term “portal” was popularised; nevertheless it is a user portal for monitoring compute resources
  – Expanding to include applications, etc.
• HotPage makes performing most computational tasks easier, thus increasing effectiveness of scientists:
  – Initial target is individual user or small application
  – Scaleable and customisable
• HotPage software is highly portable:
  – incorporation of other Grid toolkits and applications typically requires only an interface script (includes object-based services)
• Design philosophy of the HotPage is similar to the "bag of services" model adopted by the Globus project:
  – HotPage Web sites can use any subset of features, provided that the required infrastructure is running on the system (server, Perl)

HPCGrid Services Portal
Portals can simplify:
• User session management
• Computational batch services
  – How to specify the target hardware
  – How to deploy user code on target
  – Availability of licenses on demand
• Data access
  – Choose appropriate centre or cache
  – Validity of data
  – Some technology inherited from Web
  – Self-describing data formats for inter-disciplinary projects
Need:
• Observation/experimental instrument access
• Visualisation and VR
• Link to resource-brokering middleware
• Link to accounting middleware

HotPage Architecture and Implementation
• HotPage is based on simple commodity Web technologies:
  – Interface based on infrastructure provided by World Wide Web, rather than command line or application such as a Java applet
  – Key requirement: HotPage runs anywhere, on any browser
  – Web interface is common, well understood, and pervasive
• Simple to implement, support, and develop:
  – HotPage scripts will run on any system capable of supporting a Web server, Perl, CGI, and SSH connections to remote hosts
  – Key advantage of Perl: it runs on most systems (including CRAY)
  – SSI and JavaScript used to add dynamic features
  – Requires secure encryption through the SSL protocol (HTTPS)
• Ease of use
  – Simple to learn, and once learned, the tools run the same way for all HPC systems.
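
The requirement that HotPage scripts need only a Web server, CGI and SSH connections to remote hosts can be pictured as a thin middle tier that turns a menu choice into a remote command. The following is a minimal sketch in Python rather than the Perl/CGI that HotPage itself uses; the action names, the `build_ssh_command` helper, and the user and host names are hypothetical, and the queue commands (`qstat` for PBS, `llq` for LoadLeveler) are assumed to be on the remote path.

```python
# Illustrative sketch only: not HotPage or GridPort code.
# Maps a portal menu action to an ssh command line; nothing is executed here.

# Hypothetical whitelist of portal actions and the remote command each runs.
ALLOWED_ACTIONS = {
    "pbs_queue": "qstat",    # PBS queue listing
    "ll_queue": "llq",       # LoadLeveler queue listing
    "list_files": "ls -l",   # simple Unix task
}


def build_ssh_command(user, host, action):
    """Return the argv list for running a whitelisted action on a remote host."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown portal action: {action!r}")
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which suits a non-interactive web back end.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", ALLOWED_ACTIONS[action]]


if __name__ == "__main__":
    # Hypothetical user and host names, for illustration only.
    print(build_ssh_command("alice", "loki.example.org", "pbs_queue"))
```

A CGI handler could pass the returned list to `subprocess.run`; restricting actions to a fixed table keeps arbitrary form input out of the remote shell.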

HotPage Information Services
• Designed to provide a user-oriented interface to Grid resources and services
  – on-line documentation, static informational pages, links to events within a virtual organisation (Grid), and basic user information
• Simple tools:
  – Application search, systems information, batch script generator
  – Status bar: live updates of operational status and utilisation of all compute resources
  – Machine Usage: displays summary of machine status, load, and batch queues
  – Batch Queues: displays currently executing and queued jobs
  – Node Maps: displays graphical map of how running applications are mapped to nodes
  – Network Weather Service: provides connectivity information between a user’s local host and Grid resources

HotPage Services Schema
[Figure: HotPage services schema]

HPCGrid Resource Information
• Uses HotPage software from SDSC
• Augmented with Globus GIIS and GRIS search interfaces
• Access to information on Grid systems accessible from chosen GIIS server (“micro-Grid” or “meso-Grid”)

Interactive Services
• Interactive services are those that give users access to accounts on resources
• Secure access to compute and storage resources:
  – Single entry point to all Grid resources on which a user has accounts/allocations
  – Requires login/authentication
  – Multiple Grid services and toolkits can be used
• Menus allow user to perform common Unix tasks:
  – submit, monitor, and delete jobs in queues
  – view output
  – compile and execute code
  – manipulate and view files, navigate through file systems
  – use system commands: chmod, mv, ls, cat, mkdir, cp, rm
  – perform file transfer
  – manage accounts and allocations

HPCGrid/GridPort Interactive Services Schema
[Figure: HPCGrid/GridPort interactive services schema]

HPCGrid/GridPort Overall Architecture
[Figure: HPCGrid/GridPort overall architecture]

HPCGrid Active Grid Services
• Web-based interfaces to generic Grid services using the Globus C API:
  – Login/logout
  – File movement
  – Temporary workspace
  – Job submission
  – User profile management
• Compare to commercial Web services for B2B

DataPortal Architecture
[Figure labels: CLRC DataPortal Server, common metadata catalogue database, XML wrappers, local metadata, local data, Facility 1]

Data Server Architecture
[Figure labels: USER, user input interpreter, query generator, user output generator, response generator, XML parser, pre-set XSL script, internal http module, XML Schema, external agent, central metadata repository, XML files, wrapper for other catalogues, Ascii file]

Architecture for integrating existing Catalogues
[Figure labels: DataPortal Server, XML wrapper, request file(s), SQL input translator, response generator, XML output generator, Http, SQL, XML, ANSI or RAS SQL, local metadata catalogue, RasDaMan, SRB; key: internal module, external agent]

Possible Integration of SRB
[Figure labels: DataPortal, SRB Server, MCAT, SRB Agents, RasDaMan, BADC; key: internal two-way module, external two-way agent]
• Storage back-ends: DB2, Oracle, Illustra, ObjectStore - HPSS, Unitree - Unix, ftp
• Ongoing work

CLRC Metadata example
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE CLRCMetadata SYSTEM "clrcmetadata.dtd">
  <CLRCMetadata><MetadataRecord metadataID="N000001">
  <Topic> <Discipline>Chemistry</Discipline>
  <Subject>Crystal Structure</Subject> <Subject>Copper</Subject>...
  <Experiment>
  <StudyName>Crystal Structure: Copper : Palladium: :complex: 150K ...
  <Investigator><Name><Surname>Porter...<Institution>University of Peebles ...
  <Funding>EPSRC ...
  <TimePeriod><StartDate><Date>21/04/1999….
  <Purpose><Abstract> To study the structure of Copper and Palladium co-ordination complexes at 150K.
  <DataManager><Name><Surname>Teat...
  <Instrument>SRS Station 9.8, BRUKER AXS SMART 1K...
  <Condition>...Wavelength...<Units>Angstrom...<ParamValue>0.6890...
  <Condition>…Crystal-to-detector distance<Units>cm...<ParamValue>5.00...
  <AccessConditions>The user has to be one of: Prof. F. Porter….

Integrated Portal Architecture
• Design for CLRC Portals using Globus and Web Services
[Figure labels: Data Systems, DataPortal, HPCPortal, HPC Systems, Visualisation, GSI, Web Services, GridFTP, Globus]
• Working with GGF Grid Computing Environments Research Group

Current Portal Work
• Improved user and session profiles
  – Persistent and pervasive Grid computing using cookies and server-based “desktop” file structure. Access from anywhere.
• Expert system
  – Introduce domain-specific knowledge to guide users through decisions as part of the data production and analysis process
• Data management
  – Access to data services via integration with Data Portal and/or SRB
• Application-specific GUI and post-processing visualisation tools
  – Plug-in or run as a remote process?
• Client-side services using Java
  – What additional functionality can be used?
• XML schemas for services and events
  – There is unlikely ever to be just one Web programming standard, so Portal services need to interact and exchange information

Long-term Challenges (1)
• Portability/mobility of applications for a heterogeneous distributed environment
  – need to promote standard-adhering and portable/re-usable codes using state-of-the-art methods with good performance
• Check-pointing, restart and job migration (e.g. LSF, Condor)
• Locate “best”/cheapest service provider - open market?
  – possible problems with scientific integrity; other countries may provide cheaper services.
  – Increasing need for high-end funding.
• Data storage and movement, must include experimental and archive data
  – combine data with XML interface and metadata descriptions.
  – Implement data searching, fusion and mining tools.
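
The item on combining data with an XML interface and metadata descriptions can be illustrated with a small search over metadata records. This is a minimal sketch, assuming a simplified, well-formed record that borrows a few element names (`MetadataRecord`, `Discipline`, `Subject`, `StudyName`) from the CLRC metadata example; the second record, the flattened element nesting, and the `find_by_subject` helper are hypothetical and do not reflect the actual CLRC DTD.

```python
# Illustrative sketch only: a simplified stand-in for CLRC metadata records.
import xml.etree.ElementTree as ET

# Hypothetical, cut-down records; the real schema has more elements and nesting.
SAMPLE = """<CLRCMetadata>
  <MetadataRecord metadataID="N000001">
    <Topic>
      <Discipline>Chemistry</Discipline>
      <Subject>Crystal Structure</Subject>
      <Subject>Copper</Subject>
    </Topic>
    <StudyName>Crystal Structure: Copper: Palladium: complex: 150K</StudyName>
  </MetadataRecord>
  <MetadataRecord metadataID="X000002">
    <Topic>
      <Discipline>Physics</Discipline>
      <Subject>Surface Science</Subject>
    </Topic>
    <StudyName>Hypothetical second study</StudyName>
  </MetadataRecord>
</CLRCMetadata>"""


def find_by_subject(xml_text, subject):
    """Return the metadataID of every record listing the given subject."""
    root = ET.fromstring(xml_text)
    wanted = subject.strip().lower()
    hits = []
    for record in root.iter("MetadataRecord"):
        # Compare case-insensitively against every <Subject> in the record.
        subjects = [s.text.strip().lower() for s in record.iter("Subject") if s.text]
        if wanted in subjects:
            hits.append(record.get("metadataID"))
    return hits


if __name__ == "__main__":
    print(find_by_subject(SAMPLE, "copper"))
```

A common metadata format lets one query function serve every facility's catalogue, which is the point of the XML wrappers in the DataPortal design.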

Long-term Challenges (2)
• Development of new algorithms to “de-couple” traditional schemes
  – this includes algorithms which do not have strict update dependency…
  – May be a route to evolving highly scaleable algorithms for applications on both the Grid and HEC systems.
• Can networks cope with required traffic? Need QoS?
  – Visualisation and collaborative working need higher bandwidth.
• Intelligent Problem Solving Environments
  – Expert systems encapsulate scientific domain knowledge for novices.

Publications and URLs
Publications
• R.J. Allan, “Survey of Computational Steering, Meta-computing and Network Information Tools”, DL-TR-99-002 (Daresbury Laboratory, 1999 and 2000)
• UKHEC, “Grid-based High Performance Computing” (2000)
• UKHEC, “A Review of UK HEC Grid Infrastructure: State-of-the-art and Next Steps” (2000)
• R.J. Allan et al., “Evaluation of Globus and associated Middleware” (CLRC, 2001)
• R.J. Allan et al., “A Globus Developer’s Guide” (2001)
• R.J. Allan, “Developing a Web Portal for the Computational Grid” (2001)
URLs
• www.ukhec.ac.uk
• www.dl.ac.uk/TCSC/UKHEC/GridWorkshop
• www.dl.ac.uk/TCSC/UKHEC/WG
• www.dl.ac.uk/TCSC/HPCI/reports.html
• www.grid-support.ac.uk
• www.e-science.clrc.ac.uk
CD-ROM Grid Starter Kit
More stuff available, please call us!