Building Science Gateways with EnginFrame
Life Science example
Maurizio Melato
e-mail: maurizio@nice-software.com

At the beginning… was the command line…
[Diagram: Users; Scripting, CLI; Compute-/Data-Grid Middlewares; Distributed and heterogeneous Data Sources; Distributed and heterogeneous Computing Resources (Grid/Compute/Visualization Farm)]
At first glance, simple tools and technology, light…

…but the complexity handled by users kept growing
[Word cloud: NFS, aliases, FTP, scripts, repository, DOE, library, versioning, CRASH!, Linux, Windows, IP protection, LSF, FlexLM, disk quota, queue, convert, resource, password, restart, teamwork, execution host, working directory]

The Web (r)evolution…
Web interface to the Grid: Grid Portals
[Diagram: Users; Grid Portal, Scripting, CLI; Compute-/Data-Grid Middlewares; Distributed and heterogeneous Data Sources; Distributed and heterogeneous Computing Resources (Grid/Compute/Visualization Farm)]
At first glance, the all-purpose, every-day, do-everything solution
Portals are a glue technology: they integrate services, tools and applications
Users may have various levels of customization of both layout and contents
They are *general* purpose: any specific need still has to be addressed and developed separately.

The Science Gateway perspective…
A Science Gateway (SG) is a community-developed set of tools, applications, and data that is integrated via a portal or a suite of applications
[Diagram: Users; Science Gateway: Portal, Scripting, CLI, Applications & Tools; Compute-/Data-Grid Middlewares; Distributed and heterogeneous Data Sources; Distributed and heterogeneous Computing Resources (Grid/Compute/Visualization Farm); Other community-specific Data Sources]
SGs are specializations of Portals for specific scientific communities.
An SG is customized to meet the needs of the targeted community.
An SG provides a common interface configured for optimal use.
An SG allows researchers to focus on their research and fosters collaboration.

The Science Gateway perspective…
Gateways are independent projects, each of which has its own guidelines, requirements and constraints.
But they face similar technological challenges:
  – Compute-/Data-Grid integration
  – Authentication/Authorization
  – Collaboration mechanisms
  – Tools & Application integration
  – …
Does the wheel need to be reinvented every time?
Hence the need for a Science Gateway framework technology

Science Gateway Capabilities
They depend on the needs of the specific community:
  – Authentication and Authorization
  – Job Execution Services
  – Domain-Specific Computational Applications
  – Resource Discovery
  – Access to Data Collections
  – Data Movement Tools
  – Visualization Hardware and Software
  – Workflows

SG: Authentication and Authorization
Satisfy the authentication and authorization security constraints of the community
Integrate with the target authentication technology
Provide the proper authorization mechanism
Configurable authentication mechanisms
  – NIS
  – PAM
  – LDAP
  – Windows Active Directory
  – MyProxy
  – X.509 certificates
  – Krb5
Built-in authorization system with extension points
  – e.g.
    custom inheritance of group definitions

SG: Job Execution Services
Preparation, submission, monitoring and result retrieval
EF was born as an abstraction layer and interface on top of the underlying job scheduler
Supports many job schedulers

SG: Domain-Specific Computational Applications
Provide high-level vertical services
The “Computing Portal” was initially adopted by industrial “communities”
  – Automotive
  – Manufacturing
  – Electronics
  – Oil & Gas
  – Telecommunication
  – Life Sciences
…and research institutions
  – INFN - National Institute of Nuclear Physics
  – CILEA - Lombard Inter-university Consortium for Automatic Computation
  – CERN

A growing number of customers…
Energy & Utilities: Addax Petroleum, AECL, Amerada Hess, British Gas, CC of Water Resources, Chevron, Conoco-Phillips, DSC-Libya, ENI/Agip, GazPromNeft, Marathon Oil, Nexen, Rosneft, Schlumberger, Sibneft, Sinopec, Slavneft, Sonatrach, Statoil, Talisman Energy, Telecom Italia, TNK-BP, TNNC, TOTAL, TyumenNIIGaz, VNIIGaz, Xinjiang Oil
Life Sciences: LitBio project, DEISA project, Biolab, Swiss Institute for Bioinformatics, Partners Healthcare, M.D. Anderson Cancer Center
High Tech: STMicroelectronics, Accent, Samsung SDI, SensorDynamics, Motorola
Aerospace & Manufacturing: AIRBUS, Air Products and Chemicals, Procter&Gamble, Galileo Avionica, Hamilton Sundstrand, Kimberly Clark, Magellan Aerospace, MTU, Northrop Grumman, P&W, Raytheon, Simpson Strong-Tie
Automotive & Industrial Equipment: Audi, ARRK, Bridgestone, Bosch, Corus Automotive, Delphi, Elasis/CRF, Ferrari, Brawn GP, Jaguar-LandRover, Lear, Magneti Marelli, McLaren, P+Z, PSA, RedBull Engineering, Swagelok, Suzuki, Toyota, TRW, Volkswagen
Research & Education: ASSC, CCLRC, CERN, CILEA, CINECA, CNR, CNRS/IN2P3, ENEA, FzU, ICI, IFAE, INFN, ITEP, Harvard Business School, SSCRussia, SDSC, Ferrara Uni, ITU, T.U.Dresden, Trinity College Dublin, Huazhong Normal Uni, Yale University

Which applications are used in EnginFrame?
EnginFrame snapshots & Technology Overview
Services are XML descriptions defining
  – Input parameters
  – The action to accomplish (Unix/Windows script, Java, …)

EnginFrame Customizable Job Submission
User-friendly, application-oriented job submission
Flexible and efficient input file management
Ties in with dynamic enterprise data such as databases
Interactive job submission
Hides the complexity of the underlying scheduler
Monitoring & control
  – Global job monitoring
  – Cluster & host monitoring
  – Job details & control
Output management
  – Data lifecycle management
  – Comprehensive output file manipulation (view, edit, delete, zip, …)
  – Follow-up actions support
RESUBMIT jobs
  – Rapidly edit input files and re-submit with the same parameters/settings

SG: Resource Discovery
The ability to dynamically discover resources and available services
The ability to build indexed collections of the resources
Newly defined services are dynamically published according to authorization settings
EF relies on the underlying Grid middlewares to query the availability of new hardware or software resources
In the A-WARE EU Project, custom functionality was developed for the dynamic discovery of third-party services.
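The service model described above (a description listing input parameters and an action, published according to authorization settings) can be illustrated with a minimal Python sketch. This is not EnginFrame's API or its XML schema: `ServiceDescription`, `ServiceRegistry`, the group names and the R script name are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical stand-in for a service description (not EnginFrame's schema)."""
    name: str
    parameters: Tuple[str, ...]      # input parameters the user must supply
    action: Callable[..., str]       # the action to accomplish
    allowed_groups: FrozenSet[str]   # authorization setting used for publication


class ServiceRegistry:
    """Publishes services and filters them by the caller's groups."""

    def __init__(self) -> None:
        self._services: Dict[str, ServiceDescription] = {}

    def publish(self, service: ServiceDescription) -> None:
        # A newly defined service becomes discoverable as soon as it is added.
        self._services[service.name] = service

    def visible_to(self, user_groups: Iterable[str]) -> List[str]:
        # Only services sharing at least one group with the user are listed.
        groups = set(user_groups)
        return sorted(
            name for name, svc in self._services.items()
            if svc.allowed_groups & groups
        )

    def run(self, name: str, user_groups: Iterable[str], **inputs: str) -> str:
        svc = self._services[name]
        if not svc.allowed_groups & set(user_groups):
            raise PermissionError(f"service {name!r} is not published to this user")
        missing = [p for p in svc.parameters if p not in inputs]
        if missing:
            raise ValueError(f"missing input parameters: {missing}")
        return svc.action(**inputs)


def survival_command(dataset: str, clinical: str) -> str:
    # Illustrative action: it only builds a command line, nothing is executed.
    # The script name is made up for this example.
    return f"Rscript survival_analysis.R {dataset} {clinical}"


registry = ServiceRegistry()
registry.publish(ServiceDescription(
    name="survival-analysis",
    parameters=("dataset", "clinical"),
    action=survival_command,
    allowed_groups=frozenset({"bioinf"}),
))
registry.publish(ServiceDescription(
    name="cluster-admin",
    parameters=(),
    action=lambda: "show-hosts",
    allowed_groups=frozenset({"admins"}),
))

print(registry.visible_to(["bioinf"]))
print(registry.run("survival-analysis", ["bioinf"],
                   dataset="arrays.csv", clinical="clinical.csv"))
```

In this sketch a user in the `bioinf` group sees only the services published to that group, and the action merely returns the command a real portal would hand to the scheduler.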
SG: Access to Data Collections
The ability to access, query and retrieve data collections and their metadata
EF plugins provide integration with
  – gLite Storage and AMGA metadata system
  – SRB / iRODS datagrid middlewares
Functionalities
  – Browse data collections
  – Search metadata
  – Integrated file-system view
  – Read and search various audit data
  – Seamless authentication and user mapping
  – Define and run rules

SG: Data Movement Tools
The capability to provision the required data to a specific location, taking network, performance and caching concerns into account
Browsing of a local or remote Grid filesystem can be transparent to users
Specific services can move data according to the user’s needs
No analysis is currently performed on performance or network latency concerns

EF Data Management
Flexible and efficient input file management

EF Data Management
Data lifecycle management
View or stream output files

SG: Workflows
The possibility to design and run workflows (aka “virtual experiments”) made up of basic tasks with inter-dependencies
Workflow technologies integrated
  – Taverna, with EF used as a third-party web service provider
  – Moteur, a batch enactor for Taverna workflows
The EU Project A-WARE aimed to develop a Grid workflow system
  – UNICORE Grid middleware
  – BPMN/BPEL

EF and Workflows
EF + Moteur

EF and Workflows
EF in A-WARE

SG: Visualization Software
Provide high-end visualization tools to visualize, work and collaborate with complex / 3D interactive applications
EF Remote visualization integrates
  – RealVNC
  – TurboVNC and VirtualGL
  – Nomachine NX
3D optimization technologies
  – IBM Deep Computing Visualization (DCV)
  – HP Remote Graphics Software (RGS)
  – Sun OpenGL
Session management from the Web
Collaboration capabilities via session sharing

EF + Visualization
IBM DCV

EF + Visualization
Seamless Interactive Application Integration

Portal case study: Remote 3D visualization
See demo online!
Collaborate
Application isolation (users do not need access to the command line)

Life Science Application Example
How many steps do you need to build and run your own application in an EnginFrame portal?
How much development effort will it take?

Going practical...
Here are the steps an EF developer should follow to build and expose their own application
The use case is a Survival Analysis service
The service performs an analysis on data from different domains and with different tools

Step 0: Use case analysis
Analyse large microarray datasets for breast cancer prognosis assessment
Concatenate clinical data and microarray results
Mix of custom and R/Bioconductor programs
Automatic analysis and plot creation
Demo available at: http://ada.dist.unige.it:8080/enginframe/bioinf

Step 1: Prepare Components
Choose the pieces you already have
  – Existing R and Bioconductor analysis scripts
  – Existing CLI tools with parameters
  – A bit of directory structure on the filesystem
  – The Bash (or similar) script you use to submit code
Nothing is “automagic”, but the probability that you will be able to recycle existing work is really high
  – If not, we're talking about ~50 lines of bash script!

Step 2: The EF Service Definition File…

Step 3: …and the corresponding Web GUI
Just a custom background!
Submission form

Step 4: Monitor Execution

Step 5: View Results!

The End
Thanks for your attention!