
Building Science Gateways with EnginFrame
Life Science example
Maurizio Melato
e-mail: maurizio@nice-software.com
At the beginning…

At the beginning was the command line…
…but the complexity handled by users arose and arose

Users
Scripting
CLI
Compute-/Data-Grid Middlewares
Distributed and heterogeneous Data Sources
Distributed and heterogeneous Computing Resources
(Grid/Compute/Visualization Farm)

At first glance, simple tools and technology, light…

Word cloud on the slide: NFS, Aliases, FTP, Scripts, Repository, DOE, Library, Versioning, CRASH!, Linux, IP Protection, LSF, Disk quota, Windows, Queue, Convert, Resource, Password, FlexLM, Restart, Teamwork, Execution host, Working directory
The Web (r)evolution…
Web interface to the Grid: Grid Portals

Users
Grid Portal
Scripting
CLI
Compute-/Data-Grid Middlewares
Distributed and heterogeneous Data Sources
Distributed and heterogeneous Computing Resources
(Grid/Compute/Visualization Farm)

At first glance, the all-purpose, every-day, do-everything solution
– Portals, as glue technology, integrate services, tools and applications
– Users may have various levels of customization of both layout and contents
– They are *general* purpose: any specific need still has to be addressed and developed
The Science Gateway perspective…

A community-developed set of tools, applications, and data that is integrated via a portal or a suite of applications

Users
Science Gateway
Portal
Scripting
CLI
Applications & Tools
Compute-/Data-Grid Middlewares
Distributed and heterogeneous Data Sources
Other Community-specific Data Sources
Distributed and heterogeneous Computing Resources
(Grid/Compute/Visualization Farm)

– SGs are specializations of Portals for specific scientific communities
– An SG is customized to meet the needs of the targeted community
– An SG provides a common interface configured for optimal use
– An SG allows researchers to focus on their research and fosters collaboration
The Science Gateway perspective…


Gateways are independent projects, each of which has its
own guidelines, requirements and constraints.
But they have similar technological challenges:
– Compute-/Data-Grid integration
– Authentication/Authorization
– Collaboration mechanisms
– Tools & Application integration
– …
Does the wheel need to be reinvented every time?
→ Need for Science Gateway Framework technology
Science Gateway Capabilities
Depend on the needs of the specific community
– Authentication and Authorization
– Job Execution Services
– Domain-Specific Computational Applications
– Resource Discovery
– Access to Data Collections
– Data Movement Tools
– Visualization Hardware and Software
– Workflows
SG: Authentication and Authorization

Satisfy the authentication and authorization security constraints of the community
– Integrating with the target authentication technology
– Providing the proper authorization mechanism

Configurable authentication mechanisms
– NIS
– PAM
– LDAP
– Windows Active Directory
– MyProxy
– X.509 certificates
– Kerberos 5 (Krb5)

Built-in authorization system with extension points
– e.g. custom inheritance of group definitions
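
A minimal sketch of the two ideas on this slide: a configurable authentication mechanism chosen by name, and an authorization check that honours inherited group definitions. This is not EnginFrame code. The registry, the `demo` back-end, the group names and every function name are assumptions made for illustration; a real back-end would call PAM, LDAP and so on.

```python
from typing import Callable, Dict, Set

# Hypothetical registry of authentication back-ends, keyed by mechanism name.
# A real back-end (PAM, LDAP, ...) would call the actual service; a single
# in-memory stand-in keeps this sketch self-contained.
Authenticator = Callable[[str, str], bool]

_DEMO_USERS = {"alice": "secret"}  # illustrative data only


def demo_authenticator(user: str, password: str) -> bool:
    """Stand-in for a real mechanism: checks an in-memory table."""
    return _DEMO_USERS.get(user) == password


AUTHENTICATORS: Dict[str, Authenticator] = {"demo": demo_authenticator}


def authenticate(mechanism: str, user: str, password: str) -> bool:
    """Dispatch to whichever mechanism the portal is configured to use."""
    try:
        backend = AUTHENTICATORS[mechanism]
    except KeyError:
        raise ValueError(f"unknown authentication mechanism: {mechanism}")
    return backend(user, password)


# Group definitions with inheritance: each group lists its parent groups.
# Group names are invented for the example.
GROUP_PARENTS: Dict[str, Set[str]] = {
    "bioinf-admins": {"bioinf-users"},
    "bioinf-users": {"everyone"},
    "everyone": set(),
}


def effective_groups(direct: Set[str]) -> Set[str]:
    """Expand direct memberships with every inherited (ancestor) group."""
    seen: Set[str] = set()
    stack = list(direct)
    while stack:
        group = stack.pop()
        if group not in seen:
            seen.add(group)
            stack.extend(GROUP_PARENTS.get(group, set()))
    return seen


def is_authorized(direct: Set[str], allowed: Set[str]) -> bool:
    """A user may use a service if any effective group is allowed."""
    return bool(effective_groups(direct) & allowed)
```

With these definitions, a member of `bioinf-admins` is authorized for a service restricted to `bioinf-users`, because the membership is inherited.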
SG: Job Execution Services



Preparation, submission, monitoring and result retrieval
Born as an abstraction layer and interface over the underlying job scheduler
Supports many job schedulers
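
The abstraction-layer idea can be sketched as one scheduler-independent job description translated by interchangeable back-ends. This is an illustration under assumptions, not EnginFrame's implementation: `JobSpec`, `Scheduler`, `LsfScheduler` and `LocalScheduler` are hypothetical names, and the queue name `normal` is only an example. The LSF back-end uses the standard `bsub` options `-J` (job name), `-q` (queue) and `-o` (output file); the commands are built, not executed.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class JobSpec:
    """Scheduler-independent description of a job (hypothetical fields)."""
    name: str
    command: List[str]
    queue: str = "normal"          # example queue name, an assumption
    output_file: str = "job.out"


class Scheduler(ABC):
    """Common interface that every scheduler back-end implements."""

    @abstractmethod
    def submit_command(self, job: JobSpec) -> List[str]:
        """Return the command line that would submit the job."""


class LsfScheduler(Scheduler):
    def submit_command(self, job: JobSpec) -> List[str]:
        # bsub options: -J job name, -q queue, -o output file
        return ["bsub", "-J", job.name, "-q", job.queue,
                "-o", job.output_file, *job.command]


class LocalScheduler(Scheduler):
    """Hypothetical fallback that runs the command directly, without queues."""

    def submit_command(self, job: JobSpec) -> List[str]:
        return list(job.command)
```

The portal code only ever sees `Scheduler.submit_command`, so adding another job scheduler means adding one more subclass.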
SG: Domain-Specific Computational Applications


Provide high-level vertical services
“Computing Portal” was initially adopted by industrial “communities”
– Automotive
– Manufacturing
– Electronics
– Oil & Gas
– Telecommunication
– Life Sciences
…and Research Institutions
– INFN - National Institute of Nuclear Physics
– CILEA – Lombard Inter-university Consortium for Automatic Computation
– CERN
A growing number of customers…
Energy & Utilities: Addax Petroleum, AECL, Amerada Hess, British Gas, CC of Water Resources, Chevron, Conoco-Phillips, DSC-Libya, ENI/Agip, GazPromNeft, Marathon Oil, Nexen, Rosneft, Schlumberger, Sibneft, Sinopec, Slavneft, Sonatrach, Statoil, Talisman Energy, Telecom Italia, TNK-BP, TNNC, TOTAL, TyumenNIIGaz, VNIIGaz, Xinjiang Oil
Life Sciences: LitBio project, DEISA project, Biolab, Swiss Institute for Bioinformatics, Partners Healthcare, M.D. Anderson Cancer Center
High Tech: STMicroelectronics, Accent, Samsung SDI, SensorDynamics, Motorola
Aerospace & Manufacturing: AIRBUS, Air Products and Chemicals, Procter&Gamble, Galileo Avionica, Hamilton Sundstrand, Kimberly Clark, Magellan Aerospace, MTU, Northrop Grumman, P&W, Raytheon, Simpson Strong-Tie
Automotive & Industrial Equipment: Audi, ARRK, Bridgestone, Bosch, Corus Automotive, Delphi, Elasis/CRF, Ferrari, Brawn GP, Jaguar-LandRover, Lear, Magneti Marelli, McLaren, P+Z, PSA, RedBull Engineering, Swagelok, Suzuki, Toyota, TRW, Volkswagen
Research & Education: ASSC, CCLRC, CERN, CILEA, CINECA, CNR, CNRS/IN2P3, ENEA, FzU, ICI, IFAE, INFN, ITEP, Harvard Business School, SSC-Russia, SDSC, Ferrara Uni, ITU, T.U.Dresden, Trinity College Dublin, Huazhong Normal Uni, Yale University
Which applications are used in EnginFrame?
EnginFrame snapshots & Technology Overview

Services are XML descriptions defining
– Input parameters
– The action to accomplish (Unix/Windows script, Java, …)
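
To illustrate the idea of a service described in XML by its input parameters and its action, here is a sketch that parses such a description and derives the command to run. The XML below is a hypothetical, simplified format invented for this example; it is NOT the EnginFrame schema. The element names, attributes, the `run_survival.sh` script and the Python helpers are all assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, simplified service description. This is NOT the EnginFrame
# schema; element and attribute names are invented for illustration.
SERVICE_XML = """
<service id="survival-analysis">
  <name>Survival Analysis</name>
  <option id="clinical" type="file" label="Clinical data"/>
  <option id="threshold" type="text" label="P-value threshold"/>
  <action script="run_survival.sh"/>
</service>
"""


def parse_service(xml_text: str) -> Dict[str, object]:
    """Extract the input parameters and the action from the description."""
    root = ET.fromstring(xml_text)
    options = [dict(opt.attrib) for opt in root.findall("option")]
    return {
        "id": root.get("id"),
        "name": root.findtext("name"),
        "options": options,
        "script": root.find("action").get("script"),
    }


def build_command(service: Dict[str, object], values: Dict[str, str]) -> List[str]:
    """Pass each declared input to the action script, in declaration order."""
    missing = [o["id"] for o in service["options"] if o["id"] not in values]
    if missing:
        raise ValueError(f"missing input parameters: {missing}")
    return [service["script"]] + [values[o["id"]] for o in service["options"]]
```

The same parsed description could drive both the submission form (one field per `option`) and the execution step (the `action`).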
EnginFrame Customizable Job Submission
– User-friendly, application-oriented job submission
– Flexible and efficient input file management
– Ties in with dynamic enterprise data such as databases
– Interactive job submission
– Hides the complexity of the underlying scheduler
Monitoring & control
– Global job monitoring
– Cluster & host monitoring
– Job details & control
Output management
– Data lifecycle management
– Comprehensive output file manipulation (view, edit, delete, zip, …)
– Follow-up actions support
– RESUBMIT jobs: rapidly edit input files and re-submit with the same parameters/settings
SG: Resource Discovery





The ability to dynamically discover resources and available services
– Build an indexed collection of the resources
– Newly defined services are dynamically published according to authorization settings
– EF relies on the underlying Grid middlewares to query the availability of new hardware or software resources
– In the A-WARE EU project, custom functionality was developed for the dynamic discovery of third-party services
SG: Access to Data Collections


The ability to access, query and retrieve data collections
and their metadata
EF plugins provide integration with
– gLite Storage and AMGA metadata system
– SRB / iRODS datagrid middlewares

Functionalities
– Browse data collections
– Search metadata
– Integrated file-system view
– Read and search various audit data
– Seamless authentication and user mapping
– Define and run rules
SG: Data Movement Tools




The capability to provision the required data to a specific location, considering network, performance and caching concerns
– Browsing of local or remote Grid filesystems can be transparent to users
– Specific services can move data according to users' needs
– No analysis is currently performed on performance or network latency concerns
EF Data Management
– Flexible and efficient input file management
– Data lifecycle management
– View or stream output files
SG: Workflows


The possibility to design and run workflows (aka “virtual experiments”) made up of basic tasks with inter-dependencies
Workflow technologies integrated
– Taverna, EF used as a third-party web service provider
– Moteur, batch Taverna workflow enactor

EU Project A-WARE aimed to develop a Grid workflow system
– UNICORE Grid middleware
– BPMN/BPEL
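
The notion of a workflow as basic tasks with inter-dependencies can be sketched as a dependency graph executed in topological order. This is a generic illustration using Python's standard `graphlib` module (Python 3.9+). It is not how Taverna, Moteur or A-WARE enact workflows, and the step names in `SURVIVAL_DEPENDENCIES` are assumptions loosely modelled on the life science example.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical workflow: each step maps to the set of steps it depends on.
SURVIVAL_DEPENDENCIES: Dict[str, Set[str]] = {
    "load_clinical": set(),
    "load_microarray": set(),
    "merge": {"load_clinical", "load_microarray"},
    "analyse": {"merge"},
    "plot": {"analyse"},
}


def run_workflow(
    tasks: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Set[str]],
) -> List[str]:
    """Run every task after all of its dependencies; return the order used.

    graphlib raises CycleError if the dependencies contain a cycle.
    """
    graph = {name: depends_on.get(name, set()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        tasks[name]()
    return order
```

Independent steps (here the two `load_*` steps) have no guaranteed relative order; a real enactor could run them in parallel.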
EF and Workflows: EF + Moteur

EF and Workflows: EF in A-WARE
SG: Visualization Software


Provide high-end visualization tools to visualize, work and
collaborate with complex / 3D interactive applications
EF Remote visualization integrates
– RealVNC
– TurboVNC and VirtualGL
– NoMachine NX

3D Optimization technologies
– IBM Deep Computing Visualization (DCV)
– HP Remote Graphics Software (RGS)
– Sun OpenGL


Session Management from the Web
Collaboration capabilities via session sharing
EF + Visualization: IBM DCV
EF + Visualization: Seamless Interactive Application Integration
Portal case study: Remote 3D visualization
– See demo online!
– Collaborate
– Application isolation (users do not need access to the command line)
Life Science Application Example





How many steps do you need to build and run your own application in the EnginFrame portal?
How much development effort will it take?
Going practical: here are the steps an EF developer should follow to build and expose their own application
The use case is a Survival Analysis service
The service performs an analysis on data from different domains and with different tools
Step 0: Use case analysis




– Analyse large microarray datasets for breast cancer prognosis assessment
– Concatenate clinical data and microarray results
– Mix of custom and R/Bioconductor programs
– Automatic analysis and plot creation
Demo available at: http://ada.dist.unige.it:8080/enginframe/bioinf
Step 1: Prepare Components

Choose the pieces you already have
– Existing R and Bioconductor analysis scripts
– Existing CLI tools with parameters
– A bit of directory structure on the filesystem
– The Bash (or similar) script you already use to submit code
Nothing is “automagic”, but the probability that you will be able to recycle existing work is really high
– If not, we're talking about ~50 lines of bash script!
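
As a rough idea of what such a short submission wrapper does, the sketch below creates a per-job working directory, stages the input files and builds the command that runs an existing R script. It is written in Python rather than bash and is only an assumption about a typical wrapper, not the script used in the demo. The file names (`survival.R`, `clinical.csv`, `microarray.csv`), the `results` directory and the argument order are hypothetical. `Rscript <script> <args…>` is the standard way to run an R script non-interactively.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def prepare_job(spool_root: Path, job_id: str,
                inputs: List[Path], r_script: Path) -> List[str]:
    """Create a per-job working directory, stage inputs, return the command."""
    workdir = spool_root / job_id
    # Fail loudly if the job directory already exists (job ids must be unique).
    workdir.mkdir(parents=True, exist_ok=False)
    staged = []
    for src in inputs:
        dst = workdir / src.name
        shutil.copy2(src, dst)
        staged.append(str(dst))
    # Hypothetical argument order: staged inputs first, results directory last.
    return ["Rscript", str(r_script), *staged, str(workdir / "results")]


if __name__ == "__main__":
    # Demo with throw-away files; nothing is actually submitted.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        clinical = base / "clinical.csv"
        clinical.write_text("id,status\n")
        microarray = base / "microarray.csv"
        microarray.write_text("id,expression\n")
        print(prepare_job(base / "spool", "job-001",
                          [clinical, microarray], Path("survival.R")))
```

The returned command would then be handed to the job scheduler instead of being run directly.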
Step 2: The EF Service Definition File…
Step 3: …and the corresponding Web GUI
Just a custom background!
Submission form
Step 4: Monitor Execution
Step 5: View Results!
The End
Thanks for your attention!