Interoperability Achieved by GADU in Using Multiple Grids: OSG, TeraGrid and ANL Jazz

Mathematics and Computer Science Division, Argonne National Laboratory
Computational Institute, University of Chicago
Presented by: Dinanath Sulakhe

GADU Applications: It's All About Comparative Analysis

• Insights in biology are gained by comparative analysis:
  • Unknown genes are compared against known genes. Similar genes tend to perform the same functions.
  • Comparative analysis shows what is the same and what is different between two strains of an organism.
  • Example: what differs between an organism living at boiling temperatures, such as 108 deg Celsius, and one living in extreme freezing conditions?
  • Difference between pathogenic and non-pathogenic organisms: Mycobacterium tuberculosis, the pathogen that causes TB, differs by only 12 genes from the non-pathogenic BCG used as a vaccine against TB.
• Tools: BLAST, Blocks, Chisel, Interpro, etc.
• An embarrassingly parallel workload.

GADU's Evolution

GADU just evolved into what it is today:
• Chiba City at Argonne.
• Jazz Cluster at Argonne.
• Grid2003 to OSG.
• TeraGrid.
• All of them together.

Status: Some Results and Highlights

• GADU can successfully use OSG and TeraGrid resources simultaneously.
• Individual clusters such as ANL Jazz are also used in parallel.
• Site selection and scheduling across multiple grids.
• Easily add a new site into the pool of sites.
• Last run (last week): ran 38830 BLAST jobs, 70% on OSG and 30% on TeraGrid.

Site Name        Site Test      MaxNodes  Gridcat
ASGC_OSG         18             199       Pass
FNAL_FERMIGRID   12             12        Pass
FNAL_GPFARM      266            749       Pass
GRASE-CCR-U2     114            2112      Pass
Nebraska         FAIL_TIMEOUT   252       Pass
OSG_LIGO_PSU     28             312       Pass
Purdue-ITaP      13             1224      Pass
Purdue-Physics   14             63        Pass
STAR-BNL         FAIL_TIMEOUT   672       Pass
UFlorida-PG      279            268       Pass
UMATLAS          FAIL_TIMEOUT   771       Pass
UTA_DPCC         18             154       Inactive
UWMadisonCMS     FAIL_TIMEOUT   90        Pass
grow-UNI-P       FAIL_TIMEOUT   17        Pass
TG_UC            44             316       NONE
TG_NCSA          55             1000      NONE
TG_PURDUE        FAIL_FTP       1024      NONE

Grid Resources: Open Science Grid and TeraGrid

Authentication
OSG
• OSG: GADU VOMS server.
• DOE Grid certificates are automatically picked up by the sites.
TeraGrid
• Individual accounts via allocations.
• DOE Grid certificates are added manually to each site (gx-map).

Application Deployment
OSG
• The OSG variables $OSG_APP and $OSG_DATA are used to install GADU's applications and pre-stage databases such as NR.
TeraGrid
• GADU has a community space available on each of the sites.
• Applications are installed within this community space.

Resource Independent GADU

GADU uses Pegasus-based VDS and Condor-G.
• Abstract workflow as VDL.
• GADU's automated Analysis Server: expressing, executing and tracking the scientific workflows on the Grid.

[Figure: GADU architecture. Components shown: tc.data, pool.config, Pegasus, Condor submit files, Information Services, DAGMan, Condor-G, submit host, query interface, database, Globus GRAM interface, controller, and remote resources, each with a gatekeeper, job manager, job management system and worker nodes (WN).]

The Workflow Generator in GADU is responsible for producing a workflow suitable for execution in the Grid environment. This task is accomplished through the use of the "virtual data language" (VDL). Once the VDL for the workflow is written, VDS converts it into Condor submit files and a DAG that can be submitted to the site selected by the site selector.
VDL for BLAST workflow:

    TR FileBreaker( input filename, none nodes, output sequences[], none species ) {
        argument = ${species};
        argument = ${filename};
        argument = ${nodes};
        profile globus.maxwalltime = "300";
    }

    TR BLAST( none OutPre, none evalue, input query[], none type ) {
        argument = ${OutPre};
        argument = ${evalue};
        profile globus.maxwalltime = "300";
    }

    DV jobNo_1_1separator->FileBreaker(
        filename=@{input:"inputfile.1"|rt},
        nodes="5",
        sequences=[ @{output:"job1.0":"tmp"},
                    @{output:"job1.1":"tmp"},
                    @{output:"job1.2":"tmp"},
                    @{output:"job1.3":"tmp"},
                    @{output:"job1.4":"tmp"} ],
        species="Aeropyrum_Pernix"
    )
    ….

[Figure: Example of a DAG representing the workflow. Labels: "4 million sequences", "1000 sequences".]

Representing a Site and the Applications on It

pool.config:

    pool ANL_Jazz {
        lrc "rls://gnare.mcs.anl.gov"
        gridftp "gsiftp://jmayor1.lcrc.anl.gov:2812/soft/apps/gadu"
        gridlaunch "/soft/apps/gadu/bin/kickstart"
        workdir "/soft/apps/gadu/vdldata"
        universe vanilla "jmayor1.lcrc.anl.gov:2121/jobmanager-pbs"
        universe globus "jmayor1.lcrc.anl.gov:2121/jobmanager-pbs"
        universe transfer "jmayor1.lcrc.anl.gov:2812/jobmanager-fork"
    }
    ….

tc.data:

    #SITE     Transformation   PFN                                                    TYPE
    ANL_Jazz  BLAST            /soft/apps/BLAST/bin/blastall                          null
    ANL_Jazz  Blocks           /soft/apps/run-Blocks.pl                               null
    ANL_Jazz  Chisel           /soft/apps/chisel/runChisel.pl                         null
    ANL_Jazz  IPRSCAN          /soft/apps/iprscan_wrapper.pl                          null
    ANL_Jazz  globus-url-copy  /soft/apps/packages/globus-2.2.4/bin/globus-url-copy   GLOBUS_LOCATION=/soft/apps/packages/globus-2.2.4/; LD_LIBRARY_PATH=/soft/apps/packages/globus-2.2.4/lib; PATH=/soft/apps/packages/globus-2.2.4/bin
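As an illustration of how a site-independent tool might consume the tc.data catalog shown above, here is a minimal Python sketch. It is not VDS code: the function name parse_tc, the Entry type, and the reading of the fourth column (either "null" or a ";"-separated list of environment assignments) are assumptions made for this example, based only on the sample rows in the slide.

```python
from typing import Dict, NamedTuple, Tuple


class Entry(NamedTuple):
    """One catalog row: where an application lives and the environment it needs."""
    path: str
    env: Dict[str, str]


# Sample rows copied from the tc.data slide (LD_LIBRARY_PATH omitted for brevity).
SAMPLE_TC = """\
#SITE Transformation PFN TYPE
ANL_Jazz BLAST /soft/apps/BLAST/bin/blastall null
ANL_Jazz Blocks /soft/apps/run-Blocks.pl null
ANL_Jazz globus-url-copy /soft/apps/packages/globus-2.2.4/bin/globus-url-copy GLOBUS_LOCATION=/soft/apps/packages/globus-2.2.4/; PATH=/soft/apps/packages/globus-2.2.4/bin
"""


def parse_tc(text: str) -> Dict[Tuple[str, str], Entry]:
    """Map (site, transformation) to its physical path and environment.

    Assumption: columns are whitespace-separated, and everything after the
    third column is either "null" or NAME=value pairs separated by ";".
    """
    catalog: Dict[Tuple[str, str], Entry] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # malformed row: ignore rather than guess
        site, transformation, path = parts[:3]
        rest = parts[3] if len(parts) > 3 else "null"
        env: Dict[str, str] = {}
        if rest != "null":
            for assignment in rest.split(";"):
                if "=" in assignment:
                    key, value = assignment.split("=", 1)
                    env[key.strip()] = value.strip()
        catalog[(site, transformation)] = Entry(path, env)
    return catalog


if __name__ == "__main__":
    catalog = parse_tc(SAMPLE_TC)
    print(catalog[("ANL_Jazz", "BLAST")].path)
```

Keying the catalog by (site, transformation) mirrors the idea of the slide: the workflow names only the transformation, and the site-specific path is looked up after a site has been chosen.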
Requirements: Information Services

• A VDS-like system can provide an architecture-independent mechanism for using different sites (Grids).
• To add a new Grid site automatically, we need information about the site, from Information Services at various levels:
  • Authentication: to check whether the certificates are valid at this site.
  • Architecture: is it an IA-32 cluster or an IA-64 cluster?
  • Gatekeeper and GridFTP server.
  • Environment variables: $OSG_APP, $TG_COMMUNITY.
  • Number of CPUs.
  • Number of used CPUs.
  • Number of idle CPUs.
  • VO (user) specific jobs running at a given site.
  • VO (user) specific jobs sitting in the queue at a given site (why?).
• We need standards and protocols for these Information Services, and we need to identify more information variables that should be published by the Grids.
• GridCat or MDS or something else?
• Currently GADU uses GridCat to collect site-specific information for OSG and manually adds information for TeraGrid and Jazz.
• We are working on an MDS-based information interface on TeraGrid.

Another Big Challenge: Site Selection

GADU has access to 60 OSG sites and 5 TeraGrid sites.

One challenge in using the Grid reliably for high-throughput analysis is monitoring the state of all Grid sites and how well they have performed for job requests from a given submit host.
We view a site as "available" if our submit host can communicate with it, if it is responding to Globus job-submission commands, and if it will run our jobs promptly, with minimal queuing delays.

[Figure: Site selection in GADU, with numbered steps 1 to 7. Clients shown: GADU Server, TeraGrid Blast/Blocks Server. Sites shown: GRID3, OSG, JAZZ, PDSF, UBuffalo, ANL, SDSC. The components in the figure are listed below.]

site_tester.pl
• Runs a test job for each site, in parallel, by forking.
• Each child process writes to the site status file.

Site status file:

    status | test-time* | site
    1      | 10         | jazz
    0      | FAIL       | pdsf
    #1     | 80         | sdsc-tg
    ….

    * time in secs.
    # manually forced not to be used.
    1 working site.
    0 site failed.

Site_selector.pl:

    get_all_working_sites
    foreach working_site {
        get_condor_q details.
        if (# of jobs in Q == 0) && (total # of jobs on the host < max_allowed)
            select the site.
    }

Output of condor_q -global -globus:

    ID   | .. | manager | ST | ..
    1    | .. | jazz    | R  | blast..
    1.1  | .. | jazz    | R  | blast..
    2    | .. | Ubuff   | Q  | blast..
    …..

Site info file:

    site | #max_nodes | nodes/batch | seqs/node
    jazz | 360        | 30          | 100
    pdsf | 500        | 40          | 100
    sdsc | 70         | 10          | 150
    …..

    Sequences/batch = nodes/batch x seqs/node

Request a site
• get_selected_site_details: get a site with its details.
• return (@site_and_details)

Web interface to control the selection of sites for GADU: http://compbio.mcs.anl.gov/sulakhe/cgi-bin/site_selection_new.pl?user=dina
Web interface showing live status of usage: http://compbio.mcs.anl.gov/gaduvo/gadu_jobs.cgi

Grid may not worry about this…

Next Steps

• Working with the TeraGrid Information Services group on an MDS-based interface.
• Continue to improve GADU's implementation of site selection.
• Trying to generalize site selection using Information Services such as MDS and GridCat.
• Continue to deploy faster scientific applications for the Bioinformatics Group at Argonne.
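The site-selection steps described earlier (site status file, Site_selector.pl loop, and the sequences-per-batch formula) can be sketched in Python as follows. This is a hedged illustration, not GADU's Perl implementation: the function names, the whitespace-separated status format, and the reading of "total # jobs on the host" as the job count on the submit host are assumptions made for this example. The sample values come from the slide.

```python
from typing import Dict, List

# Status lines as in the slide: status, test time (secs or FAIL), site name.
STATUS_SAMPLE = "1 10 jazz\n0 FAIL pdsf\n#1 80 sdsc-tg\n"


def working_sites(status_text: str) -> List[str]:
    """Return sites whose status is exactly "1" (working).

    "0" means the test job failed; a leading "#" means the site was
    manually forced out of use, so "#1" is rejected as well.
    """
    sites: List[str] = []
    for raw in status_text.splitlines():
        fields = raw.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        status, _test_time, site = fields
        if status == "1":
            sites.append(site)
    return sites


def select_sites(candidates: List[str],
                 queued_per_site: Dict[str, int],
                 total_jobs_on_host: int,
                 max_allowed: int) -> List[str]:
    """Apply the selector rule: no jobs waiting at the site, and the
    submit host (assumed meaning of "the host") is below its job limit."""
    if total_jobs_on_host >= max_allowed:
        return []
    return [s for s in candidates if queued_per_site.get(s, 0) == 0]


def sequences_per_batch(nodes_per_batch: int, seqs_per_node: int) -> int:
    """Sequences/batch = nodes/batch x seqs/node."""
    return nodes_per_batch * seqs_per_node


if __name__ == "__main__":
    sites = working_sites(STATUS_SAMPLE)
    chosen = select_sites(sites, {"jazz": 0}, total_jobs_on_host=5, max_allowed=10)
    print(chosen, sequences_per_batch(30, 100))
```

In a real deployment the queued_per_site counts would come from parsing condor_q output; here they are passed in directly so that the selection rule itself stays easy to test.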
Acknowledgements

Bioinformatics Group
• Natalia Maltsev, PI
• Alex Rodriguez
• Elizabeth Glass
• Mark D’Souza
• Mustafa Syed
• Yi Zhang

Globus and VDS
• Mike Wilde
• Nika Nefedova
• Jens Voeckler
• Ian Foster
• Rick Stevens
• VDT Support.
• Condor Support.
• Systems at MCS.

Open Science Grid
• Thanks to Ruth Pordes and the OSG team for their wonderful support.

TeraGrid
• Charlie Catlett
• Special thanks to David O’Neal, Joseph Insley, and Sergiu Sanielevici