SC-Workshop - OSG Document Database

Interoperability Achieved by GADU in Using Multiple Grids:
OSG, TeraGrid and ANL Jazz
Mathematics and Computer Science Division
Argonne National Laboratory
Computation Institute
University of Chicago
Presented by:
Dinanath Sulakhe
GADU Applications…
It's all about comparative analysis
Insights in biology are gained by comparative analysis:

Unknown genes are compared against known ones.
Similar genes tend to perform the same functions.
Comparative analysis shows what is the same and what is different between two strains of
an organism:

Example: What differs between an organism living at boiling temperatures, such as 108 deg
Celsius, and one living in extreme freezing conditions?
The difference between pathogenic and non-pathogenic organisms.

Mycobacterium tuberculosis, the pathogen that causes TB, differs by only 12 genes from
the non-pathogenic BCG used as a vaccine against TB.
Tools
BLAST, Blocks, Chisel, InterPro, etc.
An embarrassingly parallel workload.
GADU’s evolution
GADU gradually evolved into what it is today:
Chiba City at Argonne.
Jazz Cluster at Argonne.
Grid2003 to OSG.
TeraGrid.
All of them together.
Status
Some Results and Highlights
GADU can successfully use OSG and TeraGrid
resources simultaneously.
Individual clusters such as ANL Jazz are also used
in parallel.
Site Name        Site Test      MaxNodes   Gridcat
ASGC_OSG         18             199        Pass
FNAL_FERMIGRID   12             12         Pass
FNAL_GPFARM      266            749        Pass
GRASE-CCR-U2     114            2112       Pass
Nebraska         FAIL_TIMEOUT   252        Pass
OSG_LIGO_PSU     28             312        Pass
Purdue-ITaP      13             1224       Pass
Purdue-Physics   14             63         Pass
STAR-BNL         FAIL_TIMEOUT   672        Pass
UFlorida-PG      279            268        Pass
UMATLAS          FAIL_TIMEOUT   771        Pass
UTA_DPCC         18             154        Inactive
UWMadisonCMS     FAIL_TIMEOUT   90         Pass
grow-UNI-P       FAIL_TIMEOUT   17         Pass
TG_UC            44             316        NONE
TG_NCSA          55             1000       NONE
TG_PURDUE        FAIL_FTP       1024       NONE
Site selection and scheduling across multiple grids.
Easily add a new site into the pool of sites.
Last run (last week):
Ran 38830 BLAST jobs
70% OSG
30% TeraGrid
Grid Resources..
Open Science Grid and TeraGrid.
Authentication

OSG
GADU VOMS server.
DOE Grid certificates are automatically picked up by the sites.
TeraGrid
Individual accounts via allocations.
DOE Grid certificates are added manually at each site (gx-map).
Application Deployment

OSG
The OSG variables $OSG_APP and $OSG_DATA are used to install GADU’s
applications and pre-stage databases such as NR.
TeraGrid
GADU has a community space available on each of the sites.
Applications are installed within this community space.
Resource Independent GADU.
GADU uses Pegasus-based VDS and Condor-G.
GADU’s automated Analysis Server: expressing, executing and tracking the scientific workflows on the Grid.
[Architecture diagram. Labels: Abstract Workflow as VDL, tc.data, pool.config, Pegasus, Condor submit files,
Information Services, DAGMan, Condor-G, Submit Host, Query Interface, Database, Controller, Globus GRAM Interface,
Remote Resources (three sites, each with a Gatekeeper/JobManager, a job management system and worker nodes, WN).]
Resource Independent GADU.
GADU uses Pegasus-based VDS and Condor-G.
The Workflow Generator in GADU is responsible for producing a workflow suitable for
execution in the Grid environment. This task is accomplished through the use of
the “virtual data language” (VDL). Once the VDL for the workflow is written, VDS
converts it into Condor submit files and a DAG that can be submitted to the site
selected by the site selector.
TR FileBreaker( input filename, none nodes, output sequences[], none species )
{
  argument = ${species};
  argument = ${filename};
  argument = ${nodes};
  profile globus.maxwalltime = "300";
}
TR BLAST( none OutPre, none evalue, input query[], none type )
{
  argument = ${OutPre};
  argument = ${evalue};
  profile globus.maxwalltime = "300";
}
DV jobNo_1_1separator->FileBreaker
(
  filename=@{input:"inputfile.1"|rt},
  nodes="5",
  sequences=[ @{output:"job1.0":"tmp"},
              @{output:"job1.1":"tmp"},
              @{output:"job1.2":"tmp"},
              @{output:"job1.3":"tmp"},
              @{output:"job1.4":"tmp"} ],
  species="Aeropyrum_Pernix"
)
….
VDL for BLAST workflow
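The FileBreaker derivation follows a regular pattern: one output file per node, named job<N>.<i>. A minimal Python sketch of how a generator might render that text is given here. The function name render_filebreaker_dv is hypothetical, and the naming pattern is copied from the example on the slide, not taken from GADU's actual Workflow Generator.

```python
def render_filebreaker_dv(job_no, input_file, nodes, species):
    """Render a FileBreaker derivation in the style of the slide's VDL.

    Hypothetical helper: names such as jobNo_<n>_1separator and job<n>.<i>
    simply mirror the example; the real generator may differ.
    """
    outputs = ",\n".join(
        f'    @{{output:"job{job_no}.{i}":"tmp"}}' for i in range(nodes)
    )
    return (
        f"DV jobNo_{job_no}_1separator->FileBreaker\n"
        "(\n"
        f'  filename=@{{input:"{input_file}"|rt}},\n'
        f'  nodes="{nodes}",\n'
        f"  sequences=[\n{outputs} ],\n"
        f'  species="{species}"\n'
        ")"
    )


if __name__ == "__main__":
    print(render_filebreaker_dv(1, "inputfile.1", 5, "Aeropyrum_Pernix"))
```

Changing the nodes argument changes only the number of output entries, which is the property that lets one template serve sites of different sizes.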
Resource Independent GADU.
Fig. Example of a DAG representing the workflow.
[Figure labels: 4 million sequences; 1000 sequences.]
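The figure's labels suggest a large input being broken into fixed-size batches, much as FileBreaker does. A minimal Python sketch of that idea follows, assuming FASTA input and a batch size of 1000. The names read_fasta and batches are illustrative and are not GADU code.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def batches(records, batch_size=1000):
    """Group records into lists of at most batch_size items."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    demo = [">a", "ATGC", "ATGC", ">b", "GGCC", ">c", "TTAA"]
    for i, batch in enumerate(batches(read_fasta(demo), batch_size=2)):
        print(i, [name for name, _ in batch])
```

Under this assumption, 4 million sequences at 1000 per batch would give 4000 independent batches, each of which can become one node of the DAG.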
Resource Independent GADU.
Representing a site and the applications on it
pool ANL_Jazz
{
  lrc "rls://gnare.mcs.anl.gov"
  gridftp "gsiftp://jmayor1.lcrc.anl.gov:2812/soft/apps/gadu"
  gridlaunch "/soft/apps/gadu/bin/kickstart"
  workdir "/soft/apps/gadu/vdldata"
  universe vanilla "jmayor1.lcrc.anl.gov:2121/jobmanager-pbs"
  universe globus "jmayor1.lcrc.anl.gov:2121/jobmanager-pbs"
  universe transfer "jmayor1.lcrc.anl.gov:2812/jobmanager-fork"
}
….
pool.config
#SITE      Transformation    PFN                                                     TYPE
ANL_Jazz   BLAST             /soft/apps/BLAST/bin/blastall                           null
ANL_Jazz   Blocks            /soft/apps/run-Blocks.pl                                null
ANL_Jazz   Chisel            /soft/apps/chisel/runChisel.pl                          null
ANL_Jazz   IPRSCAN           /soft/apps/iprscan_wrapper.pl                           null
ANL_Jazz   globus-url-copy   /soft/apps/packages/globus-2.2.4/bin/globus-url-copy    GLOBUS_LOCATION=/soft/apps/packages/globus-2.2.4/;LD_LIBRARY_PATH=/soft/apps/packages/globus-2.2.4/lib;PATH=/soft/apps/packages/globus-2.2.4/bin
tc.data
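To make the role of tc.data concrete, here is a minimal Python sketch that reads entries of this shape into a lookup table keyed by site and transformation. It assumes four whitespace-separated columns with a semicolon-separated environment list (or null) in the last one; parse_tc_data is a hypothetical name, and the real VDS catalog reader may accept more than this.

```python
def parse_tc_data(text):
    """Parse tc.data-style lines into {(site, transformation): (pfn, env)}.

    Assumed layout (not the official VDS grammar): site, transformation,
    physical path, then either "null" or NAME=value pairs joined by ';'.
    """
    catalog = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and the "#SITE ..." header
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue  # not a complete entry
        site, transformation, pfn = fields[:3]
        extra = fields[3] if len(fields) == 4 else "null"
        env = {}
        if extra != "null":
            for pair in extra.split(";"):
                if "=" in pair:
                    key, value = pair.split("=", 1)
                    env[key.strip()] = value.strip()
        catalog[(site, transformation)] = (pfn, env)
    return catalog


SAMPLE = """\
#SITE Transformation PFN TYPE
ANL_Jazz BLAST /soft/apps/BLAST/bin/blastall null
ANL_Jazz globus-url-copy /soft/apps/packages/globus-2.2.4/bin/globus-url-copy GLOBUS_LOCATION=/soft/apps/packages/globus-2.2.4/;PATH=/soft/apps/packages/globus-2.2.4/bin
"""

if __name__ == "__main__":
    for key, value in parse_tc_data(SAMPLE).items():
        print(key, value)
```

A lookup such as parse_tc_data(SAMPLE)[("ANL_Jazz", "BLAST")] then yields the path to run at that site, which is how a logical transformation name stays independent of where the binary is installed.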
Requirements: Information Services
A VDS-like system can provide an architecture-independent
mechanism to use different sites (Grids).
To add a new Grid site automatically, we need information about the site from
Information Services at various levels:
Authentication – are the certificates valid at this site?
Architecture – is it an IA-32 or an IA-64 cluster?
Gatekeeper and GridFTP server.
Environment variables – $OSG_APP, $TG_COMMUNITY.
Number of CPUs.
Number of used CPUs.
Number of idle CPUs.
VO (user) specific jobs running at a given site.
VO (user) specific jobs sitting in the queue at a given site (why?).
We need standards and protocols for these Information Services, and we need to identify more
information variables that should be published by the Grids.
GridCat or MDS or something else?
Currently GADU uses GridCat to collect site-specific information for OSG and manually
adds information for TeraGrid and Jazz. We are working on an MDS-based information
interface on TeraGrid.
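The items listed on this slide can be gathered into one record per site. The following Python sketch shows one possible shape for such a record. The class name SiteInfo, its field names and the example host names are all hypothetical, and deriving idle CPUs from total minus used is an assumption, since a real information service might publish that figure directly.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SiteInfo:
    """One site's published information (illustrative field names)."""
    name: str
    certs_valid: bool              # authentication check passed at this site
    architecture: str              # e.g. "ia-32" or "ia-64"
    gatekeeper: str
    gridftp_server: str
    env: Dict[str, str] = field(default_factory=dict)  # e.g. OSG_APP
    total_cpus: int = 0
    used_cpus: int = 0
    vo_jobs_running: int = 0
    vo_jobs_queued: int = 0

    @property
    def idle_cpus(self) -> int:
        # Assumption: idle = total - used, clamped at zero.
        return max(self.total_cpus - self.used_cpus, 0)


if __name__ == "__main__":
    site = SiteInfo(
        name="EXAMPLE_SITE",                      # placeholder values
        certs_valid=True,
        architecture="ia-32",
        gatekeeper="gatekeeper.example.org",
        gridftp_server="gridftp.example.org",
        env={"OSG_APP": "/path/to/app"},
        total_cpus=100,
        used_cpus=60,
    )
    print(site.name, site.idle_cpus)
```

Keeping the record independent of its source means the same structure could be filled from GridCat, from MDS, or by hand.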
Another Big Challenge: Site Selection
GADU has access to 60 OSG sites and 5 TeraGrid sites.
One challenge in using the Grid reliably for high-throughput analysis is monitoring
the state of all Grid sites and how well they have performed for job requests from a
given submit host.
We view a site as “available” if our submit host can communicate with it, if it is
responding to Globus job-submission commands, and if it will run our jobs promptly,
with minimal queuing delays.
site_tester.pl
Runs a test job for each site, in parallel, by forking
(each child process writes to the site status file below).
Targets shown: GRID3, OSG, JAZZ, PDSF, UBuffalo
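A minimal Python sketch of the testing step follows. It is not site_tester.pl: it uses a thread pool instead of forked child processes, and the probe that would actually submit a test job is passed in as a function, because the real submission command is not shown on the slide. The names probe_site, probe_all_sites and format_status are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def probe_site(site, probe):
    """Run one probe; return (status, test_time_or_FAIL, site) as strings."""
    start = time.monotonic()
    try:
        ok = bool(probe(site))
    except Exception:
        ok = False  # any error counts as a failed site
    elapsed = int(time.monotonic() - start)
    return ("1", str(elapsed), site) if ok else ("0", "FAIL", site)


def probe_all_sites(sites, probe, max_workers=8):
    """Probe every site concurrently, preserving the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda s: probe_site(s, probe), sites))


def format_status(rows):
    """Render rows as 'status | test-time | site' lines."""
    return "\n".join(" | ".join(row) for row in rows)


if __name__ == "__main__":
    # Stand-in probe: pretend every site except "pdsf" answers.
    fake_probe = lambda site: site != "pdsf"
    print(format_status(probe_all_sites(["jazz", "pdsf"], fake_probe)))
```

Running the probes concurrently matters because a single unresponsive site would otherwise delay the status of every other site by its full timeout.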
Site Status File:
status | test-time* | site
1      | 10         | jazz
0      | FAIL       | pdsf
#1     | 80         | sdsc-tg
….
* - time in secs.
# - manually forced not to be used
1 - working site.
0 - site failed
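The legend above translates directly into a small parser. The Python sketch below assumes the file is stored with "|" separators exactly as the slide's header suggests; the on-disk format used by GADU is not shown, so parse_status_file and working_sites are illustrative names only.

```python
def parse_status_file(text):
    """Return a list of (site, usable, test_time_seconds_or_None).

    A site is usable when its status is 1 and it is not prefixed with '#'
    (the slide's marker for 'manually forced not to be used').
    """
    sites = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3 or parts[0] == "status":
            continue  # skip the header and anything malformed
        status, test_time, site = parts
        forced_off = status.startswith("#")
        working = status.lstrip("#") == "1"
        seconds = int(test_time) if test_time.isdigit() else None
        sites.append((site, working and not forced_off, seconds))
    return sites


def working_sites(text):
    """Names of sites that are both working and not manually disabled."""
    return [site for site, usable, _ in parse_status_file(text) if usable]


STATUS = """\
status | test-time | site
1 | 10 | jazz
0 | FAIL | pdsf
#1 | 80 | sdsc-tg
"""

if __name__ == "__main__":
    print(working_sites(STATUS))
```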
Site_selector.pl
get_all_working_sites
foreach working_site
{
  get condor_q details
  if ( # of jobs in queue == 0
       &&
       total # of jobs on the host < max_allowed )
    select the site
}
Sites shown: ANL ….. SDSC …..
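The selection rule in the pseudocode can be written out as a short Python sketch. Two readings are assumed here and are not confirmed by the slide: "jobs in queue" means jobs in state Q for that particular site, and "jobs on the host" means all jobs known to the submit host. The queue argument is a list of tuples standing in for parsed condor_q output, and select_site is a hypothetical name.

```python
def select_site(working_sites, queue, max_allowed):
    """Pick the first working site with no queued jobs, if the host has room.

    queue: list of (job_id, site, state) with state "R" (running) or
    "Q" (queued); an assumed stand-in for parsed condor_q output.
    Returns the chosen site name, or None if nothing qualifies.
    """
    if len(queue) >= max_allowed:
        return None  # submit host is already at its job limit
    for site in working_sites:
        queued = sum(1 for _, s, state in queue if s == site and state == "Q")
        if queued == 0:
            return site
    return None


if __name__ == "__main__":
    queue = [("1", "jazz", "R"), ("1.1", "jazz", "R"), ("2", "Ubuff", "Q")]
    print(select_site(["Ubuff", "jazz"], queue, max_allowed=10))
```

The idea behind the zero-queue test is that a site still holding queued jobs has not yet absorbed the previous batch, so sending more work there would only lengthen the wait.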
condor_q -global -globus
ID  | .. .. | manager | ST | ..
1   |       | jazz    | R  | blast..
1.1 |       | jazz    | R  | blast..
2   |       | Ubuff   | Q  | blast..
…..
Site Info File:
site | #max_nodes | nodes/batch | seqs/node
jazz | 360        | 30          | 100
pdsf | 500        | 40          | 100
sdsc | 70         | 10          | 150
…..
Sequences/batch = nodes/batch x seqs/node
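The batch-size formula can be checked with a few lines of Python. The per-site values below are the slide's numbers read column by column (max nodes, then nodes per batch, then sequences per node), which is an assumption about how the flattened table should be read; SITE_INFO and sequences_per_batch are illustrative names.

```python
# site: (max_nodes, nodes_per_batch, seqs_per_node)
# Values as read column-wise from the slide; the row assignment is assumed.
SITE_INFO = {
    "jazz": (360, 30, 100),
    "pdsf": (500, 40, 100),
    "sdsc": (70, 10, 150),
}


def sequences_per_batch(site):
    """Sequences/batch = nodes/batch x seqs/node for the given site."""
    _, nodes_per_batch, seqs_per_node = SITE_INFO[site]
    return nodes_per_batch * seqs_per_node


if __name__ == "__main__":
    for name in SITE_INFO:
        print(name, sequences_per_batch(name))
```

Under that reading, one batch carries 3000 sequences on jazz, 4000 on pdsf and 1500 on sdsc.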
Request a site: get_selected_site_details
Get site with details: return (@site_and_details)
[Diagram labels: GADU Server, Blast/Blocks Server, TeraGrid.]
Another Big Challenge: Site Selection
GADU has access to 60 OSG Sites and 5 TeraGrid Sites.
Web Interface to Control the Selection of Sites for GADU:
http://compbio.mcs.anl.gov/sulakhe/cgi-bin/site_selection_new.pl?user=dina
Web Interface showing live status of usage:
http://compbio.mcs.anl.gov/gaduvo/gadu_jobs.cgi
Grid may not worry about this…
Next Steps
• Working with the TeraGrid Information Services group on an MDS-based
interface.
• Continue to improve GADU’s implementation of site selection.
• Trying to generalize site selection using Information Services
such as MDS and GridCat.
• Continue to deploy faster scientific applications for the
Bioinformatics Group at Argonne.
Acknowledgements
Bioinformatics Group:
Natalia Maltsev, PI
• Alex Rodriguez
• Elizabeth Glass
• Mark D’Souza
• Mustafa Syed
• Yi Zhang
Globus and VDS
• Mike Wilde
• Nika Nefedova
• Jens Voeckler
• Ian Foster
• Rick Stevens
• VDT Support.
• Condor Support.
• Systems at MCS.
Open Science Grid
• Thanks to Ruth Pordes and OSG team for their wonderful support
TeraGrid
• Charlie Catlett
• Special thanks to David O’Neal, Joseph Insley, and Sergiu Sanielevici