Experience Story 9:
Starting a bio-computation facility in Glasgow
Micha Bayer
National e-Science Centre
Glasgow Hub
Overview
• Use of Condor at NeSC Glasgow is a very recent activity
• Condor used in two projects:
– BRIDGES (Biomedical Research Informatics
Delivered by Grid Enabled Services) – this talk
– ETF Condor testbed (covered in John Kewley’s talk
on Monday)
Condor use in BRIDGES
• BRIDGES project uses grid technology to
support research into the genetics of
cardiovascular disease/hypertension
• Includes compute intensive tasks such as
sequence comparison
• Parallelised BLAST module is part of BRIDGES
toolkit
BLAST
• Very widely used sequence comparison tool
• Takes one or more query sequences as input and
compares them to a set of target sequences
• Compute intensive – target data can be large
• Very easily parallelised – our version partitions the input into subjobs (see the sketch below)
• One BRIDGES use case involves blasting microarray chip reporter sequences against the human genome: 20,000 input sequences vs 800 MB of target data
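The actual splitting code in the BRIDGES toolkit is not reproduced in this talk; the sketch below only illustrates the idea, partitioning a multi-FASTA query file into a given number of subjob files. The class and file names (FastaSplitter, subjob_N.fasta) are illustrative, not BRIDGES identifiers.

import java.io.*;
import java.util.*;

// Illustrative sketch only: partition a multi-FASTA query file into
// numSubjobs smaller files, one per available compute node.
public class FastaSplitter {

    public static List<File> splitFasta(File input, int numSubjobs, File outDir)
            throws IOException {
        // Read all query sequences (each record starts with a '>' header line).
        List<String> records = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader(input));
        StringBuilder current = null;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith(">")) {
                if (current != null) records.add(current.toString());
                current = new StringBuilder();
            }
            if (current != null) current.append(line).append("\n");
        }
        if (current != null) records.add(current.toString());
        in.close();

        // Deal the records round-robin into numSubjobs output files.
        List<File> subjobFiles = new ArrayList<File>();
        PrintWriter[] writers = new PrintWriter[numSubjobs];
        for (int i = 0; i < numSubjobs; i++) {
            File f = new File(outDir, "subjob_" + i + ".fasta");
            subjobFiles.add(f);
            writers[i] = new PrintWriter(new FileWriter(f));
        }
        for (int i = 0; i < records.size(); i++) {
            writers[i % numSubjobs].print(records.get(i));
        }
        for (PrintWriter w : writers) w.close();
        return subjobFiles;
    }
}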
BRIDGES BLAST Job Submission
[Architecture diagram: the end-user PC runs the GridBLAST client, which sends a job request to, and receives results from, the NeSC GT3 front end to ScotGRID (a GT3 grid service). The BRIDGES Scheduler sits behind this service and submits subjobs via a PBS client side + Java wrapper to the ScotGRID master node (PBS server side + BLAST), from where jobs are farmed out to the ScotGRID worker nodes, and via a Condor schedd + Java wrapper to the NeSC Condor pool (Condor Central Manager, Condor + BLAST).]
• Establish the number of available nodes on each resource
• Split the input file into as many subjobs as the total number of nodes available
• Send n subjobs to each resource, where n is the number of available nodes there (see the dispatch sketch below)
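The scheduler code itself is not shown in this talk; the following sketch only illustrates the dispatch logic in the three bullets above, reusing the FastaSplitter sketch from the BLAST slide. The Resource interface and its availableNodes/submit methods are hypothetical stand-ins for the PBS and Condor Java wrappers.

import java.io.File;
import java.util.List;

// Hypothetical sketch of the allocation step: one subjob per available node,
// with each resource receiving as many subjobs as it has free nodes.
public class AllocationSketch {

    // Stand-in for the PBS/Condor Java wrappers in the diagram above.
    public interface Resource {
        int availableNodes();          // how many nodes are free right now
        void submit(File subjobFile);  // submit one BLAST subjob
    }

    public static void dispatch(File queryFile, List<Resource> resources, File workDir)
            throws Exception {
        // 1. Establish the number of available nodes on each resource.
        int totalNodes = 0;
        for (Resource r : resources) totalNodes += r.availableNodes();

        // 2. Split the input into as many subjobs as there are nodes in total.
        List<File> subjobs = FastaSplitter.splitFasta(queryFile, totalNodes, workDir);

        // 3. Send n subjobs to each resource, where n is its number of free nodes.
        int next = 0;
        for (Resource r : resources) {
            int n = r.availableNodes();
            for (int i = 0; i < n && next < subjobs.size(); i++) {
                r.submit(subjobs.get(next++));
            }
        }
    }
}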
Data file issues
• BLAST involves input files and target data files
• Target files are large (several GB) and growing exponentially in size – need to be kept close to the computation
• Currently have the target data sitting on execute nodes –
NFS soon to be installed
• Input and output files are staged in/out from the submit server with each subjob, using Condor file transfer (see the submit file sketch below)
• Scheduler combines the results and the service returns them in a SOAP message
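As an illustration of the staging step, a Condor submit description for a single subjob on the NeSC pool could look roughly like this. The executable path, file names and legacy blastall arguments are assumptions for the sketch, not taken from BRIDGES; only the query file is transferred, since the target database already sits on the execute nodes.

universe                = vanilla
executable              = blastall
arguments               = -p blastn -d /local/blastdb/human_genome -i subjob_3.fasta -o subjob_3.out
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = subjob_3.fasta
log                     = subjob_3.log
output                  = subjob_3.stdout
error                   = subjob_3.stderr
queue

Files the job creates in its scratch directory (here subjob_3.out) are transferred back to the submit machine on exit, where the scheduler collects and combines them.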
Our current Condor pool
• One cluster of 21 Linux desktop machines in the NeSC training lab
• Currently still experimental – only 3 machines used
• Machines configured so that jobs run straight away and always (lab rarely used) – see the configuration sketch below
• A limited number of users has started testing the service recently – seems to work fine so far
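The "run straight away and always" behaviour corresponds to a permissive startd policy in the lab machines' condor_config, along the lines of the standard always-run example below (not the exact NeSC settings):

# Always start jobs; never suspend, preempt or kill them
START    = TRUE
SUSPEND  = FALSE
CONTINUE = TRUE
PREEMPT  = FALSE
KILL     = FALSE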
Initial impressions
• Found Condor very easy to install
• Things do work as promised!
• Documentation is excellent
• A few minor configuration issues involving jobs not running immediately (negotiator interval, write permissions, job priorities) – see the configuration note below
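For instance, the "jobs not running immediately" issue can be eased by shortening the negotiation cycle on the central manager; the value below is only an example:

# Match queued jobs every 60 seconds rather than waiting for the default cycle
NEGOTIATOR_INTERVAL = 60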
The Future
• Plan to extend the pool to department and
perhaps campus level
• Existing models from other universities are
encouraging precedents
• Depends to some extent on available staff