Using the Grid for Genomics David Boyd CLRC e-Science Centre

d.r.s.boyd@rl.ac.uk
http://www.e-science.clrc.ac.uk/
11 November 2002
BBSRC Genomics meets Grid Workshop
Outline
• Brief introduction to the Grid
• UK e-Science Grid
• Grid Support Centre
• BBSRC Grid Support Service
What is the Grid?
[Diagram: the Grid at the centre, linking experiments, computers, sensors, data, scientists and displays]
Technology that enables persistent shared use of distributed resources
– computing, data, visualisation, instruments, networks –
without needing to know in advance where these are or who owns them
How does the Grid work?
Applications - eg climate modelling, protein simulation, aircraft design
Grid toolkits - eg data discovery, experiment control, visualisation
Grid services - eg resource scheduling, data transfer, security
Grid resources - eg computers, data archives, instruments, networks
Some components of the Grid
• Globus Toolkit v2 (GT2)
– security based on PKI X.509 digital certificates (GSI)
– directory service to publish information on resources (MDS)
– resource allocation and job submission (GRAM)
– efficient file transfer process (GridFTP)
• Condor
– distributes and monitors work across a network of machines
– mature workload management system for compute-intensive jobs
– can harvest unused compute cycles with checkpointing
• Storage Resource Broker
– uniform interface to heterogeneous distributed data resources
– incorporates metadata catalogue for attribute-based data location
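As an illustration of how Condor distributes work across machines, the sketch below generates a Condor submit description that queues several instances of one program. It is a minimal sketch only: the program name `analyse` and its `--chunk` argument are hypothetical, and the file-naming pattern is an assumption rather than anything prescribed in these slides.

```python
def condor_submit_description(executable, arguments, n_jobs, universe="vanilla"):
    """Return the text of a Condor submit description for n_jobs instances.

    $(Process) is expanded by Condor to 0, 1, ..., n_jobs - 1, so each
    instance gets its own argument value and its own output files.
    """
    lines = [
        f"universe   = {universe}",
        f"executable = {executable}",
        f"arguments  = {arguments}",
        "output     = job.$(Process).out",
        "error      = job.$(Process).err",
        "log        = job.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 'analyse' and '--chunk' are hypothetical names used for illustration.
    print(condor_submit_description("analyse", "--chunk $(Process)", 8))
```

The resulting text would be saved to a file and passed to `condor_submit`. Checkpointing, as mentioned above, is not available in the vanilla universe used here as the default.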
Service model of distributed computing
• Web Services
– cross-platform distributed computing model based on Web standards,
particularly XML
– message-passing protocol for interacting with Web services (SOAP)
– operational description of Web services (WSDL)
– registry for publishing and discovering available Web services (UDDI)
– mechanism for combining Web services into a workflow (WSFL)
• And the next version of Globus (GT3) aka . . .
• Grid Services (Open Grid Services Architecture – OGSA)
– extension of Web services incorporating Grid security model
– enables dynamic creation and termination of customised services
– supports instantiated services with memory for long-lived tasks
– can compose Grid services into complex workflows
– will include support for accessing structured data in databases (DAI)
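To make the Web services message-passing layer (SOAP) listed above more concrete, the following sketch builds a SOAP 1.1 request envelope using only the Python standard library. The service namespace, operation name and parameter are hypothetical; a real service would define them in its WSDL description.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def soap_request(service_ns, operation, params):
    """Build a SOAP 1.1 envelope calling `operation` with named parameters."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{service_ns}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # 'urn:example:search', 'runSearch' and 'database' are hypothetical names.
    print(soap_request("urn:example:search", "runSearch", {"database": "embl"}))
```

The envelope would normally be sent to the service endpoint as the body of an HTTP POST request.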
UK e-Science Grid
• Being assembled now - coordinated by the Grid Engineering Task Force
• Linking computing resources at all UK e-Science Centres
– National Centre (Edinburgh+Glasgow)
– 8 Regional Centres
– 2 CLRC Centres (RAL+DL)
– EBI Hinxton
– further Centres joining soon
• Interconnected by the SuperJANET4 multi-gigabit backbone and Regional Networks
[Map of UK e-Science Grid sites: Glasgow, Edinburgh, Newcastle, Belfast, Manchester, DL, Oxford, Cardiff, RAL, Cambridge, Hinxton, London, Southampton]
UK Core Grid Support Centre
• Part of the e-Science Core Programme
• Led by CLRC e-Science Centre
• Team of 6 based at CLRC (RAL+DL) and Edinburgh and Manchester Universities (but actually providing access to the expertise of more than 25 people)
• Helps all e-Science programme participants to install and use Grid software quickly, easily and productively
• Offers to meet all projects to discuss requirements
Grid Support Centre Services
• Helpdesk - support@grid-support.ac.uk
– provides access to expert technical support
• Web information resource - http://www.grid-support.ac.uk
– offers Grid awareness and education material
• Grid Starter Kit
– supports self installation of Grid software
• National Grid Directory Service
– supports Grid resource discovery and access to current status
• Certification Authority (CA) – http://www.grid-support.ac.uk/ca
– issues digital certificates to UK e-scientists
– assigns a trustable digital identity to an individual
– you need one to use the Grid!
BBSRC Grid Support Service (1)
• What it will offer:
– support for all BBSRC researchers and institute staff
– support for learning about, installing and applying Grid technology
– source of digital certificates for using Grid software
– access to technical expertise about all aspects of the Grid
– assistance in developing demonstration Grid applications
– support for sharing high performance Beowulf clusters between
institutes
– support for accessing large-scale data resources distributed across
the Grid
– organisation of customised Grid training courses at NeSC to meet
demand
– skills transfer to BBSRC support staff and research scientists
BBSRC Grid Support Service (2)
• How it will work:
– dedicated BBSRC Grid support staff at CLRC/RAL
– exploit close links with UK Grid Support Centre
– community workshops to identify requirements
– discussions with institutes and leading research groups about
demonstrator applications of Grid technology
– technical support for installing Grid software and developing Grid-based applications
– close collaboration with BBSRC IT support service
– steering group of representatives from BBSRC institutes and IGF
centres, to be chaired by Roger Gillam
BBSRC Grid Support Service (3)
• Progress to date:
– consultation meeting at IGF Forum in July
– link established with BBSRC Research Computing Committee
– promoted at BBSRC grant holders workshop at Warwick on 28/29
October
– staff at RAL now in place
• Peter Oliver & Richard Wong
– meeting with IGER and BITS at Aberystwyth
• Helen Ougham (IGER), Colin Edwards (BITS)
• three possible demonstrator projects identified
– in priority order . . .
BBSRC Grid – demonstrator projects (1)
• Project 1 - remote BLAST jobs
– IGER currently carries out FASTA and BLAST DNA homology searches
using the MoBiCS service provided by BITS (single processor)
– aim is to use the Grid to get faster turnround on multi-processor
resources enabling larger databases to be searched
– initially submit jobs to the Beowulf cluster at RAL
• 32 CPU Beowulf Cluster (wulfkit), 32GB memory, 1TB disk
• RAL setting up static copy of EMBL database and installing BLAST software
– enable client side access
• using Java CoG kit on Windows
• set up Registration Authorities at IGER and BITS (verify user identity)
• get user certificates from UK e-Science CA
• write Globus scripts to submit jobs from IGER to Beowulf Cluster at RAL
– then install Globus server software at another site
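The job submission step in Project 1 could look roughly like the sketch below, which builds a GT2 Resource Specification Language (RSL) string for a BLAST search and the matching `globusrun` command line. It is an illustration under stated assumptions, not the project's actual scripts: the gatekeeper host name, the path to `blastall`, and the database and query file names are all hypothetical.

```python
def rsl_quote(value):
    """Wrap a value in double quotes for use in an RSL string."""
    text = str(value)
    if '"' in text:
        # Keep the sketch simple: refuse values that would need escaping.
        raise ValueError("double quotes are not supported in this sketch")
    return f'"{text}"'


def blast_rsl(executable, program, database, query_file):
    """Build an RSL job description that runs a legacy NCBI blastall search."""
    args = ["-p", program, "-d", database, "-i", query_file]
    arg_list = " ".join(rsl_quote(a) for a in args)
    return f"&(executable={rsl_quote(executable)})(arguments={arg_list})"


def globusrun_command(gatekeeper, rsl):
    """Return the argument list for submitting the RSL to a GRAM gatekeeper.

    -r names the remote resource; -o streams the job's output to the client.
    """
    return ["globusrun", "-o", "-r", gatekeeper, rsl]


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    rsl = blast_rsl("/usr/local/bin/blastall", "blastn", "embl", "query.fa")
    print(" ".join(globusrun_command("beowulf.example.ac.uk", rsl)))
```

The user would first create a proxy credential from their certificate with `grid-proxy-init`; the argument list can then be passed to `subprocess.run` on a machine with the Globus client tools installed.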
BBSRC Grid – demonstrator projects (2)
• Project 2 - hyper-spectral image analysis
– DEFRA-funded project on hyperspectral imaging of leaves as a
diagnostic of nutritional and developmental status
– generates 60 GB of image files per experiment
– aim is to speed up image analysis and provide access to data archiving
– make use of parallel algorithms running on Beowulf clusters
– Alan Gay (IGER) is contact
• Project 3 - PHYLIP
– phylogenetic analysis of large multigene families using the program
PHYLIP
– large run times – 4-5 days
– aim is to improve turnround using a parallel version of the program
BBSRC Grid – Some issues
• Windows-only environment
– need to install openssl as well as Java CoG (Commodity Grid) kit
to handle certificate conversion
– most Grid software is implemented on Unix platforms
• Network bandwidth low, ~2 Mbit/sec
– many Grid sites now have 1 Gbit/sec
– depends on what is required to support individual Grid projects
• Firewalls
– need to provide access through firewalls to enable remote Grid-based use of machines
– mechanism for this now identified using tables of Grid machine
IP addresses and port numbers
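The certificate conversion mentioned under the Windows-only issue might be done along the lines sketched below. This assumes, and the slides do not say so, that the certificate has been exported from a browser as a PKCS#12 file and that the Grid software expects a separate PEM certificate and key named `usercert.pem` and `userkey.pem`. The input file name and target directory are hypothetical.

```python
from pathlib import Path


def pkcs12_to_pem_commands(p12_file, globus_dir):
    """Return the two openssl command lines that split a PKCS#12 bundle.

    The first extracts the user certificate without the private key;
    the second extracts the private key without any certificates.
    """
    globus_dir = Path(globus_dir)
    cert = globus_dir / "usercert.pem"
    key = globus_dir / "userkey.pem"
    return [
        ["openssl", "pkcs12", "-in", str(p12_file),
         "-clcerts", "-nokeys", "-out", str(cert)],
        ["openssl", "pkcs12", "-in", str(p12_file),
         "-nocerts", "-out", str(key)],
    ]


if __name__ == "__main__":
    # 'mycert.p12' and '.globus' are hypothetical example paths.
    for command in pkcs12_to_pem_commands("mycert.p12", ".globus"):
        print(" ".join(command))
```

Each command prompts for the export password when run, and the second also asks for a pass phrase to protect the extracted key.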
Conclusions
• The Grid can potentially . . .
– enable bigger and better science
– enhance the capabilities of individual researchers
– provide access to resources you need but don’t have
yourself
– transcend traditional organisational and disciplinary
boundaries to make new collaborations possible
– drive change
• Now is the time to identify if and how the Grid can help the
genomics community
– this is what we are here to discover