GridQTL: High performance QTL analysis via the Grid University of Edinburgh Roslin Institute

Overview
• People
• Science / biology
• Objectives
• e-Science: Why the Grid?
• Management
• Summary
People
University of Edinburgh, Institute of Evolutionary Biology
Roslin Institute
NeSC
Sara Knott, Dave Berry, Chris Haley, Peter Visscher, Denise Ecklund, Dirk-Jan De Koning
Rationale for QTL analysis
• QTL = quantitative trait locus
• Biology: Understanding genetic variation by dissecting complex traits
– basic biology
– applications in agriculture
– applications in medicine
[Figure: histogram of stature in women. Mean = 169.1, Std. Dev = 6.40, N = 1785]
Dissection of Complex Traits
GridQTL
Genetics
Sib pairs
Chromosome Region
Association Study
Genomics
Physical Mapping/Sequencing
Candidate Gene Selection/Polymorphism Detection
Mutation Characterization/Functional Annotation
QTL Express: User-friendly web-based software to map QTL in outbred populations
George Seaton, Sara Knott, Chris Haley, Peter Visscher
http://QTL.cap.ed.ac.uk/
Affiliations: Roslin Institute; University of Edinburgh
Example: Obesity in pigs
[Figure: test statistic (F) and number of observations plotted against position (cM), from 0 to 120 cM]
[Knott et al., 1998]
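The profile in the figure above plots an F statistic at each map position. As a rough illustration only, the sketch below computes such a profile by regressing a phenotype on one coefficient per position. It is not the QTL Express implementation: the function names, the simulated data, and the use of a single additive coefficient per individual and position are all assumptions made for this example.

```python
import random


def f_statistic(y, x):
    """F statistic (1 and n - 2 df) for the regression of y on x with an intercept."""
    n = len(y)
    mean_y = sum(y) / n
    mean_x = sum(x) / n
    syy = sum((v - mean_y) ** 2 for v in y)
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0.0:
        # No variation in the coefficient at this position: nothing to test.
        return 0.0
    ss_regression = sxy ** 2 / sxx      # sum of squares explained by the position
    rss = syy - ss_regression           # residual sum of squares
    return ss_regression / (rss / (n - 2))


def f_scan(phenotype, coeff_by_position):
    """Return one F statistic per test position."""
    return [f_statistic(phenotype, coeffs) for coeffs in coeff_by_position]


# Simulated data (hypothetical): 200 individuals, 25 test positions,
# with a real effect placed at position index 10.
rng = random.Random(0)
n_individuals, n_positions = 200, 25
coeff_by_position = [
    [rng.choice([-1.0, 0.0, 1.0]) for _ in range(n_individuals)]
    for _ in range(n_positions)
]
phenotype = [
    2.0 * coeff_by_position[10][i] + rng.gauss(0.0, 1.0)
    for i in range(n_individuals)
]

f_values = f_scan(phenotype, coeff_by_position)
best_position = max(range(n_positions), key=lambda j: f_values[j])
```

In a real analysis the coefficients at each position would be derived from marker genotypes and the linkage map rather than drawn at random, and significance thresholds would need to account for the number of positions tested.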
Present and future paradigms
                  Now            Future
# individuals     100s           1000s
# phenotypes      10s            10000s
# markers         100s           100000s
# analyses        100s           O(10^6 – 10^8)
models            simple         complex
data sources      homogeneous    heterogeneous
Present statistical algorithms and computer platforms will be inadequate for future analysis
~400 individuals, ~9000 phenotypes, ~3000 markers
Nature. 2004 430:743-7.
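To see how a data set of this size reaches the "future" number of analyses, the short calculation below counts tests under one simplifying assumption that is not stated on the slide: a single test for every phenotype-marker pair. The variable names are illustrative.

```python
# Approximate data set sizes quoted above.
phenotypes = 9000
markers = 3000

# Assumption for this sketch: one single-marker test per phenotype-marker pair.
single_marker_tests = phenotypes * markers

print(f"{single_marker_tests:.1e} tests")
```

Under that assumption the count is 2.7 × 10^7, which falls inside the O(10^6 – 10^8) range. Models with gene-gene interactions or multiple traits would push the count higher.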
Objectives
To develop and apply a grid-based platform for robust and fast multiple-trait mapping
of multiple quantitative trait loci in simple and complex pedigrees,
and to disseminate the results
Objectives: Grid implementation
• Transform existing QTL Express to be grid compatible
• Deliver an essential analysis component for integrative biology workflow
• Integrate new analytical approaches and grid components to deploy GridQTL
Objectives: QTL analysis
• Robust multiple trait algorithms
– Expression QTL
• Methods and algorithms for gene-gene interactions
• Combining gene-phenotype associations within and between families
Objectives: dissemination
• Dissemination of the developed grid applications and QTL mapping algorithms
– The existing QTL Express user-base
– (Inter)national postgraduate courses
• Scientific publications
• e-Science meetings
• Websites and internet postings
Why the Grid?
• Grid helps achieve our key goals to
– Scale-up analysis complexity
• Analyse more individuals, phenotypes and markers
– Provide a ‘growing’ public service to the research community
– Provide a component for integrative biology
• Make QTL analysis services available in a larger workflow
• Using the grid we can leverage
– Essential computation and data storage resources
– Existing middleware to manage these resources
But we need to build on top of the middleware to get what we need to effectively support multi-trait analysis
GridQTL Portal – The Challenges
• Execute QTL analyses on grid computing resources
– Describe parallel computation requirements
– Automatic task-level decomposition of analysis requests
– Schedule, monitor and re-start decomposed tasks
• Provide a secure and private data space for each researcher
– Synchronise application input and output
– Enable analysis re-start from intermediate results
• Be a robust public service
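The first challenge (decompose a request into tasks, then schedule, monitor and re-start them) can be sketched in a few lines. This is a minimal illustration, not the GridQTL design: local threads stand in for grid resources, the unit of decomposition (one task per trait and chromosome) is an assumption, and every name below, including the trait names, is hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product


def decompose(traits, chromosomes):
    """Split one analysis request into independent (trait, chromosome) tasks."""
    return list(product(traits, chromosomes))


def run_with_restart(tasks, worker, max_attempts=3, max_workers=4):
    """Run every task, re-submitting failed ones up to max_attempts times."""
    results = {}
    attempts = {task: 0 for task in tasks}
    pending = list(tasks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            futures = {pool.submit(worker, task): task for task in pending}
            pending = []
            for future in as_completed(futures):   # monitor completion
                task = futures[future]
                attempts[task] += 1
                try:
                    results[task] = future.result()
                except Exception:
                    if attempts[task] >= max_attempts:
                        raise
                    pending.append(task)           # re-start in the next round
    return results, attempts


# Demo worker (hypothetical): one task fails on its first attempt only.
_seen = set()
_seen_lock = threading.Lock()


def flaky_worker(task):
    trait, chromosome = task
    with _seen_lock:
        first_try = task not in _seen
        _seen.add(task)
    if task == ("fat_depth", 4) and first_try:
        raise RuntimeError("simulated node failure")
    return f"scan of {trait} on chromosome {chromosome}"


tasks = decompose(["fat_depth", "growth_rate"], [1, 4, 7])
results, attempts = run_with_restart(tasks, flaky_worker)
```

Keeping results keyed by task is one simple way to support re-start from intermediate results: a later run could skip any task whose result is already stored.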
GridQTL Portal
[Architecture diagram. Components shown: Analysis Portlet, Data Mgr, Meta Sched, Analysis 1 to Analysis 5, UK e-Science Grid or NGS Resources]
Management
• Research partners
– science partner coordination
• Knott, De Koning, Haley, Visscher
– science & e-science coordination
• All PIs
• Seaton and NeSC software engineers
• User group
– current QTL Express users
• Scientific Advisory Board
– science and e-science academics
Summary
GridQTL provides an essential core component of a future integrated system
incorporating genetic, phenotypic, transcription and comparative information
to allow prediction from sequence to consequence