Bioinformatics at The Roslin Institute Andy Law

Bioinformatics Activities
• The interface between computer science and
biology
[Diagram. Labels: Computer Scientists; Biologists; Roslin Bioinformatics Group; New algorithms; New tools; Integrating distinct tools; Providing access to / maintaining tools; Automation scripts; Using tools and scripts]
Our Role
• Grew out of the genomics programmes
• Provide tools, and advice on their use
• Make routine data handling easier
QTL-mapping
[Diagram. Labels: Pedigree Records; Trait Records; Genotypes; Analysis Programs]
Analysis Programs
• Law’s First Law
– The first step in designing a new genetic analysis
program is to determine how to make the input file
sufficiently different from all previously defined
genetic analysis programs
Consequences
• Data must be manipulated to suit each and
every analysis program
• Changes made to the data set as a result of one
analysis may not be propagated to the other
copies of the data
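One way to limit these consequences is to keep a single master copy of the records and generate each program's input file from it, so that a correction is made once and every export picks it up. The following is a minimal sketch of that idea only. The `Animal` record and both output layouts ("program A" and "program B") are hypothetical and do not correspond to any real analysis program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Animal:
    ident: str
    sire: str  # "0" means unknown (assumed convention)
    dam: str   # "0" means unknown (assumed convention)
    sex: str   # "M", "F" or "U"


def to_program_a(animals):
    """Hypothetical program A: 'id sire dam sex', space-separated, sex as 1/2/0."""
    sex_code = {"M": "1", "F": "2", "U": "0"}
    return [f"{a.ident} {a.sire} {a.dam} {sex_code[a.sex]}" for a in animals]


def to_program_b(animals):
    """Hypothetical program B: 'id,sex,dam,sire', comma-separated, sex as letters."""
    return [f"{a.ident},{a.sex},{a.dam},{a.sire}" for a in animals]


# The single master copy; every export below is derived from it.
master = [
    Animal("1", "0", "0", "M"),
    Animal("2", "0", "0", "F"),
    Animal("5", "1", "2", "M"),
]

for line in to_program_a(master):
    print(line)
for line in to_program_b(master):
    print(line)
```

Because both writers read the same `master` list, a fix to one animal's record reaches both outputs the next time they are generated.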
1
3
A197 GGQW Z113
AF1
10
1 0 0 1
1 2 1 1 1 5
2 0 0 0
1 2 1 2 2 3
3 0 0 1
1 1 1 1 4 6
4 0 0 0
3 3 1 1 3 3
5 2 1 1
1 2 1 2 1 3
8 4 3 0
1 3 1 1 3 4
11 8 5 1
1 2 1 2 0 0
12 8 5 1
1 3 1 1 3 3
13 8 5 0
2 3 1 1 1 4
14 8 5 0
1 3 1 2 1 4
QTL-mapping
• Other problems
– Sharing Data
• Genotyping lab may be different from the lab that recorded
the traits
• Analysis may be performed by a different lab
• Populations may overlap
• Need…
– An easily accessible database
QTL-mapping
[Diagram. Labels: Pedigree Records; Trait Records; Genotypes; resSpecies; Analysis Programs]
Analysis Programs
• Law’s Second Law
– Error messages should not be provided
– If error messages are provided, they must be cryptic
and convey as little information as possible
Consequences
• Errors in the data (e.g. Mendelian inheritance
errors) result in long down-times hunting for the
cause of the problem
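A check for Mendelian inheritance errors that names the animal, the marker and the genotypes involved can shorten that hunt. The sketch below shows one simple version of such a check. The data layout (dictionaries keyed by animal and marker), the function name `mendelian_errors` and the use of "0" as the missing-allele code are all assumptions made for illustration, not a description of any existing tool.

```python
MISSING = "0"  # assumed code for an untyped allele


def mendelian_errors(genotypes, pedigree):
    """Return readable messages for offspring genotypes that neither
    assignment of alleles to sire and dam can explain.

    genotypes: {animal: {marker: (allele1, allele2)}}
    pedigree:  {animal: (sire, dam)}
    """
    messages = []
    for child, (sire, dam) in pedigree.items():
        for marker, child_gt in genotypes.get(child, {}).items():
            sire_gt = genotypes.get(sire, {}).get(marker)
            dam_gt = genotypes.get(dam, {}).get(marker)
            if sire_gt is None or dam_gt is None:
                continue  # cannot test unless both parents are typed
            if MISSING in child_gt + sire_gt + dam_gt:
                continue  # skip incomplete genotypes
            a, b = child_gt
            # One allele must come from the sire and the other from the dam.
            consistent = (a in sire_gt and b in dam_gt) or (
                b in sire_gt and a in dam_gt
            )
            if not consistent:
                messages.append(
                    f"Animal {child}, marker {marker}: genotype {a}/{b} cannot be "
                    f"inherited from sire {sire} ({sire_gt[0]}/{sire_gt[1]}) and "
                    f"dam {dam} ({dam_gt[0]}/{dam_gt[1]})"
                )
    return messages


# Made-up example data: "kid2" carries an allele that neither parent has.
genotypes = {
    "sire": {"M1": ("1", "2")},
    "dam": {"M1": ("1", "1")},
    "kid1": {"M1": ("1", "2")},
    "kid2": {"M1": ("3", "3")},
}
pedigree = {"kid1": ("sire", "dam"), "kid2": ("sire", "dam")}

for message in mendelian_errors(genotypes, pedigree):
    print(message)
```

This only tests parent-offspring trios where all three animals are typed; a fuller check would also handle single-parent cases and sex-linked markers.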
Code re-use
• Same code as we use in the web-system…
• …is the basis of a stand-alone application
Analysis Programs
• Law’s Third Law
– The number of unique identifiers assigned to an
individual is never less than the number of
Institutions involved in the study
... and is frequently many, many more
A recent example
• Data set of genotypes
– 1152 samples
– 6654 markers
A recent example
• Delivered as a matrix
– Markers in rows
– Samples in columns
– Genotypes as ‘AA’, ‘AB’, ‘BB’ or ‘NC’
• To be converted to a tab-delimited list of
Sample name, marker name, allele1, allele2
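The reshaping step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code that was used: it assumes the matrix is tab-delimited, that its first row holds the sample names and its first column the marker names, and that 'NC' means "no call" and can be skipped. The function name `matrix_to_long` is hypothetical.

```python
import csv
import io

# Map each two-letter call to its pair of alleles.
CALLS = {"AA": ("A", "A"), "AB": ("A", "B"), "BB": ("B", "B")}


def matrix_to_long(matrix_file, out_file, no_call="NC"):
    """Convert a marker-by-sample matrix into tab-delimited rows of
    sample name, marker name, allele1, allele2."""
    reader = csv.reader(matrix_file, delimiter="\t")
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    header = next(reader)
    samples = header[1:]  # first cell is the marker-column heading
    for row in reader:
        marker, calls = row[0], row[1:]
        for sample, call in zip(samples, calls):
            if call == no_call:
                continue  # assumption: uncalled genotypes are omitted
            allele1, allele2 = CALLS[call]
            writer.writerow([sample, marker, allele1, allele2])


# Tiny made-up matrix: two markers in rows, two samples in columns.
matrix = io.StringIO("marker\tS1\tS2\nM1\tAA\tAB\nM2\tBB\tNC\n")
result = io.StringIO()
matrix_to_long(matrix, result)
print(result.getvalue(), end="")
```

An unexpected call such as 'Ab' raises a `KeyError` naming the offending value, which is easier to trace than a silently malformed output row.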
A recent example
• But…
– Sample name in matrix file suffixed with ‘.GType’
– Maps to sample name in ‘Sample Sheet’ prefixed
with ‘COL_’
– Maps to ‘Plate/Row/Column’ designation
– Maps to DNA number
– Maps to Tissue Sample Bag number…
– Which, together with the slaughterhouse designation, maps to sample number
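The chain of identifiers above can be followed with a series of lookups, each of which reports exactly where it failed. The sketch below is one reading of the chain and rests on several assumptions: that stripping '.GType' and adding 'COL_' turns a matrix name into a Sample Sheet name, that each mapping is available as a dictionary, and that the slaughterhouse designation is supplied by the caller. All table names and values are made up.

```python
SUFFIX = ".GType"
PREFIX = "COL_"


def lookup(table, key, step):
    """Look up key in table, naming the failing step if it is absent."""
    try:
        return table[key]
    except KeyError:
        raise LookupError(f"{step}: no entry for {key!r}") from None


def resolve_sample_number(matrix_name, slaughterhouse, tables):
    """Follow matrix name -> sheet name -> well -> DNA -> bag -> sample number."""
    if not matrix_name.endswith(SUFFIX):
        raise ValueError(f"matrix name {matrix_name!r} lacks the {SUFFIX!r} suffix")
    sheet_name = PREFIX + matrix_name[: -len(SUFFIX)]
    well = lookup(tables["sheet_to_well"], sheet_name, "Sample Sheet name to plate/row/column")
    dna = lookup(tables["well_to_dna"], well, "plate/row/column to DNA number")
    bag = lookup(tables["dna_to_bag"], dna, "DNA number to tissue sample bag")
    return lookup(
        tables["bag_to_sample"], (bag, slaughterhouse), "bag and slaughterhouse to sample number"
    )


# Hypothetical lookup tables for a single sample.
tables = {
    "sheet_to_well": {"COL_1234": "P1/A/01"},
    "well_to_dna": {"P1/A/01": "DNA-77"},
    "dna_to_bag": {"DNA-77": "BAG-9"},
    "bag_to_sample": {("BAG-9", "SH1"): "S-0042"},
}

print(resolve_sample_number("1234.GType", "SH1", tables))
```

Keeping each mapping as its own table mirrors how the identifiers were assigned by different parties, and the step name in each error message shows which table needs attention.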
Who
• David Speed
• Neil Bartley
• Fahad Ifthkar
• Phil Devall
• John Bowman
• Jan Aerts
• Wilfrid Carre
• Zen Lu
• Trevor Paterson