Bioinformatics for Farm Animal Genomics at Roslin Andy Law & Alan Archibald

Overview
• Bioinformatics
• Farm Animal Genomics
• Databases
  – ARKdb
  – resSpecies
• Integration
Bioinformatics Activities
• The interface between computer science and biology
[Diagram: Computer Scientists (new algorithms) / Roslin Bioinformatics Group (new tools, integrating distinct tools, providing access to/maintaining tools, automation scripts) / Biologists (using tools and scripts)]
Genomics and Bioinformatics
Roslin Bioinformatics: Worldwide reputation
• Construction and population of
  – Resource databases: resSpecies, radiation hybrid database
  – Genome databases: ARKdb
  – TCAGdb
  – Genetic diversity databases
Roslin Bioinformatics: Worldwide reputation
• Anubis
  – The first web-delivered graphical user interface
• Webintool
  – A web-page scripting program
  – Several years ahead of its time
  – Written by one person
Roslin Bioinformatics: Internal role
• Support the genomics programmes
• Provide other, more generalised assistance (e.g. sequence analysis)
• Provide tools, and advice on their use
  – Do not provide an analysis service
• Make routine data handling easier
Farm Animal Genomics
• Genome Mapping
• Quantitative Trait Locus (QTL) Identification
• Causative Gene Identification (Physiology, Biochemistry, Pathways…)
Farm Animal Genomics
• Genome Mapping
• Quantitative Trait Locus (QTL) Identification
• Sequencing?
• Causative Gene Identification (Physiology, Biochemistry, Pathways…)
Genome Mapping
• What is a Genome Map?
  – A means of identifying points within the genome
  – E.g. chromosome banding patterns, cytogenetic locations, genetic linkage maps, (DNA sequence)
Cytogenetic Map
[Figure: cytogenetic map]
Genetic Linkage Map
[Figure: ARKdb Maps]
ARKdb presence
• Main Roslin node (www.thearkdb.org, roslin.thearkdb.org)
• Mirrors at Iowa, Texas (iowa.thearkdb.org, texas.thearkdb.org)
• New mirror in Australia (oz.thearkdb.org, angis.thearkdb.org)
QTL Identification
• Take two lines that differ for the trait of interest
• Cross them
• Cross the F1 animals
• Analyse the F2 animals
Roslin Pig QTL Population
[Figure: Large White, Meishan]
Roslin Chicken QTL Population
[Figure: chicken QTL population]
QTL Identification
• Analyse the F2 animals
  – Measure the trait
  – Determine genotypes
  – Analyse to associate trait with inheritance patterns
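The last step, associating the trait with inheritance patterns, can be sketched as a regression of the trait on genotype at a single marker. This is only an illustration, not the analysis software used at Roslin: the genotype coding (0, 1 or 2 alleles inherited from one founder line), the toy data and the function name `marker_regression` are all assumptions made for the example.

```python
from statistics import mean


def marker_regression(genotypes, traits):
    """Least-squares fit of trait = intercept + slope * genotype.

    genotypes: number of alleles (0, 1 or 2) each F2 animal carries
               from one founder line at a single marker (assumed coding).
    traits:    the measured trait value for the same animals.
    Returns (intercept, slope, r_squared).
    """
    g_bar, t_bar = mean(genotypes), mean(traits)
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    sxy = sum((g - g_bar) * (t - t_bar) for g, t in zip(genotypes, traits))
    syy = sum((t - t_bar) ** 2 for t in traits)
    slope = sxy / sxx
    intercept = t_bar - slope * g_bar
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical F2 data: allele counts at one marker and a trait measurement.
f2_genotypes = [0, 0, 1, 1, 1, 1, 2, 2]
f2_traits = [10.0, 12.0, 14.0, 15.0, 13.0, 16.0, 18.0, 20.0]

intercept, slope, r2 = marker_regression(f2_genotypes, f2_traits)
print(f"effect per allele: {slope:.2f}, R^2: {r2:.2f}")
```

A large slope with a high R^2 at one marker, repeated along the map, is the kind of signal that points to a QTL near that marker. Real analyses also use the pedigree and the positions between markers.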
QTL-mapping
[Diagram: Pedigree Records, Trait Records and Genotypes feed the Analysis Programs]
Further points...
• Input data file formats
  – Complex
  – Unforgiving
  – Difficult to update incrementally
Crimap input file
1
3
A197 GGQW Z113
AF1
10
1 0 0 1    1 2  1 1  1 5
2 0 0 0    1 2  1 2  2 3
3 0 0 1    1 1  1 1  4 6
4 0 0 0    3 3  1 1  3 3
5 2 1 1    1 2  1 2  1 3
8 4 3 0    1 3  1 1  3 4
11 8 5 1   1 2  1 2  0 0
12 8 5 1   1 3  1 1  3 3
13 8 5 0   2 2  1 1  1 4
14 8 5 0   1 3  1 2  1 4
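To show why such a file is unforgiving, here is a small reader for the layout the example appears to follow. The layout is an assumption inferred from the slide, not taken from the CRI-MAP manual: number of families, number of loci, the locus names, then for each family its name and member count, then one record per animal giving id, mother, father, sex and two alleles per locus. The names `Animal` and `parse_gen` are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Animal:
    ident: str
    mother: str
    father: str
    sex: str
    genotypes: dict  # locus name -> (allele 1, allele 2)


def parse_gen(text):
    """Parse a CRI-MAP-style genotype file (assumed layout, see above).

    The file is read as a flat stream of whitespace-separated tokens,
    so one missing or extra token shifts every field that follows it.
    """
    tokens = iter(text.split())
    n_families = int(next(tokens))
    n_loci = int(next(tokens))
    loci = [next(tokens) for _ in range(n_loci)]
    families = {}
    for _ in range(n_families):
        name = next(tokens)
        n_members = int(next(tokens))
        members = []
        for _ in range(n_members):
            ident = next(tokens)
            mother = next(tokens)
            father = next(tokens)
            sex = next(tokens)
            genotypes = {}
            for locus in loci:
                first = next(tokens)
                second = next(tokens)
                genotypes[locus] = (first, second)
            members.append(Animal(ident, mother, father, sex, genotypes))
        families[name] = members
    return loci, families


# Three of the ten animals from the example, with the member count adjusted.
sample = """
1
3
A197 GGQW Z113
AF1
3
1 0 0 1   1 2  1 1  1 5
2 0 0 0   1 2  1 2  2 3
5 2 1 1   1 2  1 2  1 3
"""

loci, families = parse_gen(sample)
offspring = families["AF1"][2]
print(offspring.ident, offspring.mother, offspring.father, offspring.genotypes)
```

Because nothing in the file labels a field, adding one animal means editing the member count as well as appending the record, which is what makes incremental updates error-prone.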
QTL-mapping
• Other problems
  – Sharing Data
    • Genotyping lab may be different from the lab that recorded the traits
    • Analysis may be performed by a different lab
    • Populations may overlap
• Need…
  – An easily accessible database
QTL-mapping
[Diagram: Pedigree Records, Trait Records and Genotypes held in resSpecies, which feeds the Analysis Programs]
resSpecies
• Designed to be generic and species-neutral
  – (for all the species I knew would be required at the outset)
• Handles Mapping and QTL experiments
• Entirely web-operable
Analysis programs
• Developed / used by Department of Genetics & Biometry
Analysis programs
• Regression-based methods
  – Knott & Haley (QTL Express)
• Monte Carlo methods
  – Gibbs sampling
• Simulation studies
Identification of QTL
[Plot: curves for Shoulder, Back and Loin with a Threshold line; vertical axis 0 to 45; horizontal axis Marker 1 to Marker 7]
Identification of QTL
• What is the actual gene controlling the trait?
Identification of QTL gene
• Positional Candidate
  – Note which markers flank the QTL
  – Use those markers to identify the corresponding region of the genetic map
  – Look at the genes known to map to that region to identify potential candidate genes
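The positional candidate steps amount to an interval lookup on the genetic map. The sketch below is illustrative only: the gene symbols, the map positions and the function name `positional_candidates` are hypothetical.

```python
def positional_candidates(gene_positions, left_cm, right_cm):
    """Return genes mapped between two flanking markers, in map order.

    gene_positions: gene symbol -> position in cM on one chromosome.
    left_cm, right_cm: positions of the markers that flank the QTL.
    """
    low, high = sorted((left_cm, right_cm))
    hits = [(pos, gene) for gene, pos in gene_positions.items()
            if low <= pos <= high]
    return [gene for pos, gene in sorted(hits)]


# Hypothetical linkage map for one chromosome.
chromosome_map = {
    "GENE_A": 12.0,
    "GENE_B": 48.5,
    "GENE_C": 61.0,
    "GENE_D": 70.2,
    "GENE_E": 95.0,
}

# Hypothetical flanking markers at 45 cM and 75 cM.
candidates = positional_candidates(chromosome_map, 45.0, 75.0)
print(candidates)
```

The lookup is trivial once the map is populated. The hard part is having enough mapped genes in the region for the list to be useful.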
Identification of QTL gene
• The QTL region will probably cover at least 30 cM
• Chicken genetic map is approximately 3,500 cM
• Vertebrates have 20,000-35,000 genes
• 30 cM therefore contains between 175 and 300 genes
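The gene-count estimate follows from simple proportion, assuming (as a simplification) that genes are spread evenly along the genetic map. The variable names below are chosen for this example only.

```python
qtl_region_cm = 30          # minimum size of the QTL region
chicken_map_cm = 3500       # approximate length of the chicken genetic map
genes_min, genes_max = 20_000, 35_000  # vertebrate gene count range

# Fraction of the map covered by the QTL region.
fraction = qtl_region_cm / chicken_map_cm

# Expected number of genes in the region under even spacing.
region_genes_low = fraction * genes_min
region_genes_high = fraction * genes_max
print(f"{region_genes_low:.0f} to {region_genes_high:.0f} genes")
```

The lower bound works out at about 171, which the slide rounds to 175. Either way, a 30 cM region holds far too many genes to test one by one.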
Identification of QTL gene
• Farm animals have relatively few genes mapped
• Mouse and human have thousands of ESTs and genes mapped
  – … plus evolving sequence assemblies
Comparative Gene Mapping
[Diagram: chromosomes of Species A and Species B, each carrying loci A, B, C, D and E]
Identification of QTL gene
[Diagram: chromosomes of Species A and Species B, each carrying loci A, B, C, D and E; a bracket on Species A is labelled "QTL is in here somewhere"]
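Identification of QTL gene
[Diagram: chromosomes of Species A and Species B, each carrying loci A, B, C, D and E; a bracket on Species A is labelled "QTL is in here somewhere"; a bracket on Species B encloses Gene 1, Gene 2, Gene 3 and Gene 4, labelled "These are potential candidate genes"]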
Comparative Gene Mapping
• Requirements
  – Some degree of conservation of genomic order
  – Mapping of a large number of coding regions in a variety of species
  – Good evidence to confirm homology between any pair of loci in two species
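Given those requirements, transferring a QTL interval from a sparsely mapped species to a densely mapped one can be sketched as follows. Everything here is hypothetical: the locus names, the homology table, the gene order and the function name `comparative_candidates`. The sketch also assumes gene order is conserved across the region.

```python
def comparative_candidates(homologues, target_order, flank_left, flank_right):
    """List target-species loci lying between the homologues of two flanking loci.

    homologues:   source-species locus -> homologous target-species locus.
    target_order: target-species loci in map order along one chromosome.
    """
    index = {locus: i for i, locus in enumerate(target_order)}
    start, end = sorted((index[homologues[flank_left]],
                         index[homologues[flank_right]]))
    # Everything strictly between the two anchors is a potential candidate.
    return target_order[start + 1:end]


# Hypothetical homology table between Species A and Species B.
homologues = {"A": "A'", "B": "B'", "C": "C'", "D": "D'", "E": "E'"}

# Hypothetical map order in the well-mapped Species B.
species_b_order = ["A'", "B'", "Gene 1", "Gene 2", "Gene 3", "Gene 4",
                   "C'", "D'", "E'"]

# The QTL in Species A lies between loci B and C.
genes = comparative_candidates(homologues, species_b_order, "B", "C")
print(genes)
```

If the homology assignments are wrong, or order is not conserved between the anchors, the returned list is meaningless, which is why all three requirements matter.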
Integration
• Can also add in other data types
Pig Fat QTL
[Figure: pig fat QTL]
Linkage and RH maps
[Diagram: Fat trait location shown against the Linkage Map and Radiation Hybrid Map]
Human homology
[Diagram: Pig; Fat trait location shown against the Linkage Map, Radiation Hybrid Map and Cytogenetic Map]
Physical clones
[Diagram: Pig and Human; BAC1, BAC2, BAC3; Fat trait location shown against the Linkage Map, Radiation Hybrid Map, Cytogenetic Map and Physical Mapping]
Chicken EST homologues
[Diagram: Pig, Chicken and Human; BAC1, BAC2, BAC3; EST1, EST2; Fat trait location shown against the Linkage Map, Radiation Hybrid Map, Cytogenetic Map and Physical Mapping]
Expression data
[Diagram: Pig, Chicken and Human; BAC1, BAC2, BAC3; EST1, EST2; Fat trait location shown against the Linkage Map, Radiation Hybrid Map, Cytogenetic Map, Physical Mapping and Expression Analysis]
Supporting literature
[Diagram: Pig, Chicken and Human; BAC1, BAC2, BAC3; EST1, EST2; Fat trait location shown against the Linkage Map, Radiation Hybrid Map, Cytogenetic Map, Physical Mapping, Expression Analysis and Linked References]
Making the links
• Different name, same thing…
  – TGF-B1, TGFB1, Tgfb1, Transforming Growth Factor Beta 1, TGF β1
  – TGF-B1, TGF-B4, TGF-B5
Making the links
• Same name, different thing…
  – There are at least 6 different markers recorded as ‘GH’ within ARKdb-pig
  – Some primer pairs amplify multiple loci, so the same anonymous symbol has been assigned to multiple chromosomal locations
Making the links
• Gene families
  – TGF-B1, TGF-B2, TGF-B3, TGF-B4, TGF-B5
  – Chicken and human have 3, Xenopus has 2
Making the links
• Fat QTLs
  – Abdominal fat pad, shoulder, back, interstitial (marbling)
Making the links
• Other phenotypes
  – Are chicken wings equivalent to arms or limbs in general?
  – What about Drosophila wings?
Making the links
• Ontologies
  – Graphs of controlled vocabularies
  – Not perfect
  – Current debate in MGED is moving towards references to ontologies and collections of ontology-ontology mappings
Making the links
• Ontologies provide a means to define hierarchies of attributes and functions
• We need a way to define relationships between instances of physical ‘things’ rather than their functions or attributes
Making the links
• Define a vocabulary that describes links
  – A ‘is an alias of’ B
  – C ‘is contained by’ D
    • Ergo D ‘contains’ C
  – E ‘is homologous/orthologous to’ F
  – G ‘differs from’ G1
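A link vocabulary of this kind can be sketched as a small store of (subject, relation, object) statements in which each relation has a declared inverse, so that recording C ‘is contained by’ D automatically yields D ‘contains’ C. The class name `LinkStore` is hypothetical, and treating the alias, homology and ‘differs from’ relations as symmetric is an assumption of this sketch rather than something the slide states.

```python
from collections import defaultdict

# Each relation maps to its inverse; symmetric relations map to themselves.
# Only 'is contained by' / 'contains' is given explicitly on the slide;
# treating the other relations as symmetric is an assumption.
INVERSE = {
    "is an alias of": "is an alias of",
    "is contained by": "contains",
    "contains": "is contained by",
    "is homologous/orthologous to": "is homologous/orthologous to",
    "differs from": "differs from",
}


class LinkStore:
    """Stores links between named things together with their implied inverses."""

    def __init__(self):
        self._links = defaultdict(set)

    def add(self, subject, relation, obj):
        if relation not in INVERSE:
            raise ValueError(f"unknown relation: {relation!r}")
        self._links[subject].add((relation, obj))
        # Record the inverse statement so lookups work in both directions.
        self._links[obj].add((INVERSE[relation], subject))

    def related(self, subject, relation):
        return sorted(o for r, o in self._links[subject] if r == relation)


store = LinkStore()
store.add("TGFB1", "is an alias of", "TGF-B1")
store.add("C", "is contained by", "D")
print(store.related("D", "contains"))
```

Restricting links to a fixed vocabulary is what lets software follow them automatically, which free-text cross-references do not allow.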
Making the links
• More importantly, it defines external data references
  – A ‘has a sequence accession of’ AC012345
  – B ‘is defined at’ http://whatever.com
Integration
• Technical issues…
  – Systems developed stand-alone
    • Fine for ‘point-and-click’
    • Less good for automated/bulk analysis
Integration
• Re-engineer systems
  – Define Application Programming Interfaces (APIs)
  – Define Structured Data Interchange Formats
• Use APIs to integrate data from different systems
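One way to picture the re-engineering is a single programmatic interface that every database implements, so that a script can query them all in the same way. The sketch below is purely illustrative: `GenomeResource`, `InMemoryResource`, `integrate` and the records are invented for this example and are not the actual ARKdb or resSpecies interfaces.

```python
from typing import Protocol


class GenomeResource(Protocol):
    """Hypothetical common interface that each database would expose."""

    name: str

    def lookup(self, symbol: str) -> list[dict]:
        ...


class InMemoryResource:
    """Stand-in for a real database: records keyed by locus symbol."""

    def __init__(self, name: str, records: dict[str, list[dict]]):
        self.name = name
        self._records = records

    def lookup(self, symbol: str) -> list[dict]:
        return self._records.get(symbol, [])


def integrate(symbol: str, resources: list[GenomeResource]) -> dict[str, list[dict]]:
    """Query every resource through the same call and group results by source."""
    return {resource.name: resource.lookup(symbol) for resource in resources}


# Invented records, for illustration only.
maps = InMemoryResource("ARKdb", {"GH": [{"chromosome": "12", "position_cM": 40.0}]})
crosses = InMemoryResource("resSpecies", {"GH": [{"animals_genotyped": 120}]})

result = integrate("GH", [maps, crosses])
print(result)
```

The returned dictionary doubles as a simple structured interchange format: an analysis program can consume it without knowing how either database stores its data.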
[Diagram: User; resSpecies; ARKdb; Radiation Hybrid Database; Diversity Databases]
[Diagram: Novel Analyses; User; resSpecies; ARKdb; Radiation Hybrid Database; Diversity Databases]
[Diagram: User; resSpecies, ARKdb, Radiation Hybrid Database and Diversity Databases, each with its own Interface; an Application Programming Interface layer over resSpecies, ARKdb, Radiation Hybrid Database and Diversity Databases]
[Diagram: User and Novel Analyses; an Application Programming Interface layer over resSpecies, ARKdb, Radiation Hybrid Database and Diversity Databases]
[Diagram: User and Novel Analyses; an Application Programming Interface layer over resSpecies, ARKdb, Radiation Hybrid Database, Diversity Databases, Array Expression Data, and Sequence & Homology]
[Diagram: "?" and "The GRID!"; an Application Programming Interface layer over resSpecies, ARKdb, Radiation Hybrid Database, Diversity Databases, Array Expression Data, and Sequence & Homology]
Farm Animal Genomics
• Ultimate goal is to identify causative genes
• Comparative genomics/Data integration will play a large part
  – Complexity, not volume
• Need to focus on infrastructure