Daniel Ence
Yandell Lab
University of Utah

Annotations are descriptions of features of the genome
Structural: exons, introns, UTRs, splice forms etc.
 Coding & non-coding genes


Annotations should include evidence trail


Assists in quality control of genome annotations
Examples of evidence supporting a structural
annotation:
Ab initio gene predictions
 ESTs
 Protein homology
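
One way to picture an evidence trail is as a list of evidence records attached to each structural annotation, so that unsupported gene models can be flagged during quality control. The sketch below is only an illustration: the class names, field names, and coordinates are hypothetical and are not MAKER's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str  # e.g. "ab_initio", "EST", "protein_homology" (labels are assumptions)
    start: int   # 1-based inclusive start on the contig
    end: int     # 1-based inclusive end on the contig


@dataclass
class GeneAnnotation:
    gene_id: str
    start: int
    end: int
    evidence: list = field(default_factory=list)

    def supported_by(self):
        """Return the sorted evidence types that overlap this annotation."""
        return sorted({e.source for e in self.evidence
                       if e.start <= self.end and e.end >= self.start})

    def is_unsupported(self):
        """True when no attached evidence overlaps the annotation."""
        return not self.supported_by()


# Hypothetical gene model with one overlapping EST and one distant protein hit.
gene = GeneAnnotation("gene1", 100, 900, [
    Evidence("EST", 120, 400),
    Evidence("protein_homology", 5000, 5200),
])
print(gene.supported_by())    # ['EST']
print(gene.is_unsupported())  # False
```

Keeping the evidence next to the annotation lets a curator ask "what supports this gene?" without rerunning the pipeline.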


Protein Domains and Families




InterPro
Pfam
GO and other ontologies
Pathways
SUCCESS
>Smg5
MEVTFSSGGSSNASSECAIDGGTNRCRGL
EPNNGTCILSQEVKDLYRSLYTASKQLDD
AKRNVQSVGQLFQHEIEEKRSLLVQLCKQ
IIFKDYQSVGKKVREVMWRRGYYEFIAFV
MAKER
An annotation pipeline and genome-database management tool for “next-generation” genome projects
User Requirements: Can be run by a single individual with little bioinformatics experience
System Requirements: Can run on Linux or Mac OS X based systems
Program Output: Output is compatible with popular annotation tools like WebApollo and JBrowse
Availability: Free for the academic community (including source code)
• mRNA-seq integration
• Integrating new evidence into existing databases
• Update/revise legacy annotation sets
Legacy Annotation Set 1
Legacy Annotation Set 2
Legacy Annotation Set n
new data
current assembly
• Identify legacy annotation most consistent with new data
• Automatically revise it in light of new data
• If no existing annotation, create new one
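
The three steps above can be sketched as a small decision routine. This is a simplified stand-in, not MAKER's actual scoring: it assumes exons are given as inclusive (start, end) pairs and uses a plain base-level Jaccard overlap as the consistency measure. The function names and the 0.5 threshold are hypothetical.

```python
def overlap_fraction(annotation_exons, evidence_exons):
    """Base-level Jaccard overlap between two exon sets (inclusive coordinates)."""
    ann = {b for s, e in annotation_exons for b in range(s, e + 1)}
    evd = {b for s, e in evidence_exons for b in range(s, e + 1)}
    if not ann and not evd:
        return 0.0
    return len(ann & evd) / len(ann | evd)


def choose_action(legacy_sets, evidence_exons, min_score=0.5):
    """Pick the legacy annotation most consistent with the new data.

    Returns ("revise", name, score) for the best-scoring legacy set, or
    ("create", None, score) when nothing reaches min_score.
    """
    best_name, best_score = None, 0.0
    for name, exons in legacy_sets.items():
        score = overlap_fraction(exons, evidence_exons)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < min_score:
        return ("create", None, best_score)
    return ("revise", best_name, best_score)


# Hypothetical legacy models for one locus and new mRNA-seq derived exons.
legacy = {
    "set1": [(100, 200), (300, 400)],
    "set2": [(100, 200), (300, 450)],
}
new_data = [(100, 200), (300, 450)]
print(choose_action(legacy, new_data))        # ('revise', 'set2', 1.0)
print(choose_action(legacy, [(1000, 1100)]))  # ('create', None, 0.0)
```

The revision step itself (adjusting exon boundaries to match the new data) is left out; the sketch only covers choosing between revising an existing model and creating a new one.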
• Supports Message Passing Interface (MPI), a communication protocol for computer clusters which essentially allows multiple computers to act like a single powerful machine.
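
To give a feel for how an MPI job spreads work, the sketch below divides a list of contigs among processes in round-robin fashion. It is an illustration only and does not reproduce MAKER's own scheduling. In a real MPI program each process would learn its rank and the total process count from the MPI library (for example through the mpi4py package, which is an assumption here); the sketch simulates three ranks in plain Python so that it runs anywhere.

```python
def contigs_for_rank(contigs, rank, size):
    """Return the round-robin share of contigs handled by one MPI rank."""
    return contigs[rank::size]


# Hypothetical contig names; any list of sequence identifiers would do.
contigs = ["chr1", "chr2", "chr3", "chr4", "chr5"]
size = 3  # number of simulated MPI processes

for rank in range(size):
    print(rank, contigs_for_rank(contigs, rank, size))
# 0 ['chr1', 'chr4']
# 1 ['chr2', 'chr5']
# 2 ['chr3']
```

Every contig lands on exactly one rank, so the processes can annotate their shares independently and the results can be merged at the end.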

MAKER-P



Plant
Parallelized
Publication
Publication:
MAKER-P: a tool-kit for the rapid creation, management, and quality control of plant genome annotations

Campbell, Law, Holt et al., Plant Phys. 2013

Atmosphere




MPI enabled for parallel computation
Maximum instance size 16 CPU
http://www.iplantcollaborative.org
TACC Lonestar
Supercomputer with 22,656 CPU
MPI enabled for parallel computation
Can complete entire rice genome in ~2 hrs (1,152 cores)
96 CPU per chromosome

Currently being integrated into the iPlant Discovery Environment: http://www.iplantcollaborative.org
XSEDE: https://www.xsede.org
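
The rice figures on the TACC Lonestar slide fit together as simple arithmetic. The check below assumes the 12 chromosomes of rice, a number the slide does not state, and treats "CPU" and "cores" as the same unit.

```python
RICE_CHROMOSOMES = 12      # assumption: rice chromosome count, not given on the slide
CPUS_PER_CHROMOSOME = 96   # from the slide
LONESTAR_CPUS = 22_656     # from the slide

total_cores = RICE_CHROMOSOMES * CPUS_PER_CHROMOSOME
share = total_cores / LONESTAR_CPUS

print(total_cores)      # 1152
print(f"{share:.1%}")   # 5.1%
```

So the rice run used about one twentieth of the machine.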
Performance on the Zea mays (maize) genome (~2 Gb)



8,640 cpus on TACC
~37 hours with queue (runtime 14 hours 37 minutes)
Throughput of > 1 Gb/hour
Assembly & Annotation at iPlant
Genome input
Genome Assembly: ALLPATHS-LG, Newbler, SOAPdenovo, SCARF, ABySS, Oasis, HMM-assembler, Velvet, Ray
Evidence input
Transcriptome Assembly, de novo: Trinity, SOAPdenovo-Trans, Velvet/Oasis, Trans-ABySS
Transcriptome Assembly, reference-guided: Tophat, Cufflinks
Conversion tools: tophat2gff, cufflinks2gff
SNAP Training: Fathom/Forge
MPI-MAKER on TACC Lonestar (22,656 cores): Augustus, SNAP, Exonerate, BLAST, RepeatMasker
MAKER output
Conversion tools: maker2jbrowse, maker2zff
Visualization: JBrowse, Web-Apollo
Post Annotation: InterProScan, InterPro2GO
ncRNA Annotation: miRDeep2
Data Commons: Reference genomes, Reference annotations, SNAP HMM models, Repeat Libraries, Transcriptome data
Key: DE, TACC, in progress



non-coding RNA support
better repeat annotation
better pseudogene annotation

tRNAscan support
Will run from inside MAKER
Doesn’t install automatically

snoScan support
Can supply data file for annotation
Will run from inside MAKER automatically
Doesn’t install automatically

In the past:
Custom repeat library
Generated de novo with RepeatModeler

Now:
RepeatModeler, but better.
Step-by-step guide available at:
http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic
To be automated in the future




Expanded ncRNA support
MAKER-EVM
Expanded Augustus/BAM support
Better integration with iPlant’s Discovery Environment

More of a feeling than a to-do list
lncRNAs
Haas et al., Genome Biology 2008
Cantarel et al., 2008; Holt and Yandell, 2010
EVM
Cantarel et al., 2008; Holt and Yandell, 2010




MAKER gives Augustus hints
Augustus can take better hints from a BAM file
Users will be able to supply a BAM file in the MAKER control file
BAM files open up a world of possibilities!
• Trichomonas vaginalis
• Pinus taeda
• Apis dorsata
• Cronartium quercuum
• Common Pigeon
• Cardiocondyla obscurior

• Southern right whale
• Tardigrade
• Spotted Gar
• Gibbon
• Turkey
• Nine-spined stickleback
• Golden Eagle
• I’d like to thank and recognize all contributions from Mark Yandell at the University of Utah, as well as lab members Barry Moore, Michael Campbell, Daniel Ence, and former lab member Meiyee Law.
• Special thank you to Scott Cain, Robert Buels, and Amelia Ireland.
• I would also like to recognize collaborator Ian Korf at UC Davis.
• MAKER-P and integration into iPlant infrastructure:
• Josh Stein (CSHL)
• Kevin Childs (MSU)
• Gaurav Moghe (MSU)
• David Hufnagel (MSU)
• Jikai Lei (MSU)
• Rujira Achawanantakun (MSU)
• Carolyn Lawrence (USDA-ARS CICGRU)
• Doreen Ware (CSHL)
• Shin-Han Shiu (MSU)
• Yanni Sun (MSU)
• Ning Jiang (MSU)
• Matt Vaughn (TACC)
• Dian Jiao (TACC)
• Zhenyuan Lu (CSHL)
• Nirav Merchant (U. Arizona)
• Pinus taeda genome project:
• Jill Wegrzyn (UConn)
• John Liechty (UC Davis)
• Kristian Stevens (UC Davis)
• Carol Loopstra (Texas A&M)
• Hans Vasquez-Gross (UC Davis)
• Brian Lin (UC Davis)
• Matt Dougherty (UC Davis)
• Jacob Zieve (UC Davis)
• Pedro J Martinez-Garcia (UC Davis)
• James A Yorke (U. Maryland)
• Marc Crepeau (UC Davis)
• Daniela Puiu (Johns Hopkins)
• Steven L Salzberg (Johns Hopkins)
• Pieter J. deJong (CHORI-BACPAC Resources Center)
• Keithanne Mockaitis (Indiana University)
• Dorrie Main (Washington State)
• Chuck Langley (UC Davis)
• David Neale (UC Davis)
MAKER-devel community
Funding from the NHGRI through an R01 grant entitled "Software for the creation and quality control of genome annotations."
Mailing List:
maker-devel at yandell-lab.org
Download:
http://yandell-lab.org/software/maker.html
Email me:
dence at genetics.utah.edu