WormBase: A Resource for the Biology & Genome of C. elegans
Lincoln D. Stein

WormBase Web Site

WormBase is a MOD
Model Organism Database
Repository for reagents
– Genetic stocks, vectors, clones
Genetic maps
Large-scale data sets
– Genome, EST sets, microarrays, interactions
Literature
Meetings, announcements, etc.

Other MODs
FlyBase (Drosophila)
WormBase (Caenorhabditis)
SGD (Saccharomyces)
TAIR (Arabidopsis)
MGD (Mus)
PlasmoDB (Plasmodium)
RatDB (Rattus)

C. elegans Fun Facts
1.5 mm length
2 week life span
959 cells
302 neurons
6 chromosomes
100,258,171 bp (95 Ns)
19,000 genes
2,000 mutant strains

WormBase Fun Facts
402,076 Sequences
121,671 Proteins
143,708 Clones
24,728 Primer pairs
15,022 Papers
12,552 Loci
2,944 Cells
14 Maps
7,200 RNAi results
332 Transgenes
19,713 Expression Patterns

WormBase Tour: Looking for MAP Kinase Kinase Studies
Found a Genetic Locus: mek-2
mek-2 RNAi Phenotype & Expr Pattern
mek-2 RNAi Phenotype
mek-2 Sequence View
mek-2 Protein View
mek-2 Genome View
mek-2 PCR Assays
mek-2 Bibliography
mek-2 Citation
VB1 Neuron
VB1 Synapses
VBx Neuroanatomy
Advanced Searches (1)
Advanced Searches (2)
Advanced Searches (3)
Ad Hoc Queries

Bulk FTP Downloads
Genomic sequence
– DNA (fasta)
– Feature files (GFF)
– C. briggsae DNA
ESTs (fasta)
WormPep
Non-coding RNAs
All the software (Open Source)

Recently Added: C. briggsae
C. elegans sequencing consortium (WashU + Sanger Center)
Whole genome shotgun + 12 Mb previously-finished BACs from WashU
142 scaffolds
N50 = 1,450 kb
21,000 predicted genes
11,000 genes orthologous to elegans

Accessing briggsae via elegans
Corresponding region in briggsae

Synteny/Orthology Display

WormBase Usage
[Chart: Total Hits and Data Requests per month, May-00 through Mar-02, each with a fitted curve; vertical axis 0 to 900,000.]

WormBase Hits by Domain
edu 48%
other 14%
com 9%
net 7%
de 6%
jp 6%
ca 5%
uk 5%

Major Referrers
elegans.swmed.edu, 37336
www.sanger.ac.uk, 35639
bookmarks, 30747
www.google.com, 19086
www.proteome.com, 14682
stein.cshl.org, 9560
google.yahoo.com, 7221
vermicelli.caltech.edu, 6181
volvox, 6173
wormbase.sanger.ac.uk, 5848
elegans.bcgsc.bc.ca, 5837

Top Pages
[Pie chart. Categories: Sequence, Locus, Genome Browser, Tree, Blast, Picture, Clone, Aligner, RNAi, Protein, Paper, Biblio, XML, Expr Profile. Slice sizes (not matched to categories in the source): 40%, 12%, 9%, 7%, 7%, 5%, 3%, 3%, 3%, 3%, 2%, 2%, 2%, 2%.]

How WormBase Works
[Diagram labels: You; Web server; Perl scripts; Images, Movies; Database access library; Genomic Data; ACeDB; MySQL.]

WormBase Information Workflow
[Diagram, built up over four slides. CalTech, Sanger, WashU, NCBI, and CGC each supply .ace files. Later builds add Sanger, then CSHL (www.wormbase.org), then CalTech (Caltech.wormbase.org).]

Curating a Paper
[Diagram labels: Clipping Service; Domain Expert; Gene Record; Cell Record; Mutant Record; Database Entry; .ACE File; .ACE Files; CalTechAce.]

Curating the Genome (1)
>CHROMOSOME_I
gcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagc
ctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcct
aagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaa
gcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagc
ctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcct
aagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaa
gcctaag…
Gene Prediction
Repeat Finding
EST Alignment
List of Features

Curating the Genome (2)
List of Features
CamAce
StlAce
ACeDB Sequence Editor

Curating Other Data Sets
Knockout Consortium
GO Consortium
C. elegans Microarray Consortium
RNAi Labs
ORFeome Project
CSHLAce

Build Process
[Diagram labels: CSHLAce; StlAce; CamAce; CalTechAce; integrate; reconcile; BuildAce; WormBase.]

The GMOD Project
Generic Model Organism Database
Generic MOD web site
Database schemas
Standard operating procedures
Annotation tools
Analysis tools
Visualization tools
http://www.gmod.org

Released Modules
Apollo genome annotation editor
GBrowse generic genome browser
PubSearch literature curation system
LabDoc SOP editor
CMap comparative map viewer
GOET ontology editor
Chado modular database schema

GBrowse
Zoomed Way In
Zoomed Way Way In
Zoomed Way Way Out
Keyword Search
Sequence Search
Third Party Annotations
Links to 3rd Party Web Sites
Upload Your Own Annotations
Sequence dumps & other reports

Extensively Customizable
End-user
– Turn tracks on and off, change order, change packing & labeling attributes (stored in cookie)
Data provider
– Change fonts, colors, text.
– Change overview: genetic map, contigs, coverage, karyotype.
– Define new tracks using simple config file.
– Tinker with track appearance to heart's content.

Adding a New Track
(a) Create a GFF file named “deletions.gff”
Chr1 targeted deletion 1293224 1294901 . . . Deletion d101k2
Chr1 targeted deletion 8239811 8241116 . . . Deletion d680k2
Chr2 targeted deletion 5866382 5866500 . . . Deletion d007k2
(b) Run the load_gff.pl script
> load_gff.pl -d example_database deletions.gff
Loading features…
Done. 3 features loaded.
(c) Add a new track “stanza” to the gbrowse configuration file
[Knockout]
feature = deletion
glyph = span
fgcolor = red
key = Knockouts
link = http://example.org/cgi-bin/knockout_details?$name
citation = These are deletion knockouts produced by the example knockout consortium (http://example.org/knockouts.html)

Extensively Extensible
[Architecture diagram labels: Plugins; gbrowse CGI script; Apache Web Server; Glyphs; Bio::Graphics library; BioPerl library; Bio::DB::GFF adaptor; Oracle adaptor (alpha test); Flat File adaptor; Chado adaptor; Oracle; MySQL; Flat Files.]

GBrowse on GenBank!
[Architecture diagram labels: Plugins; gbrowse CGI script; Apache Web Server; Glyphs; Bio::Graphics library; BioPerl library; Bio::DB::GFF; GenBank adaptor; Proxy Adaptor; MySQL; GenBank.]

B. burgdorferi via GenBank proxy

WormBase People
CalTech
– Paul Sternberg
– Erich Schwarz
– Raymond Lee
– Wen Xiao
Cold Spring Harbor
– Lincoln Stein
– Todd Harris
– Nansheng Chen
– Fiona Cunningham
Sanger Center
– Richard Durbin
– Daniel Lawson
– Keith Bradnam
Washington University
– John Spieth
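
Appendix sketch for "Adding a New Track": the slide shows the records in deletions.gff separated by spaces, whereas GFF files are normally tab-delimited. The Python sketch below is one possible way to turn the three records from step (a) into tab-delimited lines before running step (b). It is an illustration, not part of GBrowse: the names Feature, parse_record, and to_gff_line are hypothetical, and the column order (reference, source, method, start, end, score, strand, phase, group) is assumed from the usual nine-column GFF layout.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """One nine-column GFF record (hypothetical helper, not a GBrowse class)."""
    ref: str
    source: str
    method: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    group: str


# The three records from step (a), whitespace-separated as on the slide.
SLIDE_RECORDS = """\
Chr1 targeted deletion 1293224 1294901 . . . Deletion d101k2
Chr1 targeted deletion 8239811 8241116 . . . Deletion d680k2
Chr2 targeted deletion 5866382 5866500 . . . Deletion d007k2
"""


def parse_record(line: str) -> Feature:
    """Split one record into nine columns.

    Assumption: the first eight columns contain no spaces, and everything
    after them is the group column (for example "Deletion d101k2").
    """
    fields = line.split(None, 8)
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
    ref, source, method, start, end, score, strand, phase, group = fields
    start_i, end_i = int(start), int(end)
    if start_i > end_i:
        raise ValueError(f"start is after end: {line!r}")
    return Feature(ref, source, method, start_i, end_i,
                   score, strand, phase, group.strip())


def to_gff_line(feature: Feature) -> str:
    """Join the nine columns with tabs."""
    return "\t".join(str(value) for value in feature)


features = [parse_record(line)
            for line in SLIDE_RECORDS.splitlines() if line.strip()]
gff_text = "".join(to_gff_line(f) + "\n" for f in features)
```

Writing gff_text to a file named deletions.gff would give an input for step (b). The method column ("deletion") appears to be what the `feature = deletion` line of the [Knockout] stanza selects.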