Less is more
Approaches to biologist-driven analysis and next-generation sequencing data
Paul Gordon
Genome Canada Bioinformatics Platform
University of Calgary

What am I doing here?
• Genome Canada Bioinformatics Platform
• Next Generation Sequencing
• Next Generation Web
• Future challenges

Better tech: less DNA, more sequence
[Figure labels: 44 μm; 70 nm]

PhytoMetaSyn

Sprockets: Hierarchical Gene Models from ESTs
Developed in collaboration with BASF Plant Sciences

Genozymes

Hydrocarbon Metagenomics

CAVEman
• Java 3D-based, world-first complete 3D human body atlas (adult male)
  – 2,335 organs, hierarchical organization following Terminologia Anatomica
• Numerous applications involving mapping of genetic and disease data
• More information: http://cave.ucalgary.ca/caveman

Pharmacokinetics visualization (absorption-distribution-metabolism-excretion of Aspirin)

Patient MRI stack mapped onto atlas and registered by landmarks

Exploring gene expression patterns

Basic Research
• ING-protein interactions (cancer and ageing-related proteins)
• Archaeal UV-light response
• Large-scale human genome organization

Research Applications
• Desulf.: mechanisms of oil pipeline corrosion and its prevention
• Kidney transplants: improved rejection diagnostics in Edmonton
• Mad cow disease/chronic wasting disease: live diagnostics

DNA Diagnostics Discovery for Mad Cow
[Figure labels: Preinoculation; Preclinical; Clinical; Controls; Control animal #6; Ball toy. Photo: S. Czub, CFIA Lethbridge]

Motif finding (elk dataset)
• 61 blood samples
• Next-gen
• 107 million base pairs
• 432 billion pairwise alignments (657,431²)
• Decypher hardware accelerator
• 1,082,019 25-mers or smaller
• Decypher hardware accelerator
• Uninfected: 152,317
• Infected: 132,417
• Thousands of animal coverage/timepoint combos (CPU intensive)
• Infected: 3 universal

Motif Results

Possible mode of action?
[Diagram labels: PrPsc(+?); Infectious agent; Activation; Feedback; Retrovirus; PrP; Integration]
Carp et al., EMBO J. 2006; Leblanc et al., EMBO J. 2006; Stengel et al., Biochem. Biophys. Res. Commun. 2006; Lee et al., Biochem. Biophys. Res. Commun. 2006; etc.
[Diagram labels: Endogenous Retrovirus?; ↑EVI1; Consistent with protein-only evidence…; Neurovirulent? (e.g. M.L. Labat 1999); ↑PLZF; ↓PLZF-controlled genes; Vacuole; CNA Export; Circulating Nucleic Acids; Cell death; Nucleoprotein complexes; PrP Amyloid fibres (Manuelidis et al., PNAS 2007); Protected promoters (Motifs A & B); Virus particles? ~25 nm]

Better tech: less DNA, more sequence
Better tech: less input, more results

Generate Manuscript Now

Where are we at?
[Figure labels: Life Sciences Emerging Technologies; Web; Bioinformatics; Semantic Web. Source: Gartner Inc.]

How software works…
Parameters/Input (gene name, DNA sequence, QTL…)
Functions/Rules
Results/Output (article, allele,…)

The problem with the Web
[Figure labels: 1998; Now]

"Once you label me, you negate me."
Søren Kierkegaard

Bluejay
http://bluejay.ucalgary.ca
• Comparative genomics
• Gene expression integration
• BioMoby linking
• Waypoints

The task at hand
[Diagram labels: (biologist); ACCGT…; Sequencer Data File (Binary); Known Proteins; (computer scientist); BLAST Report (related proteins); DNASequence; NCBI_gi; Sequence_Alignment]

Audience
Self-perception of computer skills (scale from Amoeba to God)

The need for shoehorns
• The current vision of the Semantic Web intends to create a new structure starting up with no reference to its vast, functioning, but more primitive predecessor … things just don’t happen like that

All the Web as Workflows

Seahawk prompting
[Figure labels: Proxied Web page; Drag ‘n’ drop; Seahawk]

What’s Ahead?
"The more a man learns, the more he realizes how little he knows"

Semantic Web
http://www.uniprot.org/tissues/229
http://purl.uniprot.org/po/0009009

Take home messages
• As tech improves, we can ask better questions
• We will need shoehorns to access existing resources for the foreseeable future