ppt - International Society for Computational Biology

Welcome to ISMB/ECCB
Genomics Session
July 31 – August 4, 2004
Ying Xu and Gene W. Myers
Co-Chairs, ISMB/ECCB Genomics session
Computational Genomics

Our genome encodes an enormous amount of information about our being
– our looks
– our size
– how our bodies work
– ….
– our health
– our behaviors
… who we are!
gcgtacgtacgtagagtgctagtctagtcgtagcgccgtagtcgatcgtgtgggtagtagctgatatgatgcgaggtaggggataggatagcaaca
gatgagcggatgctgagtgcagtggcatgcgatgtcgatgatagcggtaggtagacttcgcgcataaagctgcgcgagatgattgcaaagragtt
agatgagctgatgctagaggtcagtgactgatgatcgatgcatgcatggatgatgcagctgatcgatgtagatgcaataagtcgatgatcgatgatg
atgctagatgatagctagatgtgatcgatggtaggtaggatggtaggtaaattgatagatgctagatcgtaggtagtagctagatgcagggataaac
acacggaggcgagtgatcggtaccgggctgaggtgttagctaatgatgagtacgtatgaggcaggatgagtgacccgatgaggctagatgcgat
ggatggatcgatgatcgatgcatggtgatgcgatgctagatgatgtgtgtcagtaagtaagcgatgcggctgctgagagcgtaggcccgagagga
gagatgtaggaggaaggtttgatggtagttgtagatgattgtgtagttgtagctgatagtgatgatcgtag …….
Computational Genomics

As technologies improve, we are able to extract more and more
information encoded in a genome
[Figure: levels of biological data: genes, proteins, complexes, pathways, whole cell, organs, community]
Computational Genomics

While the ultimate goal of “functional genomics” is to link
behavior of cells, organisms, and populations to the information
encoded in the genome, “computational genomics” is mainly
about identifying and characterizing the parts-lists of complex
biological systems
[Figure labels: genomics, proteomics, transcriptomics, metabolomics]
Computational Genomics

Genetic parts-list encoded in a genome
– genome sequence and variations
– genomic structures
– protein-coding genes
– RNA-coding genes
– pseudogenes
– homologs/orthologs/paralogs
– promoters/terminators
– regulatory elements/binding motifs
– transposable elements
– …….
Computational Genomics

To identify and characterize these elements, a large number of
computational techniques have been developed and widely
used in biological research
– bio-sequence comparison
– gene prediction
– prediction of orthologous genes
– prediction of promoters
– prediction of transcription factor binding motifs
– prediction of operons
– prediction of genome rearrangement
– prediction of simple and complex repeats
– prediction of SNPs and haplotype analysis
– …….
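
As an illustration of the first technique in the list, bio-sequence comparison, here is a minimal sketch of a global alignment score in the Needleman-Wunsch style. The function name and the scoring values (match, mismatch, linear gap penalty) are assumptions chosen for the example, not parameters taken from the talk.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of sequences a and b (linear gap penalty).

    Scoring values are illustrative assumptions, not from the slides.
    """
    # prev[j] = best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] against an empty prefix of b costs i gaps
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch)
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Or place a gap in one of the two sequences
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

Only two rows of the dynamic programming table are kept, so memory grows with the length of one sequence while time grows with the product of both lengths. Genome-scale comparison generally needs faster, heuristic or index-based methods.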
Computational Genomics

Computational genomics is playing an increasingly
important role in modern biology
– suggesting biological functions of predicted genes, through
homology search
• e.g., NF1 regulates Ras in human
– suggesting possible genes associated with a particular
disease, and hence reducing the search space for relevant
genes
• e.g., genes involved in retinal disease
– suggesting an organism’s biology through genome
comparison
• e.g., M. genitalium produces its macromolecules from preformed
precursors that are transported into its cytoplasm from its eukaryotic
host cell
Computational Genomics
– suggesting candidate components and their possible
interactions in a biological pathway/network
• e.g., prediction of operons in microbial genomes
– providing powerful tools for studies of biological evolution
• sequence/genome comparison
• phylogenetic profile analysis
– playing key roles in the human and other genome
projects
• genome assembly
• protein-coding gene prediction
(considered major milestones in the human
genome project)
• genome annotation
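
The operon-prediction example on this slide can be sketched with a simple, commonly used heuristic: adjacent genes on the same strand that are separated by a short intergenic gap are grouped as candidate operons. The `Gene` record, the function name, and the 50 bp threshold are hypothetical choices for this illustration. The talk does not specify a method.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    # Hypothetical minimal gene record for this illustration
    name: str
    start: int   # first base of the gene
    end: int     # last base of the gene
    strand: str  # "+" or "-"


def group_candidate_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group neighboring same-strand genes whose gap is at most max_gap bases."""
    ordered = sorted(genes, key=lambda g: g.start)
    groups: List[List[Gene]] = []
    for gene in ordered:
        if groups:
            last = groups[-1][-1]
            # Extend the current group only if strand matches and the gap is short
            if gene.strand == last.strand and gene.start - last.end <= max_gap:
                groups[-1].append(gene)
                continue
        groups.append([gene])
    return [[g.name for g in group] for group in groups]
```

A distance-only rule like this yields a candidate list rather than a confirmed operon map. Additional evidence, such as conservation of gene order across genomes, would normally be combined with it.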
Challenges in Computational Genomics

One challenge comes directly from the sheer amount of
sequence data and the rate at which the data is being generated
– 207 genomes have been sequenced
– close to 1,000 genomes are being sequenced
• 506 prokaryotic genomes
• 418 eukaryotic genomes

The amount of information potentially derivable through
comparative genome analysis could be enormous, given that
functional elements are often conserved among “related” genomes
– how to derive it effectively?
Challenges in Computational Genomics

Prediction of protein-coding genes still represents a
challenging problem
– accurate prediction of exon/intron boundaries
– prediction of alternatively spliced gene forms

Protein-coding genes account for ~3% of the human
genome. What and where are the other “functional
elements” in the rest of the genome?
– how to identify them?
– how to (help to) predict their functions?
Challenges in Computational Genomics

Identification of RNA-coding genes
– what are the identifiable characteristics of RNA genes?

In particular, identification of small regulatory RNAs
– short interfering RNAs (siRNA)
– microRNA (miRNA)
– small modulatory RNA (smRNA)

Identification of regulatory elements/binding sites
– transcription regulatory binding sites
– splice factor binding sites
– other classes of regulatory elements?
Challenges in Computational Genomics

Identification of other types of functional elements
– transposons
– ….

Identification of genome variations – polymorphisms
– identification of SNPs
– prediction of haplotype blocks

Recognition of genome structures
– operons, regulons in microbes
– genomic structures in eukaryotic genomes
Challenges in Computational Genomics

The genome is not a linear sequence; it is a 3D structure!
– accurate identification and characterization of functional
elements by looking at the genome as a 3D DNA structure
…. and many other outstanding challenges!
Papers submitted to Genomics Session

69 papers submitted to the Genomics session

10 papers selected for presentation
– 8 long papers
– 2 short papers

Acceptance rate: 14.5%
Papers submitted to Genomics Session

Papers submitted to the ISMB/ECCB Genomics session provide
hints about the hot research areas in genomics
– haplotype prediction and applications
– prediction of non-protein-coding genes, particularly RNA genes
– prediction of regulatory binding sites
– characterization of genomic structures
– …….
These areas received the largest number of paper submissions
in the “genomics” area
10 papers selected from 69 submissions

Talk #12: Splice site identification by idlBNs, R Castelo and R Guigo

Talk #14: Improved techniques for the identification of pseudogenes, L Coin and R.
Durbin

Talk #16: Exploiting conserved structure for faster annotation of non-coding RNAs
without loss of accuracy, Z. Weinberg and L. Ruzzo

Talk #18: Functional inference from non-random distribution of conserved predicted
transcription factor binding sites, C. Dieterich, S. Rahmann and M. Vingron

Talk #19: CHAINER: software for comparing genomes, MI Abouelhoda and E.
Ohlebusch

Talk #20: Genomic features in the breakpoint regions between syntenic blocks, P
Trinh, A McLysaght and D. Sankoff

Talk #22: Finding conserved primer pair candidates between two genomes using
scalable genome joins with the MoBIoS, W. Xu, W. Briggs, J. Padolina, W. Liu, CR
Linder and D. Miranker

Talk #24: High density linkage disequilibrium mapping using models of haplotype
block variation, G. Greenspan and D. Geiger

Talk #26: Into the heart of darkness: large scale clustering of human non-coding
DNA, G Bejerano, D Haussler and M. Blanchette

Talk #28: CIS: compound importance sampling method for transcription factor
binding site p-value estimation, Y. Barash, G. Elidan, T. Kaplan, and N. Friedman
Selected Papers

Prediction of non protein-coding genes (3)
– RNA genes (2)
– pseudo genes

Prediction of binding or functional sites (3)
– transcription factor binding sites (2)
– splice sites

Genome comparison and structure analysis (3)
– genome comparison (2)
– characterization of genomic structures

Applications of haplotype blocks
Genomics Papers Presented at ISMB’94

“Computational genomics” papers from ISMB’94 (10 years ago!)
– genome assembly
– restriction map construction
– genetic map construction
– genome alignment
– multiple sequence alignment
– finding repeats in C. elegans
– prediction of internal exons in human genome
– exon/intron parsing
– (protein) gene structure prediction
The field has clearly evolved quite a bit!
ISMB/ECCB Genomics Session

Part I
– Lomond Auditorium, starting at 2:50pm on August 1st
– 4 long talks

Part II
– Clyde Auditorium, starting at 9:20am on August 2nd
– 2 short talks
– 4 long talks
Acknowledgements

Reviewers for the Genomics Session
– Andy Baxevanis
– Ewan Birney
– Jeremy Buhler
– Liming Cai
– Dannie Durand
– Roderic Guigo
– Robert Giegerich
– Dan Gusfield
– Eran Halperin
– Daniel Huson
– Steve Jones
– Daphne Koller
– Anders Krogh
– Shirley Liu
– Mihai Pop
– Isidore Rigoutsos
– Marie-France Sagot
– Victor Solovyev
– Martin Tompa
– Dong Xu
Genomics Session

Enjoy the talks!