wk1_day1_introduction_2010

Introduction
• Day 1: Introduction
• Day 2: Sequence Analysis
• Day 3: Databases
• Day 3: Dynamic Programming
mario@sanbi.ac.za
Goals of Bioinformatics
• Understand living cells and how they
function on a molecular level
• Done by analysing molecular sequence
and structural data
• Rationale is the “central dogma” of biology
Genomic Data (2009)
http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome
Genomic Data (2010)
http://www.ncbi.nlm.nih.gov/sites/entrez?db=genome
Bioinformatics Limitations
• Completely relying on the information is dangerous if the info is inaccurate
• Quality of bioinformatics predictions depends on
 quality of the data and
 sophistication of the algorithms
• Bioinformatics and experimental biology are complementary:
 Bioinformatics results need to be consistent with experimental biology
Bioinformatics Limitations
• Data (e.g. sequence, expression) may contain errors
• Downstream interpretation of sequence data will be wrong if the sequences or their annotation are wrong
• Many algorithms lack the capability and sophistication to truly reflect reality
• Outcome of computation also depends on available computing power
Definitions
• Sequence alignment
• Dynamic Programming
• Global/ Local Alignment
• Sequence Identity
• Phylogenetics
• Paralog/ homolog
• Proteomics
• Genomics
• Transcriptomics
• Annotation
• BLAST
• Sequence assembly
• Contig
‘Omics’
• Genomics
• Proteomics
• Transcriptomics
• Phylogenomics etc.
• Genomics
 Structural
 Functional
Structural Genomics
• Deals with genome structures
• Focus on study of
 Genome mapping
 Genome sequencing and assembly
 Genome annotation
 Genome comparison
Structural Genomics:
Genome mapping
• Identify relative locations of
 Genes
 Mutations or
 Traits
Structural Genomics:
Genome mapping
Increasing Resolution
 Cytological Map
 Genetic Map
 Physical Map
 DNA Sequence (e.g. agctggatttgcgcgcaa)
Image adapted from “Essential Bioinformatics” by Jin Xiong
For more info: Look at Chap 5 in “Genomes”, T.A. Brown (572.86 MAL) in the UWC
Library or the online version at http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.chapter.6196
Structural Genomics:
Genome sequencing
• Shotgun sequencing
 Genome is fragmented and cloned
 Random sequencing of both ends of cloned
DNA
 High numbers of random sequences
 It statistically ensures the whole genome is
covered
 Software used to assemble the random
fragments into a single, contiguous genome
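A rough calculation shows why a high number of random reads statistically covers the genome. The sketch below assumes reads land uniformly at random (the Lander-Waterman / Poisson approximation): with coverage c = N × L / G, the expected fraction of bases hit by at least one read is about 1 − e^(−c). The function name and the example numbers are illustrative assumptions, not values from the lecture.

```python
import math

def shotgun_coverage(num_reads, read_len, genome_len):
    """Return (coverage c, expected fraction of the genome covered).

    Assumes reads start at uniformly random positions, so the chance
    that a given base is missed by every read is roughly exp(-c).
    """
    c = num_reads * read_len / genome_len
    fraction_covered = 1 - math.exp(-c)
    return c, fraction_covered

# Hypothetical example: 80,000 reads of 500 bp from a 5 Mb genome.
c, covered = shotgun_coverage(80_000, 500, 5_000_000)
print(f"coverage = {c:.1f}x, expected fraction covered = {covered:.5f}")
```

Under this model 8x coverage leaves only a small fraction of bases unsequenced, which is why shotgun projects oversample the genome several times.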
Structural Genomics:
Shotgun sequencing
http://www.scq.ubc.ca/wp-content/uploads/2006/08/shotgun1.gif
Structural Genomics:
Genome sequencing
• Hierarchical sequencing
 100-300kb genomic fragments are cloned into BACs (bacterial artificial chromosomes)
 Using a physical map, the order and locations of BAC clones on the chromosome can be determined
 Successive sequencing of adjacent BAC clones results in coverage of the complete genome
Structural Genomics:
Hierarchical sequencing
http://www.scq.ubc.ca/wp-content/uploads/2006/08/topdownseq.gif
Structural Genomics:
Shotgun vs Hierarchical
Shotgun
Hierarchical
Structural Genomics:
Genome assembly
• Sequence fragments are stitched together
through the overlapping sequences
between fragments
CCAATAA
CACCATT
TATAAT
AATTGGCA
TTGAATA
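Stitching fragments together through their overlaps can be sketched as a greedy merge: repeatedly join the pair of reads with the longest suffix-prefix overlap. This is a toy illustration under strong assumptions (error-free reads, a single strand, no repeats, no read contained in another). The reads in the example are made up for the sketch and are not the fragments shown on the slide; real assemblers use more elaborate methods.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by longest overlap; returns the remaining contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge: several contigs remain
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads

# Hypothetical reads sampled from the sequence ATGGCGTACGTTAGC
print(greedy_assemble(["GCGTACGTT", "ATGGCGTA", "ACGTTAGC"]))
```

If no pair of reads overlaps by at least `min_len` bases, the function returns more than one sequence. Each of those is a contig in the sense used in the Definitions slide.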
Structural Genomics:
Genome annotation
• Happens before submission to database
 Gene prediction: GenScan, FgenesH
• Verify predictions
 BLAST search against sequence database
 Compare to experimentally determined cDNA and EST sequences: GeneWise, Spidey, SIM4, EST2Genome
 Manual checking by human curators
• Functional assignment
 BLAST homology searching against protein database
 Search protein motif and domain databases: Pfam and InterPro
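To give a feel for what gene prediction starts from, here is a deliberately naive open reading frame (ORF) scan: it reports stretches that begin with ATG and end at the first in-frame stop codon. This is a swapped-in simplification and not how the tools named above work (GenScan, for example, relies on probabilistic gene models). It scans the forward strand only, ignores introns, and the function name and `min_codons` threshold are assumptions made for the sketch.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end) pairs of forward-strand ORFs, 0-based, end exclusive.

    Each ORF runs from an ATG to the first in-frame stop codon (inclusive)
    and must contain at least min_codons codons, counting start and stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None
    return orfs

# Hypothetical sequence containing one short ORF: ATG AAA TTT TAG
seq = "CCATGAAATTTTAGGG"
for start, end in find_orfs(seq):
    print(start, end, seq[start:end])
```

Candidates found this way would still need the verification steps listed above (BLAST, cDNA/EST comparison, manual curation).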
Structural Genomics:
Genome annotation
http://hinvlite.sanbi.ac.za
Structural Genomics:
Genome comparison
• Comparison of
 Gene number
 Gene location
 Gene content
• Reveals extent of conservation between
genomes
• Reveals core set of genes crucial for
survival; the “Minimal Genome”
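At the level of gene content, the simplest comparison is a set operation: the genes present in every genome approximate the conserved core. The sketch below assumes genes have already been matched across genomes and given shared names, which in practice requires homology searching. The genome labels and gene lists are hypothetical.

```python
def core_genes(genomes):
    """Return the set of gene names present in every genome.

    genomes: dict mapping a genome label to an iterable of gene names.
    """
    gene_sets = [set(genes) for genes in genomes.values()]
    return set.intersection(*gene_sets) if gene_sets else set()

# Hypothetical gene content for three genomes
genomes = {
    "genome_A": ["dnaA", "rpoB", "gyrA", "lacZ"],
    "genome_B": ["dnaA", "rpoB", "gyrA"],
    "genome_C": ["rpoB", "dnaA", "nifH"],
}
print(sorted(core_genes(genomes)))
```

Genes outside the intersection are candidates for lineage-specific functions rather than the minimal genome.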
Structural Genomics:
Genome comparison
http://www.sanger.ac.uk/Software/ACT/
Functional Genomics
• Focus on gene function
 At the genome level
 Using high-throughput methods
• Conducted using
 Sequence-based methods
 Microarray-based methods
Functional Genomics:
Sequence-based
• Expressed Sequence Tag (EST)
 Provide rough estimate of actively expressed
genes under specific physiological conditions
• Serial Analysis of Gene Expression
(SAGE)
 Provides quantitative analysis of mRNA
expression
 Occurrence and quantity of a specific fragment
indicates level of gene expression
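The idea that tag occurrence indicates expression level can be sketched as simple counting: tally how often each tag was sequenced and express it as a fraction of all tags. The sketch assumes the tags have already been extracted from the sequenced concatemers and that a tag-to-gene lookup table exists. The tags, gene names and function name are hypothetical.

```python
from collections import Counter

def tag_expression(tags, tag_to_gene):
    """Return relative expression per gene as a fraction of all observed tags.

    Tags missing from tag_to_gene are pooled under the label 'unknown'.
    """
    counts = Counter(tags)
    total = sum(counts.values())
    expression = {}
    for tag, n in counts.items():
        gene = tag_to_gene.get(tag, "unknown")
        expression[gene] = expression.get(gene, 0.0) + n / total
    return expression

# Hypothetical 15 bp tags and annotation table
tag_to_gene = {"AAACCCGGGTTTAAA": "gene_A", "CATCATCATCATCAT": "gene_B"}
tags = ["AAACCCGGGTTTAAA"] * 3 + ["CATCATCATCATCAT"]
print(tag_expression(tags, tag_to_gene))
```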
Functional Genomics:
ESTs
• Selected mRNA sequences are reverse transcribed into cDNA clones
• cDNA clones are then sequenced
• Obtained from 5’ or 3’ end
• Typically 500bp long
http://www.ncbi.nlm.nih.gov/About/primer/est.html
Functional Genomics:
ESTs
• EST Limitations
 Often low quality
 Contamination (vector)
 Chimera
 Represent partial genes
• Despite this, ESTs are still widely used (www.ncbi.nlm.nih.gov/dbEST)
Functional Genomics
• EST Gene index construction
 Organise and consolidate ESTs so that the data can be used to extract full-length cDNAs
• Remove contaminants
• Mask repeats
• Cluster sequences
• Within a cluster, assemble overlapping ESTs into contigs/ consensus sequences
• Annotation: similar to process for genome
 Examples: UniGene, StackPack, TGI
Functional Genomics:
SAGE
• Short DNA fragment (15-20bp) is cut from a cDNA and used as unique marker for that transcript
• Fragments are concatenated, cloned and sequenced
http://www.sagenet.org/protocol/MANUAL1e.pdf
Functional Genomics:
Microarrays
• Immobilised probes (oligonucleotides or
cDNA) are ‘spotted’ on a chip
• Probes are representative of a complete
genome
• Fluorescently labelled cDNA from the organism is allowed to hybridise with the probes
• Intensity of fluorescence per spot reflects the amount of mRNA present
Proteomics:
Technology
• 2D-PAGE gel: Separates proteins based on charge and mass
 Melanie, CAROL, Comp2Dgel, SWISS-2DPAGE
• Mass Spectrometry (MS): peptide is fragmented, aspirated and the mass-to-charge ratio is determined
• Database searching: Using peptide fingerprint obtained from MS, a database can be searched
 ExPASy: AAcompIdent, TagIdent, PeptIdent, CombSearch
 ProFound, Mascot
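Searching a database with a peptide fingerprint can be sketched in three steps: digest each database protein in silico, compute the peptide masses, and count how many observed masses fall within a tolerance of a theoretical one. The sketch below is a simplified illustration and not the scoring used by any of the tools listed. It assumes trypsin digestion (cleavage after K or R unless followed by P), no missed cleavages or modifications, and uncharged monoisotopic masses. The proteins, tolerance and function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (rounded; check a reference table before real use).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini

def trypsin_digest(protein):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def count_matches(observed_masses, protein, tol=0.2):
    """Number of observed masses within tol Da of some theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in trypsin_digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed_masses)

def best_match(observed_masses, database, tol=0.2):
    """Name of the database protein explaining the most observed masses."""
    return max(database, key=lambda name: count_matches(observed_masses, database[name], tol))

# Hypothetical two-protein database and a fingerprint simulated from prot_A
database = {"prot_A": "GAKRPSKLL", "prot_B": "MTWKVVR"}
observed = [peptide_mass(p) for p in trypsin_digest(database["prot_A"])]
print(best_match(observed, database))
```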
Proteomics:
Technology
• Differential In-gel Electrophoresis (DIGE)
 Proteins from experimental and control samples are labeled with different colored dyes
 Differentially expressed proteins can be co-separated and visualised on the same gel
Proteomics:
Technology
• Protein Microarrays
 Chip contains immobilised proteome
 Used to study protein function
 Assay
   • Protein-protein interaction
   • Protein-DNA/ RNA interactions
   • Protein-ligand interactions
   • Enzyme activity
Proteomics:
Post-translational Modifications
• For activity, many proteins have to be covalently modified before or after the folding process
• Proteolytic cleavage, formation of disulfide bonds, addition of phosphoryl, methyl, acetyl groups, etc.
• Modifications impact protein function
• Bioinformatics can predict sites for modification
 AutoMotif, Cysteine, FindMod and GlyMod (available from ExPASy), RESID
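One simple way to flag candidate modification sites is to scan for a sequence motif. The sketch below uses the N-glycosylation sequon N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline) as an example. This is an illustration of motif scanning only and does not describe how the tools listed above work. A motif hit is a candidate, not proof of modification, and the example sequence is made up.

```python
import re

# Lookahead so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")

def find_n_glyc_sites(protein):
    """Return (1-based position of the Asn, matched 4-residue motif) pairs."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]

# Hypothetical sequence: NGSA and NKTW match, NPTL does not (proline after Asn)
print(find_n_glyc_sites("MNGSAANPTLNKTW"))
```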
Proteomics:
Protein Sorting
• Sub-cellular localisation is integral to
protein function
• Many proteins are only active after being transported to specific compartments
• Identifying protein localisation is important
in functional annotation
 SignalP, TargetP, PSORT
Proteomics:
Protein-protein Interactions
• Experimental determination
• Prediction based on
 Domain fusion
 Gene neighbours
 Sequence homology
 Phylogenetic information
 Hybrid methods