Data to Biological sense

Modeling Functional Genomics
Datasets
CVM8890-101
Lesson 1
13 June 2007 Bindu Nanduri
Lesson 1: Data to biological sense. What we are trying to achieve.
Introduction to functional genomics modeling strategies.
Transcriptomics and Proteomics
Why study gene expression changes?
Transcription is the predominant form of regulation
Northern Blots
Mol Vis. 1996 Nov 4;2:11
Microarrays
Basic concept:
Reverse Northern blot on a large scale
High throughput:
hybridize control and experimental samples
simultaneously using distinct fluorescent dyes
many assays can be carried out in parallel
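The two-dye comparison above is commonly summarized per spot as a log2 ratio of experimental to control intensity. The sketch below is only an illustration under stated assumptions: the gene names and intensities are hypothetical, the intensities are taken to be already background-corrected, and the `floor` parameter is an arbitrary guard against zero values.

```python
import math

def log2_ratio(experimental, control, floor=1.0):
    """Return log2(experimental / control) for one spot.

    Intensities below `floor` are raised to `floor` so the ratio stays
    defined; the floor value is an arbitrary choice for this sketch.
    """
    exp = max(experimental, floor)
    ctl = max(control, floor)
    return math.log2(exp / ctl)

# Hypothetical intensities: (experimental dye, control dye) per spot.
spots = {
    "geneA": (1600.0, 400.0),   # higher in the experimental sample
    "geneB": (500.0, 500.0),    # unchanged
    "geneC": (250.0, 1000.0),   # lower in the experimental sample
}

ratios = {gene: log2_ratio(e, c) for gene, (e, c) in spots.items()}
for gene, value in ratios.items():
    print(f"{gene}\t{value:+.2f}")
```

On a log2 scale a fourfold increase and a fourfold decrease are symmetric around zero, which is one reason this scale is usually preferred over raw ratios.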
Affymetrix oligo arrays design
Probe set: 11 to 16 25-mer probes per transcript
Probes usually target the most 3' region, often the UTR, near the poly(A) tail
http://www.affymetrix.com
Genomic Tiling Array Design
Multiple probes tiled along the genome sequence (5' to 3')
Center-to-center resolution: 38 bp
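As a rough illustration of what a 38 bp center-to-center resolution means, the sketch below lists probe start coordinates along a region. The 25-nt probe length is an assumption borrowed from the oligo array slide, and `tile_probe_starts` is a hypothetical helper, not part of any vendor tool.

```python
def tile_probe_starts(region_length, probe_length=25, step=38):
    """Return 0-based start coordinates of probes tiled at a fixed step.

    For probes of equal length, the start-to-start step equals the
    center-to-center distance.
    """
    if probe_length <= 0 or step <= 0:
        raise ValueError("probe_length and step must be positive")
    last_start = region_length - probe_length
    return list(range(0, last_start + 1, step))

starts = tile_probe_starts(200)
uncovered_gap = 38 - 25  # bases between the end of one probe and the next start

print(starts)
print(uncovered_gap)
```

Under these assumptions consecutive probes do not overlap: each 25-mer is followed by 13 bases that no probe covers.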
ISB Systems Biology Course 2006
Is mRNA level = protein level?
Is there a correlation?
Comparison of protein levels (MS, 2D gels) and RNA levels (SAGE) for 156 genes in yeast
For genes with similar mRNA levels, protein levels varied by up to 20X
For proteins with similar levels, mRNA levels varied by up to 30X
Highly expressed mRNAs correlate well with protein levels
Gygi et al. (1999) Mol. Cell. Biol.
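One way to put a number on "is there a correlation?" is to compute Pearson and Spearman coefficients on paired mRNA and protein measurements. The sketch below uses made-up values, not data from the cited study, and implements both coefficients with the standard library only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical abundance values for five genes (arbitrary units).
mrna = [1.0, 2.0, 3.0, 4.0, 5.0]
protein = [2.0, 1.0, 4.0, 3.0, 50.0]

print(f"Pearson:  {pearson(mrna, protein):.3f}")
print(f"Spearman: {spearman(mrna, protein):.3f}")
```

The rank-based coefficient is less sensitive to a single very abundant gene, which matters when a few highly expressed genes dominate the comparison.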
Expressed Sequence Tags
ESTs: pieces of DNA sequence (usually 200 to 500 nt)
generated by sequencing either one or both ends of an
expressed gene
Bits of DNA that represent genes expressed in certain
cells, tissues, or organs from different organisms
Can be useful as "tags" to fish a gene out of a portion of
chromosomal DNA by matching base pairs
http://www.ncbi.nlm.nih.gov/About/primer/est.html
EST Sequence Clustering
A gene can be expressed as mRNA many, many times, so ESTs
derived from this mRNA may be redundant:
many identical, or similar, copies of the same EST
Redundancy and overlap mean that when someone searches
dbEST for a particular EST, they may retrieve a long list
of tags, many of which may represent the same gene
The UniGene database automatically partitions GenBank
sequences into a non-redundant set of gene-oriented
clusters
http://www.ncbi.nlm.nih.gov/About/primer/est.html
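To make the idea of collapsing redundant ESTs concrete, here is a deliberately simple sketch that groups sequences sharing at least one exact k-mer, using a union-find structure. This is not the UniGene procedure; the sequences, the identifiers, and the choice of k = 8 are all assumptions for illustration, and reverse complements and sequencing errors are ignored.

```python
def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def cluster_ests(ests, k=8):
    """Group EST ids that share at least one exact k-mer (transitively).

    ests maps an id to its sequence; returns a sorted list of sorted id lists.
    """
    parent = {name: name for name in ests}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # k-mer -> first EST id in which it was observed
    for name, seq in ests.items():
        for kmer in kmers(seq.upper(), k):
            if kmer in first_seen:
                union(first_seen[kmer], name)
            else:
                first_seen[kmer] = name

    clusters = {}
    for name in ests:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical reads: est1 and est2 overlap, est3 is unrelated.
ests = {
    "est1": "ATGGCGTACGTTAGCCTAGGAT",
    "est2": "GTTAGCCTAGGATCCGATTAAC",
    "est3": "TTTTCCCCAAAAGGGGTTTTCC",
}

clusters = cluster_ests(ests)
print(clusters)
```

A production pipeline would use alignment-based similarity with quality filtering rather than exact k-mer sharing; the point here is only that overlapping tags end up in one gene-oriented group.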
ESTs: EST mapping to the genome, annotation, differential expression
Transcriptome: clustering, differential expression analysis
Proteome: differential expression analysis
Multiple data analysis platforms: proteomics, transcriptomics, EST analysis
LIST of elements
# | Reference | Accession | Peptides (Hits)
#1 | ALBU_CHICK Serum albumin precursor (Alpha-livetin) (Allergen Gal d 5) | 113575 | 255 (244 6 1 3 1)
#2 | APA1_CHICK Apolipoprotein A-I precursor (Apo-AI) | 113990 | 94 (84 4 4 2 0)
#3 | FIBA_CHICK Fibrinogen alpha/alpha-E chain precursor [Contains: Fibrinopeptide | 1706798 | 40 (37 1 1 0 1)
#4 | Mol_id: 1; Molecule: Ovotransferrin; Chain: Null; Synonym: Conalbumin; Hetero | 1127086 | 34 (30 3 0 1 0)
#5 | PB2 protein [Influenza A virus (A/chicken/Taiwan/7-5/99(H6N1))] | 9954387 | 28 (0 22 3 2 1)
#6 | C Chain C, Crystal Structure Of Native Chicken Fibrinogen | 8569623 | 15 (15 0 0 0 0)
#7 | I50711 complement C3 precursor - chicken | 2118406 | 11 (9 2 0 0 0)
#8 | TTHY_CHICK Transthyretin precursor (Prealbumin) (TBPA) | 136463 | 10 (10 0 0 0 0)
#9 | TIM2_CHICK Metalloproteinase inhibitor 2 precursor (TIMP-2) (Tissue inhibitor | 3122960 | 13 (0 11 1 0 1)
#10 | AAA6469 | 3645997 | 10 (9 0 1 0 0)
#11 | MYH9_CHICK Myosin heavy chain, nonmuscle (Cellular myosin heavy chain) (NMMHC) | 127759 | 14 (5 0 5 3 1)
#12 | S19188 myosin-V - chicken | 104779 | 13 (4 3 3 2 1)
#13 | FIBB_CHICK Fibrinogen beta chain precursor [Contains: Fibrinopeptide B] | 399491 | 9 (8 1 0 0 0)
#14 | A Chain A, Crystal Structure Of Wild Type Turkey Delta 1 Crystallin (Eye Lens | 14278427 | 12 (0 2 10 0 0)
#15 | type I polyketide synthase AVES 2 [Streptomyces avermitilis MA-4680] | 29827480 | 8 (6 1 0 1 0)
#16 | Hyperion protein, 419 kD isoform [Gallus gallus] | 4582571 | 11 (0 5 3 3 0)
#17 | vitronectin [Gallus gallus] | 1922282 | 7 (7 0 0 0 0)
#18 | CA36_CHICK Collagen alpha 3(VI) chain precursor | 1345652 | 10 (3 2 3 1 1)
#19 | paired-type homeobox Atx [Gallus gallus] | 18252581 | 11 (0 4 5 1 1)
#20 | I51298 transforming protein sno-N - chicken | 2147397 | 7 (6 1 0 0 0)
#21 | TP2A_CHICK DNA topoisomerase II, alpha isozyme | 13959708 | 7 (5 2 0 0 0)
#22 | ITA6_CHICK Integrin alpha-6 precursor (VLA-6) | 124948 | 12 (0 5 1 4 2)
#23 | glucose regulated thiol oxidoreductase protein precursor [Gallus gallus] | 22651801 | 11 (1 3 4 0 3)
#24 | spectrin alpha chain [Gallus gallus] | 1334744 | 9 (3 2 1 2 1)
#25 | ATP-binding cassette transporter 1 [Gallus gallus] | 18028983 | 9 (2 3 2 1 1)
#26 | cone-type transducin alpha subunit [Gallus gallus] | 11066401 | 8 (1 6 0 1 0)
#27 | condensin complex subunit [Gallus gallus] | 26801168 | 12 (0 2 4 4 2)
#28 | BA2B_CHICK Bromodomain adjacent to zinc finger domain 2B (Extracellular matri | 22653663 | 8 (3 1 3 1 0)
#29 | ryanodine receptor type 3 [Gallus gallus] | 1212912 | 9 (0 5 2 1 1)
#30 | type I polyketide synthase AVES 4 [Streptomyces avermitilis MA-4680] | 29827484 | 7 (2 4 1 0 0)
#31 | structural muscle protein titin [Gallus gallus] | 7024535 | 9 (0 5 2 0 2)
#32 | breast cancer susceptibility protein [Gallus gallus] | 19568157 | 7 (4 1 1 0 1)
#33 | FAS_CHICK Fatty acid synthase [Includes: EC 2.3.1.38; EC 2.3.1.39; EC 2.3.1.41 | 1345958 | 8 (1 4 1 1 1)
Modeling Function
Modeling function requires:
knowing the components of the system
(structural annotation)
knowing what these components do & how they
interact
(functional annotation)
http://www.protonet.cs.huji.ac.il/ProToGO/Introduction.html
Where do you begin?
Specifics
Transcriptome Analysis
Clustering
Similar expression patterns = similar regulation?
clustering algorithms help us identify patterns in complex data
Key Goal: identify co-regulated groups of genes
Hierarchical clustering
K-means clustering
Self organizing feature maps
Principal component analysis
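Of the methods listed, hierarchical clustering is the easiest to sketch without external libraries. The example below is a minimal agglomerative version using average linkage and a correlation-based distance (1 minus Pearson r); the gene names and expression profiles are hypothetical, and real analyses would normally rely on an established statistics package.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profiles have no defined correlation")
    return sxy / math.sqrt(sxx * syy)

def hierarchical_clusters(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Distance between genes is 1 - Pearson r; distance between clusters is
    the mean of all pairwise gene distances (average linkage).
    """
    clusters = [[name] for name in sorted(profiles)]

    def linkage(c1, c2):
        dists = [1.0 - pearson(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)

# Hypothetical expression profiles across four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],   # rising
    "g2": [2.0, 4.0, 6.0, 8.5],   # rising
    "g3": [4.0, 3.0, 2.0, 1.0],   # falling
    "g4": [8.0, 6.5, 4.0, 2.0],   # falling
}

groups = hierarchical_clusters(profiles, n_clusters=2)
print(groups)
```

Using a correlation-based distance groups genes by the shape of their profile rather than by absolute level, which fits the goal of finding co-regulated genes; whether shared patterns really imply shared regulation remains the open question posed on the slide.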
Proteomics
Qualitative: total number of identified proteins, data intersections
Quantitative: changes in protein expression
Proteomic data analysis tools
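The qualitative and quantitative views can both be sketched with plain dictionaries of protein identifications. In the example below the protein names and peptide-hit counts are hypothetical, the pseudocount of 1 is an arbitrary guard against division by zero, and no normalization for total counts is applied, which a real comparison would need.

```python
import math

def compare_identifications(sample_a, sample_b):
    """Qualitative view: which proteins were identified in which sample."""
    ids_a, ids_b = set(sample_a), set(sample_b)
    return {
        "shared": ids_a & ids_b,
        "only_a": ids_a - ids_b,
        "only_b": ids_b - ids_a,
    }

def log2_fold_changes(sample_a, sample_b, pseudocount=1.0):
    """Quantitative view: log2((hits_b + pseudocount) / (hits_a + pseudocount))."""
    changes = {}
    for protein in sorted(set(sample_a) | set(sample_b)):
        a = sample_a.get(protein, 0) + pseudocount
        b = sample_b.get(protein, 0) + pseudocount
        changes[protein] = math.log2(b / a)
    return changes

# Hypothetical peptide-hit counts per protein in two samples.
control = {"protA": 7, "protB": 15, "protC": 3}
treated = {"protA": 31, "protB": 15, "protD": 5}

overlap = compare_identifications(control, treated)
changes = log2_fold_changes(control, treated)

print(len(control), len(treated), sorted(overlap["shared"]))
for protein, value in changes.items():
    print(f"{protein}\t{value:+.2f}")
```

Peptide hits are only a crude proxy for abundance, so fold changes computed this way are a starting point for ranking candidates rather than a measurement.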
Use GO for:
Grouping gene products by biological function
Determining which classes of gene products are over-represented or under-represented
Focusing on particular biological pathways and
functions (hypothesis-driven data interrogation)
Relating a protein’s location to its function
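Over-representation of a GO class in a gene list is often assessed with a hypergeometric test. The sketch below computes the upper-tail probability directly from binomial coefficients; the gene counts are invented for illustration, and multiple-testing correction, which any real GO analysis needs, is left out.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the study list, k: study genes annotated with the term.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(n, K)
    # comb() returns 0 for impossible draws, so no extra bounds check is needed.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total

# Hypothetical counts: 20 background genes, 5 carry the GO term,
# and 3 of the 4 genes in the study list carry it.
p_over = hypergeom_upper_tail(k=3, n=4, K=5, N=20)
print(f"P(X >= 3) = {p_over:.4f}")
```

Under-representation can be tested the same way with the lower tail, P(X <= k), summing from 0 to k instead.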
Course Overview
Introduction to functional annotation. Orthologs and homologs;
clusters of orthologous genes (COGs) and the gene ontology
(GO); and how to find what functional annotation is available
Tools for functional annotation. Accessing functional data;
computational strategies to obtain more complete functional
annotation; the AgBase GO annotation pipeline.
Introduction to pathways analysis. Theory and strategies for
pathway analysis modeling in different species and tools for
pathway analysis.
Functional genomics modeling: prokaryotic and eukaryotic
examples
Some Useful Links
http://www.genomesonline.org/ (comprehensive access to information
regarding complete and ongoing genome projects around the world.)
http://www.geneontology.org/ (provides a controlled vocabulary to describe
gene and gene product attributes in any organism)
http://pir.georgetown.edu/ (integrated protein informatics resource for
genomics and proteomics)
http://www.pir.uniprot.org/ (protein database)
http://mips.gsf.de/ (maintains a set of generic databases as well as the
systematic comparative analysis of microbial, fungal, and plant genomes.)
http://www.ncbi.nlm.nih.gov/ (comprehensive resource for public databases,
literature and tools)
http://www.ebi.ac.uk/ensembl/ (System that maintains automatic annotation
of large eukaryotic genomes)
http://expasy.org/ (expert protein analysis system)
http://www.biocyc.org/ (BioCyc is a collection of 260 Pathway/Genome
Databases: metabolic pathways)
http://www.genome.jp/kegg/ ("biological systems" database integrating both
molecular building block information and higher-level systemic information)
http://pfgrc.tigr.org/index.shtml (functional genomics studies on a variety of
pathogens for which genomic sequence information is currently, or will soon
be, available)
http://www.tigr.org/ (comprehensive resource for microbial genomics)
http://www.cs.ualberta.ca/~bioinfo/PA/ (High throughput proteome
annotations)
http://garnet.arabidopsis.org.uk/systems_biology_tools.htm (Arabidopsis
resources)
http://www.systems-biology.org/002/ (systems biology portal)
http://www.ebi.ac.uk/biomodels/ (mathematical models of biological
interest)
http://www.genmapp.org/current_databases.html (species-specific
collections of genes and annotation)
http://bioinfo.bgu.ac.il/bsu/microarrays/links/ (Microarray analysis resources)
http://david.abcc.ncifcrf.gov/ (Database for Annotation, Visualization and
Integrated Discovery)
http://www.animalgenome.org/pigs/community/links.html (swine genetics
community)
http://www.biocarta.com/FeaturedProducts/index.asp (pathways and tools
for analysis)
http://www.genecards.org/index.shtml (database of human genes that
includes automatically-mined genomic, proteomic and transcriptomic
information, as well as orthologies, disease relationships, SNPs, gene
expression, gene function, and service links for ordering assays and
antibodies)
http://www.proteomecommons.org/ (proteomics tools)
http://harvester.embl.de/
http://bioinformatics.org/ (open access institute)
http://www.ihop-net.org/UniPub/iHOP/ (A network of genes and proteins
extends through the scientific literature)
http://www1.jcsg.org/psat/help/document.html (comparative analysis of
protein sequence)
http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi (genome-scale
algorithm for grouping ortholog protein sequences)
http://www.pathogenomics.ca/ortholuge/ (ortholog prediction program)
http://www.gene-regulation.com/pub/databases.html (transcription factor
database)
http://www.reactome.org/ (curated knowledgebase of biological
pathways)
http://www.biochemweb.org/systems.shtml (The Virtual Library
of Biochemistry, Molecular Biology and Cell Biology)
http://genome-www.stanford.edu/ (Stanford genomic resources)
http://www.softberry.com/berry.phtml (collection of tools for
annotation and analysis of sequences)
http://sosui.proteome.bio.tuat.ac.jp/sosuiframe0E.html
(prediction of transmembrane domains in proteins)
http://www.psort.org/psortb/ (subcellular localization predictions)
http://www.ch.embnet.org/software/TMPRED_form.html
(prediction of membrane-spanning regions and their orientation)
http://www.agbase.msstate.edu/ (functional analysis of
agricultural plant and animal gene products)