Genome-wide Functional Linkage Maps

Genome-wide Functional Linkage Maps
Methods for inferring functional linkages: Complexes, Pathways
The Genome-wide functional linkage Map in M. tb
Assessing accuracy of functional linkages
Functional linkages in structural genomics
Analyzing parallel pathways
The DIP and ProLinks databases
[Figure: genome-wide functional linkage map. Axes: TB Gene A vs. TB Gene B (gene positions 0-4000). Linkage methods shown: Rosetta stone, Phylogenetic profiles, Gene neighbors, Operon method, (Microarray method)]
Diphtheria Toxin
Dimer vs. Monomer
Bennett et al., PNAS, Vol. 91, 3127-3131 (1994)
Rosetta Stone Assumption:
Fusion of functionally-linked domains
[Diagram: domains A and B in organism 1; homologs A' and B' in organism 2]
Implies proteins A and B may be functionally linked
Marcotte et al. (1999) Science, 285, 751
PHYLOGENETIC PROFILE METHOD
Pellegrini et al (1999) PNAS 96, 4285
The Gene Neighbor Method
for Inferring Functional Linkages
[Diagram: chromosomal positions of genes A, B and C in genome 1, genome 2, genome 3, genome 4, ...]
A statistically significant correlation is observed between the positions of proteins A and B across multiple genomes. A functional relationship is inferred between proteins A and B, but not between the other pairs of proteins (A and C, B and C).
OPERON or GENE CLUSTER method of inferring functional linkages in the genome of Mycobacterium tuberculosis
[Diagram: adjacent genes A, B and C on the chromosome]
Distance threshold   Number of predicted operon groups   # of genes with links   # of functional linkages
0 bp                 542                                 1279                    2034
25 bp                792                                 2071                    4442
50 bp                879                                 2420                    5890
75 bp                919                                 2665                    7026
100 bp               933                                 2870                    8468
The 100 bp threshold is chosen because it gives the
broadest coverage consistent with high accuracy
Research of Michael Strong
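The distance-threshold grouping on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes that consecutive genes are grouped when they lie on the same strand and their intergenic gap is at most the threshold, and that every pair of genes within a group counts as one linkage. The same-strand rule, the function names, and the toy coordinates are all assumptions made for illustration.

```python
from itertools import combinations

def operon_groups(genes, threshold_bp=100):
    """Group consecutive same-strand genes whose intergenic gap is <= threshold_bp.

    genes: list of (name, start, end, strand) tuples, sorted by start.
    Returns gene-name lists for groups with at least two genes.
    """
    groups, current = [], []
    for gene in genes:
        if current:
            _, _, prev_end, prev_strand = current[-1]
            _, start, _, strand = gene
            gap = start - prev_end - 1  # negative when the genes overlap
            if strand != prev_strand or gap > threshold_bp:
                if len(current) > 1:
                    groups.append(current)
                current = []
        current.append(gene)
    if len(current) > 1:
        groups.append(current)
    return [[g[0] for g in grp] for grp in groups]

def operon_linkages(groups):
    """Assumption: every pair of genes within a predicted group is one linkage."""
    return [pair for grp in groups for pair in combinations(grp, 2)]

# Hypothetical toy coordinates (not real M. tuberculosis annotations).
toy = [
    ("geneA", 1, 900, "+"),
    ("geneB", 950, 1800, "+"),    # 49 bp gap after geneA
    ("geneC", 1880, 2500, "+"),   # 79 bp gap after geneB
    ("geneD", 2600, 3300, "-"),   # opposite strand, so a new run starts
    ("geneE", 3900, 4500, "-"),   # 599 bp gap, too far to group
]
groups = operon_groups(toy, threshold_bp=100)
links = operon_linkages(groups)
```

Lowering `threshold_bp` in this toy example splits the group, which mirrors the trend in the table: tighter thresholds give fewer groups, fewer linked genes, and fewer linkages.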
Network Interaction Map vs.
Genome-Wide Functional
Linkage Map
[Figure: Whole Genome Functional Linkage Map (RS, PP, GN, OP overlap). Axes: TB Gene A vs. TB Gene B, gene positions 0-4000]
Functional linkage between
Gene A and Gene B
Strong, Graeber et al. (2003) Nucleic Acids Research, 31, 7099
[Figure 7 (M. Strong, T. Graeber et al.): functional linkage maps by method, each plotted as TB Gene A vs. TB Gene B (gene positions 0-4000). Panels: Operon Method; Rosetta Stone Method; Conserved Gene Neighbor Method; Phylogenetic Profiles Method]
Whole Genome Functional Linkage Map
(RS, PP, GN, OP methods for TB)
[Figure: TB Gene A vs. TB Gene B, gene positions 0-4000]
Requiring 2 or more functional linkages:
1,865 genes make 9,766 linkages
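A filter of this kind can be sketched as follows. The sketch reads "2 or more functional linkages" as "a gene pair supported by at least two of the four methods", which is an assumption about the slide's meaning. The function name and the toy pairs are hypothetical.

```python
from collections import defaultdict

def linkages_with_support(method_links, min_methods=2):
    """Return gene pairs reported by at least min_methods distinct methods.

    method_links: dict mapping a method name to a list of (gene_a, gene_b) pairs.
    Pairs are unordered, so (a, b) and (b, a) count as the same linkage.
    """
    support = defaultdict(set)
    for method, pairs in method_links.items():
        for a, b in pairs:
            support[frozenset((a, b))].add(method)
    return {pair: methods for pair, methods in support.items()
            if len(methods) >= min_methods}

# Hypothetical toy input; the pairs are illustrative, not real predictions.
method_links = {
    "RS": [("Rv0001", "Rv0002")],
    "PP": [("Rv0002", "Rv0001"), ("Rv0003", "Rv0005")],
    "GN": [("Rv0003", "Rv0005"), ("Rv0006", "Rv0007")],
    "OP": [("Rv0005", "Rv0003")],
}
kept = linkages_with_support(method_links)
genes = set().union(*kept)  # genes taking part in at least one kept linkage
```

Counting `len(genes)` and `len(kept)` on real predictions would give the two numbers quoted on the slide (genes and linkages).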
Whole Genome Functional Linkage Map
Zoom (Genes Rv0001-Rv0051)
[Figure: zoom map, TB Gene A vs. TB Gene B, genes 0-50, with clusters labeled A through F]
Whole Genome Functional Linkage Map
Zoom (Genes Rv0001-Rv0051)
[Figure: zoom map, TB Gene A vs. TB Gene B, genes 0-50, with clusters labeled A through F]
Cluster A:
6 genes; 5 annotated
4 linkages
5 genes coding for DNA replication or repair
The 6th gene inferred to be involved in DNA binding, and in fact encodes a Zn-ribbon
Whole Genome Functional Linkage Map
Zoom (Genes Rv0001-Rv0051)
[Figure: zoom map, TB Gene A vs. TB Gene B, genes 0-50, with clusters labeled A through F]
Cluster A:
6 genes; 5 annotated
5 linkages
5 genes coding for DNA replication or repair
The 6th gene inferred to be involved in DNA binding, and in fact encodes a Zn-ribbon
None of the genes is a homolog
Whole Genome Functional Linkage Map
Zoom (Genes Rv0001-Rv0051)
[Figure: zoom map, TB Gene A vs. TB Gene B, genes 0-50, with clusters labeled A through F]
Cluster B:
6 genes; 7 linkages
3 genes: Ser/Thr kinase or phosphatase activities
2 genes: cell wall biosynth.
1 gene: unannotated
Gene 14, pknB (a Ser/Thr kinase) contains PASTA domains (penicillin-binding protein and serine/threonine kinase associated)
Whole Genome Functional Linkage Map
Zoom (Genes Rv0001-Rv0051)
[Figure: zoom map, TB Gene A vs. TB Gene B, genes 0-50, with clusters labeled A through F]
Cluster B:
6 genes; 7 linkages
3 genes: Ser/Thr kinase or phosphatase activities
2 genes: cell wall biosynth.
1 gene: unannotated
Gene 19 is unannotated. It contains an FHA (forkhead-associated) domain, which binds phosphothreonine-containing proteins.
Whole Genome Functional Linkage Map
Zoom (Genes Rv0001-Rv0051)
[Figure: zoom map, TB Gene A vs. TB Gene B, genes 0-50, with clusters labeled A through F]
Cluster D: Links gene 50 (a penicillin binding protein involved in cell wall synthesis) to gene 51 (an integral membrane protein).
Whole Genome Functional Linkage Map
Zoom (Genes Rv0001-Rv0051)
[Figure: zoom map, TB Gene A vs. TB Gene B, genes 0-50, with clusters labeled A through F]
E is a functional link between gene 16 (pbpA, in cell wall biosynthesis) and gene 50 (the penicillin binding protein involved in cell wall biosynthesis)
Whole Genome Functional Linkage Map
(RS, PP, GN, OP methods for TB)
[Figure: TB Gene A vs. TB Gene B, gene positions 0-4000]
Some columns show similar linkages, so like columns are clustered together using the CLUSTER procedure of Eisen et al. (1998)
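The idea of clustering like columns can be illustrated with a small average-linkage sketch. It is written in plain Python for clarity and is not the CLUSTER program itself. It assumes that each gene's column is a 0/1 vector of linkages and that similarity is the Pearson correlation between columns. The names and toy data are hypothetical, and the returned order is simply the order in which leaves were merged.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length numeric vectors (0.0 if constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def cluster_order(columns):
    """Average-linkage agglomerative clustering of columns.

    columns: dict mapping a gene name to its list of 0/1 linkage indicators.
    Returns gene names reordered so that similar columns sit next to each other.
    """
    names = list(columns)
    sim = {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average similarity over all cross-cluster pairs.
                s = sum(sim[a, b] for a in clusters[i] for b in clusters[j])
                s /= len(clusters[i]) * len(clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]

# Hypothetical toy linkage columns: g1 resembles g3, and g2 resembles g4.
columns = {
    "g1": [1, 1, 0, 0, 1, 0],
    "g2": [0, 0, 1, 1, 0, 0],
    "g3": [1, 1, 0, 0, 0, 0],
    "g4": [0, 0, 1, 1, 0, 1],
}
order = cluster_order(columns)
```

Reordering both axes of the map by `order` would place genes with similar linkage patterns next to each other, which is what makes modules visible as blocks along the diagonal.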
Hierarchical clustering of the TB
Whole Genome Functional Linkage Map
Functional modules range in size from 2 to > 100 linkages
Dozens of off-diagonal functional linkages
Research of Michael
Strong and Tom Graeber
Degradation of Fatty acids
Polyketide and nonribosomal, Degradation of Fatty acids, and Energy Metabolism
Energy Metabolism, oxidoreductases
Polyketide and non-ribosomal peptide synthesis
Detoxification
Research of Michael Strong and Tom Graeber
Cell Envelope, Cell Division
Energy Metabolism TCA
Broad Regulatory, Serine Threonine Protein Kinase
Cell Envelope, Murein Sacculus and Peptidoglycan
Transport/Binding Proteins
Transport/Binding Proteins Cations
Chaperones
Cell Envelope
Energy Metabolism, ATP Proton Motive force
Biosynthesis of cofactors
Cytochrome P450
Two component systems
Energy Metabolism, Anaerobic Respiration
Sugar Metabolism
Purine, Pyrimidine nucleotide biosynthesis
Aromatic Amino Acid Biosynthesis
Novel Group
Biosynthesis of Cofactors, Prosthetic groups
Synthesis and Modif. Of Macromolecules, rpl,rpm, rps
Amino Acid Biosynthesis (Branched)
Degradation of Fatty Acids
Energy Metab. Respiration Aerobic
Energy Metabolism, oxidoreductase
Fig 4.
M. Strong, T. Graeber et al.
Energy Metabolism, oxidoreductase
Polyketide and non-ribosomal peptide synthesis
Lipid Biosynthesis
Amino acid Biosynthesis
Virulence
Deg. of Fatty Acids
Detoxification
One of 7 modules of unannotated linkages,
perhaps undiscovered pathways or complexes
Pathway Reconstruction from
Functional Linkages
All 9 enzymes of the histidine biosynthesis
pathway are linked, and are clustered
separately from other amino acid synthetic
pathways
HisG
HisF
HisI / HisI2
HisA
HisH
HisB
HisC / HisC2
HisB
HisD
Functional Linkages Among Cytochrome Oxidase Genes
Functional linkages relate all 3 components of cytochrome oxidase complex (CtaC, CtaD, CtaE) and also CtaB, the cytochrome oxidase assembly factor
These genes are at four different chromosomal locations
Membrane proteins linked to soluble proteins
Quantitative Assessment of
Inferred Protein Complexes
Research of Edward Marcotte, Matteo Pellegrini,
Michael Thompson and Todd Yeates
Calculating Probabilities of Coevolution
Phylogenetic Profile and Rosetta Stone:
$$P(k \mid n, m, N) = \frac{\binom{n}{k}\binom{N-n}{m-k}}{\binom{N}{m}}$$
N = number of fully sequenced genomes
n = number of homologs of protein A
m = number of homologs of protein B
k = number of genomes shared in common
Gene Neighbor:
$$P_m(\le X) = 1 - P_m(> X) = X \sum_{k=0}^{m-1} \frac{(-\ln X)^k}{k!}$$
X = fractional separation of genes
Operon:
$$P(n) = 1 - e^{-n}$$
n = intergenic separation
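Two of these probabilities are easy to evaluate numerically. The sketch below computes the hypergeometric probability P(k | n, m, N) = C(n,k) C(N-n, m-k) / C(N, m) and the gene neighbor series X * sum over k from 0 to m-1 of (-ln X)^k / k!. The function names are hypothetical. The tail sum (chance of k or more shared genomes) is an added convenience that the slide does not show, and the example counts other than N = 83 are made up.

```python
from math import comb, factorial, log

def hypergeom_p(k, n, m, N):
    """P(k | n, m, N) = C(n, k) * C(N - n, m - k) / C(N, m)."""
    return comb(n, k) * comb(N - n, m - k) / comb(N, m)

def hypergeom_tail(k, n, m, N):
    """Assumed extension: probability of k or more shared genomes by chance."""
    return sum(hypergeom_p(j, n, m, N) for j in range(k, min(n, m) + 1))

def gene_neighbor_p(X, m):
    """P_m(<= X) = X * sum_{k=0}^{m-1} (-ln X)^k / k!, for 0 < X <= 1."""
    return X * sum((-log(X)) ** k / factorial(k) for k in range(m))

# Hypothetical example: 83 genomes, proteins A and B each found in 20 of them,
# with 18 of those genomes shared.
p_exact = hypergeom_p(18, 20, 20, 83)
p_tail = hypergeom_tail(18, 20, 20, 83)
p_neighbor = gene_neighbor_p(0.01, 3)
```

A small value of `p_tail` means the observed co-occurrence is unlikely to arise by chance, which is what supports inferring a functional linkage.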
Combining Inferences of CoEvolution from 4 Methods
We use a Bayesian approach to combine the probabilities from the four methods to arrive at a single probability that two proteins co-evolve:
$$O_{\mathrm{post}} = \left( \prod_{i=1}^{4} \frac{P(f_i \mid \mathrm{pos})}{P(f_i \mid \mathrm{neg})} \right) \frac{P(\mathrm{pos})}{P(\mathrm{neg})}$$
where positive pairs are proteins with common pathway annotation and negative pairs are proteins with different annotation
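The combination step can be sketched as a product of per-method likelihood ratios multiplied by the prior odds, O_post = prod_i [P(f_i | pos) / P(f_i | neg)] * P(pos) / P(neg). The sketch assumes P(neg) = 1 - P(pos) and that the four pieces of evidence are treated as independent. The function names and the numbers are hypothetical.

```python
def posterior_odds(likelihood_ratios, p_pos):
    """Posterior odds that a pair is positive.

    likelihood_ratios: one value P(f_i | pos) / P(f_i | neg) per method.
    p_pos: prior probability of a positive pair; P(neg) is taken as 1 - p_pos.
    """
    odds = p_pos / (1.0 - p_pos)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds

def odds_to_probability(odds):
    """Convert odds O into a probability O / (1 + O)."""
    return odds / (1.0 + odds)

# Hypothetical likelihood ratios for the four methods (RS, PP, GN, OP).
ratios = [4.0, 2.0, 1.0, 0.5]
odds = posterior_odds(ratios, p_pos=0.2)
probability = odds_to_probability(odds)
```

A ratio above 1 means the evidence from that method favors a linkage, a ratio below 1 counts against it, and a ratio of exactly 1 leaves the odds unchanged.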
ProLinks Database
www.dip.doe-mbi.ucla.edu/pronav
~ 10,000,000 Functional Linkages inferred
from 83 fully sequenced genomes
Benchmarking this Approach
Against Known Complexes
Ecocyc: Karp et al. NAR, 30, 56 (2002)
ROC plot (Research of Matteo Pellegrini)
[Figure: Fraction of True Positives (0 to 0.4) vs. Fraction of False Positives (0 to 0.009), with a Random reference line]
For high confidence links, we find 1/3 of true interactions with only 1/1000 of the false positive ones
True positive interactions are between subunits of known complexes and false positive
ones are between subunits of different complexes.
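Points on such a ROC curve can be computed by sorting predicted linkages by confidence and counting true and false positives as the threshold is lowered. The sketch below is a minimal illustration with hypothetical names and toy scores. It adds one point per pair and does not treat tied scores specially.

```python
def roc_points(scored_pairs):
    """Compute ROC points from scored predictions.

    scored_pairs: list of (score, is_true) tuples, where is_true marks a pair of
    subunits from the same known complex. Needs at least one true and one false pair.
    Returns (fraction_false_positives, fraction_true_positives) tuples.
    """
    n_pos = sum(1 for _, is_true in scored_pairs if is_true)
    n_neg = len(scored_pairs) - n_pos
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for _, is_true in sorted(scored_pairs, key=lambda p: p[0], reverse=True):
        if is_true:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical toy scores: higher score means a more confident linkage.
toy_scores = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.2, False)]
points = roc_points(toy_scores)
```

Reading off the point where the false positive fraction is about 0.001 gives the kind of statement made on the slide for high confidence links.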
Example Complex: NADH
Dehydrogenase I
11 of 13 subunits detected
3 false positives
From Inferred Protein
Linkages to Structures of
Complexes
Research of Michael Strong, Shuishu Wang, Markus Kauffman
The Problem of PE and PPE Proteins in M. tb
PE, PE-PGRS, and PPE Proteins in M. tuberculosis
38 PE proteins; 61 PE-PGRS proteins; 68 PPE proteins
Together comprise about 5% of the genome
No function is known, but some appear to be membrane bound
No structure is known: always insoluble when expressed
Goal: use functional linkages to predict a complex between
a PE and a PPE protein: express complex, and determine
its structure
Research of Shuishu Wang and Michael Strong
Construction of a co-expression vector to test for
protein-protein interactions (Mike Strong)
[Diagram: pET 29b(+) co-expression construct. Labeled elements: T7 promoter, lac operator, RBS, gene A, NdeI, RBS, KpnI, gene B, thrombin site, NcoI, His tag, HindIII]
Transcription gives a polycistronic mRNA; translation gives protein A and protein B (with His tag)
If proteins do not interact: protein A and protein B (with His tag) remain separate
If proteins interact (protein-protein interaction): protein A is bound to protein B (with His tag)
When co-expressed, the PE and PPE proteins,
inferred to interact, do form a soluble complex,
Mr = 35,200
Sedimentation equilibrium experiments:
Rv2430c + Rv2431c fraction 49, in 20mM HEPES, 150mM NaCl, pH 7.8
Concentration OD280 0.7, 0.45, 0.15
Expected Mr:
Rv2431c (PE): 10,687 (10563.12 from Mass Spec)
Rv2430c + His tag (PPE): 24,072 (23895.00 from Mass Spec)
Possibly suggests a 1:1 complex between these
two proteins
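The 1:1 reading follows from simple arithmetic on the numbers above: the two expected masses sum to 34,759, close to the observed Mr of 35,200. The sketch below repeats that sum and checks a few other heteromeric stoichiometries. The helper name and the limit of three copies per protein are arbitrary choices made for illustration.

```python
pe_expected = 10687.0    # Rv2431c (PE), expected Mr
ppe_expected = 24072.0   # Rv2430c + His tag (PPE), expected Mr
observed = 35200.0       # Mr from sedimentation equilibrium

one_to_one = pe_expected + ppe_expected   # expected Mr of a 1:1 complex
ratio = observed / one_to_one             # about 1.01

def closest_stoichiometry(observed_mr, mr_a, mr_b, max_copies=3):
    """Return the (copies_a, copies_b) whose summed Mr is nearest to observed_mr."""
    candidates = [(a, b)
                  for a in range(1, max_copies + 1)
                  for b in range(1, max_copies + 1)]
    return min(candidates,
               key=lambda ab: abs(ab[0] * mr_a + ab[1] * mr_b - observed_mr))

best = closest_stoichiometry(observed, pe_expected, ppe_expected)
```

The next closest candidates (2:1 at 45,446 and 1:2 at 58,831) are much further from the observed value than the 1:1 sum.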
Crystallization trials of the Complex Between
PE Protein Rv2431c and PPE Protein Rv2430c
Database of Interacting Proteins
www.dip.doe-mbi.ucla.edu
Experimentally detected interactions
from the scientific literature
Currently ~ 44,000 interactions
The DIP Database
DOE-MBI LSBMM, UCLA
Live DIP Gives the States of Proteins
Transitions Documented
ProLinks Database and the
Protein Navigator
• Contains some 10,000,000 inferred
functional linkages from 83 genomes
• Available at www.doe-mbi.ucla.edu
• Soon to be expanded to 250 fully
sequenced genomes
• Eventually to be reconciled with DIP
Summary
Many functional linkages are revealed
from genomic and microarray data (high coverage)
Validity of functional linkages can be assessed by comparison to known complexes, and to expression data, and
by keyword recovery
Clustered genome-wide functional maps can reveal and
organize information on complexes and pathways
Functional linkages can reveal protein complexes
suitable for structural studies
A protein’s function is defined by the cellular context of its linkages
[Diagram: network linking proteins A, B, C, V, X, Y and Z]
Protein Interactions
Analysis of M.tb. Genome
Michael Strong
Whole Genome Interaction Maps
Michael Strong & Tom Graeber
Methods of Inferring Interactions
Edward Marcotte, Matteo Pellegrini, Todd Yeates
Michael Thompson, Richard Llwellyn
Database of Interacting Proteins
Lukasz Salwinski, Joyce Duan, Ioannis Xenarios,
Robert Riley, Christopher Miller
Parallel pathways
Huiying Li