Top-down Characterization of Proteins in Bacteria

Top-down characterization of proteins in bacteria with unsequenced genomes
Nathan Edwards
Georgetown University Medical Center
Microorganism Identification

Homeland-security/defense applications
  Long history of fingerprinting approaches
Clinical applications in strain identification:
  Selection of treatment and/or antibiotics
New applications in microbiome analysis:
  Bacterial colonies in gut, ...
  Chronic wound infections
Compete with genomic approaches?
  PCR, Next-gen sequencing
  Primary sales-pitch is speed.
2
Microorganism Identifications

Match spectra with proteome (or genome) sequence for (species) identity
  Provides robust match with respect to instrumentation and sample prep
Many bacteria will never be sequenced or "finished"...
  Pathogen simulants, for example
...but many have – about 2500 to date.
Can we use the available sequence to identify proteins from unknown, unsequenced bacteria?
  Yes, for some proteins in some organisms!
4
Intact protein LC-MS/MS

Crude cell lysate
Capillary HPLC
  C8 column
LTQ-Orbitrap XL
  Precursor scan: resolution 30,000 @ 400 m/z
  Data-dependent precursor selection:
    5 most abundant ions
    10 second dynamic exclusion
    Charge-state +3 or greater
  CAD product ion scan: resolution 15,000 @ 400 m/z
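The data-dependent selection rules above can be sketched in a few lines. This is a minimal illustration, not the instrument's acquisition software: the Peak class, the select_precursors function, and the 0.5 m/z exclusion window are assumptions made for the example. Only the top-5, 10-second, and charge +3 or greater settings come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float
    charge: int
    intensity: float


def select_precursors(peaks, scan_time, excluded, top_n=5, min_charge=3,
                      exclusion_s=10.0, mz_tol=0.5):
    """Pick up to top_n precursors from one precursor scan.

    excluded maps a previously selected m/z to the time (seconds) it was
    selected and is updated in place. mz_tol is an assumed exclusion-window
    width in m/z units, not a value given on the slide.
    """
    # Forget exclusions older than the dynamic-exclusion window.
    for mz in [m for m, t in excluded.items() if scan_time - t >= exclusion_s]:
        del excluded[mz]

    def is_excluded(mz):
        return any(abs(mz - m) <= mz_tol for m in excluded)

    # Keep only sufficiently charged ions that are not currently excluded.
    candidates = [p for p in peaks
                  if p.charge >= min_charge and not is_excluded(p.mz)]
    candidates.sort(key=lambda p: p.intensity, reverse=True)

    chosen = candidates[:top_n]
    for p in chosen:
        excluded[p.mz] = scan_time
    return chosen
```

Calling select_precursors once per precursor scan, with the same excluded dictionary each time, keeps an ion from being re-selected until 10 seconds have passed.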
5
CID Protein Fragmentation Spectrum from Y. rohdei
[Figure: chromatogram of MS yr_inclusion (x-axis: Time, min) and averaged CID product ion spectrum (x-axis: m/z). Spectrum header: yr_inclusion #1937-2437, RT 19.45-24.36, AV 21, NL 4.80E4; F: FTMS + p ESI d Full ms2 756.70@cid35.00 [195.00-2000.00]. Precursor: m/z 756.70, charge +8, MW 6044.11.]
6
Enterobacteriaceae
Protein Sequences

Exhaustive set of all Enterobacteriaceae family protein sequences from
  Swiss-Prot, TrEMBL, RefSeq, GenBank, and [CMR]
...plus Glimmer3 predictions on RefSeq Enterobacteriaceae genomes
  Primary and alternative translation start-sites
Filter for intact mass in range 1 kDa – 20 kDa
  253,626 distinct protein sequences, 256 species
Derived from "Rapid Microorganism Identification Database" (RMIDb.org) infrastructure.
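A rough sketch of the intact-mass filter follows. It is not the RMIDb code: the function names are hypothetical, the choice of monoisotopic (rather than average) mass is an assumption the slide does not settle, and sequences containing non-standard residue letters are simply skipped.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def monoisotopic_mass(seq):
    """Intact neutral mass: sum of residue masses plus one water."""
    return sum(MONO[aa] for aa in seq) + WATER


def in_mass_range(seq, lo=1000.0, hi=20000.0):
    """True if the intact mass lies in the 1 kDa to 20 kDa window."""
    return lo <= monoisotopic_mass(seq) <= hi


def distinct_in_range(seqs):
    """Distinct sequences made only of standard residues that pass the filter."""
    return sorted({s for s in seqs
                   if set(s) <= MONO.keys() and in_mass_range(s)})
```

Using a set before sorting collapses identical sequences reported by more than one source database, which is what "distinct" is taken to mean here.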
7
ProSightPC 2.0

Product ion scan decharging
  Enabled by high-resolution fragment ion measurements
  THRASH algorithm implementation
Absolute mass search mode
  15 ppm fragment ion match tolerance
  250 Da precursor ion match tolerance
"Single-click" analysis of entire LC-MS/MS datafile.
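The two tolerances above work on different scales: the fragment tolerance is relative (ppm), the precursor tolerance is absolute (Da). A minimal sketch of both checks, with hypothetical function names and no claim to reproduce ProSightPC's scoring:

```python
def within_ppm(observed, theoretical, ppm=15.0):
    """Relative tolerance check for decharged fragment masses."""
    return abs(observed - theoretical) <= theoretical * ppm * 1e-6


def precursor_ok(observed_mass, theoretical_mass, tol_da=250.0):
    """Absolute tolerance check for the intact precursor mass."""
    return abs(observed_mass - theoretical_mass) <= tol_da


def count_fragment_matches(observed, theoretical, ppm=15.0):
    """Number of observed fragment masses matching any theoretical mass."""
    return sum(any(within_ppm(o, t, ppm) for t in theoretical)
               for o in observed)
```

At 1000 Da, 15 ppm corresponds to 0.015 Da, so the wide precursor window mainly selects candidate proteins while the tight fragment tolerance provides the specificity.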
8
Other tools

Explored using standard search engines:
  Decharge and format as charge +1 spectrum
  X!Tandem scoring plugin (ProSight, delta M)
  OMSSA, Mascot, etc…
MS-Tools:
  MS-Deconv, MS-TopDown, MS-Align, MS-Align+, MS-Align-E!
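"Decharge and format as charge +1 spectrum" can be illustrated as follows. This is only a sketch of the arithmetic: the function names are hypothetical, peaks are assumed to arrive as (m/z, charge, intensity) tuples with the charge already assigned, and peaks of unknown charge (shown as z=? in the spectrum) are dropped.

```python
PROTON = 1.007276  # mass of a proton in Da


def to_singly_charged(mz, z):
    """Convert an m/z observed at charge z to the equivalent m/z at charge +1."""
    return mz * z - (z - 1) * PROTON


def decharge_spectrum(peaks):
    """peaks: iterable of (mz, z, intensity); z is None when the charge is unknown.

    Returns (mz_at_charge_1, intensity) pairs sorted by m/z.
    """
    out = [(to_singly_charged(mz, z), inten)
           for mz, z, inten in peaks if z is not None]
    return sorted(out)
```

The resulting peak list looks like an ordinary singly charged MS/MS spectrum, which is the input that peptide-oriented search engines expect.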
9
CID Protein Fragmentation Spectrum from Y. rohdei
Match to Y. pestis 50S Ribosomal Protein L32
[Figure: chromatogram of MS yr_inclusion (x-axis: Time, min) and averaged CID product ion spectrum (x-axis: m/z). Spectrum header: yr_inclusion #1937-2437, RT 19.45-24.36, AV 21, NL 4.80E4; F: FTMS + p ESI d Full ms2 756.70@cid35.00 [195.00-2000.00]. Precursor: m/z 756.70, charge +8, MW 6044.11.]
10
Exact match sequence…
11
Phylogeny: Protein vs DNA
Protein Sequence
16S-rRNA Sequence
12
What about mixtures?
13
Shared Small
Ribosomal Proteins
14
Identified E. herbicola proteins

30S Ribosomal Protein S19


m/z 686.39, z 15+, E-value 1.96e-16, Δ 0.007
Six proteins identified with |Δ| < 0.02
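The m/z and charge values on these slides convert to a neutral precursor mass as sketched below. The function names are hypothetical, and Δ is assumed here to be the difference in Da between the observed and theoretical precursor masses; the 0.02 threshold is the one quoted on the slide. Note that a reported m/z is not necessarily the monoisotopic peak, so the computed mass is only indicative.

```python
PROTON = 1.007276  # mass of a proton in Da


def neutral_mass(mz, z):
    """Neutral mass of an ion observed at m/z with z added protons."""
    return (mz - PROTON) * z


def classify_delta(delta, small=0.02):
    """Label a precursor mass difference (Da) as 'small' or 'large'."""
    return "small" if abs(delta) < small else "large"
```

For the values shown above, neutral_mass(686.39, 15) gives a mass of roughly 10.28 kDa, and classify_delta(0.007) falls in the "small" group.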
16
Identified E. herbicola proteins

DNA-binding protein HU-alpha


m/z 732.71, z 13+, E-value 7.5e-26, Δ -14.128
Eight proteins identified with "large" |Δ|
17
Identified E. herbicola proteins

DNA-binding protein HU-alpha



m/z 732.71, z 13+, E-value 1.91e-58
Use "Sequence Gazer" to find mass shift
ΔM mode can "tolerate" one shift for free!
18
ProSightPC: ΔM mode
Match a single "blind" mass-shift for free!
[Figure labels: Protein Sequence; Experimental Precursor; ΔM; matched b-, b'-, y- and y'-ions]
Also: PIITA - Tsai et al. 2009
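The idea of matching fragments with and without a single mass shift can be sketched as follows. This is an illustration of the concept, not ProSightPC's algorithm: the function names are hypothetical, fragments are treated as decharged neutral masses, and b'/y' are assumed to denote fragments carrying the full ΔM.

```python
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def b_y_masses(seq):
    """Neutral b- and y-type fragment masses for every cleavage site."""
    residues = [MONO[aa] for aa in seq]
    total = sum(residues)
    b, y, running = [], [], 0.0
    for m in residues[:-1]:
        running += m
        b.append(running)                  # N-terminal fragment
        y.append(total - running + WATER)  # complementary C-terminal fragment
    return b, y


def matches(observed, theoretical, ppm=15.0):
    return abs(observed - theoretical) <= theoretical * ppm * 1e-6


def delta_m_matches(seq, observed, delta_m, ppm=15.0):
    """Labels of fragments found unshifted (b3, y5) or shifted by delta_m (b'3, y'5)."""
    b, y = b_y_masses(seq)
    found = []
    for label, masses in (("b", b), ("y", y)):
        for i, mass in enumerate(masses, start=1):
            # Fragment length: i residues for b, the remainder for y.
            length = i if label == "b" else len(seq) - i
            for mark, theo in (("", mass), ("'", mass + delta_m)):
                if any(matches(o, theo, ppm) for o in observed):
                    found.append(f"{label}{mark}{length}")
    return found
```

Where the unshifted b-ions stop and the shifted b'-ions begin (and likewise for y and y') brackets the position of the mass shift along the sequence.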
21
Identified E. herbicola proteins

DNA-binding protein HU-alpha


m/z 732.71, z 13+, E-value 7.5e-26, Δ -14.128
Extract N- and C-terminus sequence supported by at least 3 b- or y-ions
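One possible reading of this extraction rule is sketched below; the slide does not spell out the exact criterion, so the interpretation is an assumption. Here a residue counts as supported when at least three matched, unshifted b-ions (for the N-terminus) or y-ions (for the C-terminus) span it. The function name and the input format are hypothetical.

```python
def supported_termini(seq, b_lengths, y_lengths, min_ions=3):
    """Return (N-terminal prefix, C-terminal suffix) supported by min_ions fragments.

    b_lengths: lengths (in residues) of matched unshifted b-ions.
    y_lengths: lengths (in residues) of matched unshifted y-ions.
    """
    def supported_length(lengths):
        ranked = sorted(set(lengths), reverse=True)
        # The min_ions-th longest fragment is spanned by min_ions fragments.
        return ranked[min_ions - 1] if len(ranked) >= min_ions else 0

    n_len = supported_length(b_lengths)
    c_len = supported_length(y_lengths)
    return seq[:n_len], seq[len(seq) - c_len:]
```

With matched b-ions of lengths 2, 4, 5 and 7, the first four residues are returned, since only those are covered by at least three fragments.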
22
E. herbicola protein sequences
23
E. herbicola sequences found in other species
24
Phylogenetic placement of E. herbicola
Phylogram
Cladogram
phylogeny.fr – "One-Click"
25
Genome annotation errors
UniProt: E. coli Cell division protein ZapB
MQFRRGMTMSLEVFEKLEAKVQQAIDTITL…
[Figure labels: 3, 17, 0, 22; (204), (166), (2), (371) E. coli strains]
Need ±1500 Da precursor tolerance…
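One way to see why a wide precursor tolerance helps here, assuming the annotation error is a mis-called translation start site (the slide does not state this explicitly): if the real protein starts at a downstream Met, the annotated sequence is too heavy by the mass of the extra N-terminal residues. The sketch below computes that excess for each internal Met in the sequence prefix shown on the slide. The function name is hypothetical, only Met start codons are considered, and any further processing such as removal of the initiator Met is ignored.

```python
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def start_site_shifts(prefix):
    """For each internal Met, return (1-based position, mass of residues before it)."""
    shifts = []
    removed = 0.0
    for i, aa in enumerate(prefix):
        if aa == "M" and i > 0:
            shifts.append((i + 1, removed))
        removed += MONO[aa]
    return shifts


# Only the N-terminal prefix shown on the slide; the rest is elided there.
ZAPB_PREFIX = "MQFRRGMTMSLEVFEKLEAKVQQAIDTITL"
for position, excess in start_site_shifts(ZAPB_PREFIX):
    print(f"start at Met {position}: annotated sequence heavier by {excess:.2f} Da")
```

Under this assumption the excess mass runs to several hundred Da or more, which a 250 Da precursor window would miss.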
27
Conclusions

Protein identification for unsequenced organisms.
Identification and localization for sequence mutations and post-translational modifications.
Extraction of confidently established sequence suitable for phylogenetic analysis.
Genome annotation correction.
New paradigm for phylogenetic analysis?
28
Acknowledgements

Dr. Catherine Fenselau
  Avantika Dhabaria, Joe Cannon*, Colin Wynne*
  University of Maryland Biochemistry
Dr. Yan Wang
  University of Maryland Proteomics Core
Dr. Art Delcher
  University of Maryland CBCB
Funding: NIH/NCI
29