proteomics_gpsr - Institute of Microbial Technology

Dr G P S Raghava
Institute of Microbial Technology,
Chandigarh, India
Bioinformatics | Drug Informatics | Chemoinformatics | Vaccine Informatics
Email: raghava@imtech.res.in
http://crdd.osdd.net/
http://www.imtech.res.in/raghava/
Studying Central Dogma with “Omics”
Transcriptomics
Proteomics
Protein
Mature mRNA
mRNA
Genomics
DNA
Metabolomics
(all the endogenous metabolites produced)
Complexity of the system
NEXT GENERATION SEQUENCING
• Sequence the full genome of an organism in a few days at a very low cost.
• Produce high-throughput data in the form of short reads.
Illumina
ABI’s SOLiD
Roche’s 454 FLX
Ion Torrent
Genome assembly and annotation done at IMTECH
• Burkholderia sp. SJ98 (Kumar et al. 2012).
• Debaryomyces hansenii MTCC 234 (Kumar et al. 2012).
• Imtechella halotolerans K1T (Kumar et al. 2012).
• Marinilabilia salmonicolor JCM 21150T (Kumar et al. 2012).
• Rhodococcus imtechensis sp. RKJ300 (Vikram et al. 2012).
• Rhodosporidium toruloides MTCC 457 (Kumar et al. 2012).
RNA sequencing
RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies to measure levels of transcripts and their isoforms.
Samples of interest
Condition (colon tumor)
Isolate RNAs
Generate cDNA, fragment, size select, add linkers
Sequence ends
Map to genome, transcriptome, and predicted exon junctions
Downstream analysis
100s of millions of paired reads
10s of billions of bases of sequence
Methods for Protein Analysis
Gel electrophoresis, northern/western blot (fluorescence/radioactive label)
X-ray crystallography
Protein microarrays
Mass Spectrometry
Protein arrays
High-throughput analysis of hundreds of thousands of proteins.
Proteins are immobilized on a glass chip.
Various probes (proteins, lipids, DNA, peptides, etc.) are used.
Cons
• Require a priori knowledge of the proteins of interest.
• Availability of suitable antibodies.
• Measure only a small fraction of the proteome.
Mass Spectrometry
Find a way to “charge” an atom or molecule (ionization).
Place the charged atom or molecule in a magnetic or electric field and measure its speed or radius of curvature, which depends on its mass-to-charge ratio (mass analyzer).
Detect the ions using a microchannel plate or photomultiplier tube (detection).
Sample -> Ion source: makes ions -> Mass analyzer: separates ions -> Mass spectrum: presents information
Protein Identification by MS
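Experimental: spot removed from gel -> fragmented using trypsin -> spectrum of fragments generated -> experimental spectra
Library: database of sequences (e.g. SwissProt) -> artificially trypsinized -> theoretical spectra built -> library
MATCH: experimental spectra are compared with the theoretical spectra in the library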
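The "artificial" trypsin step on the library side can be sketched in a few lines of Python. This is a minimal illustration, assuming the commonly used rule that trypsin cleaves after K or R unless the next residue is P; the function name `tryptic_digest`, the `missed_cleavages` parameter and the example sequence are hypothetical and not taken from the slides.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return in-silico tryptic peptides of a protein sequence.

    Assumed rule: cleave C-terminal to K or R, except when the next
    residue is P. Real search engines apply further refinements.
    """
    # First pass: cut the sequence at every allowed cleavage site.
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])

    # Second pass: join neighbouring fragments to model missed cleavages.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


if __name__ == "__main__":
    # Hypothetical toy sequence; the R before P is not cleaved.
    print(tryptic_digest("MKRPAAKGGR"))
    print(tryptic_digest("MKRPAAKGGR", missed_cleavages=1))
```

Each peptide produced this way would then be turned into a theoretical spectrum before matching against the experimental spectra.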
Instrumentation
High Vacuum System
Inlet
• HPLC
• Flow injection
• Sample plate
Ion Source
• MALDI
• ESI
Mass Analyzer
• Time of flight (TOF)
• Quadrupole
• Ion Trap
• Magnetic Sector
• FTMS
Detector
• Microchannel Plate
• Electron Multiplier
• Hybrid with photomultiplier
Data System
Ion Sources make ions from sample molecules
(Ions are easier to detect than neutral molecules.)
[Figure: MALDI and electrospray ionization sources.
MALDI labels: sample plate; laser; grid (0 V); +/- 20 kV; MH+.
Electrospray ionization labels: sample in solution; sample inlet nozzle (lower voltage); pressure = 1 atm; inner tube diam. = 100 µm; N2 gas; high voltage applied to metal sheath (~4 kV); charged droplets; partial vacuum; MH+, MH2+, MH3+.]
Mass analyzers separate ions based on their mass-to-charge ratio (m/z)
Operate under high vacuum (keeps ions from bumping into gas molecules)
Actually measure mass-to-charge ratio of ions (m/z)
Key specifications are resolution, mass measurement accuracy, and sensitivity.
Several kinds exist: for bioanalysis, quadrupole, time-of-flight and ion traps are most used.
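Two of the key specifications, resolution and mass measurement accuracy, are commonly quoted as resolving power (m/z divided by the peak width at half maximum) and as an error in parts per million. The sketch below assumes those conventional definitions; the function names and the example numbers are illustrative and do not come from the slides.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Mass measurement error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def resolving_power(mz, fwhm):
    """Resolving power m / delta-m, with delta-m taken as the peak
    full width at half maximum (FWHM), in the same units as mz."""
    return mz / fwhm


if __name__ == "__main__":
    # Hypothetical peak: measured 500.005 against a theoretical 500.000.
    print(round(ppm_error(500.005, 500.0), 2), "ppm")
    # Hypothetical peak at m/z 1000 that is 0.1 wide at half maximum.
    print(round(resolving_power(1000.0, 0.1)))
```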
Tandem Mass Spectrometry
[Figure: LC-MS/MS example and instrument schematic.
LC panel: base peak chromatogram, relative abundance vs. time (0.01 to 80.02 min).
MS panel: full scan S# 1707, RT 54.44, m/z 300.00 to 2000.00, relative abundance vs. m/z.
MS/MS panel: full ms2 scan of precursor 638.00, S# 1708, RT 54.47, m/z 165.00 to 1925.00, relative abundance vs. m/z.
Schematic: ion source, MS-1 (parent ions), collision cell, MS-2 (fragment ions).]
What’s in a Mass Spectrum?
Fragment ions: derived from the molecular ion or from higher-weight fragments.
“Molecular ion”: found at high mass, together with adduct ions, [M+reagent gas]+.
Mass is plotted as m/z. z is the charge, so doubly charged ions (often seen in macromolecules) show up at half their proper mass.
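The effect of charge on where a peak appears can be checked with a short calculation. This sketch assumes positive-mode ions formed by adding z protons, so m/z = (M + z x proton mass) / z; the constant name, the function name and the example mass are illustrative assumptions.

```python
PROTON_MASS = 1.007276  # Da; assumed value for the mass of a proton


def mz(neutral_mass, charge):
    """m/z of an [M + zH]z+ ion for a molecule of the given neutral mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    # Hypothetical peptide with a neutral mass of 1000 Da.
    for z in (1, 2, 3):
        print(z, round(mz(1000.0, z), 4))
```

With the added protons included, the doubly charged ion lands close to, but not exactly at, half the singly charged value.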
Peptide Fragmentation
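H...-HN-CH(Ri-1)-CO-NH-CH(Ri)-CO-NH-CH(Ri+1)-CO-…OH
N-terminus on the left, C-terminus on the right; side chains Ri-1, Ri, Ri+1 belong to AA residue i-1, AA residue i, AA residue i+1.
Collision Induced Dissociation
H+
H...-HN-CH(Ri-1)-CO . . . NH-CH(Ri)-CO-NH-CH(Ri+1)-CO-…OH
Prefix Fragment (N-terminal side of the break), Suffix Fragment (C-terminal side of the break)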
• Peptides tend to fragment along the backbone.
• Fragments can also lose neutral chemical groups like NH3 and H2O.
B ions and Y ions
http://www.molgen.mpg.de/101151/Proteomics
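Prefix fragments that keep the charge are called b ions and suffix fragments are called y ions. The sketch below computes singly charged b and y ion m/z values, assuming standard monoisotopic residue masses and the usual conventions (b = prefix residues + proton, y = suffix residues + water + proton). Neutral losses are ignored, and the function and constant names are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, assumed here).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # Da
PROTON = 1.007276   # Da


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i-1] covers the first i residues, y_ions[i-1] the last i.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")  # hypothetical example peptide
    for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {b_mz:.4f}   y{i} = {y_mz:.4f}")
```

A list like this is what a theoretical MS/MS spectrum for one candidate peptide is built from.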
Mass spectra searching techniques
Commercial Software
SEQUEST (Yates et al., 1995)
MASCOT (Perkins, Pappin, Creasy, & Cottrell, 1999)
Open Database search tools
MyriMatch
X!Tandem
MS-GF+
OMSSA
MS-GF+ is reported to be more accurate than Mascot and SEQUEST (Kim & Pevzner, 2014)
Target-Decoy Search Strategy for Mass Spectrometry-Based Proteomics
• Incorrect “decoy” sequences added to the search space can only match by chance, so decoy hits correspond to incorrect search results that might otherwise be deemed correct.
[Figure: bar chart "PSM in target and decoy", PSM matches (0 to 450) for the target and decoy databases; each mass spectrum is searched against a target and reversed decoy database.]
The proportion of matches in the decoy database represents false matches.
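The target-decoy idea can be sketched as follows: decoys are made by reversing each target sequence, and the share of decoy hits above a score threshold estimates the false matches. This assumes one common estimator (decoy hits divided by target hits); the `DECOY_` prefix, the function names and the toy scores are illustrative assumptions, not output from any real search engine.

```python
def make_decoys(targets):
    """Build a reversed-sequence decoy entry for every target protein."""
    return {"DECOY_" + name: seq[::-1] for name, seq in targets.items()}


def estimate_fdr(psms, score_threshold):
    """Estimate the false discovery rate among accepted PSMs.

    psms is a list of (score, is_decoy) pairs. The estimate used here is
    decoy hits / target hits at or above the threshold (an assumption;
    other estimators exist).
    """
    accepted = [is_decoy for score, is_decoy in psms if score >= score_threshold]
    decoy_hits = sum(accepted)
    target_hits = len(accepted) - decoy_hits
    return decoy_hits / target_hits if target_hits else 0.0


if __name__ == "__main__":
    database = {"P1": "MKWVTFISLLR"}          # hypothetical target entry
    database.update(make_decoys(database))     # combined target + decoy DB
    print(sorted(database))

    # Hypothetical peptide-spectrum matches: (score, hit is a decoy?)
    psms = [(90, False), (80, False), (70, True), (60, False), (50, True)]
    print(estimate_fdr(psms, score_threshold=60))
```

In practice the threshold is lowered or raised until the estimated rate reaches the level the analyst is willing to accept.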
Applications
• Analyzing Protein Modifications
• Finding all modifications on a single protein
• Proteome wide scanning of modifications
• Protein Profiling
• Generate large scale proteome maps
• Annotate and correct genomic sequences
• Analyze protein expression as a function of cellular state
• Detection of amino acid substitutions
• Protein sample identification/confirmation
• Protein sample purity determination
Major Proteomics Repositories
[Figure: data flow between proteomics repositories.
Repositories: PeptideAtlas, PRIDE, PASSEL, GPMDB
Proteome Central
Data types: raw data, results, metadata
Other DBs: UniProt, neXtProt
Journals]
Major Challenge
[Figure: identified vs. unidentified spectra]
Large number of unidentified spectra
Maybe peptides are missing from the database searched...
Are all the reference databases complete?
Proteogenomics
• Term coined in the literature in 2004.
• Genomic data used for generating customized databases.
• Identify novel peptides.
• Disease biomarkers based on novel mutations.
[Figure: proteomics vs. proteogenomics workflows.
Proteomics: spectra (intensity vs. mass/charge) from the tumor sample and the non-tumor sample are searched against RefSeq and UniProt.
Proteogenomics: genome sequencing (germline variants) and RNA-Seq (alternative splicing, somatic variants, expression) are used to build a tumor-specific protein DB, against which the tumor and non-tumor spectra are searched.
Proteins of interest: aberrant proteins, variant proteins, fusion proteins.]
TYPES OF PEPTIDES IDENTIFIED IN PROTEOGENOMICS
• Novel peptides mapping on non-coding region: 5’UTR
• Novel peptides mapping on non-coding region: 3’UTR
• Novel junctions
• Intergenic peptides
• Alternative reading frame
Methods of generation of customized databases
• 6-frame translation of reference database: Perl or Python scripts
• Ab initio gene prediction: Perl and Python scripts
• RNA-seq data: customProDB, Galaxy-P system, sapFinder
• Whole genome sequencing: Peppy
• Other databases: OMIM, neXtProt, ECgene, ChimerDB, COSMIC
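Six-frame translation, the first method listed, is simple enough to script directly. The sketch below is one possible Python version, assuming the standard genetic code and reporting stop codons as `*`; the function names and the toy sequence are hypothetical, and a real pipeline would go on to split the frames at stops and write the pieces to a FASTA file for searching.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to proteins."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```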
What does proteogenomics offer?
Proteomics
Genomics
Transcriptomics
Novel N termini
Novel signal peptide
Novel Exons
Novel Junctions
Variant peptides
Novel ORFs
Novel signal peptide cleavage sites
Open source Web Services
Chemoinformatics and Pharmacoinformatics
Molecular Interactions
Biological Databases
Genome Annotations
Immunoinformatics or Vaccine Informatics
Functional Annotation of Proteins
Proteins Structure Prediction
GPSR: A Resource for Genomics, Proteomics and Systems Biology
• A journey from simple computer programs to
drug/vaccine informatics
• Limitations of existing web services
• History repeats (Web to Standalone)
• Graphics vs command mode
• General purpose programs
• Small programs as building unit
• Integration of methods in GPSR
Types of Prediction Methods
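\[
\text{Sensitivity} = \frac{TP}{TP + FN} \times 100
\]
\[
\text{Specificity} = \frac{TN}{TN + FP} \times 100
\]
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100
\]
\[
\text{MCC} = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \times 100
\]
where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives.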
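These four measures can be computed directly from a confusion matrix. The sketch below follows the slide in scaling every measure by 100, including MCC, and assumes the usual square root in the MCC denominator; the function name and the example counts are made up for illustration.

```python
import math


def prediction_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and MCC, each scaled by 100.

    tp, tn, fp, fn are counts of true positives, true negatives,
    false positives and false negatives.
    """
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty;
    # returning 0.0 in that case is a convention assumed here.
    mcc = 100.0 * (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical evaluation of a predictor on 100 examples.
    for name, value in prediction_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.2f}")
```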
Live Server
Live CD
Pkg Repository
Installation
Webserver
Standalone
Galaxy platform
All in ONE
Osddlinux desktop
is ready for use.
Password for sudo : osddlinux root : osddlinux
Osddlinux installation on system hard drive
Operating System for Drug Discovery
Download