predicting peptide charge

Example of regression by RBF-ANN
Prediction of charge on peptides after electrospray ionization in mass spectrometry
What are the best attributes to predict charge?
Review of molecular biology
DNA sequence determines protein sequence
What are amino acids?
[Figure: amino acid structure with labels N-terminus, C-terminus, and side chain]
Amino acids with different side chains have different names
name            3-letter  1-letter
glycine         gly       G
alanine         ala       A
valine          val       V
leucine         leu       L
isoleucine      ile       I
methionine      met       M
proline         pro       P
phenylalanine   phe       F
tryptophan      trp       W
serine          ser       S
cysteine        cys       C
threonine       thr       T
glutamine       gln       Q
asparagine      asn       N
histidine       his       H
tyrosine        tyr       Y
glutamic acid   glu       E
aspartic acid   asp       D
lysine          lys       K
arginine        arg       R
Chemical properties of amino acids
More properties of amino acids
code  mass       pI     pK1   pK2    charge  hydrophobic?  polar?
A     89.09404   6.01   2.35  9.87   0       T             F
R     174.20274  10.76  1.82  8.99   +       F             F
N     132.1190   5.41   2.14  8.72   0       F             T
D     133.10384  2.85   1.99  9.9    -       F             F
C     121.15404  5.05   1.92  10.7   0       F             T
E     146.14594  3.15   2.1   9.47   -       F             F
Q     146.14594  5.65   2.17  9.13   0       F             T
G     75.06714   6.06   2.35  9.78   0       T             F
H     155.15634  7.6    1.8   9.33   +       F             T
I     131.17464  6.05   2.32  9.76   0       T             F
L     131.17464  6.01   2.33  9.74   0       T             F
K     146.18934  9.6    2.16  9.06   +       F             F
M     149.20784  5.74   2.13  9.28   0       T             F
F     165.1918   5.49   2.2   9.31   0       T             F
P     115.13194  6.3    1.95  10.64  0       T             F
S     105.09344  5.68   2.19  9.21   0       F             T
T     119.12034  5.6    2.09  9.1    0       F             T
W     204.22844  5.89   2.46  9.41   0       T             T
Y     181.19124  5.64   2.2   9.21   0       F             T
V     117.14784  6.0    2.39  9.74   0       T             F
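The mass column lists free amino acids, so the mass of a peptide is the sum over its residues minus one water for each peptide bond. The block below is a minimal sketch of that arithmetic. The names FREE_MASS and peptide_mass are illustrative, the water mass of 18.01528 is an assumption not taken from the table, and only the rows needed for the example are copied.

```python
from typing import Dict

# Free amino acid masses copied from the table above (subset only;
# the remaining rows would be added the same way).
FREE_MASS: Dict[str, float] = {
    "A": 89.09404,
    "D": 133.10384,
    "L": 131.17464,
    "N": 132.1190,
    "R": 174.20274,
}

# Assumed mass of the water lost per peptide bond (not from the table).
WATER_MASS = 18.01528


def peptide_mass(sequence: str) -> float:
    """Sum the free amino acid masses and subtract one water per peptide bond."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    total = sum(FREE_MASS[residue] for residue in sequence)
    return total - (len(sequence) - 1) * WATER_MASS


if __name__ == "__main__":
    print(round(peptide_mass("AAAAADLANR"), 3))
```

A residue missing from FREE_MASS raises KeyError, which makes an incomplete table visible instead of silently producing a low mass.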
Amino Acids Polymerize to Form Proteins (polypeptides)
Formation of peptide bond
[Figure: peptide backbone -N-C-C-N-C-C- with side chains R]
Proteases: enzymes that cut proteins at the peptide bond
[Figure: peptide backbone -N-C-C-N-C-C- with side chains R]
Most proteases have cleavage specificity.
Trypsin cleaves mainly on the C-terminal side of arginine (R) and lysine (K)
Digestion of a protein with trypsin produces peptides of various lengths
Analysis of digestion mixture yields information about proteins in sample
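The trypsin rule can be sketched as an in-silico digest. This is a simplified illustration, assuming a cut after every R and K with no missed cleavages and no exceptions for neighboring residues; the function name tryptic_digest and the example protein string are hypothetical.

```python
from typing import List


def tryptic_digest(protein: str) -> List[str]:
    """Split a protein sequence after every arginine (R) or lysine (K)."""
    peptides: List[str] = []
    start = 0
    for index, residue in enumerate(protein):
        if residue in "KR":
            # The cleaved residue stays at the C-terminus of the peptide.
            peptides.append(protein[start:index + 1])
            start = index + 1
    if start < len(protein):
        # Whatever follows the last cleavage site forms the final peptide.
        peptides.append(protein[start:])
    return peptides


if __name__ == "__main__":
    for peptide in tryptic_digest("AAAAADLANRAAAAAQASASAAAKGG"):
        print(peptide)
```

Under this rule every peptide except possibly the last one ends in R or K, which matches the sample sequences in the dataset.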
Liquid chromatography coupled to mass spectrometry
Digested protein mixture -> LC column -> electrospray ionization -> mass spectrometer
Peptides are retained for differing times on the LC column
Peptides may have multiple charges.
Charges in dataset are averages from several runs
The first 4 of ~23,000 data pairs are:
Sequence                            Charge
AAAAAAPDDVAAQLVVADLDLVGGHVEDAFAR    2.8
AAAAADLANR                          2
AAAAAQASASAAAK                      1.714286
AAAAAVAQGGPIEDAER                   2
Can peptide sequence be an input?
What inputs can we calculate from the input sequence?
Some suggestions for inputs from properties of amino acids
Length of peptide
Mass of peptide
First amino acid
Last amino acid
Fractions of amino acids of each type
Fractions of hydrophobic, polar, and charged residues
Net formal charge
Average isoelectric point
Average dissociation constant
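Several of the suggested inputs can be computed directly from the sequence string. The sketch below covers length, first and last residue (one-hot encoded, which is an assumption about how to present a categorical value to a network), composition fractions, class fractions, and net formal charge. The hydrophobic, polar, and charge assignments are read from the property table; the function and key names are illustrative. Mass and the averaged pI and pK values would follow the same pattern with the corresponding table columns.

```python
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ARNDCEQGHILKMFPSTWYV"
# Class membership as marked T in the property table.
HYDROPHOBIC = set("AGILMFPWV")
POLAR = set("NCQHSTWY")
# Formal charge column: + for R, H, K and - for D, E; all others 0.
FORMAL_CHARGE = {"R": 1, "H": 1, "K": 1, "D": -1, "E": -1}


def peptide_features(sequence: str) -> Dict[str, float]:
    """Turn a peptide sequence into a flat dictionary of numeric inputs."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    n = len(sequence)
    counts = Counter(sequence)
    features: Dict[str, float] = {"length": float(n)}
    for aa in AMINO_ACIDS:
        features["frac_" + aa] = counts[aa] / n
        # One-hot flags for the first and last residue.
        features["first_" + aa] = 1.0 if sequence[0] == aa else 0.0
        features["last_" + aa] = 1.0 if sequence[-1] == aa else 0.0
    features["frac_hydrophobic"] = sum(counts[a] for a in HYDROPHOBIC) / n
    features["frac_polar"] = sum(counts[a] for a in POLAR) / n
    features["frac_charged"] = sum(counts[a] for a in FORMAL_CHARGE) / n
    features["net_formal_charge"] = float(
        sum(FORMAL_CHARGE.get(residue, 0) for residue in sequence)
    )
    return features


if __name__ == "__main__":
    f = peptide_features("AAAAADLANR")
    print(f["length"], f["frac_A"], f["net_formal_charge"])
```

Every peptide maps to a vector of the same size regardless of its length, which is what a fixed-input regression model needs.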
MLP with default options.
600 examples reserved for test set
Poor results
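Reserving a test set can be sketched as a shuffled holdout split. The slides do not say how the 600 examples were chosen, so the random selection, the fixed seed, and the function name holdout_split are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]  # (peptide sequence, average charge)


def holdout_split(
    pairs: Sequence[Pair], n_test: int = 600, seed: int = 0
) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle the data pairs and reserve n_test of them for testing."""
    if not 0 < n_test < len(pairs):
        raise ValueError("n_test must be between 1 and len(pairs) - 1")
    shuffled = list(pairs)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Placeholder pairs standing in for the real dataset.
    data = [("PEPTIDE%d" % i, 2.0) for i in range(1000)]
    train, test = holdout_split(data)
    print(len(train), len(test))
```

The test pairs are never seen during training, so the error measured on them estimates how the model behaves on new peptides.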
Other regression options