What is a Yin Yang site?

Improvement of Yin Yang site prediction
by incorporating the interplay between
phosphorylation and O-GlcNAcylation
Chao Ji, Yinxing Guo, Quan Zhang
What is a Yin Yang site?
• Yin Yang sites: The reciprocal and dynamic change
between O-GlcNAcylation and phosphorylation at
the same or proximal Ser/Thr
• Mechanisms that might govern Yin Yang
regulation
– Direct competition at a single site
– Competition via steric hindrance by reciprocal
modification at proximal sites
– Affecting the enzymatic efficiency of each other
Current Yin Yang site prediction
• O-GlcNAcylation and phosphorylation are predicted
separately
– NetPhos for phosphorylation prediction
– YinOYang for O-GlcNAc glycosylation prediction
http://www.cbs.dtu.dk/services/YinOYang/output.php
What is investigated in our project
• Collect reliable Yin Yang sites from MS data.
– Sites that show interplay between phosphorylation
and glycosylation.
• Is it possible to predict Yin Yang sites directly?
– Phosphorylation sites are first predicted by a
phosphorylation predictor.
– For the sites predicted as phosphorylation sites,
the probability of being a Yin Yang site is then
predicted directly.
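The two-stage idea above can be sketched as follows. This is only an illustration: `phospho_score` and `yin_yang_score` are hypothetical callables standing in for a phosphorylation predictor and a Yin Yang predictor, and the 0.5 cutoff is an assumption, not a value from the slides.

```python
from typing import Callable, List, Tuple

def predict_yin_yang(
    sequence: str,
    phospho_score: Callable[[str, int], float],   # hypothetical stage-1 scorer
    yin_yang_score: Callable[[str, int], float],  # hypothetical stage-2 scorer
    phospho_cutoff: float = 0.5,                  # assumed cutoff
) -> List[Tuple[int, float]]:
    """Return (position, Yin Yang probability) for Ser/Thr residues
    that pass the phosphorylation stage."""
    results = []
    for pos, residue in enumerate(sequence):
        if residue not in "ST":
            continue  # only Ser/Thr can be Yin Yang sites
        if phospho_score(sequence, pos) < phospho_cutoff:
            continue  # stage 1: not predicted as a phosphorylation site
        # stage 2: score only the predicted phosphorylation sites
        results.append((pos, yin_yang_score(sequence, pos)))
    return results
```

Positions are 0-based here; sites rejected in the first stage never reach the second predictor.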
Dynamics between O-GlcNAcylation
and phosphorylation
• Samples treated with OA, P/N, or both
• Group 3 vs Group 1: how phosphorylation changes
in response to globally elevated O-GlcNAcylation
• Okadaic acid (OA): Ser/Thr-specific phosphatase
inhibitor
• PUGNAc and NAG-thiazoline (P/N): nonspecific O-GlcNAcase inhibitors
Measuring the dynamics
• Relative Occupancy Ratio (ROR)
• If the phosphorylation level at a specific site
drops significantly after P/N treatment,
the site is considered a Yin Yang site.
Overview of the MS data
• Sites/proteins detected in MS data
– 342 proteins with 573 sites from mouse and rat.
– 103 sites whose ROR decreased significantly after
treatment with O-GlcNAcase inhibitor; 470
otherwise.
Collecting training data
• NetPhos prediction for the 573 sites identified in
MS data.
                                    ROR decreased significantly   Otherwise   Total
MS                                  103                           470         573
Predicted as phosphorylation site   83                            326         409

Positive examples, including 68 Ser sites and 15 Thr sites.
Negative examples, including 273 Ser sites and 54 Thr sites.
Evaluation of Yinyang predictor
         False positive   False negative   Error
Ser      0.168            0.809            0.297
Thr      0.204            0.800            0.333
Total    0.174            0.807            0.303
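For reference, the three columns can be computed from confusion counts as sketched below. This assumes the usual definitions (false positive rate over all negatives, false negative rate over all positives, error over all examples); the slides do not state the definitions, and the counts in the usage example are purely illustrative, not taken from the MS data.

```python
from typing import Tuple

def error_rates(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (false positive rate, false negative rate, overall error)."""
    fp_rate = fp / (fp + tn)                 # negatives wrongly called positive
    fn_rate = fn / (fn + tp)                 # positives wrongly called negative
    error = (fp + fn) / (tp + fp + tn + fn)  # all misclassified examples
    return fp_rate, fn_rate, error

# Illustrative counts only
fp_rate, fn_rate, error = error_rates(tp=2, fp=1, tn=4, fn=8)
```

With a small positive set, a high false negative rate can coexist with a moderate overall error, which is the pattern in the table above.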
Sequence context surrounding Ser sites
[Figure panels: Positive Ser sites; Negative Ser sites]
Sequence context surrounding Thr sites
[Figure panels: Positive Thr sites; Negative Thr sites]
Profile Model
• Select a sequence window centered at the
phosphorylation site for each instance in the
training set.
• Separate models are built from the sequence windows
of the positive set and the negative set.
• For an input sequence window, classify it by
comparing its fit to the two models.
• 4-fold cross-validation is used to evaluate the
model.
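One way to read the steps above is as two position-specific frequency profiles compared by log-likelihood. The sketch below assumes that interpretation; the pseudocount, the `-` spacer symbol, and the equal-prior decision rule are assumptions, not details from the slides.

```python
import math
from typing import Dict, List

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus '-' as spacer (assumed)

def build_profile(windows: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-position residue frequencies with additive smoothing."""
    profile = []
    for i in range(len(windows[0])):
        counts = {a: pseudocount for a in ALPHABET}
        for w in windows:
            counts[w[i]] += 1.0
        total = sum(counts.values())
        profile.append({a: c / total for a, c in counts.items()})
    return profile

def log_likelihood(window: str, profile: List[Dict[str, float]]) -> float:
    """Log-probability of the window, treating positions as independent."""
    return sum(math.log(profile[i][a]) for i, a in enumerate(window))

def classify(window: str, pos_profile, neg_profile) -> bool:
    """True if the positive profile explains the window better."""
    return log_likelihood(window, pos_profile) > log_likelihood(window, neg_profile)

# Toy windows of size 3, for illustration only
pos_profile = build_profile(["ASP", "ASP", "GSP"])
neg_profile = build_profile(["KSE", "RSE", "KSD"])
```

Because each position is scored independently, a model of this kind cannot capture correlations between positions.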
Profile Model
• Window sizes from 3 to 31 residues were tested.
• The minimum error rate is 0.32 at window size 9;
the maximum is 0.344 at window size 29; the mean
is 0.335. There is no obvious difference between
window sizes.
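A minimal sketch of extracting a centered window for each tested size. It assumes only odd sizes are used (so the site sits exactly in the middle) and that positions beyond the ends of the protein are padded with a `-` spacer; neither detail is stated on the slide.

```python
def extract_window(sequence: str, center: int, size: int, spacer: str = "-") -> str:
    """Return the window of `size` residues centered at 0-based index `center`."""
    if size % 2 == 0:
        raise ValueError("window size must be odd so the site is centered")
    half = size // 2
    chars = []
    for i in range(center - half, center + half + 1):
        # pad with the spacer when the window runs off either end
        chars.append(sequence[i] if 0 <= i < len(sequence) else spacer)
    return "".join(chars)

# Odd window sizes 3, 5, ..., 31 (assumed reading of "3 to 31")
WINDOW_SIZES = list(range(3, 32, 2))
```

For example, `extract_window("MKSPT", 0, 5)` pads the two missing positions on the left with spacers.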
Profile Model
• Models are built for Ser sites and Thr sites separately.
• For the Ser set, the minimum error of 0.34 is achieved at
window size 29. For the Thr set, the minimum error of 0.37 is
achieved at window size 27.
Artificial Neural Network?
• There is no simple pattern for Yin Yang sites,
so Yin Yang sites and non-Yin Yang sites
are not easily separable.
• An ANN is capable of classifying highly complex
sequence patterns where the correlations
between positions are important.
• If there are fuzzy patterns in Yin Yang sites, can
an ANN recognize them?
Structure of ANN
• Standard feed-forward artificial neural
network with sigmoidal nodes and one layer
of hidden units
• The input layer has n input groups for a
sequence window of length n. Each group
has 21 units, each representing one of the
20 amino acids or a spacer.
• The output is the probability that the site is a Yin Yang site.
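The architecture described above can be sketched in plain Python as follows. The alphabet ordering, the `-` spacer symbol, and the random weight initialization are assumptions for illustration; training (for example by backpropagation) is omitted.

```python
import math
import random
from typing import List, Tuple

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 21 symbols: 20 amino acids plus spacer

def one_hot(window: str) -> List[float]:
    """Encode a window of length n as n groups of 21 units."""
    vec: List[float] = []
    for residue in window:
        group = [0.0] * len(ALPHABET)
        group[ALPHABET.index(residue)] = 1.0
        vec.extend(group)
    return vec

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def init_network(n_inputs: int, n_hidden: int, seed: int = 0) -> Tuple[list, list]:
    """Small random weights; the last weight of each unit is its bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)]
    return hidden, output

def forward(x: List[float], hidden: list, output: list) -> float:
    """One hidden layer of sigmoid units, one sigmoid output in (0, 1)."""
    h = [sigmoid(w[-1] + sum(wi * xi for wi, xi in zip(w, x))) for w in hidden]
    return sigmoid(output[-1] + sum(wi * hi for wi, hi in zip(output, h)))

x = one_hot("KSP")                          # 3 positions x 21 units = 63 inputs
hidden, output = init_network(len(x), n_hidden=3)
probability = forward(x, hidden, output)    # untrained, so not meaningful yet
```

Unlike a per-position profile, the hidden units combine inputs from all positions, which is what lets the network respond to correlations between positions.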
Training and testing of ANN
• 4-fold cross-validation is used to evaluate the
performance of the ANN.
• In each fold, 75% of the data is used for
training and the remaining 25% is used
to test the performance of the network.
• The network is tested over window sizes from 3
to 31 with 3 or 5 hidden units.
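The evaluation scheme above can be sketched as a 4-fold index split plus a grid of settings. The interleaved fold assignment and the restriction to odd window sizes are assumptions; in practice the data would likely be shuffled or stratified first, since positive examples are scarce.

```python
from itertools import product
from typing import Iterator, List, Tuple

def k_fold_indices(n_items: int, k: int = 4) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each item is tested exactly once."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        yield train, test

# Every (window size, hidden units) combination to evaluate
GRID = list(product(range(3, 32, 2), (3, 5)))

for train_idx, test_idx in k_fold_indices(8):
    pass  # train on train_idx (75%), measure error on test_idx (25%)
```

Averaging the test error over the four folds gives one error estimate per grid entry.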
Result of ANN
[Figure: false positive and false negative rates]
Further work
• More positive instances are needed for the
network to learn the patterns.
• Average the results from different networks.
• Incorporate the surface accessibility
– O-glycan is linked post-translationally to Ser or Thr
of a fully folded and assembled protein, and is
thus surface exposed on the protein.
– Sites predicted to be on the surface are more
likely to be Yin Yang sites, so the threshold for
those sites could be lowered.
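Two of the ideas above, averaging several networks and lowering the threshold for surface-exposed sites, could be combined as in this sketch. Both threshold values are placeholders chosen for illustration, and `surface_exposed` is assumed to come from a separate surface accessibility predictor.

```python
from typing import Sequence

def ensemble_probability(probs: Sequence[float]) -> float:
    """Average the outputs of several independently trained networks."""
    return sum(probs) / len(probs)

def is_yin_yang(
    probs: Sequence[float],
    surface_exposed: bool,            # from a hypothetical accessibility predictor
    base_threshold: float = 0.5,      # placeholder value
    exposed_threshold: float = 0.4,   # placeholder: lower cutoff for exposed sites
) -> bool:
    """Apply a lower decision threshold when the site is predicted to be exposed."""
    threshold = exposed_threshold if surface_exposed else base_threshold
    return ensemble_probability(probs) >= threshold
```

With network outputs of 0.4, 0.5 and 0.45, the averaged score of about 0.45 would pass only if the site is predicted to be surface exposed.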