SupplementaryMaterial

Supplementary Material for the study Prediction of Binding Sites of Peptide
Recognition Domains: An Application on Grb2 and SAP SH2 Domains by
William A. McLaughlin, Tingjun Hou, and Wei Wang
The supplementary material consists of the following files:
1. this file: lists the files available in the supplementary material and provides a summary
of the comparison between the analyses done using phosphorylated and unphosphorylated
peptides
2. KnownBindingPeptideSequences.txt: lists the peptide sequences known to bind the
Grb2 and SAP SH2 domains used in the study
3. Grb2_KnownOnlyHMM.hmm: the hidden Markov model for searching for potential
binding sites of the Grb2 SH2 domain which was created using the sequences of the
known binding peptides
4. SAP_KnownOnlyHMM.hmm: the hidden Markov model for searching for potential
binding sites of the SAP SH2 domain, which was created using the sequences of the
known binding peptides
5. Grb2_BindingClusterSequences_unphos.txt: the peptide sequences in the binding
cluster generated using sequence and energy characteristics with peptides in the
unphosphorylated state
6. Grb2_BindingClusterHMM_unphos.hmm: the hidden Markov model created using the
sequences in the binding cluster generated with peptides in the unphosphorylated state
(the primary HMM to be used to search for new binding candidates of the Grb2 SH2
domain)
7. SAP_BindingClusterSequences_unphos.txt: the peptide sequences in the binding
cluster generated using sequence and energy characteristics for clustering with peptides
in the unphosphorylated state
8. SAP_BindingClusterHMM_unphos.hmm: the hidden Markov model created with the
binding cluster created with peptides in the unphosphorylated state (the primary HMM to
be used to search for new binding candidates of the SAP SH2 domain)
9. Grb2_SearchResults_unphos.txt: the top 100 peptide sequences retrieved by database
scan using the binding cluster HMM or Grb2_BindingClusterHMM_unphos.hmm across
the set of human protein tyrosine positions in SWISS-PROT. The corresponding ranks of
each retrieved tyrosine site are given for a comparable search using the SCANSITE tool
and the known only HMM or Grb2_KnownOnlyHMM.hmm.
10. Grb2_SearchResults_phos.txt: a list similar to the one in
Grb2_SearchResults_unphos.txt, but created by analyzing peptides in the
phosphorylated state. The list is given for comparison purposes and is not considered a
viable list of candidates.
11. SAP_SearchResults_unphos.txt: the top 100 hits retrieved by database scan using the
SAP binding cluster HMM or SAP_BindingClusterHMM_unphos.hmm across the set of
human protein sequences in SWISS-PROT. The corresponding ranks of each retrieved
site are given for a comparable search using the known only HMM
(SAP_KnownOnlyHMM.hmm).
12. SAP_SearchResults_phos.txt: a list similar to the one in
SAP_SearchResults_unphos.txt, but created by analyzing peptides in the
phosphorylated state. The list is given for comparison purposes and is not considered a
viable list of candidates.
13. Figure_S1.tif: plots showing a comparison between the distribution of known binding
peptides and candidate peptides for peptides in the phosphorylated and unphosphorylated
state. Details of the comparison are below.
14. Table S1.doc: clustering results for the peptides in the phosphorylated state. Details
are given below.
15. Figure_S2.tif: search results presented for the known only HMM, binding cluster
HMM, and control cluster HMM for the Grb2 and SAP domain analyses using peptides
in the phosphorylated state. Details of the comparison are below.
A summary of the calculations done using phosphorylated peptides
The calculations for the peptides in the phosphorylated state, as compared to those
in the unphosphorylated state, are reported here. For the SAP dataset of peptides
there were 11 known binding sequences (Table S1) compared to 1799 candidate peptide
sequences. The mean energy of the known binding peptides was compared to that of
the candidate peptides using a t-test. The p-value associated with the t-test was
6.41 × 10^-5 for peptides in the phosphorylated state and 2.31 × 10^-9 for the peptides
with the phosphate removed. In Figure S1 (Panels A and B), the distributions of the known
binding peptides and 100 of the candidate peptides are shown. For the Grb2 dataset of
peptides there were 15 known binding sequences (Table S1) and 1400 peptides in the
candidate set. The p-value associated with the t-test comparing the mean
energies was 6.41 × 10^-5 for phosphorylated peptides and 2.31 × 10^-9 for the calculations
done using unphosphorylated peptides. In Figure S1 (Panels C and D), the distributions of
the known binding peptides and of peptides randomly drawn from the background set of
sequences are shown. Note that the binding peptides were better separated for peptides in
the unphosphorylated state (Panel D) than in the phosphorylated state (Panel C).
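The comparison of mean energies described above can be illustrated with a minimal sketch. The text does not say which form of the t-test was used, so the unequal-variance (Welch) form is an assumption here, and the energy values are made up for illustration; they are not data from the study. The function name welch_t is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Assumption: unequal-variance (Welch) two-sample t-test; the source
    only says "a t-test".
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Made-up binding energies (arbitrary units), for illustration only.
known_binders = [-9.1, -8.7, -9.4, -8.9, -9.0]
candidates = [-6.2, -7.1, -5.8, -6.9, -6.5, -7.3, -6.0, -6.6]

t_stat, dof = welch_t(known_binders, candidates)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A p-value would then be read from the t distribution with the computed degrees of freedom; a statistics package can do this in one call, for example scipy.stats.ttest_ind(known_binders, candidates, equal_var=False).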
Clustering was done on the peptides using sequence only, energy only, and
sequence and energy simultaneously.
Clustering results for the peptides in the
phosphorylated state are given in Table S1. Clustering done using sequence and energy
together produced the highest overlap score of the known binders in a given cluster for
both datasets. However, the overlap score was lower than that achieved using
unphosphorylated peptides (compare with Table 1 of the main text).
For the SAP domain dataset, there were four clusters generated using sequence
and energy. The second cluster contained the majority of the known binding peptides
and was assigned as the binding cluster. For the Grb2 dataset, there were five clusters
generated using both sequence and energy. Cluster five was assigned as the binding
cluster. The sequences in these binding clusters were used to create hidden Markov
models. Search results are plotted in Figure S2 and are comparable to Figure 3 of the
main text. The results for the SAP domain analysis are shown in panel A and those for
Grb2 in panel B. The p-value associated with the separation of the mean log percentile
ranks of the known binding peptides using the control HMM and the binding cluster
HMM for the SAP analysis was 2 × 10^-2, indicating no significant difference between the
control HMM and the binding cluster HMM at a significance threshold of p = 0.01. For the
Grb2 analysis the p-value for the same comparison was 1.78 × 10^-4, which is less
significant than the value of 1.62 × 10^-4 obtained in the analysis of unphosphorylated
peptides.
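The rule used above to pick the binding cluster (the cluster holding most of the known binding peptides) can be sketched as follows. The peptide identifiers, cluster labels, and the function name assign_binding_cluster are hypothetical. The returned fraction is only a simple count ratio and is not the overlap score reported in Table S1, whose definition is given in the main text.

```python
from collections import Counter


def assign_binding_cluster(cluster_of, known_binders):
    """Pick the cluster that contains the most known binding peptides.

    cluster_of    -- dict mapping peptide id -> cluster label
    known_binders -- list of peptide ids known to bind the domain
    Returns (cluster label, fraction of known binders in that cluster).
    """
    counts = Counter(cluster_of[p] for p in known_binders if p in cluster_of)
    if not counts:
        raise ValueError("no known binder appears in the clustering")
    # On a tie, Counter.most_common keeps the label encountered first.
    label, hits = counts.most_common(1)[0]
    return label, hits / len(known_binders)


# Hypothetical toy clustering, not data from the study.
cluster_of = {"pepA": 1, "pepB": 2, "pepC": 2, "pepD": 2,
              "pepE": 3, "pepF": 2, "pepG": 4}
known = ["pepB", "pepC", "pepD", "pepE"]

label, fraction = assign_binding_cluster(cluster_of, known)
print(label, fraction)
```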
The top ranked peptides found by the binding cluster HMM created using
peptides in the phosphorylated state are given in the file SAP_SearchResults_phos.txt for
the SAP analysis. For the Grb2 analysis the top ranked peptides are given in the file
Grb2_SearchResults_phos.txt. They are presented for comparison with the lists generated
using the unphosphorylated peptides.