by Gina

Solution to Prediction Problem
Gina Cannarozzi
Here is my prediction of the secondary structure for the family of
pathogenesis-related proteins, TPX proteins and venom allergens.
The following rules can be found in the tutorial but are repeated here.
Parse rules (step 2):
2i gap
2ii apc proline
2iii distributed parse
2iv apc glycine
2v combination parse
2vi distributed combination parse
2vii string parse
Interior algorithms (step 4):
4i interior algorithm 1
4ii interior algorithm 2
4iii interior algorithm 3
4iv interior algorithm 4
4v interior algorithm 5
Surface algorithms (step 5):
5i surface algorithm 1
5ii surface algorithm 2
5iii surface algorithm 3
To start, look at the section between 24 and 40. The gaps at 23 (rule 2i) and 41
(rule 2i) parse this section. I see no parses inside it, such as conserved or
distributed prolines. The residues DNG in one sequence at positions 27-29 form a
secondary parse by rule 2vii. We can choose whether or not to use it.
I will use a 5 state prediction of surface and interior. The five states are: strong
interior (I), weak interior (i), don’t know (.), weak surface (s), strong surface (S).
Predict the segment from 24-40
My predictions of surface and interior from residues 24-40 are:
1 24 I all hydrophobic 4ii
2 25 . don’t know
3 26 I conserved hydrophobic 4i
4 27 s 5ii
5 28 S variable with hydrophilic 5i
6 29 S variable with hydrophilic 5i
7 30 I all hydrophobic 4ii
8 31 I hydrophobic with CHQST
9 32 . don’t know
10 33 . don’t know
11 34 I conserved hydrophobic 4iii
12 35 I 4iii
13 36 s varying with hydrophilic 5ii
14 37 I hydrophobic split 4iii
15 38 I hydrophobic 4iv
16 39 s varying hydrophilic 5ii
17 40 S varying hydrophilic 5i
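The table can be turned into the string used in the next step mechanically. Here is a minimal sketch in Python, assuming the states are simply read off the table in order of alignment position. The names predictions and to_string are my own and do not come from the tutorial.

```python
# Per-position states for alignment positions 24-40, copied from the table above.
predictions = {
    24: "I", 25: ".", 26: "I", 27: "s", 28: "S", 29: "S",
    30: "I", 31: "I", 32: ".", 33: ".", 34: "I", 35: "I",
    36: "s", 37: "I", 38: "I", 39: "s", 40: "S",
}

# The five states: strong interior, weak interior, don't know, weak surface, strong surface.
VALID_STATES = set("Ii.sS")


def to_string(preds):
    """Concatenate the states in order of alignment position."""
    for pos, state in preds.items():
        if state not in VALID_STATES:
            raise ValueError(f"position {pos}: unknown state {state!r}")
    return "".join(preds[pos] for pos in sorted(preds))


si_string = to_string(predictions)
print(si_string)
```

Keeping the alignment position next to each state makes it easy to check the string against the table when a position is revised.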
Now I can put this string (I.IsSSII..IIsIIsS) on a helical wheel and see if it forms
a helix. I use:
http://www.site.uottawa.ca/~turcotte/resources/HelixWheel/
In the pull down menu, you can choose to color surface and
interior residues red and blue, respectively.
[Figure: helical wheel for alignment positions 24-40]
This is clearly a helix. As it turns out from the structure, alignment numbers 25
and 26 are a short beta strand. We might have found this if we had used the
secondary parse.
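The helical wheel test can also be sketched numerically. The code below places each residue 100 degrees after the previous one, as in an ideal alpha helix with 3.6 residues per turn, and sums the states as vectors. The numeric weights for the five states and the function names are my own assumptions, not part of the tutorial, and the 180 degree comparison is only a rough stand-in for the alternating pattern of a strand.

```python
import math

# Assumed numeric weights for the five states (not from the tutorial).
WEIGHTS = {"I": 1.0, "i": 0.5, ".": 0.0, "s": -0.5, "S": -1.0}


def wheel_angles(si_string, step_deg=100.0):
    """Angle of each residue on the wheel, stepping step_deg per residue."""
    return [(k * step_deg) % 360.0 for k in range(len(si_string))]


def moment(si_string, step_deg=100.0):
    """Length of the weighted vector sum, divided by the number of residues."""
    x = y = 0.0
    for k, state in enumerate(si_string):
        theta = math.radians(k * step_deg)
        x += WEIGHTS[state] * math.cos(theta)
        y += WEIGHTS[state] * math.sin(theta)
    return math.hypot(x, y) / len(si_string)


segment = "I.IsSSII..IIsIIsS"
for state, angle in zip(segment, wheel_angles(segment)):
    print(f"{state} {angle:5.0f}")
print(f"100 degree (helix-like) moment:  {moment(segment, 100.0):.2f}")
print(f"180 degree (strand-like) moment: {moment(segment, 180.0):.2f}")
```

A value that is larger at 100 degrees than at 180 degrees means the interior residues cluster on one side of the wheel, which is what the picture shows for this segment. This is only a rough check and does not replace looking at the wheel.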
The next segment to predict is alignment numbers 45-52. The parse at 53 is a
gap.
45 . don’t know; conserved C could be a disulfide bond
46 S varying in two groups with hydrophilics 5i
47 I all hydrophobic 4ii
48 S varying with hydrophilics
49 I 4iii
50 . don’t know 5iii
51 s 5i
52 s 5i
Use the string .SISI.ss
[Figure: plot for the string .SISI.ss, alignment positions 45-52]
One SISI pattern is enough to call it a strand.
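Looking for an alternating surface/interior pattern can be sketched as a small scan over the string. This is an illustration only. Collapsing strong and weak states into one class and letting a "don't know" break the run are my assumptions, not rules from the tutorial.

```python
def kind(state):
    """Collapse the five states to 'I' (interior), 'S' (surface) or None (don't know)."""
    if state in ("I", "i"):
        return "I"
    if state in ("S", "s"):
        return "S"
    return None


def longest_alternation(si_string):
    """Length of the longest run in which interior and surface strictly alternate."""
    best = run = 0
    prev = None
    for state in si_string:
        cur = kind(state)
        if cur is None:
            run = 0              # a "don't know" breaks the run
        elif prev is not None and cur != prev:
            run += 1             # the alternation continues
        else:
            run = 1              # start a new run at this residue
        prev = cur
        best = max(best, run)
    return best


print(longest_alternation(".SISI.ss"))
```

For .SISI.ss the longest alternating run is the SISI stretch itself.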
Predict the segment from 56-61. Gaps provide the parses here. The
conserved G at position 55 is a parse by rule 2iv.
56 S 5iii
57 S 5iii
58 I 4iv
59 I 4v
60 s
61 i
This is too short to be a helix. I predict a strand because there is one I-S-I
pattern, it has a lot of interior residues and the strength of the predictions was
high compared to the last segment.
The section between 62 and 70 is a parse because of the gaps.
Predict segment 71-88. Note that residues 86-88 have more P’s, G’s and
DSN residues. These are secondary parses, and there are a lot of them, so I
will only predict from 71-85.
My SI predictions are:
1 71 S varying with hydrophilic 5iii
2 72 s 5i
3 73 I hydrophobic 4v
4 74 I hydrophobic 4iv
5 75 s hydrophilic varying 5i
6 76 s 5i
7 77 I 4i
8 78 .
9 79 s 5i
10 80 S 5iii
11 81 S 5i
12 82 S 5i
13 83 s 5i
14 84 I conserved hydrophobic
15 85 .
16 86 .
The string is ssiissi.sssssi..
[Figure: helical wheel for alignment positions 71-86]
This segment is also a helix. The slight overlap at the 4th and 11th members of
the segment is normal. The helices may be slightly rotated from one family
member to the next.
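The overlap of the 4th and 11th members follows from the geometry of the wheel. A short sketch, assuming the usual 100 degrees per residue; the function name is my own:

```python
def wheel_separation(i, j, step_deg=100.0):
    """Smallest angle in degrees between the i-th and j-th members (1-based) on the wheel."""
    diff = (abs(j - i) * step_deg) % 360.0
    return min(diff, 360.0 - diff)


# Members 4 and 11 are 7 residues apart: 700 mod 360 = 340, which is 20 degrees from a full turn.
print(wheel_separation(4, 11))
```

Residues seven apart therefore land almost on top of each other on the wheel, so an interior and a surface residue at these two places will appear to overlap.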
Positions 88-97 are full of gaps and will be considered a parse. Two positions
don’t have gaps, but we can’t predict anything here other than a parse.
Predict the segment from 97 to 117.
This is a long segment. Are there any possible parses to break it up? The
almost conserved G at 111 is a possibility. It is strengthened by the presence
of more Gs in 113. Keep this in mind. Another possibility is the NS at position
106.
The alignment positions 98 (H), 99 (Y), 100 (T), 101 (Q), 103 (V), and 104 (W)
are all completely conserved. If you compare this with the rest of the
alignment, you can see that the conservation here is much higher. This
suggests that this might be the active site of the protein, meaning that these
residues participate in the chemical reaction catalyzed by this protein. Some
amino acids are found more frequently in active sites because they have side
chains that can participate in chemical reactions. These are: CDEHKNQRSTY.
So 98-101 is probably an active site. Around active sites it is difficult to
predict the secondary structure because the conservation is due to the active
site and not to the reasons that indicate which residues are surface and which
are interior. So I will predict from 102 to 117.
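Finding completely conserved columns and checking them against the list CDEHKNQRSTY can be sketched as follows. The rows are for illustration only. The first row is the stretch HYTQVVW from the 1CFE sequence, which I assume sits at alignment positions 98-104, and the other two rows are invented. The function name is my own as well.

```python
ACTIVE_SITE_PRONE = set("CDEHKNQRSTY")  # residue list quoted in the text


def conserved_columns(rows, first_position=1):
    """Yield (alignment position, residue) for columns where every row has the same residue."""
    for offset, column in enumerate(zip(*rows)):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            yield first_position + offset, column[0]


# Illustration only: the first row is taken from 1CFE, the other two rows are made up.
toy_rows = [
    "HYTQVVW",
    "HYTQLVW",
    "HYTQIVW",
]

for position, residue in conserved_columns(toy_rows, first_position=98):
    flag = "possible active site" if residue in ACTIVE_SITE_PRONE else "conserved only"
    print(position, residue, flag)
```

With these toy rows the conserved H, Y, T and Q are flagged as possible active site residues, while the conserved V and W are not, because they are not in the list.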
102 I hydrophobic
103 I hydrophobic
104 I hydrophobic conserved
105 S hydrophilic 5ii
106 s 5iii
107 .
108 .
109 S hydrophilic variable 5i
110 I hydrophobic 4ii
111 I 2iv (possible parse)
112 . don’t know; impossible to tell; could be C in disulfide bond
113 I (variable in n subgroups with no DEKRENDCHQST ; see slides)
114 .
115 I
116 S
117 . don’t know
IIIss..Sii.I.IS.
Looking at the possibilities with both secondary parses, I have the following:
With the NS parse: IIIS and S..Sii.I.IS.
With the G.G parse: IIIss..Sii and .i.IS.
On a helical wheel with the G.G parse I get
[Figure: helical wheels for the G.G parse]
which could be a helix but is not a strong signal. The i.iS. at alignment
positions 113-116 could be a beta strand showing an alternating pattern of
surface and interior residues. From the structure, alignment positions 98-103
are a helix and 109-117 are a strand. The active site residues have made
it difficult to predict the helix in the right place.
Using the NS parse, I have IIIS, which I would predict to be a buried strand
based on its length and the strength of its interior and surface characteristics.
The other segment looks like this on a helical wheel and could also be a helix.
Areas around active sites are hard to predict because the reasons why certain
amino acids are accepted at these positions are different from those we use to
predict.
[Figure: helical wheel for the second segment of the NS parse]
Predict the segment from 123 to 129.
I get III.SI. for this segment, which is too short to be a helix and very
structured, so I predict a strand here.
Solution:
The solution to this problem can be found in the literature. The structure of
protein a in the alignment is described in the Journal of Molecular Biology
(1997) 266, 576-593. Protein a is the pathogenesis-related protein P14a. The NMR
structure can be found in the Protein Data Bank with entry number 1CFE.
Here is the sequence.
>1CFE:_|PDBID|CHAIN|SEQUENCE
QNSPQDYLAVHNDARAQVGVGPMSWDANLASRAQNYANSRAGDCNLIHSGAGENLAKGGG
DFTGRAAVQLWVSERPSYNYATNQCVGGKKCRHYTQVVWRNSVRLGCGRARCNNGWWFIS
CNYDPVGNWIGQRPY
The structure can be found in Figure 5a of this paper. Notice that
the numbering in the paper is the numbering of the sequence of the
protein discussed in the paper, not of our alignment. Since adding gaps
changes the numbering, you should add the sequence numbers of this
protein to your alignment. The first M in sequence a in our
alignment is the 23rd amino acid in the sequence of that protein. So
the numbers in the paper can be found in our alignment by counting
each amino acid (not gap) from 23 in sequence one. E.g. M is 23, S is
24, W is 25, D is 26, etc. The A in front of the first gap is 51 in the
paper, and the G after the first gap, at alignment position 55, is
sequence position 52 in the paper.
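The counting rule (count every amino acid, skip every gap) can be sketched as a small function. The gapped row below is hypothetical and only mimics the situation described here. I assume that the A numbered 51 sits at alignment position 51 and that the first gap covers positions 52-54. The function name and its parameters are my own.

```python
def sequence_numbers(aligned_row, first_column=1, first_residue=1, gap="-"):
    """Map alignment column -> sequence number for one gapped row; gaps map to None."""
    mapping = {}
    number = first_residue
    for offset, char in enumerate(aligned_row):
        column = first_column + offset
        if char == gap:
            mapping[column] = None      # a gap has no sequence number
        else:
            mapping[column] = number
            number += 1                 # only real residues advance the count
    return mapping


# Hypothetical row fragment: S49 G50 A51, three gap columns, then G52 E53.
toy_row = "SGA---GE"
numbers = sequence_numbers(toy_row, first_column=49, first_residue=49)
for column in sorted(numbers):
    print(column, numbers[column])
```

With this layout, alignment position 55 maps to sequence number 52, as stated above, and every position after the gap is shifted by three.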
This paper also tries to identify active site residues by looking for
“highly conserved solvent-accessible residues without an obvious role
in the architecture of the protein.” They identify His48, Ser49 and
His93 as being completely conserved (in our alignment Ser49 is not
completely conserved) and in close proximity in three-dimensional
space, making them potential active site residues.
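The residue numbers from the paper can be checked directly against the 1CFE sequence given above. A short sketch, using the sequence with its line breaks removed; the helper name residue is my own:

```python
# Sequence of 1CFE as listed in this document, with the line breaks removed.
P14A = (
    "QNSPQDYLAVHNDARAQVGVGPMSWDANLASRAQNYANSRAGDCNLIHSGAGENLAKGGG"
    "DFTGRAAVQLWVSERPSYNYATNQCVGGKKCRHYTQVVWRNSVRLGCGRARCNNGWWFIS"
    "CNYDPVGNWIGQRPY"
)


def residue(sequence, number):
    """Return the one-letter residue at a 1-based sequence number."""
    return sequence[number - 1]


# Met23 is the first M of sequence a; His48, Ser49 and His93 are the residues named in the paper.
for name, number in [("Met", 23), ("His", 48), ("Ser", 49), ("His", 93)]:
    print(name, number, residue(P14A, number))
```

The one-letter codes printed should be M, H, S and H, which confirms that the paper counts residues from the start of this sequence.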
It is also interesting that Figures 5a and 6a in the paper show that
strand D is buried in the middle of the structure. Strand D runs from
sequence position 117-124, which is alignment position 123-130. The
prediction for it is largely interior, which tells you that this strand
is buried. Strand B (sequence number 53-58,
alignment number 56-61) is also buried. The strand that I missed
from sequence number 104-111 (alignment number