Try these prediction tools with the following sequence (Sequence 2, Lecture 4 from the website):

MLDQQTINIIKATVPVLKEHGVTITTTFYKNLFAKHPEVRPLFDMGRQESLEQPKALAMT
VLAAAQNIENLPAILPAVKKIAVKHCQAGVAAAHYPIVGQELLGAIKEVLGDAATDDILD
AWGKAYGVIADVFIQVEADLYAQAVE

Try to think about the following questions:

1. What sort of secondary structure does this protein have?
2. Is it likely to cross the cell membrane?
3. What sort of tertiary structure is predicted?
4. Are there any proteins of known structure with similar sequence?
5. If so, what functions do they have?

1. Predict secondary structure

a. PSIPred results (version 2.3):

Conf: Confidence (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
AA: Target sequence

Conf: 998889999999999997348899999999999858357612352353446789999999
Pred: CCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCHHHHHHCCCCCHHHHHHHHHHHHH
  AA: MLDQQTINIIKATVPVLKEHGVTITTTFYKNLFAKHPEVRPLFDMGRQESLEQPKALAMT
              10        20        30        40        50        60

Conf: 999999741478899999999999988089766789999999999998626136989999
Pred: HHHHHHHHHCHHHHHHHHHHHHHHHHHCCCCHHHHHHHHHHHHHHHHHHCCCCCCHHHHH
  AA: VLAAAQNIENLPAILPAVKKIAVKHCQAGVAAAHYPIVGQELLGAIKEVLGDAATDDILD
              70        80        90       100       110       120

Conf: 99999999999999999999976329
Pred: HHHHHHHHHHHHHHHHHHHHHHHHCC
  AA: AWGKAYGVIADVFIQVEADLYAQAVE
             130       140

Helical regions: 4-35, 37-42, 48-69, 71-87, 92-109, 116-144

The graphical version can be seen below:
[Figure: graphical version of the PSIPred prediction]

b. NNPredict results:

Secondary structure prediction (H = helix, E = strand, - = no prediction):

------EEEE-----HH-----EEEEHHHHH----------------------HHHHHHH
HHHHHHHHH----HHHHHHHHHHHHHHHHHH--------HHHHHHHHHHH-----HHHHH
HHH-HHEEEHEHEHHHHHHHHHHH--

Helical regions: 54-69, 74-91, 100-110, 116-123, 134-144

Conclusions:
This protein appears to be made up mainly of alpha helices. PSIPred reports high confidence for most of the helices it predicts. The results from PSIPred and NNPredict broadly agree, although the PSIPred results are probably slightly more reliable.

2. Predict transmembrane regions

a.
TMPred results:

The sequence positions in brackets denominate the core region.
Only scores above 500 are considered significant.

Inside to outside helices: 1 found
    from       to         score  center
    56 ( 58)   74 ( 74)   175    66

Outside to inside helices: 0 found

b. TMHMM results:

# Sequence Length: 146
# Sequence Number of predicted TMHs: 0

If the whole sequence is labelled as inside or outside, the prediction is that it contains no membrane helices. It is probably not wise to interpret it as a prediction of location. The prediction gives the most probable location and orientation of transmembrane helices in the sequence.

c. HMMTOP results:

Length: 146
N-terminus: OUT
Number of transmembrane helices: 0

Conclusions:
TMHMM and HMMTOP predict no transmembrane helices, and the single helix reported by TMPred scores 175, well below the significance threshold of 500. All three programs therefore agree that there is no evidence to suggest that this protein is located within the cell membrane.

3. Predict tertiary structure

a. SwissModel

SwissModel has predicted the following structure for the protein:
[Figure: structure predicted by SwissModel]

This model is based on the similarity of the query sequence to the following sequences in the PDB database.

Sequence identity of templates with target:

    4vhbA.pdb:  98.85 % identity
    2vhbB.pdb:  100 % identity
    2vhbA.pdb:  98.85 % identity
    4vhbB.pdb:  100 % identity
    3vhbA.pdb:  100 % identity
    1vhbB.pdb:  100 % identity
    3vhbB.pdb:  100 % identity
    1vhbA.pdb:  100 % identity
    1cqxA.pdb:  50.7 % identity
    1cqxB.pdb:  50.7 % identity
    1gvhA.pdb:  50.85 % identity
    1oj6B.pdb:  26.4 % identity
    1oj6C.pdb:  26.4 % identity
    1oj6A.pdb:  26.4 % identity
    1oj6D.pdb:  26.4 % identity
    1q1fA.pdb:  25.3 % identity
    1w92A.pdb:  25.3 % identity

Searching PDB with the ID codes of the templates used in the modelling reveals the following results:

    1vhb: Bacterial Dimeric Hemoglobin From Vitreoscilla Stercoraria
    2vhb: Azide Adduct Of The Bacterial Hemoglobin From Vitreoscilla Stercoraria
    3vhb: Imidazole Adduct Of The Bacterial Hemoglobin From Vitreoscilla Sp.
    4vhb: Thiocyanate Adduct Of The Bacterial Hemoglobin From Vitreoscilla Sp.
This shows that the sequences with the highest similarity (98.85 to 100 % identity) are all structures of the bacterial haemoglobin from Vitreoscilla. Our query sequence may therefore have a function similar to that of haemoglobin, i.e. oxygen transport.

b. EsyPred results:

“A 3D model of your protein has been built using the 3D structure 2GDM chain ' ' as template. This template shares 24.7% identities with your query sequence (using the ALIGN program). The target-template alignment is provided in attachment in the prot_29715354025928.ali file. The 3D model of your protein is provided in attachment in the prot_29715354025928.pdb file.”

Searching PDB for 2GDM reveals that the model is based on the structure of Leghemoglobin, which is also involved in oxygen transport. Despite the much lower level of sequence identity, the model still appears to be similar to the one generated by SwissModel.
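
As a cross-check on section 1, the helical regions listed there can be recovered directly from the prediction strings. The following is only a minimal sketch in Python: the function names `segments` and `helix_agreement` are hypothetical helpers, not part of PSIPred or NNPredict, and the minimum run length of 6 applied to the NNPredict string is an assumption chosen because the regions listed in section 1 leave out the shorter helical runs.

```python
from itertools import groupby

# Prediction strings copied from section 1
# (PSIPred "Pred" rows and the NNPredict output, 146 residues each).
PSIPRED = (
    "CCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCHHHHHHCCCCCHHHHHHHHHHHHH"
    "HHHHHHHHHCHHHHHHHHHHHHHHHHHCCCCHHHHHHHHHHHHHHHHHHCCCCCCHHHHH"
    "HHHHHHHHHHHHHHHHHHHHHHHHCC"
)
NNPREDICT = (
    "------EEEE-----HH-----EEEEHHHHH----------------------HHHHHHH"
    "HHHHHHHHH----HHHHHHHHHHHHHHHHHH--------HHHHHHHHHHH-----HHHHH"
    "HHH-HHEEEHEHEHHHHHHHHHHH--"
)


def segments(pred, state="H", min_len=1):
    """Return 1-based inclusive (start, end) runs of `state` in `pred`.

    Runs shorter than `min_len` are skipped (hypothetical filter,
    not something the prediction servers are known to apply).
    """
    runs = []
    pos = 1
    for symbol, group in groupby(pred):
        length = len(list(group))
        if symbol == state and length >= min_len:
            runs.append((pos, pos + length - 1))
        pos += length
    return runs


def helix_agreement(a, b):
    """Fraction of positions where both strings agree on helix vs. non-helix."""
    if len(a) != len(b):
        raise ValueError("prediction strings must have equal length")
    same = sum((x == "H") == (y == "H") for x, y in zip(a, b))
    return same / len(a)


if __name__ == "__main__":
    print("PSIPred helices:  ", segments(PSIPRED))
    print("NNPredict helices:", segments(NNPREDICT, min_len=6))
    print("Helix/non-helix agreement:",
          round(helix_agreement(PSIPRED, NNPREDICT), 2))
```

With these inputs, `segments(PSIPRED)` gives the six PSIPred regions listed in section 1, and `segments(NNPREDICT, min_len=6)` gives the five NNPredict regions. The agreement fraction is one simple way to put a number on "broadly agree"; it treats strand and "no prediction" alike as non-helix, which is a simplification.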