Computational Approach
for Predicting Interaction Sites of
Cytochrome and Photosystem I
W. Chen, A. Sekmen, B. Bruce, K. Nguya, P. Mishra, L. Emujakporue, K. Wehbi
Computer Science,
Tennessee State University
Biochemistry & Cellular & Molecular Biology,
University of Tennessee at Knoxville
Supported by NSF Targeted Infusion Grant (1137484) &
TN-SCORE thrust on Nanostructures for Enhancing Energy Efficiency
BICOB 2013
Outline
• Research Background
• Problem and Challenge
• Methods
  - Interaction Relation between Cytochrome and Photosystem I
  - Prediction Algorithms
• Results and Analysis
• Summary and Future Work
Research Background
Natural Photosynthetic Process
Hydrogen is a particularly useful energy carrier for transportation. However, there are
no sources of molecular hydrogen on the planet. It therefore remains a difficult challenge to
find an efficient and environmentally sustainable way of producing, capturing, and storing
this highly attractive yet dilute energy source.
The natural photosynthetic process is neither efficient nor quantitative.
Research Background
Artificial Photosynthetic Process
Research centers at UTK recently demonstrated that the natural process of
photosynthesis can be redirected to produce molecular hydrogen. They have
characterized and partially optimized protein-metal hybrid complexes that, when
exposed to light, generate hydrogen at a high rate and are temporally and thermally
stable. Specifically, they are using mutagenesis to increase the affinity between cyt
c6 and PSI from the thermophilic cyanobacterium Thermosynechococcus elongatus.
Artificially redirect/engineer proteins that can donate
and accept a large number of electrons through protein
interaction, to produce a large quantity of energy.
Problem and Challenge
• The artificial process requires remodeling the protein-protein interface to include
new residues, introduced into the native complexes to create binding
sites similar to those found in green algae and higher plants.
• Future improvement involves further kinetic optimization of electron transfer
within photosystem I.
• The lack of a crystal structure for the bound binary complex has made traditional
structural biology tools unavailable to date. Some low-resolution
structural approaches, such as chemical cross-linking, have been used to
investigate this interaction.
Goal of this research
Computationally predict the interaction sites of protein pairs
(donors and acceptors) that tap into photosynthetic processes to
produce efficient and inexpensive energy.
Interaction Relation between
Cytochrome c6 and Photosystem I PsaF
Three types of amino acid bonding
1. Electrostatic bonding
2. Hydrogen bonding
3. Hydrophobic bonding
Interaction Relation between
Cytochrome c6 and Photosystem I PsaF
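Electrostatic Bond
$N = \{E, D\}$ and $P = \{R, H, K\}$
$R_e = \{(x, y) \mid x \in N \text{ and } y \in P\}$
$$W_e(x, y) = \begin{cases} \alpha & \text{if } x \in \{E, D\} \text{ and } y \in \{R, K\} \\ 0.1\alpha & \text{if } x \in \{E, D\} \text{ and } y = H \\ -\beta & \text{otherwise, i.e. } x \notin \{E, D\} \text{ or } y \notin \{R, K, H\} \end{cases}$$
[Figure: Lewis-structure sketch of an electrostatic bond between side chains. Labels: "Lose an electron", "Get an electron", α, 0.1α.]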
Interaction Relation between
Cytochrome c6 and Photosystem I PsaF
Hydrogen Bond
$R_h = \{(x, y) \mid x \in \{R, H, K, S, T, N, Q, W, Y\}\}$
$$W_h(x, y) = \begin{cases} \gamma & \text{if } (x, y) \in R_h \\ 0 & \text{if } (x, y) \notin R_h \end{cases}$$
Approach:
Prediction Algorithms
1. Calculate the interaction score for each pair of residue subsequences
from the PsaF and c6 proteins by dynamic programming.
2. Trace back to get the k interaction sites with the top k scores.
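Algorithm 1: Calculate the score using a window
$$W_{ew}(x, y) = \begin{cases} \alpha & \text{if } x \in \{E, D\} \text{ and } y \in \{R, K\} \\ 0.1\alpha & \text{if } x \in \{E, D\} \text{ and } y = H \\ 0.2\alpha & \text{if } x \in \{E, D\} \text{ and } y \notin \{R, K, H\} \text{ and } y' \in \{R, K\}, \\ & \text{or if } x \notin \{E, D\} \text{ and } x' \in \{E, D\} \text{ and } y \in \{R, K\} \\ 0.02\alpha & \text{if } x \in \{E, D\} \text{ and } y \notin \{R, K, H\} \text{ and } y' = H, \\ & \text{or if } x \notin \{E, D\} \text{ and } x' \in \{E, D\} \text{ and } y = H \\ -\beta & \text{otherwise} \end{cases}$$
Here $x'$ and $y'$ denote a residue neighboring $x$ and $y$, respectively, within the window.
Examples (c6 on the first line, PsaF on the second):
c6:   **********D*****
PsaF: *******R**********     $W_{ew}(D, R) = \alpha$
c6:   **********D*****
PsaF: *******H**********     $W_{ew}(D, H) = 0.1\alpha$
c6:   **********D*****
PsaF: *******Y*R********     $W_{ew}(D, Y) = 0.2\alpha$
c6:   **********D*****
PsaF: *******Y*H********     $W_{ew}(D, Y) = 0.02\alpha$
The score at $(x_i, y_j)$ is decided by the score at $(x_{i-1}, y_{j-1})$:
$$S[i, j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0 \\ \max\{S[i-1, j-1] + W(x_i, y_j),\ 0\} & \text{otherwise} \end{cases}$$
where $W(x, y) = W_{ew}(x, y)$ or $W_{ew}(x, y) + W_h(x, y)$.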
Prediction Algorithms
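Algorithm 1: Calculate the score using a window (of length 7)
$W_e$: $\alpha = 1$, $\beta = 0.22$
Score matrix S (columns: c6 residues, positions 1-10; rows: PsaF residues, positions 1-8; row 0 is the boundary):
| Row | PsaF | S (1) | A (2) | E (3) | L (4) | M (5) | D (6) | S (7) | E (8) | A (9) | E (10) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |   | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | G | 0 | 0 | 0.2 | 0 | 0 | 0.2 | 0 | 0.2 | 0 | 0.2 |
| 2 | P | 0 | 0 | 0.2 | 0 | 0 | 0.2 | 0 | 0.2 | 0 | 0.2 |
| 3 | R | 0 | 0.2 | 1 | 0.4 | 0.2 | 1 | 0.4 | 1 | 0.4 | 1 |
| 4 | F | 0 | 0 | 0.4 | 0.78 | 0.18 | 0.4 | 0.78 | 0.6 | 0.78 | 0.6 |
| 5 | K | 0 | 0.2 | 1 | 0.6 | 0.98 | 1.18 | 0.6 | 1.78 | 0.8 | 1.78 |
| 6 | Y | 0 | 0 | 0.4 | 0.78 | 0.38 | 1.18 | 0.96 | 0.8 | 1.56 | 1 |
| 7 | K | 0 | 0.2 | 1 | 0.6 | 0.98 | 1.38 | 1.38 | 1.96 | 1 | 2.56 |
| 8 | H | 0 | 0.02 | 0.3 | 1.02 | 0.62 | 1.08 | 1.4 | 1.48 | 1.98 | 1.1 |
Interaction site/sequence with the score 2.56:
c6:   DSEAE
PsaF: RFKYK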
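The window-based recurrence and the traceback can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, only the electrostatic window weight is used (no hydrogen-bond term), and the neighborhood of a residue is assumed to be the two directly adjacent positions, which the slides do not specify.

```python
ACIDIC = {"E", "D"}  # negatively charged residues (set N in the model)
BASIC = {"R", "K"}   # positively charged residues scored with full weight


def neighbors(seq, i):
    """Residues adjacent to position i (assumed neighborhood of +/-1)."""
    return {seq[k] for k in (i - 1, i + 1) if 0 <= k < len(seq)}


def w_ew(c6, j, psaf, i, alpha=1.0, beta=0.22):
    """Window weight for c6 residue x = c6[j] and PsaF residue y = psaf[i]."""
    x, y = c6[j], psaf[i]
    if x in ACIDIC:
        if y in BASIC:
            return alpha
        if y == "H":
            return 0.1 * alpha
        if neighbors(psaf, i) & BASIC:   # y' in {R, K}
            return 0.2 * alpha
        if "H" in neighbors(psaf, i):    # y' = H
            return 0.02 * alpha
    elif neighbors(c6, j) & ACIDIC:      # x' in {E, D}
        if y in BASIC:
            return 0.2 * alpha
        if y == "H":
            return 0.02 * alpha
    return -beta


def score_matrix(c6, psaf):
    """S[i][j] = max(S[i-1][j-1] + W, 0), with a zero boundary row and column."""
    S = [[0.0] * (len(c6) + 1) for _ in range(len(psaf) + 1)]
    for i in range(1, len(psaf) + 1):
        for j in range(1, len(c6) + 1):
            S[i][j] = max(S[i - 1][j - 1] + w_ew(c6, j - 1, psaf, i - 1), 0.0)
    return S


def best_site(c6, psaf):
    """Return (score, c6 subsequence, PsaF subsequence) of the top-scoring site."""
    S = score_matrix(c6, psaf)
    cells = ((i, j) for i in range(1, len(psaf) + 1) for j in range(1, len(c6) + 1))
    end_i, end_j = max(cells, key=lambda p: S[p[0]][p[1]])
    i, j = end_i, end_j
    while i > 0 and j > 0 and S[i][j] > 0:  # walk back along the diagonal
        i, j = i - 1, j - 1
    return S[end_i][end_j], c6[j:end_j], psaf[i:end_i]


print(best_site("SAELMDSEAE", "GPRFKYKH"))
```

On the two fragments from the worked example, the sketch returns the c6 subsequence DSEAE paired with the PsaF subsequence RFKYK, with a score of about 2.56. Extending it to the k best sites would mean repeating the traceback from the next-highest cells.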
Prediction Algorithms
Algorithm 2: Calculate the score allowing gaps (insertions/deletions)
The score at $(x_i, y_j)$ is decided by the scores at $(x_{i-1}, y_{j-1})$, $(x_{i-1}, y_j)$, $(x_i, y_{j-1})$ and the weight $W(x_i, y_j)$:
$$S[i, j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0 \\ \max\{S[i-1, j-1] + W(x_i, y_j),\ S[i-1, j] + g,\ S[i, j-1] + g,\ 0\} & \text{otherwise} \end{cases}$$
where $W(x, y) = W_e(x, y)$ or $W_e(x, y) + W_h(x, y)$, and $g$ is the gap penalty.
Prediction Algorithms
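Algorithm 2: Calculate the score allowing gaps (insertions/deletions)
$W_e$: $\alpha = 1$, $\beta = 0.22$
Score matrix S (columns: c6 residues, positions 1-10; rows: PsaF residues, positions 1-8; row 0 is the boundary):
| Row | PsaF | S (1) | A (2) | E (3) | L (4) | M (5) | D (6) | S (7) | E (8) | A (9) | E (10) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |   | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | G | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | P | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | R | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 4 | F | 0 | 0 | 0.8 | 0.78 | 0.58 | 0.8 | 0.78 | 0.8 | 0.78 | 0.8 |
| 5 | K | 0 | 0 | 1 | 0.80 | 0.60 | 1.58 | 1.38 | 1.78 | 1.58 | 1.78 |
| 6 | Y | 0 | 0 | 0.8 | 0.78 | 0.58 | 1.38 | 1.36 | 1.58 | 1.56 | 1.58 |
| 7 | K | 0 | 0 | 1 | 0.78 | 0.58 | 1.58 | 1.38 | 2.36 | 2.16 | 2.56 |
| 8 | H | 0 | 0 | 0.8 | 0.78 | 0.58 | 1.38 | 1.36 | 2.16 | 2.14 | 2.36 |
Interaction site/sequence with the score 2.36:
c6:   ELMDSE
PsaF: R–FKYK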
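The gapped recurrence can be sketched in the same way. This is an illustrative sketch under stated assumptions: the function names are hypothetical, only the plain electrostatic weight $W_e$ is used, and the gap value of -0.2 is not given on the slides but inferred from the example matrix (for instance, a score of 1 dropping to 0.8 across a gap).

```python
ACIDIC = {"E", "D"}
BASIC = {"R", "K"}


def w_e(x, y, alpha=1.0, beta=0.22):
    """Electrostatic weight for c6 residue x and PsaF residue y."""
    if x in ACIDIC and y in BASIC:
        return alpha
    if x in ACIDIC and y == "H":
        return 0.1 * alpha
    return -beta


def gapped_scores(c6, psaf, gap=-0.2):
    """Local-alignment style matrix: diagonal match, or a gap in either sequence.

    gap = -0.2 is an assumed value inferred from the example, not a stated parameter.
    """
    S = [[0.0] * (len(c6) + 1) for _ in range(len(psaf) + 1)]
    for i in range(1, len(psaf) + 1):
        for j in range(1, len(c6) + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + w_e(c6[j - 1], psaf[i - 1]),  # pair x_j with y_i
                S[i - 1][j] + gap,                              # gap in c6
                S[i][j - 1] + gap,                              # gap in PsaF
                0.0,                                            # start a new site
            )
    return S


S = gapped_scores("SAELMDSEAE", "GPRFKYKH")
print(round(S[7][8], 2), round(S[7][10], 2))
```

With these assumptions, the cell for PsaF row 7 (K) and c6 column 8 (E) evaluates to about 2.36, the score of the gapped site ELMDSE / R–FKYK shown above, and the overall maximum is about 2.56.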
Prediction Algorithms
Speeding up the prediction by parallelization
Theoretically, the algorithms can be executed
• in O(log m log n) time using O(mn / log m) processors in the CREW PRAM model by A. Apostolico et al.'s approach [11], where m = min{|X|, |Y|}, n = max{|X|, |Y|}, and X and Y are the pair of protein sequences;
• in O(1) time using m + n processors in the BSR model [12].
Practically, we can use a computer with multiple cores:
Step 1: Divide the |X| × |Y| matrix S into k × k blocks such that each block (|X|/k × |Y|/k elements) can be calculated in O(|X|/k × |Y|/k) time by one processor.
Step 2: Calculate the blocks in the first diagonal, then the ones in the second diagonal, and so on up to the (2k–1)th diagonal.
Time complexity: the ith diagonal depends only on the values in the (i–1)th diagonal, so the blocks on the same diagonal can be calculated in parallel. Therefore, the problem can be solved in $O((2k - 1)(mn / k^2)) = O(mn / k)$ time with k processors, where $1 \le k \le m$.
[Figure: matrix S divided into k × k blocks, each block labeled with the index of its diagonal: 1, 2, 3, …, k, k+1, …, 2k–2, 2k–1.]
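The block schedule in Steps 1 and 2 can be illustrated with a short Python sketch. The names are hypothetical and the waves are executed one block at a time here; the point is only to show that filling blocks diagonal by diagonal respects the dependencies of the gapped recurrence. In a real multi-core version, the blocks within one wave would be handed to separate workers. The gap value of -0.2 is again an assumption.

```python
def block_wavefronts(k):
    """Yield the 2k-1 block diagonals; each is a list of (block_row, block_col)."""
    for d in range(2 * k - 1):
        yield [(r, d - r) for r in range(max(0, d - k + 1), min(d, k - 1) + 1)]


def w_e(x, y, alpha=1.0, beta=0.22):
    """Electrostatic weight for c6 residue x and PsaF residue y."""
    if x in {"E", "D"} and y in {"R", "K"}:
        return alpha
    if x in {"E", "D"} and y == "H":
        return 0.1 * alpha
    return -beta


def fill_by_wavefront(c6, psaf, k, gap=-0.2):
    """Fill the gapped score matrix block by block, one diagonal of blocks at a time."""
    m, n = len(psaf), len(c6)
    S = [[0.0] * (n + 1) for _ in range(m + 1)]
    # Block boundaries: block b covers rows row_edges[b] .. row_edges[b + 1] - 1.
    row_edges = [1 + (m * b) // k for b in range(k + 1)]
    col_edges = [1 + (n * b) // k for b in range(k + 1)]
    for wave in block_wavefronts(k):
        # Blocks in the same wave do not depend on each other,
        # so this inner loop is the part that could run in parallel.
        for br, bc in wave:
            for i in range(row_edges[br], row_edges[br + 1]):
                for j in range(col_edges[bc], col_edges[bc + 1]):
                    S[i][j] = max(
                        S[i - 1][j - 1] + w_e(c6[j - 1], psaf[i - 1]),
                        S[i - 1][j] + gap,
                        S[i][j - 1] + gap,
                        0.0,
                    )
    return S


waves = list(block_wavefronts(3))
print(len(waves), waves[2])
```

For k = 3 the schedule has 2k - 1 = 5 waves, and the middle wave holds the three blocks (0, 2), (1, 1), and (2, 0). Filling with k = 3 gives the same matrix as the ordinary row-by-row fill (k = 1).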
Results and Analysis
Dataset
• In total, 86 pairs of protein sequences from cyt c6 and PsaF are used for the
test. The datasets were provided by Dr. Bruce's lab at UTK; each pair
belongs to the same organism, and the two proteins in a pair are able to attract
each other electrostatically.
A pair of PsaF and c6:
PsaF:MRRLFALILAIGLWFNFAPQAQALGANLVPCKDSPAFQALAEN
ARNTTADPESGKKRFDRYSQALCGPEGYPHLIVDGRLDRAGDFLIPSI
LFLYIAGWIGWVGRAYLQAIKKESDTEQKEIQIDLGLALPIISTGFAW
PAAAIKELLSGELTAKDSEIPISPR
c6:MENVGCEENLLRLILVNLLLVIALLCNLTIIYPALAAETSNGSKIFN
ANCAACHIGGANILVEHKTLQKSGLSKYLENYEIEPIQAIINQIQNGK
SAMPAFKNKLSEQEILEVTAYIFQKAETGW
• For each pair of sequences, the three interaction sites with the top three scores
and the corresponding pairs of interaction subsequences are predicted.
Parameters in Weight Schemes
$W_e$: $\alpha = 1$, $\beta = 0.22$
$W_h$: $\gamma = 0.1$
Results and Analysis
Result
For each pair of protein sequences, the original sequences, three interaction sites
with the scores, corresponding interaction subsequences, and net charge of each
subsequence are output as follows:
Psaf:MRRLFALILAIGLWFNFAPQAQALGANLVPCKDSPAFQALAENARNTTADPES
GKKRFDRYSQALCGPEGYPHLIVDGRLDRAGDFLIPSILFLYIAGWIGWVGRAYLQ
AIKKESDTEQKEIQIDLGLALPIISTGFAWPAAAIKELLSGELTAKDSEIPISPR
c6:MENVGCEENLLRLILVNLLLVIALLCNLTIIYPALAAETSNGSKIFNANCAACHIGG
ANILVEHKTLQKSGLSKYLENYEIEPIQAIINQIQNGKSAMPAFKNKLSEQEILEVTAYI
FQKAETGW
1st interaction site information:
Interaction score: 2.76
Interaction site location and subsequence in Psaf: 54-59, u = KKRFDR
Interaction site and subsequence in c6: 106-111, v = EQEILE
Net charge:
when ph = 6.25 net charge for u = 3.00395114057246,
net charge for v = -2.98030929177886
when ph = 6.5 net charge for u = 3.00209690387591
net charge for v = -2.98889520136613
……..
Datasets and output: www.tnstate.edu/faculty/wchen/research.aspx
Results and Analysis
Comparison of the algorithm using a window and the algorithm using gaps
For simplicity, we consider only the electrostatic bond in the weight schemes.
The results show that the algorithm using gaps tends to give interaction
sites that have the same number of positively charged and negatively charged
residues. For example, for the pair of protein sequences on the previous slide, the first
interaction site and the corresponding interaction residue subsequences predicted
from Algorithm 1 are
PsaF: 54-59, u = KKRFDR
cyt c6: 106-111, v = EQEILE
from Algorithm 2 are
PsaF: 55-59, K_RFDR
cyt c6: 106-111, EQEILE.
In the first pair of subsequences, there are four positively charged residues (KKRR) and
three negatively charged residues (EEE); in the second pair of subsequences, there
are three positively charged (KRR) and three negatively charged (EEE) residues. Therefore, the
algorithm should be selected based on the property to be investigated.
Results and Analysis
Comparison of Laboratory and Computational Approaches
Lab approach (Mass Spectrometry)
Model of the electron donor docking sites. Shown are the sites
of the complex between PSI (green ribbons) and cyt c6 (white). In yellow are
the heme group of cyt c6, the Trp pair B627/A651, and the special chlorophyll
pair P700. The distance between the redox cofactors is 14 Å. The
Glu69 and Glu70 of cyt c6 are able to form a strong salt bridge with Lys27 and
Lys23 of PsaF, respectively. Lys20 and Lys16 of PsaF form weaker salt bridges
with Glu71 of cyt c6 and Glu613 of PsaB, respectively. Interestingly, in this
model the conserved positive charge on the northern face of cyt c6 (Arg66)
and the adjacent Asp65 can form a strong salt bridge with the pair Arg623/
Asp624 of PsaB.
Results and Analysis
Comparison of Laboratory and Computational Approaches
Lab approach (Mass Spectrometry)
Psaf:DIAGLTPCSESKAYAKLEKKELKTLEKRLKQYEADSAPAVALKATMERTKARFA
NYAKAGLLCGNDGLPHLIADPGLALKYGHAGEVFIPTFGFLYVAGYIGYVGRQYLIA
VKGEAKPTDKEIIIDVPLATKLAWQGAGWPLAAVQELQRGTLLEKEENITVSPR
c6:ADLALGAQVFNGNCAACHMGGRNSVMPEKTLDKAALEQYLDGGFKVESIIYQV
ENGKGAMPAWADRLSEEEIQAVAEYVFKQATDAAWKY
The laboratory approach shows that the cross-linked interaction occurs in the following
interaction subsequences:
PsaF: 21-28, ELKTLEKR
cyt c6: 67-81, LSEEEIQAVAEYVFK
Computational Approach
1. Algorithm using a window
PsaF: 22-29, KTLEKRLK
cyt c6: 64-70, DRLSEE_E
2. Algorithm using gaps
PsaF: 15-27, KLEKKELKTLEKR
cyt c6: 64-76, DRLSEEEIQAVAE.
Both algorithms predict sites that overlap the cross-linked subsequences found in the laboratory (PsaF 21-28 and cyt c6 67-81).
Results and Analysis
Distribution of interaction Sites in PsaF
Number of interactions at location i = |S(i)|
where S(i) = {s: s is the predicted interaction site which contains location i }
Results and Analysis
Distribution of interaction score in PsaF
$\text{Interaction score at location } i = \sum_{s \in S(i)} \text{score of } s$
where $S(i) = \{s : s \text{ is a predicted interaction site that contains location } i\}$
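Both per-location distributions (the number of sites and the summed score) can be accumulated in one pass over the predicted sites. The sketch below is an illustration only: the function name and the `(start, end, score)` tuple layout are hypothetical, positions are assumed to be 0-based and inclusive, and the second site in the usage example is made up to show overlapping sites.

```python
def site_distributions(sites, length):
    """Return (count, score) lists indexed by sequence location.

    sites  : iterable of (start, end, score), with inclusive 0-based positions
    length : length of the protein sequence
    count[i] is |S(i)| and score[i] is the sum of the scores of sites in S(i).
    """
    count = [0] * length
    score = [0.0] * length
    for start, end, site_score in sites:
        for i in range(start, end + 1):
            count[i] += 1
            score[i] += site_score
    return count, score


# First site taken from the sample output (PsaF 54-59, score 2.76);
# the second site is invented for illustration.
sites = [(54, 59, 2.76), (57, 60, 1.5)]
count, score = site_distributions(sites, 160)
print(count[54:61])
```

Locations 57 to 59 are covered by both sites, so their count is 2 and their score is the sum of the two site scores.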
Results and Analysis
Distribution of interaction number and score in cyt c6
Results and Analysis
Net charge of PsaF and c6
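[Figure: example peptide Cys-Phe-Ile-Glu-Asn-Cys-Pro-His-His-Gly with its N-terminal, C-terminal, and side chains labeled.]
Amino Acid pKa Values
| Amino acid | Code | α-carboxylic acid pKa (C-terminal) | α-amino pKa (N-terminal) | Side chain pKa | Side chain charge |
|---|---|---|---|---|---|
| Alanine | A | 2.35 | 9.87 | | |
| Arginine | R | 2.01 | 9.04 | 12.48 | Q+ |
| Asparagine | N | 2.02 | 8.80 | | |
| Aspartic Acid | D | 2.10 | 9.82 | 3.86 | Q- |
| Cysteine | C | 2.05 | 10.25 | 8.00 | Q- |
| Glutamic Acid | E | 2.10 | 9.47 | 4.07 | Q- |
| Glutamine | Q | 2.17 | 9.13 | | |
| Glycine | G | 2.35 | 9.78 | | |
| Histidine | H | 1.77 | 9.18 | 6.10 | Q+ |
| Isoleucine | I | 2.32 | 9.76 | | |
| Leucine | L | 2.33 | 9.74 | | |
| Lysine | K | 2.18 | 8.95 | 10.53 | Q+ |
| Methionine | M | 2.28 | 9.21 | | |
| Phenylalanine | F | 2.58 | 9.24 | | |
| Proline | P | 2.00 | 10.60 | | |
| Serine | S | 2.21 | 9.15 | | |
| Threonine | T | 2.09 | 9.10 | | |
| Tryptophan | W | 2.38 | 9.39 | | |
| Tyrosine | Y | 2.20 | 9.11 | 10.07 | Q- |
| Valine | V | 2.29 | 9.72 | | |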
Results and Analysis
Net charge of PsaF and cyt c6
The net charge $s(i)$ of sequence $s$ at location $i$ is calculated from pH = 6.25 to pH = 8 at
intervals of 0.25, using a window of length 7, as follows:
$s(i)$ = the net charge of the subsequence of $s$ from position $i - 3$ to $i + 3$
The net charge $\mathrm{NetCh}(i)$ at position $i$ over all 86 PsaF/c6 proteins is defined as
$\mathrm{NetCh}(i) = \sum_{s \in S} s(i)$, where $S$ is the set of 86 PsaF/c6 sequences.
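A net-charge calculation along these lines can be sketched with the Henderson-Hasselbalch relation and the side-chain pKa values from the table. The sketch assumes that only ionizable side chains contribute (the terminal groups are ignored); that assumption is not stated on the slides, but it reproduces the sample output values for KKRFDR and EQEILE at pH 6.25. The function names and the truncation of windows at the sequence ends are likewise assumptions.

```python
# Side-chain pKa and sign of the charged form, taken from the pKa table.
SIDE_CHAIN_PKA = {
    "D": (3.86, -1), "E": (4.07, -1), "C": (8.00, -1), "Y": (10.07, -1),
    "H": (6.10, +1), "K": (10.53, +1), "R": (12.48, +1),
}


def net_charge(seq, ph):
    """Net side-chain charge of a peptide at the given pH (termini ignored)."""
    total = 0.0
    for residue in seq:
        if residue not in SIDE_CHAIN_PKA:
            continue
        pka, sign = SIDE_CHAIN_PKA[residue]
        if sign > 0:
            total += 1.0 / (1.0 + 10 ** (ph - pka))  # fraction protonated (+1)
        else:
            total -= 1.0 / (1.0 + 10 ** (pka - ph))  # fraction deprotonated (-1)
    return total


def windowed_net_charge(seq, ph, half_width=3):
    """s(i): net charge of the window i-3 .. i+3, truncated at the sequence ends."""
    return [
        net_charge(seq[max(0, i - half_width): i + half_width + 1], ph)
        for i in range(len(seq))
    ]


def summed_net_charge(sequences, ph, position):
    """NetCh(i): sum of s(i) over all sequences long enough to contain position i."""
    return sum(
        windowed_net_charge(s, ph)[position] for s in sequences if position < len(s)
    )


print(net_charge("KKRFDR", 6.25), net_charge("EQEILE", 6.25))
```

The two printed values come out close to 3.00395 and -2.98031, in line with the net charges reported for u and v in the sample output.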
Summary and Future Work
• We proposed a mathematical model and
computational approaches for predicting the interaction
sites of Cytochrome and Photosystem I. The results
show that the approaches are effective and efficient.
• In the future, we will add more interaction criteria to
the model and algorithms. We will also look for more
laboratory results to compare with the results from the
computational approaches.