A mathematical model for estimating
anaerobic synthesis rate constants for
Escherichia coli proteins.
During this bachelor project a mathematical model was developed to estimate the synthesis
rate constants of proteins in the bacterium Escherichia coli under anaerobic conditions.
The model uses experimentally obtained values as the starting point of the calculation.
These starting values were obtained in a pulse-chase experiment in which proteins were
labelled. Using modern analytical techniques these proteins were identified and two ratios
were determined: Rnew, which reflects the number of newly synthesized proteins, and Rtotal,
which reflects the total number of proteins. These two ratios are used by the model as its
starting point.
The model was developed in the computer program MATLAB and uses various functions and
algorithms to solve the differential equations that define Rnew and Rtotal. Central to the
model is the ∆R formula, which describes the relation between the measured ratios Rnew and
Rtotal and the predicted values determined by the model. The ratios are built from three
parameters: kdeg (the degradation constant), ksyn,aerobic (the aerobic synthesis constant)
and ksyn,anaerobic (the anaerobic synthesis constant). The first two parameters are defined
in advance, so that only ksyn,anaerobic is a variable.
With the help of the model this last variable can be estimated.
Wouter W. Woud (5974003)
Bio-Exact
4-07-2012
Supervisor: Prof. Dr. C. G. de Koster
Swammerdam Institute for Life Sciences
Index
Introduction
Experimental and Theoretical background
The Model
Results
Conclusion
APPENDIX
- The Source Code
- Proteomic Data Set
References
Introduction
Many structural and functional properties of all living cells are defined by proteins. They are
responsible for executing many different tasks within the cell, as specified in the cell’s
DNA [1]. Because of this key functionality, proteins make up a large part of a cell’s
constituents. For example, half of the dry mass of an Escherichia coli cell consists of
proteins. The other half consists mostly of DNA and RNA [2].
Important insights about cellular functionality can be gained by studying protein activity and
interactions within the cell. As such, protein research is an active research field. Classic
methods of study mostly consist of protein isolation and/or labelling. A number of
techniques have been developed to isolate proteins from cells. Most notably, various types
of chromatography can be used to isolate proteins based on different properties such as
molecular weight, charge or binding affinity [3]. Labelling techniques such as radioactive
isotope pulse-chase labelling with 35S-methionine or the use of Fluorescent Proteins (FP) can
be used to make proteins inside the cell visible under a microscope [4,5]. These techniques are
commonly used by scientists for various research purposes.
Protein isolation and labelling have helped to gain useful information about a wide range of
proteins. The information gained using these techniques mostly concerns protein structure or
functionality. Another part of the research field is more interested in the total ensemble of
proteins present in a cell at certain points in time. These studies generate huge data sets, and
form the field of proteomics, named after the related field of genomics, which handles large
data sets of genetic information. Important techniques in proteomics that are able to deal with
large numbers of proteins are 2D electrophoresis [6] and mass spectrometry [7].
Using 2D electrophoresis thousands of proteins can be separated simultaneously in one
separation procedure, whereas mass spectrometry allows proteome-wide high-throughput
sequencing of peptides and identification of proteins. 2D electrophoresis enables the
separation of complex mixtures of proteins according to isoelectric point (pI), molecular mass
(Mr), solubility and relative abundance. Furthermore, it delivers a map of intact proteins
which reflects changes in protein expression level, isoforms or posttranslational
modifications.
This is in contrast to mass spectrometry based methods, which perform analysis on peptides,
where Mr and pI information is lost and where stable isotope labelling is required for
quantitative analysis.
The huge amounts of data generated using these and other techniques find their application in
computational systems biology. This relatively new field aims to develop and use
efficient algorithms, data structures and visualization and communication tools with the goal
of computational modeling of biological systems [8]. More specifically, protein data acquired
through, for example, mass spectrometry of whole cells can be used to create computer
models that can predict protein characteristics in these cells. Characteristics of interest may
include rates of protein synthesis and degradation and, related to the latter, the protein
half-life, which is the time it takes for half of a given protein population to degrade. When
compared to experimental data, an attempt can be made to link these predicted characteristics
to protein functionality in the living cell.
The basis of such a predictive computer model is formed by a set of mathematical equations
that describe the behavior of proteins in the organism of interest. To simplify these equations
certain assumptions concerning this behavior can, and sometimes must, be made, as is also
shown later in this paper. Arguments for such assumptions must be obtained from
experimental research. By optimizing model parameters such as reaction rate constants to
match experimental results, a valid descriptive and predictive model can be created for the
biological system of interest.
In the case of whole cell proteomics an ideal model would for example be able to estimate
both synthesis and degradation constants as well as half-lives for a variety of proteins. If
experimental research provides errors for measurements, e.g. standard deviations, the model
should be able to process these as well when estimating synthesis and degradation constants.
Ultimately, this model should provide useful insight into the proteome dynamics of
the organism of interest.
Experimental and theoretical background
The aim of our research is to build a mathematical model which can be used to estimate
reaction rate constants of both synthesis and degradation as well as half-lives of given
proteins. Our mathematical model has been created using the computer program MATLAB®
and uses a proteomic data set as input. This proteomic data set was generated during a
pulse-chase labelling experiment using the methionine analogue azidohomoalanine (abbreviated as
‘azhal’) [9]. Whilst our model can be used to estimate both reaction rate constants and
half-lives, this paper will only discuss the part of the model which estimates the anaerobic
reaction rate constants.
The proteomic data set was generated during a pulse-chase labelling experiment. This
experiment was used to identify and quantify several hundred newly synthesized proteins
in Escherichia coli upon pulse labelling with azhal. For the first 30 minutes after
inoculation, a methionine-auxotrophic E. coli strain grows equally well on azhal as on
methionine under aerobic conditions. After a pulse of 10 minutes with azhal, a switch from
aerobic to anaerobic conditions and digestion of total protein, newly synthesized
azhal-labelled peptides are isolated by a retention time shift between two reversed-phase
chromatographic runs. The retention time shift is induced by a reaction selective for the
azido group in labelled peptides using tris(2-carboxyethyl)phosphine.
Selectively modified peptides are identified by reversed-phase liquid chromatography and
online tandem mass spectrometry. As methionine is the more dominant competitor over
azhal, the medium should be purged of methionine before labelling for maximum
incorporation [10]. Thus, by using this pulse-chase experimental setup, newly synthesized
proteins can be identified because of their incorporation of the methionine analogue azhal.
Furthermore, this strategy allows us to identify changes in total protein levels on the same
time scale as new protein synthesis. Using these values for total protein level and newly
synthesized proteins, two different ratios were obtained. These ratios will be explained shortly.
Model
During the exponential growth phase of E. coli cells, we assume that the number of proteins
increases proportionally to cellular mass and that protein synthesis and degradation are
first-order in the number of proteins.
At the beginning of the experiment (t = 0), environmental conditions are altered from aerobic
to anaerobic and only unlabelled proteins are present in E. coli. These unlabelled proteins
are named Pold as they solely exist as pre-existing material, and thus only
methionine-containing peptides are present. After 10 minutes of azhal labelling under anaerobic
conditions the unlabelled proteins have been partly degraded and new proteins which contain
the azhal label have been synthesized. These labelled proteins are named Pnew. The
degradation of Pold is given by the following differential equation:
\[
\frac{dP_{\mathrm{old}}}{dt} = -k_{\mathrm{deg}} \, P_{\mathrm{old}} \tag{1}
\]
We assume that protein synthesis is first-order in the total amount of protein present
in the cell. Therefore, Pnew is formed with a first-order rate constant ksyn and degraded
with kdeg. We assume that kdeg has the same value for both labelled and unlabelled proteins, as
well as for aerobic and anaerobic conditions. Arguments for this assumption can be found in
the supplementary data of the publication ‘Proteome-wide alterations in Escherichia coli
translation rates upon anaerobiosis’, Kramer et al. [9].
As such, the rate of change of azhal-containing protein (Pnew) is defined by the following
differential equation:
\[
\frac{dP_{\mathrm{new}}}{dt} = k_{\mathrm{syn}} \, P_{\mathrm{old}} + (k_{\mathrm{syn}} - k_{\mathrm{deg}}) \, P_{\mathrm{new}} \tag{2}
\]
The term ksyn*Pold may seem surprising at first, but recall that we assume that protein
synthesis is first-order in the total amount of protein present in the cell. Therefore,
the rate at which new protein is formed depends on both the pre-existing amount of
protein (unlabelled, Pold) and the newly synthesized amount of protein (labelled, Pnew).
Integration of both differential equations is straightforward and yields the time-dependent
functions Pold(t) and Pnew(t):
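\[
P_{\mathrm{old}}(t) = P_{\mathrm{old}}(0) \, e^{-k_{\mathrm{deg}} t} \tag{3}
\]
\[
P_{\mathrm{new}}(t) = P_{\mathrm{old}}(0) \left[ e^{(k_{\mathrm{syn}} - k_{\mathrm{deg}}) t} - e^{-k_{\mathrm{deg}} t} \right] \tag{4}
\]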
After summation of Pold(t) and Pnew(t) the term Ptotal(t) is acquired. This term reflects the total
amount of protein present at any given time and is defined by the following equation:
\[
P_{\mathrm{total}}(t) = P_{\mathrm{old}}(t) + P_{\mathrm{new}}(t) = P_{\mathrm{old}}(0) \, e^{(k_{\mathrm{syn}} - k_{\mathrm{deg}}) t} \tag{5}
\]
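As a quick numerical check of equations 3 to 5, the closed-form expressions can be evaluated for one time point. This is a minimal sketch; the rate constants and the start amount are hypothetical example values, not values from the data set.

```matlab
% Hypothetical example values (not taken from the proteomic data set)
kdeg  = 0.0012;   % degradation constant, per minute
ksyn  = 0.0038;   % synthesis constant, per minute
Pold0 = 100;      % unlabelled protein at t = 0, arbitrary units
t     = 10;       % labelling time, minutes

Pold   = Pold0 * exp(-kdeg * t);                              % equation 3
Pnew   = Pold0 * (exp((ksyn - kdeg) * t) - exp(-kdeg * t));   % equation 4
Ptotal = Pold0 * exp((ksyn - kdeg) * t);                      % equation 5

% The sum of old and new protein should reproduce equation 5
fprintf('Pold + Pnew = %.4f, Ptotal = %.4f\n', Pold + Pnew, Ptotal);
```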
As described earlier, our model uses data derived from a proteomic data set as input. This
experimentally acquired data is used to create two different ratios named Rnew,measured
and Rtotal,measured, which are subsequently used as input for our model.
How exactly are the ratios Rnew and Rtotal defined? The first ratio concerns the number of
newly synthesized proteins during the pulse labelling time interval upon transition from
aerobic to anaerobic conditions. The ratio of newly synthesized proteins (Rnew) under
anaerobiosis (Pnew,anaerobic) and aerobiosis (Pnew,aerobic) is defined as:
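\[
R_{\mathrm{new}} = \frac{P_{\mathrm{new,anaerobic}}(t)}{P_{\mathrm{new,aerobic}}(t)}
= \frac{P_{\mathrm{old,anaerobic}}(0) \left[ e^{(k_{\mathrm{syn,anaerobic}} - k_{\mathrm{deg}}) t} - e^{-k_{\mathrm{deg}} t} \right]}
       {P_{\mathrm{old,aerobic}}(0) \left[ e^{(k_{\mathrm{syn,aerobic}} - k_{\mathrm{deg}}) t} - e^{-k_{\mathrm{deg}} t} \right]} \tag{6}
\]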
Note that both the numerator and the denominator have the form of equation 4. Equation 6 can
be reduced to a simpler form by dividing the numerator and the denominator by the common
factor exp(-kdeg*t), which turns each subtracted term into 1 and yields equation 7. This is
allowed because kdeg is assumed to be the same under both conditions.
\[
R_{\mathrm{new}} = \frac{P_{\mathrm{old,anaerobic}}(0)}{P_{\mathrm{old,aerobic}}(0)} \cdot
\frac{e^{k_{\mathrm{syn,anaerobic}} t} - 1}{e^{k_{\mathrm{syn,aerobic}} t} - 1} \tag{7}
\]
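The step from equation 6 to equation 7 can be written out for a single bracket. The identity below holds for either synthesis constant (written here as a generic k_syn):

```latex
\[
e^{(k_{\mathrm{syn}} - k_{\mathrm{deg}}) t} - e^{-k_{\mathrm{deg}} t}
= e^{-k_{\mathrm{deg}} t} \left( e^{k_{\mathrm{syn}} t} - 1 \right)
\]
```

Applying it to the numerator and the denominator of equation 6 leaves the same prefactor in both, so it cancels and kdeg no longer appears explicitly in Rnew.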
The second ratio is the copy number of total protein level at the end of the labelling time
between the two environmental conditions. This ratio is derived from the ratios of peptides
that do not contain methionine or azhal. Azhal peptides are excluded since these represent
exclusively newly synthesized material, while methionine peptides are excluded because they
represent only pre-existing material. The non azhal/methionine-containing peptide copy
number equals in the kinetic model the summation of Pold(t) and Pnew(t), as described in
equation 5.
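\[
R_{\mathrm{total}} = \frac{\left( P_{\mathrm{old}}(t) + P_{\mathrm{new}}(t) \right)_{\mathrm{anaerobic}}}{\left( P_{\mathrm{old}}(t) + P_{\mathrm{new}}(t) \right)_{\mathrm{aerobic}}}
= \frac{P_{\mathrm{old,anaerobic}}(0)}{P_{\mathrm{old,aerobic}}(0)} \cdot
\frac{e^{(k_{\mathrm{syn,anaerobic}} - k_{\mathrm{deg}}) t}}{e^{(k_{\mathrm{syn,aerobic}} - k_{\mathrm{deg}}) t}} \tag{8}
\]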
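Because kdeg is assumed equal under both conditions, the exponentials in equation 8 can be combined. The worked step below follows directly from equation 8 and introduces no new quantities:

```latex
\[
R_{\mathrm{total}} = \frac{P_{\mathrm{old,anaerobic}}(0)}{P_{\mathrm{old,aerobic}}(0)} \,
e^{(k_{\mathrm{syn,anaerobic}} - k_{\mathrm{syn,aerobic}}) t}
\quad\Longrightarrow\quad
k_{\mathrm{syn,anaerobic}} = k_{\mathrm{syn,aerobic}} + \frac{1}{t}
\ln\!\left( R_{\mathrm{total}} \, \frac{P_{\mathrm{old,aerobic}}(0)}{P_{\mathrm{old,anaerobic}}(0)} \right)
\]
```

In this form Rtotal depends on kdeg only through ksyn,aerobic.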
This total protein ratio reflects the overall protein expression between the two environmental
states. In our model these two experimental ratios (Rnew and Rtotal) are predicted by three
reaction rate constants (ksyn,anaerobic, ksyn,aerobic and kdeg). To compare the predicted
ratios Rnew and Rtotal with the measured ratios (the ultimate goal, as can be seen in
equation 13) several assumptions are made. First of all, as stated before, the aerobic culture
is in the exponential growth phase. Thus, the whole of metabolic, proteomic and genomic
activities within the E. coli cells is set for reproduction. When the environmental conditions
are altered (from aerobic to anaerobic) a growth arrest of approximately 10 minutes occurs
within the E. coli cells. During these 10 minutes, the E. coli cells alter their
transcriptome and proteome in response to the changed environmental conditions.
Secondly, to estimate protein half-lives from the proteomics data set the value of kdeg is
varied. More information on how kdeg is varied can be found in “A
mathematical model for estimating protein half-lifes in Eschericia coli” by Tom Boer.
From literature it is known that there are approximately two protein populations
with regard to protein degradation rates within E. coli: a small, rapidly degrading population
with half-lives shorter than 10 minutes and a more slowly degrading population with half-lives
ranging from several hours up to 23 hours [11,12]. The degradation rate constant kdeg is
inversely related to the protein half-life (t1/2) by the following equation:
\[
k_{\mathrm{deg}} = \frac{\ln 2}{t_{1/2}} \tag{9}
\]
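To give a feeling for the magnitudes involved, equation 9 can be evaluated for the two half-life bounds quoted above. This is a sketch that assumes time is expressed in minutes, as for the labelling time elsewhere in this paper.

```matlab
% Half-lives in minutes: 10 min and 23 h are the bounds quoted in the text
halflife = [10, 23 * 60];

% Equation 9; log is the natural logarithm in MATLAB
kdeg = log(2) ./ halflife;

% fprintf cycles through the matrix column by column (half-life, then kdeg)
fprintf('t1/2 = %6.0f min  ->  kdeg = %.6f per min\n', [halflife; kdeg]);
```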
With kdeg defined, the next step is the estimation of ksyn,aerobiosis. During the ten
minutes of labelling time (tp) the optical density (O.D.600 nm) of the E. coli culture
increases by a factor P(tp) divided by P(t0). This implies that biomass increases by the same
factor and, using the assumption that total protein content scales linearly with biomass, that
protein mass also increases by P(tp) divided by P(t0). Furthermore, the relative protein
composition does not change during steady-state aerobic growth. This means that the increase
in copy number for each protein is also P(tp) divided by P(t0). For each protein,
ksyn,aerobiosis is related to its kdeg during steady-state aerobic conditions via:
\[
k_{\mathrm{syn,aerobiosis}} = \frac{1}{t_p} \ln \frac{P(t_p)}{P(t_0)} + k_{\mathrm{deg}} = r + k_{\mathrm{deg}} \tag{10}
\]
It is related to the protein half-life as:
\[
k_{\mathrm{syn,aerobiosis}} = r + \frac{\ln 2}{t_{1/2}} \tag{11}
\]
The E. coli cells used to generate the data set were growing exponentially and P(tp)/P(t0)
yielded a value of 1.0625. This value is named ‘growth’ in our model. The ‘r’ value in
the equations above was calculated using the pulse time and the growth value, as
described in equation 12. Using this equation, the ‘r’ value is fixed at 0.0026.
Note that the ‘r’ value is the same for all proteins.
\[
r = \frac{1}{\mathrm{pulsetime}} \cdot \log(\mathrm{growth}) \tag{12}
\]
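Equation 12 can be evaluated directly with the values given above. The sketch below computes r with both the natural logarithm (MATLAB's log, the form used in equation 10) and the base-10 logarithm (log10). With growth = 1.0625 and a pulse time of 10 minutes, the quoted value of 0.0026 corresponds to the base-10 form, while the natural logarithm gives roughly 0.0061; which of the two the source code uses cannot be confirmed from this chapter alone.

```matlab
growth    = 1.0625;   % P(tp)/P(t0), from the text
pulsetime = 10;       % labelling time in minutes, from the text

r_ln    = log(growth)   / pulsetime;   % natural logarithm, as in equation 10
r_log10 = log10(growth) / pulsetime;   % base-10 logarithm

fprintf('natural log: r = %.4f per min\n', r_ln);
fprintf('base-10 log: r = %.4f per min\n', r_log10);
```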
Thus, all of the parameters needed to calculate Rnew,calculated and Rtotal,calculated are now
defined. Note the difference between Rnew,measured / Rtotal,measured and Rnew,calculated /
Rtotal,calculated: the former are the ratios our model uses as input from the proteomic data
set, and the latter are the ratios which will be used to estimate the anaerobic synthesis
constant ksyn,anaerobic.
The anaerobic synthesis constant is estimated using the following formula:
\[
\Delta R = \left| R_{\mathrm{new,calculated}} - R_{\mathrm{new,measured}} \right|
         + \left| R_{\mathrm{total,calculated}} - R_{\mathrm{total,measured}} \right| \tag{13}
\]
Equation 13 can be seen as the sum of the absolute values of a variable minus a fixed
value. As both Rnew,measured and Rtotal,measured are fixed values (taken directly from the
proteomic data set) and ∆R is minimized so that our model matches reality as closely as
possible, we are able to find a value of ksyn,anaerobic.
For example, let's set ∆R = 0, and use the Rnew,measured and Rtotal,measured values of
aldehyde-alcohol dehydrogenase (ADHE) from the proteomic data set. When both Rnew,calculated
and Rtotal,calculated are written out as equations 7 and 8, equation 13 looks like this:
\[
0 = \left| \frac{P_{\mathrm{old,anaerobic}}(0)}{P_{\mathrm{old,aerobic}}(0)} \cdot
\frac{e^{k_{\mathrm{syn,anaerobic}} t} - 1}{e^{k_{\mathrm{syn,aerobic}} t} - 1} - 10.26 \right|
+ \left| \frac{P_{\mathrm{old,anaerobic}}(0)}{P_{\mathrm{old,aerobic}}(0)} \cdot
\frac{e^{(k_{\mathrm{syn,anaerobic}} - k_{\mathrm{deg}}) t}}{e^{(k_{\mathrm{syn,aerobic}} - k_{\mathrm{deg}}) t}} - 2.59 \right| \tag{14}
\]
As stated before, Rnew,calculated and Rtotal,calculated consist of 3 different parameters: kdeg, ksyn,aerobic
and ksyn,anaerobic. As both kdeg and ksyn,aerobic have been defined only one parameter is subject to
change: the ksyn,anaerobic.
Thus, the Rnew,calculated and Rtotal,calculated values can be used, in combination with the
minimization of ∆R, to obtain values for ksyn,anaerobic. How the estimation of ksyn,anaerobic occurs
exactly will be explained in great detail in chapter 3, ‘The Model’.
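The idea of minimizing ∆R over the single unknown ksyn,anaerobic can be sketched with the closed-form equations 7 and 8 and a one-dimensional search. This is an illustration only, not the procedure of the model itself: the value of kdeg and the search interval are hypothetical, fminbnd is used here in place of the optimizer described in the next chapter, and the ratio of start amounts is taken from the start values 100 and 106.25 quoted there.

```matlab
% Values taken from the text
t               = 10;            % labelling time, minutes
r               = 0.0026;        % growth term, per minute
Rnew_measured   = 10.26;         % ADHE
Rtotal_measured = 2.59;          % ADHE
P0ratio         = 106.25 / 100;  % Pold,anaerobic(0) / Pold,aerobic(0)

% Hypothetical value, for illustration only: half-life of 10 hours
kdeg     = log(2) / 600;
ksyn_aer = r + kdeg;             % equation 10

% Equations 7 and 8 as functions of x = ksyn,anaerobic
Rnew   = @(x) P0ratio * (exp(x * t) - 1) / (exp(ksyn_aer * t) - 1);
Rtotal = @(x) P0ratio * exp((x - kdeg) * t) / exp((ksyn_aer - kdeg) * t);

% Equation 13
deltaR = @(x) abs(Rnew(x) - Rnew_measured) + abs(Rtotal(x) - Rtotal_measured);

% Bounded one-dimensional search; the interval [0, 1] is an assumption
[x_best, dR_best] = fminbnd(deltaR, 0, 1);
fprintf('ksyn,anaerobic = %.4f per min, deltaR = %.3f\n', x_best, dR_best);
```

With one unknown and two measured ratios, ∆R generally cannot be driven all the way to zero; for these illustrative inputs the minimum remains above zero.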
The Model
Our model runs in various stages, which are presented schematically in figure 1. The first
step is the extraction of data from the proteomic data set. Secondly, normally distributed
random numbers are generated around the extracted data. After those numbers have been
generated, a set of 4 differential equations is solved in combination with a least-squares
function called lsqnonlin. The last steps are the calculation of the anaerobic synthesis
constant ksyn,anaerobic and the 're-writing' of the calculated data into the proteomic data set.
Figure 1: A schematic overview of the model.
The various steps of the model described above will be discussed in detail within this
chapter. We begin this chapter with an introduction to the differential equations.
Our model uses a set of 4 differential equations in order to estimate the anaerobic synthesis
constants ksyn,anaerobic of E. coli proteins. These four differential equations
(equation 15) represent the amount of unlabelled protein present under aerobic conditions
(differential equation 1, Pold,aerobic), the amount of labelled protein present under aerobic
conditions (differential equation 2, Pnew,aerobic), the amount of unlabelled protein present
under anaerobic conditions (differential equation 3, Pold,anaerobic) and the amount of labelled
protein present under anaerobic conditions (differential equation 4, Pnew,anaerobic). These
differential equations show strong similarities with equations 1 and 2 from the previous chapter.
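\[
\begin{aligned}
\frac{dy_1}{dt} &= -k_{\mathrm{deg}} \, y_1 \\
\frac{dy_2}{dt} &= k_{\mathrm{syn,aerobic}} \, y_1 + (k_{\mathrm{syn,aerobic}} - k_{\mathrm{deg}}) \, y_2 \\
\frac{dy_3}{dt} &= -k_{\mathrm{deg}} \, y_3 \\
\frac{dy_4}{dt} &= k_{\mathrm{syn,anaerobic}} \, y_3 + (k_{\mathrm{syn,anaerobic}} - k_{\mathrm{deg}}) \, y_4 \\
P_{\mathrm{aerobic}} &= y_1 + y_2 \\
P_{\mathrm{anaerobic}} &= y_3 + y_4 \\
R_{\mathrm{new,calculated}} &= \frac{y_4}{y_2} \\
R_{\mathrm{total,calculated}} &= \frac{P_{\mathrm{anaerobic}}}{P_{\mathrm{aerobic}}}
\end{aligned}
\tag{15}
\]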
The equations described above (equation 15) are combined in the same way as
described in the previous chapter (equation 5). Adding y1 and y2 gives the total amount of
protein present under aerobic conditions (Paerobic), whose rate of change is the sum of
dy1/dt and dy2/dt. Repeating this for y3 and y4 gives the total amount of protein present
under anaerobic conditions (Panaerobic). Dividing the solution of differential equation 4
(y4) by the solution of differential equation 2 (y2) yields the parameter Rnew,calculated.
Rtotal,calculated is obtained by dividing the total amount of protein present under
anaerobic conditions (Panaerobic) by the total
amount of protein present under aerobic conditions (Paerobic).
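The system of equation 15 can be integrated numerically and compared with the closed-form ratios of equations 7 and 8. The sketch below is self-contained and uses hypothetical rate constants; the start values 100, 0, 106.25 and 0 are those quoted for the model, and the right-hand side is written as an anonymous function rather than the separate kinetics function of the source code.

```matlab
t_end = 10;                      % labelling time in minutes (from the text)
y0    = [100; 0; 106.25; 0];     % start values quoted for the model

% Hypothetical rate constants, per minute, for illustration only
kdeg     = 0.0012;
ksyn_aer = 0.0038;
ksyn_an  = 0.0300;

% Right-hand side of equation 15
kinetics = @(t, y) [ -kdeg * y(1);
                      ksyn_aer * y(1) + (ksyn_aer - kdeg) * y(2);
                     -kdeg * y(3);
                      ksyn_an * y(3) + (ksyn_an - kdeg) * y(4) ];

[t, sol] = ode45(kinetics, [0 t_end], y0);
yend = sol(end, :);              % state at t = t_end

Rnew_calc   = yend(4) / yend(2);
Rtotal_calc = (yend(3) + yend(4)) / (yend(1) + yend(2));

% Closed-form counterparts (equations 7 and 8)
P0ratio      = y0(3) / y0(1);
Rnew_exact   = P0ratio * (exp(ksyn_an * t_end) - 1) / (exp(ksyn_aer * t_end) - 1);
Rtotal_exact = P0ratio * exp((ksyn_an - ksyn_aer) * t_end);

fprintf('Rnew:   ode45 %.4f, closed form %.4f\n', Rnew_calc, Rnew_exact);
fprintf('Rtotal: ode45 %.4f, closed form %.4f\n', Rtotal_calc, Rtotal_exact);
```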
As stated before, the aim of our research is to build a mathematical model which can estimate
the anaerobic synthesis constants ksyn,anaerobic of various proteins from E. coli. For this
purpose a mathematical model was built using the computer program MATLAB® (The MathWorks).
This model relies heavily on the series of equations described thus far, and will be discussed
in detail within this chapter. Both the source code of the model and the Excel file
containing the Rnew,measured and Rtotal,measured values derived from the proteomic data set can
be found in the appendix (‘The Source Code’ and ‘Proteomic Data Set’).
Data extraction
Our model starts with the extraction of the Rnew,measured and Rtotal,measured values from the
proteomic data set. The first column of the proteomic data set contains the
protein number, which allows us to easily find the number of proteins our model will use
when operating. The second column contains the protein accession number, a code with which
the protein can be identified. The third column contains the protein names, e.g. ADHE for
aldehyde-alcohol dehydrogenase. The fourth, fifth, sixth and seventh columns of the
proteomic data set contain the measured values of Rnew and Rtotal and their corresponding
standard deviations. The other columns contain the output data; more information is
provided at the end of this chapter.
The data extraction has been automated in such a way that when running the model a pop-up
screen appears which asks the user to select an Excel file of choice. The code for this
feature is shown below:
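% read excel data
[filename, pathname] = uigetfile('*.*','Select data file');
path = [pathname,filename];         % full path of the selected file
[num, txt, raw] = xlsread(path);    % num: numeric data, txt: text, raw: all cells
[lcol, lrow] = size(num);           % lcol: number of rows, lrow: number of columns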
MATLAB automatically separates text and numerical values from the extracted data. Numerical
values are returned in the array num and text is returned in the array txt. The third
output, raw, is a cell array that holds the unprocessed contents of all cells, both numerical
values and text. This feature is extremely handy as it solves the problem of separating text
and numerical values manually and thus saves a lot of time. Also, the size of the numerical
array is determined. As our proteomic data set has its proteins listed from top to bottom, the
variable lcol (the number of rows of num) is a direct indication of the number of proteins
extracted from the Excel file.
However, as can be seen in the appendix 'Proteomic Data Set', our model does not use all
numerical values provided by the Excel file: the first numerical value encountered, the
'protein number', has no value for the estimation of the anaerobic synthesis constants. Only
the values of Rnew,measured and Rtotal,measured and their corresponding standard deviations are
needed for our calculations. The standard deviations have been obtained by conducting the
pulse-chase experiment several times.
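The meaning of lcol and lrow can be illustrated with a small stand-in for the num array. The matrix below is hypothetical: only the values 10.26 and 2.59 (ADHE) come from the data set, the remaining numbers are made up, and it is assumed that the two text columns show up as NaN in num.

```matlab
% Hypothetical stand-in for the numeric array returned by xlsread.
% Columns: protein number, two text columns (NaN), Rnew, sd, Rtotal, sd
num = [1 NaN NaN 10.26 1.10 2.59 0.30;
       2 NaN NaN  1.50 0.20 1.10 0.10];

% size returns the number of rows first, then the number of columns
[lcol, lrow] = size(num);

fprintf('%d proteins (rows), %d columns\n', lcol, lrow);
```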
Creating normally distributed values of Rnew & Rtotal
The standard deviations are being used by the following code line to generate a number of
normally distributed values of Rnew,measured and Rtotal,measured around the given means of both
values.
% create n normally distributed randoms around Rnew
randoms(:,1) = random('norm',num(g,4),num(g,5),[n 1]);
% create n normally distributed randoms around Rtotal
randoms(:,2) = random('norm',num(g,6),num(g,7),[n 1]);
These code lines serve a twofold purpose and operate as follows: the first argument
of random specifies the type of distribution one would like to create. In our model,
we wanted the values of Rnew,measured and Rtotal,measured to be normally distributed and
therefore this argument has been set to 'norm'. The second and third arguments specify the
mean and standard deviation respectively. As can be seen, these values are taken from the num
array which was created previously. For example, the code num(g,4) means that the
numerical value in row g and column 4 is used as the mean value of
Rnew,measured. Since every g represents a protein in our model, this code ensures that normally
distributed random values of both Rnew,measured and Rtotal,measured are created
for all proteins.
The last position, [n 1], determines the size of the array in which the randomly created
values of Rnew,measured and Rtotal,measured are being stored. This array has the size of a matrix with
n rows and 1 column. Because the n parameter has been set to a value of 300, a [300 1]
matrix will be created, thus resulting in 300 randomly generated normally distributed values
of both Rnew,measured and Rtotal,measured. Since the n parameter has been defined at the start of the
model (appendix: ‘The Source Code’) the number of randoms generated can easily be
adjusted by changing the value of this parameter.
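The same kind of draw can be sketched with randn, which is part of base MATLAB, by scaling and shifting standard normal numbers. This is shown as an equivalent illustration, not as the call used in the model; the standard deviation below is a hypothetical value.

```matlab
n     = 300;     % number of random draws, as in the model
mu    = 10.26;   % measured Rnew of ADHE (from the text)
sigma = 1.10;    % hypothetical standard deviation, for illustration only

% n-by-1 column of normally distributed values with mean mu and sd sigma
samples = mu + sigma * randn(n, 1);

fprintf('sample mean = %.2f, sample sd = %.2f\n', mean(samples), std(samples));
```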
The twofold purpose of these code lines lies in the fact that they both generate
random numbers and extract the correct data from the num array created by the
xlsread function.
The random numbers generated are stored in the variables Rnewmeasured and Rtotalmeasured
by the following command lines:
% optimize ksynanaerobic for all randoms
for p = 1:n
Rnewmeasured = randoms(p,1);
Rtotalmeasured = randoms(p,2);
Note that the random values have changed from an n index to a p index.
Calculation of ksyn,anaerobic
The randomly generated Rnew, measured and Rtotal,measured values are being used as input entry for
the ∆R formula (equation 13, chapter 2 ‘Experimental and theoretical background’). Equation
13 can be seen as the summation of the absolute values of a variable minus a fixed value.
That is, every randomly generated value for Rnew, measured and Rtotal,measured represents a fixed
value. In the specific case of the protein aldehyde-alcohol dehydrogenase (ADHE), the first
protein in the proteomic data set, equation 13 can be written in the following form:
\[
\Delta R_{\mathrm{ADHE}} = \left| R_{\mathrm{new,calculated}} - 10.26 \right| + \left| R_{\mathrm{total,calculated}} - 2.59 \right| \tag{16}
\]
Please note that the values of Rnew, measured and Rtotal,measured in this formula represent the
standard values as can be found in the appendix ‘Proteomic Data Set’, and not values which
have been randomly generated.
The parameters Rnew,calculated and Rtotal,calculated depend on three variables, of which two have
been defined previously (ksyn,aerobic and kdeg), leaving only one variable for estimation.
This last variable, ksyn,anaerobic (denoted x in our model), is estimated by MATLAB using
an optimization routine called ‘lsqnonlin’. This function solves nonlinear
least-squares problems, including nonlinear data-fitting problems. The function lsqnonlin in
our model is called by the following command lines:
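% set the optimizer options before calling lsqnonlin
options = optimset('Display','off','LargeScale','off','Algorithm','levenberg-marquardt');
% minimize the sum of squares of diff(x), starting from x0
[x,resnorm] = lsqnonlin(@diff,x0,lb,ub,options);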
The function lsqnonlin is designed, as stated before, to solve non-linear least-square
problems. Since the ∆R value has to be as small as possible for our model to reflect reality,
our model faces a non-linear least square problem and thus the lsqnonlin function is being
used.
However, before the operation of the lsqnonlin function can be explained we first have to
address the problem of solving the differential equations of equation 15. A closer
look at the lsqnonlin command line shows that the function tries to find a
minimum in the sum of squares of the function described in diff. Its starting value for x
(denoted x0) is set to 0, and both the lower and upper bounds are empty (lb = [] and ub
= []), implying no limits on the values of x that lsqnonlin may try to reach a minimum of ∆R.
The values of x0, lb and ub are set in the first section of the source code (% define
lsqnonlin parameters).
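Written as a formula, a nonlinear least-squares routine minimizes the sum of squares of the components of a residual function f. In our case the residual has a single component, ∆R, so the objective reduces to:

```latex
\[
\min_x \sum_i f_i(x)^2 \;=\; \min_x \, \Delta R(x)^2
\]
```

Because ∆R is a sum of absolute values and therefore never negative, the value of x that minimizes ∆R squared also minimizes ∆R itself.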
The command lines for the diff function are depicted below. As one can see the diff
section is a rather large part in our model, and will therefore be discussed step by step.
function deltaR = diff(x)
% solve the ode as described in the function 'kinetics'
options = odeset('RelTol', 1e-4);
[t,sol] = ode45(@kinetics,tspan,y0,options,x);
The first step within the diff function is solving the differential equations described in
equation 15. This is carried out by the function ode45, a solver that integrates
ordinary differential equations. The differential equations which are to be solved are located
in the function kinetics (@kinetics), which can be found at the end of the source code. The
interval of integration is set by tspan and equals the duration of the pulse labelling,
which was ten minutes in our model. Thus, tspan = [0 pulsetime]. The starting values for
the differential equations are set by y0 and are 100, 0, 106.25 and 0 respectively. Recall that
the first and third differential equations represent the amount of unlabelled protein present
under aerobic conditions and the amount of unlabelled protein present under anaerobic
conditions respectively, and thus will only decrease over time. The options only set the
relative error tolerance (RelTol), which roughly controls the number of correct digits in all
solution components. The only variable here is x, and the time points and the solutions are
returned in the arrays t and sol.
While using this solving routine, problems were encountered in the form of NaN errors.
This problem was solved by removing the solution at t = 0 (the first row) from the sol matrix.
% remove y0 from data to prevent NaN errors
sol(1,:)=[];
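A likely origin of these NaN values can be seen from the start values themselves: at t = 0 both y2 and y4 are zero, so the ratio y4/y2 evaluates to 0/0. The sketch below only illustrates this arithmetic and is not part of the source code.

```matlab
y0 = [100 0 106.25 0];          % start values of the model (from the text)

% Rnew,calculated = y4 / y2 evaluated at t = 0 gives 0/0
Rnew_at_t0 = y0(4) / y0(2);

disp(isnan(Rnew_at_t0))         % 0/0 is NaN in floating-point arithmetic
```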
The next step is to extract the y vectors from the matrices and to assure all vectors have the
same length. These objectives are achieved by the following command lines:
% extract y vectors from the sol matrix
y1 = sol(:,1);
y2 = sol(:,2);
y3 = sol(:,3);
y4 = sol(:,4);
% fix y vector length to prevent matrix dimension errors
ny1 = y1(1:z);
ny2 = y2(1:z);
ny3 = y3(1:z);
ny4 = y4(1:z);
The vector lengths are set by the parameter z, which has the value 56. This number
was found empirically to be the highest value for which our model would operate without any
errors (vector length mismatch).
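Because ode45 chooses its own step sizes, the number of rows it returns can differ from call to call. When only the value at the end of the integration interval is needed, a possible alternative to a fixed index is to address the last row with end. The sketch below shows this for a single hypothetical decay equation; it is not the approach used in the source code.

```matlab
% Hypothetical single-equation example: decay with kdeg = 0.0012 per minute
kdeg = 0.0012;
[t, sol] = ode45(@(t, y) -kdeg * y, [0 10], 100);

final_time  = t(end);        % last time point returned by the solver
final_value = sol(end, :);   % solution at that time point, whatever the number of steps

fprintf('%d steps, final t = %g, final value = %.4f\n', numel(t), final_time, final_value);
```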
The last step in diff is the actual calculation of ∆R. The values calculated by the
ode45 function are combined as described in equation 15, resulting in
values of Razhal and Rtotal, also known as Rnew,calculated and Rtotal,calculated. Using
equation 13 our model is now able to calculate ∆R.
% calculate the difference between measured and predicted data
Paeroob = ny1 + ny2;
Panaeroob = ny3 + ny4;
Razhal = ny4 ./ ny2;
Rtotal = Panaeroob ./ Paeroob;
deltaR = abs(Razhal- Rnewmeasured)+abs(Rtotal-Rtotalmeasured);
deltaR = deltaR(z);
end
Since we are only interested in the final value of ∆R (that is, the value of ∆R after ten minutes
of labelling), the 56th entry is taken as the final value of ∆R (deltaR = deltaR(z)).
The function lsqnonlin thus varies x, re-solving the differential equations each time, until
the ∆R value reaches a minimum, so that our model reflects the measurements as well as possible.
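The ∆R calculation can be illustrated in isolation. The sketch below is a hypothetical stand-alone version for a single time point: the helper calcDelta and all numeric inputs are invented for illustration, but the ratios and the sum of absolute differences follow the code shown above.

```matlab
% Hypothetical stand-alone version of the deltaR calculation for one time point.
% y is [y1 y2 y3 y4]; Rnew_m and Rtotal_m are the measured ratios.
calcDelta = @(y, Rnew_m, Rtotal_m) ...
    abs(y(4) / y(2) - Rnew_m) + ...
    abs((y(3) + y(4)) / (y(1) + y(2)) - Rtotal_m);

% Made-up solver output at the end of the pulse
y = [80; 20; 85; 40];

% Rnew,calculated = 40/20 = 2 and Rtotal,calculated = 125/100 = 1.25
d_perfect = calcDelta(y, 2.00, 1.25);   % measured equals predicted
d_off     = calcDelta(y, 2.50, 1.00);   % measured differs from predicted

fprintf('deltaR (perfect fit): %.2f\n', d_perfect);   % 0.00
fprintf('deltaR (mismatch):    %.2f\n', d_off);       % 0.75
```

A ∆R of zero means that both calculated ratios coincide with the measured ones; the optimizer searches for the x that brings ∆R as close to this as possible.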
The anaerobic synthesis constant ksyn,anaerobic is the x value for which the minimum is achieved;
it is extracted by the following command line:
ksynanaerobic(p) = x;
Please note the p index and recall that the values of Rnew,measured and Rtotal,measured in reality
consist of 300 randomly generated, normally distributed values. This means that there are 300
different values of ksyn,anaerobic as well. The final value of ksyn,anaerobic and its standard
deviation are calculated by taking the mean μ of all 300 values of ksyn,anaerobic and by using
the formula

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}$$

respectively. These calculations are carried out by the following code:
ksynanaer(g) = mean(ksynanaerobic);
ksynanaersd(g) = sqrt((sum((ksynanaerobic-mean(ksynanaerobic)).^2))/n);
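Note that the standard deviation line divides by n rather than by n - 1, i.e. it computes the population standard deviation. In MATLAB this is the same quantity that std(x, 1) returns, as the following sketch with made-up values shows.

```matlab
% Sketch with made-up ksyn,anaerobic estimates (not model output)
ksynanaerobic = [0.120 0.131 0.127 0.118 0.135];
n = numel(ksynanaerobic);

% Formula used in the model: summed squared deviations divided by n
sd_model = sqrt(sum((ksynanaerobic - mean(ksynanaerobic)).^2) / n);

% Built-in equivalent: the second argument 1 selects normalisation by n
sd_builtin = std(ksynanaerobic, 1);

fprintf('model: %.6f  built-in: %.6f\n', sd_model, sd_builtin);
```

Calling std(x) without the second argument would divide by n - 1 instead and give a slightly larger value.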
Returning data to the Excel file
The final step in our model is the transfer of the calculated data back to the proteomic data set.
This is achieved by the following command lines:
% write data to Excel
disp('Writing...')
xlswrite(path, {'ksynanaer'}, 1, 'K1')
xlswrite(path, ksynanaer', 1, 'K2')
xlswrite(path, {'sd'}, 1, 'L1')
xlswrite(path, ksynanaersd', 1, 'L2')
Recall that during the data extraction, the first step in our model, an Excel file was chosen and
its location stored in the variable path. The command lines above use this same path (the first
argument of xlswrite) to write both the ksyn,anaerobic values and their standard deviations into
the proteomic data set. The second argument defines the data to be written, and the third and
fourth arguments define the sheet and the range respectively. Note that the second argument is
either a text label ({'ksynanaer'} or {'sd'}) or a data vector (ksynanaer' or ksynanaersd'); the
label serves as a column title for the written data, whereas the vector contains the calculated
values themselves.
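As an aside, recent MATLAB releases (R2019a and later) document xlswrite as not recommended and offer writecell and writematrix instead. The sketch below shows how the same four write operations might look with those functions; the file name results.xlsx is a placeholder, and the three values per column are simply the first three entries of the appendix table.

```matlab
% Sketch: same output written with writecell/writematrix (R2019a or later).
% 'results.xlsx' is a placeholder file name chosen for illustration.
outfile     = 'results.xlsx';
ksynanaer   = [0.123 0.103 0.046];     % first three appendix values
ksynanaersd = [0.025 0.061 0.012];     % corresponding standard deviations

writecell({'ksynanaer'},  outfile, 'Sheet', 1, 'Range', 'K1');   % column title
writematrix(ksynanaer',   outfile, 'Sheet', 1, 'Range', 'K2');   % values
writecell({'sd'},         outfile, 'Sheet', 1, 'Range', 'L1');   % column title
writematrix(ksynanaersd', outfile, 'Sheet', 1, 'Range', 'L2');   % values
```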
Additional features
Our model uses a few lines of code to help us monitor its calculation progress. They act as a
timer, enabling us to record the time our model needs to complete its calculation procedure.
These lines use the tic/toc combination and are shown below:
tic
% ‘Calculation Zone’
Elapsedtime = toc/60;
str = ['Elapsed time: ', num2str(Elapsedtime), ' minutes'];
disp(str)
The tic command starts a stopwatch timer, and the toc command reads the time elapsed since tic,
in seconds. tic should be placed at the very start of the model, that is, just before the
model takes its first action. As the first action of our model is the extraction of data from the
proteomic data set, tic should be placed just before the % read excel data code.
Just as tic should be placed at the start of the model, toc should be placed at the end of it.
As our model ends with the transfer of the calculated data back to the proteomic data set, toc
should be placed right after that code. Thus, the ‘Calculation Zone’ represents all the
functions and code in our model which play a role in estimating the anaerobic synthesis
constants. In the code above, the value of toc is divided by 60 to obtain the time in
minutes. The calculation time is then displayed by disp(str), which displays ‘Elapsed
time: (calculation time) minutes’.
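A small variation on this timer, shown here only as a sketch, stores the value returned by tic and passes it to toc. This keeps the measurement correct even if other code calls tic in between. The pause of 0.2 seconds is a stand-in for the actual calculations.

```matlab
% Sketch: timing a block of work with an explicit timer handle
t0 = tic;                        % start the stopwatch and keep its handle

pause(0.2);                      % placeholder for the 'Calculation Zone'

elapsedMinutes = toc(t0) / 60;   % toc returns seconds elapsed since t0
str = ['Elapsed time: ', num2str(elapsedMinutes), ' minutes'];
disp(str)
```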
Results
Effect of the number of randoms on ksyn,anaerobic
In order to describe the effect of the number of normally distributed Rnew and Rtotal values
generated by our program on the estimation of ksyn,anaerobic, and to find the minimal value of n
required, a computational experiment was done by running our model with a varying number of
randoms whilst estimating the anaerobic synthesis rate constant for one protein only. The
protein we selected for this experiment was aldehyde-alcohol dehydrogenase (ADHE).
We ran our model using n = 1, 50, 100, 150, 200, 250 and 300 randoms for the values of
Rnew,measured and Rtotal,measured. Ten simulations were done for each value of n, and the
resulting values of ksyn,anaerobic were plotted against the number of randoms used to calculate
them. Note that the ksyn,anaerobic values calculated this way are averaged; e.g. generating 300
values of Rnew,measured and Rtotal,measured still leads to one value of ksyn,anaerobic per
simulation. The results of this experiment are shown in figure 2.
Figure 2: Effect of the n value on the ksyn, anaerobic
Using only one random number clearly leads to a wide spread in ksyn,anaerobic values (ranging
between 0.087 and 0.178), whereas n values from 50 to 300 all lead to a much narrower spread.
Thus, the minimum number of randoms needed to obtain an accurate value of ksyn,anaerobic could
be set to 50.
However, if we exclude the data from n = 1 and take the mean of all estimated ksyn,anaerobic
values per n, we obtain a new graph which shows the variation in ksyn,anaerobic for the n values
from 50 to 300 in greater detail (figure 3). The error bars represent the standard deviation
of the ten averaged estimates of ksyn,anaerobic per n and do not represent the standard
deviations calculated by our model.
Figure 3: The effect of the n value on ksyn,anaerobic, averaged values.
This graph shows that the higher the n value, the smaller the variation in ksyn,anaerobic
becomes; n values exceeding 300 will probably narrow the variation even further. However, we
found that the higher the n value becomes, the longer it takes for our model to complete its
calculations.
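The narrowing with increasing n is what one would expect when averaging n independent draws: the spread of the average shrinks roughly with 1/sqrt(n). The sketch below illustrates this with plain normally distributed numbers rather than with the kinetic model; the mean of 0.126 and the standard deviation of 0.025 are placeholder values chosen for illustration.

```matlab
% Sketch: spread of the averaged estimate for different numbers of randoms.
% Uses plain normal draws, NOT the kinetic model; mu and sigma are placeholders.
rng(1);                          % fixed seed for reproducibility
mu     = 0.126;
sigma  = 0.025;
nList  = [1 50 100 150 200 250 300];
nRuns  = 10;                     % ten simulations per n, as in the experiment
spread = zeros(size(nList));

for k = 1:numel(nList)
    n = nList(k);
    % each column is one simulation of n draws; average within a simulation
    draws     = mu + sigma * randn(n, nRuns);
    runMeans  = mean(draws, 1);
    spread(k) = std(runMeans);   % variation between the ten averaged values
end

disp([nList' spread'])           % columns: n, spread of the averaged values
```

With n = 1 the spread is of the order of sigma itself, whereas for n = 50 and above it is several times smaller, which mirrors the pattern described for figures 2 and 3.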
Calculation Time
To see whether we could compose a formula that estimates the calculation time of our model in
advance, we ran our model for the ADHE protein with n values ranging from 1 to 400 in intervals
of 20 and recorded the time needed for our model to complete its calculations.
The results of this experiment are depicted below.
Figure 4: The effect of the n value on calculation time, 1 protein calculated.
As we see, the time needed for our model to complete its calculation procedure scales linearly
with the number of randoms used. To gain insight into how the calculation time scales with the
number of proteins, the experiment was repeated for 2 proteins. The results of this experiment
are depicted below:
Figure 5: The effect of the n value on calculation time, 2 proteins calculated.
Again, the time needed for our model to complete its calculation procedure scales linearly
with the number of randoms used, regardless of whether 1 or 2 proteins are being estimated.
Comparing both trend line formulas, we observe that the coefficients of the trend line for 2
proteins are roughly twice as large as those of the trend line for 1 protein. With this
information we were able to compose a formula which roughly estimates the calculation time in advance:
$$t = g \cdot (0.0049 \cdot n) + (1.0018 \cdot g)$$
In this formula, t stands for the time in minutes, g for the number of proteins and n for the
number of randoms used. To verify whether this formula also holds for 3 or more proteins, we
conducted a series of runs with our model in which both the number of proteins (g) and the
number of randoms (n) were varied:
Parameter settings    Estimated calculation time    True calculation time
g = 5 & n = 250       11.34 minutes                 10.81 minutes
g = 8 & n = 140       13.50 minutes                 12.81 minutes
g = 3 & n = 360       8.29 minutes                  7.75 minutes
g = 10 & n = 80       13.94 minutes                 13.49 minutes
g = 15 & n = 120      23.85 minutes                 23.06 minutes
Table 1: Estimated calculation time compared with true calculation time.
Note: Parameter values chosen at random.
We conclude that our formula is able to roughly estimate the calculation time needed by our
model.
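As a worked example, the estimate can be written as a one-line function, with t = g * (0.0049 * n) + 1.0018 * g. The sketch below evaluates it for g = 8 and n = 140, one of the settings in Table 1; the name estimateTime is hypothetical and does not appear in the model.

```matlab
% Sketch: estimated calculation time in minutes
% g = number of proteins, n = number of randoms per protein
estimateTime = @(g, n) g .* (0.0049 .* n) + 1.0018 .* g;

t_est = estimateTime(8, 140);    % 8*0.0049*140 + 1.0018*8 = 5.488 + 8.0144
fprintf('Estimated time for g = 8, n = 140: %.2f minutes\n', t_est);   % 13.50
```

For these settings the measured calculation time in Table 1 was 12.81 minutes.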
Verifying the distribution of the randomly generated values
So far we have discussed the effect of the number of randoms on the estimation of ksyn,anaerobic
and on the calculation time, but it is still unknown whether the values of ksyn,anaerobic
estimated by fitting our kinetic model are normally distributed.
To verify this, a simulation for ADHE was done using an n value of 10,000 (calculation time:
over 50 minutes). After the simulation was complete, we plotted all 10,000 values of
ksyn,anaerobic in a histogram using the following sequence of functions:
x = min(ksynanaeroob):.005:max(ksynanaeroob);
hist(ksynanaeroob,x)
title('Random distribution of Ksyn,anaerobic for ADHE')
The first line defines the bins: it takes the lowest and the highest of the 10,000 estimated
ksyn,anaerobic values and divides that range into steps of 0.005. Because hist interprets the
vector x as bin centers, the first bin is centered on the lowest ksyn,anaerobic value, and
every bin is 0.005 wide. The second line creates the histogram from the 10,000 values of
ksyn,anaerobic and the bins set by x. The third line adds a title to the graph. The results of
this simulation can be found in figure 6.
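The binning can also be inspected without plotting, because hist returns the counts and bin centers when called with output arguments. The sketch below does this for made-up, normally distributed placeholder data, not for the model output.

```matlab
% Sketch: histogram counts with explicit 0.005-wide bins on made-up data
rng(2);                                   % fixed seed for reproducibility
data = 0.13 + 0.02 * randn(1, 10000);     % placeholder values, not model output

x = min(data):.005:max(data);             % bin centers, 0.005 apart
[counts, centers] = hist(data, x);        % counts per bin, without plotting

[peakCount, idx] = max(counts);
fprintf('Fullest bin: center %.3f, count %d\n', centers(idx), peakCount);
```

Every value ends up in exactly one bin, so the counts add up to the number of data points.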
Figure 6: Random distribution of ksyn,anaerobic for aldehyde-alcohol dehydrogenase (ADHE).
Highlighted bin: count 926, center 0.133, edges [0.13, 0.135].
As we created random values for Rnew,measured and Rtotal,measured by using a normal distribution
(chapter 3: ‘The Model’, Creating normally distributed values of Rnew,measured and Rtotal,measured),
we expected a curve resembling a Gaussian (bell-shaped) distribution.
However, the distribution in figure 6 is skewed to the left and thus does not have the
characteristics of a normal distribution. Also, the peak of the distribution in figure 6 is
situated at 0.133, whereas the calculated mean, that is, the mean outputted by our model, has a
value of 0.126. This skewness can be explained by the fact that the randomly generated values of
Rnew,measured and Rtotal,measured enter the differential equations as depicted in equation 15.
Since the relation between these ratios and the fitted ksyn,anaerobic is not linear, skewness may appear.
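The general effect can be demonstrated independently of the kinetic model. In the sketch below a symmetric, normally distributed input is passed through a nonlinear function; exp() is used purely as a stand-in and is not the relation used in our model. The output is no longer symmetric, which shows in the difference between its mean and its median.

```matlab
% Sketch: a nonlinear function of normally distributed input is skewed.
% exp() is only a stand-in for a nonlinear relation; values are placeholders.
rng(3);                          % fixed seed for reproducibility
u = 0.5 * randn(1, 100000);      % symmetric, normally distributed input
v = exp(u);                      % nonlinear transformation of the input

fprintf('input : mean %.3f, median %.3f\n', mean(u), median(u));
fprintf('output: mean %.3f, median %.3f\n', mean(v), median(v));
% For the output, mean and median no longer coincide: the distribution
% has a longer tail on one side, although the input was symmetric.
```

In which direction the distribution is skewed depends on the shape of the nonlinear relation; this stand-in produces a tail towards high values.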
The Proteomic Data Set
The complete list of all estimated ksyn,anaerobic values and corresponding standard deviations
per protein can be found in the appendix ‘Proteomic Data Set’. Please note that both the
anaerobic synthesis constants and the half-lives are presented within the proteomic data
set. For information on how the half-life values were obtained, please read the paper “A
mathematical model for estimating protein half-lifes in Eschericia coli” by Tom Boer.
Conclusion
The aim of our project was to build a model which could estimate the synthesis rate constants
of proteins in E. coli under anaerobic conditions. To achieve this goal the software program
MATLAB was used as described in the previous chapters, and a list of anaerobic synthesis
constants (denoted as ksyn,anaerobic) and their corresponding standard deviations was produced.
The complete list can be found in the appendix (Proteomic Data Set).
To obtain reasonably accurate values, an n value between 50 and 300 should be used. All values
in the appendix were calculated using an n value of 300. Using n values exceeding 300 will
probably narrow the variation in the estimated anaerobic synthesis constants further, but at
the cost of a longer calculation time.
While our model does output values for ksyn,anaerobic, we cannot state that these values are
correct, as reference material is scarce.
We believe our model can also be used to calculate synthesis constants for other organisms and
for aerobic conditions, by using different sets of differential equations and by
adjusting the Azhal pulse labelling experiment.
APPENDIX
The Source Code
function [HL,HLsd,ksynanaer,ksynanaersd] = omegatau
% define differential equation parameters
pulsetime = 10;
growth = 1.0625;
% define lsqnonlin parameters
x0 = 0;
lb = [];
ub = [];
% define ode solver parameters
y0 = [100; 0; 106.25; 0];
% read excel data
[filename, pathname] = uigetfile('*.*','Select data file');
path = [pathname,filename];
[num, txt, raw] = xlsread(path); %#ok<NASGU> suppressed 'unused variable' underline.
[lcol, lrow] = size(num);
%initiate program runtime timer
tic
disp('Reading...')
% define randoms and ode restriction parameters
n = 300;
z = 56;
% preallocate matrices for speed and cleaner m-file
HL = zeros(1,lcol);
HLsd = zeros(1,lcol);
ksynanaer = zeros(1,lcol);
ksynanaersd = zeros(1,lcol);
ksynanaerobic = zeros(1,n);
deltaR4 = zeros(lcol,11);
deltaR5 = zeros(lcol,n);
% initiate progressbar
progressbar('Protein','Randoms','Total HL progress','Calculating HL');
for g = 1:lcol
Rnewmeasured = num(g,4);
Rtotalmeasured = num(g,6);
% progressbar parameter
b=1;
% HL optimum is narrowed down using intervals to reduce calculation time
hlrange = 1:100:10001;
deltaR = calcdeltaR;
[minvalue,position] = min(deltaR);
HLtemp = hlrange(position);
hlright = HLtemp+100;
hlleft = HLtemp-100;
if hlleft <= 0
hlleft = 1;
hlright = hlright+1;
end
b=2;
hlrange = hlleft:10:hlright;
deltaR = calcdeltaR;
[minvalue,position] = min(deltaR);
HLtemp = hlrange(position);
hlright = HLtemp+10;
hlleft = HLtemp-10;
if hlleft <= 0
hlleft = 1;
end
b=3;
% run final 20 HL's to find optimal HL
hlrange = hlleft:hlright;
deltaR = calcdeltaR;
[minvalue,position] = min(deltaR);
HL(g) = hlrange(position);
p = 0.5;
% add a decimal point to HL
for h = (HL(g)-p):0.1:(HL(g)+p)
halflife = h;
r = (1 / pulsetime)*log(growth);
kdeg = log(2) / halflife;
ksyn = r + kdeg;
options = optimset('Display','off','Largescale','off','Algorithm','levenberg-marquardt');
[x,resnorm] = lsqnonlin(@diff,x0,lb,ub,options);
i = (HL(g)-p):0.1:(HL(g)+p);
[a,b] = size(i);
fv = find(i==h);
deltaR4(g,fv) = diff(x);
progressbar([],[],3/4,fv/b);
end
[minvalue,position] = min(deltaR4(g,:));
HL(g) = i(position);
halflife = HL(g);
r = (1 / pulsetime)*log(growth);
kdeg = log(2) / halflife;
ksyn = r + kdeg;
% create n normally distributed randoms around Rnew
randoms(:,1) = random('norm',num(g,4),num(g,5),[n 1]);
% create n normally distributed randoms around Rtotal
randoms(:,2) = random('norm',num(g,6),num(g,7),[n 1]);
% optimize ksynanaerobic for all randoms
for p = 1:n
Rnewmeasured = randoms(p,1);
Rtotalmeasured = randoms(p,2);
options = optimset('Display','off','Largescale','off','Algorithm','levenberg-marquardt');
[x,resnorm] = lsqnonlin(@diff,x0,lb,ub,options);
ksynanaerobic(p) = x;
deltaR5(g,p) = diff(x);
progressbar([],p/n,4/4,[]);
end
ksynanaer(g) = mean(ksynanaerobic);
ksynanaersd(g) = sqrt((sum((ksynanaerobic-mean(ksynanaerobic)).^2))/n);
HLsd(g) = sqrt((sum((deltaR5(g,:)-mean(deltaR5(g,:))).^2))/n);
progressbar(g/lcol,0,0,0);
end
% optimization routine
function deltaR = calcdeltaR
% indexing parameter
i=0;
for hl = hlrange
i = i+1;
% define differential equation parameters
halflife = hl;
r = (1 / pulsetime)*log(growth);
kdeg = log(2) / halflife;
ksyn = r + kdeg;
options = optimset('Display','off','Largescale','off','Algorithm','levenberg-marquardt');
% optimize differential equations as found in the "kinetics" function for x
[x,resnorm] = lsqnonlin(@diff,x0,lb,ub,options); %#ok<SETNU>
% re-enter x in diff(x) to output(deltaR)
deltaR(i) = diff(x); %#ok<AGROW>
% update progress bars
if b==1
progressbar([],[],[],hl/10001);
elseif b==2
[d,e] = size(hlleft:10:hlright);
progressbar([],[],1/4,i/e);
elseif b==3
[d,e] = size(hlleft:hlright);
progressbar([],[],2/4,i/e);
end
end
end
% write data to Excel
disp('Writing...')
xlswrite(path, {'Half-life'}, 1, 'I1')
xlswrite(path, HL', 1, 'I2')
xlswrite(path, {'sd'}, 1, 'J1')
xlswrite(path, HLsd', 1, 'J2')
xlswrite(path, {'ksynanaer'}, 1, 'K1')
xlswrite(path, ksynanaer', 1, 'K2')
xlswrite(path, {'sd'}, 1, 'L1')
xlswrite(path, ksynanaersd', 1, 'L2')
disp('Done!')
beep
% stop and display runtime timer
Elapsedtime = toc/60;
str = ['Elapsed time: ', num2str(Elapsedtime), ' minutes'];
disp(str)
function deltaR = diff(x)
% solve the ode as described in the function 'kinetics'
options = odeset('RelTol', 1e-4);
[t,sol] = ode45(@kinetics,tspan,y0,options,x);
% remove y0 from data to prevent NaN errors
sol(1,:)=[];
% extract y vectors from the sol matrix
y1 = sol(:,1);
y2 = sol(:,2);
y3 = sol(:,3);
y4 = sol(:,4);
% fix y vector length to prevent matrix dimension errors
ny1 = y1(1:z);
ny2 = y2(1:z);
ny3 = y3(1:z);
ny4 = y4(1:z);
% calculate the difference between measured and predicted data
Paeroob = ny1 + ny2;
Panaeroob = ny3 + ny4;
Razhal = ny4 ./ ny2;
Rtotaal = Panaeroob ./ Paeroob;
deltaR = abs(Razhal- Rnewmeasured)+abs(Rtotaal-Rtotalmeasured);
deltaR = deltaR(z);
end
function dydt = kinetics(t,y,x) %#ok<INUSL> used by ODE solver
% define differential equations
k = [kdeg; ksyn; kdeg; x];
dydt = [ -k(1)*y(1)
k(2)*y(1) + (k(2)-k(1))*y(2)
-k(3)*y(3)
k(4)*y(3) + (k(4)-k(3))*y(4) ];
Proteomic Dataset
No.  Accession  Protein  Rnew*  S.D.  Rtotal**  S.D.  Half-life  S.D.  ksynanaer  S.D.
1    P0A9Q7     ADHE     10.26  2.93  2.59      0.68  44.3       0.51  0.123      0.025
2    P09373     PFLB     7.76   4.59  2.28      0.73  38.9       0.66  0.103      0.061
3    P0AED0     USPA     6.43   1.85  1.5       0.17  203.1      0.13  0.046      0.012
4    P68066     GRCA     6.31   2.75  5.45      1.55  3          1.61  0.433      0.134
5    P0A796     K6PF1    5.15   1.47  1.81      0.27  39.4       0.22  0.085      0.019
6    P0AET8     HDHA     4.96   0.61  0.95      0.07  10000      0.08  0.026      0.003
7    P37689     GPMI     4.31   1.02  1.41      0.1   108        0.09  0.044      0.009
8    P0ABJ9     CYDA     3.38   0.59  1.72      0.24  20.8       0.17  0.096      0.012
9    P0A799     PGK      3.16   0.94  1.34      0.21  73.2       0.15  0.040      0.010
10   P21599     KPYK2    3.12   0.87  1.22      0.38  287.9      0.22  0.023      0.006
11   P0A6T1     G6PI     3.01   0.43  1.21      0.13  301.6      0.07  0.022      0.003
12   P0A8M6     YEEX     2.61   0.25  1.37      0.3   36.6       0.19  0.053      0.004
13   P08839     PT1      2.54   0.55  1.07      0.13  10000      0.08  0.014      0.003
14   P0A9B2     G3P1     2.46   0.64  1.21      0.16  118.4      0.09  0.025      0.006
15   P0A6P9     ENO      2.38   0.8   1.14      0.21  3564.2     0.13  0.013      0.004
16   P0AB71     ALF      2.06   0.14  1.12      0.13  10000      0.07  0.012      0.001
17   P0A8M3     SYT      2      0.32  1.07      0.18  10000      0.11  0.011      0.002
18   P08506     DACC     1.93   0.12  0.94      0.07  10000      0.07  0.011      0.001
19   P32176     FDOG     1.93   0.21  1.32      0.24  20.2       0.15  0.065      0.005
20   P23843     OPPA     1.84   0.44  1.14      0.17  139.1      0.10  0.018      0.004
21   P69783     PTGA     1.81   0.4   1         0.16  10000      0.11  0.010      0.002
22   P77804     YDGA     1.77   0.33  0.99      0.1   10000      0.08  0.010      0.002
23   P0A707     IF3      1.75   0.47  0.76      0.09  10000      0.09  0.010      0.002
24   P0A6Y8     DNAK     1.66   0.42  1.04      0.22  10000      0.13  0.010      0.002
25   P23893     GSA      1.63   0.19  1.03      0.06  10000      0.05  0.009      0.001
26   P0ABH9     CLPA     1.61   0.17  1.62      0.08  1.2        0.11  0.650      0.016
27   P0AD61     KPYK1    1.61   0.1   1.13      0.16  88.3       0.10  0.020      0.001
28   P17169     GLMS     1.57   0.52  1.05      0.1   10000      0.07  0.009      0.003
29   P0A9A6     FTSZ     1.52   0.24  1.28      0.51  9.9        0.33  0.098      0.010
30   P0AFH8     OSMY     1.46   0.24  1.07      0.12  10000      0.07  0.008      0.001
31   P0A6H1     CLPX     1.42   0.37  0.93      0.15  10000      0.13  0.008      0.002
32   P0ABT2     DPS      1.4    0.29  1.38      0.22  1.9        0.21  0.399      0.028
33   P09169     OMPT     1.35   0.35  0.91      0.26  9982.3     0.17  0.008      0.002
34   P0A9X4     MREB     1.3    0.47  0.92      0.14  10000      0.11  0.008      0.003
35   P23836     PHOP     1.3    0.21  1.03      0.21  10000      0.12  0.007      0.001
36   P0A817     METK     1.26   0.37  0.88      0.04  10000      0.04  0.007      0.002
37   P0A905     SLYB     1.12   0.23  0.95      0.13  9981.2     0.10  0.006      0.001
38   P0A910     OMPA     1.11   0.36  0.99      0.15  9930.9     0.11  0.006      0.002
39   P0C0S1     MSCS     1.07   0.15  0.99      0.1   9940.9     0.07  0.006      0.001
40   P06959     ODP2     1.04   0.39  1         0.14  1.2        0.23  0.567      0.074
41   P0AEP3     GALU     1.04   0.11  1.05      0.07  7.8        0.06  0.094      0.007
42   P00509     AAT      1.01   0.12  1.02      0.12  3.5        0.10  0.198      0.012
43   P0A953     FABB     0.97   0.26  0.99      0.07  3.9        0.14  0.173      0.027
44   P61714     RISB     0.96   0.15  0.94      0.08  1.2        0.10  0.566      0.024
45   P0A870     TALB     0.95   0.11  0.93      0.13  1.2        0.10  0.565      0.018
46   P25665     METE     0.95   0.3   0.6       0.04  1.2        0.25  0.563      0.059
47   P69776     LPP      0.95   0.14  0.79      0.13  1.2        0.15  0.567      0.025
48   P0A825     GLYA     0.91   0.41  1.05      0.16  278.4      0.10  0.007      0.003
49   P0ACF8     HNS      0.9    0.21  1.08      0.26  10000      0.15  0.005      0.001
50   P00956     SYI      0.87   0.13  0.84      0.15  1.2        0.11  0.551      0.023
51   P00562     AK2H     0.86   0.17  0.75      0.02  1.2        0.11  0.545      0.033
52   P0AFG8     ODP1     0.84   0.23  0.98      0.16  15.2       0.12  0.041      0.010
53   P00350     6PGD     0.82   0.1   0.98      0.17  17.4       0.11  0.037      0.004
54   P0ABB4     ATPB     0.77   0.14  1.02      0.15  67.4       0.08  0.012      0.002
55   P0A9P0     DLDH     0.77   0.12  0.97      0.16  19.7       0.10  0.031      0.004
56   P0AEG4     DSBA     0.77   0.11  1.1       0.27  10000      0.15  0.005      0.001
57   P0AEK4     FABI     0.76   0.11  0.96      0.07  17.6       0.05  0.034      0.004
58   P0AES4     GYRA     0.76   0.07  0.93      0.1   11.6       0.06  0.051      0.004
59   P0A705     IF2      0.74   0.1   0.91      0.16  10.2       0.10  0.056      0.006
60   P07813     SYL      0.73   0.4   1.01      0.23  58.1       0.15  0.012      0.007
61   P23721     SERC     0.72   0.1   1.12      0.23  10000      0.15  0.004      0.001
62   P0A8M0     SYN      0.71   0.16  0.88      0.15  8.8        0.11  0.062      0.011
63   P68919     RL25     0.66   0.37  0.98      0.13  37.8       0.10  0.016      0.009
64   P0ABB0     ATPA     0.66   0.1   0.99      0.17  46.5       0.10  0.013      0.002
65   P0AFG0     NUSG     0.66   0.04  0.93      0.02  18.4       0.01  0.029      0.002
66   P0AEU7     SKP      0.66   0.12  0.95      0.08  23.7       0.05  0.023      0.004
67   P13029     KATG     0.64   0.19  0.67      0.09  2.1        0.12  0.268      0.037
68   P0AE08     AHPC     0.64   0.46  0.62      0.2   1.2        0.26  0.406      0.309
69   P0A8T7     RPOC     0.61   0.19  0.97      0.19  38         0.12  0.014      0.004
70   P0A6N1     EFTU     0.59   0.1   0.94      0.15  26.5       0.10  0.019      0.003
71   P30750     METN     0.58   0.07  0.62      0.03  2.2        0.04  0.250      0.014
72   P0A6P1     EFTS     0.58   0.09  0.94      0.11  27.3       0.06  0.018      0.002
73   P0AG30     RHO      0.58   0.12  0.97      0.07  42         0.04  0.013      0.003
74   P0ABJ1     CYOA     0.55   0.05  1.16      0.29  10000      0.18  0.003      0.000
75   P45577     PROQ     0.55   0.04  0.81      0.08  9.6        0.05  0.047      0.003
76   P0A850     TIG      0.53   0.16  0.94      0.12  31.7       0.08  0.015      0.004
77   P0A7D7     PUR7     0.53   0.09  0.94      0.18  31.7       0.11  0.015      0.002
78   P0A6M8     EFG      0.5    0.11  0.97      0.15  54.2       0.08  0.009      0.002
79   P0A7V0     RS2      0.47   0.04  0.93      0.13  33.1       0.07  0.013      0.001
80   P0A8V2     RPOB     0.45   0.07  0.96      0.15  52.6       0.09  0.009      0.001
81   P0AG48     RL21     0.44   0.04  0.9       0.26  26.2       0.15  0.015      0.001
82   P0A7W1     RS5      0.43   0.2   0.9       0.12  26.8       0.08  0.014      0.006
83   P05055     PNP      0.43   0.11  0.75      0.17  9.5        0.11  0.038      0.009
84   P0A6Q3     FABA     0.42   0.06  0.93      0.1   37.5       0.06  0.010      0.001
85   P62399     RL5      0.39   0.15  0.9       0.11  29.4       0.07  0.012      0.004
86   P35340     AHPF     0.37   0.16  0.49      0.14  3.3        0.10  0.115      0.034
87   P0AG44     RL17     0.36   0.07  0.83      0.11  18.2       0.07  0.017      0.003
88   P60438     RL3      0.35   0.06  0.85      0.15  21.5       0.08  0.014      0.002
89   P0AG67     RS1      0.35   0.05  1.02      0.2   7898.4     0.11  0.002      0.000
90   P0A7X3     RS9      0.34   0.07  0.92      0.18  40.2       0.11  0.008      0.002
91   P0A7R1     RL9      0.34   0.03  0.93      0.16  45.2       0.09  0.007      0.001
92   P0AA10     RL13     0.32   0.06  0.95      0.11  62.5       0.06  0.005      0.001
93   P0AG55     RL6      0.31   0.06  0.94      0.08  55.1       0.05  0.006      0.001
94   P0A912     PAL      0.31   0.06  0.74      0.13  12         0.07  0.023      0.004
95   P0A7L0     RL1      0.31   0.09  0.89      0.15  31.8       0.08  0.009      0.002
96   P0A7U7     RS20     0.3    0.04  0.83      0.17  20.8       0.10  0.013      0.002
97   P02359     RS7      0.3    0.01  0.98      0.17  124.7      0.10  0.003      0.000
98   P0ABK5     CYSK     0.29   0.02  0.51      0.08  4.8        0.05  0.062      0.003
99   P60422     RL2      0.28   0.1   0.86      0.13  26.5       0.08  0.010      0.003
100  P0A7R5     RS10     0.26   0.07  0.92      0.17  47.6       0.11  0.005      0.001
101  P0A7W7     RS8      0.26   0.01  0.91      0.07  42.6       0.05  0.006      0.000
102  P60723     RL4      0.26   0.05  0.86      0.13  27.6       0.08  0.008      0.002
103  P0A7J7     RL11     0.21   0.06  0.95      0.27  81         0.16  0.003      0.001
* ratio of newly synthesized proteins made during the pulse with azhal, ** ratio of protein level changes following a switch to an anaerobic environment