Supplementary Discussion 2: Overview of deconvolution

Supplementary Discussion 2: Discussion on the fitting procedure.
The distribution of fluorescence emission intensities of a set of regions of interest
(ROIs, which correspond to labeled sub-cellular components or single antibodies) in an
image shall be denoted $\Phi(x)$, where $\Phi(x)\,dx$ is the probability that an ROI in
the set will exhibit an intensity between $x$ and $x+dx$. Each of the ROIs can contain a
different number of fluorescent antibodies, and $\Phi(x)$ can be written as a weighted
sum of intensity distributions:
$$\Phi(x) = \frac{1}{N} \sum_{c=1}^{M} A_c\, \phi_c(x) \tag{S1}$$
where c ( x)dx is the probability that a ROI with c antibodies will exhibit an intensity
emission between x and x+dx; the coefficient Ac is the actual number of ROIs in the set
with c antibodies; N is the total number of ROIs in the data set, and M is the largest
number of antibodies contained by any ROI in the set.
The actual intensity data of a fully labeled sample will be a set of values that can
be converted into a histogram with elements $y_i$, the number of ROIs whose emission
intensity falls into the $i$th bin. The $\phi_c(x)$ can be expressed as normalized
histograms with elements $f_c(i)$, and these will be referred to as calibration curves.
With these changes we can write
$$y_i = \sum_{c=1}^{M} A_c\, f_c(i), \qquad N = \sum_{i=1}^{L} y_i \tag{S2}$$
where $L$ is the number of bins in the histograms (binned calibration curves). The
calibration curves can be obtained from the single-antibody intensity distribution,
$\phi_1(x)$. To form the calibration curves, the $f_c(i)$ are obtained by multiplying
each intensity in the set of measured single-antibody intensities by $c$ and then forming
a normalized histogram from the resulting set of intensity values (refs. 1, 2).
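The construction of a calibration curve described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the example intensities, and the uniform bin edges are all hypothetical, and intensities that fall outside the supplied bin edges are simply left out of the normalization.

```python
from bisect import bisect_right

def normalized_histogram(values, edges):
    """Fraction of in-range values in each bin [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        k = bisect_right(edges, v) - 1  # index of the bin containing v
        if 0 <= k < len(counts):        # ignore values outside the edges
            counts[k] += 1
    total = sum(counts)
    return [n / total for n in counts] if total else [0.0] * len(counts)

def calibration_curve(single_intensities, c, edges):
    """f_c(i): scale every single-antibody intensity by c, then histogram."""
    scaled = [c * x for x in single_intensities]
    return normalized_histogram(scaled, edges)

# Hypothetical single-antibody intensities and bin edges (illustration only).
singles = [80.0, 100.0, 120.0, 100.0]
edges = [0.0, 50.0, 100.0, 150.0, 200.0, 250.0, 300.0]

f1 = calibration_curve(singles, 1, edges)
f2 = calibration_curve(singles, 2, edges)
```

Each curve sums to one over the bins, so the coefficient multiplying it in the histogram model can be read directly as a number of ROIs.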
The intensity distribution for fully labeled organelles is modeled by
$$y'_i = \sum_{c=1}^{M} |a_c|\, f_c(i), \qquad N' = \sum_{c=1}^{M} |a_c| \tag{S3}$$
where the coefficients $a_c$ are the adjustable parameters that represent an estimate of
the number of ROIs containing $c$ antibodies, $y'_i$ is the model's estimate of $y_i$,
and $N'$ is the estimate of $N$, the number of ROIs in the set. The absolute value of the
coefficients is used to simplify the fitting procedure.
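As an illustration of the model, the sketch below evaluates the model histogram and the estimated ROI count from a set of coefficients. The names `model_histogram` and `estimated_total` and the numerical values are hypothetical; the only feature taken from the text is that the absolute value of each coefficient is used, so a negative trial coefficient gives the same result as its positive counterpart.

```python
def model_histogram(a, f):
    """Model counts per bin: sum over c of |a_c| * f_c(i).

    a[k] is the coefficient for c = k + 1 antibodies and f[k] is the
    matching calibration curve (a list of bin fractions).
    """
    n_bins = len(f[0])
    return [sum(abs(a_c) * f_c[i] for a_c, f_c in zip(a, f))
            for i in range(n_bins)]

def estimated_total(a):
    """Estimated number of ROIs: sum over c of |a_c|."""
    return sum(abs(a_c) for a_c in a)

# Hypothetical calibration curves for c = 1 and c = 2 on four bins.
f = [[0.0, 0.25, 0.75, 0.0],
     [0.0, 0.0, 0.25, 0.75]]
a = [8.0, -4.0]  # the sign is discarded by the absolute value

y_model = model_histogram(a, f)
n_model = estimated_total(a)
```

Because every calibration curve is normalized, the model counts summed over all bins equal the estimated number of ROIs.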
Noise in the experimentally measured calibration function can cause the best-fit
results to depend on the particular choice of bin boundaries (ref. 2). This dependence on
the size and location of the histogram bin boundaries can be overcome by fitting the data
using multiple sets of logarithmic bins. Several sets of bin boundaries are created (as
described below). Basis and data histograms are created for each set of bin boundaries,
and a global fit is performed in which the reduced chi-squared ($\chi^2$; ref. 3)
includes the residuals from all of the sets of histograms for a given set of the
adjustable parameters. This global $\chi^2$ is minimized to obtain the best-fit protein
numbers for the data.
To generate a set of logarithmic bins, an initial bin width ($w_0$) and repeat length
($N_R$) are chosen. The first $2 \times N_R$ bins (starting from zero intensity) have
width $w_0$. The next $N_R$ bins have width $2 \times w_0$. The next $N_R$ bins have
width $4 \times w_0$, and so on. For example, if $w_0 = 50$ and $N_R = 16$, then bins 1
through 32 would each have a width of 50 and span the range of intensities from 0 to
1600. Bins 33 through 48 would each have a width of 100 and span the range from 1601 to
3200. Bins 49 through 64 would each have a width of 200 and span the range from 3201 to
6400, and so on. This results in larger bins at the largest intensities, which reduces
the number of empty bins at the larger intensities while keeping the bins at lower
intensities narrow enough to describe the shape of the peak of the intensity
distribution.
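The worked example above can be reproduced with a short sketch. The function name `log_bin_edges` and its `max_intensity` argument are hypothetical; the sketch returns bin edges rather than integer ranges, so a bin is taken to cover the half-open interval from one edge to the next.

```python
def log_bin_edges(w0, n_r, max_intensity):
    """Edges of logarithmic bins covering 0 up to at least max_intensity.

    The first 2 * n_r bins have width w0; each following block of n_r
    bins has twice the width of the block before it.
    """
    edges = [0.0]
    width = float(w0)
    block = 2 * n_r              # the first block is twice as long
    while edges[-1] < max_intensity:
        for _ in range(block):
            edges.append(edges[-1] + width)
            if edges[-1] >= max_intensity:
                break            # stop once the range is covered
        width *= 2               # the next block uses doubled width
        block = n_r
    return edges

edges = log_bin_edges(50, 16, 6400)
```

With $w_0 = 50$ and $N_R = 16$, the 33rd, 49th, and 65th edges fall at 1600, 3200, and 6400, matching the ranges quoted in the text.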
To create different sets of logarithmic bins, $N_B$ different values of $w_0$ are
chosen and the boundaries for the $N_B$ different histograms are calculated. For each
value of $w_0$, histograms of the ROI intensity data and the basis vectors are made using
the corresponding logarithmic bins. Each of the $N_B$ different data sets has a different
value of $w_0$. The point is to ensure that few, if any, of the bin boundaries in one data
set are the same as the bin boundaries in any other set. For the work here we set
$N_R = 16$, though other choices will work. An example of seven different values of $w_0$
that have been used is (40, 43, 46, 49, 52, 55, 58).
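To see how rarely boundaries coincide for the example widths, one can build the bin sets and intersect their boundaries. The sketch below is illustrative only: the helper names and the choice of 64 bins per set are assumptions, not values from the text.

```python
def log_bin_edges(w0, n_r, n_bins):
    """Edges for n_bins logarithmic bins: 2 * n_r bins of width w0,
    then n_r bins each of width 2 * w0, 4 * w0, and so on."""
    edges = [0]
    for k in range(n_bins):
        doublings = 0 if k < 2 * n_r else (k - 2 * n_r) // n_r + 1
        edges.append(edges[-1] + w0 * 2 ** doublings)
    return edges

def shared_boundaries(edges_a, edges_b):
    """Boundaries common to both sets, excluding the shared origin."""
    return sorted((set(edges_a) & set(edges_b)) - {0})

W0_VALUES = (40, 43, 46, 49, 52, 55, 58)  # example widths from the text
N_R = 16
N_BINS = 64                               # assumed number of bins per set

bin_sets = {w0: log_bin_edges(w0, N_R, N_BINS) for w0 in W0_VALUES}

# Collect the shared boundaries for every pair of bin sets.
overlaps = {
    (wa, wb): shared_boundaries(bin_sets[wa], bin_sets[wb])
    for i, wa in enumerate(W0_VALUES)
    for wb in W0_VALUES[i + 1:]
}
```

For these widths some pairs share no boundary at all (40 and 43, for instance), while others share one at a common multiple of the two widths (920 for 40 and 46), which is consistent with the stated aim of few, if any, coincident boundaries.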
For each of the $N_B$ values of $w_0$ in a given bin set, the bin boundaries are
calculated and the histograms are formed for both the data and the basis vectors. There
will be $N_B$ different data vectors and $N_B$ different sets of basis vectors. The number
of bins in the $B$th histogram is denoted by $L_B$ and is set large enough to include all
the non-zero histogram bins of both the data and the calibration functions used.
The equation for $\chi^2$ is
$$\chi^2 = \frac{\displaystyle \sum_{B=1}^{N_B} \sum_{i=1}^{L_B} \frac{\left(y_{iB} - y'_{iB}\right)^2}{y_{iB}} + \lambda \left(N - N'\right)^2}{\displaystyle \left(\sum_{B=1}^{N_B} L_B\right) - M} \tag{S4}$$
The term with $\lambda$ is added to constrain the fit to have the correct number of
spots. Typical values for $\lambda$ are $10^3$, $10^2$, and $10^1$. The smallest value of
$\lambda$ required to constrain the best-fit value of the number of ROIs ($N'$) to equal
the actual number of spots ($N$) is used. A typical value of $N_B$ is seven. The fitting
program searches for a single set of $a_c$ that provides a global best fit to all $N_B$
data sets by minimizing $\chi^2$. For our work we used the minimization function AMEBSA
(refs. 1, 4).
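A minimal sketch of the objective function is given below to make the structure of the global fit concrete. It is not the authors' program and does not implement AMEBSA; it only evaluates the quantity that a minimizer would be asked to reduce. Several points are assumptions made for illustration: the function name and argument layout, the penalty entering as a weight (`lam`) times the squared difference between the actual and estimated ROI counts, and the skipping of bins with zero data counts to avoid dividing by zero.

```python
def global_chi2(a, data_sets, basis_sets, n_total, lam):
    """Reduced chi-squared summed over all bin sets.

    a                   -- M trial coefficients (absolute values are used)
    data_sets[B][i]     -- data counts in bin i of bin set B
    basis_sets[B][k][i] -- calibration curve for c = k + 1 in bin set B
    n_total             -- actual number of ROIs
    lam                 -- weight of the ROI-count penalty (assumed form)
    """
    weights = [abs(a_c) for a_c in a]
    n_model = sum(weights)
    residual_sum = 0.0
    n_bins_total = 0
    for y, basis in zip(data_sets, basis_sets):
        n_bins_total += len(y)
        for i, y_i in enumerate(y):
            if y_i == 0:
                continue  # assumption: empty data bins are skipped
            y_model = sum(w * f_c[i] for w, f_c in zip(weights, basis))
            residual_sum += (y_i - y_model) ** 2 / y_i
    penalty = lam * (n_total - n_model) ** 2
    return (residual_sum + penalty) / (n_bins_total - len(a))

# Hypothetical single bin set with three bins and two calibration curves.
data = [[2.0, 7.0, 3.0]]
basis = [[[0.25, 0.75, 0.0],
          [0.0, 0.25, 0.75]]]

exact = global_chi2([8.0, 4.0], data, basis, n_total=12, lam=100.0)
off = global_chi2([8.0, 3.0], data, basis, n_total=12, lam=100.0)
```

In this toy case the first coefficient set reproduces the data exactly and gives zero, while the second misses one ROI and is penalized both through the residuals and through the count term. Any general-purpose minimizer could be applied to such a function of the coefficients.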
References
1. Mutch, S. A. et al. Deconvolving single-molecule intensity distributions for
quantitative microscopy measurements. Biophys J 92, 2926-2943 (2007).
2. Mutch, S. A. et al. Protein quantification at the single vesicle level reveals a
subset of synaptic vesicle proteins are trafficked with high precision. J Neurosci 31,
1461-1470 (2011).
3. Bevington, P. R. Data reduction and error analysis for the physical sciences
(McGraw-Hill, New York, 1969).
4. Press, W. H. et al. Numerical recipes in Fortran 77 (Cambridge University Press,
New York, 2001).