Gaussian profile estimation in two dimensions

Nathan Hagen(1,*) and Eustace L. Dereniak(2)
(1) Fitzpatrick Institute for Photonics, Duke University, Durham, North Carolina 27708, USA
(2) College of Optical Sciences, University of Arizona, Tucson, Arizona 85721, USA
*Corresponding author: nhagen@optics.arizona.edu

Received 31 January 2008; revised 7 June 2008; accepted 20 October 2008; posted 5 November 2008 (Doc. ID 92236); published 15 December 2008

We extend recent results for estimating the parameters of a one-dimensional Gaussian profile to two-dimensional profiles, deriving the exact covariance matrix of the estimated parameters. While the exact form is easy to compute, we provide a set of close approximations that allow the covariance to take on a simple analytic form. This not only provides new insight into the behavior of the estimation parameters but also lays a foundation for clarifying previously published work. We also show how to calculate the parameter variances for the case of truncated sampling, where the profile lies near the edge of the array detector. Finally, we calculate expressions for the bias in the classical formulation of the problem and provide an approach for its removal. This allows us to show how the bias affects the problem of choosing an optimal pixel size for minimizing parameter variances. © 2008 Optical Society of America

OCIS codes: 000.5490, 100.2960, 300.3700.

1. Introduction

In a previous paper [1], we outlined an approach for using maximum-likelihood estimation (MLE) of one-dimensional (1D) Gaussian profile parameters from data corrupted by noise. Here we extend the method to the two-dimensional (2D) case (estimating a 2D Gaussian profile from an image), incorporating added parameters while retaining a fast algorithm. This estimation procedure can be useful in fields such as Gaussian beam characterization [2], astrometry [3], wavefront sensing [4], bioimaging [5,6], and calibration of computational sensors such as computed-tomography instruments [7].

The initial approach we use here makes no assumptions on the sampling of the profile, uniform or nonuniform. Bad samples, such as those from inoperative pixels, are easily taken care of by simply deleting them from the data set. The Gaussian function being measured need not even have most of its volume in the sampled region (i.e., its peak may be located off the edge of the detector array), though the nonlinear optimization procedure may have difficulty in locating the global maximum; a good initial guess is required in such a case.

The estimation procedure we use is the same as that for the 1D problem, namely:

1. Assuming Gaussian- or Poisson-distributed additive noise, we first form the log-likelihood function $\ell = \ln \mathrm{pr}(\mathbf{g}|\boldsymbol\theta)$ and then calculate its gradient (the "score") $\nabla\ell$ and its Hessian matrix $\mathbf{H}$.

2. These three functions are used in a nonlinear optimization routine (such as Newton iteration) to solve for the parameter set $\boldsymbol\theta$ that maximizes the likelihood, via
\[ \boldsymbol\theta^{(k+1)} = \boldsymbol\theta^{(k)} - \bigl(\mathbf{H}^{(k)}\bigr)^{-1} \nabla\ell^{(k)}, \]
where a superscript $(k)$ indicates the iteration index and $\mathbf{H}^{(k)}$ is the Hessian matrix of $\ell$ evaluated at $\boldsymbol\theta^{(k)}$. (A schematic implementation of this iteration is sketched at the end of this section.)

3. To determine the accuracy of the estimates, we use the Cramér–Rao bound to calculate the covariance matrix $\mathbf{K}$ of the parameter estimators, obtained from the exact Fisher information matrix $\mathbf{F}$ by $\mathbf{K} = \mathbf{F}^{-1}$.

4. Alternatively, an analytic approximation to $\mathbf{K}$ can be obtained by first approximating $\mathbf{F}$ and then performing an analytic inverse on the resulting simplified matrix.

5. Since each model is actually biased, due to the conventional approximation of the pixel response functions as $\delta$ functions, we also provide an analytic expression for the bias under the rect-sampling model. Bias correction can then be performed by using the ML estimate to calculate the bias, which is then subtracted from the raw data for a second iteration of the estimation algorithm.

For each of the 2D Gaussian models discussed in this paper (the symmetric Gaussian with circular cross section, the asymmetric separable Gaussian with elliptical cross section, and the general case of an elliptical cross section unaligned to the coordinate axes), we provide the expressions for the above steps.
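As an illustration of step 2, the Newton update can be written in a few lines. The following Python sketch is our own illustration (it is not the authors' published IDL code); it assumes the caller supplies score and hessian functions that return the gradient and Hessian of the log-likelihood at a given parameter vector, and a practical implementation would add safeguards such as step damping and a check that the Hessian is well conditioned.

```python
import numpy as np

def newton_mle(theta0, score, hessian, n_iter=50, tol=1e-9):
    """Newton iteration for the ML estimate:
    theta_(k+1) = theta_(k) - H(theta_k)^(-1) * grad_l(theta_k)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        grad = score(theta)               # gradient ("score") of the log-likelihood
        H = hessian(theta)                # Hessian of the log-likelihood
        step = np.linalg.solve(H, grad)   # solves H @ step = grad
        theta = theta - step
        if np.max(np.abs(step)) < tol:    # stop once the update is negligible
            break
    return theta
```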
2. Symmetric 2D Gaussian Profile Model

The simplest type of 2D Gaussian profile is a rotationally symmetric object function, which we model as
\[ f(\mathbf{r}) = A\, e^{-(\mathbf{r} - \bar{\mathbf{r}})^2 / 2w^2}, \]
where $A$ is the peak amplitude, $\bar{\mathbf{r}} = (\bar{x}, \bar{y})$ is the position of the peak (the "center"), and $w$ is the Gaussian width. The object is a continuous function, sampled by the detection process to produce a discrete data vector $\mathbf{g}$, where
\[ \bar{g}_m = \delta_x \delta_y Q_m \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} f(x,y)\, \delta(x - x_m)\, \delta(y - y_m)\, dx\, dy = \delta_x \delta_y Q_m A\, e^{-(\mathbf{r}_m - \bar{\mathbf{r}})^2 / 2w^2}. \tag{1} \]
The 2D image data are thus mapped into a 1D vector, where $\delta(\cdot)$ is the Dirac delta function, $m$ is the pixel index ($1 \le m \le M$), $\mathbf{r}_m$ gives the abscissa position vector for the center of pixel $m$, and $\delta_x \delta_y$ gives the $x$ and $y$ dimensions of the pixels. An example $\mathbf{g}$ is illustrated in Fig. 1. In Eq. (1), the quantity $Q_m$ is the gain for pixel $m$, giving the number of digital counts per detected photoelectron. (Note that detector gain is often quoted as photoelectrons per digital count, the inverse of the $Q_m$ used here.) In general, $Q_m$ can vary significantly with position on the detector array.

Fig. 1. Example noiseless symmetric 2D Gaussian profile, sampled on a pixel grid.

We model the measurement result $\mathbf{g}$ as the sum of a noiseless discrete data vector and a zero-mean noise vector, $\mathbf{g} = \bar{\mathbf{g}} + \mathbf{n}$, where the noise vector $\mathbf{n}$ is a Gaussian-distributed random variable such that
\[ \mathrm{pr}(g_m) = \frac{1}{\sqrt{2\pi\sigma_m^2}}\, e^{-(g_m - \bar{g}_m)^2 / 2\sigma_m^2}, \tag{2} \]
where $\sigma_m^2$ is the variance of the noise at pixel $m$, given in units of digital counts. The likelihood of a given parameter set $\boldsymbol\theta = (A, \bar{x}, \bar{y}, w)$ producing a measured image $\mathbf{g}$ is defined as $L(\boldsymbol\theta|\mathbf{g}) = \mathrm{pr}(\mathbf{g}|\boldsymbol\theta)$. Taking the logarithm, $\ell = \log L$, it is easy to derive (see Eq. 6 in [1]) that the gradient has the form
\[ \frac{\partial\ell}{\partial\theta_i} = \sum_m \frac{1}{\sigma_m^2}\, (g_m - \bar{g}_m)\, \frac{\partial\bar{g}_m}{\partial\theta_i}. \tag{3} \]
So far, we have assumed a Gaussian-distributed noise model. If we rederive the likelihood from a Poisson-noise model, we find that we again obtain Eq. (3), but with $\sigma_m^2$ replaced by $\bar{g}_m Q_m$. Thus, although we continue to work with a likelihood function derived from a Gaussian-noise model below, it is left understood that the same results can be achieved under Poisson noise by setting $\sigma_m^2 = \bar{g}_m Q_m$.

For the symmetric Gaussian profile, the noise model gives
\[ \frac{\partial\ell}{\partial A} = \sum_m \gamma_m, \qquad \frac{\partial\ell}{\partial\bar{x}} = \frac{A}{w^2}\sum_m \gamma_m \xi_m, \qquad \frac{\partial\ell}{\partial\bar{y}} = \frac{A}{w^2}\sum_m \gamma_m \eta_m, \qquad \frac{\partial\ell}{\partial w} = \frac{A}{w^3}\sum_m \gamma_m \rho_m^2, \]
where we define
\[ \xi_m = x_m - \bar{x}, \qquad \eta_m = y_m - \bar{y}, \qquad \rho_m = \sqrt{\xi_m^2 + \eta_m^2}, \]
\[ \gamma_m = \frac{\delta_x\delta_y}{\sigma_m^2}\,(g_m - \bar{g}_m)\, Q_m E_m, \qquad E_m = e^{-(\xi_m^2 + \eta_m^2)/2w^2}. \]
All of the sums are taken over the pixel index, $1 \le m \le M$.
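For concreteness, the model of Eq. (1) and the score components above can be evaluated as follows. This Python sketch is illustrative only: it assumes uniform gain Q, flattened arrays of pixel-center coordinates, and a per-pixel noise variance sigma2 (scalar or array); the function names are ours, not the paper's.

```python
import numpy as np

def symmetric_model(theta, xm, ym, dx, dy, Q):
    """Noiseless pixel means gbar_m of Eq. (1) for theta = (A, xbar, ybar, w)."""
    A, xbar, ybar, w = theta
    xi, eta = xm - xbar, ym - ybar
    E = np.exp(-(xi**2 + eta**2) / (2.0 * w**2))
    return dx * dy * Q * A * E

def symmetric_score(theta, g, xm, ym, dx, dy, Q, sigma2):
    """Score (gradient of the log-likelihood) for the symmetric profile."""
    A, xbar, ybar, w = theta
    xi, eta = xm - xbar, ym - ybar
    rho2 = xi**2 + eta**2
    E = np.exp(-rho2 / (2.0 * w**2))
    gbar = dx * dy * Q * A * E
    gamma = (dx * dy / sigma2) * (g - gbar) * Q * E        # gamma_m as defined in the text
    return np.array([gamma.sum(),                          # dl/dA
                     (A / w**2) * (gamma * xi).sum(),      # dl/dxbar
                     (A / w**2) * (gamma * eta).sum(),     # dl/dybar
                     (A / w**3) * (gamma * rho2).sum()])   # dl/dw
```

Together with the Hessian elements given below, these two routines are enough to drive the Newton iteration sketched in Section 1.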
The Hessian matrix is
\[ \mathbf{H} = \begin{pmatrix}
\frac{\partial^2\ell}{\partial A^2} & \frac{\partial^2\ell}{\partial A\,\partial\bar{x}} & \frac{\partial^2\ell}{\partial A\,\partial\bar{y}} & \frac{\partial^2\ell}{\partial A\,\partial w} \\
\frac{\partial^2\ell}{\partial\bar{x}\,\partial A} & \frac{\partial^2\ell}{\partial\bar{x}^2} & \frac{\partial^2\ell}{\partial\bar{x}\,\partial\bar{y}} & \frac{\partial^2\ell}{\partial\bar{x}\,\partial w} \\
\frac{\partial^2\ell}{\partial\bar{y}\,\partial A} & \frac{\partial^2\ell}{\partial\bar{y}\,\partial\bar{x}} & \frac{\partial^2\ell}{\partial\bar{y}^2} & \frac{\partial^2\ell}{\partial\bar{y}\,\partial w} \\
\frac{\partial^2\ell}{\partial w\,\partial A} & \frac{\partial^2\ell}{\partial w\,\partial\bar{x}} & \frac{\partial^2\ell}{\partial w\,\partial\bar{y}} & \frac{\partial^2\ell}{\partial w^2}
\end{pmatrix}. \]
$\mathbf{H}$ is symmetric, so it is only necessary to calculate the upper triangular portion. Performing the partial derivative calculations gives, for the matrix elements,
\[ H_{11} = -\delta_x^2\delta_y^2 \sum_m \frac{Q_m^2 E_m^2}{\sigma_m^2}, \qquad H_{12} = \frac{1}{w^2}\sum_m \Gamma_m \xi_m, \qquad H_{13} = \frac{1}{w^2}\sum_m \Gamma_m \eta_m, \qquad H_{14} = \frac{1}{w^3}\sum_m \Gamma_m \rho_m^2, \]
\[ H_{22} = \frac{A}{w^4}\sum_m \bigl(\Gamma_m \xi_m^2 - \gamma_m w^2\bigr), \qquad H_{23} = \frac{A}{w^4}\sum_m \Gamma_m \xi_m \eta_m, \qquad H_{24} = \frac{A}{w^5}\sum_m \bigl(\Gamma_m \xi_m \rho_m^2 - 2\gamma_m \xi_m w^2\bigr), \]
\[ H_{33} = \frac{A}{w^4}\sum_m \bigl(\Gamma_m \eta_m^2 - \gamma_m w^2\bigr), \qquad H_{34} = \frac{A}{w^5}\sum_m \bigl(\Gamma_m \eta_m \rho_m^2 - 2\gamma_m \eta_m w^2\bigr), \qquad H_{44} = \frac{A}{w^6}\sum_m \bigl(\Gamma_m \rho_m^4 - 3\gamma_m \rho_m^2 w^2\bigr), \]
where $\Gamma_m = \frac{\delta_x\delta_y}{\sigma_m^2}(g_m - 2\bar{g}_m) Q_m E_m$ and $\gamma_m$ is as defined above.

The Fisher information matrix is obtained as the negative of the elementwise expectation of the Hessian matrix [8], with the expectation taken over the data:
\[ F_{ij} = -\mathrm{E}\{H_{ij}\} = -\int \frac{\partial^2\ell}{\partial\theta_i\,\partial\theta_j}\, \mathrm{pr}(\mathbf{g}|\boldsymbol\theta)\, d^M g, \tag{4} \]
producing
\[ F_{11} = \sum_m R_m, \qquad F_{12} = \frac{A}{w^2}\sum_m R_m \xi_m, \qquad F_{13} = \frac{A}{w^2}\sum_m R_m \eta_m, \qquad F_{14} = \frac{A}{w^3}\sum_m R_m \rho_m^2, \]
\[ F_{22} = \frac{A^2}{w^4}\sum_m R_m \xi_m^2, \qquad F_{23} = \frac{A^2}{w^4}\sum_m R_m \xi_m \eta_m, \qquad F_{24} = \frac{A^2}{w^5}\sum_m R_m \rho_m^2 \xi_m, \]
\[ F_{33} = \frac{A^2}{w^4}\sum_m R_m \eta_m^2, \qquad F_{34} = \frac{A^2}{w^5}\sum_m R_m \rho_m^2 \eta_m, \qquad F_{44} = \frac{A^2}{w^6}\sum_m R_m \rho_m^4, \tag{5} \]
for $R_m = \delta_x^2\delta_y^2 Q_m^2 E_m^2/\sigma_m^2$. Note that the notation $d^M g$ expresses the differential for an $M$-dimensional integral over all data elements $g_m$. When the Cramér–Rao bound is met (asymptotically for a large number of measurements $M$), the covariance matrix $\mathbf{K}$ is given by the inverse of the above matrix, which can be obtained either analytically by Cramer's rule or numerically. The analytic expressions for $\mathbf{K}$ are quite involved and so provide limited insight into the algorithm. To gain a better understanding of the relationship between the estimator variances and the object parameters, we introduce approximations that simplify the elements of $\mathbf{F}$ prior to taking its inverse. These approximations are:

1. Flat noise: $\sigma_m = \sigma$, i.e., all pixels share the same noise variance [9].
2. Uniform gain: $Q_m = Q$.
3. Uniform sampling: the spacing $(\delta_x, \delta_y)$ between sample locations $(x_m, y_m)$ is constant.
4. Complete sampling: the data region $x_{\min} \le x \le x_{\max}$, $y_{\min} \le y \le y_{\max}$ is sufficient that the Gaussian $f(\mathbf{r})$ is approximately zero outside of the sampled region.
5. The profile is well sampled: since each of the $F_{ij}$ is a sum of sampled Gaussian functions, we can approximate the sums with integrals and then replace the integrals with known analytic results.

With these approximations, and noting that $\delta_x = \delta\xi$, $\delta_y = \delta\eta$, we can write
\[ \sum_m E_m^2 = \frac{1}{\delta_x\delta_y}\sum_m \delta\xi\,\delta\eta\, e^{-(\xi_m^2 + \eta_m^2)/w^2} \approx \frac{1}{\delta_x\delta_y}\int_{-\infty}^{\infty} e^{-\xi^2/w^2}\,d\xi \int_{-\infty}^{\infty} e^{-\eta^2/w^2}\,d\eta = \frac{\pi w^2}{\delta_x\delta_y}, \]
and, similarly,
\[ \sum_m E_m^2\,\xi_m \approx 0, \qquad \sum_m E_m^2\,\xi_m\eta_m \approx 0, \qquad \sum_m E_m^2\,\xi_m^2 \approx \frac{\pi w^4}{2\delta_x\delta_y}, \qquad \sum_m E_m^2\,\rho_m^2 \approx \frac{\pi w^4}{\delta_x\delta_y}, \qquad \sum_m E_m^2\,\rho_m^4 \approx \frac{2\pi w^6}{\delta_x\delta_y}. \tag{6} \]
Constructing an approximate form of $\mathbf{F}$ from these expressions, we can analytically calculate its inverse to give the parameter covariance matrix $\mathbf{K}$:
\[ \mathbf{K}_{\mathrm{Flat}} \approx \frac{\sigma^2}{\pi\delta_x\delta_y Q^2} \begin{pmatrix}
\frac{2}{w^2} & 0 & 0 & \frac{-1}{Aw} \\
0 & \frac{2}{A^2} & 0 & 0 \\
0 & 0 & \frac{2}{A^2} & 0 \\
\frac{-1}{Aw} & 0 & 0 & \frac{1}{A^2}
\end{pmatrix}. \tag{7} \]
Alternatively, if we assume that the noise variance is not the same at all pixels (i.e., not flat) but rather is determined from the Poisson distribution of the mean signal $\bar{g}_m$ at each pixel, then we can substitute $\sigma_m^2 = \bar{g}_m Q_m = \delta_x\delta_y A Q^2 E_m$ and, recalculating $\mathbf{F}$, obtain a modified covariance matrix [10]:
\[ \mathbf{K}_{\mathrm{Poisson}} \approx \frac{1}{2\pi} \begin{pmatrix}
\frac{A}{w^2} & 0 & 0 & \frac{-1}{2w} \\
0 & \frac{1}{A} & 0 & 0 \\
0 & 0 & \frac{1}{A} & 0 \\
\frac{-1}{2w} & 0 & 0 & \frac{1}{4A}
\end{pmatrix}. \tag{8} \]
The parameter variances can thus be written out explicitly as
\[ \mathrm{var}(\hat{A}) \approx \begin{cases} 2\beta/w^2 & (\mathrm{Flat}) \\ A/(2\pi w^2) & (\mathrm{Poisson}) \end{cases}, \qquad
\mathrm{var}(\hat{\bar{x}}) = \mathrm{var}(\hat{\bar{y}}) \approx \begin{cases} 2\beta/A^2 & (\mathrm{Flat}) \\ 1/(2\pi A) & (\mathrm{Poisson}) \end{cases}, \qquad
\mathrm{var}(\hat{w}) \approx \begin{cases} \beta/A^2 & (\mathrm{Flat}) \\ 1/(8\pi A) & (\mathrm{Poisson}) \end{cases}, \tag{9} \]
where $\beta = \sigma^2/(\pi\delta_x\delta_y Q^2)$.

To calculate the volume $U$ under the Gaussian spot, we first form its estimator. Since
\[ U = A \int\!\!\int e^{-(\xi^2 + \eta^2)/2w^2}\, d\xi\, d\eta = 2\pi A w^2, \]
the estimator is $\hat{U} = 2\pi\hat{A}\hat{w}^2$. Using the elements of the covariance matrix to estimate the variance of $\hat{U}$ under the flat-noise and Poisson-noise approximations ([8], pp. 45-46),
\[ \mathrm{var}(\hat{U}) \approx \left.\frac{\partial U}{\partial\boldsymbol\theta}^{\!\mathsf{T}} \mathbf{K}\,\frac{\partial U}{\partial\boldsymbol\theta}\right|_{\hat{\boldsymbol\theta}} = \begin{cases} 8\pi^2\beta w^2 & (\mathrm{Flat}) \\ 2\pi A w^2 & (\mathrm{Poisson}) \end{cases}. \]
Note, again, that these simple expressions for the parameter variances hold only when the data are completely, uniformly, and well sampled.
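As a numerical cross-check of Eqs. (5), (7), and (9), the exact Fisher matrix can be summed directly and inverted, and its diagonal compared with the flat-noise approximations. The Python sketch below is our own illustration; the grid size and the parameter values are arbitrary choices, and it assumes uniform gain and flat noise.

```python
import numpy as np

def fisher_symmetric(theta, xm, ym, dx, dy, Q, sigma2):
    """Exact Fisher information matrix of Eq. (5) for theta = (A, xbar, ybar, w)."""
    A, xbar, ybar, w = theta
    xi, eta = xm - xbar, ym - ybar
    rho2 = xi**2 + eta**2
    E = np.exp(-rho2 / (2.0 * w**2))
    R = (dx * dy * Q * E)**2 / sigma2            # R_m = dx^2 dy^2 Q^2 E_m^2 / sigma_m^2
    F = np.empty((4, 4))
    F[0, 0] = R.sum()
    F[0, 1] = F[1, 0] = (A / w**2) * (R * xi).sum()
    F[0, 2] = F[2, 0] = (A / w**2) * (R * eta).sum()
    F[0, 3] = F[3, 0] = (A / w**3) * (R * rho2).sum()
    F[1, 1] = (A**2 / w**4) * (R * xi**2).sum()
    F[1, 2] = F[2, 1] = (A**2 / w**4) * (R * xi * eta).sum()
    F[1, 3] = F[3, 1] = (A**2 / w**5) * (R * rho2 * xi).sum()
    F[2, 2] = (A**2 / w**4) * (R * eta**2).sum()
    F[2, 3] = F[3, 2] = (A**2 / w**5) * (R * rho2 * eta).sum()
    F[3, 3] = (A**2 / w**6) * (R * rho2**2).sum()
    return F

# Compare the exact Cramer-Rao variances with the flat-noise approximation of Eq. (9).
A, w, dx, dy, Q, sigma = 100.0, 3.0, 1.0, 1.0, 1.0, 5.0
xm, ym = np.meshgrid(np.arange(-15.5, 16.0), np.arange(-15.5, 16.0))
F = fisher_symmetric((A, 0.0, 0.0, w), xm.ravel(), ym.ravel(), dx, dy, Q, sigma**2)
K_exact = np.linalg.inv(F)
beta = sigma**2 / (np.pi * dx * dy * Q**2)
K_approx = [2 * beta / w**2, 2 * beta / A**2, 2 * beta / A**2, beta / A**2]
print(np.diag(K_exact))   # exact variances of (A, xbar, ybar, w)
print(K_approx)           # approximate variances from Eq. (9)
```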
3. Separable 2D Gaussian Profile Model

For a separable Gaussian model function with nonequal widths $w_x$ and $w_y$ (Fig. 2), we have, using the same nomenclature as Section 2 above,
\[ f(\mathbf{r}) = A\, e^{-(x - \bar{x})^2/2w_x^2}\, e^{-(y - \bar{y})^2/2w_y^2}, \]
where $\bar{g}_m = \delta_x\delta_y Q_m f(x_m, y_m)$, $g_m = \bar{g}_m + n(x_m, y_m)$, and the vector of parameters is now $\boldsymbol\theta = (A, \bar{x}, \bar{y}, w_x, w_y)$.

Fig. 2. Example noiseless asymmetric 2D Gaussian profile, sampled on a pixel grid.

Forming the log-likelihood function $\ell$, its gradient $\nabla\ell$, and the Hessian matrix $\mathbf{H}$ for this model under additive Gaussian noise, we obtain
\[ [\nabla\ell]_1 = \frac{\partial\ell}{\partial A} = \sum_m \gamma_m, \qquad [\nabla\ell]_2 = \frac{\partial\ell}{\partial\bar{x}} = \frac{A}{w_x^2}\sum_m \gamma_m \xi_m, \qquad [\nabla\ell]_3 = \frac{\partial\ell}{\partial\bar{y}} = \frac{A}{w_y^2}\sum_m \gamma_m \eta_m, \]
\[ [\nabla\ell]_4 = \frac{\partial\ell}{\partial w_x} = \frac{A}{w_x^3}\sum_m \gamma_m \xi_m^2, \qquad [\nabla\ell]_5 = \frac{\partial\ell}{\partial w_y} = \frac{A}{w_y^3}\sum_m \gamma_m \eta_m^2, \]
where $E_m = e^{-\xi_m^2/2w_x^2}\, e^{-\eta_m^2/2w_y^2}$, $\xi_m = x_m - \bar{x}$, $\eta_m = y_m - \bar{y}$, and $\gamma_m = \frac{\delta_x\delta_y}{\sigma_m^2}(g_m - \bar{g}_m) Q_m E_m$. The Hessian elements are
\[ H_{11} = -\delta_x^2\delta_y^2 \sum_m \frac{Q_m^2 E_m^2}{\sigma_m^2}, \qquad H_{12} = \frac{1}{w_x^2}\sum_m \Gamma_m \xi_m, \qquad H_{13} = \frac{1}{w_y^2}\sum_m \Gamma_m \eta_m, \]
\[ H_{14} = \frac{1}{w_x^3}\sum_m \Gamma_m \xi_m^2, \qquad H_{15} = \frac{1}{w_y^3}\sum_m \Gamma_m \eta_m^2, \qquad H_{22} = \frac{A}{w_x^4}\sum_m \bigl(\Gamma_m \xi_m^2 - \gamma_m w_x^2\bigr), \]
\[ H_{23} = \frac{A}{w_x^2 w_y^2}\sum_m \Gamma_m \xi_m \eta_m, \qquad H_{24} = \frac{A}{w_x^5}\sum_m \bigl(\Gamma_m \xi_m^3 - 2\gamma_m \xi_m w_x^2\bigr), \qquad H_{25} = \frac{A}{w_x^2 w_y^3}\sum_m \Gamma_m \xi_m \eta_m^2, \]
\[ H_{33} = \frac{A}{w_y^4}\sum_m \bigl(\Gamma_m \eta_m^2 - \gamma_m w_y^2\bigr), \qquad H_{34} = \frac{A}{w_x^3 w_y^2}\sum_m \Gamma_m \xi_m^2 \eta_m, \qquad H_{35} = \frac{A}{w_y^5}\sum_m \bigl(\Gamma_m \eta_m^3 - 2\gamma_m \eta_m w_y^2\bigr), \]
\[ H_{44} = \frac{A}{w_x^6}\sum_m \bigl(\Gamma_m \xi_m^4 - 3\gamma_m \xi_m^2 w_x^2\bigr), \qquad H_{45} = \frac{A}{w_x^3 w_y^3}\sum_m \Gamma_m \xi_m^2 \eta_m^2, \qquad H_{55} = \frac{A}{w_y^6}\sum_m \bigl(\Gamma_m \eta_m^4 - 3\gamma_m \eta_m^2 w_y^2\bigr), \]
where $\Gamma_m = \frac{\delta_x\delta_y}{\sigma_m^2}(g_m - 2\bar{g}_m) Q_m E_m$. Since $\mathrm{E}\{\Gamma_m\} = -\frac{\delta_x^2\delta_y^2}{\sigma_m^2} A Q_m^2 E_m^2$ and $\mathrm{E}\{\gamma_m\} = 0$, the Fisher information matrix $\mathbf{F}$ here is
\[ F_{11} = \sum_m R_m, \qquad F_{12} = \frac{A}{w_x^2}\sum_m R_m \xi_m, \qquad F_{13} = \frac{A}{w_y^2}\sum_m R_m \eta_m, \qquad F_{14} = \frac{A}{w_x^3}\sum_m R_m \xi_m^2, \qquad F_{15} = \frac{A}{w_y^3}\sum_m R_m \eta_m^2, \]
\[ F_{22} = \frac{A^2}{w_x^4}\sum_m R_m \xi_m^2, \qquad F_{23} = \frac{A^2}{w_x^2 w_y^2}\sum_m R_m \xi_m \eta_m, \qquad F_{24} = \frac{A^2}{w_x^5}\sum_m R_m \xi_m^3, \qquad F_{25} = \frac{A^2}{w_x^2 w_y^3}\sum_m R_m \xi_m \eta_m^2, \]
\[ F_{33} = \frac{A^2}{w_y^4}\sum_m R_m \eta_m^2, \qquad F_{34} = \frac{A^2}{w_x^3 w_y^2}\sum_m R_m \xi_m^2 \eta_m, \qquad F_{35} = \frac{A^2}{w_y^5}\sum_m R_m \eta_m^3, \]
\[ F_{44} = \frac{A^2}{w_x^6}\sum_m R_m \xi_m^4, \qquad F_{45} = \frac{A^2}{w_x^3 w_y^3}\sum_m R_m \xi_m^2 \eta_m^2, \qquad F_{55} = \frac{A^2}{w_y^6}\sum_m R_m \eta_m^4, \]
with $R_m = \delta_x^2\delta_y^2 Q_m^2 E_m^2/\sigma_m^2$. This is the exact form of $\mathbf{F}$. Introducing the same approximations as used in Section 2, we obtain the following approximations of the above sums:
\[ \sum_m E_m^2 \approx \frac{\pi w_x w_y}{\delta_x\delta_y}, \qquad \sum_m E_m^2\,\xi_m^2 \approx \frac{\pi w_x^3 w_y}{2\delta_x\delta_y}, \qquad \sum_m E_m^2\,\eta_m^2 \approx \frac{\pi w_x w_y^3}{2\delta_x\delta_y}, \]
\[ \sum_m E_m^2\,\xi_m^2\eta_m^2 \approx \frac{\pi w_x^3 w_y^3}{4\delta_x\delta_y}, \qquad \sum_m E_m^2\,\xi_m^4 \approx \frac{3\pi w_x^5 w_y}{4\delta_x\delta_y}, \qquad \sum_m E_m^2\,\eta_m^4 \approx \frac{3\pi w_x w_y^5}{4\delta_x\delta_y}. \tag{10} \]
(All sums containing odd powers of $\xi_m$ or $\eta_m$ are zero.) From these we readily obtain the approximate form of the covariance matrix $\mathbf{K}$ for both flat and Poisson noise:
\[ \mathbf{K}_{\mathrm{Flat}} \approx \frac{\sigma^2}{\pi\delta_x\delta_y Q^2} \begin{pmatrix}
\frac{2}{w_x w_y} & 0 & 0 & \frac{-1}{A w_y} & \frac{-1}{A w_x} \\
0 & \frac{2 w_x}{A^2 w_y} & 0 & 0 & 0 \\
0 & 0 & \frac{2 w_y}{A^2 w_x} & 0 & 0 \\
\frac{-1}{A w_y} & 0 & 0 & \frac{w_x}{A^2 w_y} & 0 \\
\frac{-1}{A w_x} & 0 & 0 & 0 & \frac{w_y}{A^2 w_x}
\end{pmatrix}, \]
\[ \mathbf{K}_{\mathrm{Poisson}} \approx \frac{1}{2\pi} \begin{pmatrix}
\frac{3A}{w_x w_y} & 0 & 0 & \frac{-1}{w_y} & \frac{-1}{w_x} \\
0 & \frac{w_x}{A w_y} & 0 & 0 & 0 \\
0 & 0 & \frac{w_y}{A w_x} & 0 & 0 \\
\frac{-1}{w_y} & 0 & 0 & \frac{2 w_x}{3 A w_y} & \frac{1}{3A} \\
\frac{-1}{w_x} & 0 & 0 & \frac{1}{3A} & \frac{2 w_y}{3 A w_x}
\end{pmatrix}. \]
The parameter variances for $\hat{\boldsymbol\theta} = (\hat{A}, \hat{\bar{x}}, \hat{\bar{y}}, \hat{w}_x, \hat{w}_y)$ are found on the diagonal of the covariance matrix $\mathbf{K}$. The estimator for the energy of the Gaussian is $\hat{U} = 2\pi\hat{A}\hat{w}_x\hat{w}_y$, and its variance is
\[ \mathrm{var}(\hat{U}) \approx \begin{cases} 8\pi\sigma^2 w_x w_y/(\delta_x\delta_y Q^2) & (\mathrm{Flat}) \\ 2\pi A w_x w_y & (\mathrm{Poisson}) \end{cases}. \]
Note that the separable (i.e., five-parameter) Gaussian model can also be used for a completely general 2D Gaussian profile (Section 4) when the azimuth angle of the Gaussian's elliptical cross section is known a priori. The data $\mathbf{g}$ used by the estimation algorithm is unmodified in this case; the $x$ and $y$ abscissa inputs need only be rotated by the appropriate angle prior to estimation.
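The rotation mentioned above amounts to a linear change of the abscissa coordinates. The sketch below is an illustration under our own sign convention (rotation by a known azimuth angle alpha about an arbitrary pivot); the data vector g itself is left untouched, and the separable five-parameter model is then fitted against the rotated coordinates.

```python
import numpy as np

def rotate_abscissae(xm, ym, alpha, x0=0.0, y0=0.0):
    """Rotate pixel-center coordinates by a known azimuth angle alpha so that the
    Gaussian's principal axes align with the rotated axes.  Only the abscissae
    change; the measured data vector g is passed to the estimator unmodified."""
    c, s = np.cos(alpha), np.sin(alpha)
    xr = c * (xm - x0) + s * (ym - y0)
    yr = -s * (xm - x0) + c * (ym - y0)
    return xr, yr
```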
4. General 2D Gaussian Model

A general 2D Gaussian model function (Fig. 3) is
\[ \bar{g}_m = A\,\delta_x\delta_y Q_m \exp\!\left[ -\tfrac{1}{2}\,(\mathbf{r}_m - \bar{\mathbf{r}})^{\mathsf{T}}\, \mathbf{C}^{-1}\, (\mathbf{r}_m - \bar{\mathbf{r}}) \right] \equiv A\,\delta_x\delta_y Q_m E_m, \]
where
\[ \mathbf{C}^{-1} = \begin{pmatrix} c_1 & c_3 \\ c_3 & c_2 \end{pmatrix}, \]
and $\boldsymbol\theta = (A, \bar{x}, \bar{y}, c_1, c_2, c_3)$. The matrix elements $c_1$ and $c_2$ are equivalent to the $1/w_x^2$ and $1/w_y^2$ of the separable 2D profile, and $c_3$ represents the "covariance term." The tilt angle $\alpha$ of the Gaussian profile relative to the coordinate axes is then given by $\alpha = \tfrac{1}{2}\arctan\!\bigl[-2 c_3 \sqrt{c_1 c_2}/(c_1 - c_2)\bigr]$ [11].

Fig. 3. Example noiseless general 2D Gaussian profile, sampled on a pixel grid. (Note that the axis of the Gaussian is not aligned to the grid.)

The score vector $\nabla\ell$, Hessian matrix $\mathbf{H}$, and Fisher information matrix $\mathbf{F}$ for this model are
\[ [\nabla\ell]_1 = \frac{\partial\ell}{\partial A} = \sum_m \gamma_m, \qquad [\nabla\ell]_2 = \frac{\partial\ell}{\partial\bar{x}} = A \sum_m \gamma_m\,(c_1\xi_m + c_3\eta_m), \qquad [\nabla\ell]_3 = \frac{\partial\ell}{\partial\bar{y}} = A \sum_m \gamma_m\,(c_2\eta_m + c_3\xi_m), \]
\[ [\nabla\ell]_4 = \frac{\partial\ell}{\partial c_1} = -\frac{A}{2}\sum_m \gamma_m \xi_m^2, \qquad [\nabla\ell]_5 = \frac{\partial\ell}{\partial c_2} = -\frac{A}{2}\sum_m \gamma_m \eta_m^2, \qquad [\nabla\ell]_6 = \frac{\partial\ell}{\partial c_3} = -A \sum_m \gamma_m \xi_m \eta_m, \]
for $\gamma_m = \frac{\delta_x\delta_y}{\sigma_m^2}(g_m - \bar{g}_m) Q_m E_m$.
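Written out explicitly, the exponent of the six-parameter model is the quadratic form $c_1\xi_m^2 + 2 c_3 \xi_m \eta_m + c_2 \eta_m^2$, which is consistent with the score components above. The following sketch evaluates the noiseless pixel means for this model; it assumes uniform gain and is our own illustration rather than code from the paper.

```python
import numpy as np

def general_model(theta, xm, ym, dx, dy, Q):
    """Noiseless pixel means for the general (six-parameter) model:
    gbar_m = A*dx*dy*Q*exp(-0.5*(c1*xi^2 + 2*c3*xi*eta + c2*eta^2))."""
    A, xbar, ybar, c1, c2, c3 = theta
    xi, eta = xm - xbar, ym - ybar
    quad = c1 * xi**2 + 2.0 * c3 * xi * eta + c2 * eta**2
    return A * dx * dy * Q * np.exp(-0.5 * quad)
```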
The Hessian elements are
\[ H_{11} = -\delta_x^2\delta_y^2 \sum_m \frac{Q_m^2 E_m^2}{\sigma_m^2}, \qquad H_{12} = \sum_m \Gamma_m (c_1\xi_m + c_3\eta_m), \qquad H_{13} = \sum_m \Gamma_m (c_2\eta_m + c_3\xi_m), \]
\[ H_{14} = -\frac{1}{2}\sum_m \Gamma_m \xi_m^2, \qquad H_{15} = -\frac{1}{2}\sum_m \Gamma_m \eta_m^2, \qquad H_{16} = -\sum_m \Gamma_m \xi_m \eta_m, \]
\[ H_{22} = A\sum_m \bigl[\Gamma_m (c_1\xi_m + c_3\eta_m)^2 - c_1\gamma_m\bigr], \qquad H_{23} = A\sum_m \bigl[\Gamma_m (c_1\xi_m + c_3\eta_m)(c_2\eta_m + c_3\xi_m) - c_3\gamma_m\bigr], \]
\[ H_{24} = -\frac{A}{2}\sum_m \bigl[\Gamma_m (c_1\xi_m + c_3\eta_m)\xi_m^2 - 2\gamma_m\xi_m\bigr], \qquad H_{25} = -\frac{A}{2}\sum_m \Gamma_m (c_1\xi_m + c_3\eta_m)\eta_m^2, \]
\[ H_{26} = -A\sum_m \bigl[\Gamma_m (c_1\xi_m + c_3\eta_m)\xi_m\eta_m - \gamma_m\eta_m\bigr], \qquad H_{33} = A\sum_m \bigl[\Gamma_m (c_2\eta_m + c_3\xi_m)^2 - c_2\gamma_m\bigr], \]
\[ H_{34} = -\frac{A}{2}\sum_m \Gamma_m (c_2\eta_m + c_3\xi_m)\xi_m^2, \qquad H_{35} = -\frac{A}{2}\sum_m \bigl[\Gamma_m (c_2\eta_m + c_3\xi_m)\eta_m^2 - 2\gamma_m\eta_m\bigr], \]
\[ H_{36} = -A\sum_m \bigl[\Gamma_m (c_2\eta_m + c_3\xi_m)\xi_m\eta_m - \gamma_m\xi_m\bigr], \qquad H_{44} = \frac{A}{4}\sum_m \Gamma_m \xi_m^4, \qquad H_{45} = \frac{A}{4}\sum_m \Gamma_m \xi_m^2\eta_m^2, \]
\[ H_{46} = \frac{A}{2}\sum_m \Gamma_m \xi_m^3\eta_m, \qquad H_{55} = \frac{A}{4}\sum_m \Gamma_m \eta_m^4, \qquad H_{56} = \frac{A}{2}\sum_m \Gamma_m \xi_m\eta_m^3, \qquad H_{66} = A\sum_m \Gamma_m \xi_m^2\eta_m^2, \]
where $\Gamma_m = \frac{\delta_x\delta_y}{\sigma_m^2}(g_m - 2\bar{g}_m) Q_m E_m$. Once again, $\mathrm{E}\{\Gamma_m\} = -\frac{\delta_x^2\delta_y^2}{\sigma_m^2} A Q_m^2 E_m^2$ and $\mathrm{E}\{\gamma_m\} = 0$, so that
\[ F_{11} = \sum_m R_m, \qquad F_{12} = A\sum_m R_m (c_1\xi_m + c_3\eta_m), \qquad F_{13} = A\sum_m R_m (c_2\eta_m + c_3\xi_m), \]
\[ F_{14} = -\frac{A}{2}\sum_m R_m \xi_m^2, \qquad F_{15} = -\frac{A}{2}\sum_m R_m \eta_m^2, \qquad F_{16} = -A\sum_m R_m \xi_m\eta_m, \]
\[ F_{22} = A^2\sum_m R_m (c_1\xi_m + c_3\eta_m)^2, \qquad F_{23} = A^2\sum_m R_m (c_1\xi_m + c_3\eta_m)(c_2\eta_m + c_3\xi_m), \]
\[ F_{24} = -\frac{A^2}{2}\sum_m R_m (c_1\xi_m + c_3\eta_m)\xi_m^2, \qquad F_{25} = -\frac{A^2}{2}\sum_m R_m (c_1\xi_m + c_3\eta_m)\eta_m^2, \qquad F_{26} = -A^2\sum_m R_m (c_1\xi_m + c_3\eta_m)\xi_m\eta_m, \]
\[ F_{33} = A^2\sum_m R_m (c_2\eta_m + c_3\xi_m)^2, \qquad F_{34} = -\frac{A^2}{2}\sum_m R_m (c_2\eta_m + c_3\xi_m)\xi_m^2, \qquad F_{35} = -\frac{A^2}{2}\sum_m R_m (c_2\eta_m + c_3\xi_m)\eta_m^2, \]
\[ F_{36} = -A^2\sum_m R_m (c_2\eta_m + c_3\xi_m)\xi_m\eta_m, \qquad F_{44} = \frac{A^2}{4}\sum_m R_m \xi_m^4, \qquad F_{45} = \frac{A^2}{4}\sum_m R_m \xi_m^2\eta_m^2, \]
\[ F_{46} = \frac{A^2}{2}\sum_m R_m \xi_m^3\eta_m, \qquad F_{55} = \frac{A^2}{4}\sum_m R_m \eta_m^4, \qquad F_{56} = \frac{A^2}{2}\sum_m R_m \xi_m\eta_m^3, \qquad F_{66} = A^2\sum_m R_m \xi_m^2\eta_m^2, \]
where, again, $R_m = \delta_x^2\delta_y^2 Q_m^2 E_m^2/\sigma_m^2$. We can again use integral expressions to approximate $\mathbf{F}$ and so give analytic expressions for $\mathbf{K} = \mathbf{F}^{-1}$ under flat or Poisson noise:
\[ \mathbf{K}_{\mathrm{F}} \approx \frac{2\sigma^2}{\pi\delta_x\delta_y Q^2} \begin{pmatrix}
\square & 0 & 0 & \frac{\square}{A} & \frac{\square}{A} & \frac{\square}{A} \\
0 & \frac{c_2}{A^2\mu} & \frac{c_3}{A^2\mu} & 0 & 0 & 0 \\
0 & \frac{c_3}{A^2\mu} & \frac{c_1}{A^2\mu} & 0 & 0 & 0 \\
\frac{\square}{A} & 0 & 0 & \frac{\square}{A^2} & \frac{\square}{A^2} & \frac{\square}{A^2} \\
\frac{\square}{A} & 0 & 0 & \frac{\square}{A^2} & \frac{\square}{A^2} & \frac{\square}{A^2} \\
\frac{\square}{A} & 0 & 0 & \frac{\square}{A^2} & \frac{\square}{A^2} & \frac{\square}{A^2}
\end{pmatrix}, \]
\[ \mathbf{K}_{\mathrm{P}} \approx \frac{\mu}{2\pi} \begin{pmatrix}
2A & 0 & 0 & c_1 & c_2 & c_3 \\
0 & \frac{c_2}{A\mu^2} & \frac{c_3}{A\mu^2} & 0 & 0 & 0 \\
0 & \frac{c_3}{A\mu^2} & \frac{c_1}{A\mu^2} & 0 & 0 & 0 \\
c_1 & 0 & 0 & \frac{2c_1^2}{A} & \frac{2c_3^2}{A} & \frac{2c_1 c_3}{A} \\
c_2 & 0 & 0 & \frac{2c_3^2}{A} & \frac{2c_2^2}{A} & \frac{2c_2 c_3}{A} \\
c_3 & 0 & 0 & \frac{2c_1 c_3}{A} & \frac{2c_2 c_3}{A} & \frac{c_1 c_2 + c_3^2}{A}
\end{pmatrix}, \]
where $\mu = \sqrt{c_1 c_2 - c_3^2}$. Each instance of the symbol "$\square$" in the equations above represents a term that has no compact analytic expression but that depends solely on the matrix elements $c_1$, $c_2$, and $c_3$. Calculating the variance of $\hat{U}$ (the estimate of the volume under the Gaussian profile), using the same procedure as in Sections 2 and 3, gives
\[ \mathrm{var}(\hat{U}) \approx \begin{cases} \square\,\sigma^2/(\delta_x\delta_y Q^2) & (\mathrm{Flat}) \\ \square\, A & (\mathrm{Poisson}) \end{cases}. \]
Thus, we again find $\mathrm{var}(\hat{U})$ to be independent of the input parameters $\bar{x}$ and $\bar{y}$, and, for the case of flat noise, the variance is also independent of the profile height $A$.

5. Truncated Sampling

Occasionally it is necessary to estimate a Gaussian profile that has been truncated by lying at the edge of the sampled region (see Fig. 4), and we can expect that the estimation accuracy will fall when the profile has been asymmetrically sampled in this way. The numerical evaluation of the exact Fisher information matrix terms allows us to obtain the Cramér–Rao bound for this specific situation as well.

Fig. 4. Truncated sampling of a Gaussian profile, showing a case in which 60% of the volume under the profile lies outside of the region falling on the array detector.

A simulation illustrating the behavior of the estimation parameters as the Gaussian profile sampling is increasingly truncated is shown in Fig. 5. The simulation uses a symmetric 2D profile whose center $(\bar{x}, \bar{y})$ is at first well inside the sampling region (such that the "truncation parameter" $t = 0$), then approaches and goes beyond the vertical edge of the sampling region ($t = 0.6$, i.e., 60% of the volume under the profile is unsampled). For the example shown, the profile is sampled at a rate of $1/7$ the Gaussian width (that is, an $8 \times 8$ pixel region contains roughly 68% of the volume under the profile). The results show that the variance of the profile peak estimate is little affected until the truncation becomes quite large, and that, along the truncation direction, the accuracy of the profile center estimate suffers much more than does the accuracy orthogonal to the truncation direction.

Fig. 5. The change in variance of the estimation parameters as the sampling of the profile is increasingly truncated. The truncation parameter $t$ is defined simply as the fraction of volume under the Gaussian profile that lies outside of the sampling region. In the figures, lines indicate the parameter variances as calculated numerically from the exact Fisher information matrix. (The analytic approximations are not valid for the case of a truncated profile.) Dots indicate variance estimates from a Monte Carlo simulation. The variances have been normalized to the completely sampled data's parameter variances. In both figures, the value of $\bar{x}$ is allowed to vary while all other model parameters are held constant.
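Because the analytic approximations assume complete sampling, the truncated case is handled numerically: the exact Fisher matrix of Eq. (5) is simply summed over whichever pixels actually exist and then inverted. The sketch below reuses fisher_symmetric() from the earlier sketch; the detector size, profile width, and noise level are illustrative values of ours, not those of the paper's simulation.

```python
import numpy as np

# Cramer-Rao bound under truncated sampling: sum Eq. (5) over the existing pixels only.
A, w, dx, dy, Q, sigma = 100.0, 2.0, 1.0, 1.0, 1.0, 5.0
xm, ym = np.meshgrid(np.arange(0.5, 16.0), np.arange(0.5, 16.0))   # a 16 x 16 pixel detector
xm, ym = xm.ravel(), ym.ravel()

for xbar in (8.0, 12.0, 15.0, 17.0):   # profile center marching toward and past the x = 16 edge
    F = fisher_symmetric((A, xbar, 8.0, w), xm, ym, dx, dy, Q, sigma**2)
    K = np.linalg.inv(F)
    print(xbar, np.diag(K))            # variances of (A, xbar, ybar, w); the xbar term grows fastest
```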
6. Bias in the Standard Model

As mentioned in the previous paper [1], the model used here is biased, in the sense that use of an overly simple model for the measurement $g_m$ produces a biased estimator of the true Gaussian:
\[ \hat{f}_{\mathrm{unbiased}}(x, y) = \hat{f}_{\mathrm{biased}}(x, y) + b(x, y). \]
If we model the pixel response with rectangle functions over which the signal is integrated, then
\[ \bar{g}_m = Q_m \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} f(x, y)\, \mathrm{rect}\!\left(\frac{x - x_m}{\delta_x}\right) \mathrm{rect}\!\left(\frac{y - y_m}{\delta_y}\right) dx\, dy, \tag{11} \]
where $(\delta_x, \delta_y)$ is the physical dimension of the pixel and the rectangle function is defined as
\[ \mathrm{rect}(x/L) = \begin{cases} 1 & |x| < L/2 \\ 0 & |x| > L/2 \end{cases}. \]
The bias results from using a $\delta$-sampling model, i.e., $\bar{g}_m = \delta_x\delta_y Q_m f(x_m, y_m)$, whereas the detection process is better approximated as an integral over the region of the pixel's response, i.e., Eq. (11). If we take a Taylor expansion of $f(x, y)$ about the pixel's center $(x_m, y_m)$, we find that the two models coincide up to the linear term of the Taylor series. Thus, we can approximate the bias by integrating the second-order term across the pixel's response. The Taylor expansion of a scalar-valued function of two variables has the form
\[ f(x, y) \approx f(x_m, y_m) + (x - x_m)\left.\frac{\partial f}{\partial x}\right|_{\substack{x = x_m \\ y = y_m}} + (y - y_m)\left.\frac{\partial f}{\partial y}\right|_{\substack{x = x_m \\ y = y_m}} + \frac{1}{2}\!\left[ (x - x_m)^2 \left.\frac{\partial^2 f}{\partial x^2}\right|_{\substack{x = x_m \\ y = y_m}} + 2(x - x_m)(y - y_m)\left.\frac{\partial^2 f}{\partial x\,\partial y}\right|_{\substack{x = x_m \\ y = y_m}} + (y - y_m)^2 \left.\frac{\partial^2 f}{\partial y^2}\right|_{\substack{x = x_m \\ y = y_m}} \right]. \tag{12} \]
Therefore, the bias is given by
\[ b(\mathbf{r}_m) \approx \int_{x_m - \frac{1}{2}\delta_x}^{x_m + \frac{1}{2}\delta_x} \int_{y_m - \frac{1}{2}\delta_y}^{y_m + \frac{1}{2}\delta_y} [\text{second-order term}]\, dx\, dy. \tag{13} \]
The integral over the $(x - x_m)^2$ term gives $\tfrac{1}{12}\delta_x^2$; likewise, the integral over $(y - y_m)^2$ gives $\tfrac{1}{12}\delta_y^2$, while the cross term integrates to zero. Substituting these into Eq. (13) gives the equation of the bias as
\[ b(\mathbf{r}_m) \approx \frac{A Q_m E(\mathbf{r}_m)}{24 w^2} \left[ \left( \frac{(x_m - \bar{x})^2}{w^2} - 1 \right)\delta_x^2 + \left( \frac{(y_m - \bar{y})^2}{w^2} - 1 \right)\delta_y^2 \right]. \]
Following the same procedure for the five-parameter Gaussian model (the separable 2D profile) gives an expression for the bias of
\[ b(\mathbf{r}_m) \approx \frac{A Q_m E(\mathbf{r}_m)}{24} \left[ \left( \frac{(x_m - \bar{x})^2}{w_x^2} - 1 \right)\frac{\delta_x^2}{w_x^2} + \left( \frac{(y_m - \bar{y})^2}{w_y^2} - 1 \right)\frac{\delta_y^2}{w_y^2} \right]. \]
And, for the six-parameter model,
\[ b(\mathbf{r}_m) \approx \frac{A Q_m E(\mathbf{r}_m)}{24} \Bigl\{ \bigl( [c_1 (x_m - \bar{x}) + c_3 (y_m - \bar{y})]^2 - c_1 \bigr)\delta_x^2 + \bigl( [c_2 (y_m - \bar{y}) + c_3 (x_m - \bar{x})]^2 - c_2 \bigr)\delta_y^2 \Bigr\}. \]
Knowing the value of the bias provides a means of further improving the estimation: we can apply a correction to the input data and iterate the estimation until the bias becomes negligible. Note that the bias here is expressed in units of digital counts, not photoelectrons.
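The bias expressions translate directly into a correction step: evaluate b(r_m) at the current parameter estimates, subtract it from the raw data, and re-run the estimator. The sketch below does this for the symmetric (four-parameter) model; it is our own illustration and assumes uniform gain.

```python
import numpy as np

def bias_symmetric(theta, xm, ym, dx, dy, Q):
    """Approximate bias b(r_m), in digital counts, of the delta-sampling model
    for the symmetric profile (first bias expression above)."""
    A, xbar, ybar, w = theta
    xi, eta = xm - xbar, ym - ybar
    E = np.exp(-(xi**2 + eta**2) / (2.0 * w**2))
    return (A * Q * E / (24.0 * w**2)) * ((xi**2 / w**2 - 1.0) * dx**2 +
                                          (eta**2 / w**2 - 1.0) * dy**2)

# One bias-correction pass, given a first-pass estimate theta_hat and raw data g:
#   g_corrected = g - bias_symmetric(theta_hat, xm, ym, dx, dy, Q)
# after which the estimation is repeated on g_corrected.
```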
7. Optimal Pixel Size

As in the previous paper, for the 2D case we can compare previous research with the results obtained here to evaluate whether there is an optimal magnification for the Gaussian profile or, equivalently, an optimal detector pixel size. Reference [12] reports an optimal choice of $\delta_x = \delta_y = 1.5w \sim 2.5w$ for minimizing the variance of the position estimator $\hat{\bar{x}}$. In this situation, where we allow the detector pixel size to vary while the system magnification is kept constant, we need to be careful to enforce a constant energy $U$, via the relations
\[ U = \begin{cases} 2\pi A w^2 & (4\ \text{params}) \\ 2\pi A w_x w_y & (5\ \text{params}) \\ 2\pi A/\mu & (6\ \text{params}) \end{cases}, \]
where, again, $\mu = \sqrt{c_1 c_2 - c_3^2}$. (The volume $U$ is equal to the number of photoelectrons collected on the detector array.) The expressions for the position estimate variances become
\[ (4\ \text{params}):\ \mathrm{var}(\hat{\bar{x}}) \approx \begin{cases} \dfrac{8\pi\sigma^2 w^4}{\delta_x\delta_y Q^2 U^2} & (\mathrm{Flat}) \\[1ex] \dfrac{4 w^2}{U} & (\mathrm{Poisson}) \end{cases}, \qquad
(5\ \text{params}):\ \mathrm{var}(\hat{\bar{x}}) \approx \begin{cases} \dfrac{8\pi\sigma^2 w_x^3 w_y}{\delta_x\delta_y Q^2 U^2} & (\mathrm{Flat}) \\[1ex] \dfrac{4 w_x^2}{U} & (\mathrm{Poisson}) \end{cases}, \qquad
(6\ \text{params}):\ \mathrm{var}(\hat{\bar{x}}) \approx \begin{cases} \dfrac{2\pi\sigma^2 c_2}{\delta_x\delta_y \mu^3 Q^2 U^2} & (\mathrm{Flat}) \\[1ex] \dfrac{2}{c_1 U} & (\mathrm{Poisson}) \end{cases}. \tag{14} \]
(The variances for $\hat{\bar{y}}$ can easily be obtained from the above by symmetry arguments.) These results indicate that, for Poisson noise, $\mathrm{var}(\hat{\bar{x}})$ is independent of detector pixel size, whereas the flat-noise model shows that the variance decreases with larger pixel sizes. However, if the bias has not been corrected for within the algorithm, then we need to account for its effect on the measurement as well. Figure 6 shows a comparison of the squared error in $\hat{\bar{x}}$ due to bias with $\mathrm{var}(\hat{\bar{x}})$. (Note that the bias is a function of $\bar{x}$ relative to the sample locations $x_m$; if, for example, $\bar{x}$ coincides with a sample location, then the bias has no effect on the error in estimating $\bar{x}$.) For $w \lesssim 0.7\delta_x$, Fig. 6 indicates that the estimation error is dominated by bias effects. For $w \ge 0.7\delta_x$, the error is predominantly due to estimator variance. This behavior would suggest that an optimal pixel size should be $\delta_x \le 1.43 w$. Note that, although the exact location of the optimal tradeoff will vary with each parameter set, the bias to the position estimator $\hat{\bar{x}}$ is relatively insensitive to the parameters $A$ and $\bar{y}$.

Fig. 6. A comparison of $\mathrm{var}(\hat{\bar{x}})$ and $\mathrm{bias}^2(\hat{\bar{x}})$, given in units of (pixels)$^2$, for an example symmetric Gaussian profile as a function of the profile width $w$. Keeping a constant value for $U$, we increase the profile width (and thus reduce the profile height $A$) while maintaining the same sampling rate. The approximate variance is calculated by Eq. (9) and the exact variance by taking the inverse of Eq. (5). The two curves shown for the variance are for two different values of $U$.

8. Discussion

Real-life detector array data have noise properties that are neither purely Gaussian nor Poisson but are well approximated by a combined function in which the electronic noise is given by a Gaussian distribution and the photoelectron noise by a Poisson distribution (see [13], Eq. 3.16). Under this combined distribution, the Fisher information matrix is approximately (Eq. 3.17 of [13])
\[ F_{ij} \approx \sum_{m=1}^{M} \frac{1}{\sigma_G^2 + \bar{g}_m Q_m}\, \frac{\partial\bar{g}_m}{\partial\theta_i}\, \frac{\partial\bar{g}_m}{\partial\theta_j}, \]
where $\sigma_G^2$ describes the variance of the Gaussian-distributed readout noise (with all pixels sharing the same variance) and $\bar{g}_m Q_m$ gives the variance of the Poisson-distributed shot noise. To make use of this Gaussian–Poisson mixed model in our system, we need only insert $(\sigma_G^2 + \bar{g}_m Q_m)$ for $\sigma_m^2$ in the exact expressions for $\mathbf{F}$ [i.e., Eq. (5) and the corresponding equations for the five- and six-parameter models]. Unfortunately, there does not appear to be a way of deriving the corresponding approximate analytic expressions for this noise model.
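In code, this substitution amounts to changing the per-pixel variance passed to the exact Fisher-matrix (or score) routines. The sketch below computes sigma_m^2 = sigma_G^2 + gbar_m*Q_m for the symmetric model; it is our own illustration, assumes uniform gain, and its result can be passed as the sigma2 argument of the earlier sketches.

```python
import numpy as np

def sigma2_mixed(theta, xm, ym, dx, dy, Q, sigma_G):
    """Per-pixel noise variance for the mixed Gaussian-Poisson model,
    sigma_m^2 = sigma_G^2 + gbar_m * Q_m (here with uniform gain Q)."""
    A, xbar, ybar, w = theta
    E = np.exp(-((xm - xbar)**2 + (ym - ybar)**2) / (2.0 * w**2))
    gbar = dx * dy * Q * A * E
    return sigma_G**2 + gbar * Q
```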
One difficulty that can thwart accurate estimation is the failure of the optimization routine to locate a global maximum of the likelihood function. For well-sampled Gaussian profile models, the likelihood function is typically smooth and well behaved [an example is shown in Fig. 7(a)]. For difficult models, in which the width parameter $w$ is less than a pixel width, multiple local maxima develop in the likelihood function [see Fig. 7(b)]. For such models, local optimization techniques, such as Newton iteration, become unreliable, and global optimization methods, though slow, become important.

Fig. 7. Contour maps showing slices through the 4D likelihood function for (a) a well-behaved model and (b) a poorly behaved one. The parameters are $(A, \bar{x}, \bar{y}, w) = (100, 0.25, 0, 3)$ and $(10, 0.25, 0, 0.5)$, respectively, where $\bar{x}$, $\bar{y}$, and $w$ are given in units of pixel widths.

Finally, Gaussian profile estimation is often used in systems where other optical effects are present, such as a slowly varying background signal (due, for example, to stray light, scatter, or other background sources). Ignoring these effects would skew the estimation; yet they are not of interest to the observer and thus are termed nuisance parameters. Ideally, the estimation algorithm should incorporate the presence of these parameters by marginalizing the likelihood over the conditional probability density of the set of nuisance parameters. If this conditional density is unavailable, then an alternative approach is to estimate the nuisance parameters and subtract their effect from the data prior to use in the Gaussian profile estimation algorithm. This second approach would allow use of the algorithm presented in this paper.

The authors have posted a set of programs, written in the IDL [14] language, that perform the estimation algorithm discussed in this paper. These are located at http://www.ittvis.com/codebank/index.asp. (The file is gauss_mle2.pro under the heading "Statistics".) Interested readers are encouraged to download, distribute, and use the code freely.

This work was supported in part by Department of Defense (DOD) contract DAAE07-02-C-L011. We would like to thank an anonymous reviewer for a very careful reading and valuable comments.
References and Notes

1. N. Hagen, M. Kupinski, and E. L. Dereniak, "Gaussian profile estimation in one dimension," Appl. Opt. 46, 5374-5383 (2007).
2. L. G. Kazovsky, "Beam position estimation by means of detector arrays," Opt. Quantum Electron. 13, 201-208 (1981).
3. L. H. Auer and W. F. van Altena, "Digital image centering II," Astron. J. 83, 531-537 (1978).
4. R. Irwan and R. G. Lane, "Analysis of optimal centroid estimation applied to Shack-Hartmann sensing," Appl. Opt. 38, 6737-6743 (1999).
5. M. K. Cheezum, W. F. Walker, and W. H. Guilford, "Quantitative comparison of algorithms for tracking single fluorescent particles," Biophys. J. 81, 2378-2388 (2001).
6. R. E. Thompson, D. R. Larson, and W. W. Webb, "Precise nanometer localization analysis for individual fluorescent probes," Biophys. J. 82, 2775-2783 (2002).
7. Y.-C. Chen, L. R. Furenlid, D. W. Wilson, and H. H. Barrett, "Calibration of scintillation cameras and pinhole SPECT imaging systems," in Small-Animal SPECT Imaging, M. A. Kupinski and H. H. Barrett, eds. (Springer, 2005), Chap. 12, pp. 195-202.
8. S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory (Prentice Hall, 1993), p. 34.
9. What we call "flat noise" here was labeled "uniform noise" in [1]. This has been changed in recognition that the earlier term can easily be mistaken as referring to noise drawn from the uniform probability distribution.
10. It may seem that $\sigma_m^2 = \bar{g}_m/Q_m$ is the appropriate substitution here, but since the raw measurements $g_m$ are scaled versions of the Poisson-distributed photoelectrons, the variance increases as the square of the scale parameter $Q_m$.
11. S. Brandt, Data Analysis, 3rd ed. (Springer, 1999), pp. 113-114.
12. K. A. Winick, "Cramér-Rao lower bounds on the performance of charge-coupled-device optical position estimators," J. Opt. Soc. Am. A 3, 1809-1815 (1986).
13. H. H. Barrett, C. Dainty, and D. Lara, "Maximum-likelihood methods in wavefront sensing: stochastic models and likelihood functions," J. Opt. Soc. Am. A 24, 391-414 (2007).
14. The Interactive Data Language software is developed by ITT Visual Information Systems, http://www.ittvis.com/index.asp.