Introduction to Power Spectrum Estimation
Lloyd Knox (UC Davis)
CCAPP, 23 June 2010

Goal of Talk
Take someone who is starting from zero in power spectrum estimation to the point where they have some intuition for the issues and know where to go in the literature to begin estimating power spectra in practice.

Outline
• Motivating the use of the power spectrum
• Estimation under ideal conditions
• Impact of various non-idealities
• Estimation under non-ideal conditions

Power Spectra: Useful for studying statistical properties of statistically homogeneous random fields
• Statistical homogeneity: statistical properties of the field are independent of location.
• Examples: CMB temperature maps, cosmic shear maps, galaxy number count maps*, …
*Cosmological evolution actually breaks homogeneity in the radial direction, but one can study 2-D slices, or try to correct the 3-D map for evolution.
[Figure: random field(s) and the corresponding power spectrum/spectra; Nolta et al. (2009)]

Power Spectra Examples
[Figure: map at 150 GHz + map at 220 GHz; data plus modeled contributions from four different statistically isotropic components (Hall et al. 2010)]

Power Spectrum Example
[Figure: random field and its power spectrum; Nolta et al. (2009)]
$T(\theta,\phi) = \sum_{lm} a_{lm} Y_{lm}(\theta,\phi)$
$C_l\,\delta_{ll'}\delta_{mm'} = \langle a_{lm} a^*_{l'm'} \rangle$ (power spectrum)
$\langle \ldots \rangle$ = ensemble average
The Kronecker deltas are a consequence of statistical homogeneity (isotropy in this case).

Power Spectrum Interpretation
[Figure: power spectrum; Nolta et al. (2009)]
$T(\theta,\phi) = \sum_{lm} a_{lm} Y_{lm}(\theta,\phi)$
$C_l = \langle a_{lm} a^*_{lm} \rangle$
Low $l$ corresponds to large angular scales, high $l$ to small angular scales.
$C(\theta) = \langle T(\theta,\phi)\, T(\theta',\phi') \rangle = \sum_l \frac{l+1/2}{2\pi} C_l P_l(\cos\theta)$, where the argument of $C$ is the angular separation between the two directions.
$\sigma^2 = C(0) = \sum_l \frac{l+1/2}{2\pi} C_l \approx \int d(\ln l)\, \frac{l(l+1/2)\, C_l}{2\pi}$
The integrand $l(l+1/2)C_l/(2\pi)$ is the contribution to the variance from a logarithmic interval in $l$.

Why is the power spectrum useful?
• For Gaussian homogeneous random fields, it captures all the information not in the mean.
• Even for non-Gaussian fields, it can be a highly informative statistic.
  There will be additional information in other statistics, but the power spectrum is usually a sensible place to start.

Why $C_l$ Instead of the Correlation Function, $C(\theta)$?
$C(\theta) = \sum_l \frac{l+1/2}{2\pi} C_l P_l(\cos\theta)$
• They are linear transformations of each other, carrying the same information.
• For Gaussian fields, the covariance structure of power spectrum estimates is much simpler.
• In linear perturbation theory, the time evolution of a single Fourier mode is simple and decoupled from other modes ==> simple physical interpretation of the power spectrum.
• Nonlinearity of evolution, and/or non-Gaussianity, weakens these two advantages.

PS Estimation: Simplest Case of Uniform Full-sky Coverage with No Noise
$a_{lm} = \int d\Omega\, T(\theta,\phi)\, Y^*_{lm}(\theta,\phi)$, with $a_{lm} = a^s_{lm}$ (signal only)
Each $|a_{lm}|^2$ provides an unbiased estimate of $C_l$. For each $l$ there are $2l+1$ values of $m$, so we can average them all together to get
$\hat C_l = \sum_m |a_{lm}|^2 / (2l+1)$
This is both the minimum-variance and the maximum-likelihood estimator.
Note that despite there being no noise, there is uncertainty in the true value of $C_l$:
$\langle (\hat C_l - C_l)^2 \rangle = \frac{2}{2l+1} C_l^2$

PS Estimation: Uniform Full-sky Coverage With Noise
$a_{lm} = \int d\Omega\, T(\theta,\phi)\, Y^*_{lm}(\theta,\phi)$, with $a_{lm} = a^s_{lm} + a^n_{lm}$ (signal + noise)
If the noise is homogeneous and uncorrelated from pixel to pixel, then $\langle |a^n_{lm}|^2 \rangle = w^{-1}$, where $w$ is the statistical weight per solid angle, $w = (1/\sigma_{\rm pix}^2)/\Omega_{\rm pix}$, and this "noise bias" needs to be subtracted from our estimate:
$\hat C_l = \sum_m |a_{lm}|^2 / (2l+1) - w^{-1}$
$\langle (\hat C_l - C^s_l)^2 \rangle = \frac{2}{2l+1} (C^s_l + w^{-1})^2$

PS Estimation: Uniform Full-sky Coverage With Noise and Finite Resolution
$a_{lm} = \int d\Omega\, T(\theta,\phi)\, Y^*_{lm}(\theta,\phi)$, with $a_{lm} = a^s_{lm} + a^n_{lm}$ (signal + noise)
Convolution of the sky signal with the response function of the telescope, $B(\theta,\phi)$, is a multiplication in the spherical harmonic domain by $B_l = \int d\Omega\, Y_{l0}(\theta,\phi)\, B(\theta,\phi)$.
We need to compensate by dividing the map $a_{lm}$ by $B_l$, so that
$\hat C_l = B_l^{-2} \sum_m |a_{lm}|^2 / (2l+1) - B_l^{-2} w^{-1}$
$\langle (\hat C_l - C^s_l)^2 \rangle = \frac{2}{2l+1} (C^s_l + B_l^{-2} w^{-1})^2$

WMAP Power Spectrum Errors
[Figure: WMAP power spectrum with error bars. Annotation at low $l$: few samples per $l$ value, i.e., the $[2/(2l+1)]$ factor is large. Annotation at high $l$: beam-deconvolved noise is large.]
$\langle (\hat C_l - C^s_l)^2 \rangle^{1/2} = \left[\frac{2}{2l+1}\right]^{1/2} (C^s_l + B_l^{-2} w^{-1})$

PS Estimation with Partial Sky Coverage, Finite Resolution and Inhomogeneous Correlated Noise
One approach: optimal methods (assuming a Gaussian random field)
$P(T \mid C_l) \propto |M|^{-1/2} \exp(-T_i M^{-1}_{ij} T_j / 2)$, with $M_{ij} = S_{ij}(C_l) + N_{ij}$
By Bayes' theorem (for a flat prior on $C_l$),
$P(C_l \mid T) \propto P(T \mid C_l) \propto |M|^{-1/2} \exp(-T_i M^{-1}_{ij} T_j / 2)$
But the calculation is computationally intractable for maps larger than tens to hundreds of thousands of pixels.
Techniques: quadratic estimator, likelihood approximations, Gibbs sampling + Blackwell-Rao estimator (see references at end).

PS Estimation with Partial Sky Coverage, Finite Resolution and Inhomogeneous Correlated Noise
Another approach: pseudo-$C_l$ methods
Sub-optimal, but good enough and fast.
The basic idea is to use the simple estimator, and then a combination of analytic and Monte Carlo methods to estimate the offset and gain relating the simple estimator (the pseudo-$C_l$) to the real $C_l$.

Pseudo-$C_l$
[Figure: random field and its power spectrum]
$T(\theta,\phi) = \sum_{lm} a_{lm} Y_{lm}(\theta,\phi)$
$\tilde a_{lm} = \int d\Omega\, Y^*_{lm}(\theta,\phi)\, [W(\theta,\phi)\, T(\theta,\phi)]$
$W$ = mask that is zero in the galactic plane and goes smoothly to one outside of it.
Multiplication in real space is convolution in Fourier space, so $\tilde a_{lm}$ will have contributions from $a_{l'm'}$ for $l'$ near $l$.

Pseudo-$C_l$
That convolution has an analytically calculable effect on the ensemble average of the pseudo-$C_l$:
$\langle \tilde C_l \rangle = \sum_{l'} M_{ll'} B_{l'}^2 C_{l'} + \langle \tilde N_l \rangle$
Here $M_{ll'}$ is the effect of the mask, $B_{l'}$ is the beam, and $\langle \tilde N_l \rangle$ is the noise bias.
The noise bias can be calculated via noise-only Monte Carlo simulations.
Estimate $C_l$ by subtracting the noise bias and then deconvolving.
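The subtract-then-deconvolve step can be sketched numerically as follows. This is a minimal illustration, not the implementation of any particular pipeline: the function names, the toy spectrum, the Gaussian beam width, and the tridiagonal stand-in for the coupling matrix $M_{ll'}$ are all assumptions made for the example, and a real $M_{ll'}$ would have to be computed from the mask. For small sky fractions the system is typically ill-conditioned without binning into band powers, which this sketch ignores.

```python
import numpy as np


def deconvolve_pseudo_cl(pseudo_cl, coupling, beam, noise_bias):
    """Solve <C~_l> = sum_l' M_ll' B_l'^2 C_l' + <N~_l> for C_l.

    pseudo_cl  : measured pseudo-C_l, shape (n,)
    coupling   : mode-coupling matrix M_ll' from the mask, shape (n, n)
                 (assumed precomputed; not derived here)
    beam       : beam transfer function B_l, shape (n,)
    noise_bias : <N~_l>, e.g. the mean of noise-only simulations, shape (n,)
    """
    kernel = coupling * beam[np.newaxis, :] ** 2  # K_ll' = M_ll' B_l'^2
    return np.linalg.solve(kernel, pseudo_cl - noise_bias)


def toy_inputs(lmax=20):
    """Hypothetical inputs, chosen only so the example is well conditioned."""
    n = lmax + 1
    ells = np.arange(n)
    true_cl = 1.0 / (ells + 10.0) ** 2
    beam = np.exp(-0.5 * ells * (ells + 1) * 0.05**2)
    # Tridiagonal, diagonally dominant stand-in for a real coupling matrix.
    coupling = 0.6 * np.eye(n) + 0.1 * (np.eye(n, k=1) + np.eye(n, k=-1))
    noise_bias = np.full(n, 1e-4)
    return true_cl, beam, coupling, noise_bias


if __name__ == "__main__":
    true_cl, beam, coupling, noise_bias = toy_inputs()
    # Forward model: the ensemble-average pseudo-C_l for the toy inputs.
    pseudo_cl = coupling @ (beam**2 * true_cl) + noise_bias
    recovered = deconvolve_pseudo_cl(pseudo_cl, coupling, beam, noise_bias)
    print("max fractional error:", np.max(np.abs(recovered / true_cl - 1.0)))
```

Using a linear solve rather than an explicit matrix inverse is a numerical convenience; with a measured (noisy) pseudo-$C_l$ in place of the ensemble average, the same call returns an estimate rather than the exact input spectrum.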
Estimate the $C_l$ errors with noise + signal Monte Carlo simulations.

Eliminating Noise Bias
$\langle \tilde C_l \rangle = \sum_{l'} M_{ll'} B_{l'}^2 C_{l'} + \langle \tilde N_l \rangle$, with $\langle \tilde N_l \rangle \to 0$
Form the $\tilde a_{lm}$ from two different maps, each with noise, but noise that is not correlated from one map to the next, so that the noise bias of their cross-spectrum vanishes.
This reduces sensitivity to imperfect knowledge of the noise level.

Zoom in on 2 mm map
[Figure: ~4 deg² of actual SPT data]
In addition to large-scale masks (due to partial sky coverage, or the galaxy), we need to mask point sources too!

Zoom in on 2 mm map
[Figure: ~4 deg² of actual data]
• All these "large-scale" fluctuations are primary CMB.
• ~15-sigma SZ cluster detection
• Lots of bright emissive sources

Point-source Masking
$T(\theta,\phi) = \sum_{lm} a_{lm} Y_{lm}(\theta,\phi)$
$\tilde a_{lm} = \int d\Omega\, Y^*_{lm}(\theta,\phi)\, [W(\theta,\phi)\, T(\theta,\phi)]$
$W$ = mask that is zero near a point source and goes smoothly to 1 away from the point source.
Multiplication in real space is convolution in Fourier space. If the mask covers a very small area, $\tilde a_{lm}$ will have contributions from $a_{l'm'}$ for $l'$ far from $l$.
The resulting transfer of power over a large range in $l$ can cause problems.
Use fat masks (very simple) or prewhiten your data.

References
Quadratic estimator: Bond, Jaffe & Knox (1998)
Approximate likelihoods: Bond, Jaffe & Knox (2000), Verde et al. (2003)
Gibbs sampling: Wandelt et al. (2004), Eriksen et al. (2004), Chu et al. (2005)
Pseudo-$C_l$ method: Hivon et al. (2002)
Point source masking and pre-whitening: Das, Hajian & Spergel (2009)

Summary
• The power spectrum is a very useful summary statistic for comparing data with theory.
• Optimal estimation, assuming Gaussianity, is difficult and for most applications (not all) it is also pointless.
• Approximate and fast schemes exist that handle a variety of non-idealities; in principle, via Monte Carlo, they can handle them all.

Power Spectra Examples
Some of the 9*(9+1)/2 = 45 power spectra
[Figure: random fields and a selection of their auto- and cross-power spectra; Song & Knox 2003. Panel labels include 1100-1100 and 0.2-0.2.]
Nine shear maps: eight from galaxies in eight photometric redshift bins and one reconstructed from the CMB.
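
Supplementary sketch: the full-sky estimator with noise and finite resolution, and its variance formula, can be checked with a short simulation directly in harmonic space. Everything specific here is an assumption made for illustration: the function names, the toy signal spectrum, the Gaussian beam width, and the value of $w^{-1}$. The observed coefficients are modeled as $a_{lm} = B_l a^s_{lm} + a^n_{lm}$ with $\langle |a^n_{lm}|^2 \rangle = w^{-1}$.

```python
import numpy as np


def simulate_raw_power(total_cl, n_sims, rng):
    """Return sum_m |a_lm|^2 / (2l+1) for Gaussian a_lm with variance total_cl.

    Output shape is (n_sims, lmax + 1). For each l, a_l0 is real and the
    m > 0 modes are complex; m < 0 modes follow from the reality of the map.
    """
    lmax = len(total_cl) - 1
    raw = np.empty((n_sims, lmax + 1))
    for l in range(lmax + 1):
        a_l0 = rng.normal(0.0, np.sqrt(total_cl[l]), size=n_sims)
        sigma = np.sqrt(total_cl[l] / 2.0)  # per real/imaginary part
        re = rng.normal(0.0, sigma, size=(n_sims, l))
        im = rng.normal(0.0, sigma, size=(n_sims, l))
        raw[:, l] = (a_l0**2 + 2.0 * np.sum(re**2 + im**2, axis=1)) / (2 * l + 1)
    return raw


def estimate_cl(raw_power, beam, w_inv):
    """C^_l = B_l^-2 sum_m |a_lm|^2 / (2l+1) - B_l^-2 w^-1."""
    return (raw_power - w_inv) / beam**2


def predicted_variance(ells, signal_cl, beam, w_inv):
    """<(C^_l - C^s_l)^2> = 2/(2l+1) (C^s_l + B_l^-2 w^-1)^2."""
    return 2.0 / (2 * ells + 1) * (signal_cl + w_inv / beam**2) ** 2


def run_check(lmax=32, n_sims=2000, seed=0):
    """Toy setup (all numbers are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    ells = np.arange(lmax + 1)
    signal_cl = 1.0 / (ells + 10.0) ** 2
    beam = np.exp(-0.5 * ells * (ells + 1) * 0.03**2)
    w_inv = 1e-4
    total_cl = beam**2 * signal_cl + w_inv  # variance of the observed a_lm
    raw = simulate_raw_power(total_cl, n_sims, rng)
    estimates = estimate_cl(raw, beam, w_inv)
    return ells, signal_cl, estimates, predicted_variance(ells, signal_cl, beam, w_inv)


if __name__ == "__main__":
    ells, signal_cl, estimates, var_pred = run_check()
    print("mean bias / C_l at l=10:", estimates[:, 10].mean() / signal_cl[10] - 1.0)
    print("sample var / predicted at l=10:", estimates[:, 10].var() / var_pred[10])
```

Averaged over many realizations, the estimates should scatter around the input spectrum with a sample variance close to the predicted one, up to Monte Carlo error that shrinks as `n_sims` grows.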