Weak Lensing Data Analysis Challenges: GRavitational lEnsing Accuracy Testing
Thomas Kitching

Weak lensing: Qualitative
[Figure labels: dark matter; foreground galaxies; DETF (2006)]

Weak Lensing: Quantitative
- Matrix distortion of each galaxy image

The Promise and the Problem
- Weak lensing has the potential to become one of the most powerful cosmological probes
- Dark Energy Task Force: “the method with the greatest potential for constraining dark energy”
- Have a “wall” of data arriving soon (next 5 years and beyond)
- Pan-STARRS (20,000), DES (5000), Euclid (20,000): combined ≈3x10^9 galaxies
[Figure labels: PS1 Weak Lensing; WFMOS BAO; DES; Planck CMB; SNAP SNIa]

Small Effects, Big Issues
- Weak lensing has inherent systematic effects: instrumental, theoretical, astrophysical
- However these can all be potentially removed
- Intrinsic alignments
- Photometric redshift errors
- Shape measurement systematics
- The bias in the shape measurement needs to be of order 10^-3
- For Euclid/LSST-like need Q = 10^-4/MSE <~ 1000
- Current methods have Q ~> 100

Why is shape measurement hard?
- Galaxies are not circles or ellipses! (complex shapes)
- Galaxy orientations may align during formation: intrinsic alignments
- Telescope and atmosphere convolve the image: point spread function (PSF), spatially varying, time varying
- CCD responsivity, cosmic rays, meteors, unresolved sources, variable atmosphere, saturated stars
- Pixelisation of images (~sum of light over pixel)
- Partial and patchy sky coverage
- We don’t have galaxy distances, only uncertain redshifts
[Figure: typical galaxy used for cosmic shear analysis]
[Figure: typical star, used for finding the convolution kernel]

Gravitational Lensing
- Galaxies seen through dark matter distribution, analogous to light seen through your bathroom window

Cosmic Lensing
- g_i ~ 0.2
- Real data: g_i ~ 0.03

Atmosphere and Telescope
- Convolution with kernel
- Real data: kernel size ~ galaxy size

Pixelisation
- Sum light in each square
- Real data: pixel size ~ kernel size / 2

Noise
- Mostly Poisson. Some Gaussian and bad pixels.
- Uncertainty on total light ~ 5 per cent

History
- Weak lensing community set a series of blind challenges, 2004-2008
- Constant shear, PSF unknown, object detection
- Shear TEsting Programme (STEP)
- STEP 1: simplistic galaxy shapes (Heymans et al 2005)
- STEP 2: more realistic galaxies (Massey et al 2006)
- STEP 3: difficult (space) PSFs (Rhodes et al 2009)

STEP Results
[Figure: variance with PSF type against bias (Heymans et al 2005)]
- But for the future we require 0.03%
- → Results on current data are reliable

Community Methods
Two broad classes:
Model independent
- Kaiser, Squires & Broadhurst 1995 (KSB), Q ~ 10
- Quadrupole moments of image
- Works surprisingly well despite many flaws
Model fitting
- Shapelets, Q ~ 10-50
- Refregier, Bacon, Bernstein, Jarvis, Nakajima et al
- A basis set for galaxies
- “Quantum Mechanics” inspired basis set
- Polynomials times a Gaussian (Laguerre polynomials)

Lensfit v1.0 (T. Kitching, L. Miller), Q >~ 120
- Fits realistic galaxy shapes (exponential profiles)
- Bayesian estimator to remove bias
- Works on individual exposures (PSFs), combines optimally
- Currently the best method that works on individual galaxies
- More than good enough for PS1-2, DES, KiDS, CFHTLS

Beyond STEP
Pressure on time
- Large volume of data is imminent
Pressure on resources
- Weak lensing community is relatively small
- There are many unsolved weak lensing problems
- Must prepare for the data wall
Shape measurement
- There should exist an optimal method (or methods) to measure shear to the required accuracy
- Statistical inference and image processing problem (no cosmology or astronomy required)
- Bring in people from outside weak lensing and astronomy who are “experts”
We want to
- Motivate and excite people about weak lensing and cosmology

GREAT08: Gravitational lEnsing Accuracy Testing 2008
Open up the shape measurement problem to
- The computational learning community
- Statistical inference
- Wider astronomical community
The 2008 PASCAL challenge
- EU network of computational learning community
- Set it as a PASCAL challenge
- Formulate the problem
- Back to basics: constant shear, PSF known
- Be open and transparent about everything
- No astronomy (!) No jargon
- PIs: S. Bridle, J. Shawe-Taylor

[Flowchart]
- Star images
- Convolution kernel (as function of position)
- Galaxy images
- Shear estimate per galaxy
- Apply statistic, e.g. correlation function
- Predict statistic from cosmological theory
- Goodness of fit, e.g. χ2
- Dark energy, cosmology

The challenge
- Run as a competition, October 2008 to April 2009
- Results submitted to a live leaderboard
- Users downloaded 150 GB of simulated images
- Analysed 30 million galaxy images
- Accuracy and speed are issues
- Similar in statistical (not actual) scale to large surveys
- Achievable accuracy matched to Euclid/LSST: Q=1000 possible

GREAT08
- Each star or galaxy placed ~in centre of a separate image
- → No overlapping objects
- → No object detection question
- Told which are stars
- → No classification question
- PSF same for set of images
- Shear same for set of images
- Should be enough to find g to 0.03%!

GREAT08 Results (Bridle et al., 2009 in prep)
[Figures: two results slides]

Present and Future
- Weak lensing is maturing into potentially the most powerful cosmological probe
- Present methods are currently good enough and may improve
- But need much higher accuracy in the future
GREAT08
- Improvement by a factor of two
- Q=1000 accuracy achieved in some regimes
- However, the winners made use of some unrealistic aspects (stacking)
- Need a roadmap of staged simulated challenges so that we reach the required accuracy

GREAT++
[Timeline figure: step 1/2/3, g08, g10, g12, g14, g16; axis ticks 06, 10, 15, 20]
- Only 4 more challenges to solve the problem (2009 = half way)

GREAT10 (PIs: T. Kitching; CoIs: A. Amara, A. Storkey, S. Bridle)
- More realistic & more matched to CompSci
- Power spectrum of shear varying across the field
- PSF/convolution kernel must be determined
- Object detection
- Challenge will be launched end 2009 / Jan 2010
- PASCAL2 Challenge

Conclusion
- Weak lensing can be one of the most powerful cosmological probes
- The shape measurement problem: OK for now (DES/PS1), NOT solved for Euclid/LSST
- This is a computational problem
- GREAT challenges ideally matched to e-Science theme
- Many other e-Science matched problems in weak lensing: Photo-z’s, Spectra; Parameter Estimation; PetaByte Simulations (ref Andy T talk)
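
Supplementary sketch
The forward process described in the slides (shear, convolution with a kernel, pixelisation, noise) and the Q = 10^-4/MSE figure of merit can be illustrated with a short Python toy. Everything here is an assumption made for illustration and is not the GREAT08 simulation code or any of the methods named in the talk: galaxy and PSF are both Gaussians (so the convolution is done analytically by adding covariances), the noise is Gaussian only, and the estimator uses unweighted quadrupole moments with the star's moments subtracted. The function names (`sheared_cov`, `render`, `moments`, `simulate_pair`, `estimate_shear`, `quality_factor`) and default sizes are hypothetical.

```python
import numpy as np


def sheared_cov(sigma, g1, g2):
    """Covariance (pixel^2) of a circular Gaussian of width sigma after shear (g1, g2)."""
    # Distortion matrix acting on each galaxy image (convergence ignored).
    a = np.array([[1.0 - g1, -g2], [-g2, 1.0 + g1]])
    a_inv = np.linalg.inv(a)
    return sigma**2 * a_inv @ a_inv.T


def render(cov, size=32, oversample=5):
    """Unit-flux Gaussian image: evaluate on a fine grid, then sum light in each pixel."""
    n = size * oversample
    x = (np.arange(n) + 0.5) / oversample - size / 2.0  # fine-grid coordinates in pixels
    xx, yy = np.meshgrid(x, x)
    inv = np.linalg.inv(cov)
    r2 = inv[0, 0] * xx**2 + 2.0 * inv[0, 1] * xx * yy + inv[1, 1] * yy**2
    fine = np.exp(-0.5 * r2)
    fine /= fine.sum()
    # Pixelisation: sum the oversample x oversample block belonging to each pixel.
    return fine.reshape(size, oversample, size, oversample).sum(axis=(1, 3))


def moments(img):
    """Unweighted, centroid-subtracted quadrupole moments (Qxx, Qyy, Qxy)."""
    size = img.shape[0]
    c = np.arange(size) + 0.5 - size / 2.0  # pixel-centre coordinates
    xx, yy = np.meshgrid(c, c)
    flux = img.sum()
    dx = xx - (img * xx).sum() / flux
    dy = yy - (img * yy).sum() / flux
    q = [(img * dx * dx).sum(), (img * dy * dy).sum(), (img * dx * dy).sum()]
    return np.array(q) / flux


def simulate_pair(g1, g2, gal_sigma=1.0, psf_sigma=1.0, noise_sigma=0.0, rng=None):
    """Return (galaxy image, star image) for one shear value and a known PSF."""
    psf_cov = psf_sigma**2 * np.eye(2)
    star = render(psf_cov)
    # Gaussian convolved with Gaussian is a Gaussian with summed covariances.
    gal = render(sheared_cov(gal_sigma, g1, g2) + psf_cov)
    if noise_sigma > 0:
        rng = np.random.default_rng() if rng is None else rng
        gal = gal + rng.normal(0.0, noise_sigma, gal.shape)  # Gaussian noise only
    return gal, star


def estimate_shear(gal, star):
    """Toy estimator: subtract star moments from galaxy moments, form an ellipticity."""
    qxx, qyy, qxy = moments(gal) - moments(star)
    denom = qxx + qyy + 2.0 * np.sqrt(max(qxx * qyy - qxy**2, 0.0))
    return (qxx - qyy) / denom, 2.0 * qxy / denom


def quality_factor(g_true, g_est):
    """Q = 10^-4 / MSE, with MSE the mean squared shear error over all components."""
    mse = np.mean((np.asarray(g_est) - np.asarray(g_true)) ** 2)
    return 1e-4 / mse


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shears = [(0.03, 0.0), (0.0, -0.03), (0.02, 0.02)]
    for noise in (0.0, 1e-4):
        est = [estimate_shear(*simulate_pair(g1, g2, noise_sigma=noise, rng=rng))
               for g1, g2 in shears]
        print(f"noise_sigma={noise:g}  Q={quality_factor(shears, est):.3g}")
```

Without noise the toy recovers the input shear almost exactly, because for a circular source the moment ellipticity equals g and the pixel response cancels when the star's moments are subtracted. With even a small per-pixel noise the unweighted moments degrade sharply, which is one reason practical moment methods such as KSB apply a weight function.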