Database and archive challenges in the WASP project
Richard West
University of Leicester

Overview of WASP
• WASP: Wide-Angle Search for Planets
  – world's leading exoplanet transit survey
  – announced the discovery of 18 exoplanets
• Consortium involves six UK universities
  – QUB, Leicester, St Andrews, Keele, OU, Warwick
  – collaborative links to IAC/ING & Geneva

Overview of WASP
• Project operates two fully robotic wide-field telescopes
  – La Palma and Sutherland
• 8 cameras, Canon 200mm f/1.8 lens + Andor 2k x 2k BI CCD
• OMI robotic mount

WASP survey
• Survey design
  – cycle through strip of fields at fixed dec, 1h intervals in RA
  – 30s exp + 30s slew
  – cadence ~8min
• Up to 12000 images per night (~100GB)
• Observing since 2004
  – 7 million images
  – 30 million stars
  – 2×10^11 photometric points (15TB)
  – projected 100-150TB by survey completion

WASP data flow
[Diagram. Labels: WASP-North (La Palma), Internet, QUB, JANET, pipeline processing, WASP-South (SAAO), DLT, Keele, RAL ADS, JANET, JANET, Leicester (archival, data-mining)]

Non-technical challenges
• Resources
  – WASP was not a "PPARC approved" project
  – capital funding came from universities
  – irregular, one-shot initiatives
  – no clear avenue for on-going support
  – compute h/w has limited shelf-life
• With discoveries came the resources

Technical challenges
• Data volume
  – projected total of ~20-25TB processed data
  – projected total 3×10^11 photometric points
• Adopted a hybrid solution of FITS files indexed by RDBMS (MySQL)
  – divide sky into 5×5° tiles, one file per tile per night
  – web service to form light curves on-the-fly

Technical challenges
• Evolving user requirements
  – breaking new ground, understanding of optimal access patterns has evolved
  – initially architected assuming single-object light-curve extract, distributed processing
• Later added full field/season extractor, with all high-level processing at Leicester

Technical challenges
• Archive is the only part of the WASP system with a global view
  – full-season de-trending
  – important role in long-term data quality assessment
  – during pipeline development, many delete/re-ingest cycles

Technical challenges
• Whole-archive data-mining
  – transit search, period search computationally expensive
  – divide work between dedicated WASP cluster and departmental HPC cluster
  – co-ordinated by shared blackboard system

Goals for next-generation survey
• Closer integration of telescope control, pipeline and archive
  – improve overall develop-and-test turn-around during pipeline development
  – pipeline meta-data, instrument health become first-class citizens in the archive

Fin
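
Illustrative sketch (not from the talk): the hybrid scheme of per-tile, per-night files indexed by a relational database might look roughly like the Python below. Everything beyond the 5×5° tile size and the one-file-per-tile-per-night rule is an assumption made for illustration: the simple RA/Dec grid, the table and column names, the file paths, and the use of SQLite as a stand-in for MySQL are all hypothetical, and the slides do not describe the actual WASP tiling or schema.

```python
import sqlite3

TILE_DEG = 5.0  # tile edge in degrees, as stated on the slide


def tile_id(ra_deg, dec_deg, tile_deg=TILE_DEG):
    """Map a sky position to (ra_index, dec_index) on a plain RA/Dec grid.

    Assumption: a flat grid in RA and Dec. The real WASP tiling may differ
    (for example, it may compensate for tiles narrowing towards the poles).
    """
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("declination must lie in [-90, 90] degrees")
    ra = ra_deg % 360.0                      # wrap RA into [0, 360)
    n_dec = int(180.0 / tile_deg)            # number of declination bands
    i_ra = int(ra // tile_deg)
    i_dec = min(int((dec_deg + 90.0) // tile_deg), n_dec - 1)  # dec = +90 joins the top band
    return i_ra, i_dec


def create_index(conn):
    """Create a hypothetical index table: one row per tile per night."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tile_files (
               i_ra  INTEGER NOT NULL,
               i_dec INTEGER NOT NULL,
               night TEXT    NOT NULL,   -- ISO date of the observing night
               path  TEXT    NOT NULL,   -- location of the FITS file
               PRIMARY KEY (i_ra, i_dec, night)
           )"""
    )


def register_file(conn, ra_deg, dec_deg, night, path):
    """Record the file holding one night's photometry for the tile at (ra, dec)."""
    i_ra, i_dec = tile_id(ra_deg, dec_deg)
    conn.execute(
        "INSERT INTO tile_files (i_ra, i_dec, night, path) VALUES (?, ?, ?, ?)",
        (i_ra, i_dec, night, path),
    )


def files_for_object(conn, ra_deg, dec_deg):
    """Return (night, path) pairs, in date order, covering an object's position.

    A light-curve service would open each file in turn and pull out the
    object's photometric points; that step is omitted here.
    """
    i_ra, i_dec = tile_id(ra_deg, dec_deg)
    cur = conn.execute(
        "SELECT night, path FROM tile_files "
        "WHERE i_ra = ? AND i_dec = ? ORDER BY night",
        (i_ra, i_dec),
    )
    return cur.fetchall()
```

The point of the split is that the database only has to answer "which files cover this position?", which keeps the index small, while the bulk photometry stays in flat files. Under this assumed layout, a single-object light curve touches one file per night, whereas a full field/season extract reads each of a tile's files once for all of its stars.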